CN111429907A

CN111429907A - Voice service mode switching method, device, equipment and storage medium

Info

Publication number: CN111429907A
Application number: CN202010220646.6A
Authority: CN
Inventors: 李扬; 李士岩
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2020-07-17
Anticipated expiration: 2040-03-25
Also published as: CN111429907B

Abstract

The application discloses a voice service mode switching method, apparatus, device and storage medium, and relates to the technical field of intelligent voice. The specific implementation scheme is as follows: identifying a current service scene of an avatar; determining, according to the current service scene, a target service mode corresponding to the current service scene; and switching the service mode of the avatar to the target service mode and carrying out voice interaction with the user according to the target service mode. According to the embodiments of the application, the service mode is switched according to the service scene, so that the avatar better matches the current service scene and is more readily accepted by the user, services are provided to the user accurately, and the user experience is improved.

Description

Voice service mode switching method, device, equipment and storage medium

Technical Field

The application relates to the technical field of computers, in particular to an intelligent voice technology.

Background

With the development of information technology, intelligent voice technology has become the most convenient and effective means for people to acquire and communicate information. For example, intelligent voice robots are often placed in venues such as shopping malls and exhibition halls, where a user can interact with the robot by voice. The robot can answer the user's questions, promote goods to the user, or chat with the user, which provides convenience for the user.

Existing intelligent voice robots usually perform a fixed function in a fixed scene. A fixed corpus is usually set up in advance, so both the questions and the answers are fixed: the robot answers exactly what the user asks, and the answer may be inappropriate. The way the robot interacts with the user is therefore rigid and lacks a human touch, and the user experience is poor.

Disclosure of Invention

The application provides a voice service mode switching method, apparatus, device and storage medium, so that voice interaction with a user is carried out in a service mode suited to each service scene, services are provided to the user accurately, and the user experience is improved.

A first aspect of the present application provides a method for switching a voice service mode, including:

identifying a current service scene of the avatar;

determining a target service mode corresponding to the current service scene according to the current service scene; wherein the target service mode comprises at least one of a target appearance, a target action policy, a target interaction logic, a target conversational policy of the avatar;

and switching the service mode of the virtual image to the target service mode, and carrying out voice interaction with the user according to the target service mode.

A second aspect of the present application provides a voice service mode switching apparatus, including:

the scene recognition module is used for recognizing the current service scene of the virtual image;

the service mode determining module is used for determining a target service mode corresponding to the current service scene according to the current service scene; wherein the target service mode comprises at least one of a target appearance, a target action policy, a target interaction logic, a target conversational policy of the avatar;

and the processing module is used for switching the service mode of the virtual image to the target service mode and carrying out voice interaction with a user according to the target service mode.

A third aspect of the present application provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

A fourth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.

A fifth aspect of the application provides a computer program comprising program code for performing the method according to the first aspect when the computer program is run by a computer.

One embodiment in the above application has the following advantages or benefits: identifying a current service scene of the avatar; determining a target service mode corresponding to the current service scene according to the current service scene; and switching the service mode of the virtual image to the target service mode, and carrying out voice interaction with the user according to the target service mode. According to the embodiment of the application, the corresponding service modes are switched according to different service scenes, so that the virtual image is more consistent with the current service scene and is more easily accepted by a user, the service is accurately provided for the user, and the user experience is improved.

Other effects of the above-described alternative will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

fig. 1 is a system diagram illustrating a voice service mode switching method according to an embodiment of the present application;

fig. 2 is a schematic diagram illustrating a voice service mode switching method according to an embodiment of the present application;

fig. 3 is a schematic diagram of a voice service mode switching method according to another embodiment of the present application;

fig. 4 is a schematic diagram of a voice service mode switching method according to another embodiment of the present application;

fig. 5 is a block diagram of a voice service mode switching apparatus according to an embodiment of the present application;

fig. 6 is a block diagram of an electronic device for implementing a voice service mode switching method according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

For a clear understanding of the technical solutions of the present application, the prior art is first described in detail. Existing intelligent voice robots usually perform a fixed function in a fixed scene. A fixed corpus is usually set up in advance, so both the questions and the answers are fixed: the robot answers exactly what the user asks, and the answer may be inappropriate. The way the robot interacts with the user is therefore rigid and lacks a human touch, and the user experience is poor.

In order to solve the problems, different service modes are configured in advance for different service scenes such as a sales scene, a customer service scene, a guidance scene, a consultation scene and an accompanying scene, and then a proper service mode can be selected according to the current service scene to perform voice interaction with a user, so that services can be accurately provided for the user, and the user experience is improved.

Furthermore, an avatar displayed on a display device is used to carry out voice interaction with the user, which makes it easier to switch service modes; at least one of the appearance, action policy, interaction logic and conversational policy of the avatar may differ between service modes.

The voice service mode switching method provided by the present application can be applied to an intelligent voice system as shown in fig. 1. The system includes a display device 111 and a control device 112, and may further include a sensor, a speaker, and the like (not shown in fig. 1); the sensor may include a sensor for collecting sound, a sensor for collecting images, and the like. The display device 111 may be used for displaying an avatar 113. The control device 112 is used for executing the voice service mode switching method of the present application, that is, recognizing a current service scene of the avatar, determining a target service mode according to the current service scene, switching the service mode of the avatar to the target service mode, and performing voice interaction with the user 120 according to the target service mode. Optionally, the control device 112 may be integrated with the display device 111.

The following describes the voice service mode switching process in detail with reference to specific embodiments.

An embodiment of the present application provides a voice service mode switching method, and fig. 2 is a flowchart of the voice service mode switching method according to the embodiment of the present invention. The execution subject may be a control device of an intelligent voice system. As shown in fig. 2, the voice service mode switching method specifically includes the following steps:

S201, identifying the current service scene of the avatar.

In this embodiment, the avatar is displayed on a fixedly arranged display device; the user's voice can be collected through a sensor so that the avatar can carry out voice interaction with the user and provide a voice interaction service. The display device can be arranged in a storefront, a mall, a hall, a tourist center, a park, etc. While an avatar provides services to users, different service scenes may arise; for example, the avatar on a display device in a shopping mall may encounter users who want to consult, chat, or ask for directions. Optionally, the service scene may include, but is not limited to, any of the following: a sales scene, a customer service scene, a guidance scene, a consultation scene and an accompanying scene. Providing richer and more fine-grained service scenes in this embodiment makes it easier to deliver more accurate service.

Further, optionally, the avatar may be displayed on an ordinary display; alternatively, the avatar may be displayed on a transparent display device, such as an air screen, and may be a three-dimensional character, so that the user feels that the character is standing in front of them, which gives the user a more realistic visual effect and improves the user experience. Of course, other display modes can also be adopted and are not described in detail here. In this embodiment, using an avatar makes it more convenient to switch between service modes, in particular the avatar's appearance and action policy.

In this embodiment, the current service scene may be determined by obtaining the user's intention or by other means, so that different service modes are adopted for different service scenes when providing services to the user.

S202, determining a target service mode corresponding to the current service scene according to the current service scene.

Wherein the target service mode includes at least one of a target appearance, a target action policy, a target interaction logic, and a target conversational policy of the avatar.

In this embodiment, after the current service scenario is determined, the corresponding target service mode may be determined according to the current service scenario. That is, in the present embodiment, different service scenarios correspond to different service modes, wherein at least one of the appearance of the avatar, the action policy, the interaction logic, and the conversational policy may be different in the different service modes.

For example, in the service mode of a sales scene, the avatar may be dressed formally, in the same way as a salesperson, with a gentle smile, and its actions may suit the sales scene: gestures that are not too large, not too frequent, gentle, and so on. The interaction logic may follow that of a sales scene, that is, in which cases the avatar answers the user's question directly, in which cases it asks the user a question in return, and in which cases it does not answer; this logic can be configured in advance. The conversational policy is to compose language from a pre-configured corpus for the sales scene and to adopt the tone, intonation, and the like suited to the sales scene. As another example, in the service mode of an accompanying scene, the avatar may be dressed more casually, its smile may be freer and an exaggerated laugh is allowed, and its actions may also be more exaggerated. The interaction logic may follow that of an accompanying scene: the avatar can answer the user's questions as well as laugh and joke, and this logic can likewise be configured in advance. The conversational policy is to compose language from a pre-configured corpus for the accompanying scene and to adopt a tone and intonation suited to the accompanying scene.

S203, switching the service mode of the virtual image to the target service mode, and carrying out voice interaction with the user according to the target service mode.

In this embodiment, after the target service mode of the avatar is determined, the avatar is switched to the target service mode and then carries out voice interaction with the user in that mode, so that the avatar better matches the current service scene, is more readily accepted by the user, and provides service to the user accurately.

The voice service mode switching method provided by the embodiment identifies the current service scene of the virtual image; determining a target service mode corresponding to the current service scene according to the current service scene; and switching the service mode of the virtual image to the target service mode, and carrying out voice interaction with the user according to the target service mode. In the embodiment, the corresponding service modes are switched according to different service scenes, so that the virtual image is more consistent with the current service scene and is more easily accepted by a user, thereby realizing accurate service provision for the user and improving the user experience.
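Steps S201 to S203 can be illustrated with a minimal sketch. The application does not define a programming interface, so every name here (`ServiceMode`, `MODES`, `Avatar`, `identify_scene`, `handle`) and every configured value is a hypothetical placeholder, and the keyword check merely stands in for real scene recognition.

```python
from dataclasses import dataclass

# All names and values are hypothetical; the application defines no API.


@dataclass(frozen=True)
class ServiceMode:
    appearance: str
    action_policy: str
    interaction_logic: str
    conversational_policy: str


# Pre-configured scene -> service mode table (illustrative values only).
MODES = {
    "sales": ServiceMode("formal uniform", "restrained gestures",
                         "sales logic", "sales corpus"),
    "accompanying": ServiceMode("casual clothes", "expressive gestures",
                                "chat logic", "chat corpus"),
}


class Avatar:
    def __init__(self, mode: ServiceMode):
        self.mode = mode

    def switch_mode(self, target: ServiceMode) -> None:
        self.mode = target

    def reply(self, text: str) -> str:
        # Stand-in for speech synthesis: tag the reply with the active corpus.
        return f"[{self.mode.conversational_policy}] {text}"


def identify_scene(utterance: str) -> str:
    """S201: toy keyword check standing in for scene recognition."""
    return "sales" if "price" in utterance.lower() else "accompanying"


def handle(avatar: Avatar, utterance: str) -> str:
    scene = identify_scene(utterance)        # S201: identify the scene
    target = MODES[scene]                    # S202: determine the target mode
    avatar.switch_mode(target)               # S203: switch the avatar ...
    return avatar.reply("How can I help?")   # ... and interact in that mode
```

In this sketch an avatar that starts in the accompanying mode ends up in the sales mode after `handle(avatar, "What is the price of this phone?")`.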

On the basis of the foregoing embodiment, as shown in fig. 3, the determining, according to the current service scenario, of a target service mode corresponding to the current service scenario in S202 may specifically include:

S301, determining, according to the current service scene, at least one of a target appearance, a target action policy, a target interaction logic and a target conversational policy of the avatar corresponding to the current service scene;

S302, determining at least one of the target appearance, the target action policy, the target interaction logic and the target conversational policy as the target service mode.

In this embodiment, in order to provide services to users more accurately, a corresponding target service mode may be determined according to a current service scenario, where the target service mode may include at least one of a target appearance, a target action policy, a target interaction logic, and a target conversational policy. In this embodiment, when the mode needs to be switched, only one or more items of the appearance, the action policy, the interaction logic, and the conversational policy of the avatar may be changed, so that the service mode is switched more flexibly and naturally, and the service is provided to the user.

More specifically, the determining a target service mode corresponding to the current service scenario according to the current service scenario includes:

determining a preset service mode corresponding to the current service scene as the target service mode according to the current service scene and the corresponding relation between the preset service scene and the preset service mode;

wherein the correspondence between the preset service scene and the preset service mode comprises: a correspondence between the preset service scene and at least one of a preset appearance, a preset action policy, a preset interaction logic and a preset conversational policy of the avatar.

In this embodiment, the preset service modes corresponding to different preset service scenes may be configured in advance. That is, at least one of the preset appearance, preset action policy, preset interaction logic and preset conversational policy of the avatar is configured for each preset service scene, which yields the correspondence between each preset service scene and the configured components; the target service mode can then be determined from the current service scene and this correspondence. By configuring the preset service modes corresponding to different preset service scenes in advance, the avatar can be switched accurately to the appropriate service mode when the service scene changes, so as to provide accurate service.
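Because a preset service mode may cover only some of the four components, one way to picture the correspondence is a table of partial presets that are merged into the avatar's current mode. The following is a sketch under that assumption; `ModePreset`, `PRESETS` and `apply_preset` are hypothetical names, and the choice to leave unspecified components unchanged is an interpretation of "at least one of", not something the application prescribes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ServiceMode:
    appearance: str
    action_policy: str
    interaction_logic: str
    conversational_policy: str


@dataclass(frozen=True)
class ModePreset:
    """Hypothetical preset: any subset of the four components (None = keep)."""
    appearance: Optional[str] = None
    action_policy: Optional[str] = None
    interaction_logic: Optional[str] = None
    conversational_policy: Optional[str] = None


# Illustrative correspondence between preset scenes and preset modes.
PRESETS = {
    "sales": ModePreset(appearance="formal uniform",
                        action_policy="restrained gestures",
                        interaction_logic="sales",
                        conversational_policy="sales corpus"),
    # The guidance preset changes only the logic and the conversational policy.
    "guidance": ModePreset(interaction_logic="guidance",
                           conversational_policy="guidance corpus"),
}


def apply_preset(current: ServiceMode, scene: str) -> ServiceMode:
    """Return the target mode for `scene`, changing only configured components."""
    preset = PRESETS.get(scene)
    if preset is None:
        return current  # unknown scene: keep the current mode
    changes = {k: v for k, v in vars(preset).items() if v is not None}
    return replace(current, **changes)
```

With this layout, switching from a casual accompanying mode to the guidance scene keeps the avatar's clothes and gestures and swaps only the interaction logic and the corpus.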

On the basis of any of the above embodiments, when identifying the current service scene of the avatar in S201, an interactive scene recognition method may be adopted to determine which of the sales scene, the customer service scene, the guidance scene, the consultation scene, the accompanying scene, and the like is the current service scene.

In an alternative embodiment, as shown in fig. 4, the identifying a current service scene of the avatar in S201 may specifically include:

S401, determining user intention according to a voice instruction of a user;

S402, identifying the current service scene of the avatar according to the user intention.

In this embodiment, when a user converses with the avatar, the current service scene can be determined from the collected voice instruction of the user. Speech recognition and semantic understanding are performed on the voice instruction to determine the user's intention, so that the user's need can be classified as any one of sales, customer service, guidance, consultation and accompanying, and the current service scene of the avatar is determined accordingly. For example, when the user asks where the toilet is, the user's intention is determined to be that the avatar should give directions to the toilet, and the current service scene of the avatar is determined to be a guidance scene; if the user asks about commodity information, the user's intention is determined to be purchasing the commodity and learning about it, and the current service scene of the avatar is determined to be a sales scene.
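The two-stage mapping of S401 and S402 can be sketched as follows, assuming the voice instruction has already been transcribed to text. The keyword rules are a deliberately crude stand-in for speech recognition and semantic understanding, and the intent names, the tables and the `consultation` fallback are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical keyword rules standing in for semantic understanding.
INTENT_KEYWORDS = {
    "find_location": ("where is", "how do i get to"),
    "product_info": ("price", "how much", "in stock"),
    "chat": ("joke", "weather"),
}

# Hypothetical intent -> service scene correspondence.
INTENT_TO_SCENE = {
    "find_location": "guidance",
    "product_info": "sales",
    "chat": "accompanying",
}


def determine_intent(transcript: str) -> Optional[str]:
    """S401: map recognized text to a coarse intent (None if nothing matches)."""
    text = transcript.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


def identify_scene(transcript: str, default: str = "consultation") -> str:
    """S402: map the intent to a service scene, with an assumed fallback."""
    return INTENT_TO_SCENE.get(determine_intent(transcript), default)
```

Here `identify_scene("Where is the toilet?")` yields the guidance scene and `identify_scene("How much is this jacket?")` yields the sales scene, matching the two examples in the text.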

In another optional embodiment, the identifying the current service scene of the avatar in S201 may specifically include:

if a commodity in the hand of the user is recognized, determining that the current service scene of the avatar is a sales scene.

In this embodiment, when the user interacts with the avatar or is in front of the avatar, images of the user may be captured and recognized. If a commodity in the user's hand is recognized, the current service scene of the avatar may be determined to be a sales scene; the target service mode is then determined to be the service mode corresponding to the sales scene, the service mode of the avatar is switched to that mode, and information about the commodity is explained to the user accordingly. In this embodiment, when the avatar is in a service mode corresponding to a non-sales scene and a commodity is recognized in the user's hand, the avatar switches automatically to the service mode corresponding to the sales scene and can promote the commodity to the user more intelligently, so that a commodity promotion service is provided to the user accurately and the user experience is improved.
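The image-based trigger can be sketched as a rule over the output of an image recognizer. The recognizer itself is out of scope here: the `Detection` record, the `CATALOG` of merchandise labels and the confidence threshold are all assumptions introduced for illustration and are not specified in the application.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of an image recognizer for one object in a frame."""
    label: str           # e.g. "phone", "person"
    held_in_hand: bool   # whether the object is in the user's hand
    confidence: float    # recognizer score in [0, 1]


# Labels treated as merchandise (illustrative only).
CATALOG = {"phone", "headphones", "jacket"}


def scene_from_detections(detections: Iterable[Detection],
                          current_scene: str,
                          min_confidence: float = 0.8) -> str:
    """Return "sales" if a catalog item is held by the user, else keep the scene.

    The confidence threshold is an added assumption, not part of the method.
    """
    for d in detections:
        if d.held_in_hand and d.label in CATALOG and d.confidence >= min_confidence:
            return "sales"
    return current_scene
```

A frame containing a confidently recognized phone in the user's hand moves an avatar from any non-sales scene to the sales scene; otherwise the current scene is kept.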

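In another optional embodiment, the identifying of the current service scene of the avatar in S201 may specifically include:

carrying out face recognition on the user, and determining the current service scene of the avatar according to the face recognition result.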

In this embodiment, when the user interacts with the avatar or is in front of the avatar, face recognition may be performed on the user so that different service modes are adopted for different users. For example, in a welcome scene, different welcome service modes may be adopted for different users; that is, the welcome scene is divided into finer-grained scenes corresponding to different users. More flexible and targeted service modes can thus be adopted to meet the needs of different users, which improves the service quality and the user experience.

Further, in this embodiment, a corresponding relationship between preset user identity information and a preset service scene may be configured, and determining the current service scene of the avatar according to the face recognition result may include:

determining user identity information according to the face recognition result; and determining a preset service scene corresponding to the user identity as the current service scene of the virtual image according to the corresponding relation between the preset user identity information and the preset service scene.

In this embodiment, the user identity information may include information such as the gender and age of the user, or may include some user information input in advance, such as the name and occupation of the user. The corresponding relationship between the preset user identity information and the preset service scene may be the corresponding relationship between users of different categories and the preset service scene, for example, the corresponding relationship between users of different ages and the preset service scene, the corresponding relationship between users of different genders and the preset service scene, and the like, and of course, the corresponding relationship between a certain user and the preset service scene may also be refined, so that services can be provided to the user in a more targeted manner, and the requirements of different users are met.
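The correspondence between user identity information and service scenes can be sketched as a two-level lookup: a per-user entry takes precedence, and a category rule (here, age ranges) applies otherwise. The `Identity` record, both tables, the specific scene assignments and the fallback scene are hypothetical values chosen only to make the lookup concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    """Hypothetical face-recognition result."""
    user_id: Optional[str]  # set only for users registered in advance
    age: int
    gender: str


# Per-user correspondence (most specific), illustrative only.
USER_SCENES = {"vip-001": "consultation"}

# Category correspondence by inclusive age range, illustrative only.
AGE_GROUP_SCENES = [
    (0, 12, "accompanying"),
    (13, 59, "sales"),
    (60, 200, "guidance"),
]


def scene_for_identity(identity: Identity,
                       default: str = "customer_service") -> str:
    """Pick the preset scene for a user: per-user entry first, then age group."""
    if identity.user_id in USER_SCENES:
        return USER_SCENES[identity.user_id]
    for low, high, scene in AGE_GROUP_SCENES:
        if low <= identity.age <= high:
            return scene
    return default
```

Checking the per-user table first reflects the remark that the correspondence can be refined down to an individual user.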

On the basis of any of the above embodiments, the switching the service mode of the avatar to the target service mode in S203 may specifically include:

if the preset conditions are met, switching the service mode of the virtual image to the target service mode in real time;

if the preset condition is not met, keeping the service mode of the virtual image unchanged;

wherein the preset condition is a preset time condition and/or a preset environmental condition.

In this embodiment, when the service scene changes, the service mode can be switched in real time. Alternatively, preset conditions may be set in advance, so that the service mode is switched when the preset conditions are met and is not switched when they are not met; this makes service mode switching better fit the actual situation and more standardized. For example, the service mode of the avatar may be switched in real time when the current time is within a first preset time period, and kept unchanged when the current time is not within the first preset time period (e.g., is within a second preset time period). Suppose the avatar is required to keep the service mode corresponding to the sales scene during the second preset time period: when a user wants to chat or be accompanied, the service mode is not switched and the request may be declined, for example by replying "I am working and cannot chat with you". As another example, if the display device of the avatar is located in front of a counter, the avatar may be required to keep the service mode corresponding to the sales scene unchanged, and after the avatar is moved to another position the service mode can be switched in real time according to the service scene. Optionally, a plurality of display devices may be provided, and the display device on which the avatar appears is determined according to the user position acquired by the sensor, so that the avatar follows the user and provides a personal service. When the avatar moves to a certain environment, that is, appears on the display device in that environment, its service mode may be set to remain unchanged; for example, when the avatar moves to the display device in front of the counter, it keeps the service mode corresponding to the sales scene. When the avatar moves to another environment, the service mode may again be switched in real time according to the service scene.
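The gating by preset conditions can be sketched as a predicate evaluated before each switch. The time window, the set of locked locations and the decision to require both conditions together are assumptions for illustration; the text allows a time condition, an environmental condition, or both.

```python
from datetime import time

# Hypothetical "first preset time period" during which switching is allowed.
FREE_SWITCH_START = time(12, 0)
FREE_SWITCH_END = time(14, 0)

# Hypothetical display locations where the current mode must be kept.
LOCKED_LOCATIONS = {"counter"}


def may_switch(now: time, location: str) -> bool:
    """True only if both the time condition and the environment condition hold."""
    in_window = FREE_SWITCH_START <= now < FREE_SWITCH_END
    return in_window and location not in LOCKED_LOCATIONS


def next_mode(current_mode: str, target_mode: str,
              now: time, location: str) -> str:
    """Switch to the target mode if allowed; otherwise keep the current mode."""
    return target_mode if may_switch(now, location) else current_mode
```

Under these assumed settings, an avatar in the lobby at 12:30 switches to the accompanying mode on request, while the same request at 09:00, or at the counter, leaves the sales mode in place.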

On the basis of the above embodiment, the voice interaction with the user according to the target service mode need not be restricted to flow-based interaction logic; that is, the interaction does not have to follow a fixed procedure mechanically. For example, if the user needs consultation or guidance while handling a certificate, the avatar selects a suitable non-procedural interaction logic to communicate with the user, rather than insisting on what must be discussed first and what must be discussed next. If the user suddenly asks about the weather, it is determined that the service scene has changed, and the service mode is switched to the one corresponding to a chat scene; when the user resumes handling the certificate after asking about the weather, the service mode is switched back. This flexible service mode switching process provides better service for the user and improves the user experience.
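Switching away for a digression and then switching back can be sketched by remembering the interrupted mode on a stack. This is one possible bookkeeping scheme, not one the application specifies; the class name `ModeStack` and its methods are hypothetical.

```python
class ModeStack:
    """Hypothetical helper that remembers interrupted service modes."""

    def __init__(self, initial: str):
        self._stack = [initial]

    @property
    def current(self) -> str:
        return self._stack[-1]

    def interrupt(self, mode: str) -> None:
        """Temporarily switch to `mode`, remembering the interrupted mode."""
        if mode != self.current:
            self._stack.append(mode)

    def resume(self) -> str:
        """Return to the interrupted mode, if any, and report the active mode."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self.current
```

In the certificate example, the avatar would call `interrupt` with the chat mode when the user asks about the weather and `resume` when the user returns to the certificate, which restores the earlier mode.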

Fig. 5 is a structural diagram of a voice service mode switching apparatus according to an embodiment of the present invention. As shown in fig. 5, the voice service mode switching apparatus 500 specifically includes: a scene recognition module 501, a service mode determination module 502, and a processing module 503.

A scene recognition module 501, configured to recognize a current service scene of the avatar;

a service mode determining module 502, configured to determine, according to the current service scenario, a target service mode corresponding to the current service scenario; wherein the target service mode comprises at least one of a target appearance, a target action policy, a target interaction logic, a target conversational policy of the avatar;

the processing module 503 is configured to switch the service mode of the avatar to the target service mode, and perform voice interaction with the user according to the target service mode.

On the basis of the foregoing embodiment, the service mode determining module 502 is configured to:

determining, according to the current service scene, at least one of a target appearance, a target action policy, a target interaction logic and a target conversational policy of the avatar corresponding to the current service scene;

and determining at least one of the target appearance, the target action policy, the target interaction logic and the target conversational policy as the target service mode.

On the basis of any of the foregoing embodiments, optionally, the scene recognition module 501 is configured to:

determining user intention according to a voice instruction of a user;

identifying a current service scenario of the avatar according to the user intent.

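On the basis of any of the foregoing embodiments, optionally, the scene recognition module 501 is configured to:

if a commodity in the hand of the user is recognized, determine that the current service scene of the avatar is a sales scene;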

the processing module 503 is configured to:

and switching the service mode of the virtual image to a service mode corresponding to a sales scene, and explaining the related information of the commodity to the user according to the service mode corresponding to the sales scene.

On the basis of the above embodiment, when determining the current service scene of the avatar according to the face recognition result, the scene recognition module 501 is configured to:

determining user identity information according to the face recognition result;

and determining a preset service scene corresponding to the user identity as the current service scene of the virtual image according to the corresponding relation between the preset user identity information and the preset service scene.

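On the basis of any of the above embodiments, the processing module 503 is configured to:

if a preset condition is met, switch the service mode of the avatar to the target service mode in real time;

if the preset condition is not met, keep the service mode of the avatar unchanged;

wherein the preset condition is a preset time condition and/or a preset environmental condition.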

On the basis of any of the above embodiments, the current service scenario includes any one of: sales scenes, customer service scenes, guidance scenes, consultation scenes and accompanying scenes.

On the basis of any one of the above embodiments, the avatar is displayed on the transparent display device.

The voice service mode switching apparatus provided in this embodiment may be specifically configured to implement the voice service mode switching method embodiments provided in fig. 2 to 4, and specific functions are not described herein again.

The voice service mode switching device provided by the embodiment identifies the current service scene of the virtual image; determining a target service mode corresponding to the current service scene according to the current service scene; and switching the service mode of the virtual image to the target service mode, and carrying out voice interaction with the user according to the target service mode. In the embodiment, the corresponding service modes are switched according to different service scenes, so that the virtual image is more consistent with the current service scene and is more easily accepted by a user, thereby realizing accurate service provision for the user and improving the user experience.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.

The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the voice service mode switching method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the voice service mode switching method provided by the present application.

The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the scene recognition module 501, the service mode determination module 502, and the processing module 503 shown in fig. 5) corresponding to the voice service mode switching method in the embodiment of the present application. The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 602, that is, implementing the voice service mode switching method in the above-described method embodiment.
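As a rough illustration of how the three program modules of fig. 5 might be held in memory and run in method order by a processor, consider the following Python sketch. The `VoiceServiceDevice` class, the `handle` method and the three stub functions are hypothetical names introduced here; the keyword rule in `recognize` is only a stand-in for real scene recognition and is not part of the application.

```python
class VoiceServiceDevice:
    """Holds the three program modules and runs them in method order."""

    def __init__(self, scene_recognition, mode_determination, processing):
        # Keys mirror the reference numerals of the modules in fig. 5.
        self.memory = {
            501: scene_recognition,
            502: mode_determination,
            503: processing,
        }

    def handle(self, user_input: str) -> str:
        scene = self.memory[501](user_input)       # identify the current scene
        mode = self.memory[502](scene)             # determine the target mode
        return self.memory[503](mode, user_input)  # switch and interact


def recognize(user_input: str) -> str:
    # Hypothetical keyword rule standing in for real scene recognition.
    return "sales" if "buy" in user_input.lower() else "accompanying"


def determine(scene: str) -> str:
    # Hypothetical naming convention for the target service mode.
    return scene + "_mode"


def process(mode: str, user_input: str) -> str:
    # Hypothetical reply format showing which mode handled the input.
    return f"[{mode}] reply to: {user_input}"


device = VoiceServiceDevice(recognize, determine, process)
```

Passing the modules in as callables keeps the device independent of any particular recognition or dialogue technique, which matches the description's point that the modules are simply program instructions stored in the memory.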

The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the electronic device that implements the voice service mode switching method, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and such remote memory may be connected over a network to the electronic device that implements the voice service mode switching method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device that implements the voice service mode switching method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or by other means; fig. 6 takes connection by a bus as an example.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device that implements the voice service mode switching method; examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 604 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.

To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiments of the present application, the current service scene of the avatar is identified, the target service mode corresponding to the current service scene is determined, the service mode of the avatar is switched to the target service mode, and voice interaction with the user is carried out according to the target service mode. Because the service mode is switched according to the service scene, the avatar fits the current service scene better and is more readily accepted by the user, so that the service is provided accurately and the user experience is improved.

The present application also provides a computer program comprising program code which, when the computer program is run by a computer, executes the voice service mode switching method according to the above embodiments.

It should be understood that the flows shown above may be used in various forms, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for switching voice service modes, comprising:

identifying a current service scene of the avatar;

2. The method of claim 1, wherein the determining the target service mode corresponding to the current service scenario according to the current service scenario comprises:

3. The method of claim 2, wherein the determining the target service mode corresponding to the current service scenario according to the current service scenario comprises:

4. The method of claim 1, wherein the identifying a current service scenario of the avatar comprises:

determining user intention according to a voice instruction of a user;

5. The method of claim 1, wherein the identifying a current service scenario of the avatar comprises:

the switching the service mode of the virtual image to the target service mode, and performing voice interaction with the user according to the target service mode comprises:

6. The method of claim 1, wherein the identifying a current service scenario of the avatar comprises:

7. The method of claim 6, wherein determining the current service scenario of the avatar according to the face recognition result comprises:

determining user identity information according to the face recognition result;

8. The method according to any one of claims 1-7, wherein the switching the service mode of the avatar to the target service mode comprises:

9. The method according to any of claims 1-7, wherein the current service scenario comprises any of: sales scenes, customer service scenes, guidance scenes, consultation scenes and accompanying scenes.

10. The method according to any of claims 1-7, wherein the avatar is displayed on a transparent display.

11. A voice service mode switching apparatus, comprising:

12. The apparatus of claim 11, wherein the service mode determination module is configured to:

13. The apparatus of claim 12, wherein the service mode determination module is configured to:

14. The apparatus of claim 11, wherein the scene recognition module is configured to:

determining user intention according to a voice instruction of a user;

15. The apparatus of claim 11, wherein the scene recognition module is configured to:

the processing module is used for:

16. The apparatus of claim 11, wherein the scene recognition module is configured to:

17. The apparatus of claim 16, wherein the scene recognition module, when performing face recognition on the user and determining the current service scene of the avatar according to the face recognition result, is specifically configured to:

determining user identity information according to the face recognition result;

18. The apparatus of any one of claims 11-17, wherein the processing module is configured to:

19. The apparatus according to any of claims 11-17, wherein the current service scenario comprises any of: sales scenes, customer service scenes, guidance scenes, consultation scenes and accompanying scenes.

20. The apparatus of any of claims 11-17, wherein the avatar is displayed on a transparent display.

21. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.

22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.