CN116546139A

CN116546139A - Virtual person conversation method and device, electronic equipment and storage medium

Info

Publication number: CN116546139A
Application number: CN202310678478.9A
Authority: CN
Inventors: 郭俊廷; 支涛
Original assignee: Henan Yunji Intelligent Technology Co Ltd
Current assignee: Henan Yunji Intelligent Technology Co Ltd
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-08-04

Abstract

The application relates to the technical field of communication and provides a virtual person conversation method and device, electronic equipment, and a storage medium. The method comprises: determining the type to be served of an incoming call, where the type to be served comprises purposeful service and/or non-purposeful service; determining, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served; and causing the virtual person, for which the target voice mode has been determined, to talk with the user. Because the target voice mode is selected from the voice mode database according to the type to be served, the virtual person can automatically switch between voice modes with different sound characteristics according to the user's needs. This creates the atmosphere of talking with a real person, lets the intelligent customer service robot or virtual person convey emotion during the conversation, and improves user experience and satisfaction.

Description

Virtual person conversation method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of communications technologies, and in particular to a virtual person conversation method and apparatus, an electronic device, and a storage medium.

Background

With the continuous development and application of artificial intelligence technology, more and more intelligent service scenarios are being deployed in practice, and human-machine interaction is becoming the norm. In the customer service industry, whether in a telephone sales center or a customer service center, an intelligent customer service robot or virtual person can help enterprises reduce labor costs and greatly improve working efficiency, making it a valuable assistant for customer service personnel. An intelligent customer service system uses AI technologies such as big data analysis, knowledge engineering, machine learning, and intelligent speech to combine intelligent interaction with manual service. This new customer service model provides personalized, humanized, knowledge-based interaction, overturns the traditional customer service model, and creates a more intelligent experience.

However, in the prior art, the intelligent customer service robot or virtual person replies to calls in mechanical language that carries no emotion, so the user feels a sense of distance during the conversation and the user experience is poor.

Disclosure of Invention

In view of this, the embodiments of the present application provide a virtual person conversation method and apparatus, an electronic device, and a storage medium, to solve the prior-art problem that a virtual person replies to calls in mechanical language without emotion, which creates a sense of distance when communicating with the user and results in a poor user experience.

In a first aspect of the embodiments of the present application, a virtual person conversation method is provided, where the method includes:

determining the type to be served of an incoming call, where the type to be served comprises purposeful service and/or non-purposeful service;

determining, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served;

causing the virtual person, for which the target voice mode has been determined, to talk with the user.

In a second aspect of the embodiments of the present application, a virtual person conversation device is provided, including:

a to-be-served type determining module, configured to determine the type to be served of an incoming call, where the type to be served comprises purposeful service and/or non-purposeful service;

a target voice mode determining module, configured to determine, from the voice mode database and according to the type to be served, a target voice mode matching the type to be served;

and a conversation module, configured to cause the virtual person, for which the target voice mode has been determined, to talk with the user.

In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.

Compared with the prior art, the embodiments of the present application have the following beneficial effects: the target voice mode matching the type to be served is determined from the voice mode database according to the type to be served, and the virtual person talks with the user in that target voice mode. The virtual person can therefore switch automatically between voice modes with different sound characteristics according to the user's needs. This creates the atmosphere of talking with a real person, lets the intelligent customer service robot or virtual person convey emotion during the conversation, and improves user experience and satisfaction.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application;

Fig. 2 is a flow chart of a virtual person conversation method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of a virtual person conversation device according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

The intelligent customer service robot or virtual person is a new kind of intelligent tool that can answer user questions online in real time, 24 hours a day. Used as a supplement to human customer service, it can significantly increase service capacity compared with traditional, purely human customer service.

Through a carrier such as WEB, M, or WAP/SMS, the intelligent customer service robot can combine media such as pictures, text, and even audio and video to give the user the most complete reply, so that the user's problem is solved during the conversation. In European and American countries, where customer service centers are highly developed, a considerable number of enterprises have deployed intelligent customer service robot systems that use artificial intelligence technology to provide convenient, accurate, and high-quality service to enterprise and government clients, complementing the customer service center and increasing customer satisfaction.

In view of the above problems in the prior art, the embodiments of the present disclosure provide a new virtual person conversation method and apparatus, electronic device, and storage medium. A target voice mode matching the type to be served is determined from a voice mode database according to the type to be served, and the virtual person talks with the user in that target voice mode. The virtual person can thus switch automatically between voice modes with different sound characteristics according to the user's needs, which creates the atmosphere of talking with a real person, lets the intelligent customer service robot or virtual person convey emotion during the conversation, and improves user experience and satisfaction.

A brand new virtual person communication method, device, electronic equipment and storage medium according to the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application. The application scenario may include terminal devices 101, 102, and 103, server 104, network 105, and virtual person 106.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices having a display screen and supporting communication with the server 104, including but not limited to smartphones, tablets, laptop and desktop computers, etc.; when the terminal devices 101, 102, and 103 are software, they may be installed in the electronic device as above. The terminal devices 101, 102 and 103 may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited in this embodiment of the present application. Further, various applications, such as a data processing application, an instant messaging tool, social platform software, a search class application, a shopping class application, and the like, may be installed on the terminal devices 101, 102, and 103.

The server 104 may be a server that provides various services, for example, a background server that receives a request transmitted from a terminal device with which communication connection is established, and the background server may perform processing such as receiving and analyzing the request transmitted from the terminal device and generate a processing result. The server 104 may be a server, a server cluster formed by a plurality of servers, or a cloud computing service center, which is not limited in this embodiment of the present application.

The server 104 may be hardware or software. When the server 104 is hardware, it may be various electronic devices that provide various services to the terminal devices 101, 102, and 103. When the server 104 is software, it may be a plurality of software or software modules providing various services to the terminal devices 101, 102, and 103, or may be a single software or software module providing various services to the terminal devices 101, 102, and 103, which is not limited in the embodiment of the present application.

The network 105 may be a wired network connected by coaxial cable, twisted pair, or optical fiber, or a wireless network that interconnects communication devices without wiring, for example Bluetooth, Near Field Communication (NFC), or infrared, which is not limited in the embodiment of the present application.

The virtual person 106 is not a real person. It may be an intelligent customer service robot, a program or software that simulates a real human voice to talk with clients automatically, or a voice platform that adds artificial intelligence technologies such as natural language processing, speech recognition, and semantic understanding on top of an ordinary calling system. Through intelligent outbound calling and automatic answering, the virtual person 106 can take the place of human agents in handling calls, serving marketing purposes such as screening prospective clients, locking in target clients, and classifying clients precisely. It can also take the place of human agents in solving problems for callers, providing help, or simply chatting. The virtual person 106 can improve customer experience, raise marketing efficiency, and optimize operating costs, helping enterprises reduce costs and increase efficiency. The embodiments of the present application do not limit the type of the virtual person 106.

The user can establish a communication connection with the server 104 via the network 105 through the terminal devices 101, 102, and 103 to receive or transmit information. Specifically, a control system is integrated in the server 104. Control instructions are issued through the terminal devices 101, 102, and 103, and the control system controls the virtual person 106 under its management to receive the instructions and execute the corresponding tasks as the instructions require.

It should be noted that the specific types, numbers, and combinations of the terminal devices 101, 102, and 103, the server 104, the network 105, and the virtual person 106 may be adjusted according to the actual requirements of the application scenario, which is not limited in the embodiment of the present application.

Fig. 2 is a flow chart of a virtual person conversation method according to an embodiment of the present application. The virtual person conversation method of Fig. 2 may be performed by a control system integrated in the server 104 of Fig. 1. As shown in Fig. 2, the virtual person conversation method includes:

S201, determining the type to be served of an incoming call, where the type to be served comprises purposeful service and/or non-purposeful service;

S202, determining, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served;

S203, causing the virtual person, for which the target voice mode has been determined, to talk with the user.
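The three steps S201 to S203 can be sketched as a small pipeline. This is only a minimal illustration, not the implementation described in the application: the `ServiceType` enum, the mode labels `"formal"` and `"casual"`, and the `talk` function (which merely tags a text reply instead of synthesizing speech) are all hypothetical.

```python
from enum import Enum


class ServiceType(Enum):
    PURPOSEFUL = "purposeful"
    NON_PURPOSEFUL = "non-purposeful"


# Hypothetical voice mode database: one mode label per type to be served.
VOICE_MODE_DATABASE = {
    ServiceType.PURPOSEFUL: "formal",
    ServiceType.NON_PURPOSEFUL: "casual",
}


def determine_service_type(has_purposeful_instruction: bool) -> ServiceType:
    """S201: classify the incoming call."""
    if has_purposeful_instruction:
        return ServiceType.PURPOSEFUL
    return ServiceType.NON_PURPOSEFUL


def determine_target_voice_mode(service_type: ServiceType) -> str:
    """S202: look up the voice mode that matches the type to be served."""
    return VOICE_MODE_DATABASE[service_type]


def talk(voice_mode: str, reply_text: str) -> str:
    """S203: stand-in for speech output; only tags the reply with the mode."""
    return f"[{voice_mode}] {reply_text}"


def handle_incoming_call(has_purposeful_instruction: bool, reply_text: str) -> str:
    """Run S201, S202, and S203 in order for one reply."""
    service_type = determine_service_type(has_purposeful_instruction)
    voice_mode = determine_target_voice_mode(service_type)
    return talk(voice_mode, reply_text)
```

Calling `handle_incoming_call(True, "Here is the route.")` tags the reply with the mode chosen for purposeful service, while passing `False` selects the mode chosen for non-purposeful service.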

Specifically, a virtual person is an object that simulates the characteristics of a real human being through digital or intelligent technology. The virtual person in the embodiments of the present disclosure is not a real person; it is an object that has the characteristics of human speech. It may be an intelligent customer service robot, a program or software that simulates a real human voice to talk with clients automatically, or a voice platform that adds artificial intelligence technologies such as natural language processing, speech recognition, and semantic understanding on top of an ordinary calling system. Through intelligent outbound calling and automatic answering, the virtual person can take the place of human agents in handling calls, serving marketing purposes such as screening prospective clients, locking in target clients, and classifying clients precisely. It can also take the place of human agents in solving problems for callers, providing help, or simply chatting. An incoming call is a communication connection established between a calling client and the virtual person. The type to be served is what the calling client specifically needs from customer service, for example what kind of help is wanted, what problem needs solving, or whether the client simply wants to talk. Here, the type to be served is classified into purposeful service and/or non-purposeful service. A purposeful service is a service provided for a purpose set by the client. For example, a client asks for directions and is given route information; a client asks about the price of a certain building and is given its market price; a client does not understand how to operate a mobile phone and is taught the operating steps. These are all purposeful services. A non-purposeful service is a service provided to a client who has no specific purpose. For example, a client just wants someone to chat with and dials in, and the virtual person simply talks with the client, providing conversation as a service. This is a non-purposeful service.

Further, a target voice mode matching the type to be served is determined from the voice mode database according to the type to be served. The voice mode database is a database that stores multiple pieces of voice mode data. Put simply, a voice mode is the kind of voice used when speaking, and it can be broken down into the pitch, intensity, duration, and timbre of the sound.

Pitch refers to how high or low a sound is, and depends on how fast the sounding body vibrates. Fast vibration gives a high pitch and slow vibration gives a low pitch. The vibration speed is determined by the shape of the sounding body, as follows:

Large, thick, long, or loose objects vibrate slowly and have a low pitch.

Small, thin, short, or tight objects vibrate quickly and have a high pitch.

Intensity refers to how strong a sound is, and depends on the amplitude of the sounding body's vibration. The larger the amplitude, the stronger the sound, and vice versa. In speech, intensity is determined by the force used in pronunciation: greater force gives a larger amplitude and a stronger sound, while less force gives a smaller amplitude and a weaker sound.

Duration refers to how long a sound lasts, and is determined by how long the sounding body vibrates: the longer the vibration, the longer the sound, and vice versa. In Chinese, duration is generally not a primary means of distinguishing meaning; it is a natural attribute of pronunciation and usually appears as an accompanying feature.

Timbre, also called tone quality, refers to the essential character of a sound and is the most fundamental feature distinguishing one sound from another. It depends on the waveform produced when sounding: different waveforms give different timbres.

Different combinations of the above sound elements constitute different voice mode data. The voice mode database is generally established as follows: first, multiple real human voice samples are obtained; next, the sound elements (pitch, intensity, duration, and timbre) of each sample are extracted to determine the voice mode data; finally, the voice mode database is built from the voice mode data. The database therefore stores a variety of voice mode data. The target voice mode is the voice mode selected to match the type to be served. For example, when the type to be served is determined to be purposeful service, a serious, formal, standard voice may be used to talk with the client. When the type to be served is determined to be non-purposeful service, a playful or mischievous voice may be used to talk with the client.
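The database-building procedure above can be sketched as follows. This sketch assumes the four sound elements have already been extracted from each recording (the extraction itself is not shown), and it assumes that samples sharing a label are merged by averaging their numeric elements; the application does not specify either detail. The names `VoiceSample`, `VoicePattern`, and `build_voice_mode_database`, the units, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class VoiceSample:
    """Sound elements assumed to be already extracted from one recording."""
    label: str           # hypothetical mode label, e.g. "formal"
    pitch_hz: float      # pitch
    intensity_db: float  # intensity
    duration_s: float    # typical syllable duration
    timbre: str          # timbre tag, e.g. "male", "female", "child"


@dataclass(frozen=True)
class VoicePattern:
    pitch_hz: float
    intensity_db: float
    duration_s: float
    timbre: str


def build_voice_mode_database(samples: list[VoiceSample]) -> dict[str, VoicePattern]:
    """Group samples by label and average their numeric sound elements."""
    grouped: dict[str, list[VoiceSample]] = defaultdict(list)
    for sample in samples:
        grouped[sample.label].append(sample)

    database = {}
    for label, group in grouped.items():
        database[label] = VoicePattern(
            pitch_hz=mean(s.pitch_hz for s in group),
            intensity_db=mean(s.intensity_db for s in group),
            duration_s=mean(s.duration_s for s in group),
            # Timbre is categorical, so keep the first sample's tag.
            timbre=group[0].timbre,
        )
    return database


# Illustrative values only; they are not taken from the application.
samples = [
    VoiceSample("formal", 120.0, 60.0, 0.20, "male"),
    VoiceSample("formal", 130.0, 64.0, 0.22, "male"),
    VoiceSample("casual", 220.0, 55.0, 0.25, "female"),
]
database = build_voice_mode_database(samples)
```

Keeping each entry as an immutable record makes it straightforward to hand the selected pattern to whatever speech output component is used.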

Further, determining the target voice mode determines the sound elements to be used when talking with the user. When speaking, the virtual person receives the voice mode data of the target voice mode, which gives its voice a particular pitch, intensity, duration, and timbre during the conversation, so that the voice sounds human rather than mechanical or cold.

According to the technical solution provided by the embodiments of the present disclosure, the virtual person can switch automatically between voice modes with different sound characteristics according to the user's needs, which creates the atmosphere of talking with a real person, lets the intelligent customer service robot or virtual person convey emotion during the conversation, and improves user experience and satisfaction.

In some embodiments, determining the type of service to be serviced for the incoming call includes:

determining input information input by a user through an incoming call;

when the input information includes a purposeful instruction, determining that the type to be served is purposeful service;

and when the input information does not include a purposeful instruction, determining that the type to be served is non-purposeful service.

Specifically, the input information in this embodiment is the relevant information the user reveals during the incoming call, for example what the user wants to do. The input information may carry various instructions and can generally be divided into two categories: information that includes a purposeful instruction and information that does not. A purposeful instruction is an instruction that must be completed or that has a definite target. For example, user A calls in wanting to ask the price of a certain teacup, and the virtual person replies to user A with the price. The input information is the intent user A expresses during the call, namely a price inquiry. Asking the price of a certain teacup is a purposeful instruction, and the virtual person replying with the price is a purposeful service. In another incoming call, user B wants someone to chat with by phone but does not specify a topic, and the virtual person chats freely with user B. The input information is what user B expresses during the call: a wish to chat without any specified topic. It includes no purposeful instruction, so the virtual person chatting freely with user B is a non-purposeful service.
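The classification rule can be illustrated with the two callers from the example. This is a minimal sketch under the assumption that an earlier step has already decided whether the input information carries a purposeful instruction; the `InputInfo` class, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputInfo:
    """What the caller expressed during the incoming call."""
    utterance: str
    # Hypothetical field: the concrete goal, if the caller stated one.
    purposeful_instruction: Optional[str] = None


def determine_type_to_be_served(info: InputInfo) -> str:
    """Return 'purposeful' when a purposeful instruction is present."""
    if info.purposeful_instruction:
        return "purposeful"
    return "non-purposeful"


# User A asks for a price; user B only wants to chat.
user_a = InputInfo("How much is this teacup?", purposeful_instruction="price inquiry")
user_b = InputInfo("I just want someone to chat with.")
```

Here `determine_type_to_be_served(user_a)` yields purposeful service and `determine_type_to_be_served(user_b)` yields non-purposeful service, matching the teacup and chat examples.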

In some embodiments, determining input information entered by the user through the incoming call includes:

determining key information and/or voice information input by a user through an incoming call;

input information is determined based on the key information and/or the voice information.

Specifically, the input information can be determined in two ways: from key information or from voice information. With key information, each number or letter key on the telephone represents a service item, and after dialing the user follows the prompts and presses the key that corresponds to the desired service. For example, key 1 may represent consultation, key 2 chat, and key 3 promotion. Each key is assigned corresponding service information. With voice information, the required service is derived from the meaning of what the user says. This analysis can be performed by a trained language model, which recognizes the specific meaning of the user's utterance so that the corresponding service information can be provided.
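The two input paths can be sketched as below. The key assignments come from the example in the text; everything else is an assumption made for illustration: a small keyword table stands in for the language model, and key information is preferred over voice information when both are available.

```python
from typing import Optional

# Key assignments taken from the example in the text.
KEY_TO_SERVICE = {"1": "consultation", "2": "chat", "3": "promotion"}

# Hypothetical keyword table standing in for the language model.
KEYWORD_TO_SERVICE = {
    "price": "consultation",
    "how do i": "consultation",
    "chat": "chat",
}


def input_from_key(key: str) -> Optional[str]:
    """Map a pressed key to its assigned service item, if any."""
    return KEY_TO_SERVICE.get(key)


def input_from_voice(utterance: str) -> Optional[str]:
    """Very rough stand-in for semantic analysis of the utterance."""
    text = utterance.lower()
    for keyword, service in KEYWORD_TO_SERVICE.items():
        if keyword in text:
            return service
    return None


def determine_input_information(key: Optional[str] = None,
                                utterance: Optional[str] = None) -> Optional[str]:
    """Prefer key information; fall back to voice information (an assumption)."""
    if key is not None:
        service = input_from_key(key)
        if service is not None:
            return service
    if utterance is not None:
        return input_from_voice(utterance)
    return None
```

In a real system the keyword lookup would be replaced by the semantic analysis the text describes; the surrounding control flow could stay the same.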

In some embodiments, the voice mode database includes purposeful alternative voice modes and non-purposeful alternative voice modes;

the purposeful alternative voice modes include a consultation voice mode, an operation voice mode, a selection voice mode, and a verification voice mode;

the non-purposeful alternative voice modes include a free voice mode and an interactive voice mode.

Specifically, the voice mode database stores multiple pieces of voice mode data, which fall into two classes: purposeful alternative voice modes and non-purposeful alternative voice modes. These alternative voice modes are the candidates from which a mode is selected and matched according to the type to be served. Divided by application scenario, the purposeful alternative voice modes generally include a consultation voice mode, an operation voice mode, a selection voice mode, and a verification voice mode. The non-purposeful alternative voice modes include a free voice mode and an interactive voice mode.

The consultation voice mode may use a male voice with a fast speaking rate and a loud, clear tone to provide consultation-type call service, which applies to specific consultation scenarios such as asking for directions and searching.

The operation voice mode may use a female voice with a moderate speaking rate and a loud, clear tone to provide operation-type call service, which applies to specific operation scenarios such as usage and instructions.

The selection voice mode may use a female voice with a slow speaking rate and moderate volume to provide selection-type call service, which applies to specific selection scenarios such as questionnaires and problem solutions.

The verification voice mode may use a female voice with a slow speaking rate and a loud, clear tone to provide verification-type call service, which applies to specific verification scenarios such as obtaining verification codes and confirming the authenticity of things.

The free voice mode may use a child's voice with a slow speaking rate and moderate volume to provide free-type call service, which applies to specific free scenarios such as project introductions and storytelling.

The interactive voice mode may use a female voice with a moderate speaking rate and moderate volume to provide interactive-type call service, which applies to chat and interaction scenarios.

In some embodiments, if the type to be served is purposeful service, determining, from the voice mode database and according to the type to be served, a target voice mode matching the type to be served includes:

determining at least one purposeful alternative voice mode from the voice mode database as a first candidate voice mode;

and matching against the input information to determine the target voice mode from among the first candidate voice modes.

Specifically, if the type to be served is purposeful service, multiple first candidate voice modes may be determined according to that type. The first candidate voice modes are all the purposeful alternative voice modes that match the purposeful service. Because the information the user inputs through the incoming call contains the user's specific requirement, the most suitable voice mode can be matched from the first candidate voice modes according to that requirement and used as the target voice mode.

In some embodiments, if the type to be served is non-purposeful service, determining, from the voice mode database and according to the type to be served, a target voice mode matching the type to be served includes:

determining at least one non-purposeful alternative voice mode from the voice mode database as a second candidate voice mode;

and matching against the input information to determine the target voice mode from among the second candidate voice modes.

Specifically, if the type to be served is non-purposeful service, multiple second candidate voice modes may be determined according to that type. The second candidate voice modes are all the non-purposeful alternative voice modes that match the non-purposeful service. Because the information the user inputs through the incoming call contains the user's specific requirement, the most suitable voice mode can be matched from the second candidate voice modes according to that requirement and used as the target voice mode.
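The two-stage selection (candidates first, then a match against the input information) can be sketched as follows. The mode names follow the text, but the matching rule is an assumption: the application does not say how the "most suitable" mode is chosen, so this sketch counts hypothetical scenario keywords and falls back to the first candidate when nothing matches.

```python
# Mode names follow the text; the keyword lists are hypothetical.
VOICE_MODE_DATABASE = {
    "purposeful": {
        "consultation": ["ask the way", "search", "price"],
        "operation": ["how to use", "instructions"],
        "selection": ["questionnaire", "choose"],
        "verification": ["verification code", "authentic"],
    },
    "non-purposeful": {
        "free": ["story", "introduce"],
        "interactive": ["chat"],
    },
}


def candidate_voice_modes(type_to_be_served: str) -> list[str]:
    """Return every alternative voice mode stored for the type to be served."""
    return list(VOICE_MODE_DATABASE[type_to_be_served])


def target_voice_mode(type_to_be_served: str, input_information: str) -> str:
    """Pick the candidate whose keywords best match the input information."""
    text = input_information.lower()
    candidates = VOICE_MODE_DATABASE[type_to_be_served]
    best_mode, best_hits = None, -1
    for mode, keywords in candidates.items():
        # Count how many of this mode's keywords occur in the input.
        hits = sum(keyword in text for keyword in keywords)
        if hits > best_hits:
            best_mode, best_hits = mode, hits
    return best_mode
```

Because only a strictly higher keyword count replaces the current best, ties and zero-match inputs resolve to the earliest candidate in the database order.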

In some embodiments, the method further includes:

determining the region information of the user of the incoming call;

and determining, according to the region information, the language in which the virtual person talks with the user.

Specifically, the region information generally refers to the place to which the calling user belongs. It is generally determined from key information or voice information. A key may be assigned to each region, and the user selects the applicable place when calling. Alternatively, a trained language model may analyze the user's pronunciation, including sentence structure and tone, to match the user's place. Once the user's region information is determined, the language in which the virtual person talks with the user is determined from it. The languages may include officially recognized languages such as Chinese, English, Japanese, and Spanish, as well as regional dialects.

In this embodiment, the language spoken by the virtual person can be set according to the user's language, so the user can easily understand what the virtual person says, feels a stronger sense of familiarity, and the distance between person and machine is shortened.
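The key-based path for choosing a language can be sketched as a pair of lookups. The region codes, key assignments, and default language below are hypothetical and serve only to illustrate the mapping from region information to language; the voice-based path is not shown.

```python
# Hypothetical region codes and language names, for illustration only.
REGION_TO_LANGUAGE = {
    "CN": "Chinese",
    "US": "English",
    "JP": "Japanese",
    "ES": "Spanish",
}

# Hypothetical key assignments: each key stands for one region.
KEY_TO_REGION = {"1": "CN", "2": "US", "3": "JP", "4": "ES"}

# Assumed fallback when the region is unknown.
DEFAULT_LANGUAGE = "Chinese"


def language_for_region(region: str) -> str:
    """Return the language for a region, or the default if unknown."""
    return REGION_TO_LANGUAGE.get(region, DEFAULT_LANGUAGE)


def language_for_key(key: str) -> str:
    """Resolve a pressed key to a region, then to a language."""
    region = KEY_TO_REGION.get(key)
    if region is None:
        return DEFAULT_LANGUAGE
    return language_for_region(region)
```

Dialects could be handled the same way by adding finer-grained region entries to the table.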

Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.

The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Fig. 3 is a schematic diagram of a call device for a virtual person according to an embodiment of the present application. As shown in fig. 3, the virtual person communication apparatus includes:

the pending service type determination module 301, configured to determine the type to be served of an incoming call, the type to be served including a purposeful service and/or a non-purposeful service;

the target voice mode determination module 302, configured to determine, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served;

the call module 303, configured to cause the virtual person for which the target voice mode has been determined to talk with the user.
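
As a rough illustration of how the three modules might be chained, the following sketch wires them together as plain functions. Everything beyond the module roles is an assumption: the keyword rule used to classify the call, the one-mode-per-type database and the function names are hypothetical, and the sketch treats the two service types as mutually exclusive although the application also allows "and/or".

```python
from enum import Enum


class ServiceType(Enum):
    PURPOSEFUL = "purposeful"
    NON_PURPOSEFUL = "non-purposeful"


# Hypothetical database holding one representative voice mode per service type.
VOICE_MODE_DB = {
    ServiceType.PURPOSEFUL: "consultation voice mode",
    ServiceType.NON_PURPOSEFUL: "free voice mode",
}

# Assumed keywords that signal a purposeful request; not taken from the application.
PURPOSEFUL_KEYWORDS = ("order", "refund", "verify", "cancel")


def determine_service_type(input_info: str) -> ServiceType:
    """Role of module 301: classify the incoming call from the user's input."""
    words = input_info.lower().split()
    if any(keyword in words for keyword in PURPOSEFUL_KEYWORDS):
        return ServiceType.PURPOSEFUL
    return ServiceType.NON_PURPOSEFUL


def determine_target_voice_mode(service_type: ServiceType) -> str:
    """Role of module 302: look up the voice mode matching the service type."""
    return VOICE_MODE_DB[service_type]


def start_conversation(voice_mode: str) -> str:
    """Role of module 303: placeholder for the virtual person talking to the user."""
    return f"virtual person speaking in {voice_mode}"


def handle_incoming_call(input_info: str) -> str:
    """Run the three roles in order for one incoming call."""
    service_type = determine_service_type(input_info)
    voice_mode = determine_target_voice_mode(service_type)
    return start_conversation(voice_mode)
```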

In some embodiments, the pending service type determination module 301 of fig. 3 includes:

determining input information input by a user through an incoming call;

In some embodiments, the target speech mode determination module 302 of fig. 3 includes:

the voice mode database comprises a purposeful alternative voice mode and a non-purposeful alternative voice mode;

if the type to be served is purposeful service, determining at least one purposeful alternative voice mode from a voice mode database to serve as a first candidate voice mode;

if the type to be served is non-purposeful service, determining at least one non-purposeful alternative voice mode from the voice mode database to serve as a second candidate voice mode;
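
The candidate selection above, followed by the matching against input information recited in the claims, can be sketched as below. The mode names follow the claims (consultation, operation, selection, verification, free, interactive); the keyword table and the word-overlap scoring are hypothetical stand-ins for whatever matching the applicant actually uses.

```python
# Alternative voice modes grouped by service type, named as in the claims.
PURPOSEFUL_MODES = ["consultation", "operation", "selection", "verification"]
NON_PURPOSEFUL_MODES = ["free", "interactive"]

# Hypothetical keyword hints used only to illustrate the matching step.
MODE_KEYWORDS = {
    "consultation": ("how", "what", "price"),
    "operation": ("open", "cancel", "change"),
    "selection": ("choose", "option"),
    "verification": ("verify", "confirm"),
    "free": ("chat",),
    "interactive": ("play", "game"),
}


def candidate_modes(service_type):
    """Return the candidate voice modes for a service type."""
    if service_type == "purposeful":
        return list(PURPOSEFUL_MODES)
    if service_type == "non-purposeful":
        return list(NON_PURPOSEFUL_MODES)
    raise ValueError(f"unknown service type: {service_type!r}")


def match_target_mode(candidates, input_info):
    """Pick the candidate whose keywords overlap most with the user's input.

    On a tie (including no overlap at all) the first candidate is returned,
    because max() keeps the first maximal element.
    """
    words = set(input_info.lower().split())

    def score(mode):
        return sum(1 for keyword in MODE_KEYWORDS.get(mode, ()) if keyword in words)

    return max(candidates, key=score)
```

For example, `match_target_mode(candidate_modes("purposeful"), "please verify my account")` selects the verification mode, since only that mode's keywords appear in the input.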

In some embodiments, the conversation module 303 of fig. 3 further comprises:

determining the region information of a user of an incoming call;

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present application in any way.

Fig. 4 is a schematic diagram of an electronic device 4 provided in an embodiment of the present application. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.

The electronic device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that Fig. 4 is merely an example of the electronic device 4 and does not limit it; the electronic device 4 may include more or fewer components than shown, or different components.

The processor 401 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.

The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the methods of the above embodiments may also be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of the respective method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A method of communicating with a virtual person, the method comprising:

determining a type to be serviced of an incoming call; the type to be served comprises purposeful services and/or non-purposeful services;

causing the virtual person for which the target voice mode is determined to talk with the user.

2. The method of claim 1, wherein the determining the type of service to be serviced for the incoming call comprises:

determining input information input by a user through the incoming call;

3. The method of claim 2, wherein said determining input information entered by a user through said incoming call comprises:

determining key information and/or voice information input by a user through the incoming call;

and determining the input information according to the key information and/or the voice information.

4. The method of claim 1, wherein the voice mode database includes purposeful alternative voice modes and non-purposeful alternative voice modes;

the purposeful alternative voice modes comprise a consultation voice mode, an operation voice mode, a selection voice mode and a verification voice mode;

the non-purposeful alternative voice modes include a free voice mode and an interactive voice mode.

5. The method of claim 4, wherein if the type to be served is a purposeful service, the determining, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served comprises:

determining at least one purposeful alternative voice mode from the voice mode database as a first candidate voice mode;

and matching according to the input information, and determining the target voice mode from the first candidate voice modes.

6. The method of claim 4, wherein if the type to be served is a non-purposeful service, the determining, from a voice mode database and according to the type to be served, candidate voice modes matching the type to be served comprises:

7. The method according to any one of claims 1 to 6, further comprising:

determining the region information of a user of an incoming call;

and determining the language of the conversation between the virtual person and the user according to the region information.

8. A virtual person conversation apparatus, comprising:

a pending service type determination module, configured to determine a type to be served of an incoming call, wherein the type to be served comprises a purposeful service and/or a non-purposeful service;

a target voice mode determination module, configured to determine, from a voice mode database and according to the type to be served, a target voice mode matching the type to be served;

and a call module, configured to cause the virtual person for which the target voice mode is determined to talk with the user.

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.