CN113114851B - Incoming call intelligent voice reply method and device, electronic equipment and storage medium

Info

Publication number: CN113114851B
Application number: CN202110313400.8A
Authority: CN
Inventors: 金晓波; 刘泽宙; 黄庆伟; 林晓斌; 王铮
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2022-06-21
Anticipated expiration: 2041-03-24
Also published as: CN113114851A

Abstract

The disclosure provides an incoming call intelligent voice reply method and device, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence, such as voice technology. The specific implementation scheme is as follows: when an incoming call arrives at a called terminal, the incoming call is answered on behalf of the called terminal and first incoming call voice information in the incoming call is obtained; a scene type corresponding to the first incoming call voice information is determined; in response to the first incoming call voice information not involving a geographic keyword, geographic location information set by the called terminal for the scene type is obtained; first reply voice information corresponding to the first incoming call voice information is generated according to the geographic location information; and the first reply voice information is played to reply to the first incoming call voice information. In this way, the incoming call can be answered intelligently on behalf of the called terminal, and the incoming call voice information can be replied to based on the scene type and the geographic location information, which improves the intelligence and accuracy of the reply.

Description

Incoming call intelligent voice reply method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as voice technologies, and more specifically to an incoming call intelligent voice reply method and device, an electronic device, and a storage medium.

Background

At present, when a mobile phone user is busy, away from the phone, or otherwise unable to answer, calls placed to the user's number often go unanswered, which can easily cause the user to miss important information. How to intelligently answer incoming calls to the mobile phone is therefore a technical problem that urgently needs to be solved.

Disclosure of Invention

The present disclosure provides a method, apparatus, and storage medium for intelligent voice reply of an incoming call.

According to one aspect of the disclosure, an incoming call intelligent voice reply method is provided, which includes: answering an incoming call on behalf of a called terminal, and acquiring first incoming call voice information in the incoming call; determining a scene type corresponding to the first incoming call voice information; in response to the first incoming call voice information not involving a geographic keyword, acquiring geographic location information set by the called terminal for the scene type; generating first reply voice information corresponding to the first incoming call voice information according to the geographic location information; and playing the first reply voice information to reply to the first incoming call voice information.

According to another aspect of the present disclosure, there is provided an incoming call intelligent voice reply device, including: a first acquisition module configured to answer an incoming call on behalf of a called terminal and acquire first incoming call voice information in the incoming call; a determining module configured to determine a scene type corresponding to the first incoming call voice information; a second acquisition module configured to, in response to the first incoming call voice information not involving a geographic keyword, acquire geographic location information set by the called terminal for the scene type; a first reply generation module configured to generate first reply voice information corresponding to the first incoming call voice information according to the geographic location information; and a playing module configured to play the first reply voice information to reply to the first incoming call voice information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the incoming call intelligent voice reply method of the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the incoming call intelligent voice reply method disclosed in the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the incoming call intelligent voice reply method of the present disclosure.

One embodiment of the above application has the following advantages or benefits:

When an incoming call arrives at a called terminal, the incoming call is answered on behalf of the called terminal and first incoming call voice information in the incoming call is obtained; a scene type corresponding to the first incoming call voice information is determined; in response to the first incoming call voice information not involving a geographic keyword, geographic location information set by the called terminal for the scene type is obtained; first reply voice information corresponding to the first incoming call voice information is generated according to the geographic location information; and the first reply voice information is played to reply to the first incoming call voice information. In this way, the incoming call is answered intelligently on behalf of the called terminal, the scene type corresponding to the incoming call voice information is determined, and the incoming call voice information is replied to based on the scene type and the geographic location information, which improves the intelligence and accuracy of the reply.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

Fig. 1 is a schematic flowchart of an incoming call intelligent voice reply method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of an incoming call intelligent voice reply method according to another embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of an incoming call intelligent voice reply method according to another embodiment of the present disclosure;

Fig. 4 is a schematic flowchart of an incoming call intelligent voice reply method according to another embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of an incoming call intelligent voice reply method according to another embodiment of the present disclosure;

Fig. 6 is an interaction flow diagram of an incoming call intelligent voice reply method according to an embodiment of the present disclosure;

Fig. 7 is a schematic structural diagram of an incoming call intelligent voice reply device according to an embodiment of the present disclosure;

Fig. 8 is a schematic structural diagram of an incoming call intelligent voice reply device according to another embodiment of the present disclosure;

Fig. 9 is a block diagram of an electronic device for implementing an incoming call intelligent voice reply method according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

An incoming call intelligent voice reply method, an incoming call intelligent voice reply device and a storage medium according to the embodiments of the present disclosure are described below with reference to the drawings.

Fig. 1 is a schematic flowchart of an incoming call intelligent voice reply method according to an embodiment of the present disclosure.

As shown in fig. 1, the incoming call intelligent voice reply method may include:

Step 101, answering an incoming call on behalf of a called terminal, and acquiring first incoming call voice information in the incoming call.

The incoming call intelligent voice reply method is executed by an incoming call intelligent voice reply device, which may be implemented in software and/or hardware. The device in this embodiment may be configured in an electronic device, which may include but is not limited to a terminal device, a server, and the like; this embodiment places no limitation on this.

In some embodiments, when an incoming call to the called terminal is detected and it is determined that the called terminal has enabled the incoming call intelligent voice reply function, the incoming call intelligent voice reply device answers the incoming call on behalf of the called terminal and obtains the first incoming call voice information in the incoming call.

The incoming call intelligent voice reply function may be enabled in advance on a client in the called terminal that corresponds to the incoming call intelligent voice reply device.

The client may be a standalone application, an applet, or a web page; this embodiment places no limitation on this.

Step 102, determining a scene type corresponding to the first incoming call voice information.

In some embodiments, the scene type corresponding to the first incoming call voice information may be determined in various ways, for example as follows:

As an example, the scene type corresponding to the first incoming call voice information may be determined in combination with the calling number information corresponding to the incoming call.

Specifically, a tag for the calling number information may be obtained from a number tag library, and the scene type corresponding to the first incoming call voice information may be determined according to the tag. For example, when the number tag library indicates that the calling number is tagged as a courier number, the scene type corresponding to the first incoming call voice information may be determined to be the express scene type.

For another example, when the number tag library indicates that the calling number is tagged as a takeaway delivery number, the scene type corresponding to the first incoming call voice information may be determined to be the takeaway scene type.
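The tag-based lookup described above can be sketched as two dictionary lookups. This is only a minimal illustration: the contents of `NUMBER_TAG_LIBRARY`, the tag names, the mapping `TAG_TO_SCENE`, and the function name are all hypothetical and do not come from the disclosure.

```python
from typing import Optional

# Hypothetical number tag library: calling number -> tag.
NUMBER_TAG_LIBRARY = {
    "13800000001": "courier",
    "13800000002": "takeaway",
}

# Hypothetical mapping from a number tag to a scene type.
TAG_TO_SCENE = {
    "courier": "express",
    "takeaway": "takeaway",
}


def scene_from_calling_number(calling_number: str) -> Optional[str]:
    """Return the scene type for a calling number, or None if it is untagged."""
    tag = NUMBER_TAG_LIBRARY.get(calling_number)
    if tag is None:
        return None
    return TAG_TO_SCENE.get(tag)
```

Returning `None` for an untagged number leaves room for the other approaches described in this embodiment (intention analysis or keyword matching) to take over.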

As another example, intention analysis may be performed according to the calling number information corresponding to the incoming call and the first incoming call voice information, and the scene type corresponding to the first incoming call voice information may be determined according to the intention analysis result.

For example, if the intention analysis result indicates that the incoming call is for delivering a takeaway order, the scene type corresponding to the first incoming call voice information may be determined to be the takeaway scene type.

For another example, if the intention analysis result indicates that the incoming call is for picking up a passenger, the scene type corresponding to the first incoming call voice information may be determined to be the taxi-hailing scene type.

For another example, if the intention analysis result indicates that the incoming call is for delivering an express parcel, the scene type corresponding to the first incoming call voice information may be determined to be the express scene type.

As another example, voice recognition may be performed on the first incoming call voice information to obtain text information of the first incoming call voice information, and the scene type corresponding to the first incoming call voice information may be determined according to the text information.

In this example, the scene type corresponding to the first incoming call voice information is determined accurately by analyzing its text information.

In some embodiments, to accurately determine the scene type corresponding to the first incoming call voice information, one possible implementation of determining the scene type according to the text information is as follows: determining that a preset keyword exists in the text information; acquiring the scene type corresponding to the preset keyword; and taking that scene type as the scene type corresponding to the first incoming call voice information.

For example, after the incoming call is connected, voice recognition is performed on the first incoming call voice information, and the obtained text information is "Hello, your express delivery has arrived". It is determined that the preset keyword "express delivery" exists in the text information, so the scene type corresponding to the first incoming call voice information is determined to be the express scene type.

For another example, after the incoming call is connected, voice recognition is performed on the first incoming call voice information, and the obtained text information is "Hello, your takeout". It is determined that the preset keyword "takeout" exists in the text information, so the scene type corresponding to the first incoming call voice information is determined to be the takeaway scene type.
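A minimal sketch of the keyword-based determination, assuming the voice information has already been converted to text by some speech recognition step. The keyword table `PRESET_KEYWORDS`, the scene labels, and the function name are illustrative assumptions; the disclosure does not specify how keywords are stored or matched.

```python
from typing import Optional

# Hypothetical preset keywords and the scene type each one maps to.
PRESET_KEYWORDS = {
    "express delivery": "express",
    "takeout": "takeaway",
}


def scene_from_text(text: str) -> Optional[str]:
    """Return the scene type of the first preset keyword found in the text."""
    lowered = text.lower()
    for keyword, scene in PRESET_KEYWORDS.items():
        # Plain substring matching; a real system might use tokenization instead.
        if keyword in lowered:
            return scene
    return None
```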

The scene type in this embodiment may include, but is not limited to, an express scene type, a takeaway scene type, a taxi-hailing scene type, and the like.

Step 103, in response to the first incoming call voice information not involving a geographic keyword, acquiring geographic location information set by the called terminal for the scene type.

In some embodiments, voice recognition may be performed on the first incoming call voice information to obtain its text information, and whether the text information includes a geographic keyword is determined. If the text information does not include a geographic keyword, it is determined that the first incoming call voice information does not involve a geographic keyword, and the geographic location information set by the called terminal for the scene type is acquired.

It can be understood that the geographic location information for the scene type is preset by the user in the called terminal.

As an exemplary embodiment, the setting may be performed through a client in the called terminal while the called terminal is in use. Specifically, a setting request for the scene type may be sent to the incoming call intelligent voice reply device through the client. In response, the device may return a configuration interface corresponding to the scene type, on which the user can set, as needed, the geographic location information corresponding to the scene type and the reply text corresponding to each piece of geographic location information. It should be noted that there may be one or more pieces of geographic location information. In this way, the user can flexibly set different reply content for different places, which meets the user's needs for personalized, advanced customization.

For example, where the scene type is the express scene type, on the configuration interface corresponding to the express scene type, the reply text corresponding to the geographic location information "a certain hospital" may be set to "Please leave it at the door", and the reply text corresponding to the geographic location information "a certain science and technology park" may be set to "Please leave it in the mail room".

It should be noted that, in some embodiments, to support personalized settings, the configuration interface may also provide an option for adding more locations, so that the user can trigger the option to set reply texts for additional geographic location information.
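One way to picture what the configuration interface stores is a per-scene record holding an ordered list of locations, each with its own reply text. The sketch below is an assumption about the data layout only; the class names `LocationReply` and `SceneConfig` and the method `add_location` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocationReply:
    """One piece of geographic location information and its reply text."""
    location: str
    reply_text: str


@dataclass
class SceneConfig:
    """Settings the user saved for one scene type (hypothetical layout)."""
    scene_type: str
    locations: List[LocationReply] = field(default_factory=list)

    def add_location(self, location: str, reply_text: str) -> None:
        # Models the "add more locations" option on the configuration interface.
        self.locations.append(LocationReply(location, reply_text))


express = SceneConfig("express")
express.add_location("a certain hospital", "Please leave it at the door.")
express.add_location("a certain science and technology park",
                     "Please leave it in the mail room.")
```

Keeping the locations in a list preserves the order in which the user entered them, which matters later when locations are tried one after another.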

Step 104, generating first reply voice information corresponding to the first incoming call voice information according to the geographic location information.

Step 105, playing the first reply voice information to reply to the first incoming call voice information.

According to the incoming call intelligent voice reply method of this embodiment, when an incoming call arrives at a called terminal, the incoming call is answered on behalf of the called terminal and first incoming call voice information in the incoming call is obtained; a scene type corresponding to the first incoming call voice information is determined; in response to the first incoming call voice information not involving a geographic keyword, geographic location information set by the called terminal for the scene type is obtained; first reply voice information corresponding to the first incoming call voice information is generated according to the geographic location information; and the first reply voice information is played to reply to the first incoming call voice information. In this way, the incoming call is answered intelligently on behalf of the called terminal, the scene type corresponding to the incoming call voice information is determined, and the incoming call voice information is replied to based on the scene type and the geographic location information, which improves the intelligence and accuracy of the reply.

It can be understood that, in some implementations, when the intelligent reply function corresponding to the scene type is not enabled, the first reply voice information corresponding to the first incoming call voice information may be generated according to the fallback reply set by the called terminal for the scene type.

For example, the fallback reply may be "I'm not available right now; I'll contact you later".

In some embodiments, the fallback reply corresponding to the scene type may be set by the user on the configuration interface corresponding to the scene type. The configuration interface may also provide the function of setting the geographic location information and the corresponding reply text; for details, refer to the relevant description in the above embodiments, which is not repeated here.
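The choice between the location-based reply and the fallback reply can be sketched as a single branch. The `SceneSettings` fields, the function name, and the question wording are illustrative assumptions. Falling back when no location has been configured is also an assumption; the disclosure only describes the case where the intelligent reply function is not enabled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneSettings:
    """Hypothetical per-scene settings saved by the user."""
    smart_reply_enabled: bool
    fallback_reply: str
    locations: List[str] = field(default_factory=list)


def first_reply_text(settings: SceneSettings) -> str:
    """Pick the text from which the first reply voice information is generated."""
    # Assumption: also fall back when no location has been configured.
    if not settings.smart_reply_enabled or not settings.locations:
        return settings.fallback_reply
    return f"Excuse me, is it at {settings.locations[0]}?"
```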

In some scenarios, the first incoming call voice information may directly include a geographic keyword. To further improve the intelligence of the reply, as shown in fig. 2, the method may further include:

Step 201, in response to the first incoming call voice information including a geographic keyword, determining second geographic location information corresponding to the geographic keyword in the first incoming call voice information.

Step 202, obtaining a first reply text set by the called terminal for the second geographic location information.

Step 203, generating first reply voice information corresponding to the first incoming call voice information according to the first reply text.

In this embodiment, when the first incoming call voice information includes a geographic keyword, the first reply text set by the called terminal for the second geographic location information is acquired based on the second geographic location information corresponding to that keyword, and the first reply voice information corresponding to the first incoming call voice information is generated directly from the first reply text. Therefore, the incoming call voice information is replied to accurately in combination with the geographic location information, which avoids items being left in the wrong place because a uniform reply was given without regard to geographic location.
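Steps 201 to 203 can be sketched as a search for a configured location name in the recognized text, followed by a lookup of the reply text set for it. The substring matching, the function names, and the sample location names are assumptions made for illustration; the disclosure does not state how geographic keywords are detected.

```python
from typing import Dict, Iterable, Optional


def find_geo_keyword(text: str, known_locations: Iterable[str]) -> Optional[str]:
    """Return the first configured location mentioned in the text, if any."""
    lowered = text.lower()
    for location in known_locations:
        if location.lower() in lowered:
            return location
    return None


def reply_for_mentioned_location(text: str, replies: Dict[str, str]) -> Optional[str]:
    """Return the reply text set for the location the caller mentioned."""
    location = find_geo_keyword(text, replies)
    if location is None:
        return None  # no geographic keyword: the caller would handle step 103 instead
    return replies[location]
```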

Fig. 3 is a flowchart illustrating an incoming call intelligent voice reply method according to another embodiment of the present disclosure. It should be noted that this embodiment is further refined or optimized from the embodiment shown in fig. 1.

As shown in fig. 3, the incoming call intelligent voice reply method may include:

Step 301, answering the incoming call on behalf of the called terminal, and acquiring the first incoming call voice information in the incoming call.

Step 302, determining a scene type corresponding to the first incoming call voice information.

Step 303, in response to the first incoming call voice information not involving a geographic keyword, acquiring geographic location information set by the called terminal for the scene type.

It should be noted that steps 301 to 303 are the same as steps 101 to 103 and are not described again here.

Step 304, generating a question text including the geographic location information.

For example, if the first incoming call voice information is "Hello, your express delivery has arrived", the scene type corresponding to the first incoming call voice information is determined to be the express scene type. Assuming that the geographic location information set by the called terminal for this scene type is "XX Home", the generated question text including the geographic location information may be "Excuse me, is it at XX Home?".

Step 305, generating a first reply voice message corresponding to the first incoming call voice message according to the question text.

In this embodiment, after the question text is acquired, the voice information corresponding to the question text may be synthesized based on the voice synthesis module, and the generated voice information is used as the first reply voice information corresponding to the first incoming call voice information.

Step 306, playing the first reply voice information to reply to the first incoming call voice information.

In this embodiment, when replying to the first incoming call voice information, a question text including the geographic location information is generated directly, and the first reply voice information corresponding to the first incoming call voice information is generated according to the question text. In this way, reply voice information that puts a question to the calling party is generated from the geographic location preset by the called terminal for the scene type, so that the geographic location relevant to the caller can be confirmed. This makes it easier to reply accurately to subsequent incoming call voice information and improves the intelligence of the interaction.
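Steps 304 and 305 amount to filling a location into a question template and passing the text to a speech synthesis step. In the sketch below the template wording is an assumption, and `synthesize` is a placeholder that merely encodes the text as bytes; it stands in for a real voice synthesis module and does not produce audio.

```python
def build_question_text(location: str) -> str:
    """Step 304: generate a question text that includes the location."""
    return f"Excuse me, is it at {location}?"


def synthesize(text: str) -> bytes:
    """Placeholder for a voice synthesis module (returns encoded text, not audio)."""
    return text.encode("utf-8")


def first_reply_voice(location: str) -> bytes:
    """Step 305: generate the first reply voice information from the question text."""
    return synthesize(build_question_text(location))
```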

In other embodiments, another possible implementation of generating the first reply voice information corresponding to the first incoming call voice information according to the geographic location information is as follows: acquiring a question text preset by the called terminal for the geographic location information, and generating the first reply voice information corresponding to the first incoming call voice information based on the acquired question text, where the question text includes the geographic location information.

Based on the foregoing embodiments, an incoming call generally involves multiple rounds of conversation. After the incoming call intelligent voice reply device replies to the first incoming call voice information with the first reply voice information, in order to continue replying accurately and intelligently, on the basis of fig. 3 and as shown in fig. 4, the incoming call intelligent voice reply method may further include:

Step 401, obtaining, from the incoming call, second incoming call voice information that responds to the first reply voice information.

Step 402, determining an intention type of the second incoming call voice information.

In some embodiments, voice recognition may be performed on the second incoming call voice information to obtain corresponding text information, and the intention type of the second incoming call voice information may be determined based on the text information.

The intention type may include a positive intention type and a negative intention type.

Step 403, when the intention type is the positive intention type, acquiring a second reply text preset by the called terminal for the geographic location information.

Step 404, generating second reply voice information corresponding to the second incoming call voice information according to the second reply text.

In this embodiment, after the second reply text is acquired, voice information corresponding to the second reply text may be synthesized by the voice synthesis module, and the generated voice information may be used as the second reply voice information corresponding to the second incoming call voice information.

In some embodiments, the intention type may be the negative intention type. To reply intelligently to the incoming call voice information in this case, as shown in fig. 5, the method may further include:

Step 501, determining whether the second incoming call voice information involves a geographic keyword; if not, executing steps 502 to 503; if so, executing steps 504 to 506.

Step 502, selecting a piece of geographic location information that has not yet been traversed from the plurality of pieces of geographic location information as first target geographic location information.

Step 503, generating second reply voice information corresponding to the second incoming call voice information according to the first target geographic location information.

Step 504, determining second target geographic location information corresponding to the geographic keyword in the second incoming call voice information.

Step 505, acquiring a third reply text set by the called terminal for the second target geographic location information.

Step 506, generating second reply voice information corresponding to the second incoming call voice information according to the third reply text.
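The negative-intention branch can be sketched as follows, on the reading that steps 504 to 506 apply when the caller names a location (since they look up that keyword) and steps 502 to 503 apply otherwise. The function name, the substring matching, the question template, and the `None` result when every location has already been tried are all assumptions.

```python
from typing import Dict, Optional, Set


def reply_after_negative_intent(
    text: str, replies: Dict[str, str], traversed: Set[str]
) -> Optional[str]:
    """Return the text for the second reply after a negative answer."""
    lowered = text.lower()
    # Steps 504-506: the caller named a configured location, so use its reply text.
    for location, reply_text in replies.items():
        if location.lower() in lowered:
            return reply_text
    # Steps 502-503: no geographic keyword, so ask about an untried location.
    for location in replies:
        if location not in traversed:
            traversed.add(location)
            return f"Excuse me, is it at {location}?"
    # Assumption: nothing left to try; the caller of this function decides what to do.
    return None
```

The `traversed` set is mutated in place so that repeated negative answers walk through the configured locations one at a time.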

To make the present disclosure clear to those skilled in the art, the incoming call intelligent voice reply method of this embodiment is described below with reference to fig. 6. In this embodiment, the incoming call intelligent voice reply device is described taking a server as an example. The server may include a call center, a voice recognition module, an intelligent incoming call secretary platform, an intelligent dialogue engine, and a voice synthesis module. These components cooperate to implement the incoming call intelligent voice reply method; the interaction among them is shown in fig. 6.

Specifically, after the incoming call to the called terminal is transferred, the call center acquires voice information A of the incoming call and sends voice information A to the voice recognition module (601). The voice recognition module sends user text information A corresponding to voice information A to the intelligent incoming call secretary platform (602). The platform sends user text information A to the intelligent dialogue engine (603). The intelligent dialogue engine determines the corresponding scene type based on user text information A and sends the scene type to the platform (604). When the platform determines that the scene type is not one of the preset scene types (for example, the express, takeaway, and taxi-hailing scene types), it sends the configured reply content to the call center (6051). When the platform determines that the scene type is one of the preset scene types, it obtains the geographic location information set by the user for the scene type, generates a question text including the geographic location information, and sends the question text to the voice synthesis module (6052). The voice synthesis module generates voice information for the question text and sends it to the call center (606). For example, the text corresponding to the voice information may be a question such as "Excuse me, is it at such-and-such a place?".

The call center then receives voice information B and sends it to the voice recognition module (607). The voice recognition module sends user text information B corresponding to voice information B to the intelligent incoming call secretary platform (608). The platform sends user text information B to the intelligent dialogue engine (609). The intelligent dialogue engine determines the corresponding intention type, which is either a positive intention type or a negative intention type, based on user text information B and sends the intention type to the platform (610). When the intention type is the positive intention type, the platform acquires the reply text preset by the called terminal for the geographic location information, obtains voice information for the reply text, and sends that voice information to the call center (6111). When the intention type is the negative intention type, the platform may, based on the next piece of geographic location information configured by the user, obtain the question text corresponding to that location and the voice information for that question text, and send the voice information to the call center (6112). Multiple rounds of conversation continue in this manner until the call ends.
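The multi-round exchange can be simulated in a few lines by working on text only and leaving out the call center, recognition, and synthesis components. The word-list intent classifier and the function names are hypothetical stand-ins; the disclosure does not describe how the intelligent dialogue engine classifies intention.

```python
from typing import Dict, List

NEGATIVE_WORDS = {"no", "not", "nope"}  # hypothetical cue words


def classify_intent(text: str) -> str:
    """Toy stand-in for the dialogue engine: 'negative' if a cue word appears."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return "negative" if any(w in NEGATIVE_WORDS for w in words) else "positive"


def run_dialog(locations: Dict[str, str], caller_turns: List[str]) -> List[str]:
    """Return, in order, the texts the assistant would speak during the call."""
    if not locations:
        return []
    remaining = list(locations)
    current = remaining.pop(0)
    spoken = [f"Excuse me, is it at {current}?"]
    for turn in caller_turns:
        if classify_intent(turn) == "positive":
            spoken.append(locations[current])  # reply text preset for this location
            break
        if not remaining:
            break  # assumption: stop once every configured location was rejected
        current = remaining.pop(0)
        spoken.append(f"Excuse me, is it at {current}?")
    return spoken


locations = {
    "XX Home": "Please leave it at the door.",
    "YY Park": "Please leave it in the mail room.",
}
transcript = run_dialog(locations, ["No, it is not", "Yes"])
```

Here the caller rejects the first location and confirms the second, so the assistant asks two questions and then speaks the reply text preset for the second location.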

In order to implement the foregoing embodiment, an intelligent voice reply device for incoming calls is further provided in the embodiments of the present disclosure.

Fig. 7 is a schematic structural diagram of an intelligent voice reply device for incoming calls according to an embodiment of the present disclosure.

As shown in fig. 7, the intelligent voice reply device 700 for incoming calls may include a first obtaining module 701, a first determining module 702, a second obtaining module 703, a first reply generating module 704, and a playing module 705, wherein:

The first obtaining module 701 is configured to answer an incoming call on behalf of a called terminal and obtain first incoming call voice information in the incoming call.

The first determining module 702 is configured to determine a scene type corresponding to the first incoming call voice information.

The second obtaining module 703 is configured to obtain, in response to the first incoming call voice information not involving a geographic keyword, geographic location information set by the called terminal for the scene type.

The first reply generating module 704 is configured to generate first reply voice information corresponding to the first incoming call voice information according to the geographic location information.

The playing module 705 is configured to play the first reply voice information to reply to the first incoming call voice information.

It should be noted that the foregoing explanation of the embodiments of the intelligent voice reply method for incoming calls also applies to this embodiment and is not repeated here.

According to the intelligent voice reply device for incoming calls, when the called terminal receives an incoming call, the incoming call is intelligently answered on behalf of the called terminal, first incoming call voice information in the incoming call is obtained, the scene type corresponding to the first incoming call voice information is determined, geographic position information set by the called terminal for the scene type is obtained in response to the first incoming call voice information not involving a geographic keyword, first reply voice information corresponding to the first incoming call voice information is generated according to the geographic position information, and the first reply voice information is played to reply to the first incoming call voice information. In this way, the incoming call can be intelligently answered on behalf of the called terminal, and the incoming call voice information can be intelligently replied to based on the scene type and the geographic position information, which improves the intelligence and accuracy of the reply.

In an embodiment of the present disclosure, as shown in fig. 8, the intelligent voice reply device for incoming calls may include: a first obtaining module 801, a first determining module 802, a second obtaining module 803, a first reply generating module 804, a playing module 805, a second determining module 806, a third obtaining module 807, a second reply generating module 808, a fourth obtaining module 809, a third determining module 810, a fifth obtaining module 811, a third reply generating module 812, a selecting module 813, a fourth reply generating module 814, a fourth determining module 815, a sixth obtaining module 816, and a fifth reply generating module 817, wherein the first determining module 802 may include a voice recognition unit 8021 and a determining unit 8022.

For a detailed description of the first obtaining module 801, the first determining module 802, the second obtaining module 803, the first reply generating module 804, and the playing module 805, please refer to the descriptions of the first obtaining module 701, the first determining module 702, the second obtaining module 703, the first reply generating module 704, and the playing module 705 in the embodiment shown in fig. 7, which are not repeated here.

In an embodiment of the present disclosure, the second determining module 806 is configured to determine, in response to the first incoming call voice information including a geographic keyword, second geographic location information corresponding to the geographic keyword in the first incoming call voice information;

the third obtaining module 807 is configured to obtain a first reply text set by the called terminal for the second geographic location information;

the second reply generating module 808 is configured to generate, according to the first reply text, first reply voice information corresponding to the first incoming call voice information.
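
As an illustrative sketch only, the path through modules 806 to 808 could look like the following Python function. Substring matching as the way to detect a geographic keyword, the function name `reply_for_geographic_keyword`, and the two dictionaries are assumptions made for this example; the present disclosure does not prescribe how the keyword is detected.

```python
def reply_for_geographic_keyword(text, keyword_to_location, reply_texts):
    """Return the preset reply text for a place named in the text, else None.

    keyword_to_location: geographic keyword -> geographic location (assumed)
    reply_texts: geographic location -> reply text set by the called terminal
    """
    for keyword, location in keyword_to_location.items():
        if keyword in text:
            # Module 806 resolved the location; module 807 fetches its reply text.
            return reply_texts.get(location)
    # No geographic keyword: the caller is handled by the question-text path.
    return None


keywords = {"north gate": "North Gate", "lobby": "Lobby"}
replies = {"North Gate": "Please leave it with the guard at the north gate."}
print(reply_for_geographic_keyword("I am at the north gate", keywords, replies))
```

The returned text is what module 808 would hand to voice synthesis; a return value of `None` signals that no geographic keyword was found.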

In one embodiment of the present disclosure, the first determining module 802 may include:

a voice recognition unit 8021, configured to perform voice recognition on the first incoming call voice information to obtain text information of the first incoming call voice information;

the determining unit 8022 is configured to determine, according to the text information, a scene type corresponding to the first incoming call voice information.

In an embodiment of the present disclosure, the determining unit 8022 is specifically configured to:

determining that preset keywords exist in the text information;

and acquiring a scene type corresponding to a preset keyword, and taking the scene type corresponding to the preset keyword as the scene type corresponding to the first incoming call voice information.
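
The keyword lookup performed by the determining unit 8022 can be sketched, under stated assumptions, as a simple table lookup. The keyword table `SCENE_KEYWORDS`, the scene type labels, and case-insensitive substring matching are hypothetical choices for illustration and are not taken from the present disclosure.

```python
# Hypothetical preset keywords and the scene types they map to.
SCENE_KEYWORDS = {
    "courier": "express",
    "parcel": "express",
    "takeaway": "take-away",
    "driver": "taxi",
}


def determine_scene_type(text):
    """Return the scene type of the first preset keyword found, else None."""
    lowered = text.lower()
    for keyword, scene_type in SCENE_KEYWORDS.items():
        if keyword in lowered:
            # The scene type of the matched keyword becomes the scene type
            # of the first incoming call voice information.
            return scene_type
    return None


print(determine_scene_type("Hello, your parcel has arrived"))
```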

In an embodiment of the present disclosure, the first reply generating module 804 is specifically configured to: generate a question text including the geographic location information; and generate first reply voice information corresponding to the first incoming call voice information according to the question text.

In one embodiment of the present disclosure, as shown in fig. 8, the apparatus further includes:

a fourth obtaining module 809, configured to obtain second incoming call voice information corresponding to the first reply voice information from the incoming call;

a third determining module 810 for determining an intention type of the second incoming call voice information;

a fifth obtaining module 811, configured to obtain, when the intention type is a positive intention type, a second reply text preset by the called terminal for the geographic location information;

and a third reply generating module 812, configured to generate a second reply voice message corresponding to the second incoming call voice message according to the second reply text.

In an embodiment of the present disclosure, as shown in fig. 8, the number of the geographic location information is multiple, and the apparatus further includes:

a selecting module 813, configured to, in a case that the intention type is a negative intention type and in response to the second incoming call voice information not involving a geographic keyword, select one piece of untraversed geographic location information from the multiple pieces of geographic location information as first target geographic location information;

a fourth reply generating module 814, configured to generate second reply voice information corresponding to the second incoming call voice information according to the first target geographic location information;

a fourth determining module 815, configured to determine, in response to the second incoming call voice information including a geographic keyword, second target geographic location information corresponding to the geographic keyword in the second incoming call voice information;

a sixth obtaining module 816, configured to obtain a third reply text set by the called terminal for the second target geographic location information;

and a fifth reply generation module 817 for generating a second reply voice message corresponding to the second incoming call voice message according to the third reply text.
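
The handling of a negative intention by modules 813 to 817 can be illustrated with the following hedged Python sketch. The function name `next_reply_after_negative`, the use of a set to record which locations have already been asked about, substring matching for geographic keywords, and the question wording are assumptions made for this example.

```python
def next_reply_after_negative(utterance, locations, asked,
                              keyword_to_location, reply_texts):
    """Return (text_to_speak, newly_asked_location) after a negative answer."""
    # Modules 815 to 817: the caller named a place, so use the reply text
    # that the called terminal preset for that place.
    for keyword, location in keyword_to_location.items():
        if keyword in utterance:
            return reply_texts[location], None
    # Modules 813 and 814: otherwise pick a location not asked about yet.
    for location in locations:
        if location not in asked:
            asked.add(location)
            return f"Is the address {location}?", location
    # Assumption: nothing left to ask; the source does not cover this case.
    return None, None


asked = {"Gate A"}
print(next_reply_after_negative("No", ["Gate A", "Gate B"], asked,
                                {"lobby": "Lobby"},
                                {"Lobby": "Please leave it in the lobby."}))
```

Keeping the already-asked locations in a mutable set is one simple way to ensure that each configured location is offered to the caller at most once.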

It should be noted that the explanation of the embodiment of the intelligent voice reply method for incoming calls is also applicable to the intelligent voice reply device for incoming calls in this embodiment, and is not repeated here.

According to embodiments of the present disclosure, an electronic device, a readable storage medium, and a computer program product are also provided.

FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the methods and processes described above, such as the incoming call intelligent voice reply method. For example, in some embodiments, the incoming call intelligent voice reply method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the incoming call intelligent voice reply method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the incoming call intelligent voice reply method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be noted that artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, voice recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. An incoming call intelligent voice reply method comprises the following steps:

answering a call for a called terminal, and acquiring first incoming call voice information in the call;

determining a scene type corresponding to the first incoming call voice information;

responding to that the first incoming call voice message does not relate to geographic keywords, and acquiring geographic position information set by the called terminal for the scene type;

generating first reply voice information corresponding to the first incoming call voice information according to the geographical position information;

playing the reply voice message to reply the first incoming call voice message;

the generating of the first reply voice message corresponding to the first incoming call voice message according to the geographical location information includes:

generating a question text including the geographical location information;

generating first reply voice information corresponding to the first incoming call voice information according to the question text;

acquiring second incoming call voice information corresponding to the first reply voice information from the incoming call;

determining an intention type of the second incoming voice information;

under the condition that the intention type is a positive intention type, acquiring a second reply text preset by the called terminal for the geographical location information;

and generating second reply voice information corresponding to the second incoming call voice information according to the second reply text.

2. The method of claim 1, wherein the method further comprises:

responding to the fact that the first incoming call voice message comprises geographic keywords, and determining second geographic position information corresponding to the geographic keywords in the first incoming call voice message;

acquiring a first reply text set by the called terminal for the second geographical location information;

and generating first reply voice information corresponding to the first incoming call voice information according to the first reply text.

3. The method of claim 1, wherein the determining the scene type corresponding to the first incoming call voice message comprises:

carrying out voice recognition on the first incoming call voice information to obtain text information of the first incoming call voice information;

and determining the scene type corresponding to the first incoming call voice information according to the text information.

4. The method according to claim 3, wherein the determining the scene type corresponding to the first incoming call voice information according to the text information comprises:

determining that preset keywords exist in the text information;

and acquiring a scene type corresponding to the preset keyword, and taking the scene type corresponding to the preset keyword as the scene type corresponding to the first incoming call voice information.

5. The method of claim 1, wherein the geographic location information is plural, the method further comprising:

in the case that the intention type is a negative intention type, in response to the second incoming call voice information not relating to a geographic keyword, selecting one piece of untraversed geographic position information from the plurality of pieces of geographic position information as first target geographic position information;

and generating second reply voice information corresponding to the second incoming call voice information according to the first target geographical position information.

6. The method of claim 5, wherein the method further comprises:

responding to the second incoming call voice message including the geographic key words, and determining second target geographic position information corresponding to the geographic key words in the second incoming call voice message;

acquiring a third reply text set by the called terminal for the second target geographic information;

and generating second reply voice information corresponding to the second incoming call voice information according to the third reply text.

7. An incoming call intelligent voice reply device, comprising:

the first acquisition module is used for answering a call for a called terminal and acquiring first call voice information in the call;

the first determining module is used for determining a scene type corresponding to the first incoming call voice information;

the second obtaining module is used for responding to the fact that the geographic keywords are not involved in the first incoming call voice message, and obtaining geographic position information set by the called terminal for the scene type;

the first reply generation module is used for generating first reply voice information corresponding to the first incoming call voice information according to the geographic position information;

the playing module is used for playing the reply voice message to reply the first incoming call voice message;

the first reply generation module is specifically configured to:

generating a question text including the geographical location information;

a fourth obtaining module, configured to obtain, from the incoming call, second incoming call voice information corresponding to the first reply voice information;

a third determination module for determining an intention type of the second incoming voice information;

a fifth obtaining module, configured to obtain a second reply text preset by the called end for the geographic location information when the intention type is an affirmative intention type;

and the third reply generation module is used for generating second reply voice information corresponding to the second incoming call voice information according to the second reply text.

8. The apparatus of claim 7, wherein the apparatus further comprises:

the second determining module is used for determining second geographic position information corresponding to the geographic keyword in the first incoming call voice information in response to the fact that the first incoming call voice information comprises the geographic keyword;

a third obtaining module, configured to obtain a first reply text set by the called end for the second geographic location information;

and the second reply generation module is used for generating first reply voice information corresponding to the first incoming call voice information according to the first reply text.

9. The apparatus of claim 7, wherein the first determining module comprises:

the voice recognition unit is used for carrying out voice recognition on the first incoming call voice information to obtain text information of the first incoming call voice information;

and the determining unit is used for determining the scene type corresponding to the first incoming call voice information according to the text information.

10. The apparatus according to claim 9, wherein the determining unit is specifically configured to:

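determining that preset keywords exist in the text information;

and acquiring a scene type corresponding to the preset keyword, and taking the scene type corresponding to the preset keyword as the scene type corresponding to the first incoming call voice information.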

11. The apparatus of claim 7, wherein the geographic location information is a plurality, the apparatus further comprising:

a selection module, configured to, in a case that the intention type is a negative intention type, in response to the second incoming call voice information not relating to a geographic keyword, select one piece of untraversed geographic location information from the plurality of pieces of geographic location information as first target geographic location information;

and the fourth reply generation module is used for generating second reply voice information corresponding to the second incoming call voice information according to the first target geographical position information.

12. The apparatus of claim 11, wherein the apparatus further comprises:

a fourth determining module, configured to determine, in response to that the second incoming call voice message includes a geographic keyword, second target geographic location information corresponding to the geographic keyword in the second incoming call voice message;

a sixth obtaining module, configured to obtain a third reply text that is set by the called end for the second target geographic information;

and the fifth reply generation module is used for generating second reply voice information corresponding to the second incoming call voice information according to the third reply text.

13. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.

14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.