CN117834783A - Telephone outbound method, device, electronic equipment and storage medium - Google Patents

Telephone outbound method, device, electronic equipment and storage medium

Info

Publication number
CN117834783A
CN117834783A
Authority
CN
China
Prior art keywords
user
text
insurance
head
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311802217.XA
Other languages
Chinese (zh)
Inventor
刘海伦
沈鹏
周晓波
黄明星
黄婷
许垒
王之琢
郑福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shuidi Technology Group Co ltd
Original Assignee
Beijing Shuidi Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shuidi Technology Group Co ltd filed Critical Beijing Shuidi Technology Group Co ltd
Priority to CN202311802217.XA
Publication of CN117834783A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/527 Centralised call answering arrangements not requiring operator intervention
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Finance (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

In the method, the natural language large model in a preset model is refined according to multiple task heads. The preset model can intelligently recommend insurance to a user according to audio features of the user's voice signal, the text content converted from the voice signal, user information and historical dialogue text; answer the user's questions about insurance; and guide the user by voice through the whole application and insurance process. The method comprises the following steps: acquiring a voice signal and user information of a user with an insurance intention based on the user's operation information on an insurance application interface; extracting audio features from the voice signal and converting the voice signal into text content; and inputting the fused features obtained from the audio features, the text content, the user information and the historical dialogue text into the preset model to obtain a reply text, wherein the natural language large model in the preset model is refined according to multiple task heads.

Description

Telephone outbound method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of insurance telemarketing technologies, and in particular to a telephone outbound method and apparatus, an electronic device, and a storage medium.
Background
Currently, telemarketing is a common sales approach for insurance companies. In the related art, a large amount of dialogue data containing questions and answers is pre-stored in a database. When a question spoken by the user is identified as identical to a question pre-stored in the database, the answer corresponding to that pre-stored question is output to the user. However, because the amount of pre-stored dialogue data is limited and its content is not flexible, the effect in actual insurance sales scenarios is poor.
Therefore, how to converse with the user more intelligently in the telemarketing insurance scenario has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a telephone outbound method and apparatus, an electronic device, and a storage medium. The natural language large model in a preset model is refined according to multiple task heads, so that the preset model can intelligently recommend insurance to a user according to audio features of the user's voice signal, the text content converted from the voice signal, user information and historical dialogue text, answer the user's questions about insurance, and guide the user by voice through the whole application process.
In a first aspect, an embodiment of the present application provides a telephone outbound method, including:
acquiring voice contact information of a user with an insurance intention based on the user's operation information on an insurance application interface;
requesting voice communication with the user according to the voice contact information;
in response to establishing voice communication with the user, acquiring a voice signal of the user and user information, wherein the user information at least comprises the user's operation progress on the application interface;
extracting audio features from the voice signal and converting the voice signal into text content;
if historical dialogue text exists in the voice communication, fusing the audio features, the text content, the user information and the historical dialogue text to obtain fused features;
and inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language large model and a generation head, and the natural language large model is refined according to a user emotion recognition head, an insurance product recommendation head, a user application progress recognition head and the generation head.
In some embodiments, the method further comprises:
if no historical dialogue text exists in the voice communication, fusing the audio features, the text content and the user information to obtain the fused features.
In some embodiments, the method further comprises:
converting the reply text into speech to obtain target audio;
and sending the target audio to the user side so that the user side plays the target audio.
In some embodiments, after sending the target audio to the user side, the method further includes:
acquiring a new voice signal of the user, new user information and new historical dialogue text, wherein the new user information at least comprises the user's operation progress on the application interface;
updating the original voice signal with the new voice signal, the original user information with the new user information, and the original historical dialogue text with the new historical dialogue text;
and re-executing the steps of extracting audio features from the voice signal and converting the voice signal into text content, until the voice communication with the user ends.
In some embodiments, the generation of the preset model includes:
training the preset model with a large amount of sample data.
In some embodiments, the method further comprises:
and in the process of training the preset model with a large amount of sample data, updating the user emotion recognition head, the insurance product recommendation head, the user application progress recognition head and the generation head.
In some embodiments, the step of extracting audio features from the sound signal comprises:
extracting Mel frequency spectrum coefficient characteristics from the sound signal.
In a second aspect, an embodiment of the present application further provides a telephone outbound device, including:
a first acquisition unit, configured to acquire voice contact information of a user with an insurance intention based on the user's operation information on an insurance application interface;
a request unit, configured to request voice communication with the user according to the voice contact information;
a second acquisition unit, configured to, in response to establishing voice communication with the user, acquire a voice signal of the user and user information, wherein the user information at least comprises the user's operation progress on the application interface;
a conversion unit, configured to extract audio features from the voice signal and convert the voice signal into text content;
a first fusion unit, configured to, if historical dialogue text exists in the voice communication, fuse the audio features, the text content, the user information and the historical dialogue text to obtain fused features;
an input unit, configured to input the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language large model and a generation head, and the natural language large model is refined according to a user emotion recognition head, an insurance product recommendation head, a user application progress recognition head and the generation head.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to implement the steps of the telephone outbound method.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the telephone outbound method.
According to the telephone outbound method and device, the electronic device and the storage medium provided by the present application, the natural language large model in the preset model is refined according to multiple task heads, so that the preset model can intelligently recommend insurance to a user according to audio features of the user's voice signal, the text content converted from the voice signal, user information and historical dialogue text, answer the user's questions about insurance, and guide the user by voice through the whole application process. The method comprises the following steps: acquiring voice contact information of a user with an insurance intention based on the user's operation information on an insurance application interface; requesting voice communication with the user according to the voice contact information; in response to establishing voice communication with the user, acquiring a voice signal of the user and user information, wherein the user information at least comprises the user's operation progress on the application interface; extracting audio features from the voice signal and converting the voice signal into text content; if historical dialogue text exists in the voice communication, fusing the audio features, the text content, the user information and the historical dialogue text to obtain fused features; and inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language large model and a generation head, and the natural language large model is refined according to a user emotion recognition head, an insurance product recommendation head, a user application progress recognition head and the generation head.
Drawings
FIG. 1 illustrates a flow chart of a telephone outbound method provided in accordance with some embodiments;
FIG. 2 schematically illustrates a structural diagram of a telephone outbound device provided in accordance with some embodiments;
FIG. 3 illustrates a block diagram of an electronic device provided in accordance with some embodiments.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that such uses may be interchanged where appropriate, such that the embodiments of the present application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the term "include" and its variations are to be interpreted as open-ended terms meaning "including, but not limited to".
Currently, telemarketing is a common sales approach for insurance companies. In the related art, a large amount of dialogue data containing questions and answers is pre-stored in a database. When a question spoken by the user is identified as identical to a question pre-stored in the database, the answer corresponding to that pre-stored question is output to the user. However, because the amount of pre-stored dialogue data is limited and its content is not flexible, the effect in actual insurance sales scenarios is poor. Therefore, how to converse with the user more intelligently in the telemarketing insurance scenario has become a technical problem to be solved by those skilled in the art.
To solve the above technical problem, an embodiment of the present application provides a telephone outbound method in which the natural language large model in a preset model is refined according to multiple task heads, so that the preset model can intelligently recommend insurance to a user according to audio features of the user's voice signal, the text content converted from the voice signal, user information and historical dialogue text, answer the user's questions about insurance, and guide the user by voice through the whole application process.
FIG. 1 illustrates a flow chart of a telephone outbound method provided in accordance with some embodiments; the method includes steps S100 to S800.
S100, acquiring voice contact information of a user with an insurance intention based on the user's operation information on the insurance application interface.
In the embodiment of the application, an insurance company can provide an application interface for users on various terminals. In one example, the insurance company provides a web application interface for users to browse on a computer and select a suitable insurance policy. In another example, the insurance company provides an in-app application interface for users to browse and select a suitable insurance policy on a mobile phone.
It should be noted that the application interface may be a set of interfaces, including a personal homepage, a product display page, an application notification page, a payment page, and the like; the user may view the different interfaces through page-turning operations. A single interface is also possible, on which the user can view and select a suitable insurance policy.
In this embodiment of the present application, the user's operation information on the application interface includes information generated while the user browses the interface, for example the length of time the user stays on a certain interface or a certain part of an interface, and may further include the user's click behavior, for example clicking a certain control. It should be noted that the user's operation information on the application interface is collected only after the user consents to its acquisition and subsequent use.
In the embodiment of the application, customers with an insurance intention are determined based on users' operation information on the application interface. First, the operation information of all users on the application interface is acquired. Then, users with an insurance intention are screened from the acquired operation information according to preset rules.
In some embodiments, a preset rule may be set according to the length of time the user remains on a certain interface; for example, when the length of time the user remains on interface A is greater than t seconds, the user is determined to be a user with an intention to apply. A preset rule may also be set according to the user's click behavior; for example, when the user clicks the purchase-confirmation control, the user is determined to be a user with an intention to apply. Preset rules may also be set in other ways.
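As an illustration, the two example rules above (dwell time on an interface, and a purchase-confirmation click) can be combined into a single screening predicate. This is a hypothetical sketch: the threshold value, the dictionary layout of the operation information, and the control name `confirm_purchase` are illustrative assumptions, not values taken from the patent.

```python
# Illustrative sketch of the preset screening rules; T_SECONDS and the
# "confirm_purchase" control name are assumptions for demonstration.
T_SECONDS = 30  # assumed dwell-time threshold t

def has_insurance_intent(op_info: dict) -> bool:
    """Return True if the operation information suggests an intention to apply.

    op_info["dwell_seconds"] maps interface names to dwell times in seconds;
    op_info["clicked_controls"] lists the controls the user clicked.
    """
    dwell = op_info.get("dwell_seconds", {})
    clicks = op_info.get("clicked_controls", [])
    # Rule 1: dwell time on some interface exceeds the threshold.
    if any(t > T_SECONDS for t in dwell.values()):
        return True
    # Rule 2: the user clicked the purchase-confirmation control.
    if "confirm_purchase" in clicks:
        return True
    return False
```

In practice a rule like this would run over the collected operation logs to select the users whose voice contact information is then requested, subject to the user's consent as noted above.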
In the embodiment of the application, users with an insurance intention are screened according to their operation information on the application interface, and then the voice contact information of those users is acquired so that they can be helped to select a suitable insurance policy. It should be noted that the voice contact information of a user with an insurance intention is likewise obtained only after the user consents.
The voice contact information may be, for example, the user's mobile phone number.
S200, requesting voice communication with the user according to the voice contact mode.
In one example, the voice contact information may be a mobile phone number, which the insurance company may dial through its terminal to request voice communication with the user. Of course, to protect user privacy, the voice contact information may instead be a virtual number corresponding to the user.
S300, in response to establishing voice communication with the user, acquiring a voice signal of the user and user information, wherein the user information at least comprises the user's operation progress on the application interface.
In the embodiment of the application, once voice communication with the user is established, the user's voice signal can be obtained whenever the user speaks.
In the embodiment of the application, the user information may include information such as the applicant's gender and age in addition to the user's operation progress on the application interface.
In some embodiments, when the application interface comprises a set of interfaces, the user's operation progress may be represented as one interface in that set. For example, if the application interface comprises a personal homepage, a product display page, an application notification page and a payment page, and the interface the user is currently viewing is the product display page, the user's operation progress on the application interface is determined to be the product display page.
S400, extracting audio features from the voice signal and converting the voice signal into text content.
In the embodiment of the application, in order to make the reply content more accurate when the insurance company's terminal communicates with the user by voice, the voice signal is analyzed and audio features are extracted. Meanwhile, the voice signal is converted into text content in preparation for the input of the subsequent natural language large model.
In some embodiments, a feature extraction module may be employed to extract the audio features from the sound signal. In some embodiments, the step of extracting audio features from the sound signal comprises: extracting Mel-frequency cepstral coefficient (MFCC) features from the sound signal.
In some embodiments, the step of extracting MFCC features from the sound signal comprises:
extracting a Mel-frequency spectrum from the sound signal;
passing the spectrum through a series of Mel filter banks;
and calculating the energy of each filter in each Mel filter bank, taking these energies as the MFCC features.
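The filter-bank energy computation described above can be sketched in NumPy for a single frame. This is a minimal illustration, not the patent's implementation: the sample rate, FFT size, filter count and single-frame treatment are assumptions (a production pipeline would window and frame the whole signal, and typically apply a DCT to obtain the final cepstral coefficients).

```python
import numpy as np

def mel_filterbank_energies(signal, sr=16000, n_fft=512, n_mels=26):
    """Log Mel filter-bank energies for one frame, following the steps above:
    spectrum -> Mel filter bank -> per-filter energy."""
    # Power spectrum of one windowed frame.
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2

    # Mel <-> Hz conversions (standard formulas).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_mels triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)

    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)

    # Energy of each filter, log-compressed for numerical stability.
    energies = fbank @ power
    return np.log(energies + 1e-10)
```

Calling this on a 1-second 440 Hz tone yields a 26-dimensional feature vector, one log energy per Mel filter.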
In some embodiments, the sound signal is converted into text content using an automatic speech recognition (ASR) method.
S500, detecting whether historical dialogue text exists in the voice communication.
After voice communication with the user is established, multiple rounds of dialogue may take place between the user and the insurance company. Therefore, in order to make the insurance company's reply text more accurate, the embodiment of the application detects whether historical dialogue text exists in the voice communication. If historical dialogue text exists, it participates in determining the reply text, making the reply text more accurate.
S600, if historical dialogue text exists in the voice communication, fusing the audio features, the text content, the user information and the historical dialogue text to obtain fused features.
In the embodiment of the application, the audio features, the text content, the user information and the historical dialogue text are fused so that the subsequent preset model can output an accurate reply text based on more features.
In some embodiments, the text content and the historical dialogue text are represented as unstructured text descriptions, the user information is represented as structured data, and the audio features are represented as embedding vectors. In some embodiments, each of these modalities is embedded by a shallow network and then fused in the embedding space to obtain the fused features.
In the embodiment of the application, the audio features, the text content, the user information and the historical dialogue text are fused using a feature fusion layer to obtain the fused features. Within the feature fusion layer, an audio feature embedding layer, a structured data embedding layer, a text data embedding layer and an ASR result embedding layer are designed for the different modalities, where the ASR result embedding layer is also a text data embedding layer. The embedding vectors of the different modalities are extracted with the corresponding embedding layers: the audio features with the audio feature embedding layer, the user information with the structured data embedding layer, the historical dialogue text with the text data embedding layer, and the text content with the ASR result embedding layer.
The audio feature embedding layer is a single-layer fully connected network, and the structured data embedding layer and the text data embedding layer are both shallow neural networks based on an attention mechanism. The structured data is embedded as x_i = W_a[u_i; v_i], where [u_i; v_i] is a key-value pair in the structured data. Unstructured text data is embedded using a Transformer; Transformer embedding refers to a technique by which an input text sequence is converted into a fixed-length vector representation.
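The per-modality embedding layers and their fusion in a shared space can be sketched numerically. This is a toy illustration under stated assumptions: the dimensions and random weights are arbitrary, the Transformer text embeddings are replaced by random placeholder vectors, and mean pooling over key-value pairs and summation as the fusion operator are choices of this sketch, not specified by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding dimension (illustrative assumption)

def embed_audio(feat, W):
    """Audio feature embedding layer: a single fully connected layer."""
    return W @ feat

def embed_structured(pairs, W_a):
    """Structured-data embedding in the style of x_i = W_a [u_i; v_i],
    one embedding per key-value pair, mean-pooled (pooling is assumed)."""
    xs = [W_a @ np.concatenate([u, v]) for u, v in pairs]
    return np.mean(xs, axis=0)

# Toy inputs: a 26-dim audio feature and two 8-dim key/value pairs.
audio = rng.normal(size=26)
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(2)]
W_audio = rng.normal(size=(D, 26))
W_a = rng.normal(size=(D, 16))

# Stand-ins for Transformer embeddings of the two text modalities.
text_vec = rng.normal(size=D)     # ASR result embedding (placeholder)
history_vec = rng.normal(size=D)  # dialogue-history embedding (placeholder)

# Fuse all modalities in the shared embedding space (summation assumed).
fused = (embed_audio(audio, W_audio) + embed_structured(pairs, W_a)
         + text_vec + history_vec)
```

The point of the sketch is only that each modality, whatever its raw form, is mapped into a common D-dimensional space before fusion, which is what lets voice, text and structured data be combined for the preset model.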
In the embodiment of the application, peripheral modules are designed around the natural language large model for the insurance telemarketing scenario. Multi-modal input is adopted: each modality is embedded by its own embedding layer and then fused, so that voice, text and structured data are effectively integrated in the embedding space for use by the preset model, and the multi-modal input facilitates generating a more suitable reply text.
In some embodiments, S700, if no historical dialogue text exists in the voice communication, the audio features, the text content and the user information are fused to obtain the fused features.
In the embodiment of the application, at the very beginning of voice communication with the user there is no historical dialogue between the user and the insurance company's terminal, so correspondingly no historical dialogue text exists. In that case, only the audio features, the text content and the user information are fused to obtain the fused features. The specific fusion method is the same as in step S600 and is not repeated here.
S800, inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language large model and a generation head, and the natural language large model is refined according to a user emotion recognition head, an insurance product recommendation head, a user application progress recognition head and the generation head.
In this embodiment of the application, the step of inputting the fused features into the preset model includes: inputting the fused features into the natural language large model to obtain a fusion vector, and inputting the fusion vector into the generation head to output the reply text. In this embodiment, the fusion vector is a 768-dimensional vector.
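The two-stage forward pass just described (fused features → 768-dimensional fusion vector → generation head) can be illustrated with a toy computation. Only the 768 dimension comes from the text; everything else here — the input and vocabulary sizes, the linear stand-ins for the large model and the head, the tanh nonlinearity, and greedy token selection — is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D_FUSED, D_MODEL, VOCAB = 64, 768, 1000  # 768 per the text; others assumed

# Stand-in for the natural language large model: any mapping from the
# fused features to a 768-dimensional fusion vector.
W_backbone = rng.normal(size=(D_MODEL, D_FUSED)) * 0.01

# Generation head: a linear projection to vocabulary logits, from which
# reply tokens are produced autoregressively.
W_head = rng.normal(size=(VOCAB, D_MODEL)) * 0.01

fused_features = rng.normal(size=D_FUSED)
fusion_vector = np.tanh(W_backbone @ fused_features)  # shape (768,)
logits = W_head @ fusion_vector                       # shape (1000,)
next_token = int(np.argmax(logits))                   # greedy choice (assumed)
```

In the real system the backbone is a large Transformer and the head emits one token per decoding step, but the interface is the same: a 768-dimensional vector in, a distribution over the vocabulary out.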
A natural language large model typically answers a question by generating a piece of text from the input text. In the insurance telemarketing scenario, however, the inputs and outputs are diverse. The inputs include text-form historical user dialogue content, the user's voice signal, the text content converted from the voice signal, user information, and the like. The output must not only answer the user's questions but also guide the user, recognize the user's emotion in real time, recommend insurance products according to the historical dialogue and user information, and predict the progress of the user's application operations so as to assist in giving accurate answers. These requirements cannot be met by a natural language large model alone.
Therefore, in the embodiment of the application, when the preset model is constructed, task heads are added on top of the natural language large model, including a user emotion recognition head, an insurance product recommendation head, a user application progress recognition head and a generation head. Owing to the particularities of the different tasks, the parameters of the natural language large model need to be further adjusted and refined when training the preset model so as to meet the requirements of the different tasks.
In some embodiments, the generation of the preset model includes: training the preset model with a large amount of sample data.
In the embodiment of the application, the sample data may be fused features and reply texts generated from a large amount of preset correct dialogue content.
In some embodiments, the user emotion recognition head, recommendation head for insurance products, recognition head for user's application progress, and generation head are updated during training of the preset model with a large amount of sample data.
In the embodiment of the application, the parameters of the large natural language model are continuously updated according to the content of the sample data while the parameters of the large natural language model are corrected by using the user emotion recognition head, the recommendation head of the insurance product, the recognition head of the user insurance progress and the generation head, so that the parameters of the large natural language model can be corrected more accurately.
The user emotion recognition head can recognize the user's emotion, and recognizing it helps the generation head produce reply text that is more effective at comforting the user and prompting the user to purchase insurance products.
User emotion can be categorized into anger, happiness, calm, and anxiety. In the training stage, one of these four emotion recognition results is output for each piece of text content converted from the user's sound signal; since this is a classification task, the loss function adopts cross entropy.
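As a concrete illustration, the four-class cross-entropy loss described here reduces to the negative log-probability that the model assigns to the correct emotion. A minimal sketch follows (the class order is assumed for illustration, not specified by the source):

```python
import math

# Illustrative class order (assumed): anger, happiness, calm, anxiety.
EMOTIONS = ["anger", "happiness", "calm", "anxiety"]

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the true class; probs are softmax outputs summing to 1.
    return -math.log(probs[target_index])

probs = [0.05, 0.10, 0.70, 0.15]       # model's predicted distribution over EMOTIONS
loss_good = cross_entropy(probs, 2)    # true label "calm": model is confident, low loss
loss_bad = cross_entropy(probs, 0)     # true label "anger": model is wrong, high loss
```

The loss shrinks toward zero as the probability on the correct class approaches one, which is exactly the gradient signal the emotion head contributes during training.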
The insurance product recommendation head is a classification head. The specific categories are set according to the products on sale, and an additional category representing "no suitable on-sale product" is added, which is used to judge whether the user is suitable for purchasing an insurance product at all.
The recognition head for the user's insurance-application progress classifies according to the interface the user needs to operate next; for example, the operation steps can be divided into entering the personal homepage, entering the product display page, entering the insurance-notification page, entering the payment page, and so on. This head is likewise a classification head and is trained with cross entropy.
The result of the generation head is the only result displayed to the user; the other task heads constrain the generation of the preset model so that it does not produce large deviations or hallucinations. The generation head generates a piece of reply text for the user in an autoregressive manner. In the training stage, the cross entropy of the token (the minimum unit of text) generated at each step is computed as that step's loss, and the arithmetic mean of all token losses gives the final loss.
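The token-level loss described for the generation head can be written out directly: per-step cross entropy on the correct token, then an arithmetic mean over all steps. A minimal sketch under that description:

```python
import math

def generation_loss(step_probs):
    # step_probs[i]: probability the head assigned to the correct token at step i.
    # Per-token cross entropy, then the arithmetic mean over all tokens,
    # as the description specifies.
    token_losses = [-math.log(p) for p in step_probs]
    return sum(token_losses) / len(token_losses)

# Toy sequence of three tokens with the model's probability on each correct token.
loss = generation_loss([0.9, 0.6, 0.8])
```

Averaging (rather than summing) keeps the loss comparable across reply texts of different lengths.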
In the embodiment of the application, training takes the form of multiple task heads. Each task head provides regularizing supervision of the network parameters of the large natural language model, and the tasks of the other heads assist the generation head, so that the preset model can generate reply text better suited to answering the user.
Adopting the multi-task-head approach in the training stage enables the preset model to converge to a true valley point of the loss, and assists the language generation part in producing appropriate dialogue.
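The source does not state how the per-head losses are combined into one training objective; a common choice, assumed here purely for illustration, is a weighted sum with equal default weights:

```python
def total_training_loss(head_losses, weights=None):
    # Hypothetical combination rule: the patent does not specify one, so
    # equal weighting across all task heads is assumed for this sketch.
    if weights is None:
        weights = {name: 1.0 for name in head_losses}
    return sum(weights[name] * loss for name, loss in head_losses.items())

# Toy per-head loss values for the four heads described in the text.
combined = total_training_loss({
    "emotion": 0.40, "product": 0.70, "progress": 0.30, "generation": 1.20,
})
```

In practice the weights would be tuned so that the auxiliary heads regularize the backbone without drowning out the generation objective.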
In some embodiments, the method further comprises:
converting the reply text into a voice form to obtain target audio;
and sending the target audio to a user side so that the user side plays the target audio.
In the embodiment of the present application, since this is a scenario of voice communication with the user, after the reply text is obtained it needs to be converted into speech, and the converted target audio is sent to the user terminal, so that the user can listen to the reply content of the insurance company end through the user terminal.
In some embodiments, after the sending the target audio to the user side, the method further includes:
acquiring a new sound signal, new user information and a new historical dialogue text of a user, wherein the new user information at least comprises the operation progress of the user on an application interface;
updating the original voice signal by using the new voice signal, updating the original user information by using the new user information, and updating the original historical dialogue text by using the new historical dialogue text;
and re-executing the steps of extracting the audio characteristics from the sound signals and converting the sound signals into text contents until the voice communication with the user is finished.
In this embodiment of the present application, since there may be multiple rounds of dialogue between the user and the insurance company end, whenever a new sound signal, new user information, and new historical dialogue text of the user are acquired, the fused features generated from the newly acquired content are input into the preset model to obtain the reply text for the new round of dialogue, until the voice communication between the insurance company end and the user ends.
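The multi-round loop described above (acquire new input, update the state, re-run the model until the call ends) can be sketched with toy stand-ins; `CallSession` and the reply function below are hypothetical scaffolding, not part of the patent:

```python
class CallSession:
    # Toy stand-in for a live call: a fixed queue of user utterances.
    def __init__(self, utterances):
        self._pending = list(utterances)
        self.history = []           # accumulated dialogue, oldest first

    def listen(self):
        # Returns the next utterance, or None once voice communication has ended.
        return self._pending.pop(0) if self._pending else None

def run_call(session, reply_fn):
    # Each round: take the new utterance, extend the historical dialogue text,
    # and invoke the model again on the updated inputs, until the call ends.
    while (utterance := session.listen()) is not None:
        session.history.append(("user", utterance))
        reply = reply_fn(utterance, session.history)
        session.history.append(("agent", reply))
    return session.history

history = run_call(CallSession(["hello", "how much is it?"]),
                   lambda utt, hist: f"reply to: {utt}")
```

A real implementation would also fold in the fresh audio features and updated user information each round; the loop structure is the same.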
A large natural language model can better understand user questions and give appropriate, accurate answers, which provides the basis for designing an intelligent telephone outbound system around it. In the embodiment of the application, considering the insurance telemarketing scenario, a multi-modal, multi-task telephone outbound system is designed around a large natural language model; it can intelligently recommend insurance according to the user's information, the user's historical dialogue content, and the like, answer the user's questions about insurance, and guide the user by voice through the whole insurance-application process.
Besides insurance telemarketing, the telephone outbound method in the embodiment of the application can also be applied to other fields, so as to reply to the user with an appropriate reply text.
In the foregoing embodiments, a method, an apparatus, an electronic device, and a storage medium for outbound call of a phone are provided, where a large natural language model in a preset model is obtained by correcting according to multiple task headers, and the preset model can intelligently recommend insurance for a user according to audio features of a user's sound signal, text content converted by the sound signal, user information, and historical dialogue text, solve a problem about insurance for the user, and guide the user to complete the whole insurance application process by voice. The method comprises the following steps: acquiring a voice contact way of a user with an insurance intention based on operation information of the user on an insurance application interface; requesting voice communication with the user according to the voice contact mode; responding to the establishment of voice communication with a user, and acquiring a voice signal of the user and user information, wherein the user information at least comprises the operation progress of the user on an application interface; extracting audio characteristics from the sound signal and converting the sound signal into text content; if the historical dialogue text exists in the voice communication, fusing the audio characteristics, the text content, the user information and the historical dialogue text to obtain fused characteristics; and inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language big model and a generating head, and the natural language big model is obtained by correcting according to a user emotion recognition head, a recommendation head of an insurance product, a recognition head of a user insurance application progress and the generating head.
It should be noted that, the sequence number of each step in the above embodiment does not mean the sequence of execution sequence, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application. In practical applications, all possible embodiments may be combined in any combination manner to form possible embodiments of the present application, which are not described in detail herein.
Based on the telephone outbound method provided by each embodiment, the embodiment of the application also provides a telephone outbound device based on the same inventive concept.
Fig. 2 schematically illustrates a structural diagram of a telephone outbound device provided in accordance with some embodiments. As shown in fig. 2, the telephone outbound device includes: a first acquisition unit 201, a request unit 202, a second acquisition unit 203, a conversion unit 204, a first fusion unit 205, and an input unit 206.
A first obtaining unit 201, configured to obtain a voice contact way of a user with an intention of application based on operation information of the user on an application interface;
a request unit 202, configured to request voice communication with a user according to the voice contact manner;
a second obtaining unit 203, configured to obtain, in response to establishment of voice communication with a user, a voice signal of the user and user information, where the user information includes at least an operation progress of the user on an application interface;
a conversion unit 204, configured to extract an audio feature from the sound signal, and convert the sound signal into text content;
a first fusing unit 205, configured to fuse the audio feature, the text content, the user information and the history dialogue text if the history dialogue text exists in the voice communication, so as to obtain a fused feature;
and an input unit 206, configured to input the fused features into a preset model to obtain a reply text, where the preset model includes a large natural language model and a generating head, and the large natural language model is obtained by correcting the large natural language model according to a user emotion recognition head, a recommendation head of an insurance product, a recognition head of a user insurance application progress, and the generating head.
In some embodiments, the apparatus further comprises:
and the second fusion unit is used for fusing the audio features, the text content and the user information to obtain fused features if no historical dialogue text exists in the voice communication.
In some embodiments, the apparatus further comprises:
the conversion unit is used for converting the reply text into a voice form to obtain target audio;
and the sending unit is used for sending the target audio to a user side so that the user side plays the target audio.
In some embodiments, the apparatus further comprises:
a third obtaining unit, configured to obtain a new sound signal of a user, new user information and a new history dialogue text, where the new user information at least includes an operation progress of the user on an application interface;
a first updating unit configured to update an original voice signal with the new voice signal, update the original user information with new user information, and update an original history dialogue text with a new history dialogue text;
and the re-executing unit is used for re-executing the steps of extracting the audio characteristics from the sound signal and converting the sound signal into text contents until the voice communication with the user is finished.
In some embodiments, the apparatus further comprises:
and the training unit is used for training the preset model by utilizing a large amount of sample data.
In some embodiments, the apparatus further comprises:
and the second updating unit is used for updating the user emotion recognition head, the recommendation head of the insurance product, the recognition head of the user insurance application progress and the generation head in the process of training the preset model by using a large amount of sample data.
In some embodiments, the conversion unit is configured to specifically perform: extracting Mel frequency spectrum coefficient characteristics from the sound signal.
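Mel-frequency features rest on the standard Hz-to-Mel mapping. As an illustration of the scale a Mel filter bank uses before spectral coefficients are extracted (the constants 2595 and 700 are the conventional definition of the scale, not values taken from this patent):

```python
import math

def hz_to_mel(f_hz):
    # Standard mapping used when constructing Mel filter banks.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    # Band edges spaced uniformly on the Mel scale; a Mel filter bank places
    # one triangular filter per band before the coefficients are computed.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# 26 bands over 0-8000 Hz, a common configuration for speech features.
edges = mel_band_edges(0.0, 8000.0, 26)
```

Because the scale is logarithmic above roughly 1 kHz, the bands are dense at low frequencies and sparse at high frequencies, matching human pitch perception; this is why Mel-based features suit the speech signals processed by the conversion unit.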
Based on the same inventive concept, the embodiments of the present application also provide an electronic device, comprising a processor and a memory, the memory storing a computer program, the processor being arranged to run the computer program to perform the telephone outbound method of any of the embodiments described above.
In an exemplary embodiment, an electronic device is provided, as shown in fig. 3, the electronic device 300 shown in fig. 3 includes: a processor 301 and a memory 303. Wherein the processor 301 is coupled to the memory 303, such as via a bus 302. Optionally, the electronic device 300 may also include a transceiver 304. It should be noted that, in practical applications, the transceiver 304 is not limited to one, and the structure of the electronic device 300 is not limited to the embodiment of the present application.
The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 301 may also be a combination that implements computing functionality, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 302 may include a path to transfer information between the components. Bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Bus 302 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean there is only one bus or one type of bus.
The memory 303 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage, optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 303 is used for storing computer program code for executing the aspects of the present application and is controlled to be executed by the processor 301. The processor 301 is arranged to execute computer program code stored in the memory 303 for implementing what is shown in the foregoing method embodiments.
Among them, electronic devices include, but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments herein.
Based on the same inventive concept, the embodiments of the present application also provide a storage medium having a computer program stored therein, wherein the computer program is configured to execute the phone outbound method of any one of the embodiments described above when running.
It will be clear to those skilled in the art that the specific working processes of the above-described systems, devices and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein for brevity.
Those of ordinary skill in the art will appreciate that: the technical solution of the present application may be embodied in essence or in whole or in part in a software product stored in a storage medium, which includes program instructions for causing an electronic device (e.g., a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application when the program instructions are executed. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disk, etc.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a personal computer, a server, or an electronic device such as a network device) associated with program instructions, where the program instructions may be stored in a computer-readable storage medium, and where the program instructions, when executed by a processor of the electronic device, perform all or part of the steps of the methods described in the embodiments of the present application.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may be modified or some or all technical features may be replaced equally within the spirit and principles of the present application; such modifications and substitutions do not depart from the scope of the present application.

Claims (10)

1. A method of outgoing telephone calls, comprising:
acquiring a voice contact way of a user with an insurance intention based on operation information of the user on an insurance application interface;
requesting voice communication with the user according to the voice contact mode;
responding to the establishment of voice communication with a user, and acquiring a voice signal of the user and user information, wherein the user information at least comprises the operation progress of the user on an application interface;
extracting audio characteristics from the sound signal and converting the sound signal into text content;
if the historical dialogue text exists in the voice communication, fusing the audio characteristics, the text content, the user information and the historical dialogue text to obtain fused characteristics;
and inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language big model and a generating head, and the natural language big model is obtained by correcting according to a user emotion recognition head, a recommendation head of an insurance product, a recognition head of a user insurance application progress and the generating head.
2. The method as recited in claim 1, further comprising:
if the historical dialogue text does not exist in the voice communication, the audio features, the text content and the user information are fused, and the fused features are obtained.
3. The method as recited in claim 1, further comprising:
converting the reply text into a voice form to obtain target audio;
and sending the target audio to a user side so that the user side plays the target audio.
4. The method of claim 3, further comprising, after said transmitting said target audio to a user terminal:
acquiring a new sound signal, new user information and a new historical dialogue text of a user, wherein the new user information at least comprises the operation progress of the user on an application interface;
updating the original voice signal by using the new voice signal, updating the original user information by using the new user information, and updating the original historical dialogue text by using the new historical dialogue text;
and re-executing the steps of extracting the audio characteristics from the sound signals and converting the sound signals into text contents until the voice communication with the user is finished.
5. The method according to claim 1, wherein the generating of the preset model comprises:
training the preset model by using a large amount of sample data.
6. The method as recited in claim 5, further comprising:
and in the process of training the preset model by using a large amount of sample data, updating the user emotion recognition head, the recommendation head of the insurance product, the recognition head of the user insurance application progress and the generation head.
7. The method of claim 1, wherein the step of extracting audio features from the sound signal comprises:
extracting Mel frequency spectrum coefficient characteristics from the sound signal.
8. A telephone outbound device comprising:
the first acquisition unit is used for acquiring voice contact information of the user with the insurance intention based on the operation information of the user on the insurance application interface;
the request unit is used for requesting voice communication with the user according to the voice contact way;
the second acquisition unit is used for responding to the establishment of voice communication with the user and acquiring the voice signal of the user and user information, wherein the user information at least comprises the operation progress of the user on an application interface;
the conversion unit is used for extracting audio characteristics from the sound signals and converting the sound signals into text contents;
the first fusion unit is used for fusing the audio features, the text content, the user information and the historical dialogue text to obtain fused features if the historical dialogue text exists in the voice communication;
the input unit is used for inputting the fused features into a preset model to obtain a reply text, wherein the preset model comprises a natural language big model and a generation head, and the natural language big model is obtained by correcting the natural language big model according to a user emotion recognition head, a recommendation head of an insurance product, a recognition head of a user insurance application progress and the generation head.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the telephone outgoing call method of any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the telephone outgoing call method as claimed in any one of claims 1 to 7.
CN202311802217.XA 2023-12-26 2023-12-26 Telephone outbound method, device, electronic equipment and storage medium Pending CN117834783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311802217.XA CN117834783A (en) 2023-12-26 2023-12-26 Telephone outbound method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311802217.XA CN117834783A (en) 2023-12-26 2023-12-26 Telephone outbound method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117834783A true CN117834783A (en) 2024-04-05

Family

ID=90507200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311802217.XA Pending CN117834783A (en) 2023-12-26 2023-12-26 Telephone outbound method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117834783A (en)

Similar Documents

Publication Publication Date Title
US20180285595A1 (en) Virtual agent for the retrieval and analysis of information
CN111105782B (en) Session interaction processing method and device, computer equipment and storage medium
CN111028827A (en) Interaction processing method, device, equipment and storage medium based on emotion recognition
CN109767765A (en) Talk about art matching process and device, storage medium, computer equipment
US10135989B1 (en) Personalized support routing based on paralinguistic information
CN109514586B (en) Method and system for realizing intelligent customer service robot
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
CN112202978A (en) Intelligent outbound call system, method, computer system and storage medium
CN112131358A (en) Scene flow structure and intelligent customer service system applied by same
KR20190117840A (en) Method and computer readable recording medium for, during a customer consulting by a conversation understanding ai system, passing responsibility of proceeding with subsequent customer consulting to a human consultant
CN111339309A (en) Corpus expansion method and system for user intention
US11615787B2 (en) Dialogue system and method of controlling the same
CN113037914A (en) Method for processing incoming call, related device and computer program product
CN112989046A (en) Real-time speech technology prejudging method, device, computer equipment and storage medium
CN114067842B (en) Customer satisfaction degree identification method and device, storage medium and electronic equipment
CN116561284A (en) Intelligent response method, device, electronic equipment and medium
US20220207066A1 (en) System and method for self-generated entity-specific bot
CN117834783A (en) Telephone outbound method, device, electronic equipment and storage medium
CN117708266A (en) Intention recognition method, device, electronic equipment and storage medium
CN114860910A (en) Intelligent dialogue method and system
CN114519094A (en) Method and device for conversational recommendation based on random state and electronic equipment
CN114726635A (en) Authority verification method, device, electronic equipment and medium
CN112632241A (en) Method, device, equipment and computer readable medium for intelligent conversation
CN113782022B (en) Communication method, device, equipment and storage medium based on intention recognition model
CN112911074A (en) Voice communication processing method, device, equipment and machine readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination