CN111968632A - Call voice acquisition method and device, computer equipment and storage medium


Info

Publication number
CN111968632A
Authority
CN
China
Prior art keywords: voice, response, call, information, call object
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010673633.4A
Other languages: Chinese (zh)
Other versions: CN111968632B (en)
Inventor
王焕鹏
赵凯
王福海
张文锋
Current Assignee
Merchants Union Consumer Finance Co Ltd
Original Assignee
Merchants Union Consumer Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Merchants Union Consumer Finance Co Ltd filed Critical Merchants Union Consumer Finance Co Ltd
Priority to CN202010673633.4A
Publication of CN111968632A
Application granted
Publication of CN111968632B
Legal status: Active


Classifications

    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue (Speech recognition)
    • G10L13/02 Methods for producing synthetic speech; speech synthesisers (Speech synthesis; text-to-speech systems)
    • G10L15/26 Speech-to-text systems (Speech recognition)
    • G10L25/57 Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals

Abstract

The application relates to a call voice acquisition method and apparatus, a computer device, and a storage medium. The method comprises: acquiring, during voice interaction, a first response voice of a voice call object to a preset call voice, where the preset call voice corresponds to the voice call object; extracting the voice text information from the first response voice; determining the tag information of the voice call object from the voice text information; and determining, according to the tag information, a second response voice for the first response voice together with the voice parameters of the second response voice. With this method, the second response voice and its voice parameters can be adjusted dynamically according to the first response voice of the voice call object, realizing a personalized dialogue for each user and thereby addressing the technical problems of the single interaction mode and unsatisfactory interaction results of conventional methods.

Description

Call voice acquisition method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of voice interaction technologies, and in particular, to a method and an apparatus for acquiring a call voice, a computer device, and a storage medium.
Background
A voice interaction robot (chatterbot) is a computer program that conducts conversations by voice or text and can simulate human conversation or chat. At present, voice interaction robots are widely deployed on instant messaging platforms for entertainment, retail marketing, and customer service, and can also be applied to outbound voice-call scenarios such as overdue-payment reminder services and marketing services.
During voice interaction, conventional methods generally adopt a few fixed tones and a single set of template call voices, selected at random for each outbound interaction. Because this interaction mode is uniform for all users, the user feedback obtained after voice interaction is often unsatisfactory.
Disclosure of Invention
Therefore, it is necessary to provide a call voice acquisition method, apparatus, computer device, and storage medium to solve the technical problem that a single-mode interaction method leads to unsatisfactory user feedback after voice interaction.
A call voice acquisition method, the method comprising:
acquiring a first response voice of a voice call object based on a preset call voice in a voice interaction process; the preset call voice corresponds to the voice call object;
acquiring voice text information in the first response voice;
determining the label information of the voice call object according to the voice text information;
and determining a second response voice aiming at the first response voice and voice parameters of the second response voice according to the label information.
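The four steps above can be sketched as a minimal pipeline. This is an illustrative sketch only: the function names, tag values, and response texts are assumptions, not taken from the application.

```python
# Hypothetical sketch of the claimed four-step pipeline; all names are illustrative.

def transcribe(first_response_audio):
    # Stand-in for the ASR step: here the "audio" is already text.
    return first_response_audio

def classify_tag(text):
    # Stand-in for the semantic recognition step.
    return "loan_cleared" if "cleared" in text else "other"

def select_response(tag):
    # Look up the second response voice and its voice parameters by tag.
    responses = {
        "loan_cleared": ("Thank you, we will verify the repayment.",
                         {"tone": "mild", "volume": 0.8}),
        "other": ("Could you clarify your situation?",
                  {"tone": "neutral", "volume": 1.0}),
    }
    return responses[tag]

def handle_first_response(audio):
    text = transcribe(audio)          # step 2: voice text information
    tag = classify_tag(text)          # step 3: tag information
    return select_response(tag)       # step 4: second response voice + parameters

speech, params = handle_first_response("I have cleared the loan")
print(speech, params["tone"])
```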
In one embodiment, the determining, according to the tag information, a second response voice for the first response voice and voice parameters of the second response voice includes:
when the tag information belongs to target tag information, obtaining portrait tag information of the voice call object;
determining a truth degree identification result of the first response voice according to the portrait label information;
and acquiring response voice corresponding to the reality recognition result and voice parameters of the response voice, and correspondingly taking the response voice as second response voice of the first response voice and voice parameters of the second response voice.
In one embodiment, the determining a truth recognition result of the first response voice according to the portrait label information includes:
determining a target information identifier corresponding to the portrait label information;
acquiring the portrait label grade corresponding to the target information identification as the portrait label grade of the voice call object;
and determining a truth recognition result of the first response voice according to the portrait label grade.
In one embodiment, the acquiring portrait tag information of the voice call object includes:
acquiring historical call data and historical credit data of the voice call object;
obtaining label values corresponding to a plurality of preset labels according to the historical call data and the historical credit data;
and according to the weight coefficient corresponding to each preset label, carrying out weighting processing on the label value corresponding to each preset label to obtain a target label value which is used as portrait label information of the voice call object.
In one embodiment, the voice parameters of the second response voice include: response tone, response intonation, and response volume;
after determining a second response voice for the first response voice and voice parameters of the second response voice according to the tag information, the method further includes:
performing voice synthesis on the response tone, the response intonation, the response volume and the second response voice based on a pre-trained voice synthesis model to obtain a target response voice;
and sending the target response voice to a voice interaction robot so that the voice interaction robot can perform voice interaction with the voice call object by adopting the target response voice.
In one embodiment, the determining the tag information of the voice call object according to the voice text information includes:
inputting the voice text information into a pre-trained semantic recognition model to obtain the label information of the voice call object; and the pre-trained semantic recognition model is used for performing semantic recognition on the voice text information to obtain the label information of the voice call object.
In one embodiment, before obtaining a first response voice of a voice call object based on a preset call voice in a voice interaction process, the method further includes:
acquiring initial call voice from a preset voice database;
and synthesizing the preset voice parameters and the initial call voice to obtain the preset call voice.
A call voice acquisition apparatus, the apparatus comprising:
the first answer voice acquisition module is used for acquiring a first answer voice of a voice call object based on a preset call voice in the voice interaction process; the preset call voice corresponds to the voice call object;
the voice text information acquisition module is used for acquiring voice text information in the first response voice;
the tag information determining module is used for determining the tag information of the voice call object according to the voice text information;
and the second response voice determining module is used for determining a second response voice aiming at the first response voice and a voice parameter of the second response voice according to the label information.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a first response voice of a voice call object based on a preset call voice in a voice interaction process; the preset call voice corresponds to the voice call object;
acquiring voice text information in the first response voice;
determining the label information of the voice call object according to the voice text information;
and determining a second response voice aiming at the first response voice and voice parameters of the second response voice according to the label information.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a first response voice of a voice call object based on a preset call voice in a voice interaction process; the preset call voice corresponds to the voice call object;
acquiring voice text information in the first response voice;
determining the label information of the voice call object according to the voice text information;
and determining a second response voice aiming at the first response voice and voice parameters of the second response voice according to the label information.
According to the above call voice acquisition method, apparatus, computer device, and storage medium, a first response voice of the voice call object to a preset call voice is acquired during voice interaction, the voice text information in the first response voice is extracted, the tag information of the voice call object is determined from the voice text information, and finally the second response voice for the first response voice and the voice parameters of the second response voice are determined according to the tag information. In this way, the response voice and its parameters are dynamically adapted to each voice call object, realizing a personalized dialogue for each user.
Drawings
Fig. 1 is an application scenario diagram of a call voice acquisition method in an embodiment;
FIG. 2 is a flow chart illustrating a method for call voice acquisition in one embodiment;
FIG. 3 is a flowchart illustrating the step of determining a second response utterance in one embodiment;
FIG. 4 is a diagram illustrating a page for editing a customer representation in one embodiment;
FIG. 5 is a diagram illustrating a configuration page of call voice in one embodiment;
FIG. 6 is a flow chart diagram illustrating call speech in one embodiment;
FIG. 7 is a block diagram showing the structure of a call voice acquiring apparatus according to an embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the present application, the term "plurality" means two or more.
The call voice acquiring method provided by the application can be applied to the application environment shown in fig. 1. The voice call object 102 may perform voice interaction with the voice interaction robot 106, and the server 104 may be disposed in the voice interaction robot 106, or may communicate with the voice interaction robot 106 through a network. The server 104 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a method for acquiring call voice is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:
step S202, acquiring a first response voice of a voice call object based on a preset call voice in the voice interaction process; the preset call voice corresponds to the voice call object.
The preset call voice is the voice that the voice interaction robot obtains from a preset call voice library and plays to the voice call object.
The first response voice is the voice with which the voice call object responds to the preset call voice played by the voice interaction robot.
In a specific implementation, in a scenario where the voice interaction robot calls a voice call object for a voice conversation, the voice interaction robot 106 first obtains a call voice for the interaction from the call voice library, takes it as the preset call voice, and plays it to the voice call object. The voice call object then responds to the preset call voice; this response is taken as the first response voice, which the server 104 obtains through voice recognition.
Step S204, acquiring the voice text information in the first response voice.
The voice text information may represent information obtained by converting the first response voice into a text.
In a specific implementation, after the server 104 obtains the first response voice, it may convert the first response voice into text through Automatic Speech Recognition (ASR) to obtain the voice text information, from which the keyword information in the first response voice can be further extracted.
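A minimal sketch of this step, with the ASR engine mocked out (the application does not name a specific engine) and keyword extraction reduced to simple stop-word filtering; the stop-word list is an illustrative assumption.

```python
# Illustrative sketch only: a real system would call an ASR service; here
# transcription is mocked and keyword extraction is a simple filter.

STOP_WORDS = {"i", "have", "the", "a", "my"}

def mock_asr(audio_bytes):
    # Placeholder for an ASR engine returning the recognized text.
    return audio_bytes.decode("utf-8")

def extract_keywords(text):
    # Keep non-stop-words as the keyword information mentioned in the text.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

text = mock_asr(b"I have cleared the loan")
print(extract_keywords(text))
```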
Step S206, according to the voice text information, determining the label information of the voice call object.
Wherein the tag information may indicate an intention of the voice call object.
In a specific implementation, after the first response voice is converted by the automatic voice recognition technology to obtain the voice text information, the intention of the voice call object can be further determined according to the voice text information.
Further, in an embodiment, the step S206 specifically includes: inputting the voice text information into a pre-trained semantic recognition model to obtain the label information of a voice call object; the pre-trained semantic recognition model is used for performing semantic recognition on the voice text information to obtain the label information of the voice call object.
Specifically, the server 104 may input the converted voice text information into a semantic recognition model obtained by pre-training, and determine the tag information of the voice call object, that is, determine the intention of the voice call object according to the output result of the semantic recognition model.
For example, in an overdue reminding service scene, the first response voice of the voice call object is "i have cleared the loan", after the first response voice is converted into a text, the text "i have cleared the loan" can be input into the semantic recognition model, and according to the output result of the semantic recognition model, the tag information of the voice call object is obtained as the cleared loan.
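The semantic recognition model itself is not specified, so the sketch below substitutes simple keyword rules for the trained model; the intent names and cue words are illustrative assumptions.

```python
# A toy stand-in for the pre-trained semantic recognition model:
# rule-based intent matching instead of a learned model.

INTENT_RULES = {
    "loan_cleared": ["cleared", "paid off", "repaid"],
    "hospitalized": ["hospital", "hospitalized"],
}

def recognize_tag(text):
    # Return the first intent whose cue words appear in the utterance.
    lowered = text.lower()
    for tag, cues in INTENT_RULES.items():
        if any(cue in lowered for cue in cues):
            return tag
    return "unknown"

print(recognize_tag("I have cleared the loan"))
```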
Step S208, according to the label information, determining a second response voice aiming at the first response voice and a voice parameter of the second response voice.
The voice parameters comprise response tone, response intonation, and response volume.
In a specific implementation, after the tag information of the voice call object is determined, the second response voice for the first response voice and the voice parameters such as the response tone, the response volume, and the like of the second response voice can be determined from a pre-configured voice library according to the tag information.
Further, in one embodiment, after determining the second response voice for the first response voice and the voice parameters of the second response voice according to the tag information, the method further includes: performing voice synthesis on the response tone, the response intonation, the response volume, and the second response voice based on a pre-trained voice synthesis model to obtain a target response voice; and sending the target response voice to the voice interaction robot so that the voice interaction robot performs voice interaction with the voice call object using the target response voice.
In a specific implementation, the server 104 may perform speech synthesis processing on the speech parameters and the second response speech based on a pre-trained speech synthesis model, and send the synthesized speech to the speech interaction robot 106 as a target response speech, so that the speech interaction robot 106 plays the target response speech to the speech call object, thereby implementing a conversation with the speech call object.
Further, after hearing the target response voice, the voice call object may respond again; this response is taken as a third response voice. The voice interaction robot 106 then performs voice recognition on the third response voice, determines the tag information of the voice call object, and determines a fourth response voice for the third response voice and its voice parameters according to the tag information, and so on, thereby realizing voice interaction between the voice interaction robot 106 and the voice call object.
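The alternating turn loop described above can be sketched as follows; the stub dialogue policy and response texts are invented for illustration.

```python
# Sketch of the alternating turns: the robot derives response N+1 from the
# object's response N. The policy below is an illustrative stub.

def next_response(user_utterance):
    # Minimal tag-then-respond policy.
    tag = "loan_cleared" if "cleared" in user_utterance else "other"
    return {"loan_cleared": "We will verify that.",
            "other": "Please go on."}[tag]

def dialogue(user_turns):
    # user_turns are the first, third, fifth ... responses of the call object;
    # the robot's replies are the second, fourth ... response voices.
    transcript = []
    for utterance in user_turns:
        reply = next_response(utterance)
        transcript.append((utterance, reply))
    return transcript

log = dialogue(["Hello", "I have cleared the loan"])
print(log[-1][1])
```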
According to the above method for acquiring call voice, the first response voice of the voice call object to the preset call voice is acquired during voice interaction, the voice text information in the first response voice is extracted, the tag information of the voice call object is determined from the voice text information, and finally the second response voice for the first response voice and the voice parameters of the second response voice are determined according to the tag information. The response voice and its parameters are thus dynamically adapted to each voice call object.
In an embodiment, as shown in fig. 3, the step S208 specifically includes:
step S302, when the label information belongs to the target label information, obtaining the portrait label information of the voice call object;
step S304, determining a truth degree identification result of the first response voice according to the portrait label information;
step S306, the response voice corresponding to the authenticity identification result and the voice parameter of the response voice are acquired, and the second response voice corresponding to the first response voice and the voice parameter of the second response voice are acquired.
The target tag information is tag information (i.e., an intention) whose truthfulness needs to be judged using the portrait tag information of the voice call object; it may comprise multiple kinds of tag information and may be preset as required. For example, in an overdue-reminder outbound scenario, when the tag information is "the loan has been cleared", the statement "I have cleared the loan" needs to be judged and its truthfulness identified, so this tag information can be used as target tag information.
The portrait tag information may represent information acquired based on feature data of a voice call partner, and the portrait tag information may be a portrait tag value, for example, when the portrait tag is a credit tag, the portrait tag information may be a credit tag value.
In a specific implementation, when the server 104 recognizes that the tag information of the voice call object is target tag information, i.e., that the portrait tag information of the voice call object is needed to judge the first response voice, the server obtains the portrait tag information of the voice call object and uses it to identify the truthfulness of the first response voice, obtaining a truthfulness recognition result. Finally, the corresponding response voice and its voice parameters are determined according to the truthfulness recognition result and taken, respectively, as the second response voice for the first response voice and the voice parameters of the second response voice.
For example, the credit tag value of the voice call object may be used as the portrait tag information to judge the target tag information "I have cleared the loan". If the credit tag value indicates that the truthfulness of the first response voice is low, the voice call object is most likely lying, and a stricter response voice and tone can be used as the second response voice and its voice parameters. If the first response voice is judged to be truthful, the voice call object can probably be believed, and a milder response voice that offers a soothing inquiry or the longest grace period may be used as the second response voice and its voice parameters.
In this embodiment, the recognized tag information of the voice call object is checked, and when it belongs to the target tag information, the portrait tag information of the voice call object is used to identify the truthfulness of the first response voice; the corresponding second response voice and its voice parameters are then determined from the truthfulness recognition result. Different response voices and voice parameters are thus determined dynamically according to the tag information of the voice call object, which improves the anthropomorphic effect of the voice interaction robot in outbound services, improves the goodwill of the voice call object, and enables precise marketing in outbound marketing scenarios.
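A hedged sketch of mapping the truthfulness recognition result to the second response voice and its parameters, as in the example above; the wording and parameter names are assumptions, not taken from the application.

```python
# Illustrative mapping from the truthfulness recognition result to the
# second response voice and its voice parameters.

def select_by_truth(truth_is_high):
    if truth_is_high:
        # Believable object: milder, soothing response.
        return ("We understand; would a short grace period help?",
                {"tone": "mild", "intonation": "soothing"})
    # Likely untruthful object: stricter response and tone.
    return ("Our records do not show the repayment; please settle promptly.",
            {"tone": "strict", "intonation": "firm"})

speech, params = select_by_truth(False)
print(params["tone"])
```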
In an embodiment, the step S304 specifically includes: determining a target information identifier corresponding to the portrait label information; acquiring the portrait label grade corresponding to the target information identification as the portrait label grade of the voice call object; and determining a truth recognition result of the first response voice according to the portrait label grade.
The information identifier identifies each interval after the portrait tag information is divided into a plurality of intervals. If the portrait tag information is a portrait tag value and its range is divided into a plurality of value intervals, each interval is associated with one information identifier.
In a specific implementation, taking a portrait tag value as the portrait tag information, after acquiring the portrait tag value of the voice call object, the server 104 may determine the target information identifier corresponding to that value, i.e., the target tag-value interval it falls into, where each information identifier has a corresponding portrait tag level. The truthfulness recognition result of the first response voice is then determined from the portrait tag level.
For example, taking the credit tag value as the portrait tag information, its range may be divided into two intervals, (0, N) and (N, 100), whose information identifiers are denoted the first information identifier and the second information identifier; the first may correspond to a low credit tag level and the second to a high credit tag level. If the credit tag value of the voice call object falls within (0, N), the first information identifier is taken as the target information identifier, the low credit tag level is taken as the credit tag level of the voice call object, the truthfulness of the first response voice can be determined to be low, and the voice call object is considered likely to be lying. Similarly, if the credit tag value falls within (N, 100), the second information identifier is taken as the target information identifier, the credit tag level of the voice call object is determined to be high, the truthfulness of the first response voice is determined to be high, and the voice call object is considered not to be lying.
In this embodiment, the portrait tag level of the voice call object is determined according to the target information identifier, and finally the truth recognition result of the first response voice to the voice call object is determined according to the portrait tag level, so that the second response voice for the first response voice and the voice parameter of the second response voice are determined according to the truth recognition result.
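The interval-to-level mapping above can be sketched as follows; the threshold N = 60 and the level and result names are arbitrary illustrative choices.

```python
# Sketch of the interval-to-level mapping: a credit tag value in (0, N]
# maps to the first information identifier (low credit level), a value in
# (N, 100] to the second (high credit level). N is configurable; 60 is an
# arbitrary choice for illustration.

N = 60

def credit_level(tag_value):
    return "low" if tag_value <= N else "high"

def truth_result(tag_value):
    # Low credit level -> first response voice judged likely untruthful.
    if credit_level(tag_value) == "low":
        return "likely_untruthful"
    return "likely_truthful"

print(truth_result(30), truth_result(85))
```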
In an embodiment, the step S302 specifically includes: acquiring historical call data and historical credit data of a voice call object; obtaining label values corresponding to a plurality of preset labels according to the historical call data and the historical credit data; and according to the weight coefficient corresponding to each preset label, carrying out weighting processing on the label value corresponding to each preset label to obtain a target label value which is used as portrait label information of the voice call object.
The preset tag represents a variable associated with the portrait tag information of the voice call object, for example, when the portrait tag information is a credit tag, the preset tag may be a history overdue record, a credit record (e.g., sesame credit) on another platform, an age and income range, and the like.
In a specific implementation, the server 104 may obtain tag values corresponding to a plurality of preset tags according to the historical call data and the historical credit data of the voice call object by obtaining the historical call data of the voice call object and obtaining the historical credit data of the voice call object from an associated third-party system. Further, a weight coefficient corresponding to each preset label is obtained, the label value of each preset label and the corresponding weight coefficient are weighted to obtain a weighted value of each preset label, the sum of the weighted values of each preset label is calculated to obtain a target label value, and the target label value is used as portrait label information of the voice call object.
For example, if the portrait tag information is a credit tag value and the weight coefficients of the preset tags (i.e., the historical overdue record, credit records on other platforms such as sesame credit, age, and income range) are denoted a, b, c, and d respectively, then after the tag value of each preset tag is obtained from the historical call data and historical credit data, the credit tag value of the voice call object is calculated from each tag value and its weight coefficient as:

credit tag value = historical overdue record × weight a + credit record on other platforms × weight b + age × weight c + income range × weight d

where each product (historical overdue record × weight a, and so on) is the weighted value of the corresponding preset tag.
In this embodiment, tag values of a plurality of preset tags are determined according to historical call data and historical credit data of the voice call object, and a target tag value is calculated according to a weight coefficient corresponding to each tag and is used as portrait tag information of the voice call object, so that the authenticity of the first response voice of the voice call object is conveniently identified according to the portrait tag information.
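The weighted-sum computation of the target tag value can be sketched directly; the tag values and weight coefficients a-d below are invented numbers for illustration, not values from the application.

```python
# Direct sketch of the weighted-sum formula. In practice each tag value
# would be derived from historical call data and historical credit data.

weights = {
    "overdue_record": 0.4,   # weight a
    "external_credit": 0.3,  # weight b
    "age": 0.1,              # weight c
    "income_range": 0.2,     # weight d
}
label_values = {
    "overdue_record": 50,
    "external_credit": 80,
    "age": 70,
    "income_range": 60,
}

# Target tag value = sum of each preset tag's weighted value.
credit_tag_value = sum(label_values[k] * weights[k] for k in weights)
print(credit_tag_value)
```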
In one embodiment, if the voice interaction robot performs a marketing-type service, the portrait tag information may be a tag of products of interest. The historical call data of the voice call object and its browsing behavior records on third-party platforms are acquired, the number of times each product was browsed is counted from these records, and the products are sorted by browse count (for example, from high to low) to obtain the high-frequency hot products the voice call object pays attention to; at least one such product is taken as a product of interest of the voice call object. If the voice call object needs to borrow, the breakpoint in its borrowing process can be mined during voice interaction, the corresponding problem solved precisely, and the voice call object guided to complete the borrowing.
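Ranking products by browse count to pick high-frequency hot products might look like the following sketch; the browse events and product names are invented for illustration.

```python
# Count browse events per product and keep the top-k as the
# high-frequency hot products of interest.
from collections import Counter

browse_events = ["loan_a", "loan_b", "loan_a", "card_c", "loan_a", "loan_b"]

def top_products(events, k=2):
    counts = Counter(events)
    # most_common sorts by browse count, from high to low.
    return [product for product, _ in counts.most_common(k)]

print(top_products(browse_events))
```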
In one embodiment, the determination of the portrait tag of a voice call object may support combining different tags into composite portrait tag information. Fig. 4 shows a schematic page for editing a customer portrait (i.e., the portrait of a voice call object): voice call objects may be divided into clients with property and clients without property according to the trigger conditions shown in the figure (i.e., whether there is a housing loan, whether there is a car loan, the loan value, etc.). After the portrait tags of different voice call objects are determined, different scripts (i.e., responding call voices and call voice parameters) can be configured for different portrait tags. Fig. 5 shows a schematic configuration page for call voice: one type of script can be configured for the portrait tag of clients with property, and another type for voice call objects aged 26 to 35. Fig. 6 shows the flow-tree diagram of the configured response voices, which contains the corresponding response voice under each intention; after the response voice of the voice call object is converted into text by ASR (speech-to-text), the text is input into the semantic recognition model to determine the intention (i.e., tag information) of the voice call object. For example, if the identified intention of the voice call object is "hospitalized with illness", the corresponding branch of the flow tree is entered and its call voice and voice parameters are used to communicate with the voice call object; the process continues downward in the same way.
Each round of voice interaction in each call involves the same semantic recognition; how the flow proceeds is determined by the flow tree, which is configured by business personnel.
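The flow-tree routing described above can be sketched as a dictionary of nodes, each carrying its configured response speech and its intention-keyed branches. The node names, speech text, and intentions below are hypothetical placeholders, not taken from the patent's actual configuration:

```python
# Hypothetical flow tree: each node holds the response speech configured for
# that branch and maps recognized intentions (tag information) to next nodes.
flow_tree = {
    "root": {
        "speech": "Hello, this is a reminder about your repayment.",
        "branches": {
            "hospitalized": "hospitalized_node",
            "no_money": "hardship_node",
        },
    },
    "hospitalized_node": {
        "speech": "We are sorry to hear that. Would a deferred plan help?",
        "branches": {},
    },
    "hardship_node": {
        "speech": "We can look at an installment arrangement together.",
        "branches": {},
    },
}

def next_response(current_node, intention):
    """Route to the branch matching the recognized intention; stay on the
    current node when no branch is configured for that intention."""
    node = flow_tree[current_node]
    next_id = node["branches"].get(intention, current_node)
    return next_id, flow_tree[next_id]["speech"]
```

Business personnel would author the real tree in the configuration page of fig. 5; the sketch only shows how a recognized intention selects the next branch and its speech.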
Wherein, the step of determining the portrait label of each voice call object may further comprise: 1. calculating the customer portrait labels at collection time according to historical collection data (i.e., cases where no payment was made by the repayment due date); 2. acquiring the collection results, such as whether AI collection succeeded, whether staged collection succeeded, whether final collection succeeded, and the like; 3. calculating the WOE value of each customer portrait label, with the collection result as y; 4. wherein the WOE calculation formula is: WOE = ln(proportion of good customers / proportion of bad customers) × 100%; 5. screening out the label intervals with high WOE values to form personalized ("thousand people, thousand faces") labels; 6. the business personnel name each portrait label and configure the corresponding script.
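The WOE screening in steps 3-5 can be sketched as below, ignoring the ×100% display scaling. The interval tuple layout and the threshold value are assumptions for illustration; the source does not specify them:

```python
import math

def woe(good_count, bad_count, total_good, total_bad):
    """Weight of evidence for one label interval:
    WOE = ln(proportion of good customers / proportion of bad customers)."""
    good_share = good_count / total_good
    bad_share = bad_count / total_bad
    return math.log(good_share / bad_share)

def high_woe_intervals(intervals, threshold):
    """Screen out the label intervals whose WOE exceeds the threshold to
    form personalized portrait labels.  Each interval is a tuple of
    (name, good_count, bad_count, total_good, total_bad)."""
    return [name for name, g, b, tg, tb in intervals
            if woe(g, b, tg, tb) > threshold]
```

An interval where good customers are over-represented yields a positive WOE; business personnel would then name the surviving intervals and attach scripts to them, as in step 6.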
In an embodiment, before the step S202, the method further includes: acquiring initial call voice from a preset voice database; and synthesizing the preset voice parameters and the initial call voice to obtain the preset call voice.
The initial call voice refers to voice that contains only the text content, with no voice parameters yet synthesized into it.
In a specific implementation, when the voice interaction robot dials a voice call, a preset initial call voice is obtained from the voice database, and preset voice parameters such as tone and volume are synthesized with the initial call voice through a Text-To-Speech (TTS) model to obtain the preset call voice, so that the voice interaction robot can use the preset call voice to interact with the voice call object.
In this embodiment, the initial call voice acquired from the voice database and the preset voice parameter are subjected to voice synthesis processing to obtain the preset call voice, so that the voice interaction robot plays the preset call voice to the voice call object, and the voice call object responds based on the preset call voice, thereby realizing the interaction of the voice call of the voice interaction robot.
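A minimal sketch of combining the preset voice parameters with the initial call voice text follows. The `VoiceParams` fields and the `tts_engine` callable are hypothetical stand-ins for whatever TTS model is actually used; no real TTS library API is assumed:

```python
from dataclasses import dataclass

@dataclass
class VoiceParams:
    tone: str      # e.g. "warm" (preset tone)
    volume: float  # playback volume in [0.0, 1.0]
    speed: float   # speaking-rate multiplier

def synthesize(text, params, tts_engine):
    """Render the initial call voice (text only) with the preset voice
    parameters through an injected TTS engine, yielding the preset call
    voice.  `tts_engine` is any callable with these keyword arguments."""
    return tts_engine(text=text, tone=params.tone,
                      volume=params.volume, speed=params.speed)
```

In the embodiment, the returned audio would then be played to the voice call object to open the interaction.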
It should be understood that although the steps in the flowcharts of figs. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-3 may include multiple sub-steps or stages that are not necessarily completed at the same time but may be performed at different times, and that need not be performed sequentially but may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a call voice acquiring apparatus including: a first answer voice obtaining module 702, a voice text information obtaining module 704, a tag information determining module 706, and a second answer voice determining module 708, wherein:
a first answer voice obtaining module 702, configured to obtain a first answer voice of a voice call object based on a preset call voice in a voice interaction process; presetting a corresponding relation between a call voice and a voice call object;
a voice text information obtaining module 704, configured to obtain voice text information in the first response voice;
a tag information determining module 706, configured to determine tag information of the voice call object according to the voice text information;
the second responding voice determining module 708 is configured to determine a second responding voice for the first responding voice and voice parameters of the second responding voice according to the tag information.
In an embodiment, the second answer voice determining module 708 specifically includes:
the portrait tag information acquisition submodule is used for acquiring portrait tag information of a voice call object when the tag information belongs to target tag information;
the recognition result determining submodule is used for determining a truth recognition result of the first response voice according to the portrait label information;
and the response voice acquisition submodule is used for acquiring the response voice corresponding to the truth recognition result and the voice parameters of the response voice, which correspondingly serve as the second response voice for the first response voice and the voice parameters of the second response voice.
In one embodiment, the recognition result determining sub-module is specifically configured to determine a target information identifier corresponding to the portrait label information; acquiring the portrait label grade corresponding to the target information identification as the portrait label grade of the voice call object; and determining a truth recognition result of the first response voice according to the portrait label grade.
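The grade-based truth recognition above can be sketched as a simple threshold mapping. The threshold value and the two result labels are assumptions for illustration; the patent does not define the grade scale:

```python
# Hypothetical mapping: a higher portrait label grade is taken to mean the
# call object's stated reason has historically been less credible.
GRADE_THRESHOLD = 3

def truth_recognition(label_grade):
    """Return the truth recognition result of the first response voice
    based on the voice call object's portrait label grade."""
    return "not_credible" if label_grade >= GRADE_THRESHOLD else "credible"
```

The result then selects which response voice and voice parameters to use, as described for the response voice acquisition submodule.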
In one embodiment, the portrait tag information obtaining sub-module is specifically configured to obtain historical call data and historical credit data of a voice call object; obtaining label values corresponding to a plurality of preset labels according to the historical call data and the historical credit data; and according to the weight coefficient corresponding to each preset label, carrying out weighting processing on the label value corresponding to each preset label to obtain a target label value which is used as portrait label information of the voice call object.
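The weighting step above amounts to a weighted sum over the preset labels. The label names and coefficient values below are hypothetical; in the embodiment they would be derived from historical call data and historical credit data:

```python
def portrait_tag_value(tag_values, weights):
    """Weight each preset label's value by its coefficient and sum them
    into the target label value used as portrait tag information."""
    return sum(tag_values[name] * weights[name] for name in tag_values)
```

For example, with label values {overdue: 0.5, income: 0.2} and weights {overdue: 0.7, income: 0.3}, the target label value is 0.5 × 0.7 + 0.2 × 0.3 = 0.41.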
In one embodiment, the speech parameters of the second response speech include: response tone, response intonation, and response volume; the above-mentioned device still includes:
the voice synthesis module is used for performing voice synthesis on the response tone, the response intonation, the response volume and the second response voice based on a pre-trained voice synthesis model to obtain the target response voice;
and the response voice sending module is used for sending the target response voice to the voice interaction robot so that the voice interaction robot can perform voice interaction with the voice call object by adopting the target response voice.
In an embodiment, the tag information determining module 706 is specifically configured to input the voice text information into a pre-trained semantic recognition model to obtain tag information of the voice call object; the pre-trained semantic recognition model is used for performing semantic recognition on the voice text information to obtain the label information of the voice call object.
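As a toy stand-in for the pre-trained semantic recognition model, tag recognition over the ASR text can be illustrated with keyword matching. The keyword table and tag names are assumptions purely for illustration; the patent's model is a trained semantic classifier, not a keyword lookup:

```python
# Hypothetical keyword table standing in for the semantic recognition model.
INTENT_KEYWORDS = {
    "hospitalized": ["hospital", "sick"],
    "no_money": ["no money", "cannot pay"],
}

def recognize_tag(voice_text):
    """Map the voice text information of the first response voice to the
    tag information (intention) of the voice call object."""
    text = voice_text.lower()
    for tag, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return tag
    return "unknown"
```

The real module would instead feed the text to the pre-trained model and read off its predicted label.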
In one embodiment, the above apparatus further comprises:
the initial call voice acquisition module is used for acquiring initial call voice from a preset voice database;
and the preset call voice determining module is used for synthesizing the preset voice parameters and the initial call voice to obtain the preset call voice.
It should be noted that the call voice acquiring apparatus of the present application corresponds one-to-one to the call voice acquiring method of the present application; the technical features and advantages described in the embodiments of the call voice acquiring method all apply to the embodiments of the call voice acquiring apparatus. For specific contents, refer to the description in the embodiments of the call voice acquiring method, which is not repeated here.
In addition, all or part of the modules in the call voice acquiring apparatus can be implemented by software, hardware, or a combination thereof. The modules can be embedded in or independent of a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a call voice acquisition method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that, for a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A call voice acquisition method is characterized by comprising the following steps:
acquiring a first response voice of a voice call object based on a preset call voice in a voice interaction process; the preset call voice corresponds to the voice call object;
acquiring voice text information in the first response voice;
determining the label information of the voice call object according to the voice text information;
and determining a second response voice aiming at the first response voice and voice parameters of the second response voice according to the label information.
2. The method according to claim 1, wherein the determining a second response voice for the first response voice and voice parameters of the second response voice according to the tag information comprises:
when the tag information belongs to target tag information, obtaining portrait tag information of the voice call object;
determining a truth degree identification result of the first response voice according to the portrait label information;
and acquiring the response voice corresponding to the truth recognition result and the voice parameters of the response voice, which correspondingly serve as the second response voice for the first response voice and the voice parameters of the second response voice.
3. The method of claim 2, wherein the determining the truth recognition result of the first response voice according to the portrait label information comprises:
determining a target information identifier corresponding to the portrait label information;
acquiring the portrait label grade corresponding to the target information identification as the portrait label grade of the voice call object;
and determining a truth recognition result of the first response voice according to the portrait label grade.
4. The method of claim 2, wherein the obtaining of portrait tag information of the voice call object comprises:
acquiring historical call data and historical credit data of the voice call object;
obtaining label values corresponding to a plurality of preset labels according to the historical call data and the historical credit data;
and according to the weight coefficient corresponding to each preset label, carrying out weighting processing on the label value corresponding to each preset label to obtain a target label value which is used as portrait label information of the voice call object.
5. The method according to claim 1, wherein the speech parameters of the second response speech include: response tone, response intonation, and response volume;
after determining a second response voice for the first response voice and voice parameters of the second response voice according to the tag information, the method further includes:
performing voice synthesis on the response tone, the response intonation, the response volume and the second response voice based on a pre-trained voice synthesis model to obtain a target response voice;
and sending the target response voice to a voice interaction robot so that the voice interaction robot can perform voice interaction with the voice call object by adopting the target response voice.
6. The method according to claim 1, wherein the determining tag information of the voice call object according to the voice text information comprises:
inputting the voice text information into a pre-trained semantic recognition model to obtain the label information of the voice call object; and the pre-trained semantic recognition model is used for performing semantic recognition on the voice text information to obtain the label information of the voice call object.
7. The method according to any one of claims 1 to 6, before obtaining the first answer voice of the voice call object based on the preset call voice in the voice interaction process, further comprising:
acquiring initial call voice from a preset voice database;
and synthesizing the preset voice parameters and the initial call voice to obtain the preset call voice.
8. A call voice acquiring apparatus, characterized in that the apparatus comprises:
the first answer voice acquisition module is used for acquiring a first answer voice of a voice call object based on a preset call voice in the voice interaction process; the preset call voice corresponds to the voice call object;
the voice text information acquisition module is used for acquiring voice text information in the first response voice;
the tag information determining module is used for determining the tag information of the voice call object according to the voice text information;
and the second response voice determining module is used for determining a second response voice aiming at the first response voice and a voice parameter of the second response voice according to the label information.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010673633.4A 2020-07-14 Call voice acquisition method, device, computer equipment and storage medium Active CN111968632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010673633.4A CN111968632B (en) 2020-07-14 Call voice acquisition method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010673633.4A CN111968632B (en) 2020-07-14 Call voice acquisition method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111968632A true CN111968632A (en) 2020-11-20
CN111968632B CN111968632B (en) 2024-05-10


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129895A (en) * 2021-04-20 2021-07-16 上海仙剑文化传媒股份有限公司 Voice detection processing system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6249761B1 (en) * 1997-09-30 2001-06-19 At&T Corp. Assigning and processing states and arcs of a speech recognition model in parallel processors
US20040093218A1 (en) * 2002-11-12 2004-05-13 Bezar David B. Speaker intent analysis system
JP2004163541A (en) * 2002-11-11 2004-06-10 Mitsubishi Electric Corp Voice response device
JP2008039928A (en) * 2006-08-02 2008-02-21 Xanavi Informatics Corp Speech interactive apparatus and speech interactive program
CN103295585A (en) * 2012-02-24 2013-09-11 北京英立讯科技有限公司 Processing system and method identifying whether telephone automatic dialing is responded by real human
US20170125008A1 (en) * 2014-04-17 2017-05-04 Softbank Robotics Europe Methods and systems of handling a dialog with a robot
JP2017107078A (en) * 2015-12-10 2017-06-15 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Voice interactive method, voice interactive device, and voice interactive program
CN107564510A (en) * 2017-08-23 2018-01-09 百度在线网络技术(北京)有限公司 A kind of voice virtual role management method, device, server and storage medium
CN109461073A (en) * 2018-12-14 2019-03-12 深圳壹账通智能科技有限公司 Risk management method, device, computer equipment and the storage medium of intelligent recognition
US20190130900A1 (en) * 2017-11-01 2019-05-02 Institute For Information Industry Voice interactive device and voice interactive method using the same
CN110211564A (en) * 2019-05-29 2019-09-06 泰康保险集团股份有限公司 Phoneme synthesizing method and device, electronic equipment and computer-readable medium
CN110534088A (en) * 2019-09-25 2019-12-03 招商局金融科技有限公司 Phoneme synthesizing method, electronic device and storage medium
CN110782335A (en) * 2019-09-19 2020-02-11 平安科技(深圳)有限公司 Method, device and storage medium for processing credit data based on artificial intelligence
CN111161725A (en) * 2019-12-17 2020-05-15 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium


Similar Documents

Publication Publication Date Title
CN107481720B (en) Explicit voiceprint recognition method and device
CN109514586B (en) Method and system for realizing intelligent customer service robot
WO2022095380A1 (en) Ai-based virtual interaction model generation method and apparatus, computer device and storage medium
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
CN106776936A (en) intelligent interactive method and system
CN107733722A (en) Method and apparatus for configuring voice service
CN112233698A (en) Character emotion recognition method and device, terminal device and storage medium
CN114974253A (en) Natural language interpretation method and device based on character image and storage medium
CN116049411B (en) Information matching method, device, equipment and readable storage medium
CN114023309A (en) Speech recognition system, related method, device and equipment
CN111354374A (en) Voice processing method, model training method and electronic equipment
CN111968632A (en) Call voice acquisition method and device, computer equipment and storage medium
CN116414959A (en) Digital person interaction control method and device, electronic equipment and storage medium
CN110047473A (en) A kind of man-machine collaboration exchange method and system
US7908143B2 (en) Dialog call-flow optimization
CN111968632B (en) Call voice acquisition method, device, computer equipment and storage medium
CN112071331B (en) Voice file restoration method and device, computer equipment and storage medium
CN114842880A (en) Intelligent customer service voice rhythm adjusting method, device, equipment and storage medium
CN108446403A (en) Language exercise method, apparatus, intelligent vehicle mounted terminal and storage medium
CN113905135A (en) User intention identification method and device of intelligent outbound robot
CN112836036A (en) Interactive training method, device, terminal and storage medium for intelligent agent
KR20200122916A (en) Dialogue system and method for controlling the same
CN116741143B (en) Digital-body-based personalized AI business card interaction method and related components
WO2023119521A1 (en) Visualization information generation device, visualization information generation method, and program
CN115022395B (en) Service video pushing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Zhaolian Consumer Finance Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: MERCHANTS UNION CONSUMER FINANCE Co.,Ltd.

Country or region before: China

CB02 Change of applicant information
GR01 Patent grant