CN108156317B - Call voice control method and device, storage medium and mobile terminal - Google Patents

Publication number
CN108156317B
Authority
CN
China
Prior art keywords
call
voice
feedback model
user
contact
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711393200.8A
Other languages
Chinese (zh)
Other versions
CN108156317A (en
Inventor
陈岩
刘耀勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201711393200.8A priority Critical patent/CN108156317B/en
Publication of CN108156317A publication Critical patent/CN108156317A/en
Application granted granted Critical
Publication of CN108156317B publication Critical patent/CN108156317B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065Adaptation
    • G10L15/07Adaptation to the speaker
    • G10L15/075Adaptation to the speaker supervised, i.e. under machine guidance
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72484User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the application discloses a call voice control method and device, a storage medium, and a mobile terminal. The method comprises: when the current mobile terminal is detected to be in a call mode, acquiring the contact type of the current call contact; acquiring a preset feedback model generated based on a machine learning method; inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model; and adjusting the call voice of the current mobile terminal user according to the target call sound characteristics, and sending the adjusted call voice to the terminal where the current call contact is located. With this technical scheme, the user's call voice can be adjusted in time according to the type of the call contact, so that, whatever the call voice uttered by the user, the call voice sent to the call contact's terminal has call sound characteristics matched to the current call contact, which also makes voice calls more interesting.

Description

Call voice control method and device, storage medium and mobile terminal
Technical Field
The embodiment of the application relates to the technical field of call control, in particular to a call voice control method, a call voice control device, a storage medium and a mobile terminal.
Background
Mobile terminals such as mobile phones have more and more functions and provide convenience for people's life and work. The voice call function is a basic function of a mobile phone: people use it to make and receive calls and to send voice messages. However, the call voice control methods used in the related art during voice calls on mobile phones have shortcomings and need to be improved.
Disclosure of Invention
The embodiment of the application provides a call voice control method, a call voice control device, a storage medium and a mobile terminal, which can optimize a call voice control scheme.
In a first aspect, an embodiment of the present application provides a call voice control method, including:
when the current mobile terminal is detected to be in a call mode, acquiring the contact type of the current call contact;
acquiring a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics, and is used for feeding back, based on the call contact type, the call sound characteristics of the user when talking with the call contact;
inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model;
and adjusting the call voice of the current mobile terminal user according to the target call sound characteristics, and sending the adjusted call voice to the terminal where the current call contact is located.
In a second aspect, an embodiment of the present application provides a call voice control apparatus, including:
the contact type acquisition module is used for acquiring the contact type of the current call contact when the current mobile terminal is detected to be in a call mode;
the preset feedback model acquisition module is used for acquiring a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics, and is used for feeding back, based on the call contact type, the call sound characteristics of the user when talking with the call contact;
the target call sound characteristic acquisition module is used for inputting the contact type into the preset feedback model and acquiring the target call sound characteristic output by the preset feedback model;
and the call voice adjusting module is used for adjusting the call voice of the current mobile terminal user according to the target call sound characteristics and sending the adjusted call voice to the terminal where the current call contact is located.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the call voice control method as provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a mobile terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the call voice control method provided in the first aspect.
In the embodiments of the application, a preset feedback model for determining the user call sound characteristics suitable for a call contact is generated in advance on the mobile terminal or a server. When the mobile terminal is in a call mode, the contact type of the current call contact is acquired and input into the preset feedback model to obtain the target call sound characteristics of the current mobile terminal user that suit the current call contact; the user's call voice is adjusted according to the target call sound characteristics, and the adjusted call voice is sent to the terminal where the current call contact is located. The user's call voice is thus adjusted in time according to the call contact type, so that, whatever the call voice uttered by the user, the call voice sent to the call contact's terminal has call sound characteristics matched to the current call contact, which also makes voice calls more interesting.
Drawings
Fig. 1 is a flowchart of a call voice control method according to an embodiment of the present application;
Fig. 2 is a flowchart of another call voice control method provided in the embodiment of the present application;
Fig. 3 is a schematic structural diagram of a call voice control device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another mobile terminal according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Fig. 1 is a flowchart of a call voice control method according to an embodiment of the present application, where the method according to this embodiment may be executed by a call voice control device, the call voice control device may be implemented by hardware and/or software, and the call voice control device may be disposed inside a mobile terminal as a part of the mobile terminal. The mobile terminal in this embodiment includes, but is not limited to, a device with a call function, such as a smart phone, a tablet computer, or a notebook computer.
As shown in fig. 1, the call voice control method provided in this embodiment includes the following steps:
Step 101, when detecting that the current mobile terminal is in a call mode, acquiring the contact type of the current call contact.
The call mode described in this embodiment includes a phone call mode, a third party voice call software call (for example, a video/voice call such as WeChat, QQ, etc., or sending out a WeChat voice message) mode, or other call modes.
Assume that the user of the current mobile terminal is user A and that user A is talking with a call contact B. The relationship between contact B and user A can be of many kinds and is identified by a contact type, where the contact type may include colleague, leader, parent, relative, friend, client, lover, or salesperson.
Step 102, acquiring a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics, and is used for feeding back, based on the call contact type, the call sound characteristics of the user when talking with the call contact.
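As an illustration only, the contact types listed above could be turned into a numeric model input with a one-hot encoding. The sketch below is an assumption, not something the application specifies: the name encode_contact_type, the ordering of the list, and the choice of one-hot encoding are all hypothetical.

```python
from typing import List

# Contact types named in this embodiment; the ordering is an arbitrary assumption.
CONTACT_TYPES = [
    "colleague", "leader", "parent", "relative",
    "friend", "client", "lover", "salesperson",
]


def encode_contact_type(contact_type: str) -> List[float]:
    """Return a one-hot vector with 1.0 at the position of the given type."""
    if contact_type not in CONTACT_TYPES:
        raise ValueError(f"unknown contact type: {contact_type!r}")
    return [1.0 if t == contact_type else 0.0 for t in CONTACT_TYPES]
```

A one-hot vector keeps the types unordered, which seems appropriate here because "leader" is not numerically greater or smaller than "friend".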
The training, generation, and updating of the preset feedback model based on the machine learning method can be carried out locally on the mobile terminal or on a preset server. When generation or updating of the preset feedback model is finished, the model can be sent directly to the mobile terminal for storage, or the preset server can store it and wait for the mobile terminal to fetch it. Accordingly, step 102 may include: acquiring the preset feedback model generated based on the machine learning method from the preset server or from local storage on the mobile terminal. The machine learning method may include a neural network method, a support vector machine method, a decision tree method, a logistic regression method, a Bayesian method, or a random forest method.
In this embodiment, the source and the number of the call information samples with known user call sound characteristics are not specifically limited. For example, the training samples may be historical call information of the mobile terminal user, or historical call information of a target user group, where the target user group may be a plurality of users having the same user attributes as the mobile terminal user, and the user attributes include age, gender, hobbies, occupation, and usual speaking sound characteristics. It will be appreciated that, for models based on machine learning, a larger number of samples generally makes the model's output more accurate.
The call information comprises the contact type of the call contact and the call sound characteristics the user exhibits when talking with that call contact. Generally, a user speaks differently to call contacts of different contact types: for example, the voice is generally more formal with a leader or a client, more casual with a parent, a relative, or a friend, and firmer with a salesperson. In another application scenario, when the mobile terminal user is a professional with high requirements on call voice, such as a customer service operator or a salesperson, the user's call sound characteristics may differ significantly between calls where the contact type is client and calls where it is another type. The call information comprises a plurality of call records between users and their call contacts, and each call record is labeled with the call contact type and the user call sound characteristics. The user call sound characteristics can be extracted from the waveform of the user's call voice data during the call.
The call contact type in the historical call information is taken as the input of the preset feedback model, and the user call sound characteristics in the historical call information are taken as its output; the historical call information samples are used for training, and the preset feedback model is generated. Subsequently, when the mobile terminal is in a call mode, inputting the contact type of the current call contact into the preset feedback model causes it to output the predicted call sound characteristics of the user for a call with the current call contact.
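The input/output relationship described above can be sketched with a deliberately simple stand-in for the learned model: a table of per-contact-type averages. This is not one of the machine learning methods listed in this embodiment; it only illustrates the shape of the training data (contact type in, labeled sound characteristics out). The feature names pitch_hz and loudness_db and the function name train_average_model are hypothetical, and the sketch assumes every sample of a given type carries the same feature names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One labeled call record: (contact type, measured user call sound characteristics).
Sample = Tuple[str, Dict[str, float]]


def train_average_model(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Average each labeled characteristic per contact type."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for contact_type, features in samples:
        counts[contact_type] += 1
        for name, value in features.items():
            sums[contact_type][name] += value
    return {
        contact_type: {name: total / counts[contact_type]
                       for name, total in feature_sums.items()}
        for contact_type, feature_sums in sums.items()
    }


history = [
    ("client", {"pitch_hz": 200.0, "loudness_db": 60.0}),
    ("client", {"pitch_hz": 220.0, "loudness_db": 64.0}),
    ("friend", {"pitch_hz": 180.0, "loudness_db": 70.0}),
]
model = train_average_model(history)
```

Looking up model["client"] then plays the role of "inputting the contact type into the model" and returns the averaged characteristics for that type.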
The call sound characteristics include at least one of timbre, pitch, loudness, tone, speech rate, and speaking style. The call sound characteristics may be determined from the waveform shape, vibration frequency, and vibration amplitude of the call voice data waveform.
There may be multiple preset feedback models: some for feeding back timbre characteristics, some for loudness characteristics, some for tone and speech rate characteristics, and some for speaking style characteristics. Alternatively, the timbre, loudness, tone, speech rate, and speaking style characteristics can all be obtained at once from a single preset feedback model.
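As a rough sketch of how two of these characteristics might be read off a waveform, the code below estimates loudness from the amplitude (root-mean-square value) and pitch from the vibration frequency (rate of rising zero crossings). Both estimators are simplifying assumptions chosen for brevity; the application does not say which analysis is used, and zero-crossing counting is only reliable for clean, roughly periodic signals.

```python
import math
from typing import Sequence


def rms_amplitude(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_frequency(samples: Sequence[float], sample_rate: int) -> float:
    """Estimate the fundamental frequency in Hz from rising zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration_seconds = len(samples) / sample_rate
    return crossings / duration_seconds


# One second of a synthetic 100 Hz tone with amplitude 0.5, sampled at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
```

For the synthetic tone, rms_amplitude should come out close to 0.5 divided by the square root of 2, and estimate_frequency close to 100 Hz.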
Step 103, inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model.
Inputting the contact type of the current call contact into the preset feedback model to obtain a target call sound characteristic output by the preset feedback model, wherein the target call sound characteristic is a sound characteristic which a mobile terminal user should have when talking with the current call contact.
Step 104, adjusting the call voice of the current mobile terminal user according to the target call sound characteristics, and sending the adjusted call voice to the terminal where the current call contact is located.
During the call, the microphone of the mobile terminal acquires the call voice data uttered by the user in real time. Before the call voice data is sent to the call contact, the call voice is adjusted and modified based on the target call sound characteristics, and the adjusted call voice is sent to the terminal where the current call contact is located.
For example, suppose the mobile terminal user is a customer service operator and the current call contact is a client. The customer service operator may at times have a cold, so that the operator's call voice inevitably falls short of the usual professional requirements. If the call voice control switch of the mobile terminal is turned on, the mobile terminal can automatically adjust and modify the operator's call voice according to the obtained call sound characteristics suited to a client contact, so that the traces of illness in the operator's voice are removed or masked. The mobile terminal's control of the call voice thus better meets the user's needs, and voice calls also become more interesting.
In the call voice control method provided by this embodiment, a preset feedback model for determining the user call sound characteristics suitable for a call contact is generated in advance on the mobile terminal or a server. When the mobile terminal is in a call mode, the contact type of the current call contact is acquired and input into the preset feedback model to obtain the target call sound characteristics of the current mobile terminal user that suit the current call contact; the user's call voice is adjusted according to the target call sound characteristics, and the adjusted call voice is sent to the terminal where the current call contact is located. The user's call voice is thus adjusted in time according to the call contact type, so that, whatever the call voice uttered by the user, the call voice sent to the call contact's terminal has call sound characteristics matched to the current call contact, which also makes voice calls more interesting.
Taking the neural network method as an example of the machine learning method, a call voice control method that uses a preset feedback model generated by the neural network method is briefly described below. Fig. 2 is a flowchart of another call voice control method according to an embodiment of the present application. As shown in fig. 2, the method provided by this embodiment includes the following steps:
Step 201, acquiring historical call information of the mobile terminal user from local storage on the mobile terminal, or acquiring historical call information of a target user group from a preset server, as historical call information samples.
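The flow of steps 101 to 104 can be summarized in a few lines. The sketch below is only one way to wire the steps together; control_call_voice and its parameters are hypothetical names, the feedback model and the adjustment routine are passed in as plain callables, and actual transmission to the contact's terminal is left to the caller.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def control_call_voice(
    in_call_mode: bool,
    contact_type: str,
    feedback_model: Callable[[str], Features],
    voice_frame: List[float],
    adjust: Callable[[List[float], Features], List[float]],
) -> List[float]:
    """Return the voice frame that should be sent to the contact's terminal."""
    if not in_call_mode:
        # Step 101 precondition not met: pass the voice through unchanged.
        return voice_frame
    # Steps 102 and 103: query the preset feedback model with the contact type.
    target_features = feedback_model(contact_type)
    # Step 104: adjust the user's voice toward the target characteristics.
    return adjust(voice_frame, target_features)


# Toy stand-ins (assumptions): the "model" returns a gain, "adjust" applies it.
toy_model = lambda ct: {"gain": 2.0} if ct == "client" else {"gain": 1.0}
toy_adjust = lambda frame, target: [s * target["gain"] for s in frame]
```

Keeping the model and the adjustment as injected callables mirrors the module split described later for the device (model acquisition versus voice adjustment).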
Step 202, training the historical call information sample by using a neural network method to generate a preset feedback model.
This step may include training on the historical call information samples with a deep autoencoder to generate the preset feedback model.
Step 203, acquiring the contact type of the current call contact when the current mobile terminal is detected to be in the call mode.
Step 204, inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model.
Step 205, adjusting the call voice of the current mobile terminal user according to the target call voice characteristics, and sending the adjusted call voice to the terminal where the current call contact person is located.
On the basis of the above technical solution, the neural network includes an input layer, a hidden layer, and an output layer. Step 202 may include: inputting the contact type of each call contact in the historical call information into the input layer, and outputting intermediate user call sound characteristics through calculation of the activation function corresponding to each node of the hidden layer; and repeatedly correcting the weights in the activation functions, using an optimization algorithm and the difference between the intermediate user call sound characteristics and the user call sound characteristics recorded for each call contact in the historical call information, until that difference is within a preset range, thereby obtaining the trained activation function of each node and generating the preset feedback model.
A neural network (NN) here refers to an artificial neural network, an information-processing system inspired by the biological neural networks of the human brain. It includes an input layer, a hidden layer, and an output layer, and accordingly three kinds of nodes (the basic units of the neural network): input nodes, hidden nodes, and output nodes. The input nodes acquire information from the outside world; the hidden nodes have no direct connection to the outside world, perform calculations using the activation function, and pass information from the input nodes to the output nodes; the output nodes deliver information to the outside world.
The activation function provides nonlinear modeling capability for the neural network and is generally a nonlinear function. The activation function may include a ReLU function, a sigmoid function, a tanh function, or a maxout function.
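A minimal sketch of this training loop is given below, assuming a single hidden layer with tanh activations, a linear output layer, a squared-error measure of the "difference", and plain per-sample gradient descent. None of these choices is fixed by the application; the function name train_feedback_network, the hyperparameter values, and the normalized toy targets are assumptions for illustration.

```python
import math
import random
from typing import Callable, List


def train_feedback_network(
    inputs: List[List[float]],
    targets: List[List[float]],
    hidden: int = 4,
    lr: float = 0.1,
    tolerance: float = 0.01,
    max_epochs: int = 5000,
    seed: int = 0,
) -> Callable[[List[float]], List[float]]:
    """Fit encoded contact types to sound characteristics; return a predictor."""
    rng = random.Random(seed)
    n_in, n_out = len(inputs[0]), len(targets[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out

    def forward(x):
        # Hidden layer: weighted sum followed by the tanh activation.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        # Output layer: the intermediate user call sound characteristics.
        y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]
        return h, y

    for _ in range(max_epochs):
        worst = 0.0
        for x, t in zip(inputs, targets):
            h, y = forward(x)
            err = [yi - ti for yi, ti in zip(y, t)]
            worst = max(worst, max(abs(e) for e in err))
            # Back-propagate through the output weights and the tanh derivative.
            grad_h = [sum(err[k] * w2[k][j] for k in range(n_out)) * (1 - h[j] ** 2)
                      for j in range(hidden)]
            for k in range(n_out):
                for j in range(hidden):
                    w2[k][j] -= lr * err[k] * h[j]
                b2[k] -= lr * err[k]
            for j in range(hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h[j] * x[i]
                b1[j] -= lr * grad_h[j]
        if worst < tolerance:
            break  # the difference is within the preset range
    return lambda x: forward(x)[1]


# Three one-hot contact types mapped to two normalized characteristics (toy data).
toy_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_targets = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.3]]
predict = train_feedback_network(toy_inputs, toy_targets)
```

The loop stops as soon as the largest per-sample difference seen in an epoch falls below the tolerance, which corresponds to the "within a preset range" stopping condition in the text.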
Sigmoid is a commonly used nonlinear activation function, and its mathematical form is as follows:
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
Its output is a value between 0 and 1. The tanh function is very similar to sigmoid and is in fact a variant of it: $\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1$; unlike sigmoid, tanh has zero mean. In recent years ReLU has become more and more popular. Its mathematical expression is $f(x) = \max(0, x)$: when the input signal is less than 0, the output is 0, and when the input signal is greater than 0, the output equals the input. The expression of the maxout function is $f_i(x) = \max_{j \in [1, k]} z_{ij}$. Assuming that the input nodes include $x_1$ and $x_2$, with corresponding weights $w_1$ and $w_2$, together with a further weight $b$, the output node is $Y = f(w_1 x_1 + w_2 x_2 + b)$, where $f$ is the activation function. In addition, there is usually one input layer and one output layer, while the hidden layer may consist of multiple layers.
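The activation functions and the two-input node described above can be written out directly. The sketch below uses only the Python standard library; the function names and the example weights are illustrative assumptions.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid; the output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed as a rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x: float) -> float:
    """Outputs 0 for negative input and the input itself otherwise."""
    return max(0.0, x)


def maxout(z: Sequence[float]) -> float:
    """Maximum over the k pre-activations z_i1 .. z_ik of one unit."""
    return max(z)


def node_output(x1: float, x2: float, w1: float, w2: float, b: float,
                f: Callable[[float], float]) -> float:
    """Y = f(w1*x1 + w2*x2 + b) for a node with two inputs."""
    return f(w1 * x1 + w2 * x2 + b)
```

For instance, node_output(1.0, 2.0, 0.5, -0.25, 0.0, relu) evaluates relu(0.5 - 0.5 + 0.0).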
The optimization algorithm includes a Stochastic Gradient Descent (SGD) algorithm, an adaptive moment estimation (adam) algorithm, or a Momentum algorithm.
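For orientation, the three optimization algorithms named above differ only in how a gradient is turned into a weight update. The sketch below shows one update step of each for a single scalar weight, in their commonly published forms; the default hyperparameter values and the toy objective (w - 3)^2 are assumptions for illustration.

```python
import math


def sgd_step(w, grad, lr=0.1):
    """Plain gradient descent: step against the gradient."""
    return w - lr * grad


def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Momentum: accumulate a decaying sum of past gradients."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def quadratic_grad(w):
    """Gradient of the toy objective (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)
```

Each step function is applied repeatedly, feeding the returned state (velocity, or m and v) back in on the next call.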
On the basis of the above technical solution, adjusting the call voice of the user according to the target call sound feature, and sending the adjusted call voice to the terminal where the call contact person is located may include: generating an adjusting waveform according to the target call sound characteristic; synthesizing the adjusting waveform and a call voice waveform of the user acquired in real time to generate adjusted call voice data; and sending the adjusted call voice data to a terminal where a call contact person is located.
Synthesizing the adjustment waveform and the call voice waveform of the user acquired in real time may include: synthesizing the adjustment waveform and the call voice waveform of the user acquired in real time by using the pitch-synchronous overlap-add (PSOLA) method.
On the basis of the above technical solution, the method further includes: acquiring unit call voice segments in real time according to a set acquisition rule. Correspondingly, synthesizing the adjustment waveform and the call voice waveform acquired in real time to generate adjusted call voice data may include: synthesizing the adjustment waveform and the waveform of the unit call voice segment to generate adjusted call voice sub-data; and sending the adjusted call voice data to the terminal where the call contact is located may include: sending the adjusted call voice sub-data to the terminal where the call contact is located.
Optionally, the set acquisition rule may be to acquire one unit call voice segment every set duration, or to acquire one unit call voice segment each time the end of an utterance is detected; specifically, the end of an utterance may be considered detected when the pause time reaches a set time.
In the call voice control method provided by this embodiment, the preset feedback model is generated by using a neural network. When the mobile terminal is in a call mode, the contact type of the current call contact is acquired and input into the preset feedback model to obtain the target call sound characteristics of the current mobile terminal user that suit the current call contact; the user's call voice is adjusted according to the target call sound characteristics, and the adjusted call voice is sent to the terminal where the current call contact is located. The user's call voice is thus adjusted in time according to the call contact type, so that, whatever the call voice uttered by the user, the call voice sent to the call contact's terminal has call sound characteristics matched to the current call contact, which also makes voice calls more interesting.
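A full PSOLA implementation is beyond a short example, so the sketch below substitutes a much simpler technique and handles loudness only: it derives a gain from the target characteristic (standing in for the "adjustment waveform") and applies it to the captured frame (standing in for "synthesis"). It is not PSOLA and does not change pitch or timbre; adjust_loudness and target_rms are hypothetical names, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import List, Sequence


def adjust_loudness(frame: Sequence[float], target_rms: float) -> List[float]:
    """Scale a frame so its RMS amplitude approaches target_rms."""
    current_rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if current_rms == 0.0:
        return list(frame)  # silence stays silence
    gain = target_rms / current_rms
    # Clip to the valid sample range; clipping lowers the resulting RMS slightly.
    return [max(-1.0, min(1.0, s * gain)) for s in frame]


captured = [0.1, -0.1, 0.2, -0.2]
adjusted = adjust_loudness(captured, target_rms=0.3)
```

The adjusted frame would then be handed to whatever component transmits call audio to the contact's terminal.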
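Both acquisition rules can be sketched as simple splitters over a buffer of samples. The amplitude threshold used to decide what counts as a pause, and the names split_fixed and split_on_pauses, are assumptions; the application only states that an utterance ends when the pause time reaches a set time.

```python
from typing import List, Sequence


def split_fixed(samples: Sequence[float], sample_rate: int,
                seconds: float) -> List[List[float]]:
    """Rule 1: cut one unit segment every fixed duration."""
    size = max(1, round(seconds * sample_rate))
    return [list(samples[i:i + size]) for i in range(0, len(samples), size)]


def split_on_pauses(samples: Sequence[float], sample_rate: int,
                    pause_seconds: float,
                    silence_threshold: float) -> List[List[float]]:
    """Rule 2: cut a segment once a pause has lasted pause_seconds."""
    min_silent = max(1, round(pause_seconds * sample_rate))
    segments: List[List[float]] = []
    current: List[float] = []
    silent_run = 0  # number of consecutive quiet samples at the end of current
    for s in samples:
        current.append(s)
        silent_run = silent_run + 1 if abs(s) < silence_threshold else 0
        # Emit only if the buffer holds some speech before the pause.
        if silent_run >= min_silent and len(current) > silent_run:
            segments.append(current)
            current, silent_run = [], 0
    if len(current) > silent_run:
        segments.append(current)  # trailing speech without a closing pause
    return segments


speech = [0.5, 0.4, 0.0, 0.0, 0.0, 0.6, 0.7, 0.0, 0.0, 0.0, 0.0]
utterances = split_on_pauses(speech, sample_rate=10,
                             pause_seconds=0.3, silence_threshold=0.1)
```

With the toy buffer above, two segments are produced, each ending with the three quiet samples that triggered the cut.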
Fig. 3 is a schematic structural diagram of a call voice control device according to an embodiment of the present application, where the call voice control device may be implemented by software and/or hardware and integrated in a mobile terminal. As shown in fig. 3, the apparatus includes a contact type obtaining module 31, a preset feedback model obtaining module 32, a target call sound characteristic obtaining module 33, and a call voice adjusting module 34.
The contact type obtaining module 31 is configured to obtain a contact type of a current call contact when it is detected that the current mobile terminal is in a call mode;
the preset feedback model obtaining module 32 is configured to obtain a preset feedback model generated based on a machine learning method, where the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics and is used to feed back, based on the call contact type, the call sound characteristics of the user when talking with the call contact;
the target call sound characteristic obtaining module 33 is configured to input the contact type into the preset feedback model, and obtain a target call sound characteristic output by the preset feedback model;
the call voice adjusting module 34 is configured to adjust the call voice of the current mobile terminal user according to the target call sound characteristics, and send the adjusted call voice to the terminal where the current call contact is located.
The device provided by this embodiment adjusts the user's call voice in time according to the call contact type, so that, whatever the call voice uttered by the user, the call voice sent to the call contact's terminal has call sound characteristics matched to the current call contact, which also makes voice calls more interesting.
Optionally, the call sound characteristics include at least one of timbre, pitch, loudness, tone, speech rate, and speaking style.
Optionally, the contact type includes colleague, leader, parent, relative, friend, client, lover, or salesperson.
Optionally, the apparatus further comprises:
a sample acquisition module, configured to acquire historical call information of the mobile terminal user from the local mobile terminal, or to acquire historical call information of a target user group from a preset server, as historical call information samples;
and a preset feedback model generation module, configured to train on the historical call information samples using a neural network method to generate the preset feedback model.
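As a rough illustration of sample acquisition, the sketch below turns historical call records into training samples. The record layout (a contact type paired with one measured loudness value), the per-type averaging, and the fallback from local history to group history are all assumptions; the application does not define the sample format.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Tuple

Record = Tuple[str, float]  # (contact type, measured loudness), hypothetical


def build_samples(local_history: List[Record],
                  group_history: Optional[List[Record]] = None) -> List[Record]:
    """Return one (contact type, mean loudness) sample per contact type.

    Local history is preferred; the target user group's history from the
    preset server is used only when no local history exists.
    """
    records = local_history if local_history else (group_history or [])
    by_type = defaultdict(list)
    for contact_type, loudness in records:
        by_type[contact_type].append(loudness)
    # Sort so the result is deterministic.
    return sorted((t, mean(values)) for t, values in by_type.items())
```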
Optionally, the neural network method includes an input layer, a hidden layer, and an output layer; the preset feedback model generation module is specifically configured to:
inputting the contact type of each call contact in the historical call information into the input layer, and outputting intermediate user call sound characteristics through the calculation of the activation function corresponding to each node of the hidden layer;
and repeatedly correcting the weights in the activation functions, using an optimization algorithm and the difference between the intermediate user call sound characteristics and the user call sound characteristics recorded for each call contact in the historical call information, until that difference is within a preset range, thereby obtaining the trained activation function of each node and generating the preset feedback model.
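The training loop described above can be sketched in plain Python. This is an illustrative assumption, not the application's implementation: the contact type is one-hot encoded, the network has a single tanh hidden layer and one linear output (a normalized loudness value), and plain stochastic gradient descent plays the role of the "optimization algorithm". The names `CONTACT_TYPES`, `one_hot`, and `train` are hypothetical.

```python
import math
import random

CONTACT_TYPES = ["colleague", "leader", "friend"]  # illustrative subset


def one_hot(contact_type):
    """Encode a contact type as the input-layer vector."""
    return [1.0 if t == contact_type else 0.0 for t in CONTACT_TYPES]


def train(samples, hidden=4, lr=0.1, tolerance=0.01, max_epochs=50000, seed=0):
    """Fit contact type -> target feature; return a prediction function."""
    rng = random.Random(seed)
    n_in = len(CONTACT_TYPES)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer: tanh activation at each node; output layer: linear.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(max_epochs):
        worst = 0.0
        for contact_type, target in samples:
            x = one_hot(contact_type)
            h, y = forward(x)          # intermediate (predicted) feature
            err = y - target           # difference from the recorded feature
            worst = max(worst, abs(err))
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
        if worst <= tolerance:         # difference within the preset range
            break

    return lambda contact_type: forward(one_hot(contact_type))[1]
```

The returned function plays the role of the preset feedback model for one feature: given a contact type it returns the learned target value.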
Optionally, the call voice adjusting module includes:
the adjusting waveform generating unit is used for generating an adjusting waveform according to the target call sound characteristic;
the call voice data generation unit is used for synthesizing the adjustment waveform and the call voice waveform of the user acquired in real time to generate adjusted call voice data;
and the call voice data sending unit is used for sending the adjusted call voice data to a terminal where a call contact person is located.
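One possible reading of the adjustment waveform and the synthesis step is sketched below. The application does not say how the two waveforms are combined; this sketch assumes the adjustment waveform is a per-sample gain envelope derived from a target loudness gain, and that synthesis is sample-wise multiplication with clipping to the range [-1.0, 1.0]. Both function names are hypothetical.

```python
from typing import List


def make_adjustment_waveform(target_gain: float, length: int) -> List[float]:
    """Build a constant gain envelope with one value per voice sample."""
    return [target_gain] * length


def synthesize(voice: List[float], adjustment: List[float]) -> List[float]:
    """Combine the voice waveform with the adjustment waveform.

    Samples are assumed to be normalized floats; results are clipped so the
    adjusted voice stays within [-1.0, 1.0].
    """
    if len(voice) != len(adjustment):
        raise ValueError("voice and adjustment waveforms must match in length")
    return [max(-1.0, min(1.0, v * a)) for v, a in zip(voice, adjustment)]
```

Other features such as pitch or speech rate would need resampling or time-stretching rather than a simple gain, which this sketch does not attempt.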
Optionally, the apparatus further comprises:
the unit call voice segment acquisition unit is used for acquiring the unit call voice segments in real time according to a set acquisition rule;
the call voice data generation unit is specifically configured to: synthesizing the adjusting waveform and the unit call voice segment waveform to generate adjusted call voice subdata;
the call voice data sending unit is specifically configured to: and sending the adjusted call voice subdata to a terminal where a call contact person is located.
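Segment-wise processing can be sketched as follows, assuming the "set acquisition rule" is simply a fixed number of samples per unit segment and the adjustment is a constant gain. The functions `unit_segments` and `adjust_stream` and the `send` callback are hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, Iterator, List


def unit_segments(samples: Iterable[float],
                  segment_size: int) -> Iterator[List[float]]:
    """Yield consecutive unit call voice segments as samples arrive."""
    segment: List[float] = []
    for sample in samples:
        segment.append(sample)
        if len(segment) == segment_size:
            yield segment
            segment = []
    if segment:  # flush the final, shorter segment
        yield segment


def adjust_stream(samples: Iterable[float], segment_size: int, gain: float,
                  send: Callable[[List[float]], None]) -> None:
    """Adjust each unit segment and send it without waiting for the rest."""
    for segment in unit_segments(samples, segment_size):
        send([sample * gain for sample in segment])
```

Because each segment is adjusted and sent as soon as it is complete, the added delay is bounded by the segment length rather than by the length of the whole utterance.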
Embodiments of the present application also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a call voice control method, the method including:
when the current mobile terminal is detected to be in a call mode, acquiring the contact type of the current call contact;
obtaining a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics and is used for feeding back, based on the call contact type, the user call sound characteristics used when talking with a call contact;
inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model;
and adjusting the call voice of the current mobile terminal user according to the target call voice characteristics, and sending the adjusted call voice to the terminal where the current call contact person is located.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the Internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the call voice control operations described above, and may also perform related operations in the call voice control method provided in any embodiment of the present application.
An embodiment of the present application provides a mobile terminal in which the call voice control device provided by the embodiments of the present application may be integrated. Fig. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application. The mobile terminal 400 may include a memory 401, a processor 402, and a computer program stored in the memory 401 and executable by the processor 402, wherein the processor 402 implements the call voice control method according to the embodiments of the present application when executing the computer program.
The mobile terminal provided by this embodiment adjusts the user's call voice in time according to the type of the call contact. Whatever voice the user actually produces, the call voice sent to the terminal where the call contact is located matches the current call contact in its call sound characteristics, which also makes the voice call more interesting.
Fig. 5 is a schematic structural diagram of another mobile terminal provided in an embodiment of the present application. As shown in fig. 5, the mobile terminal may include a memory 501 and a Central Processing Unit (CPU) 502 (also called a processor, hereinafter referred to as the CPU). The memory 501 is used for storing executable program code; the processor 502 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 501, so as to perform the following: when the current mobile terminal is detected to be in a call mode, acquiring the contact type of the current call contact; obtaining a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics and is used for feeding back, based on the call contact type, the user call sound characteristics used when talking with a call contact; inputting the contact type into the preset feedback model, and acquiring the target call sound characteristic output by the preset feedback model; and adjusting the call voice of the current mobile terminal user according to the target call sound characteristic, and sending the adjusted call voice to the terminal where the current call contact is located.
The mobile terminal further includes: peripheral interface 503, RF (Radio Frequency) circuitry 505, audio circuitry 506, speakers 511, power management chip 508, input/output (I/O) subsystem 509, touch screen 512, other input/control devices 510, and external port 504, which communicate via one or more communication buses or signal lines 507.
It should be understood that the illustrated mobile terminal 500 is merely one example of a mobile terminal and that the mobile terminal 500 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The following describes in detail the mobile terminal for controlling call voice provided in this embodiment, taking a smartphone as an example.
The memory 501 can be accessed by the CPU 502, the peripheral interface 503, and the like. The memory 501 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The peripheral interface 503 may connect input and output peripherals of the device to the CPU 502 and the memory 501.
An I/O subsystem 509, which I/O subsystem 509 may connect input and output peripherals on the device, such as a touch screen 512 and other input/control devices 510, to the peripheral interface 503. The I/O subsystem 509 may include a display controller 5091 and one or more input controllers 5092 for controlling other input/control devices 510. Where one or more input controllers 5092 receive electrical signals from or send electrical signals to other input/control devices 510, the other input/control devices 510 may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels. It is noted that the input controller 5092 may be connected to any one of: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.
A touch screen 512, which is an input interface and an output interface between the user terminal and the user, displays visual output to the user, which may include graphics, text, icons, video, and the like.
The display controller 5091 in the I/O subsystem 509 receives electrical signals from the touch screen 512 or transmits electrical signals to the touch screen 512. The touch screen 512 detects a contact on the touch screen, and the display controller 5091 converts the detected contact into an interaction with a user interface object displayed on the touch screen 512, thereby implementing human-computer interaction. The user interface object displayed on the touch screen 512 may be an icon for running a game, an icon for connecting to a corresponding network, or the like. It is worth mentioning that the device may also comprise an optical mouse, which is a touch-sensitive surface that does not show visual output, or an extension of the touch-sensitive surface formed by the touch screen.
The RF circuit 505 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side), and to receive and send data between the mobile phone and the wireless network, such as sending and receiving short messages and e-mails. Specifically, the RF circuit 505 receives and transmits RF signals, also referred to as electromagnetic signals; it converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices via the electromagnetic signals. The RF circuit 505 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (coder-decoder) chipset, a Subscriber Identity Module (SIM), and so forth.
The audio circuit 506 is mainly used to receive audio data from the peripheral interface 503, convert the audio data into an electric signal, and transmit the electric signal to the speaker 511.
The speaker 511 is used for restoring the voice signal received by the handset from the wireless network through the RF circuit 505 to sound and playing the sound to the user.
The power management chip 508 is used to supply power to, and manage power for, the hardware connected to the CPU 502, the I/O subsystem, and the peripheral interface 503.
The call voice control device, the storage medium and the mobile terminal provided in the above embodiments can execute the call voice control method provided in any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in the above embodiments, reference may be made to the call voice control method provided in any embodiment of the present application.
The embodiment of the application further provides a call voice control device, the device is integrated in a preset server, and the device comprises a sample acquisition module and a preset feedback model generation module.
The sample acquisition module is used for acquiring historical call information of a mobile terminal user from the mobile terminal, or acquiring historical call information of a target user group locally from the preset server, to be used as historical call information samples;
and the preset feedback model generation module is used for training the historical call information sample by using a neural network method to generate a preset feedback model.
The embodiment of the application also provides a server, and the server integrates the call voice control device comprising the sample acquisition module and the preset feedback model generation module.
The foregoing is considered as illustrative of the preferred embodiments of the invention and the technical principles employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims (8)

1. A call voice control method, characterized by comprising the following steps:
when the current mobile terminal is detected to be in a call mode, acquiring the contact type of the current call contact;
obtaining a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics and is used for feeding back, based on the contact type, the user call sound characteristics used when talking with a call contact; the call sound characteristics comprise tone, loudness, tone, speech speed and speaking mode; there are a plurality of preset feedback models, corresponding respectively to the tone characteristics, the loudness characteristics, the speech speed characteristics and the speaking mode characteristics; the preset feedback model is generated in a server and stored in the current mobile terminal;
inputting the contact type into the preset feedback model, and acquiring the target call sound characteristics output by the preset feedback model;
generating an adjusting waveform according to the target call sound characteristic;
synthesizing the adjusting waveform and a call voice waveform of the user acquired in real time to generate adjusted call voice data;
and sending the adjusted call voice data to a terminal where a call contact person is located.
2. The call voice control method according to claim 1, wherein the contact type comprises a colleague, a leader, a parent, a relative, a friend, a client, a lover, or a promoter.
3. The call voice control method according to claim 1, further comprising:
acquiring historical call information of a mobile terminal user from a mobile terminal local or acquiring historical call information of a target user group from a preset server to be used as a historical call information sample;
and training the historical call information sample by using a neural network method to generate a preset feedback model.
4. The call voice control method according to claim 3, wherein the neural network method includes an input layer, a hidden layer, and an output layer;
the training the historical call information sample by using the neural network method to generate a preset feedback model comprises the following steps:
inputting the contact type of each call contact in the historical call information into the input layer, and outputting intermediate user call sound characteristics through the calculation of the activation function corresponding to each node of the hidden layer;
and repeatedly correcting the weights in the activation functions, using an optimization algorithm and the difference between the intermediate user call sound characteristics and the user call sound characteristics recorded for each call contact in the historical call information, until that difference is within a preset range, thereby obtaining the trained activation function of each node and generating the preset feedback model.
5. The call voice control method according to claim 1, further comprising: acquiring unit call voice fragments in real time according to a set acquisition rule;
the synthesizing the adjustment waveform with the call voice waveform acquired in real time, and generating adjusted call voice data includes: synthesizing the adjusting waveform and the unit call voice segment waveform to generate adjusted call voice subdata;
sending the adjusted call voice data to a terminal where a call contact is located comprises: and sending the adjusted call voice subdata to a terminal where a call contact person is located.
6. A call voice control device, comprising:
the contact type acquisition module is used for acquiring the contact type of the current call contact when the current mobile terminal is detected to be in a call mode;
a preset feedback model acquisition module, configured to obtain a preset feedback model generated based on a machine learning method, wherein the preset feedback model is obtained by training on a plurality of call information samples with known user call sound characteristics and is used for feeding back, based on the contact type, the user call sound characteristics used when talking with a call contact; the call sound characteristics comprise tone, loudness, tone, speech speed and speaking mode; there are a plurality of preset feedback models, corresponding respectively to the tone characteristics, the loudness characteristics, the speech speed characteristics and the speaking mode characteristics; the preset feedback model is generated in a server and stored in the current mobile terminal;
the target call sound characteristic acquisition module is used for inputting the contact type into the preset feedback model and acquiring the target call sound characteristic output by the preset feedback model;
a call voice adjusting module, comprising:
the adjusting waveform generating unit is used for generating an adjusting waveform according to the target call sound characteristic;
the call voice data generation unit is used for synthesizing the adjustment waveform and the call voice waveform of the user acquired in real time to generate adjusted call voice data;
and the call voice data sending unit is used for sending the adjusted call voice data to a terminal where a call contact person is located.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the call voice control method according to any one of claims 1 to 5.
8. A mobile terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the call voice control method according to any one of claims 1 to 5 when executing the computer program.
CN201711393200.8A 2017-12-21 2017-12-21 Call voice control method and device, storage medium and mobile terminal Active CN108156317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711393200.8A CN108156317B (en) 2017-12-21 2017-12-21 Call voice control method and device, storage medium and mobile terminal

Publications (2)

Publication Number Publication Date
CN108156317A CN108156317A (en) 2018-06-12
CN108156317B true CN108156317B (en) 2020-03-10

Family

ID=62464120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711393200.8A Active CN108156317B (en) 2017-12-21 2017-12-21 Call voice control method and device, storage medium and mobile terminal

Country Status (1)

Country Link
CN (1) CN108156317B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10896689B2 (en) * 2018-07-27 2021-01-19 International Business Machines Corporation Voice tonal control system to change perceived cognitive state
CN109151366B (en) * 2018-09-27 2020-09-22 惠州Tcl移动通信有限公司 Sound processing method for video call, storage medium and server
CN109215629B (en) * 2018-11-22 2021-01-01 Oppo广东移动通信有限公司 Voice processing method and device and terminal
CN109979473A (en) * 2019-03-29 2019-07-05 维沃移动通信有限公司 A kind of call sound processing method and device, terminal device
CN110364177A (en) * 2019-07-11 2019-10-22 努比亚技术有限公司 Method of speech processing, mobile terminal and computer readable storage medium
CN112750443A (en) * 2019-10-30 2021-05-04 北京小米移动软件有限公司 Call voice output method and device, storage medium and electronic equipment
CN113555011B (en) * 2021-07-07 2022-05-27 广西电网有限责任公司 Electric power industry customer service center voice translation modeling method, system and medium
CN114666449B (en) * 2022-03-29 2022-12-06 深圳市银服通企业管理咨询有限公司 Voice data processing method of calling system and calling system
CN114710592B (en) * 2022-04-11 2023-05-02 江西省信合客户服务有限公司 Calling system and method based on artificial intelligence
CN115665318B (en) * 2022-11-30 2023-10-20 荣耀终端有限公司 Call tone quality adjusting method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093752A (en) * 2013-01-16 2013-05-08 华南理工大学 Sentiment analytical method based on mobile phone voices and sentiment analytical system based on mobile phone voices
CN103905644A (en) * 2014-03-27 2014-07-02 郑明� Generating method and equipment of mobile terminal call interface
CN104702759A (en) * 2013-12-06 2015-06-10 中兴通讯股份有限公司 Address list setting method and address list setting device
CN105208221A (en) * 2015-10-30 2015-12-30 维沃移动通信有限公司 Method and device for automatically adjusting communication voice
CN105448300A (en) * 2015-11-12 2016-03-30 小米科技有限责任公司 Method and device for calling

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI579828B (en) * 2015-06-01 2017-04-21 鴻海精密工業股份有限公司 Voice recognition device and method

Also Published As

Publication number Publication date
CN108156317A (en) 2018-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong

Applicant after: OPPO Guangdong Mobile Communications Co., Ltd.

Address before: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong

Applicant before: Guangdong OPPO Mobile Communications Co., Ltd.

GR01 Patent grant