CN111193834A - Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment


Info

Publication number
CN111193834A
Authority
CN
China
Prior art keywords
user
voice
information
text data
voice information
Prior art date
Legal status
Granted
Application number
CN201911290188.7A
Other languages
Chinese (zh)
Other versions
CN111193834B (en)
Inventor
李梦迪
苏绥绥
常富洋
Current Assignee
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd
Priority to CN201911290188.7A
Publication of CN111193834A
Application granted
Publication of CN111193834B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/50Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/51Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5166Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Abstract

The invention discloses a human-computer interaction method and device based on user sound characteristic analysis, and an electronic device. The human-computer interaction method comprises the following steps: acquiring user voice information; performing semantic recognition on the user voice information and extracting text data corresponding to the voice information; and analyzing the user's sound characteristics with a customer service response model based on the user voice information and the text data, generating interactive feedback voice, and sending the interactive feedback voice to the user. By combining sound feature recognition with semantic recognition of the user's voice information, the method improves the accuracy and friendliness of human-computer interaction and gives the user a better experience.

Description

Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment
Technical Field
The invention relates to the technical field of human-computer interaction, and in particular to a human-computer interaction method and device based on user sound characteristic analysis, an electronic device, and a computer-readable medium.
Background
With the rapid development of artificial intelligence technology and the continuously rising labor costs of enterprises, more and more enterprises adopt robot customer service, a human-computer interaction mode, for customer-service-related business.
Existing intelligent human-computer interaction can accurately recognize the semantic information contained in user speech and achieve a high degree of information exchange. The prior art also includes user emotion recognition techniques that combine facial image information or speech keywords, determining the user's emotion from facial image recognition or keywords and generating voice feedback suited to that emotion.
However, human-computer interaction usually takes place in a voice-only conversation, where the user's facial expression cannot be recognized, and semantic recognition or keyword-based speech recognition alone cannot accurately judge the user's current state. Moreover, interaction methods that consider only the single dimension of the user's current emotion cannot provide a better customer service experience.
Disclosure of Invention
The invention aims to provide a human-computer interaction method and device based on user voice feature analysis, and an electronic device, which improve the accuracy and friendliness of human-computer interaction through sound feature recognition and semantic recognition of user voice information, giving the user a better experience.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
In order to achieve the above object, an aspect of the present invention provides a human-computer interaction method based on user voice feature analysis, including:
acquiring user voice information in an IVR voice call with a user;
carrying out semantic recognition on the user voice information, and extracting and forming text data corresponding to the voice information;
and analyzing the voice characteristics of the user by utilizing a customer service response model based on the voice information of the user and the text data to generate interactive feedback voice, and sending the interactive feedback voice to the user.
According to a preferred embodiment of the present invention, the customer service response model performs voiceprint analysis on the user voice information, and generates the voice feature of the user by combining semantic analysis on the text data.
According to a preferred embodiment of the present invention, the user voice characteristics further include a user gender characteristic, an accent characteristic, an emotion characteristic, a speech rate characteristic, and a speaking style characteristic.
According to a preferred embodiment of the present invention, the method further comprises recognizing emotional characteristics of the user based on energy values and/or a waveform pattern of the user's voice.
According to a preferred embodiment of the present invention, the method further comprises continuously sampling the user voice to obtain a voice energy sequence and/or a voice waveform of the user, and recognizing the emotional characteristics of the user according to the voice energy sequence and/or the voice waveform.
According to a preferred embodiment of the present invention, the step of performing voiceprint analysis on the user voice information and generating the voice characteristics of the user in combination with the analysis of the text data further includes constructing a semantic-based voice characteristic recognition submodel and inputting the text data into the recognition submodel to recognize the voice characteristics of the user.
According to a preferred embodiment of the present invention, the customer service response model determines the user voice conversation intention according to the text data, matches the user voice conversation intention with a preset conversation library to generate feedback information, and synthesizes and generates interactive feedback voice by using a voice simulation technology based on the user voice feature and the feedback information.
According to a preferred embodiment of the present invention, the customer service response model may recognize and modify the user voice feature based on the newly acquired user voice information, and generate an interactive feedback voice according to the modified user voice feature.
According to a preferred embodiment of the present invention, after the user voice session is ended, user evaluation information is obtained, and the customer service response model is optimized according to the user evaluation information.
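By way of a minimal, non-authoritative sketch of the three steps summarized above (every function name and return value below is an illustrative stand-in; the patent does not prescribe these interfaces):

```python
def acquire_user_voice() -> bytes:
    """Step 1 stand-in: would read audio from the IVR voice call."""
    return b"\x00" * 16000

def transcribe(audio: bytes) -> str:
    """Step 2 stand-in: would invoke an ASR engine for semantic recognition."""
    return "我要查账单"

def analyze_sound_features(audio: bytes, text: str) -> dict:
    """Voiceprint plus semantic analysis; returns assumed feature tags."""
    return {"gender": "female", "emotion": "neutral", "speed": 1.0}

def generate_feedback(text: str, features: dict) -> str:
    """Match the intent against a dialogue library and produce the reply text."""
    return "您的账单已发送。" if "账单" in text else "请问有什么可以帮您？"

audio = acquire_user_voice()
text = transcribe(audio)
features = analyze_sound_features(audio, text)
print(generate_feedback(text, features))  # reply would then be voiced to match features
```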
A second aspect of the present invention provides a human-computer interaction device based on user voice feature analysis, comprising:
the voice acquisition module is used for acquiring the voice information of the user in an IVR voice call with the user;
the semantic extraction module is used for carrying out semantic recognition on the user voice information and extracting and forming text data corresponding to the voice information;
the customer service response model is used for analyzing the user voice characteristics based on the user voice information and the text data to generate interactive feedback voice;
and the voice sending module is used for sending the interactive feedback voice to the user.
According to a preferred embodiment of the present invention, the customer service response model further includes a user voice feature recognition unit, configured to perform voiceprint analysis on the user voice information, and generate the voice feature of the user by combining semantic analysis on the text data.
According to a preferred embodiment of the present invention, the user voice characteristics further include a user gender characteristic, an accent characteristic, an emotion characteristic, a speech rate characteristic, and a speaking style characteristic.
According to a preferred embodiment of the present invention, the user voice feature recognition unit may perform user emotion feature recognition based on an energy value and/or a waveform pattern of the user voice.
According to a preferred embodiment of the present invention, the user voice feature recognition unit further comprises a sampling component for continuously sampling the user voice to obtain a voice energy sequence and/or a voice waveform of the user; and the recognition component is used for recognizing the emotional characteristics of the user according to the voice energy sequence and/or the voice oscillogram.
According to a preferred embodiment of the present invention, the customer service response model further comprises a semantic-based voice feature recognition submodel, and the text data is input into the recognition submodel to recognize the voice feature of the user.
According to a preferred embodiment of the present invention, the customer service response model further includes a feedback information generating unit, configured to determine the user voice conversation intention according to the text data, and match the user voice conversation intention with a preset conversation library to generate feedback information; and the voice simulation synthesis unit is used for synthesizing and generating interactive feedback voice by using a voice simulation technology based on the user voice characteristics and the feedback information.
According to a preferred embodiment of the present invention, the customer service response model may recognize and correct the user voice feature based on the newly acquired user voice information, and generate an interactive feedback voice according to the corrected user voice feature.
According to a preferred embodiment of the present invention, the system further includes an evaluation acquisition module, configured to acquire user evaluation information after the user voice session is ended; and the correction module is used for optimizing the customer service response model according to the user evaluation information.
A third aspect of the present invention provides an electronic apparatus, wherein the electronic apparatus includes:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the above-described human-computer interaction method based on user sound feature analysis.
A fourth aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the above-mentioned human-computer interaction method based on user sound feature analysis.
Drawings
To make the technical problems solved, the technical means adopted, and the technical effects achieved by the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings illustrate only exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.
Fig. 1 is a main flowchart illustrating a human-computer interaction method based on user voice feature analysis according to an exemplary embodiment.
Fig. 2 is a specific example of a human-computer interaction method based on user voice feature analysis.
FIG. 3 is a block diagram illustrating a human-computer interaction device based on user voice characteristic analysis, according to an example embodiment.
FIG. 4 is a block diagram illustrating a customer service response model according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a voiceprint based voice feature recognition unit in accordance with an example embodiment.
Fig. 6 is a block diagram of an exemplary embodiment of an electronic device according to the present invention.
FIG. 7 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or parts in the drawings, and thus their repetitive description will be omitted.
Features, structures, characteristics or other details described in a particular embodiment do not preclude the fact that the features, structures, characteristics or other details may be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, the present invention has been described with reference to features, structures, characteristics or other details that are within the purview of one skilled in the art to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
It will be understood that, although the terms first, second, third, and so on may be used herein to describe various elements, components, or sections, these terms should not be construed as limiting; they merely distinguish one item from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention.
The term "and/or" includes any and all combinations of one or more of the associated listed items.
Fig. 1 is a main flowchart illustrating a human-computer interaction method based on user voice feature analysis according to an exemplary embodiment. The man-machine interaction method based on the user sound characteristic analysis at least comprises 6 steps S101-S106.
In step S101, user voice information is acquired.
The user's voice information is received and recorded during a voice call with the user. The user establishes a voice call connection with the call center using an operator communication service or the Internet; the microphone of the user's client converts the sound signal made by the user into an electrical signal, which is transmitted to the call center through the operator communication network or the Internet and converted back into voice information at the receiver. The call center acquires the transmitted and converted voice information.
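As an illustrative sketch of step S101 (the file name and 16-bit PCM format are assumptions; a real IVR system would receive audio frames from the telephony stack), a recorded call segment could be loaded into raw samples like this:

```python
import wave
import numpy as np

def load_call_audio(path="call_segment.wav"):
    """Read a recorded call segment into raw samples for later analysis."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)  # assumes 16-bit PCM mono
    return sample_rate, samples

# sample_rate, samples = load_call_audio()  # feed into steps S102 and S103
```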
In step S102, semantic extraction is performed on the voice information.
The customer service response model performs semantic recognition on the user voice information acquired in step S101, and extracts and forms text data corresponding to the voice information.
Further, keyword segmentation and indexing may be performed on the formed text data for further analysis. Transcribing speech files and extracting keywords are common technical means in the field and can be implemented with conventional methods, which are not repeated here.
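As an illustrative sketch of this keyword segmentation and indexing (the stop-word list and index layout are assumptions; the patent does not prescribe a tokenizer), the text data could be indexed with a standard Chinese tokenizer such as jieba:

```python
from collections import defaultdict
import jieba

STOP_WORDS = {"的", "了", "是", "我", "你"}  # assumed minimal stop-word list

def index_utterances(utterances):
    """Map each keyword to the ids of the utterances in which it occurs."""
    index = defaultdict(set)
    for uid, text in enumerate(utterances):
        for token in jieba.lcut(text):  # segment the utterance into words
            if token.strip() and token not in STOP_WORDS:
                index[token].add(uid)
    return index

index = index_utterances(["我想查询账单", "账单金额不对"])
print(index["账单"])  # likely {0, 1}
```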
In step S103, sound characteristic analysis is performed on the user speech.
The customer service response model performs sound characteristic analysis on the user voice information to determine the current user's sound characteristics. The analysis has two aspects: the user's voiceprint is analyzed to obtain sound feature tags, and semantic analysis is performed on the content of the user's voice information to determine further sound feature tags. The tags obtained from the two analyses are combined to determine the user's sound characteristics.
More specifically, the voiceprint analysis of the user's voice information includes analysis of characteristics of the user in terms of gender, accent, mood, speed of speech, and the like.
More specifically, the emotional characteristics of the user are identified based on the energy value and/or the waveform pattern of the user's voice.
In one embodiment, the user's emotional characteristics are recognized from the energy values of the user's voice. First, the user's voice is converted into a speech energy sequence: speech input is detected with a VAD algorithm and converted into a speech waveform signal, a sampling window width and sampling interval are set, the speech energy value at each sampling point is calculated, and the speech energy sequence is obtained. The speech energy sequence may consist of a number of sampling-point records, each comprising a sampling-point timestamp and a sampling-point speech energy value. Second, the speech energy sequence is input into an emotion judgment model for calculation; the emotion judgment model is a machine self-learning model trained on historical call records. More specifically, historical user call records may be annotated with emotion labels, the corresponding speech converted into the energy sequences required by the model, and the energy sequences together with the emotion labels used as training data for the model. Further, the emotion judgment model may be an RNN (recurrent neural network) model. In this embodiment, the speech energy sequence is fed to the input layer of the speech emotion judgment model; the output layer has the same number of nodes as the input layer and outputs a sequence from which the user's emotional characteristics are judged.
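A minimal sketch of the two stages described above, assuming a 25 ms window with a 10 ms interval and a small classification head in place of the sequence-shaped output layer the paragraph describes (all shapes and the emotion label count are assumptions):

```python
import numpy as np
import torch
import torch.nn as nn

def energy_sequence(waveform, sample_rate, window_s=0.025, hop_s=0.010):
    """Return (timestamp, energy) sampling points over sliding windows."""
    win, hop = int(window_s * sample_rate), int(hop_s * sample_rate)
    points = []
    for start in range(0, len(waveform) - win, hop):
        frame = waveform[start:start + win].astype(np.float64)
        points.append((start / sample_rate, float(np.sum(frame ** 2))))
    return points

class EmotionRNN(nn.Module):
    """Recurrent model over the energy sequence; emits one score per emotion."""
    def __init__(self, num_emotions=4, hidden=32):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_emotions)

    def forward(self, energies):             # energies: (batch, time, 1)
        _, h = self.rnn(energies)
        return self.head(h[-1])              # (batch, num_emotions)

wave = np.random.randn(16000)                # stand-in for 1 s of 16 kHz speech
seq = energy_sequence(wave, 16000)
x = torch.tensor([[e for _, e in seq]], dtype=torch.float32).unsqueeze(-1)
logits = EmotionRNN()(x)                     # scores for the assumed emotion classes
```

In practice the energy values would be normalized and the model trained on emotion-annotated historical call records, as the paragraph describes.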
To determine sound characteristics from the semantic content of the user's voice information, a semantics-based sound feature recognition sub-model is constructed, and text data extracted from the user voice information through speech recognition is input into the sub-model to recognize the user's sound characteristics.
Furthermore, a mapping from keywords to emotional characteristics is established; the text of conversations and the corresponding emotional characteristics are used as training samples; and the semantics-based sound feature recognition sub-model is built with a machine learning method. The text corresponding to a conversation is fed to the model input layer, and the emotional characteristics the conversation may contain are obtained through model computation, as sketched below.
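As an illustrative sketch of such a semantics-based sub-model (the training utterances, emotion labels, and choice of classifier are assumptions; the patent prescribes no particular algorithm):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["这也太慢了吧", "谢谢你的帮助", "我很不满意"]  # assumed training utterances
labels = ["impatient", "pleased", "angry"]             # assumed emotion labels

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),  # char n-grams suit Chinese
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["太慢了"]))  # likely ["impatient"]
```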
The results of the voiceprint analysis and the semantic analysis of the user voice information are integrated to determine the user's sound characteristics.
In step S104, interactive feedback voice is generated.
The customer service response model judges the intention of the user's voice conversation from the text data and matches it against a preset conversation library to generate specific feedback information.
Based on the analysis of the user's sound characteristics in step S103, a voice simulation mode adapted to the current user's characteristics is synthesized by voice simulation technology, and the generated specific feedback information is rendered as simulated speech to produce the interactive feedback voice, as sketched below.
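An illustrative sketch of step S104 under stated assumptions (the intent keywords, response library, and the use of pyttsx3 for rendering are all placeholders; pyttsx3 requires a platform TTS engine to be installed):

```python
import pyttsx3

RESPONSE_LIBRARY = {                 # assumed preset conversation library
    "查账单": "您本月账单金额已发送到您的手机，请查收。",
    "投诉": "非常抱歉给您带来不便，我马上为您转接处理。",
}

def match_intent(text):
    """Naive keyword matching; a production system would use an intent model."""
    for keyword, reply in RESPONSE_LIBRARY.items():
        if keyword in text:
            return reply
    return "抱歉，我没有听清，请您再说一遍。"

def speak(reply, user_features):
    """Render the reply, adapting the speaking rate to the user's pace."""
    engine = pyttsx3.init()
    base_rate = engine.getProperty("rate")   # engine's default words per minute
    engine.setProperty("rate", int(base_rate * user_features.get("speed", 1.0)))
    engine.say(reply)
    engine.runAndWait()

speak(match_intent("我要查账单"), {"speed": 0.9})  # slow down slightly for this user
```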
In step S105, the generated interactive feedback voice is sent to the user, and the user's latest voice information is received. The latest voice information is processed through steps S101-S104, the user's sound characteristics are corrected in real time, interactive feedback voice is generated by simulation according to the corrected sound characteristics, and the result is sent to the user.
In step S106, the model is corrected based on user evaluation.
After the call ends, the user's evaluation of the interactive call is obtained, and the parameters of the customer service response model are adjusted based on the evaluation result, so that each sub-model and decision procedure in the customer service response model becomes more accurate.
Example:
fig. 2 is a specific example of applying the method of the present invention.
In step S201, the user comes on line, establishing a voice connection between the intelligent voice robot and the user. In step S202, the intelligent voice robot obtains the user's voice information, analyzes it to determine the sound characteristics of the user's voice, and extracts its semantic content. In step S203, a simulated sound production scheme in the speech library is called based on the determined sound characteristics, preparing the base tone for speech output. The user's sound characteristics include, but are not limited to, gender characteristics, accent characteristics, emotional characteristics, speech rate characteristics, semantic characteristics, and speaking style characteristics.
More specifically: by judging the user's gender characteristics, the intelligent voice robot can simulate the gender of voice best suited to serving the current user; by judging the user's accent, the robot can identify the user's region and simulate the local accent in its voice interaction; by analyzing the user's emotional characteristics, the robot can judge the user's current emotion and adjust its tone accordingly; through semantic analysis of what the user says, the robot can adjust its speaking rate or intonation; and through analysis of the user's speaking style, the robot selects replies in a similar style.
In step S204, the user's current voice information is obtained in real time, the current sound characteristics are analyzed as in the steps above, matched against the speech library, and feedback voice information conforming to the user's current sound characteristics is generated by simulation.
In step S205, when the session ends, the user's evaluation of the session is obtained, and the customer service model matching of the intelligent voice robot is optimized based on that evaluation.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Those skilled in the art will appreciate that all or part of the steps to implement the above-described embodiments are implemented as programs (computer programs) executed by a computer data processing apparatus. When the computer program is executed, the method provided by the invention can be realized. Furthermore, the computer program may be stored in a computer readable storage medium, which may be a readable storage medium such as a magnetic disk, an optical disk, a ROM, a RAM, or a storage array composed of a plurality of storage media, such as a magnetic disk or a magnetic tape storage array. The storage medium is not limited to centralized storage, but may be distributed storage, such as cloud storage based on cloud computing.
Embodiments of the apparatus of the present invention are described below, which may be used to perform method embodiments of the present invention. The details described in the device embodiments of the invention should be regarded as complementary to the above-described method embodiments; reference is made to the above-described method embodiments for details not disclosed in the apparatus embodiments of the invention.
FIG. 3 is a diagram illustrating a human-computer interaction device based on user voice feature analysis, according to an example embodiment.
As shown in fig. 3, the human-computer interaction device 300 based on user voice feature analysis may specifically include a voice obtaining module 301, a semantic extracting module 302, a customer service response model 303, a voice sending module 304, an evaluation obtaining module 305, and a modification module 306.
The voice acquiring module 301 is configured to acquire voice information of the user during the IVR voice call of the user.
And a semantic extraction module 302, configured to perform semantic recognition on the user voice information, and extract and form text data corresponding to the voice information.
The customer service response model 303 is configured to analyze the user's sound characteristics based on the user voice information and the text data, and to generate interactive feedback voice. More specifically, as shown in fig. 4, the customer service response model 303 includes a voiceprint-based user voice feature recognition unit 401, a semantics-based user voice feature recognition unit 402, a feedback information generation unit 403, and a voice simulation synthesis unit 404.
The voiceprint-based user voice feature recognition unit 401 is configured to perform voiceprint recognition on the user voice information to determine the user's sound characteristics. More specifically, as shown in fig. 5, it includes a gender feature recognition sub-model 501, an accent feature recognition sub-model 502, an emotion feature recognition sub-model 503, and a speech rate feature recognition sub-model 504. Each sub-model is constructed with a machine learning method, using historical call data and sound characteristic labels as training samples. In application, each sub-model receives the current user's voice information and determines the current sound characteristics through model evaluation. Machine learning algorithms are common technical means in the field and are not described here in detail.
Further, the emotion feature recognition sub-model 503 can recognize the user's emotional characteristics based on the energy sequence and/or waveform of the speech. It has a sampling component for continuously sampling the user's voice to obtain the user's speech energy sequence and/or speech waveform, and a recognition component for recognizing the user's emotional characteristics from the speech energy sequence and/or speech waveform.
The semantics-based user voice feature recognition unit 402 receives the text data as input and recognizes the user's sound characteristics. More specifically, unit 402 constructs a mapping from keywords to emotional characteristics, takes the text of conversations and their corresponding emotional characteristics as training samples, and builds a semantics-based sound feature recognition sub-model with a machine learning method. The text corresponding to a conversation is fed to the model input layer, and the emotional characteristics the conversation may contain are obtained through model computation.
The results of the voiceprint analysis and the semantic analysis of the user voice information are integrated to determine the user's sound characteristics.
The feedback information generating unit 403 determines the intention of the user voice conversation according to the text data, matches the intention with a preset conversation library, and generates specific feedback information.
The voice simulation synthesis unit 404 holds voice simulation schemes and a speech library corresponding to sound characteristics. Based on the analysis of the user's sound characteristics, it synthesizes, by voice simulation technology, a voice simulation mode adapted to the current user, and renders the specific feedback information generated by the feedback information generating unit 403 as simulated speech to produce the interactive feedback voice.
In the process of man-machine interaction with a user, the voice acquisition module 301 acquires latest user voice information in real time, judges the current voice characteristics of the user through the customer service response model 303, corrects the voice characteristics of the user in real time, generates interactive feedback voice according to the corrected voice characteristics of the user in a simulation mode, and sends the interactive feedback voice to the user.
An evaluation obtaining module 305, configured to obtain user evaluation information after the user voice session is ended.
And the correcting module 306 is used for optimizing the customer service response model according to the user evaluation information.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
FIG. 6 shows a block diagram of an electronic device according to an exemplary embodiment of the present invention. An electronic device 600 according to this embodiment is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not limit the function or scope of use of embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform the steps according to various exemplary embodiments of the present invention described in the method section above. For example, the processing unit 610 may perform the steps shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. The technical solution according to embodiments of the present invention can thus be embodied as a software product, stored on a computer-readable storage medium (for example a CD-ROM, a USB disk, or a removable hard disk) or on a network, and comprising several instructions that cause a computing device (a personal computer, a server, a network device, etc.) to execute the above-described method of the invention. When the computer program is executed by a data processing apparatus, the methods of the invention described above are carried out.
The computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
While the foregoing embodiments have described the objects, aspects, and advantages of the present invention in further detail, it should be understood that the present invention is not inherently tied to any particular computer, virtual machine, or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; it covers all modifications, changes, and equivalents that come within its spirit and scope.

Claims (10)

1. A human-computer interaction method based on user sound characteristic analysis is characterized by comprising the following steps:
acquiring user voice information;
carrying out semantic recognition on the user voice information, and extracting and forming text data corresponding to the voice information;
and analyzing the voice characteristics of the user by utilizing a customer service response model based on the voice information of the user and the text data to generate interactive feedback voice, and sending the interactive feedback voice to the user.
2. The method of claim 1, wherein:
and the customer service response model carries out voiceprint analysis on the user voice information and generates the voice characteristics of the user by combining semantic analysis on the text data.
3. The method of any one of claims 1-2, wherein the user voice characteristics further comprise user gender characteristics, accent characteristics, emotion characteristics, speech rate characteristics, and speaking style characteristics.
4. The method of any one of claims 1-3, comprising:
and recognizing the emotional features of the user based on the energy value and/or the wave form diagram of the voice of the user.
5. The method of any one of claims 1-4, comprising:
and continuously sampling the user voice to obtain a voice energy sequence and/or a voice waveform diagram of the user, and identifying the emotional characteristics of the user according to the voice energy sequence and/or the voice waveform diagram.
6. The method of any one of claims 1-5, wherein the step of performing voiceprint analysis on the user voice information and generating the user's voice characteristics in combination with the analysis of the text data further comprises:
constructing a semantic-based sound feature recognition sub-model;
inputting the text data into the recognition submodel to recognize a voice characteristic of the user.
7. The method of any one of claims 1-6, further comprising:
the customer service response model judges the voice conversation intention of the user according to the text data, and matches the voice conversation intention with a preset conversation library to generate feedback information;
and synthesizing and generating interactive feedback voice by using a voice simulation technology based on the user voice characteristics and the feedback information.
8. A human-computer interaction device based on user sound characteristic analysis is characterized by comprising:
the voice acquisition module is used for acquiring the voice information of the user in an IVR voice call with the user;
the semantic extraction module is used for carrying out semantic recognition on the user voice information and extracting and forming text data corresponding to the voice information;
the customer service response model is used for analyzing the user voice characteristics based on the user voice information and the text data to generate interactive feedback voice;
and the voice sending module is used for sending the interactive feedback voice to the user.
9. An electronic device, wherein the electronic device comprises:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN201911290188.7A 2019-12-16 2019-12-16 Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment Active CN111193834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911290188.7A CN111193834B (en) 2019-12-16 2019-12-16 Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911290188.7A CN111193834B (en) 2019-12-16 2019-12-16 Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment

Publications (2)

Publication Number Publication Date
CN111193834A true CN111193834A (en) 2020-05-22
CN111193834B CN111193834B (en) 2022-04-15

Family

ID=70709199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911290188.7A Active CN111193834B (en) 2019-12-16 2019-12-16 Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment

Country Status (1)

Country Link
CN (1) CN111193834B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724173A (en) * 2020-06-18 2020-09-29 中国银行股份有限公司 Robot self-adjusting method, device, equipment and computer storage medium
CN112148846A (en) * 2020-08-25 2020-12-29 北京来也网络科技有限公司 Reply voice determination method, device, equipment and storage medium combining RPA and AI
CN112201224A (en) * 2020-10-09 2021-01-08 北京分音塔科技有限公司 Method, equipment and system for simultaneous translation of instant call
CN112929502A (en) * 2021-02-05 2021-06-08 国家电网有限公司客户服务中心 Voice recognition method and system based on electric power customer service
CN112967725A (en) * 2021-02-26 2021-06-15 平安科技(深圳)有限公司 Voice conversation data processing method and device, computer equipment and storage medium
CN113257242A (en) * 2021-04-06 2021-08-13 杭州远传新业科技有限公司 Voice broadcast suspension method, device, equipment and medium in self-service voice service
WO2023102889A1 (en) * 2021-12-10 2023-06-15 华为技术有限公司 Voice interaction method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105807933A (en) * 2016-03-18 2016-07-27 北京光年无限科技有限公司 Man-machine interaction method and apparatus used for intelligent robot
CN105895101A (en) * 2016-06-08 2016-08-24 国网上海市电力公司 Speech processing equipment and processing method for power intelligent auxiliary service system
CN109451188A (en) * 2018-11-29 2019-03-08 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of the self-service response of otherness
CN109618068A (en) * 2018-11-08 2019-04-12 上海航动科技有限公司 A kind of voice service method for pushing, device and system based on artificial intelligence
US20190253558A1 (en) * 2018-02-13 2019-08-15 Risto Haukioja System and method to automatically monitor service level agreement compliance in call centers
CN110427472A (en) * 2019-08-02 2019-11-08 深圳追一科技有限公司 The matched method, apparatus of intelligent customer service, terminal device and storage medium

Also Published As

Publication number Publication date
CN111193834B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN111193834B (en) Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment
CN108305641B (en) Method and device for determining emotion information
CN1894739B (en) Source-dependent text-to-speech system
US11475897B2 (en) Method and apparatus for response using voice matching user category
CN105391730B (en) A kind of information feedback method, apparatus and system
CN111048064B (en) Voice cloning method and device based on single speaker voice synthesis data set
CN107657017A (en) Method and apparatus for providing voice service
CN110648691B (en) Emotion recognition method, device and system based on energy value of voice
CN108804526A (en) Interest determines that system, interest determine method and storage medium
CN110600033A (en) Learning condition evaluation method and device, storage medium and electronic equipment
JP2024505076A (en) Generate diverse, natural-looking text-to-speech samples
CN113658577A (en) Speech synthesis model training method, audio generation method, device and medium
CN111696520A (en) Intelligent dubbing method, device, medium and electronic equipment
CN113327575B (en) Speech synthesis method, device, computer equipment and storage medium
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN108962226A (en) Method and apparatus for detecting the endpoint of voice
CN111415662A (en) Method, apparatus, device and medium for generating video
CN113314096A (en) Speech synthesis method, apparatus, device and storage medium
CN112101046B (en) Conversation analysis method, device and system based on conversation behavior
CN111310847B (en) Method and device for training element classification model
CN115700871A (en) Model training and speech synthesis method, device, equipment and medium
CN114118068A (en) Method and device for amplifying training text data and electronic equipment
CN112667787A (en) Intelligent response method, system and storage medium based on phonetics label
CN113223513A (en) Voice conversion method, device, equipment and storage medium
CN113129925B (en) VC model-based mouth motion driving model training method and component

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant