CN117828045A

CN117828045A - Intelligent session response method, device, electronic equipment and storage medium

Info

Publication number: CN117828045A
Application number: CN202311748206.8A
Authority: CN
Inventors: 靳会勤
Original assignee: China Unicom Online Information Technology Co Ltd
Current assignee: China Unicom Online Information Technology Co Ltd
Priority date: 2023-12-18
Filing date: 2023-12-18
Publication date: 2024-04-05

Abstract

The application provides an intelligent session response method, an intelligent session response device, electronic equipment and a storage medium. The method comprises: obtaining user information of a calling user, and sending the user information to a user module for user information verification; receiving a verification passing instruction and sending an opening audio acquisition request to an NLP engine; sending the acquired starting audio to the calling user side for playing; recording audio information of the calling user; sending the audio information and the user information to a corpus processing model in the NLP engine, which recognizes them and generates reply information; and performing session control according to the received reply information. In this way, the accuracy of the session reply can be improved.

Description

Intelligent session response method, device, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of data processing, and in particular relates to an intelligent session response method, an intelligent session response device, electronic equipment and a storage medium.

Background

Natural language processing is an interdisciplinary field combining natural language and computer technology. It involves technologies such as computational linguistics, voice processing, text mining, machine learning and neural networks, so that a computer can better understand, process and generate human language and interact with human beings.

However, existing intelligent dialogue systems are difficult to develop and have long development cycles, often lack flexibility and generalization capability, cannot process complex semantics and information, and cannot provide accurate responses.

Disclosure of Invention

The disclosure provides an intelligent session response method, an intelligent session response device, electronic equipment and a storage medium.

According to a first aspect of the present disclosure, an intelligent session answer method is provided. The method comprises the following steps:

acquiring user information of a calling user, and sending the user information to a user module for user information verification;

receiving a verification passing instruction and sending an opening audio acquisition request to an NLP engine;

sending the acquired starting audio to the calling user side for playing;

recording audio information of the calling user;

sending the audio information and the user information to a corpus processing model in the NLP engine for recognition and generation of reply information;

and performing session control according to the received reply information.

Further, the intelligent session response method further comprises the following steps:

after the audio information of the calling user is recorded, performing text conversion and voice breakpoint recognition on the audio information, and sending the result to the corpus processing model in the NLP engine.

Further, the session control according to the received reply information includes:

judging a session mark in the reply information;

if the session is marked as continuous, sending the audio questions in the reply information to the calling user side;

and if the session is marked as ended, sending the reply audio in the reply information to the calling user side, and hanging up.

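Further, after the audio questions in the reply information are sent to the calling user side, the method further comprises:

obtaining reply information of the calling user side;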

generating a conversation round number according to the reply information;

the session round number is sent to the NLP engine to judge the round number;

and controlling whether to end the current round of session according to the round number judging result.

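Further, after session control is performed according to the received reply information, the method further comprises:

obtaining a session control result, wherein the session control result comprises a hang-up result;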

judging whether the calling user actively hung up according to the hang-up result;

if it is judged that the calling user actively hung up, sending session data to the user module for storage after the calling user hangs up;

if it is judged that the system hung up, sending the session data to the user module for storage after the system hangs up.

Further, the intelligent session response method further comprises the following steps:

acquiring historical session data in the user module;

classifying the historical session data;

performing clustering and entity extraction on the user questions in the historical session data according to the classification result;

determining a question answer according to the clustering result and the extraction result;

and formulating an improvement strategy according to the question answer, and sending the improvement strategy to the NLP engine for corpus processing model optimization.

Further, the classifying the historical session data includes:

classifying the historical session data according to the users' ratings of the reply information, the ratings including unanswered, incorrect replies and unsatisfactory replies.

According to a second aspect of the present disclosure, an intelligent session answering device is provided. The device comprises:

the acquisition module is used for acquiring the user information of the calling user and sending the user information to the user module for user information verification;

the processing module is used for receiving the verification passing instruction and sending an opening audio acquisition request to the NLP engine;

the first sending module is used for sending the acquired starting audio to a calling user side for playing;

the recording module is used for recording the audio information of the calling user;

the second sending module is used for sending the audio information and the user information to a corpus processing model in the NLP engine for recognition and generating reply information;

and the control module is used for carrying out session control according to the received reply information.

According to a third aspect of the present disclosure, an electronic device is provided. The electronic device includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the intelligent session response method described above when executing the program.

According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the intelligent session response method described above.

The method and the device verify the user information, which helps identify the user, confirms that the identity is valid, and supports personalized needs. The starting audio presents questions, guidance or welcome information to the user and thereby guides the user to answer or ask questions, which helps obtain the user's question accurately at the start of the call and improves the accuracy of the reply. The audio information and the user information are then sent to a corpus processing model, which performs feature recognition and combines the user information for semantic understanding and reply generation; because the corpus processing model is trained with a large amount of training data and deep learning technology, the accuracy of the reply can be improved. Finally, session control is performed according to the received reply information, which helps manage the conversation flow more intelligently, improves the continuity of the session and prevents excessive resource consumption.

It should be understood that the description in this summary is not intended to limit key or critical features of the disclosed embodiments, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings, wherein like or similar reference numerals denote like or similar elements, in which:

FIG. 1 illustrates a flow chart of an intelligent session answer method according to an embodiment of the disclosure;

FIG. 2 illustrates a block diagram of an intelligent session answering device, according to an embodiment of the present disclosure;

FIG. 3 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present disclosure.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments in this disclosure without inventive faculty, are intended to be within the scope of this disclosure.

In addition, the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.

Fig. 1 shows a flowchart of an intelligent session answer method according to an embodiment of the disclosure, which is applied to a front-end processor, and includes:

S101, obtaining user information of a calling user, and sending the user information to a user module for user information verification. The user information can be a calling number, a called number and a call forwarding type. The called user has subscribed to a communication assistant product; when the called user cannot answer a call, a busy or unreachable condition is triggered, and the calling user's call is forwarded to the front-end processor.

S102, receiving a verification passing instruction and sending an opening audio acquisition request to the NLP engine.

For example, after the user module verifies the calling number, the called number and the call forwarding type and the verification passes, it sends the verification passing instruction to the front-end processor, so that the front-end processor performs the subsequent operations.
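The verification step above can be sketched roughly as follows. This is only an illustration under stated assumptions: `UserInfo`, `UserModule`, `ALLOWED_FORWARD_TYPES` and the sample numbers are hypothetical names and values, and the actual checks performed by the user module are not specified in the disclosure.

```python
from dataclasses import dataclass

# Assumed forwarding types; the text mentions busy and unreachable triggers
# but does not define the concrete values.
ALLOWED_FORWARD_TYPES = {"busy", "unreachable"}


@dataclass(frozen=True)
class UserInfo:
    calling_number: str
    called_number: str
    forward_type: str


class UserModule:
    """Hypothetical stand-in for the user module."""

    def __init__(self, subscribers):
        # Called numbers that subscribed to the communication assistant product.
        self._subscribers = set(subscribers)

    def verify(self, info: UserInfo) -> bool:
        # Pass only when the caller number is well formed, the called user is
        # a subscriber, and the call arrived through a known forwarding type.
        return (
            info.calling_number.isdigit()
            and info.called_number in self._subscribers
            and info.forward_type in ALLOWED_FORWARD_TYPES
        )


user_module = UserModule(subscribers={"13800000001"})
passed = user_module.verify(UserInfo("13900000002", "13800000001", "busy"))
rejected = user_module.verify(UserInfo("13900000002", "13800000009", "busy"))
```

In this sketch a `True` result plays the role of the verification passing instruction returned to the front-end processor.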

And S103, the acquired starting audio is sent to a calling user side for playing.

For example, the starting audio may be a short and friendly greeting to guide the user to begin interacting with the intelligent telephone assistant.

S104, recording the audio information of the calling user, that is, recording what the user says after listening to the starting audio.

In some embodiments, after recording the audio information of the calling user, the audio information is subjected to text conversion and speech breakpoint recognition and sent to a corpus processing model in the NLP engine.

For example, the audio information of the calling user is recorded by voice recording equipment and stored as an audio file or an audio stream. The speech is then parsed into text by a voice recognition technology, and voice breakpoints are recognized. The text information is the text converted from the calling user's speech, and the voice breakpoint information describes the positions at which sentence breaks occur in the speech. Extracting the voice breakpoint information also helps the system understand the structure of the dialogue and better process multiple rounds of dialogue and context information. Finally, the acquired text information and voice breakpoint information are sent to the corpus processing model in the NLP engine.
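One simple way to picture voice breakpoint recognition is to look for long pauses between recognized words. The sketch below assumes the speech recognizer already returns word-level timestamps; the `Word` type, the `min_pause` threshold of 0.6 seconds and the payload layout are assumptions made for illustration, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def find_breakpoints(words: List[Word], min_pause: float = 0.6) -> List[float]:
    """Return the times at which a pause of at least min_pause seconds begins."""
    return [
        prev.end
        for prev, cur in zip(words, words[1:])
        if cur.start - prev.end >= min_pause
    ]


def split_sentences(words: List[Word], min_pause: float = 0.6) -> List[str]:
    """Group words into sentences, cutting at every detected pause."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word.text)
        is_last = i == len(words) - 1
        if is_last or words[i + 1].start - word.end >= min_pause:
            sentences.append(" ".join(current))
            current = []
    return sentences


words = [
    Word("hello", 0.0, 0.4), Word("there", 0.5, 0.9),
    Word("is", 1.8, 1.9), Word("Li", 2.0, 2.2), Word("free", 2.3, 2.7),
]
# Payload that would accompany the user information to the NLP engine.
payload = {
    "text": " ".join(w.text for w in words),
    "breakpoints": find_breakpoints(words),
    "sentences": split_sentences(words),
}
```

Here the only pause long enough to count as a breakpoint is the gap after "there", so the recording is split into two sentences.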

And S105, the audio information and the user information are sent to a corpus processing model in an NLP engine for recognition and reply information generation.

For example, the corpus processing model may utilize existing knowledge bases, rules, and algorithms to analyze, classify, and semantically understand the audio information or the text information converted from it, and generate corresponding reply information based on the user information. The embodiment realizes conversion between voice and text, so that the system can better understand the requirements of users and provide accurate replies.

S106, performing session control according to the received reply information.

In some embodiments, the performing session control according to the received reply information includes: judging a session mark in the reply information; if the session is marked as continuous, sending the audio questions in the reply information to the calling user side; and if the session is marked as ended, sending the reply audio in the reply information to the calling user side, and hanging up.

For example, each reply to a question of the calling user contains a session mark, which is set according to the calling user's question. If a single reply cannot clearly answer the calling user's question, the session mark is set to continuous, and a question audio is generated to ask a follow-up about the calling user's last question so that a more accurate reply can be produced. Otherwise, if the session is marked as ended, the front-end processor sends the reply audio to the calling user and then actively hangs up.
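The session-mark branching described above could look roughly like this. `SessionFlag`, `Reply` and `CallLeg` are hypothetical names introduced for the sketch; the disclosure does not define the format of the reply information.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SessionFlag(Enum):
    CONTINUE = "continue"
    END = "end"


@dataclass
class Reply:
    flag: SessionFlag
    question_audio: Optional[bytes] = None  # follow-up question when continuing
    reply_audio: Optional[bytes] = None     # final answer when ending


class CallLeg:
    """Hypothetical handle on the calling user's side of the call."""

    def __init__(self) -> None:
        self.played: List[bytes] = []
        self.hung_up = False

    def play(self, audio: bytes) -> None:
        self.played.append(audio)

    def hang_up(self) -> None:
        self.hung_up = True


def control_session(reply: Reply, call: CallLeg) -> bool:
    """Apply the session mark; return True if the session stays open."""
    if reply.flag is SessionFlag.CONTINUE:
        call.play(reply.question_audio or b"")
        return True
    call.play(reply.reply_audio or b"")
    call.hang_up()
    return False


call = CallLeg()
still_open = control_session(Reply(SessionFlag.CONTINUE, question_audio=b"q"), call)
closed = control_session(Reply(SessionFlag.END, reply_audio=b"a"), call)
```

Returning a boolean lets the caller of `control_session` decide whether to keep recording the next utterance.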

In some embodiments, after sending the audio questions in the reply information to the calling user side, obtaining the reply information of the calling user side; generating a conversation round number according to the reply information; the session round number is sent to the NLP engine to judge the round number; and controlling whether to end the current round of session according to the round number judging result.

For example, answering one question is regarded as one round of dialogue, and the total number of rounds is calculated after each round ends. A maximum dialogue round threshold can be set; when the number of dialogue rounds reaches the threshold, it can be judged that the dialogue has exceeded an acceptable range. If the calling user's answers still cannot be interpreted, or the same reply keeps being generated, the session can be ended: the corresponding ending audio is sent and the front-end processor actively hangs up, which prevents excessive resource consumption.
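A minimal sketch of the round counting follows. In the disclosure the round number is sent to the NLP engine for judgment; here the judgment is done locally as a stand-in, and `RoundCounter`, `run_rounds` and the threshold of 3 are assumptions for illustration.

```python
class RoundCounter:
    """Counts dialogue rounds and reports when the assumed limit is reached."""

    def __init__(self, max_rounds: int = 3) -> None:
        if max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")
        self.max_rounds = max_rounds
        self.rounds = 0

    def record_reply(self) -> int:
        # One caller reply to a follow-up question closes one round.
        self.rounds += 1
        return self.rounds

    def limit_reached(self) -> bool:
        return self.rounds >= self.max_rounds


def run_rounds(caller_replies, counter: RoundCounter) -> str:
    """Return 'limit' if the threshold ended the session, else 'open'."""
    for _ in caller_replies:
        counter.record_reply()
        if counter.limit_reached():
            # At this point the ending audio would be sent and the
            # front-end processor would hang up.
            return "limit"
    return "open"


counter = RoundCounter(max_rounds=3)
outcome = run_rounds(["uh", "what?", "hmm", "again?"], counter)
```

With four unintelligible replies and a threshold of three, the loop stops after the third round and the fourth reply is never processed.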

In some embodiments, after session control is performed according to the received reply information, a session control result is obtained, where the session control result includes a hang-up result; whether the calling user actively hung up is judged according to the hang-up result; if it is judged that the calling user actively hung up, session data is sent to the user module for storage after the calling user hangs up; if it is judged that the system hung up, the session data is sent to the user module for storage after the system hangs up.

For example, after the call ends, whether the calling user or the front-end processor hung up, the entire session data is saved. The saved session data plays an important role in personalized service, system improvement, continuous service and compliance supervision; through reasonable data management and analysis, user needs can be better understood and satisfied, and high-quality, accurate replies can be provided.
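Storing the session after either kind of hang-up can be sketched as below. The record layout, the `HangUpBy` values and the in-memory `SessionStore` are hypothetical; the disclosure only states that session data is sent to the user module for storage in both cases.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class HangUpBy(str, Enum):
    CALLER = "caller"
    SYSTEM = "system"


@dataclass
class SessionRecord:
    calling_number: str
    called_number: str
    hang_up_by: str = ""
    turns: list = field(default_factory=list)


class SessionStore:
    """Hypothetical in-memory stand-in for storage in the user module."""

    def __init__(self) -> None:
        self._rows = []

    def save(self, record: SessionRecord) -> None:
        self._rows.append(json.dumps(asdict(record), ensure_ascii=False))

    def load_all(self) -> list:
        return [json.loads(row) for row in self._rows]


def on_hang_up(result: dict, record: SessionRecord, store: SessionStore) -> None:
    # Validate who hung up (raises ValueError for unknown values), then store
    # the session in both cases, as the text describes.
    record.hang_up_by = HangUpBy(result["hang_up_by"]).value
    store.save(record)


store = SessionStore()
record = SessionRecord(
    "13900000002", "13800000001",
    turns=[{"caller": "Is Li free?", "reply": "He is busy right now."}],
)
on_hang_up({"hang_up_by": "caller"}, record, store)
saved = store.load_all()
```

Recording who hung up alongside the turns keeps that information available for the later analysis of historical sessions.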

The embodiment of the present disclosure verifies the user information, which helps identify the user, confirms that the identity is valid, and supports personalized needs. The starting audio presents information such as questions, guidance or a welcome to the user and thereby guides the user to answer or ask questions, which helps obtain the user's question accurately at the start of the call and improves the accuracy of the reply. The audio information and the user information are then sent to a corpus processing model, which performs feature recognition and combines the user information for semantic understanding and reply generation; because the corpus processing model is trained with a large amount of training data and deep learning technology, the accuracy of the reply can be improved. Finally, session control is performed according to the received reply information, which helps manage the conversation flow more intelligently, improves the continuity of the session and prevents excessive resource consumption.

The intelligent session response method of the embodiment of the disclosure further comprises the following steps: acquiring historical session data in the user module; classifying the historical session data; clustering and entity extraction are carried out on the user problems in the historical session data according to the classification result; determining a question answer according to the clustering result and the extraction result; and formulating an improvement strategy according to the question answer, and sending the improvement strategy to the NLP engine for corpus processing model optimization.

For example, for the questions posed by users in each class of session data, a clustering algorithm (e.g., K-means or hierarchical clustering) is used to group similar questions. Entity extraction is then performed on the clustered question groups to identify the entity information involved in the questions, such as person names, place names and product names, which helps clarify the context of the questions and the key information to be answered. Within each cluster group, it is determined whether the model answered any questions incorrectly or unsatisfactorily. An improvement strategy is formulated for the answers to those questions, which can be done manually, and the formulated strategy is sent to the corpus processing model to help optimize or retrain it. Through analysis of historical sessions, corpus clustering and entity extraction, it is found which questions the model answers incorrectly or unsatisfactorily and which knowledge points need to be improved or supplemented, so that the model can be optimized in a targeted way and the knowledge base expanded, improving the accuracy of the conversation and user satisfaction.
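The grouping and entity extraction described above can be illustrated with a deliberately simplified sketch. It does not use K-means or hierarchical clustering: it substitutes a greedy grouping by Jaccard token overlap and a dictionary lookup in place of a trained entity recognizer. The `GAZETTEER` entries, the 0.4 threshold and the sample questions are all made up for the example.

```python
import re


def tokens(question: str) -> set:
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_questions(questions, threshold: float = 0.4):
    """Greedy grouping: join the first cluster whose seed question is similar."""
    clusters = []  # each entry: {"seed": token set, "members": [questions]}
    for question in questions:
        toks = tokens(question)
        for cluster in clusters:
            if jaccard(toks, cluster["seed"]) >= threshold:
                cluster["members"].append(question)
                break
        else:
            clusters.append({"seed": toks, "members": [question]})
    return [cluster["members"] for cluster in clusters]


# Hypothetical entity dictionary standing in for a trained extractor.
GAZETTEER = {"beijing": "place", "5g plan": "product"}


def extract_entities(question: str):
    text = question.lower()
    return sorted((name, kind) for name, kind in GAZETTEER.items() if name in text)


questions = [
    "How do I cancel the 5G plan",
    "Can I cancel my 5G plan",
    "Is there coverage in Beijing",
]
groups = cluster_questions(questions)
entities = [extract_entities(q) for q in questions]
```

The two cancellation questions share four of nine distinct tokens (similarity about 0.44), so they fall into one group, while the coverage question starts its own group. Each group and its entities could then be reviewed when formulating the improvement strategy.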

In some embodiments, the classifying the historical session data includes: classifying the historical session data according to the users' ratings of the reply information, the ratings including unanswered, incorrect replies and unsatisfactory replies.

For example, after the session of each calling user is finished, there is an evaluation of the current session, and the evaluation is stored in the historical session data of the corresponding user.
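As a rough illustration of classifying stored sessions by their evaluation, the sketch below groups questions under the three rating categories named above. The field names `rating` and `question` and the category labels are assumptions about how the evaluation might be stored.

```python
CATEGORIES = ("unanswered", "incorrect", "unsatisfactory")


def classify_history(history):
    """Group session questions by rating; other ratings are skipped."""
    groups = {category: [] for category in CATEGORIES}
    for session in history:
        rating = session.get("rating")
        if rating in groups:
            groups[rating].append(session["question"])
    return groups


history = [
    {"question": "How do I cancel the 5G plan", "rating": "incorrect"},
    {"question": "Is Li free tomorrow", "rating": "unanswered"},
    {"question": "What time is it", "rating": "satisfied"},
    {"question": "Can I cancel my 5G plan", "rating": "incorrect"},
]
classified = classify_history(history)
```

Sessions rated as satisfactory are left out here because only the problem categories feed the later clustering step.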

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.

The foregoing is a description of embodiments of the method, and the following further describes embodiments of the present disclosure through examples of apparatus.

Fig. 2 shows a block diagram of an intelligent session answering device according to an embodiment of the present disclosure, the device comprising:

the acquisition module 201 is configured to acquire user information of a calling user, and send the user information to the user module for user information verification;

the processing module 202 is configured to receive the verification pass instruction and send an open audio acquisition request to the NLP engine;

the first sending module 203 is configured to send the obtained start audio to a calling user side for playing;

a recording module 204, configured to record audio information of a calling user;

the second sending module 205 is configured to send the audio information and the user information to a corpus processing model in an NLP engine for recognition and generate reply information;

and the control module 206 is used for performing session control according to the received reply information.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the described modules may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
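To show how modules 201 to 206 might fit together, here is a compact sketch with stubbed collaborators. `StubUserModule`, `StubNlpEngine`, `FrontEndProcessor` and the dictionary-based reply format are hypothetical; audio is represented as raw bytes purely for illustration.

```python
class StubUserModule:
    """Hypothetical user module: accepts any call that has a called number."""

    def verify(self, info: dict) -> bool:
        return bool(info.get("called_number"))


class StubNlpEngine:
    """Hypothetical NLP engine returning canned audio."""

    def opening_audio(self) -> bytes:
        return b"greeting"

    def reply(self, audio: bytes, info: dict) -> dict:
        return {"flag": "end", "reply_audio": b"answer:" + audio}


class FrontEndProcessor:
    def __init__(self, user_module, nlp_engine) -> None:
        self.user_module = user_module
        self.nlp_engine = nlp_engine
        self.sent = []        # audio sent to the calling user side
        self.hung_up = False

    def acquire(self, info):               # acquisition module 201
        return self.user_module.verify(info)

    def request_opening(self):             # processing module 202
        return self.nlp_engine.opening_audio()

    def send_opening(self, audio):         # first sending module 203
        self.sent.append(audio)

    def record(self, caller_speech):       # recording module 204
        return caller_speech

    def send_to_model(self, audio, info):  # second sending module 205
        return self.nlp_engine.reply(audio, info)

    def control(self, reply):              # control module 206
        if reply["flag"] == "continue":
            self.sent.append(reply["question_audio"])
        else:
            self.sent.append(reply["reply_audio"])
            self.hung_up = True

    def handle_call(self, info, caller_speech) -> bool:
        if not self.acquire(info):
            return False
        self.send_opening(self.request_opening())
        audio = self.record(caller_speech)
        self.control(self.send_to_model(audio, info))
        return True


fep = FrontEndProcessor(StubUserModule(), StubNlpEngine())
handled = fep.handle_call(
    {"calling_number": "13900000002", "called_number": "13800000001",
     "forward_type": "busy"},
    b"hi",
)
```

Each method maps to one module of the device, so the stubs could be swapped for real implementations without changing `handle_call`.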

In the technical solution of the disclosure, the acquisition, storage, application and the like of the user personal information involved all comply with relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.

Fig. 3 shows a schematic block diagram of an electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

The electronic device includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in a ROM 302 or a computer program loaded from a storage unit 308 into a RAM 303. In the RAM 303, various programs and data required for the operation of the electronic device can also be stored. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other by a bus 304. An I/O interface 305 is also connected to the bus 304.

A number of components in the electronic device are connected to the I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, etc.; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, an optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the electronic device to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 301 performs the various methods and processes described above, such as the intelligent session response method. For example, in some embodiments, the intelligent session response method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the intelligent session response method described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the intelligent session response method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems-on-Chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a readable storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The readable storage medium may be a machine-readable signal medium or a machine-readable storage medium. The readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: display means for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow described above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. An intelligent session response method applied to a front-end processor is characterized by comprising the following steps:

the acquired starting audio is sent to a calling user side for playing;

recording audio information of a calling user;

and performing session control according to the received reply information.

2. The intelligent session answer method of claim 1 in which the method further comprises:

after the audio information of the calling user is recorded, the audio information is subjected to text conversion and voice breakpoint recognition and is sent to a corpus processing model in an NLP engine.

3. The intelligent session response method according to claim 1, wherein the performing session control according to the received reply information includes:

judging a session mark in the reply information;

4. The intelligent session answer method of claim 3 further comprising:

after the audio questions in the reply information are sent to the calling user side, obtaining the reply information of the calling user side;

generating a conversation round number according to the reply information;

the session round number is sent to the NLP engine to judge the round number;

5. The intelligent session answer method of claim 1, further comprising:

after session control is carried out according to the received reply information, a session control result is obtained, wherein the session control result comprises a hang-up result;

6. The intelligent session answer method of claim 1, further comprising:

acquiring historical session data in the user module;

classifying the historical session data;

7. The intelligent session answer method of claim 6 in which said classifying said historical session data comprises:

8. An intelligent session answering device, comprising:

9. An electronic device, comprising:

at least one processor;

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.