CN112133306B - Response method and device based on express delivery user and computer equipment - Google Patents


Info

Publication number
CN112133306B
CN112133306B (application CN202010766516.2A)
Authority
CN
China
Prior art keywords
voice
user
text information
scene
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010766516.2A
Other languages
Chinese (zh)
Other versions
CN112133306A (en)
Inventor
周韶宁
张兴海
钟磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Baishi Technology Co Ltd
Original Assignee
Zhejiang Baishi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Baishi Technology Co Ltd
Priority to CN202010766516.2A
Publication of CN112133306A
Application granted
Publication of CN112133306B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M 3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M 3/5166 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing, in combination with interactive voice response systems or voice portals, e.g. as front-ends

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a response method, a response device, and computer equipment for express-delivery users. The method is executed by a server and comprises the following steps: S100, receiving and recognizing voice from a user terminal; S102, judging whether the voice content can be recognized, and if so, proceeding to S104; S104, translating the recognized user voice into text information and storing the text information in a database; S106, identifying the intention of the text information; S108, invoking the scene corresponding to the intention; S110, invoking a dialogue-script action configuration platform according to the scene; S112, converting the text of the corresponding reply script in the dialogue-script action configuration platform into voice information and outputting it to the user terminal.

Description

Response method and device based on express delivery user and computer equipment
Technical Field
The application relates to the field of intelligent voice customer service, in particular to a response method, a response device, computer equipment and a computer storage medium based on express users.
Background
Customer service telephones in the express industry provide consumers with many parcel-related services, such as waybill inquiry, waybill status inquiry, telephone ordering, business consultation, complaints and suggestions, and manual service, and effectively resolve users' questions. However, the express industry still largely builds intelligent voice customer-service robots on key-press voice menus. This key-press selection mode limits user input, is inflexible, and often cannot cover the user's incoming-call feedback in a multi-dimensional, high-coverage way. For example:
the traditional voice customer-service main menu is long and consumes a great deal of broadcast time; users often do not have the patience to listen to the whole menu recording and hang up directly, which greatly increases repeat incoming calls and the rate of transfer to human agents;
user input is limited, and a key-press interaction mode alone cannot meet user needs; the actual user experience is poor, the negative-review rate of the intelligent customer-service system rises, and the satisfaction rate and FCR (first-call resolution) fall;
the system is not intelligent enough: data fed back by users cannot be continuously collected, scenes cannot be flexibly switched according to user input, intent is not recognized, and user experience is poor.
Disclosure of Invention
In order to solve at least one of the above problems, a first aspect of the present application provides a response method based on an express user, which is executed by a server, including:
s100, receiving and recognizing voice from a user;
s102, judging whether the voice content can be identified, and if so, turning to S104;
s104, translating the content of the recognized user voice into text information and storing the text information in a database;
s106, identifying the intention of the text information;
s108, calling a corresponding scene according to the intention;
s110, invoking a dialogue-script action configuration platform according to the scene;
s112, converting the text of the corresponding reply script in the dialogue-script action configuration platform into voice information and outputting it to the user terminal.
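The steps above can be sketched as a single control flow. Every component call here (the ASR transcript lookup, intent recognizer, scene lookup, script platform, and TTS) is a hypothetical stand-in, not an API from the patent or any real product:

```python
# Illustrative control-flow sketch of steps S100-S112; all components are
# hypothetical stand-ins injected as plain callables.

def answer_call(audio, database, recognize_intent, scene_for, script_platform, tts):
    text = audio.get("transcript")          # S100/S102: stand-in for ASR recognition
    if text is None:
        return None                         # S103 path (re-query / human agent) handled elsewhere
    database.append(text)                   # S104: store the transcript
    intent = recognize_intent(text)         # S106: identify the intent
    scene = scene_for(intent)               # S108: invoke the corresponding scene
    reply_text = script_platform[scene]     # S110: look up the reply script for the scene
    return tts(reply_text)                  # S112: synthesize the reply voice

db = []
reply = answer_call(
    {"transcript": "I want to track a parcel"},
    db,
    recognize_intent=lambda t: "track_parcel",
    scene_for=lambda i: "tracking",
    script_platform={"tracking": "Your parcel is in transit."},
    tts=lambda t: t.upper(),                # stand-in for speech synthesis
)
```

Returning `None` when no transcript is available leaves the S103 retry branch to the caller, matching the split between S102 and S103 in the method.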
In a specific embodiment, if the voice content is not recognized, the method further includes S103:
the server broadcasts an inquiry voice and recognizes the user's response voice again; after two unsuccessful attempts, the call is transferred to a human customer-service agent.
In a specific embodiment, the step S106 further includes:
s1060, performing word segmentation on the text information to obtain a plurality of word-segment texts, performing MITIE vectorization on each word segment to obtain corresponding 271-dimensional vectors, and taking a weighted average of the 271-dimensional vectors;
s1062, splitting the text information into single characters to obtain a plurality of single-character texts, performing BERT vectorization on each character to obtain corresponding 768-dimensional vectors, and taking a weighted average of the 768-dimensional vectors;
s1064, concatenating the weighted-average 271-dimensional vector with the weighted-average 768-dimensional vector to obtain a 1039-dimensional vector corresponding to the text information;
s1066, identifying the intention of the 1039-dimensional vector corresponding to the text information using a classifier model,
wherein the classifier model is obtained as follows: steps S1060 to S1064 are performed on the historical dialogues recorded in the database to obtain their corresponding 1039-dimensional vectors; N intention-labeled 1039-dimensional vectors of historical dialogues are input as training samples, the training samples covering M intentions, and the classifier model is obtained by training.
In a specific embodiment, the step S1066 further includes:
before the classifier model is trained, a loss function is defined; the classifier model takes the loss function as the optimization target and is solved iteratively by an adaptive gradient-descent method to obtain the classifier model,
where i ∈ {1, 2, 3, ..., N}, N being a natural number representing the number of training samples; j ∈ {1, 2, 3, ..., M}, M being a natural number representing the number of intentions of the training samples; x_i represents a training sample; y_j represents the true intent of x_i; and y_j' represents the intent most similar to y_j.
In a specific embodiment, the step S108 further includes:
the server broadcasts a scene-confirmation voice, receives the confirmation response voice from the user terminal, and recognizes and judges whether the scene is correct; if so, S110 is executed; if not, the user is asked whether to recognize again or transfer to a human customer-service agent, and if recognition is repeated, the method returns to S106.
In a specific embodiment, the step S110 further includes:
the reply script for the corresponding scene stored in the dialogue-script action configuration platform is predicted according to the scene and the voice dialogue input by the user terminal.
In a specific embodiment, the step S112 further includes:
a dialogue output model is trained according to the regional dialect or accent characteristics of the user; the corresponding text is converted into voice according to this model and the voice is output and fed back to the user.
The second aspect of the present application provides a response device based on an express user, including:
the voice receiving module is used for receiving and recognizing the voice from the user side;
a judging module for judging whether the voice content can be recognized and, if so, passing control to the translation module;
a translation module for translating the content of the recognized user voice into text information and storing in a database;
the intention recognition module is used for recognizing the intention of the text information;
the scene module is used for calling a corresponding scene according to the intention;
a dialogue-script action configuration module for invoking the dialogue-script action configuration platform according to the corresponding scene;
and a synthesized-voice module for converting the text of the corresponding reply script in the dialogue-script action configuration platform into voice information and outputting it to the user terminal.
A third aspect of the present application provides a computer apparatus comprising:
one or more processors;
a storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the express-user-based response method described above.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the express-user-based response method described above.
The beneficial effects of the application are as follows:
the technical scheme of the application can solve the problems of poor satisfaction degree and insufficient intelligence of intelligent voice customer service based on the key menu. The intelligent voice customer service of the express industry is realized, the intelligent and intelligent performance is realized, and a large amount of labor cost is saved. And intelligent voice full-voice conversion is realized, and a menu selection button is not required to be pressed. The user can directly speak, the robot recognizes the dialogue content by an intention recognition method, and enters a corresponding scene, so that the user experience is greatly improved; the efficiency of solving the problem of intelligent customer service is improved, the secondary incoming call of a user is reduced, the labor rate of transfer is reduced, and the labor cost is greatly saved; through a more intelligent intention recognition algorithm, the whole system can become more intelligent in the iterative process.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a system architecture diagram of a response method based on an express user according to an embodiment of the present application.
Fig. 2 shows a flowchart of a response method based on an express user according to an embodiment of the present application.
Fig. 3 shows a schematic diagram of an express user based answering device according to one embodiment of the present application.
Fig. 4 shows a schematic structural diagram of a computer device implementing the response method of the present application according to an embodiment of the present application.
Detailed Description
In order to more clearly illustrate the present application, the present application will be further described with reference to preferred embodiments and the accompanying drawings. Like parts in the drawings are denoted by the same reference numerals. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and that this application is not limited to the details given herein.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the express user-based response method or the express user-based response device of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a client 101, a network 104, and a server 107. The network 104 is the medium used to provide communication links between the clients 101 and the servers 107. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.
A user may interact with the server 107 via the network 104 using the client 101 to receive or transmit voice, etc.
The user terminal 101 may be a variety of electronic devices with a display screen including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 107 may be a server that provides various services, and the server 107 may analyze received voice and process the received voice, and feedback the processing result (e.g., a reply phone) to the client 101.
It should be noted that the numbers of clients, networks and servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Example 1
As shown in fig. 2, an embodiment of the present application provides a response method based on an express user, which in one example is implemented in the server 107 of fig. 1, including:
s100, receiving and recognizing the voice from the user terminal 101.
S102, judging whether the voice content can be identified, and if the voice content can be identified, proceeding to S104.
If the voice content is not recognized, the method further includes S103:
the server broadcasts an inquiry voice and recognizes the user's response voice again; after two unsuccessful attempts, the call is transferred to a human customer-service agent.
In a specific example, the server 107 receives from the user terminal 101 the voice "Hello, I want to send a parcel; the delivery address is 'xxxxxxxxxxxxxxx', the phone number is 'xxxxxxxxxxxx', and the name is 'XXX'". If the server 107 recognizes the voice content, S104 is executed; if the server 107 fails to recognize the content, it broadcasts the inquiry voice "How may I help you?" and again recognizes the user's response voice sent from the user terminal 101. If the user's voice is still not recognized after two inquiry broadcasts, the server 107 transfers the user terminal 101 directly to a human customer-service agent.
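The two-attempt retry rule of S103 can be sketched as follows; `recognize` and the audio inputs are hypothetical stand-ins for the ASR engine and the caller's responses:

```python
# Hedged sketch of S103: after a failed first recognition the server
# re-queries the caller; if both further attempts fail, the call is routed
# to a human agent. Not a real telephony API.

def handle_unrecognized(recognize, response_audios):
    for audio in response_audios[:2]:   # at most two inquiry broadcasts
        text = recognize(audio)
        if text is not None:
            return ("recognized", text)
    return ("human_agent", None)
```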
It should be noted that the user voice content includes, but is not limited to, at least one of sending a parcel, urging delivery, tracking a parcel, business consultation, and complaint recording.
In one specific example, the user's speech is not necessarily standard Mandarin but may be in a local dialect or carry a regional accent; depending on the user's region, the server uses a speech model trained on the local dialect or accent to recognize the content of the user's speech more accurately.
S104, translating the content of the recognized user voice into text information and storing the text information in a database.
In one specific example, ASR (automatic speech recognition) is used to recognize the user's speech and translate it into text information, which is stored in a database.
S106, identifying the intention of the text information, including:
S1060, performing word segmentation on the text information to obtain a plurality of word-segment texts, performing MITIE vectorization on each word segment to obtain corresponding 271-dimensional vectors, and taking a weighted average of the 271-dimensional vectors.
In one specific example, ASR recognizes the user's speech and translates it into the text "I want to track a parcel". This text is segmented into three word segments ("I", "want to", "track a parcel"); each of the three segments is vectorized with MITIE to obtain three 271-dimensional vectors, and the weighted average of the three vectors yields a single 271-dimensional vector representing the three word segments.
S1062, splitting the text information into single characters to obtain a plurality of single-character texts, performing BERT vectorization on each character to obtain corresponding 768-dimensional vectors, and taking a weighted average of the 768-dimensional vectors.
In a specific example, the same text is split into four single-character texts; each character is vectorized with BERT to obtain four 768-dimensional vectors, and the weighted average of the four vectors yields a single 768-dimensional vector representing the four characters.
S1064, concatenating the weighted-average 271-dimensional vector with the weighted-average 768-dimensional vector to obtain the 1039-dimensional vector corresponding to the text information.
In a specific example, the weighted-average 271-dimensional vector is placed first and the weighted-average 768-dimensional vector second, and the two are concatenated to obtain the 1039-dimensional vector corresponding to the text information; this combination characterizes the intent of the user's voice input more accurately.
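A minimal sketch of the feature construction in S1060 to S1064, with random vectors standing in for the real MITIE and BERT embedders and uniform weights standing in for the unspecified weighted average; the token strings are illustrative:

```python
import numpy as np

# Stand-ins for the real embedders: MITIE word vectors are 271-dimensional,
# BERT character vectors are 768-dimensional, as stated in the method.
rng = np.random.default_rng(0)

def mitie_embed(token):                 # stand-in for MITIE word embedding
    return rng.standard_normal(271)

def bert_embed(char):                   # stand-in for BERT character embedding
    return rng.standard_normal(768)

def sentence_vector(word_segments, characters):
    word_vec = np.mean([mitie_embed(t) for t in word_segments], axis=0)  # S1060 (uniform weights)
    char_vec = np.mean([bert_embed(c) for c in characters], axis=0)      # S1062 (uniform weights)
    return np.concatenate([word_vec, char_vec])                          # S1064: 271 + 768 = 1039

vec = sentence_vector(["I", "want-to", "track-parcel"], list("track"))
```

The concatenation order (word-level vector first, character-level vector second) follows the example in the text.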
S1066, identifying the intention of the 1039-dimensional vector corresponding to the text information using the classifier model,
wherein the classifier model is obtained as follows: steps S1060 to S1064 are performed on the historical dialogues recorded in the database to obtain their corresponding 1039-dimensional vectors; N intention-labeled 1039-dimensional vectors of historical dialogues are input as training samples, the training samples covering M intentions, and the classifier model is obtained by training.
In a specific example, a loss function is defined before the classifier model is trained; the classifier model takes the loss function as the optimization target and is solved iteratively by an adaptive gradient-descent method to obtain the classifier model,
where i ∈ {1, 2, 3, ..., N}, N being a natural number representing the number of training samples; j ∈ {1, 2, 3, ..., M}, M being a natural number representing the number of intentions of the training samples; x_i represents a training sample; y_j represents the true intent of x_i; and y_j' represents the intent most similar to y_j. The smaller the loss function, the better the classifier model.
In a specific example, the classifier model is used to identify the intent of the 1039-dimensional vector corresponding to the text information, and the intent of the user's voice is obtained using an optimization method.
S108, invoking the corresponding scene according to the intention, which includes:
the server broadcasts a scene-confirmation voice, receives the confirmation response voice from the user terminal, and recognizes and judges whether the scene is correct; if so, S110 is executed; if not, the user is asked whether to recognize again or transfer to a human customer-service agent, and if recognition is repeated, the method returns to S106.
In a specific example, the server 107 broadcasts the scene corresponding to the user's voice intent for confirmation. If confirmed, the server 107 confirms the user information, including but not limited to at least one of name, phone number, and address; if not, the server 107 broadcasts an inquiry asking whether the user wants to recognize again or transfer to a human agent, and after receiving the user's response, either re-recognizes the user's voice content and translates it into text information, or transfers the call directly to a human customer-service agent.
S110, invoking the dialogue-script action configuration platform according to the scene.
In a specific example, the reply script for the corresponding scene stored in the dialogue-script action configuration platform is predicted according to the scene and the voice dialogue input by the user terminal. The dialogue-script action configuration platform includes at least one of general scripts, standard-question scripts, branch scripts, and main-flow scripts, and different reply scripts can be configured for different scenes. For example, for the parcel-tracking intent, a pacifying reply is played first, and then the real-time delivery status of the parcel is reported.
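One way to picture the script configuration described above is a mapping from scene to an ordered list of reply scripts, so a pacifying line precedes the real-time status line; all names and texts here are illustrative assumptions, not the platform's actual schema:

```python
# Hypothetical dialogue-script configuration: each scene maps to an ordered
# list of reply scripts, some with slot placeholders filled at reply time.
SCRIPT_PLATFORM = {
    "tracking": [
        "Please don't worry, I am checking your parcel right now.",  # pacifying script first
        "Your parcel {waybill} is currently {status}.",              # then real-time status
    ],
    "send_parcel": ["May I have the pickup address, please?"],
}

def replies_for(scene, **slots):
    # str.format ignores unused keyword slots, so plain scripts pass through.
    return [s.format(**slots) for s in SCRIPT_PLATFORM[scene]]

msgs = replies_for("tracking", waybill="75312", status="in transit")
```

Keeping the scripts as data rather than code is what lets operators "configure different reply scripts for different scenes" without redeploying the server.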
S112, converting the reply actions and text of the corresponding reply script in the dialogue-script action configuration platform into voice information and outputting it to the user terminal.
In a specific example, the conversion of the text in the script library into voice may be performed according to the characteristics of the user's regional dialect or accent: a dedicated model is trained, and the corresponding text is converted into voice according to that model, output, and fed back to the user. A first intent-inquiry voice is then generated based on the current scene and sent to the user. For example, if the current scene of the user's voice is parcel tracking, after the server 107 feeds back the real-time delivery status of the parcel to the user terminal 101, the server 107 actively asks the user terminal 101, e.g. "Would you like me to urge the delivery along?", and determines the next intent of the user terminal 101 until all of the user terminal 101's intents have been handled.
To address the problems described above, the application provides a response method based on express users that solves the poor satisfaction and insufficient intelligence of intelligent voice customer service based on key-press menus. It provides intelligent, humanized voice customer service for the express industry and saves substantial labor cost. Interaction is fully voice-driven, with no menu buttons to press: the user simply speaks, the robot recognizes the dialogue content through intent recognition and enters the corresponding scene, greatly improving the user experience; the problem-solving efficiency of intelligent customer service is improved, repeat incoming calls and the transfer-to-human rate are reduced, greatly saving labor cost; and with a more intelligent intent-recognition algorithm, the whole system grows smarter with each iteration.
Example two
As shown in fig. 3, as an implementation of the above-mentioned response method based on the express user, an embodiment of the present application provides a response device based on the express user, where an embodiment of the device corresponds to the embodiment of the method shown in fig. 2.
An express user-based response device, comprising:
and the voice receiving module is used for receiving and recognizing the voice from the user side.
A judging module for judging whether the voice content can be recognized and, if so, passing control to the translation module.
And the translation module is used for translating the content of the recognized user voice into text information and storing the text information in the database.
And the intention recognition module is used for recognizing the intention of the text information.
And the scene module is used for calling the corresponding scene according to the intention.
A dialogue-script action configuration module for invoking the dialogue-script action configuration platform according to the corresponding scene.
And a synthesized-voice module for converting the reply actions and text of the corresponding reply script in the dialogue-script action configuration platform into voice information and outputting it to the user terminal.
It will be appreciated by those skilled in the art that the intelligent answering device applying the method of Embodiment 1 also includes other known structures, such as a processor, a memory, etc.
To address the problems described above, the application provides a response device based on express users that solves the poor satisfaction and insufficient intelligence of intelligent voice customer service based on key-press menus. It provides intelligent, humanized voice customer service for the express industry and saves substantial labor cost. Interaction is fully voice-driven, with no menu buttons to press: the user simply speaks, the robot recognizes the dialogue content through intent recognition and enters the corresponding scene, greatly improving the user experience; the problem-solving efficiency of intelligent customer service is improved, repeat incoming calls and the transfer-to-human rate are reduced, greatly saving labor cost; and with a more intelligent intent-recognition algorithm, the whole system grows smarter with each iteration.
Example III
As shown in FIG. 4, one embodiment of the present application provides a schematic structural diagram of a computer device, and the computer device 12 shown in FIG. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in FIG. 4, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown in fig. 4, the network adapter 20 communicates with other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the method provided by the first embodiment of the present application.
To address the problems described above, the present application provides a computer device that overcomes the poor user satisfaction and limited intelligence of key-menu-based intelligent voice customer service. Intelligent voice customer service for the express delivery industry is realized, saving substantial labor costs. The interaction is fully voice-driven, with no menu selection buttons to press: the user simply speaks, the robot recognizes the dialogue content through an intention recognition method and enters the corresponding scene, greatly improving the user experience. The efficiency with which the intelligent customer service resolves problems is improved, repeat incoming calls from users are reduced, the rate of transfer to human agents is lowered, and labor costs are greatly saved. With a more intelligent intention recognition algorithm, the whole system becomes smarter over successive iterations.
Example IV
Another embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as provided in the above embodiment.
In practical applications, the computer-readable storage medium may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
To address the problems described above, the present application provides a non-volatile computer-readable storage medium that overcomes the poor user satisfaction and limited intelligence of key-menu-based intelligent voice customer service. Intelligent voice customer service for the express delivery industry is realized, saving substantial labor costs. The interaction is fully voice-driven, with no menu selection buttons to press: the user simply speaks, the robot recognizes the dialogue content through an intention recognition method and enters the corresponding scene, greatly improving the user experience. The efficiency with which the intelligent customer service resolves problems is improved, repeat incoming calls from users are reduced, the rate of transfer to human agents is lowered, and labor costs are greatly saved. With a more intelligent intention recognition algorithm, the whole system becomes smarter over successive iterations.
It should be understood that the foregoing examples of the present application are provided merely for clearly illustrating the present application and are not intended to limit the embodiments of the present application, and that various other changes and modifications may be made therein by one skilled in the art without departing from the spirit and scope of the present application as defined by the appended claims.

Claims (8)

1. A response method based on an express delivery user, executed by a server, characterized by comprising the following steps:
s100, receiving and recognizing voice from a user;
s102, judging whether the voice content can be identified, and if so, turning to S104;
s104, translating the content of the recognized user voice into text information and storing the text information in a database;
s106, identifying the intention of the text information;
s108, calling a corresponding scene according to the intention;
s110, calling a speaking operation action configuration platform according to the scene;
s112, converting the text information of the corresponding reply phone in the phone action configuration platform into voice information and outputting the voice information to the user terminal;
the S106 further includes:
s1060, performing word segmentation on the text information to obtain a plurality of word segmentation text information, performing mitie quantization on the plurality of word segmentation text information respectively to obtain corresponding 271-dimensional vectors, and performing weighted average on the plurality of 271-dimensional vectors;
s1062, dividing the text information into single words to obtain a plurality of single word text information, respectively carrying out bert quantization on the plurality of single word text information to respectively obtain corresponding 768-dimensional vectors, and carrying out weighted average on the obtained 768-dimensional vectors;
s1064, combining the vector of 271 dimensions after weighted averaging with the vector of 768 dimensions after weighted averaging to obtain a vector of 1039 dimensions corresponding to the text information; the method comprises the steps of carrying out combination and splicing on a vector of 271 dimensions after weighted averaging and a vector of 768 dimensions after weighted averaging to obtain a vector of 1039 dimensions corresponding to the text information;
s1066, identifying the intention of the 1039-dimensional vector corresponding to the text information using the classifier model,
wherein the classifier model is obtained by: performing the steps of S1060 to S1064 on the history dialogue recorded in the database to obtain a 1039-dimensional vector corresponding to the history dialogue, inputting N1039-dimensional vectors corresponding to the history dialogue with marked intentions as training samples, wherein the intentions of the training samples are M intentions, and training is performed to obtain the training samples;
the S1066 further includes:
the classifier model takes a loss function as its optimization target and is obtained by iterative solution using an adaptive gradient descent method, the loss function being as follows:
wherein i ∈ {1, 2, 3, ..., N}, N being a natural number representing the number of training samples; j ∈ {1, 2, 3, ..., M}, M being a natural number representing the number of intentions of the training samples; x_i represents a training sample; y_j represents the true intent of x_i; and y_j' represents the intent most similar to y_j.
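Steps S1060 to S1064 of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `pseudo_embedding` helper is a hypothetical stand-in for the real mitie (271-dimensional, word-level) and bert (768-dimensional, character-level) encoders named in the claim, and uniform averaging replaces the unspecified weights.

```python
import numpy as np

def pseudo_embedding(token, dim):
    # Deterministic stand-in for a real encoder: a hypothetical placeholder
    # for the mitie/bert vectors referred to in the claim.
    seed = abs(hash(token)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def text_to_1039_vector(segmented_words):
    """S1060-S1064: average word-level 271-d vectors and character-level
    768-d vectors, then concatenate into a single 1039-d feature."""
    chars = [c for w in segmented_words for c in w]
    v271 = np.average([pseudo_embedding(w, 271) for w in segmented_words], axis=0)  # S1060
    v768 = np.average([pseudo_embedding(c, 768) for c in chars], axis=0)            # S1062
    return np.concatenate([v271, v768])                                             # S1064

feature = text_to_1039_vector(["track", "my", "parcel"])
print(feature.shape)  # (1039,)
```

The resulting 1039-dimensional feature is what S1066 would feed into the classifier model; any standard multi-class classifier accepting fixed-length vectors fits this interface.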
2. The method according to claim 1, wherein if the voice content cannot be recognized, the method further comprises S103:
the server broadcasts an inquiry voice and recognizes the user's response voice again; if recognition fails twice, the call is transferred to human customer service.
3. The method of claim 1, wherein S108 further comprises:
the server broadcasts the scene confirmation voice, receives the confirmation response voice of the user side, recognizes and judges whether the scene is correct, and if yes, executes S110; if not, reminding whether to carry out recognition again or transfer to manual customer service processing, and if so, returning to S106.
4. The method of claim 1, wherein S110 further comprises:
and predicting the reply speaking operation of the corresponding scene stored by the speaking operation action configuration platform according to the scene and the voice dialogue input by the user side.
5. The method of claim 1, wherein S112 further comprises:
training a dialogue output model according to the regional dialect or accent characteristics of the user, converting the corresponding text into voice according to the model, and outputting the voice as feedback to the user.
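One illustrative sketch of the model selection this claim describes is given below; the region names, model identifiers, and the string placeholder standing in for synthesized audio are all hypothetical assumptions, not part of the claim.

```python
# Hypothetical mapping from a user's region to a dialect-adapted TTS model.
TTS_MODELS = {"guangdong": "cantonese_tts", "sichuan": "sichuanese_tts"}

def synthesize_reply(text, region):
    # Fall back to a standard-Mandarin model when no dialect model exists.
    model = TTS_MODELS.get(region, "mandarin_tts")
    return f"[{model}] {text}"  # placeholder for real audio synthesis

print(synthesize_reply("your parcel has been delivered", "guangdong"))
# [cantonese_tts] your parcel has been delivered
```

In a real deployment the dialect-specific models would be trained offline on regional speech data and the placeholder string replaced by an audio buffer streamed back to the user side.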
6. A response device based on an express delivery user, performing the method according to any one of claims 1-5, characterized by comprising:
the voice receiving module is used for receiving and recognizing the voice from the user side;
a judging module for judging whether the voice content can be recognized, and if so, passing to the translation module;
a translation module for translating the content of the recognized user voice into text information and storing it in a database;
the intention recognition module is used for recognizing the intention of the text information;
the scene module is used for calling a corresponding scene according to the intention;
the speaking operation action configuration module is used for calling a speaking operation action configuration platform according to the scene;
a synthesized voice module for converting the text information of the corresponding reply speaking operation in the speaking operation action configuration platform into voice information and outputting it to the user side.
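The module chain of this device claim (receive voice → judge recognizability → translate to text → recognize intention → select scene → fetch reply speaking operation → synthesize voice) can be sketched end to end as below. Every table entry, function name, and return value is an illustrative assumption: simple stubs stand in for the speech recognizer, the classifier of claim 1, and the speech synthesizer.

```python
# Hypothetical intention -> scene and scene -> reply-script tables.
SCENES = {"query_progress": "parcel_tracking"}
SCRIPTS = {"parcel_tracking": "Your parcel is out for delivery."}

def recognize(audio):                 # S100/S102: None when unrecognizable
    return audio if isinstance(audio, str) and audio else None

def translate(speech, db):            # S104: store the text in a database stand-in
    db.append(speech)
    return speech

def recognize_intention(text):        # S106: keyword stub for the classifier model
    return "query_progress" if "parcel" in text else "unknown"

def respond(audio):
    db = []
    text = recognize(audio)
    if text is None:
        return "transfer_to_human"    # S103 fallback path
    text = translate(text, db)
    intention = recognize_intention(text)
    scene = SCENES.get(intention)     # S108: call the corresponding scene
    script = SCRIPTS.get(scene, "Sorry, please repeat.")  # S110: fetch reply script
    return f"<tts>{script}</tts>"     # S112: text-to-speech placeholder

print(respond("where is my parcel"))  # <tts>Your parcel is out for delivery.</tts>
```

Each stub corresponds to one module of the device; in the claimed system the stubs would be replaced by the voice receiving, judging, translation, intention recognition, scene, speaking operation configuration, and synthesized voice modules respectively.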
7. A computer device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN202010766516.2A 2020-08-03 2020-08-03 Response method and device based on express delivery user and computer equipment Active CN112133306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010766516.2A CN112133306B (en) 2020-08-03 2020-08-03 Response method and device based on express delivery user and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010766516.2A CN112133306B (en) 2020-08-03 2020-08-03 Response method and device based on express delivery user and computer equipment

Publications (2)

Publication Number Publication Date
CN112133306A CN112133306A (en) 2020-12-25
CN112133306B true CN112133306B (en) 2023-10-03

Family

ID=73851465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010766516.2A Active CN112133306B (en) 2020-08-03 2020-08-03 Response method and device based on express delivery user and computer equipment

Country Status (1)

Country Link
CN (1) CN112133306B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113314152A (en) * 2021-07-07 2021-08-27 上海中通吉网络技术有限公司 Method and equipment for judging whether call is effectively dialed out

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107135247A (en) * 2017-02-16 2017-09-05 江苏南大电子信息技术股份有限公司 A kind of service system and method for the intelligent coordinated work of person to person's work
CN110633475A (en) * 2019-09-27 2019-12-31 安徽咪鼠科技有限公司 Natural language understanding method, device and system based on computer scene and storage medium
CN111199149A (en) * 2019-12-17 2020-05-26 航天信息股份有限公司 Intelligent statement clarifying method and system for dialog system
CN111274824A (en) * 2020-01-20 2020-06-12 文思海辉智科科技有限公司 Natural language processing method, device, computer equipment and storage medium
CN111428028A (en) * 2020-03-04 2020-07-17 中国平安人寿保险股份有限公司 Information classification method based on deep learning and related equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于RASA的智能语音对话系统 (Intelligent voice dialogue system based on RASA); 王雅君 (Wang Yajun); 《优秀硕士学位论文》 (Excellent Master's Dissertations); 2019-08-31; full text *

Also Published As

Publication number Publication date
CN112133306A (en) 2020-12-25

Similar Documents

Publication Publication Date Title
CN110381221B (en) Call processing method, device, system, equipment and computer storage medium
CN105206272A (en) Voice transmission control method and system
CN102004624A (en) Voice recognition control system and method
WO2022142031A1 (en) Invalid call determination method and apparatus, computer device, and storage medium
CN106847256A (en) A kind of voice converts chat method
CN105120373A (en) Voice transmission control method and voice transmission control system
CN112235470B (en) Incoming call client follow-up method, device and equipment based on voice recognition
CN112989046A (en) Real-time speech technology prejudging method, device, computer equipment and storage medium
CN112133306B (en) Response method and device based on express delivery user and computer equipment
CN111400463B (en) Dialogue response method, device, equipment and medium
CN113111658A (en) Method, device, equipment and storage medium for checking information
CN113282733B (en) Customer service problem matching method, system, equipment and storage medium
CN111556096B (en) Information pushing method, device, medium and electronic equipment
CN115878768A (en) NLP-based vehicle insurance service call-back clue recommendation method and related equipment thereof
CN112632241A (en) Method, device, equipment and computer readable medium for intelligent conversation
CN111859902A (en) Text processing method, device, equipment and medium
CN111770236A (en) Conversation processing method, device, system, server and storage medium
EP4120245A2 (en) Method and apparatus for processing audio data, and electronic device
CN114757186B (en) User intention analysis method and device, computer storage medium and electronic equipment
CN111782775A (en) Dialogue method, device, equipment and medium
CN114493513B (en) Voice processing-based hotel management method and device and electronic equipment
CN112711654B (en) Chinese character interpretation technique generation method, system, equipment and medium for voice robot
CN114501112B (en) Method, apparatus, device, medium, and article for generating video notes
US12022026B2 (en) System and method for serving multiple customers by a live agent
CN114567700A (en) Interaction method, interaction device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant