CN113452853B - Voice interaction method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113452853B CN113452853B CN202110760477.XA CN202110760477A CN113452853B CN 113452853 B CN113452853 B CN 113452853B CN 202110760477 A CN202110760477 A CN 202110760477A CN 113452853 B CN113452853 B CN 113452853B
- Authority
- CN
- China
- Prior art keywords
- voice interaction
- voice
- outbound
- task
- priority
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers
- H04M3/527—Centralised call answering arrangements not requiring operator intervention
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Abstract
The disclosure provides a voice interaction method and device, electronic equipment and a storage medium, and relates to the technical field of computers. The voice interaction method comprises the following steps: collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters; setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task; performing voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene; and performing voice interaction with the target object through the interactive voice data so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes. The technical scheme of the embodiment of the disclosure can effectively improve the voice outbound efficiency and reduce the voice outbound cost.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of communication networks, the number of users of communication network services has increased greatly, and accordingly many network services need to be provided to users in a voice outbound manner.
Currently, voice outbound calls are realized either manually or by Interactive Voice Response (IVR) technology based on voice recognition, semantic understanding and voice synthesis. When the voice outbound is realized manually, a large user base means not only low efficiency and huge labor cost, but also possible emotional problems of service personnel; when the voice outbound is realized by interactive voice response technology, the answers of the voice interaction are relatively fixed and may fail to match the user's questions, so the interaction effect is not ideal and the efficiency of the voice interaction is low.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium, so as to overcome the problems of low voice outbound efficiency and unsatisfactory voice outbound effect in related schemes at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, there is provided a voice interaction method, including:
collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
performing voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and performing voice interaction with the target object through the interactive voice data so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
In some example embodiments of the present disclosure, based on the foregoing scheme, the collecting of voice interaction tasks from different data sources includes:
acquiring voice interaction tasks imported in batches from a management system based on a preset outbound task import template; and/or
Acquiring a voice interaction task from a third-party system based on an open standard interface; and/or
And capturing the voice interaction task from the third-party system based on the data acquisition tool.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a pre-configured field conversion mapping relation; the voice interaction task in the standard format comprises a fixed static field and a dynamic personalized parameter.
In some example embodiments of the present disclosure, based on the foregoing scheme, setting a priority of each of the voice interaction tasks based on an outbound scenario corresponding to each of the voice interaction tasks includes:
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
and acquiring the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute so as to finish setting the priority of each voice interaction task.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
and distributing the voice relay lines corresponding to different outbound scenes.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
the preset filtering conditions comprise black and white list filtering conditions, number segment filtering conditions, calling time period filtering conditions, multi-scene cross filtering conditions, repeated calling filtering conditions and calling scene directional filtering conditions.
In some example embodiments of the present disclosure, based on the foregoing solution, the generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalization parameter and the outbound scenario includes:
acquiring a machine script template corresponding to the outbound scene; the machine script template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
In some example embodiments of the present disclosure, based on the foregoing, the method further includes:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task is from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other subsequent logic processing.
According to a second aspect of the embodiments of the present disclosure, there is provided a voice interaction apparatus, including:
the voice interaction task acquisition module is used for acquiring voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
the priority determining module is used for setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
the interactive voice data generation module is used for carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and the voice interaction module is used for carrying out voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction tasks under a plurality of outbound scenes.
According to a third aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory having computer readable instructions stored thereon, the computer readable instructions, when executed by the processor, implementing the voice interaction method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a voice interaction method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the voice interaction method in the disclosed example embodiment collects voice interaction tasks from different data sources, the voice interaction tasks comprise dynamic personalized parameters, and then sets the priority of each voice interaction task based on an outbound scene corresponding to each voice interaction task; performing voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and an outbound scene; and performing voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction task under a plurality of outbound scenes. On one hand, interactive voice data corresponding to the voice interaction task are generated through the dynamic personalized parameters contained in the collected voice interaction task, so that the interactive voice data can change along with the change of the dynamic personalized parameters under different outbound scenes, the interactive voice data can better conform to interactive scenes, the flexibility and the accuracy of the interactive voice data are improved, and the user experience is improved; on the other hand, the voice interaction tasks are collected from different data sources, and the voice outbound operation is carried out on the target object corresponding to the voice interaction tasks according to the priority, so that the repeated outbound of the same target is avoided, the possibility of multi-scene concurrent voice outbound is realized, the voice outbound efficiency is effectively improved, and the labor cost is reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort. In the drawings:
FIG. 1 schematically illustrates a flow diagram of a voice interaction method, in accordance with some embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow diagram for building priorities for voice interaction tasks, in accordance with some embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram for prioritizing construction of number pools, in accordance with some embodiments of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for filtering voice interaction tasks, in accordance with some embodiments of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for assembling interactive voice data, in accordance with some embodiments of the present disclosure;
FIG. 6 schematically illustrates a flow diagram for implementing an outbound operation with assembled interactive voice data, in accordance with some embodiments of the present disclosure;
FIG. 7 schematically illustrates a flow diagram of interaction records corresponding to a returned voice interaction task, in accordance with some embodiments of the present disclosure;
FIG. 8 schematically illustrates a flow diagram for enabling voice interaction, in accordance with some embodiments of the present disclosure;
FIG. 9 schematically illustrates an application scenario diagram of a voice interaction method, in accordance with some embodiments of the present disclosure;
FIG. 10 schematically illustrates a schematic diagram of a voice interaction device, in accordance with some embodiments of the present disclosure;
FIG. 11 schematically illustrates a structural schematic of a computer system of an electronic device, in accordance with some embodiments of the present disclosure;
fig. 12 schematically illustrates a schematic diagram of a computer-readable storage medium, according to some embodiments of the present disclosure.
In the drawings, like or corresponding reference characters designate like or corresponding parts.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the drawings are merely schematic illustrations and are not necessarily drawn to scale. The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
In this exemplary embodiment, first, a voice interaction method is provided, where the voice interaction method may be applied to a server or a terminal device, and this is not particularly limited in this exemplary embodiment. Taking the server to execute the method as an example, a voice interaction method is described below, and fig. 1 schematically illustrates a schematic diagram of a flow of the voice interaction method according to some embodiments of the present disclosure, and referring to fig. 1, the voice interaction method may include the following steps:
step S110, collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
step S120, setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task;
step S130, carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and step S140, performing voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction task under a plurality of outbound scenes.
According to the voice interaction method in the embodiment, on one hand, interactive voice data corresponding to the voice interaction task are generated through the dynamic personalized parameters contained in the collected voice interaction task, so that the interactive voice data can change along with the change of the dynamic personalized parameters under different outbound scenes, the interactive voice data can better conform to interactive scenes, the flexibility and the accuracy of the interactive voice data are improved, and the user experience is improved; on the other hand, the voice interaction tasks are collected from different data sources, and the voice outbound operation is carried out on the target object corresponding to the voice interaction tasks according to the priority, so that the repeated outbound of the same target is avoided, the multi-scene concurrent voice outbound is possible, the voice outbound efficiency is effectively improved, and the labor cost is reduced.
Next, the voice interaction method in the present exemplary embodiment will be further explained.
In step S110, voice interaction tasks are collected from different data sources; the voice interaction task includes dynamic personalization parameters.
In an example embodiment of the present disclosure, the voice interaction task refers to an outbound task of an interactive voice response provided to the user based on a specific requirement, for example, the voice interaction task may be an outbound task of a data query service provided to the user based on the interactive voice response, such as prompting the user by voice to select different number keys to implement query of data such as telephone charge, traffic, and the like; the voice interaction task may also be an outbound task of a recommendation service such as call package recommendation and traffic package recommendation provided to the user based on the interactive voice response, for example, contents of the call package and the traffic package are introduced by voice, and operations such as fast opening or canceling of the call package and the traffic package are realized by an answer of the user.
Because the voice interaction task in the related technical scheme can only be set or imported through the management system and cannot interface with other data sources, the sources of voice interaction tasks are few, the application scenarios are limited, and the scheme is not flexible enough; therefore, in the present exemplary embodiment, the voice interaction tasks are collected from a plurality of different data sources.
Specifically, the different data sources may include a management system, a third-party system, and the like, and the voice interaction tasks imported in batches may be acquired from the management system based on a preset outbound task import template; and/or acquiring a voice interaction task from a third-party system based on an open standard interface; and/or capturing voice interaction tasks from a third-party system based on the data collection tool.
The preset outbound task import template is a template generated by the management system for triggering voice interaction tasks: a machine script can be selected when an outbound scene is created, and the management system automatically generates the outbound task import template according to the script configuration and the personalized variable information; an operator can download the outbound task import template and trigger the outbound of voice interaction tasks by filling in the relevant task data in the template format.
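The patent does not give a concrete template format; as a rough illustration, assuming a CSV-style import template whose columns are the fixed task fields plus the personalized variables declared by the scene's machine script (all field names below are hypothetical), a batch import file might be produced like this:

```python
import csv

# Hypothetical fixed fields required by every outbound task, plus the
# personalized variables declared by the machine script of this scene.
FIXED_FIELDS = ["phone_number", "target_name"]
SCRIPT_VARIABLES = ["balance", "package_name"]  # e.g. {variable 1}, {variable 2}

def write_import_template(path, rows):
    """Write a batch of outbound tasks in the import-template format."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIXED_FIELDS + SCRIPT_VARIABLES)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)

write_import_template("outbound_tasks.csv", [
    {"phone_number": "13800000000", "target_name": "User A",
     "balance": "12.50", "package_name": "20GB monthly plan"},
])
```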
The open standard interface can include, but is not limited to, an HTTP POST JSON standard interface, and can support single-task, batch and streaming submission of voice interaction tasks from a third-party system, thereby meeting the requirements of third-party systems in most application scenarios, effectively expanding the application scenarios that voice interaction tasks can support, and improving the flexibility of deployment in each application scenario.
In this example, the voice interaction task may be acquired passively, through active submission by another data source, or acquired actively; for example, a preset data acquisition tool may capture voice interaction tasks from services such as a database, an FTP (File Transfer Protocol) server, and a message queue in a third-party system.
Of course, the above is merely an illustrative example, and the voice interaction task may be collected from different data sources in various ways in the present exemplary embodiment, which is not limited in this exemplary embodiment.
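For the open standard interface mentioned above, a third-party system might submit a single voice interaction task roughly as follows; the endpoint URL, field names, and payload layout are all assumptions, since the patent only specifies an HTTP POST interface with a JSON body:

```python
import json
import urllib.request

# Hypothetical endpoint; the patent only states an HTTP POST JSON interface.
ENDPOINT = "http://voice-platform.example.com/api/outbound/tasks"

task = {
    "scene_id": "fee-reminder",          # outbound scene the task belongs to
    "phone_number": "13800000000",       # target object to call
    "priority": 2,                       # priority attribute of the task
    "dynamic_params": {                  # dynamic personalized parameters
        "balance": "12.50",
        "due_date": "2021-07-31",
    },
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(task).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8"))
```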
In an exemplary embodiment of the present disclosure, in order to ensure the compatibility of each voice interaction task and enable it to be successfully called out, after the voice interaction tasks are collected from different data sources, they may be uniformly passed through a data protocol adaptation layer for protocol and data conversion; during the conversion, the collected voice interaction tasks may be converted into voice interaction tasks in a standard format according to a pre-configured field conversion mapping relation.
Specifically, the protocol of each voice interaction task can be uniformly converted into the protocol used inside the system (such as an HTTP POST JSON protocol); then, according to the configured field conversion rules, the fields in the original data message are converted into the standard fields and format required by the voice interaction task inside the system, with the information required by the system uniformly converted into JSON-format data; next, the dynamic personalized variables required by the outbound scene are checked against the JSON-format data according to the configured field conversion rules, and the next logical processing is executed only after the check passes. After conversion, the standard-format data corresponding to the voice interaction task (fixed static fields + dynamic JSON data) can be stored in the database, generating the voice interaction task data inside the system. This realizes a unified processing mechanism for multi-source data fusion and greatly improves the interfacing efficiency between systems of different data sources.
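As a minimal sketch of the field conversion just described (the mapping rules and field names are hypothetical), raw messages from different sources could be normalized into the "fixed static fields + dynamic JSON" form like this:

```python
import json

# Hypothetical field-conversion mapping for one data source: raw field -> standard field.
FIELD_MAPPING = {"mobile": "phone_number", "cust_name": "target_name"}
# Dynamic personalized variables required by the outbound scene.
REQUIRED_VARIABLES = {"balance", "due_date"}

def to_standard_task(raw, scene_id):
    """Convert a raw task message into the internal standard format."""
    task = {"scene_id": scene_id}
    dynamic = {}
    for field, value in raw.items():
        if field in FIELD_MAPPING:
            task[FIELD_MAPPING[field]] = value      # fixed static field
        else:
            dynamic[field] = value                  # candidate dynamic parameter
    missing = REQUIRED_VARIABLES - dynamic.keys()
    if missing:
        raise ValueError(f"scene {scene_id} missing variables: {missing}")
    task["dynamic_params"] = json.dumps(dynamic)    # stored as JSON data
    return task

print(to_standard_task(
    {"mobile": "13800000000", "cust_name": "User A",
     "balance": "12.50", "due_date": "2021-07-31"},
    scene_id="fee-reminder"))
```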
In step S120, a priority of each of the voice interaction tasks is set based on an outbound scenario corresponding to each of the voice interaction tasks.
In an example embodiment of the present disclosure, the outbound scenario refers to an application scenario corresponding to different voice interaction tasks, for example, an outbound scenario in which the voice interaction tasks such as telephone fee query and traffic query belong to a self-service query type, and an outbound scenario in which the voice interaction tasks such as arrearage notification belong to an active notification type, and of course, the outbound scenario may be set by a user according to different types of voice interaction tasks, which is not particularly limited in this example embodiment.
Specifically, step S120 may further include step S210 and step S220, and the priority of each voice interaction task may be set based on the outbound scenario corresponding to each voice interaction task, which is implemented based on step S210 and step S220, and as shown in fig. 2, the method specifically includes the following steps:
step S210, constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
step S220, obtaining the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute to finish setting the priority of each voice interaction task.
The number pool refers to an outbound number set by an administrator in relay management, and the priority attribute refers to attribute information for judging priority set for each voice interaction task, for example, the priority attribute may be a priority level set for the voice interaction task, such as priority level 1, priority level 2, priority level 3, priority level 4, priority level 5, and the like; the priority attribute may also be a type of an outbound scene corresponding to the voice interaction task, and certainly, the priority attribute may also be other attribute data capable of distinguishing a priority level corresponding to the voice interaction task, which is not particularly limited in this example embodiment.
In an example embodiment of the present disclosure, voice trunk lines corresponding to different outbound scenarios may be assigned. A voice trunk line (Trunk Line) refers to the lines and associated equipment that directly connect two switching systems. Allocating independent voice trunk lines to different outbound scenes improves the outbound success rate of voice interaction tasks, and thus the outbound efficiency, when a plurality of outbound scenes initiate outbound calls simultaneously.
Fig. 3 schematically illustrates a flow diagram of prioritizing construction of number pools, according to some embodiments of the present disclosure.
Referring to fig. 3, in an application scenario where a multi-outbound scenario is concurrently outbound, a priority function of a voice interaction task may be implemented by dividing a voice relay line and managing a number pool, and specifically may be implemented by the following steps:
step S310, when the outbound scene is created based on the machine script template, the required number of voice trunk lines can be allocated, and the allocated voice trunk lines are exclusively occupied by the corresponding outbound scene; of course, the number of voice trunk lines can be readjusted in the system according to actual scene requirements;
step S320, when there are voice interaction tasks requiring outbound in the outbound scene, a plurality of number pools can be constructed, by default according to priority level; each voice interaction task is placed into the designated number pool according to its priority attribute, and the scheduling thread then extracts voice interaction tasks from the number pools in the priority order of the pools to carry out rule detection.
It should be noted that if, due to parallel outbound in the multi-outbound scenario, a voice interaction task targets a number that is already being called, the queuing parameter of the voice interaction task (for example, a combination of timestamp and serial number) is adjusted so that the task is placed at the end of the number pool queue of its outbound scene, which effectively avoids the poor outbound experience caused by calling the same user continuously.
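A rough sketch of the number-pool scheduling just described (all names hypothetical): one FIFO pool is kept per priority level, tasks are drawn in priority order, and a task whose number is already on a call is re-queued at the tail of its pool:

```python
from collections import deque

class NumberPools:
    """Priority number pools for one outbound scene (illustrative only)."""

    def __init__(self, priorities=(1, 2, 3, 4, 5)):
        # One FIFO pool per priority level; priority 1 is drawn first.
        self.pools = {p: deque() for p in priorities}
        self.numbers_on_call = set()

    def put(self, task):
        self.pools[task["priority"]].append(task)

    def next_task(self):
        """Draw the next task in priority order, deferring numbers already on a call."""
        for priority in sorted(self.pools):
            pool = self.pools[priority]
            for _ in range(len(pool)):
                task = pool.popleft()
                if task["phone_number"] in self.numbers_on_call:
                    pool.append(task)  # re-queue at the tail of its pool
                else:
                    return task
        return None

pools = NumberPools()
pools.put({"phone_number": "13800000000", "priority": 2})
pools.put({"phone_number": "13900000000", "priority": 1})
print(pools.next_task())  # the priority-1 task is drawn first
```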
In an example embodiment of the present disclosure, rule detection is performed on the voice interaction tasks extracted from the number pool. Specifically, the voice interaction tasks in different outbound scenes may be filtered according to preset filtering conditions, where the preset filtering conditions may include a black-and-white list filtering condition, a number segment filtering condition, an outbound time period filtering condition, a multi-scene cross filtering condition, a repeated outbound filtering condition, and an outbound scene directional filtering condition.
FIG. 4 schematically illustrates a flow diagram for filtering voice interaction tasks, according to some embodiments of the present disclosure.
Referring to fig. 4, step S410, checking according to the black-and-white list filtering condition: the system can preset a global black-and-white list and an outbound-scene-level black-and-white list, with the outbound-scene-level list taking precedence over the global list; if the check fails, the voice interaction task is directly updated to the outbound failure state;
step S420, checking according to the number segment filtering condition: check whether the number segment of the voice interaction task matches the number segments pre-configured for the outbound scene; if the number segment does not fall within the home range configured for the outbound scene, the voice interaction task is updated to the outbound failure state;
step S430, checking according to the outbound time period filtering condition: check whether the current time falls within the time range pre-configured for the batch and the outbound scene. Because the numbers in the number pool are ordered by pre-computed outbound timestamps, when one voice interaction task does not meet the time requirement, the remaining outbound tasks in the whole number pool would have to wait for the next outbound time period of the batch or outbound scene, causing unnecessary delay and reducing the outbound efficiency; performing the outbound time period filtering check on the voice interaction task therefore effectively improves the outbound efficiency;
step S440, checking according to the multi-scene cross filtering condition: acquire the cross configuration rules of the multi-outbound scenes and cyclically check whether the rule of no more than N outbound calls within M days is met (where M and N are positive integers greater than or equal to 1), as sketched after this list. The specific multi-scene cross filtering process comprises: (1) each outbound scene stores its outbound records by day, for repeated outbound checks and multi-scene cross rule checks within the outbound scene; (2) the multi-outbound-scene cross rules related to the outbound task are acquired; (3) for each outbound scene in the rules, the outbound records from M days before the current date up to the current date are traversed to check whether the same number exists; (4) if the same number exists, its occurrences are accumulated; if the accumulated count exceeds N, the multi-scene cross filtering check fails, and the voice interaction task is updated to the outbound failure state;
step S450, checking according to the repeated outbound filtering condition: according to the repeated-outbound check configuration of the outbound scene, the outbound records of the scene from M days before the current date up to the current date are traversed; if the same number is found, the voice interaction task is directly updated to the outbound failure state;
step S460, checking according to the outbound scene directional filtering condition: different outbound scenes can be adapted through an open universal interface protocol to realize service logic verification based on the outbound scene, checking from a service perspective whether the outbound is allowed; for example, for an arrears notification scene, an interface can be called before the outbound to check whether the user has already paid. If the interface detects that outbound of the voice interaction task is not allowed, the task is directly updated to the outbound failure state.
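As an illustrative sketch of the multi-scene cross check in step S440 (the data layout and names are assumptions), counting a number's outbound records over the last M days across the scenes named in a cross rule might look like this:

```python
from datetime import date, timedelta

# Hypothetical per-scene, per-day outbound records: scene -> day -> numbers called.
outbound_records = {
    "fee-reminder": {date(2021, 7, 1): ["13800000000"],
                     date(2021, 7, 2): ["13800000000"]},
    "package-offer": {date(2021, 7, 2): ["13800000000"]},
}

def passes_cross_rule(number, scenes, m_days, n_times, today=None):
    """Return True if `number` was called at most n_times in the last m_days
    across all `scenes` named in the cross rule, False otherwise."""
    today = today or date.today()
    count = 0
    for scene in scenes:
        for offset in range(m_days + 1):
            day = today - timedelta(days=offset)
            count += outbound_records.get(scene, {}).get(day, []).count(number)
            if count > n_times:
                return False  # rule "no more than N calls in M days" violated
    return True

# Three calls in the window but n_times=2, so the check fails (prints False).
print(passes_cross_rule("13800000000", ["fee-reminder", "package-offer"],
                        m_days=7, n_times=2, today=date(2021, 7, 3)))
```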
Referring to fig. 1, in step S130, a voice outbound operation is performed on a target object corresponding to the voice interaction task according to the priority, and interactive voice data corresponding to the voice interaction task is generated in combination with the dynamic personalized parameter and the outbound scene.
In an example embodiment of the present disclosure, the interactive voice data refers to a voice played by the voice interaction task to the user in the process of voice call, the voice outbound operation refers to an operation of making an outbound request to a user phone number included in the voice interaction task, a phone call channel can be established with the target user through the voice outbound operation, and the corresponding interactive voice data is played through the phone call channel and the user's triggering operation.
Specifically, step S130 may include step S510 to step S530, and the step S510 to step S530 may implement generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameter and the outbound scenario, as shown in fig. 5, which specifically includes:
step S510, a machine script template corresponding to the outbound scene is obtained; the machine script template comprises a plurality of voice interaction nodes;
step S520, determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
step S530, assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
The machine script template refers to a template set in advance for different outbound scenes and used for generating interactive voice data; for example, a machine script template can be "to query {variable 1}, please press 1; to query {variable 2}, please press 2".
The voice interaction node refers to an interaction node, set by the machine script template, at which voice interaction with the user can take place. For example, for the machine script template "to query the call charge, please press 1; to query the remaining traffic, please press 2", when the user selects 1 or 2, the interactive voice data jumps to the part corresponding to "1" or "2"; the interactive voice parts corresponding to "1" and "2" are the voice interaction nodes of the machine script template.
The input information of the target object refers to the information, selectively input by the user, that switches the voice interaction node of the interactive voice data. For example, if the interactive voice data is "to query {telephone charge}, please press 1; to query {remaining traffic}, please press 2", then the "1" or "2" input by the user through an input control of the user terminal, or spoken by the user, is the input information of the target object. Of course, the input information of the target object may also be information input in other manners for switching the voice interaction node of the interactive voice data, which is not particularly limited in this example.
The reply voice data refers to the information in the machine script template reached after the input information has switched the voice interaction node of the interactive voice data. For example, for the interactive voice data "to query the call charge, please press 1; to query the remaining traffic, please press 2", after the user inputs "2", the voice data replied by the robot, "your remaining traffic is {the acquired traffic value}", may be considered the reply voice data. This is merely an illustrative example, and the exemplary embodiment is not limited thereto.
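Putting these definitions together, a minimal sketch of node routing and variable assembly could look as follows; the template text, node keys, and parameter names are all hypothetical:

```python
import re

# Hypothetical machine script template: a start node plus nodes keyed by user input.
SCRIPT_TEMPLATE = {
    "start": "To query your call charge, please press 1; "
             "to query your remaining traffic, please press 2.",
    "1": "Your current call charge balance is {balance} yuan.",
    "2": "Your remaining traffic is {remaining_traffic} MB.",
}

def assemble_reply(node_key, dynamic_params):
    """Fill the node's reply text with the task's dynamic personalized parameters."""
    text = SCRIPT_TEMPLATE[node_key]
    return re.sub(r"\{(\w+)\}", lambda m: str(dynamic_params[m.group(1)]), text)

params = {"balance": "12.50", "remaining_traffic": "2048"}
print(assemble_reply("start", params))
print(assemble_reply("2", params))  # user pressed 2 -> route to the traffic node
```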
Figure 6 schematically illustrates a flow diagram for implementing an outbound operation with assembled interactive voice data, in accordance with some embodiments of the present disclosure.
Referring to fig. 6, during the outbound process, the required personalized variables can be matched against the dynamic personalized parameters in the outbound task, so as to implement the relevant logical judgment, assemble the interactive voice data, and play it to the user. This specifically includes the following steps:
step S610, before the outbound task subsystem submits an outbound request to the CTI (Computer Telephony Integration) system, the personalized variables required by the outbound scene can be extracted and synchronized to the intelligent dialogue subsystem;
step S620, when the outbound call starts, the intelligent dialogue subsystem can route to the start node of the specified machine script template according to the parameters in the CTI request;
step S630, the reply voice data configuration of the start node is obtained and, combined with the dynamic personalized parameters of the outbound task, assembled into complete interactive voice data, which is returned to the CTI and played to the user;
step S640, in the outbound process, besides routing dialogue nodes according to the input information of the user, the voice interaction nodes can also be routed according to the dynamic personalized parameters of the voice interaction task;
step S650, after routing to the target voice interaction node, the reply voice data is assembled into interactive voice data in the same way as step S630 and sent to the CTI for playing.
With continued reference to fig. 1, in step S140, voice interaction is performed on the target object through the interactive voice data, so as to implement concurrent outbound of voice interaction tasks in multiple outbound scenarios.
In an example embodiment of the present disclosure, the target object refers to an object targeted by the voice interaction task, for example, the target object may be a user corresponding to the voice interaction task, or may also be a test script corresponding to the voice interaction task during testing, and of course, the target object may also be a robot capable of implementing voice interaction, which is not limited in this example embodiment.
After the interactive voice data corresponding to the voice interaction task is generated, the voice outbound operation can be triggered according to the outbound priority and over the voice trunk line to establish a communication channel with the user, and voice interaction with the target object is carried out over the established communication channel based on the interactive voice data, so that concurrent outbound of voice interaction tasks in a plurality of outbound scenes can be realized.
In an example embodiment of the present disclosure, after the voice interaction with the target object ends and the voice interaction task is completed, the dialogue data and audio data of the call process may be sorted out and the source of the outbound task determined; if the outbound task originates from a third-party system, the call data may be assembled and fed back to the third-party system together with the originally collected data.
Specifically, when it is detected that the voice interaction task is completed, a voice interaction record corresponding to the voice interaction task may be obtained, and when it is detected that the voice interaction task originates from a third-party system, the voice interaction record may be returned to the third-party system, so that the third-party system performs data association and other subsequent logic processing.
Fig. 7 schematically illustrates a flow diagram of interaction logging corresponding to a return voice interaction task, in accordance with some embodiments of the present disclosure.
Referring to fig. 7, step S710, generating a dialog log: after the outbound call ends, a question-answer dialogue log can be generated according to the execution track of the machine script template during the voice interaction;
step S720, generating call labels: according to the execution track of the machine script template during the voice interaction, combined with the tracking-point information configured for the outbound scene, the label information of the outbound call can be generated, providing a data basis for subsequently constructing the user portrait;
step S730, call recording: during the call, the CTI can generate an original audio file for the user side and the machine side respectively (for example, an 8 kHz, 8-bit, mono audio file in alaw format); after the outbound call ends, the two files are fused along the timeline to generate a whole-process audio file, which is uploaded to the storage service. Meanwhile, during the call, each piece of voice input from the user generates a segmented audio file, which can be used for subsequent log checking and complaint handling and is likewise uploaded to the storage service;
step S740, returning the call result: and judging the data source of the voice interaction task, and if the voice interaction task is from a third-party system, packaging and returning the outbound result, the conversation log, the audio file, the conversation label and the originally acquired dynamic parameter together so as to facilitate the third-party system to perform data association and subsequent logic processing.
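A hedged sketch of the result payload returned in step S740; the structure and field names are assumptions based only on the items the step lists:

```python
import json

def build_callback_payload(task, outbound_result, dialog_log,
                           audio_urls, call_labels):
    """Package everything step S740 lists for return to the third-party system."""
    return json.dumps({
        "task_id": task["task_id"],
        "outbound_result": outbound_result,        # e.g. "answered", "failed"
        "dialog_log": dialog_log,                  # question-answer dialogue log
        "call_labels": call_labels,                # tag information of the call
        "audio_files": audio_urls,                 # whole-process + segmented audio
        "dynamic_params": task["dynamic_params"],  # originally acquired parameters
    }, ensure_ascii=False)

payload = build_callback_payload(
    task={"task_id": "t-001", "dynamic_params": {"balance": "12.50"}},
    outbound_result="answered",
    dialog_log=[{"bot": "To query your call charge, please press 1.", "user": "1"}],
    audio_urls={"full": "https://storage.example.com/t-001-full.alaw"},
    call_labels=["fee-query"],
)
print(payload)
```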
FIG. 8 schematically illustrates a flow diagram for enabling voice interaction, in accordance with some embodiments of the present disclosure.
Referring to fig. 8, step S810, data reception/acquisition;
step S820, establishing a number pool according to scene and priority;
step S830, checking the outbound rule;
step S840, constructing an outbound individualized variable;
step S850, submitting an outbound task to CTI;
step S860, the user and the intelligent voice interaction robot realize intelligent conversation;
step S870, generating a dialog log and label information;
step S880, processing the whole audio file and the segmented audio file;
and step S890, packaging the outbound result, the dialog log, the tag information, the audio and the original dynamic parameter and feeding back the result, the dialog log, the tag information, the audio and the original dynamic parameter to a third-party system.
Fig. 9 schematically illustrates an application scenario diagram of a voice interaction method according to some embodiments of the present disclosure.
Referring to fig. 9, the voice interaction method in this exemplary embodiment may be applied to a platform composed of a plurality of systems, which may include a multi-source fusion outbound task distribution system 901, a CTI system 902, and an intelligent dialogue system 903. The voice interaction method may be executed by the multi-source fusion outbound task distribution system 901, or by other systems according to the specific application scenario, which is not particularly limited in this exemplary embodiment.
The multi-source fusion outbound task distribution system 901 can acquire or collect voice interaction tasks from different data sources and submit them to the CTI system 902; the CTI system 902 generates a dialogue request based on the voice interaction task and sends it to the intelligent dialogue system 903; meanwhile, the multi-source fusion outbound task distribution system 901 also sends the dynamic personalized parameters corresponding to the voice interaction task to the intelligent dialogue system 903; the intelligent dialogue system 903 generates interactive voice data according to the dynamic personalized parameters and the dialogue request, and realizes voice interaction with the user to complete the outbound operation of the voice interaction task.
It should be noted that although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order or that all of the depicted steps must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In addition, in the present exemplary embodiment, a voice interaction apparatus is also provided. Referring to fig. 10, the voice interaction apparatus 1000 includes: the voice interaction task collection module 1010, the priority determination module 1020, the interaction voice data generation module 1030, and the voice interaction module 1040. Wherein:
the voice interaction task collection module 1010 is used for collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
the priority determining module 1020 is configured to set a priority of each of the voice interaction tasks based on an outbound scene corresponding to each of the voice interaction tasks;
the interactive voice data generation module 1030 is configured to perform a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generate interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
the voice interaction module 1040 is configured to perform voice interaction with the target object through the interactive voice data, so as to implement concurrent outbound of voice interaction tasks in multiple outbound scenarios.
In an exemplary embodiment of the disclosure, based on the foregoing solution, the voice interaction task collecting module 1010 may be configured to:
acquiring batch imported voice interaction tasks from a management system based on a preset outbound task import template; and/or
Acquiring a voice interaction task from a third-party system based on an open standard interface; and/or
And capturing the voice interaction task from the third-party system based on the data acquisition tool.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction task conversion module, and the voice interaction task conversion module may be configured to:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a pre-configured field conversion mapping relation; the voice interaction task in the standard format comprises a fixed static field and a dynamic personalized parameter.
In an exemplary embodiment of the disclosure, based on the foregoing scheme, the priority determining module 1020 may further be configured to:
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities;
and acquiring the priority attribute of the voice interaction task, and placing the voice interaction task into the number pool according to the priority attribute so as to finish setting the priority of each voice interaction task.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice trunk line allocation module, and the voice trunk line allocation module may be configured to:
and distributing the voice relay lines corresponding to different outbound scenes.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction task filtering module, and the voice interaction task filtering module may be configured to:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
the preset filtering conditions comprise black and white list filtering conditions, number segment filtering conditions, calling time period filtering conditions, multi-scene cross filtering conditions, repeated calling filtering conditions and calling scene directional filtering conditions.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the interactive voice data generating module 1030 may be further configured to:
acquiring a machine script template corresponding to the outbound scene; the machine script template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to the input information of the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate interactive voice data corresponding to the voice interaction task.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the voice interaction apparatus 1000 may further include a voice interaction recording feedback module, and the voice interaction recording feedback module may be configured to:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task originates from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other subsequent logic processing.
The specific details of each module of the voice interaction apparatus have been described in detail in the corresponding voice interaction method, and therefore are not described herein again.
It should be noted that although several modules or units of the voice interaction device are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above voice interaction method is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1100 according to such an embodiment of the disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 is in the form of a general purpose computing device. The components of the electronic device 1100 may include, but are not limited to: the at least one processing unit 1110, the at least one memory unit 1120, a bus 1130 connecting different system components (including the memory unit 1120 and the processing unit 1110), and a display unit 1140.
The storage unit stores program code executable by the processing unit 1110, so that the processing unit 1110 performs the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section above of this specification. For example, the processing unit 1110 may execute step S110 shown in fig. 1: collecting voice interaction tasks from different data sources, the voice interaction tasks comprising dynamic personalized parameters; step S120: setting the priority of each voice interaction task based on the outbound scene corresponding to each voice interaction task; step S130: performing a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene; and step S140: performing voice interaction with the target object through the interactive voice data, so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
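For orientation, the following minimal sketch mirrors the control flow of steps S110 to S140 only; the method bodies are stubs and the class is a hypothetical illustration, not the disclosed implementation.

```java
// Minimal sketch (assumed) of the S110-S140 flow executed by the
// processing unit: collect tasks, order them by priority, dial each
// target, then interact. Only the control flow mirrors the steps above.
import java.util.Comparator;
import java.util.List;

final class VoiceInteractionPipeline {
    // Lower value = higher priority (an assumption for this sketch).
    record Task(String target, String sceneId, int priority) {}

    void run(List<Task> collected) {                         // S110: tasks collected
        collected.stream()
                .sorted(Comparator.comparingInt(Task::priority)) // S120: priority set per scene
                .forEach(task -> {
                    String voiceData = dialAndGenerate(task);    // S130: outbound call + voice data
                    interact(task, voiceData);                   // S140: voice interaction
                });
    }

    String dialAndGenerate(Task t) { return "..."; } // stub
    void interact(Task t, String voiceData) {}       // stub
}
```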
The storage unit 1120 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 1121 and/or a cache memory unit 1122, and may further include a read only memory unit (ROM) 1123.
The storage unit 1120 may also include a program/utility 1124 having a set (at least one) of program modules 1125, such program modules 1125 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 1100 may also communicate with one or more external devices 1170 (e.g., keyboard, pointing device, Bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 1100, and/or any devices (e.g., router, modem, etc.) that enable the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1150. Also, the electronic device 1100 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet) via the network adapter 1160. As shown, the network adapter 1160 communicates with the other modules of the electronic device 1100 over the bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure as described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 12, a program product 1200 for implementing the above voice interaction method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In scenarios involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described drawings are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes illustrated in the above figures are not intended to indicate or limit the temporal order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (9)
1. A method of voice interaction, comprising:
collecting voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities; acquiring a priority attribute of each voice interaction task, and placing the voice interaction task into a number pool according to the priority attribute to complete the setting of the priority of each voice interaction task; and allocating voice trunk lines corresponding to different outbound scenes, wherein the priority function of the voice interaction tasks is realized through the division of the voice trunk lines and the management of the number pools, and the priority function comprises: when an outbound scene is created based on a machine speech-script template, allocating the number of voice trunk lines required, wherein the allocated voice trunk lines are exclusively occupied by the corresponding outbound scene; and when a voice interaction task to be called out exists in the outbound scene, constructing a plurality of number pools by default according to priority level, placing the voice interaction task into a designated number pool according to its corresponding priority attribute, and extracting, by a scheduling thread, voice interaction tasks from the number pools in the priority order of the number pools for rule detection;
performing a voice outbound operation on a target object corresponding to the voice interaction task according to the priority, and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and performing voice interaction with the target object through the interactive voice data so as to realize concurrent outbound of voice interaction tasks under a plurality of outbound scenes.
2. The voice interaction method of claim 1, wherein the collecting voice interaction tasks from different data sources comprises:
acquiring voice interaction tasks imported in batches from a management system based on a preset outbound task import template; and/or
acquiring voice interaction tasks from a third-party system based on an open standard interface; and/or
capturing voice interaction tasks from a third-party system based on a data collection tool.
3. The voice interaction method according to claim 1 or 2, further comprising:
converting the collected voice interaction tasks into voice interaction tasks in a standard format according to a preconfigured field conversion mapping relation, wherein a voice interaction task in the standard format comprises fixed static fields and dynamic personalized parameters.
4. The voice interaction method of claim 1, further comprising:
filtering the voice interaction tasks under different outbound scenes according to preset filtering conditions;
wherein the preset filtering conditions comprise a blacklist/whitelist filtering condition, a number-segment filtering condition, a calling time period filtering condition, a multi-scene cross-filtering condition, a repeated-call filtering condition, and an outbound-scene directional filtering condition.
5. The voice interaction method according to claim 1, wherein the generating of the interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene comprises:
acquiring a machine speech-script template corresponding to the outbound scene, wherein the machine speech-script template comprises a plurality of voice interaction nodes;
determining a target voice interaction node according to input information from the target object, and acquiring reply voice data corresponding to the target voice interaction node;
and assembling the dynamic personalized parameters into the reply voice data to generate the interactive voice data corresponding to the voice interaction task.
6. The voice interaction method according to claim 1 or 2, further comprising:
if the voice interaction task is detected to be completed, acquiring a voice interaction record corresponding to the voice interaction task;
and if the voice interaction task originates from a third-party system, returning the voice interaction record to the third-party system so that the third-party system can perform data association and other subsequent logic processing.
7. A voice interaction apparatus, comprising:
the voice interaction task acquisition module is used for acquiring voice interaction tasks from different data sources; the voice interaction task comprises dynamic personalized parameters;
the priority determining module is used for constructing a plurality of number pools based on the outbound scenes corresponding to the voice interaction tasks, wherein the number pools correspond to different priorities; acquiring a priority attribute of each voice interaction task, and placing the voice interaction task into a number pool according to the priority attribute to complete the setting of the priority of each voice interaction task; and allocating voice trunk lines corresponding to different outbound scenes, wherein the priority function of the voice interaction tasks is realized through the division of the voice trunk lines and the management of the number pools, and the priority function comprises: when an outbound scene is created based on a machine speech-script template, allocating the number of voice trunk lines required, wherein the allocated voice trunk lines are exclusively occupied by the corresponding outbound scene; and when a voice interaction task to be called out exists in the outbound scene, constructing a plurality of number pools by default according to priority level, placing the voice interaction task into a designated number pool according to its corresponding priority attribute, and extracting, by a scheduling thread, voice interaction tasks from the number pools in the priority order of the number pools for rule detection;
the interactive voice data generation module is used for carrying out voice outbound operation on a target object corresponding to the voice interaction task according to the priority and generating interactive voice data corresponding to the voice interaction task by combining the dynamic personalized parameters and the outbound scene;
and the voice interaction module is used for carrying out voice interaction with the target object through the interactive voice data so as to realize the concurrent outbound of the voice interaction tasks under a plurality of outbound scenes.
8. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of voice interaction of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of voice interaction according to any one of claims 1 to 6.
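To make the number-pool scheduling recited in claims 1 and 7 concrete, the following is a minimal sketch assuming one queue per priority level and a scheduling thread that drains pools in priority order; all names and data structures are hypothetical illustrations, not the claimed implementation.

```java
// Minimal sketch (assumed) of the number-pool scheduling in claims 1 and 7:
// one pool per priority level; the scheduling thread always drains the
// highest-priority non-empty pool and runs rule detection on each task.
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.IntStream;

final class NumberPoolScheduler {
    record Task(String phoneNumber, int priority) {}

    // pools.get(0) is the highest priority, matching the claimed ordering.
    private final List<ConcurrentLinkedQueue<Task>> pools;

    NumberPoolScheduler(int priorityLevels) {
        pools = IntStream.range(0, priorityLevels)
                .mapToObj(i -> new ConcurrentLinkedQueue<Task>())
                .toList();
    }

    // Place a task into the pool designated by its priority attribute.
    void submit(Task task) {
        pools.get(task.priority()).add(task);
    }

    // Scheduling thread: extract tasks in the pools' priority order.
    Task next() {
        for (ConcurrentLinkedQueue<Task> pool : pools) {
            Task t = pool.poll();
            if (t != null) return t;
        }
        return null; // all pools are empty
    }

    // Rule detection stub, e.g. the preset filtering conditions of claim 4.
    boolean passesRules(Task t) { return true; }
}
```

A ConcurrentLinkedQueue keeps submission lock-free, so producer threads placing tasks into pools do not contend with the scheduling thread extracting them.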
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110760477.XA CN113452853B (en) | 2021-07-06 | 2021-07-06 | Voice interaction method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113452853A (en) | 2021-09-28
CN113452853B (en) | 2022-11-18
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |