Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a method, a system, a computer readable storage medium and a device for managing a voice conflict, which are used for solving the problem that the voice conflict cannot be effectively processed when a dual system exists in the prior art, so that a user command cannot be executed correctly.
To achieve the above and other related objects, one aspect of the present invention provides a voice conflict management method applied to a device having dual operating systems, the voice conflict management method comprising: causing a first operating system of the dual operating systems to record voice information, and providing the voice information to a second operating system of the dual operating systems; sending the voice information, and receiving a first recognition result of the voice information by the first operating system and a second recognition result of the voice information by the second operating system; and arbitrating among a plurality of formulated management policies according to the first recognition result and the second recognition result, so that the first operating system or the second operating system responds according to the arbitrated management policy.
In an embodiment of the present invention, the first operating system provides the voice information to the second operating system through an inter-process communication manner.
In an embodiment of the present invention, the first recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the first operating system, a feedback time of the server corresponding to the first operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene; the second recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the second operating system, a feedback time of the server corresponding to the second operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene.
In an embodiment of the present invention, a first management policy in the formulated management policies is to compare the feedback time of the server corresponding to the first operating system and the feedback time of the server corresponding to the second operating system with a preset response time; if the feedback time of the server corresponding to the first operating system does not exceed the preset response time and the feedback time of the server corresponding to the second operating system exceeds the preset response time, the second recognition result is ignored and the first recognition result is used; if the feedback time of the server corresponding to the first operating system exceeds the preset response time and the feedback time of the server corresponding to the second operating system does not exceed the preset response time, the first recognition result is ignored and the second recognition result is used.
In an embodiment of the present invention, a second management policy in the formulated management policies is to divide the scenes according to the domain names corresponding to the scenes, and to preferentially use a predetermined one of the first recognition result and the second recognition result.
In an embodiment of the present invention, a third management policy in the formulated management policies is as follows: if the domain names corresponding to the scenes in the first recognition result and the second recognition result are the same, the confidence level of the scene in the first recognition result is compared with the confidence level of the scene in the second recognition result; if the confidence level of the scene in the first recognition result is greater than that of the scene in the second recognition result, the first recognition result is preferentially used; if the confidence level of the scene in the first recognition result is smaller than that of the scene in the second recognition result, the second recognition result is preferentially used.
In an embodiment of the present invention, a fourth management policy in the formulated management policies is as follows: if the domain name corresponding to the scene in the first recognition result is inconsistent with the domain name corresponding to the scene in the second recognition result, and the confidence level of the scene in the first recognition result exceeds a predetermined confidence threshold, the first recognition result is preferentially used; if the domain names are inconsistent and the confidence level of the scene in the second recognition result exceeds the predetermined confidence threshold, the second recognition result is preferentially used; if the confidence level of the scene in the first recognition result and the confidence level of the scene in the second recognition result are both lower than the predetermined confidence threshold, a preset one of the first recognition result and the second recognition result is preferentially used.
In an embodiment of the present invention, a fifth management policy in the formulated management policies is as follows: if a multi-round scene is triggered, the recognition result corresponding to the first scene of the multi-round scene is preferentially selected from the stored recognition results corresponding to the multi-round scene.
Another aspect of the present invention provides a voice conflict management system applied to a device having dual operating systems, the voice conflict management system comprising: a control module, configured to cause a first operating system of the dual operating systems to record voice information and to provide the voice information to a second operating system of the dual operating systems; a communication module, configured to send the voice information and to receive a first recognition result of the voice information by the first operating system and a second recognition result of the voice information by the second operating system; and an arbitration module, configured to arbitrate among a plurality of formulated management policies according to the first recognition result and the second recognition result, so that the first operating system or the second operating system responds according to the arbitrated management policy.
Still another aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the voice conflict management method.
Another aspect of the invention provides a device comprising: a processor and a memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the device performs the voice conflict management method.
As described above, the voice conflict management method, system, computer readable storage medium and apparatus of the present invention have the following beneficial effects:
The voice conflict management method, system, computer-readable storage medium and device provided by the invention can effectively handle voice conflicts occurring on a dual-system device and correctly execute the user's voice command.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other, different embodiments, and the details of the present description may be modified or varied on the basis of different viewpoints and applications without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other provided they do not conflict.
It should be noted that the drawings provided with the following embodiments illustrate the basic concept of the present invention only schematically. The drawings show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of the components may be changed arbitrarily, and the layout of the components may be more complicated.
Example 1
This embodiment provides a voice conflict management method applied to a device having dual operating systems, the method comprising:
causing a first operating system of the dual operating systems to record voice information, and providing the voice information to a second operating system of the dual operating systems;
sending the voice information, and receiving a first recognition result of the voice information by the first operating system and a second recognition result of the voice information by the second operating system;
and arbitrating among a plurality of formulated management policies according to the first recognition result and the second recognition result, so that the first operating system or the second operating system responds according to the arbitrated management policy.
The voice conflict management method provided by this embodiment is described in detail below with reference to the drawings. The embodiment is applied to a device with two operating systems; for example, the first operating system is a Linux system and the second operating system is an Android system.
Referring to fig. 1, a flow chart of the voice conflict management method in an embodiment is shown. As shown in fig. 1, the voice conflict management method specifically includes the following steps:
S11, causing the first operating system of the dual operating systems to record voice information, and providing the voice information to the second operating system of the dual operating systems.
For example, the Linux system records the voice information and provides it to the Android system through inter-process communication (IPC).
The voice information consists of the recording captured by the dual microphones and a reference sound (2 channels, 16-bit, 16 kHz, PCM).
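As a rough illustration of what the stated format implies for the data handed over by IPC, the following sketch computes the raw PCM data rate and the size of one chunk. The chunk duration `chunk_ms` is a hypothetical parameter; the text does not say how the audio is chunked.

```python
# Format stated in the text: 2 channels, 16-bit samples, 16 kHz, PCM.
CHANNELS = 2
BYTES_PER_SAMPLE = 2        # 16 bits
SAMPLE_RATE_HZ = 16_000


def bytes_per_second() -> int:
    """Raw PCM data rate of the recorded stream."""
    return CHANNELS * BYTES_PER_SAMPLE * SAMPLE_RATE_HZ


def chunk_size_bytes(chunk_ms: int) -> int:
    """Bytes in one chunk lasting chunk_ms milliseconds (hypothetical chunking)."""
    return bytes_per_second() * chunk_ms // 1000


print(bytes_per_second())      # 64000
print(chunk_size_bytes(20))    # 1280
```

At this rate one second of audio is 64,000 bytes, so whichever IPC mechanism is chosen only needs to carry a modest stream.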
S12, sending the voice information, and receiving a first recognition result of the voice information by the first operating system and a second recognition result of the voice information by the second operating system.
In this embodiment, the voice information is sent to the server corresponding to the first operating system and to the server corresponding to the second operating system, respectively.
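One way to picture step S12 is to send the same audio to both servers in parallel and measure how long each takes to answer, since the feedback time is used later during arbitration. The sketch below is only an assumption about how this could be done: the two recognizer functions are stand-ins invented for illustration, and the fields they return are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Recognizer = Callable[[bytes], Dict[str, object]]


def timed_call(recognize: Recognizer, audio: bytes) -> Tuple[Dict[str, object], float]:
    """Run one recognizer and return (result, feedback time in seconds)."""
    start = time.monotonic()
    result = recognize(audio)
    return result, time.monotonic() - start


def dispatch(audio: bytes, first_server: Recognizer, second_server: Recognizer):
    """Send the same audio to both servers concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(timed_call, first_server, audio)
        second = pool.submit(timed_call, second_server, audio)
        return first.result(), second.result()


# Stand-in recognizers; real servers would be reached over the network.
def fake_linux_server(audio: bytes) -> Dict[str, object]:
    return {"domain": "music", "confidence": 0.9}


def fake_android_server(audio: bytes) -> Dict[str, object]:
    time.sleep(0.05)  # simulate a slower reply
    return {"domain": "food", "confidence": 0.7}


(first_result, first_time), (second_result, second_time) = dispatch(
    b"\x00" * 1280, fake_linux_server, fake_android_server
)
```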
In this embodiment, the first recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the first operating system, a feedback time of the server corresponding to the first operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene.
For example, the first recognition result of the voice information at the Linux end includes the corresponding domain, the confidence level, whether a multi-round scene is triggered and, if so, the ID number corresponding to each scene.
The second recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the second operating system, a feedback time of the server corresponding to the second operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene.
For example, the second recognition result of the voice information at the Android end includes the corresponding domain, the confidence level, whether a multi-round scene is triggered and, if so, the ID number corresponding to each scene.
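The fields listed above could be carried in a small record type. The following is a minimal sketch; the field names, the `source` tag and the example values are assumptions made for illustration and do not come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class RecognitionResult:
    source: str                 # which end produced it, e.g. "linux" or "android"
    scene: str                  # scene recognized from the voice information
    domain: str                 # domain name corresponding to the scene
    confidence: float           # confidence level of the scene, 0.0 to 1.0
    feedback_time_s: float      # feedback time of the corresponding server
    multi_round_ids: List[str] = field(default_factory=list)  # IDs if multi-round

    @property
    def is_multi_round(self) -> bool:
        """True when the result carries at least one multi-round scene ID."""
        return bool(self.multi_round_ids)


single = RecognitionResult("linux", "play_song", "music", 0.92, 0.4)
dialog = RecognitionResult("android", "book_room", "hotel", 0.88, 0.6, ["dlg-1"])
```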
S13, arbitrating among a plurality of formulated management policies according to the first recognition result and the second recognition result, so that the first operating system or the second operating system responds according to the arbitrated management policy.
In this embodiment, a first management policy in the formulated management policies is to compare the feedback time of the server corresponding to the first operating system and the feedback time of the server corresponding to the second operating system with a preset response time;
if the feedback time of the server corresponding to the first operating system does not exceed the preset response time and the feedback time of the server corresponding to the second operating system exceeds the preset response time, the second recognition result is ignored and the first recognition result is used;
if the feedback time of the server corresponding to the first operating system exceeds the preset response time and the feedback time of the server corresponding to the second operating system does not exceed the preset response time, the first recognition result is ignored and the second recognition result is used.
For example, if the feedback time of the Linux end exceeds the preset response time and the feedback time of the Android end does not, the recognition result of the Linux end is ignored and the recognition result of the Android end is used directly.
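The first management policy can be sketched as a small function. The `Result` type, the `source` labels and the 1.0 second limit used in the example are hypothetical. The text does not say what happens when both ends answer in time or both are late, so the sketch returns `None` in those cases and leaves the decision to another policy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    source: str              # "linux" or "android" (illustrative labels)
    feedback_time_s: float   # feedback time of the corresponding server


def timeout_policy(first: Result, second: Result,
                   preset_response_time_s: float) -> Optional[Result]:
    """Keep the only result that arrived within the preset response time."""
    first_in_time = first.feedback_time_s <= preset_response_time_s
    second_in_time = second.feedback_time_s <= preset_response_time_s
    if first_in_time and not second_in_time:
        return first
    if second_in_time and not first_in_time:
        return second
    return None  # both in time or both late: not covered by this policy


linux = Result("linux", 1.8)
android = Result("android", 0.6)
chosen = timeout_policy(linux, android, preset_response_time_s=1.0)
```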
A second management policy in the formulated management policies is to divide scenes according to the domain names (domains) corresponding to the scenes, and to preferentially use a predetermined one of the first recognition result and the second recognition result for each domain.
For example, when the scene recognized from the voice information belongs to a domain such as food, hotel or travel, the recognition result of the Android end is preferentially used.
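The second management policy amounts to a lookup from domain to preferred end. In the sketch below only the food, hotel and travel entries come from the example in the text; the `Result` type, the `source` labels and the "music" domain are assumptions. The text does not say which result's domain is consulted, so this sketch checks each result in turn.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Result:
    source: str   # "linux" or "android" (illustrative labels)
    domain: str   # domain name corresponding to the recognized scene


# Domain -> preferred end; entries taken from the example in the text.
PREFERRED_END_BY_DOMAIN: Dict[str, str] = {
    "food": "android",
    "hotel": "android",
    "travel": "android",
}


def domain_policy(first: Result, second: Result) -> Optional[Result]:
    """Return the result of the end that is predetermined for a recognized domain."""
    for candidate in (first, second):
        preferred = PREFERRED_END_BY_DOMAIN.get(candidate.domain)
        if preferred is not None:
            return first if first.source == preferred else second
    return None  # no domain in the table: this policy does not decide


chosen = domain_policy(Result("linux", "music"), Result("android", "food"))
```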
A third management policy in the formulated management policies is as follows: if the domain names corresponding to the scenes in the first recognition result and the second recognition result are the same, the confidence level of the scene in the first recognition result is compared with the confidence level of the scene in the second recognition result; if the confidence level of the scene in the first recognition result is greater than that of the scene in the second recognition result, the first recognition result is preferentially used; if the confidence level of the scene in the first recognition result is smaller than that of the scene in the second recognition result, the second recognition result is preferentially used.
For example, if the confidence level of the scene in the recognition result of the Linux end is higher than that of the scene in the recognition result of the Android end, the recognition result of the Linux end is preferentially used; if the confidence level of the scene in the recognition result of the Linux end is lower than that of the scene in the recognition result of the Android end, the recognition result of the Android end is preferentially used.
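The same-domain comparison described above can be sketched as follows. The `Result` type and the example values are hypothetical. The text does not cover equal confidence levels, so the sketch returns `None` in that case.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    source: str         # "linux" or "android" (illustrative labels)
    domain: str         # domain name corresponding to the recognized scene
    confidence: float   # confidence level of the scene, 0.0 to 1.0


def same_domain_policy(first: Result, second: Result) -> Optional[Result]:
    """When both ends report the same domain, prefer the higher confidence."""
    if first.domain != second.domain:
        return None  # different domains are outside this policy
    if first.confidence > second.confidence:
        return first
    if first.confidence < second.confidence:
        return second
    return None  # equal confidence: not specified in the text


chosen = same_domain_policy(Result("linux", "music", 0.91),
                            Result("android", "music", 0.84))
```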
A fourth management policy in the formulated management policies is as follows: if the domain name corresponding to the scene in the first recognition result is inconsistent with the domain name corresponding to the scene in the second recognition result, and the confidence level of the scene in the first recognition result exceeds a predetermined confidence threshold, the first recognition result is preferentially used; if the domain names are inconsistent and the confidence level of the scene in the second recognition result exceeds the predetermined confidence threshold, the second recognition result is preferentially used; if the confidence level of the scene in the first recognition result and the confidence level of the scene in the second recognition result are both lower than the predetermined confidence threshold, a preset one of the first recognition result and the second recognition result is preferentially used.
For example, if the domain name corresponding to the scene in the recognition result of the Linux end is inconsistent with the domain name corresponding to the scene in the recognition result of the Android end, the recognition result of whichever end has a confidence level exceeding the predetermined confidence threshold (e.g., 80%) is preferentially used. If the confidence levels in the recognition results of both the Linux end and the Android end are lower than the predetermined confidence threshold (e.g., 80%), the preset recognition result, here that of the Linux end, is preferentially used.
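A minimal sketch of the fourth management policy follows, using the 80% threshold and the Linux default from the example. The `Result` type and the `source` labels are assumptions. Two cases are not settled by the text and are handled by assumption here: a confidence level exactly equal to the threshold is treated as not exceeding it, and when both ends exceed the threshold the function returns `None` so that another policy can decide.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    source: str         # "linux" or "android" (illustrative labels)
    domain: str         # domain name corresponding to the recognized scene
    confidence: float   # confidence level of the scene, 0.0 to 1.0


def cross_domain_policy(first: Result, second: Result,
                        threshold: float = 0.80,
                        default_source: str = "linux") -> Optional[Result]:
    """When the domains differ, prefer the end whose confidence exceeds the threshold."""
    if first.domain == second.domain:
        return None  # same domain: outside this policy
    first_high = first.confidence > threshold
    second_high = second.confidence > threshold
    if first_high and not second_high:
        return first
    if second_high and not first_high:
        return second
    if not first_high and not second_high:
        # Neither exceeds the threshold: fall back to the preset end.
        return first if first.source == default_source else second
    return None  # both exceed the threshold: not specified in the text


chosen = cross_domain_policy(Result("linux", "music", 0.55),
                             Result("android", "food", 0.86))
```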
A fifth management policy in the formulated management policies is as follows: if a multi-round scene is triggered, the recognition result corresponding to the first scene of the multi-round scene is preferentially selected from the stored recognition results corresponding to the multi-round scene, which ensures that the multi-round scene is handled entirely by one end.
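One possible reading of the fifth management policy is that the end which handled the first round of a multi-round scene is remembered, and later rounds carrying the same scene ID are routed to that end. The sketch below follows that reading; the `MultiRoundTracker` class, its method names and the ID values are hypothetical, and the text does not describe how the results are actually stored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Result:
    source: str                                    # "linux" or "android"
    confidence: float                              # confidence level of the scene
    multi_round_ids: List[str] = field(default_factory=list)


class MultiRoundTracker:
    """Remembers which end handled the first round of each multi-round scene."""

    def __init__(self) -> None:
        self._owner_by_id: Dict[str, str] = {}

    def remember(self, winner: Result) -> None:
        """Store the winning end for every multi-round ID it carries."""
        for scene_id in winner.multi_round_ids:
            self._owner_by_id.setdefault(scene_id, winner.source)

    def apply(self, first: Result, second: Result) -> Optional[Result]:
        """Return the result from the end that already owns an ongoing scene."""
        for candidate in (first, second):
            if any(self._owner_by_id.get(i) == candidate.source
                   for i in candidate.multi_round_ids):
                return candidate
        return None  # no ongoing multi-round scene: this policy does not decide


tracker = MultiRoundTracker()
tracker.remember(Result("linux", 0.9, ["dlg-1"]))          # first round won by Linux
follow_up = tracker.apply(Result("linux", 0.5, ["dlg-1"]),  # second round
                          Result("android", 0.95))
```

Here the follow-up round stays with the Linux end even though the Android end reports a higher confidence level, which matches the stated goal that one end handles the whole multi-round scene.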
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described voice conflict management method. Those of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments described above may be carried out by hardware under the control of a computer program. The computer program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments described above. The storage medium includes any medium that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The voice conflict management method provided by the embodiment can effectively process the voice conflict phenomenon occurring on the dual system and correctly execute the voice command of the user.
Example 2
This embodiment provides a voice conflict management system. It should be understood that the following division of the management system into modules is only a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element while the others are implemented in hardware. For example, the x module may be a separately provided processing element, may be integrated in a chip of the device, or may be stored in a memory of the device in the form of program code whose function is invoked and executed by a processing element of the device. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application-Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field-Programmable Gate Arrays (FPGAs). For another example, when a module is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Referring to fig. 2, a schematic diagram of the voice conflict management system in an embodiment is shown. As shown in fig. 2, the voice conflict management system 2 includes a control module 21, a communication module 22, and an arbitration module 23.
The control module 21 is configured to cause the first operating system of the dual operating systems to record voice information and to provide the voice information to the second operating system of the dual operating systems. The control module 21 controls the first operating system to provide the voice information to the second operating system through inter-process communication.
The communication module 22, coupled to the control module 21, is configured to send the voice information and to receive a first recognition result of the voice information by the first operating system and a second recognition result of the voice information by the second operating system. In this embodiment, the communication module 22 sends the voice information to the server corresponding to the first operating system and to the server corresponding to the second operating system, respectively. The server corresponding to the first operating system and the server corresponding to the second operating system each recognize a scene from the voice information and feed back the first recognition result and the second recognition result, respectively.
In this embodiment, the first recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the first operating system, a feedback time of the server corresponding to the first operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene.
The second recognition result of the voice information includes a scene recognized from the voice information by the server corresponding to the second operating system, a feedback time of the server corresponding to the second operating system, a domain name corresponding to the scene, a confidence level of the scene, and/or, if a multi-round scene is triggered, an identifier corresponding to each scene.
The arbitration module 23, coupled to the communication module 22, is configured to arbitrate among a plurality of formulated management policies according to the first recognition result and the second recognition result, so that the first operating system or the second operating system responds according to the arbitrated management policy.
In this embodiment, a first management policy in the formulated management policies is to compare the feedback time of the server corresponding to the first operating system and the feedback time of the server corresponding to the second operating system with a preset response time;
if the feedback time of the server corresponding to the first operating system does not exceed the preset response time and the feedback time of the server corresponding to the second operating system exceeds the preset response time, the second recognition result is ignored and the first recognition result is used;
if the feedback time of the server corresponding to the first operating system exceeds the preset response time and the feedback time of the server corresponding to the second operating system does not exceed the preset response time, the first recognition result is ignored and the second recognition result is used.
A second management policy in the formulated management policies is to divide scenes according to the domain names (domains) corresponding to the scenes, and to preferentially use a predetermined one of the first recognition result and the second recognition result for each domain.
A third management policy in the formulated management policies is as follows: if the domain names corresponding to the scenes in the first recognition result and the second recognition result are the same, the confidence level of the scene in the first recognition result is compared with the confidence level of the scene in the second recognition result; if the confidence level of the scene in the first recognition result is greater than that of the scene in the second recognition result, the first recognition result is preferentially used; if the confidence level of the scene in the first recognition result is smaller than that of the scene in the second recognition result, the second recognition result is preferentially used.
A fourth management policy in the formulated management policies is as follows: if the domain name corresponding to the scene in the first recognition result is inconsistent with the domain name corresponding to the scene in the second recognition result, and the confidence level of the scene in the first recognition result exceeds a predetermined confidence threshold, the first recognition result is preferentially used; if the domain names are inconsistent and the confidence level of the scene in the second recognition result exceeds the predetermined confidence threshold, the second recognition result is preferentially used; if the confidence level of the scene in the first recognition result and the confidence level of the scene in the second recognition result are both lower than the predetermined confidence threshold, a preset one of the first recognition result and the second recognition result is preferentially used.
A fifth management policy in the formulated management policies is as follows: if a multi-round scene is triggered, the recognition result corresponding to the first scene of the multi-round scene is preferentially selected from the stored recognition results corresponding to the multi-round scene, which ensures that the multi-round scene is handled entirely by one end.
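The arbitration module 23 could be organized as a chain in which each management policy either picks a result or declines, and the first policy that picks wins. The sketch below shows that structure with two of the policies. The order of evaluation, the `Result` type, the 1.0 second limit and the final fallback to the Linux end are all assumptions; the text does not state how the policies are ordered or combined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Result:
    source: str              # "linux" or "android" (illustrative labels)
    domain: str              # domain name corresponding to the scene
    confidence: float        # confidence level of the scene
    feedback_time_s: float   # feedback time of the corresponding server


Policy = Callable[[Result, Result], Optional[Result]]


def timeout_policy(limit_s: float) -> Policy:
    """First policy: keep the only result that arrived within limit_s."""
    def policy(first: Result, second: Result) -> Optional[Result]:
        first_ok = first.feedback_time_s <= limit_s
        second_ok = second.feedback_time_s <= limit_s
        if first_ok != second_ok:
            return first if first_ok else second
        return None
    return policy


def same_domain_policy(first: Result, second: Result) -> Optional[Result]:
    """Third policy: same domain, prefer the higher confidence level."""
    if first.domain != second.domain or first.confidence == second.confidence:
        return None
    return first if first.confidence > second.confidence else second


def arbitrate(first: Result, second: Result, policies: Sequence[Policy],
              default_source: str = "linux") -> Result:
    """Apply the policies in order; the first one that decides wins."""
    for policy in policies:
        chosen = policy(first, second)
        if chosen is not None:
            return chosen
    return first if first.source == default_source else second  # assumed fallback


policies = [timeout_policy(1.0), same_domain_policy]
winner = arbitrate(Result("linux", "music", 0.7, 0.4),
                   Result("android", "music", 0.9, 0.5), policies)
```

Keeping each policy as a separate callable means the domain, cross-domain and multi-round policies could be added to the list without changing `arbitrate`.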
Example 3
This embodiment provides a device comprising: a processor, a memory, a transceiver, a communication interface, and a system bus. The memory and the communication interface are connected to the processor and the transceiver through the system bus and communicate with one another through it; the memory is used to store the computer program, the communication interface is used to communicate with other devices, and the processor and the transceiver are used to run the computer program so that the device performs the steps of the voice conflict management method.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or only one type of bus. The communication interface is used to enable communication between the device and other devices (e.g., clients, read-write libraries, and read-only libraries). The memory may comprise Random Access Memory (RAM) and may further comprise non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the method, system, computer readable storage medium and device for managing voice conflict provided by the present invention can effectively process the voice conflict phenomenon occurring on the dual system, and correctly execute the voice command of the user. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles of the present invention and its effects, and are not intended to limit the invention. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.