CN116844537A - Voice interaction method, electronic equipment and readable storage medium - Google Patents
Voice interaction method, electronic equipment and readable storage medium
- Publication number
- CN116844537A (application number CN202210300703.0A)
- Authority
- CN
- China
- Prior art keywords
- voice control
- control instruction
- voice
- vehicle
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Telephone Function (AREA)
Abstract
The application provides a voice interaction method, an electronic device and a readable storage medium. The voice interaction method comprises the following steps: acquiring a voice control instruction uploaded by a vehicle-mounted terminal; determining an execution subject of the voice control instruction based on an arbitration mode; and sending the voice control instruction to the corresponding execution subject so that the execution subject executes the voice control instruction, wherein the execution subject is the vehicle-mounted terminal or a handheld terminal. In this way, the car-machine voice recognition function, which has stronger noise reduction, is combined with the handheld terminal's voice recognition function, which supports more controls, so that more voice functions can be realized and the user's voice product experience is greatly improved.
Description
Technical Field
The present application relates to the field of internet of vehicles, and more particularly, to a method for voice interaction, an electronic device, and a computer-readable storage medium.
Background
At present, some vehicle head units (car machines) have a voice recognition function. The main functional logic is that voice is collected through a vehicle-mounted microphone, speech recognition and semantic recognition are performed by a voice engine, and the voice engine then distributes the command to the local car machine or to the car-machine cloud for execution. For example, the command "open the window" is executed locally by the car machine, while "I want to listen to the crosstalk of Guo Degang" needs to be executed by the car-machine cloud.
However, the local car machine or its cloud can only operate the vehicle controls and the entertainment functions supported by the car machine, which limits the user's experience.
Disclosure of Invention
An object of the present application is to provide a voice interaction method that is suitable for interaction control between the voice function of a vehicle-mounted terminal and the voice function of a handheld terminal.
Another object of the application is to provide a voice interaction method that can improve the user's experience of using voice functions in a vehicle.
In order to achieve the above objects, the application acquires a voice control instruction uploaded by the vehicle-mounted terminal, determines the execution subject of the voice control instruction based on an arbitration mode, and sends the voice control instruction to the corresponding execution subject so that the execution subject executes the voice control instruction, wherein the execution subject is the vehicle-mounted terminal or a handheld terminal. In this way, the car-machine voice recognition function, which has stronger noise reduction, is combined with the handheld terminal's voice recognition function, which supports more controls, so that more voice functions can be realized and the user's voice product experience is greatly improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings. Wherein:
FIG. 1 is a flow chart of a method 100 of voice interaction according to an embodiment of the present application;
fig. 2 is a flowchart of step S120 according to an exemplary embodiment of the present application;
fig. 3 is a flowchart of step S130 according to an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method 200 of voice interaction according to an embodiment of the application;
fig. 5 is a flowchart of step S240 according to the first embodiment of the present application;
fig. 6 is a flowchart of step S240 according to a second embodiment of the present application;
fig. 7 is a flowchart of step S240 according to a third embodiment of the present application;
FIG. 8 is a schematic diagram of an exemplary application scenario of a method of voice interaction according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an exemplary application scenario of a method of voice interaction according to an embodiment of the present application;
FIG. 10 is a flow chart illustrating the execution of a method of voice interaction according to an embodiment of the present application;
fig. 11 is a schematic structural view of an electronic device according to an exemplary embodiment of the present application; and
fig. 12 is a schematic structural view of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
For a better understanding of the application, various aspects of the application will be described in more detail with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of exemplary embodiments of the application and is not intended to limit the scope of the application in any way. Like reference numerals refer to like elements throughout the specification. The expression "and/or" includes any and all combinations of one or more of the associated listed items.
It should be noted that in this specification, unless explicitly taught to the contrary, the expressions first, second, third, etc. are used merely to distinguish one feature from another and do not denote any limitation of features, in particular any order of precedence.
It will be further understood that terms such as "comprises", "comprising", "includes", "including", "having", and/or "containing" are open-ended rather than closed-ended in this specification, and mean that the stated features are present but do not preclude the presence or addition of one or more other features and/or groups thereof. Furthermore, when describing embodiments of the application, the use of "may" means "one or more embodiments of the application". Also, the term "exemplary" is intended to refer to an example or illustration.
Unless otherwise defined, all terms (including engineering and technical terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present application pertains. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. In addition, unless explicitly defined or contradicted by context, the particular steps included in the methods described herein need not be limited to the order described, but may be performed in any order or in parallel. The application will be described in detail below with reference to the drawings in connection with embodiments.
It can be understood that a voice recognition function is configured on the car machine of the vehicle, but this voice recognition function can only operate the vehicle controls and the entertainment functions supported by the car machine, which limits the user's experience.
It will be appreciated that handheld terminals are also configured with voice recognition functions, such as Siri on Apple phones, Jovi on vivo phones, Xiaoyi on Huawei phones, Breeno on OPPO phones, and so on. The main functional logic of this voice recognition is that voice is collected through the microphone of the mobile phone, analyzed by the mobile phone system, and finally executed.
Because the service positioning of the car machine is different from that of the mobile phone, the voice recognition function configured on the mobile phone can support more functions. However, when the user is in the car, the environment may be noisy, so the speech recognition performance of the mobile phone is degraded, which also reduces the user's experience.
Based on the above, an embodiment of the application provides a voice interaction method. In this scheme, a voice control instruction uploaded by the vehicle-mounted terminal is acquired; the execution subject of the voice control instruction is determined based on an arbitration mode; and the voice control instruction is sent to the corresponding execution subject so that the execution subject executes the voice control instruction, wherein the execution subject is the vehicle-mounted terminal or a handheld terminal. In this way, the car-machine voice recognition function, which has stronger noise reduction, is combined with the handheld terminal's voice recognition function, which supports more controls, so that more voice functions can be realized and the user's voice product experience is greatly improved.
Hereinafter, specific examples of the present scheme will be described in more detail with reference to the accompanying drawings.
FIG. 1 illustrates a flow chart of a method 100 of voice interaction according to an embodiment of the application. As shown in fig. 1, the method 100 of voice interaction comprises the steps of:
s110, acquiring a voice control instruction uploaded by the vehicle-mounted terminal;
s120, determining an execution subject of the voice control instruction based on an arbitration mode, wherein the execution subject is the vehicle-mounted terminal or a handheld terminal; and
s130, sending the voice control instruction to the corresponding execution subject so that the execution subject executes the voice control instruction.
It should be understood that the steps illustrated in method 100 of voice interaction are not exclusive, that method 100 may also include additional steps not illustrated and/or that illustrated steps may be omitted, and that the scope of the application is not limited in this respect. Step S110 to step S130 are described in detail below with reference to fig. 1 to 3 and 10.
S110
In step S110, a voice control instruction uploaded by the vehicle-mounted terminal is acquired.
In some embodiments, a vehicle microphone with high noise immunity is mounted on the vehicle, so that the voice function of the vehicle is adapted to the noisy environment in which the vehicle is located.
Specifically, the acoustic environment inside the vehicle is harsh, with various noise interferences affecting man-machine interaction, so the collected microphone signals need to be processed to guarantee the performance of subsequent voice wake-up and voice instructions.
In some embodiments, the user audio is subjected to voice enhancement processing through an echo cancellation algorithm and a noise reduction algorithm, so as to strengthen the user's effective audio signal and remove noise interference. Echo cancellation here refers to global acoustic echo cancellation. An acoustic echo is the set of echoes generated when sound played by the device's own speaker is reflected once or more along different paths and re-enters the microphone; it may also be referred to as device self-noise. When a user interacts with the device through speech, the echo signal mixes with the clean speech signal, which deteriorates the signal-to-noise ratio of the collected speech signal and severely interferes with subsequent voice wake-up and reception of voice instructions. The device self-noise is therefore eliminated by an echo cancellation algorithm module, so as to improve the signal-to-noise ratio. The noise reduction algorithm adopts a noise tracking algorithm designed for the characteristics of vehicle-mounted noise: the current noise characteristics in the vehicle are dynamically estimated in real time, the estimated noise is then suppressed to enhance the user's speech, and the performance of the voice system in the vehicle-mounted noise environment is finally improved. The echo cancellation algorithm and the noise reduction algorithm can be implemented with algorithms commonly used in the related art, which are not repeated here. In addition, other algorithms may also be used for the voice enhancement processing of the user audio, and the application is not limited in this respect.
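The patent does not prescribe concrete algorithms for these two stages, so the following Python sketch is illustrative only: it assumes a normalized-LMS adaptive filter for echo cancellation and simple spectral subtraction for noise suppression, with all frame sizes, step sizes and signal parameters chosen arbitrarily.

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=128, mu=0.5, eps=1e-8):
    """Cancel the loudspeaker echo (ref) from the microphone signal (mic)
    with a normalized LMS adaptive filter; returns the echo-reduced signal."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        e = mic[n] - w @ buf                      # error = speech + residual noise
        w += mu * e * buf / (buf @ buf + eps)     # adapt the echo-path estimate
        out[n] = e
    return out

def spectral_subtract(x, frame=512, hop=256, noise_frames=10):
    """Very simple noise suppression: estimate the noise spectrum from the
    first few frames (assumed speech-free) and subtract it from every frame."""
    win = np.hanning(frame)
    spectra = [np.fft.rfft(win * x[i:i + frame])
               for i in range(0, len(x) - frame, hop)]
    noise_mag = np.mean([np.abs(s) for s in spectra[:noise_frames]], axis=0)
    y = np.zeros(len(x))
    for k, s in enumerate(spectra):
        mag = np.maximum(np.abs(s) - noise_mag, 0.05 * np.abs(s))  # spectral floor
        y[k * hop:k * hop + frame] += win * np.fft.irfft(mag * np.exp(1j * np.angle(s)), frame)
    return y

# Example: enhance 1 s of simulated in-cabin audio sampled at 16 kHz.
fs = 16000
ref = np.random.randn(fs)                                    # loudspeaker signal (self-noise source)
mic = 0.3 * np.roll(ref, 40) + 0.05 * np.random.randn(fs)    # echo plus road noise
clean = spectral_subtract(nlms_echo_cancel(mic, ref))
```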
When a user in the vehicle speaks, the voice information is collected through the vehicle-mounted microphone, and the sound signal of the voice information is converted into an electric signal and transmitted to the vehicle-mounted terminal. The vehicle-mounted terminal then analyzes the voice information to obtain the voice control instruction associated with it.
Optionally, a voice engine is arranged in the vehicle-mounted terminal; after the voice engine performs speech recognition, semantic recognition and other processing on the voice information, it parses out the voice control instruction in the voice information and then sends the voice control instruction to the vehicle cloud.
Optionally, the voice engine is arranged in the vehicle cloud; the vehicle-mounted terminal transmits the electric signal of the voice information to the vehicle cloud, and the voice engine arranged in the vehicle cloud parses out the voice control instruction in the voice information after performing speech recognition, semantic recognition and other processing on it.
For example, the user may wake up the voice function with a wake-up word, and once the voice function is determined to be awakened, the user may issue voice information, such as "open the window", "turn down the air-conditioning temperature by 2 °C", "I want to listen to crosstalk/music/the radio", etc.
S120
In step S120, the execution subject of the voice control instruction is determined based on the arbitration mode, where the execution subject is the vehicle-mounted terminal or a handheld terminal.
From the foregoing, the voice function on the existing car machine can only implement a few functions, such as car control, music, radio station, and the like, which limits the use experience of the user.
After receiving the voice control instruction of the user, the application determines, based on the arbitration mode, whether the execution subject of the voice control instruction is the vehicle-mounted terminal or a handheld terminal; if it is the vehicle-mounted terminal, the voice control instruction is sent to the vehicle-mounted terminal, and if it is a handheld terminal, the voice control instruction is sent to the handheld terminal. The handheld terminal can be a smart phone, a tablet (such as an iPad), and the like.
In some embodiments, as shown in fig. 2, the step of determining the execution subject of the voice control instruction based on the arbitration mode includes:
s121, responding to the semantics in the voice control instruction, and determining the skill class to which the voice control instruction belongs; and
s122, determining an execution subject of the voice control instruction according to the corresponding relation between the determined skill type and the vehicle-mounted terminal or the handheld terminal.
Specifically, as shown in fig. 10, an arbitration system is deployed in the vehicle cloud, and the execution subject of the voice control instruction can be determined through the arbitration system. The correspondence between skill classes and the vehicle-mounted terminal or the handheld terminal is stored in the arbitration system. Illustratively, the skill class includes any of a local vehicle control skill, a local multimedia skill, or a mobile phone skill. Correspondingly, the execution subject of the local vehicle control skills and the local multimedia skills is the vehicle-mounted terminal, and the execution subject of the mobile phone skills is the handheld terminal.
Alternatively, the local vehicle control skills are skills for controlling the vehicle, such as "open the window", "increase/decrease the air-conditioning temperature", and the like. The local multimedia skills are skills that can be executed by the vehicle-mounted terminal or a server interconnected with the vehicle-mounted terminal, such as "turn on navigation", "turn on the radio", "weather forecast", "news", "crosstalk", and the like. The mobile phone skills are skills that cannot be executed by the vehicle-mounted terminal or a server interconnected with the vehicle-mounted terminal, such as "NetEase Cloud Music", "QQ Music", "iQIYI video", "train ticket", "WeChat Moments", "make a call", and the like. It will be appreciated that the above examples are merely illustrative of local vehicle control skills, local multimedia skills and mobile phone skills, and the classification may be based on the actual configurations of the vehicle-mounted terminal and the handheld terminal during implementation.
In some embodiments, the above-mentioned correspondence may be set by default, or may be set by the user according to the needs.
So far, in step S121, after the vehicle cloud receives the voice control instruction, the skill type to which the voice control instruction belongs can be determined according to the semantics in the voice control instruction, and then in step S122, the execution subject of the voice control instruction is determined according to the corresponding relationship between the determined skill type and the vehicle-mounted terminal or the handheld terminal.
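As an illustration only (the patent does not specify how the arbitration system is implemented), the sketch below stands in for such a correspondence table; the skill names, keyword lists and the keyword-matching shortcut used in place of real semantic recognition are all assumptions.

```python
# Hypothetical skill names and keyword lists; keyword matching stands in for
# the semantic recognition actually performed by the speech engine.
SKILL_TO_EXECUTOR = {
    "local_vehicle_control": "vehicle_terminal",
    "local_multimedia":      "vehicle_terminal",
    "mobile_phone_skill":    "handheld_terminal",
}

SKILL_KEYWORDS = {
    "local_vehicle_control": ["window", "air conditioning", "seat"],
    "local_multimedia":      ["navigation", "radio", "weather", "news"],
    "mobile_phone_skill":    ["NetEase Cloud Music", "QQ Music", "iQIYI",
                              "train ticket", "WeChat Moments", "call"],
}

def arbitrate(instruction: str) -> str:
    """Return the execution subject for a recognized voice control instruction."""
    for skill, words in SKILL_KEYWORDS.items():
        if any(w.lower() in instruction.lower() for w in words):
            return SKILL_TO_EXECUTOR[skill]
    return "vehicle_terminal"   # default: handle locally

print(arbitrate("open the window"))                                        # -> vehicle_terminal
print(arbitrate("I want to listen to Zhou Jielun with NetEase Cloud Music"))  # -> handheld_terminal
```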
Exemplary embodiments
When the voice engine recognizes that the voice control instruction in the voice information is "open the window", the arbitration system determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a local vehicle control skill, and then determines according to the correspondence that the execution subject corresponding to the voice control instruction is the vehicle-mounted terminal.
When the voice engine recognizes that the voice control instruction in the voice information is "I want to listen to the crosstalk of Guo Degang", the arbitration system determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a local multimedia skill, and then determines according to the correspondence that the execution subject corresponding to the voice control instruction is the vehicle-mounted terminal.
When the voice engine recognizes that the voice control instruction in the voice information is "I want to listen to Zhou Jielun's songs with NetEase Cloud Music", the arbitration system determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill, and then determines according to the correspondence that the execution subject corresponding to the voice control instruction is a handheld terminal.
When the voice engine recognizes that the voice control instruction in the voice information is "I want to watch the live broadcast of the Beijing Winter Olympics with iQIYI video", the arbitration system determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill, and then determines according to the correspondence that the execution subject corresponding to the voice control instruction is a handheld terminal.
When the voice engine recognizes that the voice control instruction in the voice information is "help me post the latest 2 photos to WeChat Moments with the caption: the weather is really nice today", the arbitration system determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill, and then determines according to the correspondence that the execution subject corresponding to the voice control instruction is a handheld terminal.
S130
In step S130, a voice control instruction is transmitted to the determined execution subject to cause the execution subject to execute the voice control instruction.
It can be understood that step S120 only determines, based on the arbitration mode, the class of the execution subject of the voice control instruction, i.e., whether it is a vehicle-mounted terminal or a handheld terminal.
In step S130, the voice control instruction is transmitted to the specific execution subject, that is, the vehicle-mounted terminal or the handheld terminal associated with the user who issued the voice control instruction.
It will be understood that, since the user is located in the vehicle, when the execution subject determined in step S120 is the vehicle-mounted terminal, the voice control command should be sent to the vehicle-mounted terminal that uploaded the voice control command in step S110 in step S130, so that the vehicle-mounted terminal executes the voice control command.
However, in the case where the execution subject determined in step S120 is a handheld terminal, since the user who issues the voice information in the vehicle may be not only the driver but also a passenger, the voice control instruction needs to be sent to a handheld terminal interconnected with the vehicle-mounted terminal. Moreover, in some scenarios there is more than one handheld terminal interconnected with the vehicle-mounted terminal, and it is then also necessary to determine to which handheld terminal the voice control instruction should be sent.
In some embodiments, when it is determined in step S120 that the execution subject is a handheld terminal, as shown in fig. 3, the step of sending the voice control instruction to the determined execution subject in step S130 includes:
s131, determining a user who sends out a voice control instruction; and
s132, sending the voice control instruction to the handheld terminal of the user.
In some embodiments, in step S131, determining a user who issues a voice control instruction includes: the user who issues the voice control instruction is determined based on the manner of voiceprint recognition.
For example, voiceprint information of different users can be recorded in advance in the vehicle-mounted terminal and bound to each user's handheld terminal respectively. The voice information received by the vehicle-mounted terminal therefore necessarily carries the voiceprint information of the user who uttered it. When analyzing the voice information, the vehicle-mounted terminal can match the voiceprint information in the voice information against the voiceprint information recorded in advance in the system, thereby determining the user who issued the voice information, and send the user's information together with the voice control instruction to the vehicle cloud. Accordingly, after the user who issued the voice control instruction is determined in step S131, the voice control instruction is sent to that user's handheld terminal in step S132.
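A minimal sketch of this voiceprint matching step is shown below; the embedding vectors, the cosine-similarity comparison and the threshold are assumptions, since the patent only requires that the user be identified by voiceprint in some manner.

```python
import numpy as np

# Pre-enrolled voiceprint embeddings, each bound to a handheld terminal.
# The embeddings are placeholders; a real system would obtain them from a
# speaker-verification model.
ENROLLED = {
    "driver":    {"embedding": np.array([0.9, 0.1, 0.3]), "terminal": "phone-driver"},
    "passenger": {"embedding": np.array([0.2, 0.8, 0.4]), "terminal": "phone-passenger"},
}

def identify_speaker(utterance_embedding, threshold=0.8):
    """Match the utterance voiceprint against enrolled users by cosine similarity."""
    best_user, best_score = None, -1.0
    for user, data in ENROLLED.items():
        a, b = utterance_embedding, data["embedding"]
        score = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if score > best_score:
            best_user, best_score = user, score
    if best_score < threshold:
        return None, None
    return best_user, ENROLLED[best_user]["terminal"]

user, terminal = identify_speaker(np.array([0.85, 0.15, 0.28]))
print(user, terminal)   # -> driver phone-driver
```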
In some embodiments, in step S131, determining a user who issues a voice control instruction includes: the user who issues the voice control instruction is determined based on the sound source localization and the face recognition.
For example, facial image information of different users may be pre-recorded into the vehicle-mounted terminal and respectively bound with the user's handheld terminal. In addition, the vehicle-mounted microphone has a sound source positioning function, and a camera is installed in the vehicle. Therefore, when the vehicle-mounted microphone collects voice information, the position of a user sending the voice information in the vehicle can be located in a sound source locating mode, and then the camera is controlled to turn to the position of the user and an image of the user is acquired. When the vehicle-mounted terminal receives voice information through the vehicle-mounted microphone, the facial image of the user sending the voice information is also acquired through the camera. The vehicle-mounted terminal can also match the facial image acquired by the camera with the facial image recorded in advance in the system when analyzing the voice information, so that a user sending the voice information is determined, and the information of the user and the voice control instruction are sent to the cloud of the vehicle. Accordingly, after the user who issued the voice control instruction is determined in step S131, the voice control instruction is transmitted to the handheld terminal of the user in step S132.
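For illustration, the sketch below combines the two cues in the simplest possible way: a direction-of-arrival angle is mapped to the nearest seat, and the recognized face is looked up in a face-to-terminal binding table. The seat angles, identifiers and data structures are assumptions, not part of the patent.

```python
# Hypothetical seat positions, expressed as microphone-array bearing angles in degrees.
SEAT_ANGLES = {"driver": -30.0, "front_passenger": 30.0, "rear_left": -120.0, "rear_right": 120.0}

def locate_speaker(doa_angle_deg):
    """Map the microphone array's direction-of-arrival estimate to the nearest seat."""
    return min(SEAT_ANGLES, key=lambda seat: abs(SEAT_ANGLES[seat] - doa_angle_deg))

def terminal_for_face(captured_face_id, face_to_terminal):
    """Look up the handheld terminal bound to the recognized face, if any."""
    return face_to_terminal.get(captured_face_id)

seat = locate_speaker(27.5)                    # the camera would be steered toward this seat
terminal = terminal_for_face("face-passenger", {"face-passenger": "phone-passenger"})
print(seat, terminal)                          # -> front_passenger phone-passenger
```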
It can be understood that, in step S132, sending the voice control instruction to the user's handheld terminal means that the vehicle cloud sends the voice control instruction to the cloud of the handheld terminal, and the cloud of the handheld terminal forwards the voice control instruction to the handheld terminal according to the related protocol.
The voice control system of the handheld terminal receives the voice control instruction and controls the corresponding application program in the handheld terminal to run according to the semantics in the voice control instruction.
Fig. 4 shows a flow chart of a method 200 of voice interaction according to an embodiment of the application. As shown in fig. 4, the method 200 of voice interaction comprises the steps of:
s210, acquiring a voice control instruction uploaded by the vehicle-mounted terminal;
s220, determining an execution main body of the voice control instruction based on an arbitration mode, wherein the execution main body is a vehicle-mounted terminal or a handheld terminal;
s230, sending the voice control instruction to the corresponding execution main body so that the execution main body executes the voice control instruction; and
s240, the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction.
For steps S210 to S230, reference may be made to steps S110 to S130 shown in figs. 1 to 3, which are not repeated here; step S240 is described in detail below with reference to figs. 5 to 7 and 8 to 10.
When it is determined in step S220 that the execution subject is a handheld terminal and the voice control instruction has been sent to it in step S230, in step S240 the voice control system of the handheld terminal controls the corresponding application program in the handheld terminal to run according to the semantics in the voice control instruction.
For example, if the voice control command is a play command, the voice control system determines keywords searched in the application program according to the semantics in the voice control command, then opens the corresponding application program, automatically inputs the keywords in the search box for searching, and plays music or video according to the search result. If the voice control instruction is an execution class command, the voice control system opens the application program according to the semantics in the voice control instruction and directly executes the application program.
For example, when the voice control instruction is "I want to listen to the crosstalk of Guo Degang", the determined keywords may be "crosstalk" and "Guo Degang". When the voice control instruction is "I want to listen to Zhou Jielun's song Tornado", the determined keywords are "Zhou Jielun" and "Tornado". When the voice control instruction is "I want to watch the live broadcast of the Beijing Winter Olympics", the determined keywords may be "Winter Olympics", "live broadcast", and the like.
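A toy sketch of this dispatch logic is given below; `AppSession` and its methods are placeholders for whatever automation interface the handheld terminal's voice control system actually exposes, and the command classes mirror the play/execute distinction described above.

```python
class AppSession:
    """Placeholder for the handheld terminal's app-automation interface (assumed names)."""
    def __init__(self, name):
        self.name = name
    def search(self, query):
        return [f"{self.name} result for '{query}'"]
    def play(self, item):
        print("playing", item)
    def run(self):
        print("running", self.name)

def dispatch(command_class, app_name, keywords=()):
    """'play' commands open the app, search the keywords and play the top hit;
    'execute' commands open the app and run the action directly."""
    session = AppSession(app_name)
    if command_class == "play":
        session.play(session.search(" ".join(keywords))[0])
    elif command_class == "execute":
        session.run()

dispatch("play", "NetEase Cloud Music", ("Zhou Jielun", "Tornado"))
dispatch("execute", "WeChat")
```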
In some embodiments, after the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction, the voice interaction method of the present application further includes: when the hand-held terminal and the vehicle-mounted terminal are in an interconnection state, the hand-held terminal maps the processing content of the application program to the vehicle-mounted terminal through the interconnection relation between the hand-held terminal and the vehicle-mounted terminal.
If the voice control instruction is a play-class command, then after the APP on the handheld terminal starts playing the corresponding content, the playback content can be mapped to the vehicle-mounted terminal through the interconnection with the vehicle-mounted terminal.
For example, if the content is audio-class playing content, it can be mapped to the vehicle-mounted terminal through a Bluetooth, USB, wifi or other interconnection channel and played by the audio player of the vehicle-mounted terminal. If the content is video-class playing content, it can be mapped to the vehicle-mounted terminal through an interconnection channel such as USB or wifi, for example by screen casting, and shown on the display of the vehicle-mounted terminal.
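One way to express this channel selection, purely as an assumption about how an implementation might order the channels, is:

```python
def pick_channel(content_type, available_channels):
    """Choose an interconnection channel for mapping playback to the vehicle terminal.
    Audio can go over Bluetooth, USB or wifi; video (screen casting) needs the
    higher-bandwidth USB or wifi links. Channel names are illustrative."""
    preferred = {"audio": ["bluetooth", "usb", "wifi"],
                 "video": ["wifi", "usb"]}
    for channel in preferred[content_type]:
        if channel in available_channels:
            return channel
    return None   # no suitable channel currently connected

print(pick_channel("audio", {"bluetooth", "wifi"}))   # -> bluetooth
print(pick_channel("video", {"bluetooth", "wifi"}))   # -> wifi
```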
In order to more clearly describe the technical solution of the present application, the following will take the practical application scenario as an example with reference to fig. 8, 9 and 10.
In the first application scenario, as shown in fig. 8, when the user 310 wakes up the vehicle-mounted voice function and speaks "open the window", the voice engine of the vehicle-mounted terminal 320 recognizes the voice information and sends the voice information to the vehicle cloud 330. The arbitration system of the vehicle cloud 330 determines that the skill class to which the voice control command belongs is a local vehicle control skill, so that the voice control command is sent to the vehicle-mounted terminal 320, and the vehicle-mounted terminal 320 controls the vehicle window to be opened after receiving the voice control command.
In the second application scenario, as shown in fig. 8, when the user 310 wakes up the vehicle-mounted voice function and says "I want to listen to the crosstalk of Guo Degang", the voice engine of the vehicle-mounted terminal 320 recognizes the voice information and sends it to the vehicle cloud 330. The arbitration system of the vehicle cloud 330 determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a local multimedia skill. The voice control instruction is therefore sent to the vehicle-mounted terminal 320, and after receiving it the vehicle-mounted terminal 320 controls the corresponding application program to run, searches according to the related keywords, and plays the specific audio data.
In the third application scenario, as shown in fig. 8, when the user 310 wakes up the vehicle-mounted voice function and says "I want to listen to Zhou Jielun's song Tornado with NetEase Cloud Music", the voice engine of the vehicle-mounted terminal 320 recognizes the voice information and sends it to the vehicle cloud 330. The arbitration system of the vehicle cloud 330 determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill. The voice control instruction is therefore sent to the handheld terminal 340; after receiving it, the handheld terminal 340 controls the NetEase Cloud Music application to run, searches according to the related keywords (Zhou Jielun, Tornado), and plays the audio data of "Tornado". The audio data is transmitted to the vehicle-mounted terminal 320 through the Bluetooth channel, so that the sound is played on the vehicle side.
In the fourth application scenario, as shown in fig. 8, when the user 310 wakes up the vehicle-mounted voice function and says "I want to watch the live broadcast of the Beijing Winter Olympics with iQIYI video", the voice information is recognized by the voice engine of the vehicle-mounted terminal 320 and sent to the vehicle cloud 330, and the arbitration system of the vehicle cloud 330 determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill. The voice control instruction is therefore sent to the handheld terminal 340; after receiving it, the handheld terminal 340 controls the iQIYI video application to run, searches according to the related keywords (Winter Olympics, live broadcast), and plays the video data of the Winter Olympics live broadcast. The video content is projected onto the display screen of the vehicle-mounted terminal 320 by screen casting.
In the fifth application scenario, as shown in fig. 8, when the user 310 wakes up the vehicle-mounted voice function and says "help me post the latest 2 photos to WeChat Moments with the caption: the weather is really nice today", the voice engine of the vehicle-mounted terminal 320 recognizes the voice information and sends it to the vehicle cloud 330, and the arbitration system of the vehicle cloud 330 determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill. The voice control instruction is therefore sent to the handheld terminal 340; after receiving it, the handheld terminal 340 calls the WeChat interface to select the pictures and post them to Moments. After completion, the processing result may further be transferred to the vehicle-mounted terminal 320 through USB or wifi for prompting.
In the sixth application scenario, as shown in fig. 9, the users in the vehicle include a driver 310-1 and a passenger 310-2, and both handheld terminals 340-1 and 340-2 have interconnection records with the vehicle-mounted terminal, but the handheld terminal interconnected with the vehicle-mounted terminal 320 over wifi at the current moment is the driver's terminal 340-1. When the passenger 310-2 wakes up the vehicle-mounted voice function and says "I want to watch the live broadcast of the Beijing Winter Olympics with iQIYI video", the voice engine of the vehicle-mounted terminal 320 recognizes the voice information, determines according to the voiceprint information or the image information that the user who issued the voice control instruction is the passenger 310-2, and sends the information to the vehicle cloud 330. The arbitration system of the vehicle cloud 330 determines, according to the semantics of the voice control instruction, that the skill class to which it belongs is a mobile phone skill, so the voice control instruction is sent to the handheld terminal 340-2 of the passenger 310-2; after receiving it, the handheld terminal 340-2 controls the iQIYI video application to run, searches according to the related keywords (Winter Olympics, live broadcast), and plays the video data of the Winter Olympics live broadcast. At this time, since the handheld terminal interconnected with the vehicle-mounted terminal 320 over wifi is the driver's terminal 340-1, the control system of the vehicle-mounted terminal 320 controls its communication module to disconnect from the handheld terminal 340-1 of the driver 310-1 and to interconnect over wifi with the handheld terminal 340-2 of the passenger 310-2, and the video content is projected onto the display screen of the vehicle-mounted terminal 320 for display by screen casting.
In some embodiments, the semantics in the voice control instruction cannot be directed to a unique application. For example, the voice control instruction is "I want to listen to music", but several music applications are installed in the handheld terminal, such as NetEase Cloud Music, QQ Music, Kugou Music, and the like. As another example, the voice control instruction is "query train tickets", but several travel applications are installed in the handheld terminal, such as China Railway 12306, Intelligent Train Ticket, Fliggy, Ctrip, and the like. In such a scenario, it is necessary to determine which application should execute the voice control instruction.
In some embodiments, as shown in fig. 5, the step S240 includes:
s241, determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
s242, traversing the plurality of application programs and determining their membership activation states; and
s243, controlling the application program with an activated membership to execute the voice control instruction.
According to this scheme, the user instruction is executed by the application program whose membership service has been activated, so that the user can obtain a good use experience.
When it is determined in step S242 that more than one application program has an activated membership service, the application program most recently used by the user may be determined from the user's usage trace, and that application program is then controlled to execute the voice control instruction.
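A minimal sketch of this membership-first selection, with the tie-break on the usage trace, might look like the following (the data shapes are assumptions):

```python
def pick_by_membership(candidates, recently_used):
    """candidates: list of (app_name, has_membership); prefer apps whose
    membership is activated, breaking ties with the most recently used app."""
    members = [app for app, has_membership in candidates if has_membership]
    if len(members) == 1:
        return members[0]
    if members:                              # more than one membership: consult the usage trace
        for app in recently_used:
            if app in members:
                return app
    return candidates[0][0] if candidates else None

apps = [("NetEase Cloud Music", True), ("QQ Music", True), ("Kugou Music", False)]
print(pick_by_membership(apps, recently_used=["QQ Music", "NetEase Cloud Music"]))  # -> QQ Music
```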
In some alternative embodiments, as shown in fig. 6, the step S240 includes:
s241', determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
s242' determining priorities of the plurality of applications; and
s243', control the application program with the highest priority order to execute the voice control instruction.
In the above scheme, in step S242', the priority order of each application program may be determined according to the scores of the application programs in the application store, where the application program with a high score has a high priority and the application program with a low score has a low priority. Alternatively, the user may set the priorities of the plurality of applications in the same category in advance, and then determine the priority order of the respective applications according to the preset priorities in step S242'.
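A possible sketch of this priority-based selection, assuming a user-configured ordering takes precedence over app-store scores, is:

```python
def pick_by_priority(candidates, store_scores, user_priority=None):
    """Pick the app with the highest priority: a user-configured ordering wins,
    otherwise fall back to the app-store score (higher score = higher priority)."""
    if user_priority:
        for app in user_priority:
            if app in candidates:
                return app
    return max(candidates, key=lambda app: store_scores.get(app, 0.0))

print(pick_by_priority(["China Railway 12306", "Fliggy", "Ctrip"],
                       store_scores={"China Railway 12306": 4.1, "Fliggy": 4.5, "Ctrip": 4.7}))
# -> Ctrip
```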
In other alternative embodiments, as shown in fig. 7, the step S240 includes:
s241', determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
s242", determining the frequency of use of the plurality of applications; and
s243", control executes the voice control instruction in accordance with the application program with the highest frequency of use.
In the above scheme, in step S242", the frequency of use of each application by the user may be determined according to the power consumption of each application, where a larger power consumption means that the application has longer running time and a higher frequency of use by the user, and vice versa.
In addition, based on the voice interaction method, the embodiment of the application further provides electronic equipment, such as a server, a cloud server and the like.
Fig. 11 shows a schematic structural diagram of an electronic device according to a first exemplary embodiment of the present application.
As shown in fig. 11, the electronic device includes: at least one processor 701; and a memory 702 communicatively coupled to the at least one processor 701; the memory stores instructions executable by the at least one processor 701, the instructions being executable by the at least one processor 701 to enable the at least one processor 701 to perform the method of voice interaction as mentioned in the above embodiments. Wherein the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
Fig. 12 shows a schematic structural diagram of an electronic device according to a second exemplary embodiment of the present application.
As shown in fig. 12, the electronic device may further include, for example: an I/O interface 703, an input unit 704, an output unit 705, a communication unit 706, a read-only memory (ROM) 707, and a random access memory (RAM) 708. In particular, the processor 701 may perform various suitable actions and processes in accordance with a computer program stored in the ROM 707 or a computer program loaded from the memory 702 into the RAM 708. The RAM 708 may also store various programs and data required for the operation of the electronic device. The processor 701, the ROM 707, and the RAM 708 are connected to each other via a bus 709. The I/O interface (input/output interface) 703 is also connected to the bus 709.
A number of components in the electronic device are connected to the I/O interface 703, including: an input unit 704 such as a keyboard, a mouse, etc.; an output unit 705 such as various types of displays, speakers, and the like; a memory 702, such as a magnetic disk, optical disk, etc.; and a communication unit 706 such as a network card, modem, wireless communication transceiver, etc. The communication unit 706 allows the electronic device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The processor 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 701 performs the various methods and processes described above, such as the method of voice interaction. For example, in some embodiments, the method of voice interaction may be implemented as a computer software program tangibly embodied on a computer-readable storage medium, such as the memory 702. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device via the ROM 707 and/or the communication unit 706. When the computer program is loaded into the RAM 708 and executed by the processor 701, one or more steps of the method of voice interaction described above may be performed. Alternatively, in other embodiments, the processor 701 may be configured to perform the method of voice interaction in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. The program code described above may be packaged into a computer program product. These program code or computer program product may be provided to a processor or controller of a general purpose computer, special purpose computer or other programmable data processing apparatus such that the program code, when executed by the processor 701, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer or entirely on the remote computer or server.
The specific description and the beneficial effects of the electronic device may refer to the description of the voice interaction method, and will not be repeated.
It should also be noted that, in another aspect, the present application further provides a computer-readable storage medium, which stores the computer program used in the aforementioned voice interaction method. The computer program includes program instructions, and when the processor executes the program instructions, the voice interaction method described above can be performed; the description is therefore not repeated here, and the description of the beneficial effects of the same method is likewise omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in the embodiments may be accomplished by computer programs to instruct related hardware, and the programs may be stored in a computer readable storage medium, which when executed may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The above description is only illustrative of the embodiments of the application and of the technical principles applied. It will be appreciated by those skilled in the art that the scope of the application is not limited to technical solutions formed by the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the technical concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.
Claims (13)
1. A method of voice interaction, comprising:
acquiring a voice control instruction uploaded by a vehicle-mounted terminal;
determining an execution subject of the voice control instruction based on an arbitration mode; and
sending the voice control instruction to the corresponding execution subject so that the corresponding execution subject executes the voice control instruction;
wherein the corresponding execution subject is the vehicle-mounted terminal or a handheld terminal interconnected with the vehicle-mounted terminal.
2. The method of claim 1, wherein determining the execution subject of the voice control instruction based on the arbitration manner comprises:
determining a skill class to which the voice control instruction belongs in response to semantics in the voice control instruction; and
and determining an execution subject of the voice control instruction according to the correspondence between the determined skill class and the vehicle-mounted terminal or the handheld terminal.
3. The method of claim 2, wherein the skill-class comprises any of a local car control skill, a local multimedia skill, or a cell phone skill.
4. A method according to claim 2 or 3, wherein the execution subject is a handheld terminal;
wherein sending the voice control instruction to the corresponding execution body includes:
determining a user who sends out the voice control instruction; and
and sending the voice control instruction to the handheld terminal of the user.
5. The method of claim 4, wherein determining the user that issued the voice-control instruction comprises:
and determining the user who sends the voice control instruction based on the voiceprint recognition mode.
6. The method of claim 4, wherein determining the user that issued the voice-control instruction comprises:
and determining the user giving the voice control instruction based on the sound source positioning and the face recognition mode.
7. The method of claim 4, wherein after sending the voice-control instruction to the execution body, the method further comprises:
and the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction.
8. The method of claim 7, wherein the hand-held terminal has a plurality of applications installed therein capable of executing the same voice control instruction;
the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction, and the voice control system comprises the following steps:
determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
traversing the plurality of application programs and determining their membership activation states; and
controlling the application program with an activated membership to execute the voice control instruction.
9. The method of claim 7, wherein the hand-held terminal has installed therein a plurality of applications capable of executing the same voice control instruction;
the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction, and the voice control system comprises the following steps:
determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
determining priorities of the plurality of applications; and
and controlling the application program with the highest priority order to execute the voice control instruction.
10. The method of claim 7, wherein the hand-held terminal has a plurality of applications installed therein capable of executing the same voice control instruction;
the voice control system of the handheld terminal controls the corresponding application program to run according to the semantics in the voice control instruction, and the voice control system comprises the following steps:
determining a plurality of application programs capable of executing the voice control instruction according to the semantics in the voice control instruction;
determining a frequency of use of the plurality of applications; and
and controlling the application program with the highest frequency of use to execute the voice control instruction.
11. The method of claim 7, wherein the voice control system of the handheld terminal further comprises, after the corresponding application program is controlled to run according to semantics in the voice control instruction:
when the handheld terminal and the vehicle-mounted terminal are in an interconnection state, the handheld terminal maps the processing content of the application program into the vehicle-mounted terminal through the interconnection relation with the vehicle-mounted terminal.
12. An electronic device, comprising:
a processor; and
the memory is in communication connection with the processor;
wherein the memory stores a program executable by the processor, and when the program is executed by the processor, the processor is capable of performing the method according to any one of claims 1-6.
13. A readable storage medium, characterized in that a computer program is stored on the readable storage medium, which computer program, when being executed by a processor, implements a method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210300703.0A CN116844537A (en) | 2022-03-24 | 2022-03-24 | Voice interaction method, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210300703.0A CN116844537A (en) | 2022-03-24 | 2022-03-24 | Voice interaction method, electronic equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116844537A true CN116844537A (en) | 2023-10-03 |
Family
ID=88173047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210300703.0A Pending CN116844537A (en) | 2022-03-24 | 2022-03-24 | Voice interaction method, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116844537A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |