CN116863927A - Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment - Google Patents


Info

Publication number
CN116863927A
Authority
CN
China
Prior art keywords
vehicle
instruction
application
information
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310734568.5A
Other languages
Chinese (zh)
Inventor
郭向阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Beijing Software Technology Co., Ltd.
FAW Group Corp
Original Assignee
FAW Beijing Software Technology Co., Ltd.
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Beijing Software Technology Co., Ltd. and FAW Group Corp
Priority to CN202310734568.5A
Publication of CN116863927A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

The application discloses a vehicle-mounted multimedia voice instruction processing method, system, and device, an electronic device, a storage medium, and a vehicle. The method includes: collecting user voice information; parsing the user voice into an instruction for controlling an in-vehicle application; acquiring history information and current state information of in-vehicle application control; matching the parsed in-vehicle application control instruction according to the history information and current state information; and executing the instruction with the in-vehicle application according to the matching result. With this scheme, the history and current state of head-unit control are used to filter and match the current in-vehicle application control instruction, so that when the in-vehicle application is iterated, the in-vehicle application control instructions can complete their own iteration by drawing on the history and current state of head-unit control.

Description

Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment
Technical Field
The present application relates to the field of voice instructions, and in particular to a vehicle-mounted multimedia voice instruction processing method, system, and device, an electronic device, a storage medium, and a vehicle.
Background
The intelligent voice function is an important feature of modern vehicles and provides great convenience while driving. After a voice receiving sensor on the vehicle picks up a voice instruction, the instruction is sent to the voice cloud for recognition, returned to the in-vehicle voice application, and forwarded to the specific in-vehicle application, which then executes the corresponding function.
However, in-vehicle applications are updated frequently; media applications such as music or video have short iteration cycles, and after an update some functions are removed while new ones are added, so the voice application and the audio-video application must update the voice instruction types and reply voice templates in step. Iterations and updates of the voice function can likewise cause the voice control functions of the media application to mismatch and the broadcast voice templates to deviate from expectations. Because the end-side applications are tightly coupled, development cost and the overall function error rate increase.
The media application integrates the specific control instructions and voice broadcast templates provided by the voice end side and can only respond to the predefined phrasing. For instructions outside the fixed phrasing, the function actually executed and the voice broadcast template differ considerably from what is expected, and there is no secondary confirmation or instruction-learning capability for unrecognized instructions. A voice instruction is passed straight to execution without a second update or confirmation that takes the current concrete scene into account, so it does not fit the scene.
Thus, there is a need for an in-vehicle speech recognition scheme that can accommodate media application iteration, so that the matching and execution of voice instructions is updated as the application iterates.
The invention aims to provide a vehicle-mounted multimedia voice instruction processing method, system, and device, an electronic device, a storage medium, and a vehicle that solve at least one of the above technical problems.
The invention provides the following scheme:
According to a first aspect of the present invention, there is provided a vehicle-mounted multimedia voice instruction processing method, including:
collecting user voice information;
parsing an instruction for controlling an in-vehicle application from the user voice;
acquiring history information and current state information of in-vehicle application control;
matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control;
executing the instruction with the in-vehicle application according to the matching result of the in-vehicle application control instruction.
Further, matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control includes:
acquiring an instruction rule list;
matching the instruction for controlling the in-vehicle application in the instruction rule list according to a preset matching rule;
if a corresponding instruction is matched in the instruction rule list, executing the instruction with the in-vehicle application according to the instruction rule list;
if no corresponding instruction is matched in the instruction rule list, broadcasting the voice corresponding to the in-vehicle application control instruction.
Further, when no corresponding instruction is matched in the instruction rule list, the method further includes:
acquiring in-vehicle application scene information;
the in-vehicle application scene information includes history information and collection information of instructions executed by the in-vehicle application;
the in-vehicle application scene information further includes information on the current state of the in-vehicle application;
matching, according to the in-vehicle application scene information, a historical in-vehicle application control instruction similar to the current in-vehicle application control instruction;
if the similarity between the current in-vehicle application control instruction and the historical in-vehicle application control instruction exceeds a preset threshold, refreshing the instruction rule list according to the historical in-vehicle application control instruction that the current instruction approximates.
Further, refreshing the instruction rule list according to the currently parsed in-vehicle application control instruction includes:
refreshing the instruction rule list according to the in-vehicle application scene information corresponding to the in-vehicle application control instruction;
the instruction rule list includes the in-vehicle application scene information, the corresponding voice information, and the instructions for controlling the in-vehicle application.
According to a second aspect of the present invention, there is provided a vehicle-mounted multimedia voice instruction processing system, including: a voice parsing module, an instruction processing module, and an in-vehicle application module;
the voice parsing module is used for parsing the collected user voice into an instruction for controlling the in-vehicle application;
the instruction processing module is used for matching the in-vehicle application control instruction under the instruction rule list;
the in-vehicle application module is used for executing the in-vehicle application control instruction;
the voice parsing module sends the parsed in-vehicle application control instruction;
the instruction processing module receives the data of the voice parsing module and matches the in-vehicle application control instruction according to the instruction rule list;
if the matching succeeds, the in-vehicle application control instruction is sent;
the in-vehicle application module receives the instruction processing module data and executes the in-vehicle application control instruction.
Further, the instruction processing module includes: a scene data module and a media cloud module;
the scene data module is used for acquiring the history information and collection information of instructions executed by the in-vehicle application and information on its current state;
the media cloud module is used for searching for a historical in-vehicle application control instruction similar to the current in-vehicle application control instruction;
if matching the in-vehicle application control instruction against the instruction rule list fails, the scene data module sends the history information, collection information, and current state information of the in-vehicle application;
the media cloud module receives the scene data module data and, according to the history information, collection information, and current state information of instructions executed by the in-vehicle application, searches for a historical in-vehicle application control instruction similar to the current one;
if the similarity between the historical and current in-vehicle application control instructions exceeds the preset threshold, the instruction rule list is refreshed according to the historical in-vehicle application control instruction that the current instruction approximates.
According to a third aspect of the present invention, there is provided a vehicle-mounted multimedia voice instruction processing device, including:
a voice acquisition module, used for collecting user voice information;
an instruction parsing module, used for parsing an instruction for controlling the in-vehicle application from the user voice;
a vehicle information module, used for acquiring the history information and current state information of in-vehicle application control;
an instruction matching module, used for matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control;
an instruction execution module, used for executing the instruction with the in-vehicle application according to the matching result of the instruction.
According to a fourth aspect of the present invention, there is provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the vehicle-mounted multimedia voice instruction processing method.
According to a fifth aspect of the present invention, there is provided a computer-readable storage medium storing a computer program executable by an electronic device; when the computer program runs on the electronic device, it causes the electronic device to perform the steps of the vehicle-mounted multimedia voice instruction processing method.
According to a sixth aspect of the present application, there is provided a vehicle, including:
the electronic device, used for implementing the steps of the vehicle-mounted multimedia voice instruction processing method;
a processor that runs a program, wherein when the program runs, the steps of the vehicle-mounted multimedia voice instruction processing method are performed on data output from the electronic device;
a storage medium for storing a program that, when run, performs the steps of the vehicle-mounted multimedia voice instruction processing method on data output from the electronic device.
Through the scheme, the following beneficial technical effects are obtained:
According to the application, the history and current state of head-unit control are used to filter and match the current in-vehicle application control instruction, so that when the in-vehicle application is iterated, the in-vehicle application control instructions can complete their own iteration by drawing on the history and current state of head-unit control.
The application compares an in-vehicle application control instruction that failed to match with similar historical in-vehicle application control instructions and refreshes the instruction rule list, so that the unmatched control instruction can be associated with the instruction rule list and self-iteration is completed.
Drawings
Fig. 1 is a flowchart of a method for processing a vehicle-mounted multimedia voice command according to one or more embodiments of the present application.
Fig. 2 is a block diagram of a vehicle-mounted multimedia voice command processing system according to one or more embodiments of the present invention.
Fig. 3 is a block diagram of an on-vehicle multimedia voice command processing apparatus according to one or more embodiments of the present invention.
FIG. 4 is a schematic diagram of a current speech instruction execution system in accordance with one embodiment of the present invention.
FIG. 5 is a schematic diagram of an improved speech instruction execution system in accordance with one embodiment of the present invention.
FIG. 6 is a schematic diagram of an improved timing for execution of voice instructions in accordance with one embodiment of the present invention.
Fig. 7 is a schematic diagram of the EM algorithm flow of one embodiment of the present invention.
FIG. 8 is a schematic diagram of the Viterbi algorithm solution process in accordance with one embodiment of the present invention.
FIG. 9 is a diagram of a string state sequence according to an embodiment of the present invention.
Fig. 10 is a block diagram of an electronic device according to one or more embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, which show some, but not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Fig. 1 is a flowchart of a method for processing a vehicle-mounted multimedia voice command according to one or more embodiments of the present application.
As shown in fig. 1, the vehicle-mounted multimedia voice instruction processing method includes:
step S1, collecting user voice information;
step S2, parsing an instruction for controlling an in-vehicle application from the user voice;
step S3, acquiring history information and current state information of in-vehicle application control;
step S4, matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control;
step S5, executing the instruction with the in-vehicle application according to the matching result of the in-vehicle application control instruction.
Through the scheme, the following beneficial technical effects are obtained:
According to the application, the history and current state of head-unit control are used to filter and match the current in-vehicle application control instruction, so that when the in-vehicle application is iterated, the in-vehicle application control instructions can complete their own iteration by drawing on the history and current state of head-unit control.
The application compares an in-vehicle application control instruction that failed to match with similar historical in-vehicle application control instructions and refreshes the instruction rule list, so that the unmatched control instruction can be associated with the instruction rule list and self-iteration is completed.
Specifically, in the normal case a control instruction recognized from voice is used to control the in-vehicle application. If the in-vehicle application is updated and iterated, the corresponding voice recognition must be updated to match; if the end-side applications are too tightly coupled, touching one part affects the whole, which creates development-cost resistance to iterating the in-vehicle application.
The iteration of an in-vehicle application changes some of its functions, yet which function a user intends depends on that user's own background. For example, the original voice instruction "open the window for ventilation" made the vehicle open the left window by default. The in-vehicle application later added separate window control, so that opening the left window from the driver's seat now requires the instruction "left window ventilation", while the original "open the window for ventilation" now controls the sunroof. At this point, directly executing the old voice instruction no longer achieves the original purpose or effect. However, over many previous uses the head unit has left trace information about how the function was used, for example at which cabin temperature the window was opened for ventilation and which window the user tends to use. By relating this to the scene and vehicle-state information that appeared in the history, the now "nonstandard" instruction and the "standard" instruction can be treated as the same instruction. In this way, voice recognition automatically follows the iteration of the in-vehicle application and completes its own iteration.
At the same time, the result of the head unit responding to the voice instruction is broadcast, and the standard instruction is announced together with the result. For example, the user says "open the window for ventilation", the in-vehicle application opens the driver-side left window, and the voice broadcast announces the standard instruction "left window ventilation". In subsequent use the user can follow the voice broadcast and correct the habit; once the user says "left window ventilation", the habit-correcting voice broadcast can stop.
In this embodiment, matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control includes:
acquiring an instruction rule list;
matching the instruction for controlling the in-vehicle application in the instruction rule list according to a preset matching rule;
if a corresponding instruction is matched in the instruction rule list, executing the instruction with the in-vehicle application according to the instruction rule list;
if no corresponding instruction is matched in the instruction rule list, broadcasting the voice corresponding to the in-vehicle application control instruction.
Specifically, the instruction rule list contains information in several dimensions corresponding to one application operation; besides the voice instruction it includes the function executed by the in-vehicle application, history information, collection information, vehicle state information, and the like, as well as information from the voice instruction parsing process.
Parsing a voice instruction means not only converting it directly into text data but also decomposing the text to obtain the user's real intention.
If the user uses a pre-iteration voice instruction and no corresponding record exists in the post-iteration instruction rule list, the in-vehicle application cannot execute the instruction accurately. Therefore, commonalities are sought among the function executed by the in-vehicle application, the history information, the collection information, and the vehicle state information, combined with the voice instruction, so that the voice instruction the user currently utters can still control the iterated in-vehicle application.
If no data corresponding to the current voice instruction is found in the instruction rule list and the voice instruction cannot be executed directly, the instruction can still be given direction by collecting history information, collection information, vehicle state information, and the like. For example, in a multimedia application a song accumulates plenty of play history and collection information, so even if the menu hierarchy of the multimedia application has changed, the play history, collection information, and current vehicle state information can still locate the function of playing that song.
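To make the multi-dimensional structure of such a rule entry concrete, the following is a minimal sketch; the field names, the RuleEntry class, and the lookup helper are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class RuleEntry:
    """One entry of the instruction rule list (hypothetical field names)."""
    voice_instruction: str        # normalized text of the spoken instruction
    application: str              # in-vehicle application that should handle it
    function: str                 # application function/interface to invoke
    broadcast_template: str       # voice broadcast template for the reply
    history: List[str] = field(default_factory=list)     # history information of executed instructions
    favorites: List[str] = field(default_factory=list)   # collection information
    vehicle_state: Dict[str, str] = field(default_factory=dict)  # current state, e.g. cabin temperature

def lookup(rule_list: List[RuleEntry], parsed_instruction: str) -> Optional[RuleEntry]:
    """Direct match against the rule list; returning None triggers the fallback path
    (broadcasting the instruction and scene-based matching)."""
    for entry in rule_list:
        if entry.voice_instruction == parsed_instruction:
            return entry
    return None
```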
In this embodiment, when no corresponding instruction is matched in the instruction rule list, the method further includes:
acquiring in-vehicle application scene information;
the in-vehicle application scene information includes history information and collection information of instructions executed by the in-vehicle application;
the in-vehicle application scene information further includes information on the current state of the in-vehicle application;
matching, according to the in-vehicle application scene information, a historical in-vehicle application control instruction similar to the current in-vehicle application control instruction;
if the similarity between the current and historical in-vehicle application control instructions exceeds a preset threshold, refreshing the instruction rule list according to the historical in-vehicle application control instruction that the current instruction approximates.
Specifically, for any kind of in-vehicle application, the function and its usage scenes imply both the user's requirements and the functions the in-vehicle application can perform. To make the in-vehicle application more complete, richer, or simpler, the user issues the corresponding voice instruction by continuing prior usage experience. Normally an in-vehicle application does not mutate into a version that differs too much from the original functions, and although the corresponding voice instruction may be biased, the user finds it hard to adapt if the instruction is judged against a strict "label". The history information and collection information of instructions executed by the in-vehicle application can be used to reconstruct which in-vehicle application, and which of its parameters and functions, is likely being triggered this time. By broadcasting the voice corresponding to the in-vehicle application control instruction, feedback on instruction execution builds up as the number of uses grows. If, during the interaction, the user finds that the application triggered by the current voice instruction differs from what was expected, the current voice instruction can be terminated in time and the user is induced to try other voice instructions instead, completing the iteration of the voice instruction.
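The fallback path just described can be sketched as follows; the entry structure, the similarity callable, and the threshold value are assumptions for illustration (the embodiment below describes the HMM/SBERT-based matching actually envisaged).

```python
def refresh_if_similar(rule_list, current_instruction, history, scene_info,
                       similarity, threshold=0.8):
    """Scene-based fallback matching, sketched with assumed names.

    rule_list:  list of dict entries such as {"voice": ..., "application": ..., "function": ...}
    history:    list of (historical_instruction, rule_entry) pairs
    similarity: callable(current, historical, scene_info) -> score in [0, 1]
    threshold:  the preset threshold mentioned in the text (the value here is arbitrary)
    """
    best_entry, best_score = None, 0.0
    for past_instruction, entry in history:
        score = similarity(current_instruction, past_instruction, scene_info)
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        # Treat the "nonstandard" current instruction as equivalent to the matched
        # historical one and record it so it can be matched directly next time.
        new_entry = dict(best_entry, voice=current_instruction)
        rule_list.append(new_entry)
        return new_entry
    return None
```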
In this embodiment, refreshing the instruction rule list according to the currently parsed in-vehicle application control instruction includes:
refreshing the instruction rule list according to the in-vehicle application scene information corresponding to the in-vehicle application control instruction;
the instruction rule list includes the in-vehicle application scene information, the corresponding voice information, and the instructions for controlling the in-vehicle application.
Specifically, the instruction rule list contains the information of the in-vehicle application corresponding to a voice instruction, together with other information related to the voice instruction, such as scene information and state information. In principle the ideal case is to operate the in-vehicle application directly through the voice instruction; when querying the instruction rule list finds no corresponding voice instruction, the corresponding in-vehicle application is found through the other associated information.
The instruction rule list can be refreshed with the current voice instruction as a "hotword" that "annotates" the voice instruction.
One possibility is not excluded: the in-vehicle application has been modified substantially and functions that did not exist before have been added, for which no corresponding voice instruction is yet in use. In that case, the method can actively broadcast voice according to the scene information in the instruction rule list that matches the current scene, recommending the new iteration and the corresponding voice instruction to the user.
Fig. 2 is a block diagram of a vehicle-mounted multimedia voice command processing system according to one or more embodiments of the present invention.
As shown in fig. 2, the vehicle-mounted multimedia voice instruction processing system includes: a voice parsing module, an instruction processing module, and an in-vehicle application module;
the voice parsing module is used for parsing the collected user voice into an instruction for controlling the in-vehicle application;
the instruction processing module is used for matching the in-vehicle application control instruction under the instruction rule list;
the in-vehicle application module is used for executing the in-vehicle application control instruction;
the voice parsing module sends the parsed in-vehicle application control instruction;
the instruction processing module receives the data of the voice parsing module and matches the in-vehicle application control instruction according to the instruction rule list;
if the matching succeeds, the in-vehicle application control instruction is sent;
the in-vehicle application module receives the instruction processing module data and executes the in-vehicle application control instruction.
Specifically, the in-vehicle application control instruction is matched against the voice data part of the instruction rule list, voice instructions that match a record in the instruction rule list are distinguished from those that do not, and for "standard" or already-refreshed voice instructions the in-vehicle application is controlled directly.
In this embodiment, the instruction processing module includes: a scene data module and a media cloud module;
the scene data module is used for acquiring the history information and collection information of instructions executed by the in-vehicle application and information on its current state;
the media cloud module is used for searching for a historical in-vehicle application control instruction similar to the current in-vehicle application control instruction;
if matching the in-vehicle application control instruction against the instruction rule list fails, the scene data module sends the history information, collection information, and current state information of instructions executed by the in-vehicle application;
the media cloud module receives the scene data module data and, according to the history information, collection information, and current state information of instructions executed by the in-vehicle application, searches for a historical in-vehicle application control instruction similar to the current one;
if the similarity between the historical and current in-vehicle application control instructions exceeds the preset threshold, the instruction rule list is refreshed according to the historical in-vehicle application control instruction that the current instruction approximates.
Specifically, for "nonstandard" or not-yet-refreshed voice instructions, similar in-vehicle application usage experience is searched according to the history information, collection information, and current state information of the in-vehicle application. If the user's approval is obtained, the instruction rule list can be refreshed, which is equivalent to iterating the voice instruction through learning.
For judging whether the similarity between the historical and current in-vehicle application control instructions exceeds the preset threshold, weights can be assigned to the individual data items and the preset threshold set through these weights. For example, for an in-vehicle application that is used frequently, the history information can be weighted higher; for an in-vehicle application tied to a specific scene, the current state information of the in-vehicle application can be weighted higher.
Fig. 3 is a block diagram of an on-vehicle multimedia voice command processing apparatus according to one or more embodiments of the present invention.
As shown in fig. 3, the vehicle-mounted multimedia voice instruction processing device includes: a voice acquisition module, an instruction parsing module, a vehicle information module, an instruction matching module, and an instruction execution module;
the voice acquisition module is used for collecting user voice information;
the instruction parsing module is used for parsing an instruction for controlling the in-vehicle application from the user voice;
the vehicle information module is used for acquiring the history information and current state information of in-vehicle application control;
the instruction matching module is used for matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control;
the instruction execution module is used for executing the instruction with the in-vehicle application according to the matching result of the instruction.
It should be noted that, although only a voice acquisition module, an instruction parsing module, a vehicle information module, an instruction matching module, and an instruction execution module are disclosed, the invention is not limited to these basic functional modules. Rather, on the basis of these basic functional modules, a person skilled in the art can add one or more functional modules in combination with the prior art to form any number of embodiments or technical solutions; that is, the device is open rather than closed, and the protection scope of the claims is not limited to the basic functional modules disclosed in this embodiment.
FIG. 4 is a schematic diagram of a current speech instruction execution system in accordance with one embodiment of the present invention.
In the current vehicle-mounted system, the voice side and the media application end side depend heavily on each other: all functions work only if the versions are kept strictly consistent, development efficiency is low, and the error rate is high. The voice instruction forwarding logic is fixed; once the software version is fixed, every instruction corresponds to a fixed execution program, instructions outside the fixed phrasing execute very differently from what is expected, and uncertain instructions have no secondary confirmation or instruction memory and learning. The current scene is not taken into account, so there is room to make instruction execution more user-friendly.
The general flow in which the current head unit executes a voice instruction is shown in fig. 4. First the user provides voice input; after the sound collection sensor module on the vehicle receives the voice, it is uploaded to the voice cloud through the voice application (steps 1-3 of fig. 4). After the voice cloud parses it, the parsing result is returned to the head unit; the result is then used as an index value to look up the corresponding receiving application (such as the audio-video application) in the database of the voice application, and that application's interface is called to execute the result. The other application on the head unit executes the corresponding logic after receiving the instruction and then calls the broadcast function of the voice application to feed back the execution result (step 6 of fig. 4).
FIG. 5 is a schematic diagram of an improved speech instruction execution system in accordance with one embodiment of the present invention.
FIG. 6 is a schematic diagram of an improved timing for execution of voice instructions in accordance with one embodiment of the present invention.
As shown in fig. 5 and 6, in the improved voice instruction execution system, steps 1-4 in fig. 5 are the same as the logic before the improvement, namely the recognition process of the voice instruction. To address the problems that the audio-video media application and voice recognition iterate at different frequencies and are tightly coupled to each other, a media instruction processing engine is introduced and implemented as an end-side service. The matching rules of this service are acquired from, or subscribed to and updated from, the media cloud; the instruction rule table contains the media instruction types predefined by voice, the application corresponding to each instruction type, and the voice broadcast template, and the instruction processing logic is prepared accordingly, so the media application only needs to provide a functional interface to the media instruction processing engine service, which reduces the impact of voice instruction changes on the media application. In step 5 of fig. 5 the voice application forwards the media processing instruction to the media instruction processing engine service. The service queries the instruction rule table; if a specific rule can be matched, it goes to step 10 of fig. 5, calls the media application function, and in step 11 of fig. 5 returns the broadcast template corresponding to the execution result of the voice instruction to the voice end-side application for broadcast. If no specific rule is found, the current scene information is obtained from the scene data acquisition module (steps 6-7 of fig. 5); the scene information includes content actively updated by each application, such as play history, collection information, last played content and its time, as well as information actively collected from each audio-video application during use. The information obtained from the scene data acquisition module and the voice instruction are then uploaded to the media cloud, which matches the nearest instructions and returns them to the media instruction processing engine service; the service presents the selectable items through a view, records the selection result and the corresponding instruction, and updates them into the instruction rule table for the next use, and then calls the media application function interface and feeds the voice broadcast template back to the voice application (steps 10 and 11 of fig. 5). The timing in fig. 6 corresponds to fig. 5.
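Restated in code form, the dispatch logic of the media instruction processing engine service (steps 5-11 of fig. 5) might look roughly as follows; all module and method names here are assumptions used only to make the control flow explicit.

```python
def handle_media_instruction(instruction, rule_table, scene_module, media_cloud,
                             media_app, voice_app):
    """End-side media instruction processing engine, sketched with assumed interfaces."""
    rule = rule_table.match(instruction)            # step 5: query the instruction rule table
    if rule is None:
        # no specific rule: gather scene data and ask the media cloud (steps 6-9)
        scene = scene_module.collect()              # play history, collection info, last played, time...
        candidates = media_cloud.nearest_instructions(instruction, scene)
        rule = voice_app.present_choices(candidates)  # show selectable items in a view
        rule_table.update(instruction, rule)        # record the choice for the next use
    result = media_app.call(rule.function, rule.params)   # step 10: invoke the media application
    voice_app.broadcast(rule.template.format(result))     # step 11: feed back the broadcast template
```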
Fig. 7 is a schematic diagram of the EM algorithm flow of one embodiment of the present invention.
FIG. 8 is a schematic diagram of the Viterbi algorithm solution process in accordance with one embodiment of the present invention.
FIG. 9 is a diagram of a string state sequence according to an embodiment of the present invention.
As shown in fig. 7, in the process of matching the voice instruction with the scene information, an HMM (hidden Markov model) can be used for word segmentation. An HMM is a statistical analysis model used to describe a Markov process containing hidden parameters. The Markov model was proposed by Baum in the 1970s, and Rabiner later gave it a rigorous mathematical treatment. The HMM was then widely disseminated and developed in the 1980s; it plays a major role in the field of signal processing, has become an important direction of signal processing, and is widely used in fields such as speech recognition and character recognition. Bell Labs successfully applied it to spoken vocabulary recognition, which brought the model extensive application and research in speech processing. An HMM generally consists of five parts: the hidden states, the model output values, the initial state probabilities, the transition probabilities between states, and the output probability distributions of the states. The hidden state, usually denoted S, is the state actually required in the model and usually cannot be observed directly; the observable output, denoted O, is the observed output quantity of the model, is associated with the hidden state, and can be regarded as its external appearance. The initial state probability vector π gives the probability of each state at the initial moment; the transition probability matrix A gives the transition probabilities between hidden states; and the output probability matrix B gives the probability that a hidden state emits a certain observation. An HMM is generally written as θ = (π, A, B).
N: the number of implicit states, here the size of the state value set (B, M, E, S) { B: begin, M: middle, E: end, S: single } represents the position of the word in the word, B represents the beginning word in the word, M represents the middle word in the word, E represents the ending word in the word, S represents the word formation of a single word, here 4.
M: in the HMM model chinese word segmentation, our input is a sentence (i.e., a sequence of observations) and the output is the value of each word in the sentence (o 1 ,o 2 ,…,o m ) And (3) representing.
The initial state probability distribution, i.e. the probability that the first word of a sentence belongs to the four states { B, E, M, S }, is a 1x4 matrix (pi BMES ) Satisfies the following conditions1<=i<=N,N=4。
A transition probability matrix A (a ij ) N*N Where N is 4, i, j ε { B, E, M, S }, it is essentially a two-dimensional matrix of 4*4 (4 is the size of the set of state values). The order of the abscissa and ordinate of the matrix is BEMS.
B, emission probability matrix b= ((B) j (k)) N*M Wherein S is i Belonging to { B, E, M, S }, O k One of the corpora is a corpus, a matrix of 4*N.
The first step is model parameter estimation (this is completed before use, not performed at run time): the model parameters are computed with the EM algorithm, and the data set used is the People's Daily Chinese corpus.
In the second step the character string is segmented, and the Viterbi algorithm is used to compute which of the states (B, M, E, S) each specific character belongs to. The Viterbi algorithm is a dynamic programming algorithm widely used in machine learning, for example in conditional random field prediction and hidden Markov state solving; in practice it is used not only as a natural-language decoding algorithm but also widely in modern digital communications. From the model parameters θ = (π, A, B) computed in the first step and the actual output string O = {o_1, o_2, …, o_m}, the most probable word (state) sequence is solved, i.e.:
{q_1, q_2, …, q_T} = argmax_I p(I | O, θ),
where I denotes a state sequence and O denotes the output sequence; the solving process is shown in fig. 8.
No matter which optimal path is taken, the character at each position must correspond to exactly one state, i.e. every column in the figure must be passed through. If the state of the character at position t is assumed to be q_j, it suffices to record, for each q_j, the state q_i of the character at position t-1 that maximizes δ_{t-1}(i)·a_ij·b_j(o_t). After T iterations (T being the length of the character string), N paths (one per state) are obtained, and comparing them yields the optimal path, which is taken as the state sequence I of the model. If δ_t(i) denotes the maximum probability over all paths whose character at position t is in state i (one of the values B, M, E, S), and ψ_t(i) denotes the state of the preceding character on the path that attains that maximum, then the recursion of the Viterbi algorithm can be expressed as:
δ_t(j) = max_{1≤i≤N} [δ_{t-1}(i)·a_ij]·b_j(o_t),  ψ_t(j) = argmax_{1≤i≤N} [δ_{t-1}(i)·a_ij].
Through this recursion, the most probable state sequence at maximum probability, i.e. the most probable word segmentation sequence, is computed.
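A compact sketch of the Viterbi decoding described above over the BMES state set is given below; it assumes the probabilities have already been estimated (e.g. by the EM step on the People's Daily corpus) and are supplied as log-probabilities to avoid underflow.

```python
STATES = "BMES"  # Begin, Middle, End, Single

def viterbi_segment(sentence, start_p, trans_p, emit_p, floor=-1e9):
    """Return the most likely BMES state sequence for `sentence`.

    start_p[s]      : log pi_s
    trans_p[s][t]   : log a_{s t}
    emit_p[s][char] : log b_s(char); unseen characters fall back to `floor`
    """
    V = [{s: start_p[s] + emit_p[s].get(sentence[0], floor) for s in STATES}]
    back = [{}]
    for t in range(1, len(sentence)):
        V.append({})
        back.append({})
        for s in STATES:
            prev, best = max(((p, V[t - 1][p] + trans_p[p][s]) for p in STATES),
                             key=lambda x: x[1])
            V[t][s] = best + emit_p[s].get(sentence[t], floor)
            back[t][s] = prev
    state = max(STATES, key=lambda s: V[-1][s])   # best final state
    path = [state]
    for t in range(len(sentence) - 1, 0, -1):     # backtrack through psi
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path  # consecutive B..E spans and single S positions delimit the words
```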
In another embodiment, fig. 9 shows the word segmentation result for the character string "I want to hear the songs I listened to yesterday".
In another embodiment, the semantic strings are matched to the usage record and the context information.
The character string similarity calculation can be implemented with a Siamese network, namely Sentence-BERT (SBERT): the sub-networks of the SBERT model are both BERT models, and the two BERT models share parameters. To compare the similarity of two sentences A and B, they are fed into the BERT network separately, two groups of vectors representing the sentences are output, and the similarity of A and B is then calculated from them.
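As one possible realization, the open-source sentence-transformers package implements this Sentence-BERT approach; the sketch below assumes a recent version of that library, and the checkpoint name is only an illustrative choice of a multilingual SBERT model.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; any SBERT model that covers Chinese would serve.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def sentence_similarity(a: str, b: str) -> float:
    """Cosine similarity between the SBERT embeddings of sentences a and b."""
    emb_a, emb_b = model.encode([a, b], convert_to_tensor=True)
    return float(util.cos_sim(emb_a, emb_b))

# e.g. compare the parsed voice instruction with an entry from the play history
score = sentence_similarity("我想听昨天听过的歌", "昨天播放的歌曲")
```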
The play history, the collection information, the last played content, the corresponding time, and scene information such as the weather, the fatigue state related to driving duration, and the content currently shown on the screen are updated together as additional fields at every update. The word segmentation sequence L = (l_1, l_2, …, l_n) is matched for string similarity against the play history data, the collection information data, and the last-played content data respectively, and the average matching probability is calculated as
p̄ = (1/n)·Σ_{i=1}^{n} p(l_i),
where n is the length of the word segmentation sequence and p(l_i) is the similarity score of the i-th token.
Introducing scene information narrows the range, on the basis of the matching against play history, collection information, and last-played content, towards the desired result. Let λ_t be the weather correlation coefficient and p_t the weather matching probability, λ_f and p_f the coefficient and matching probability of the fatigue state, and λ_c and p_c those of the current screen content; other scene information is introduced in the same form, and the final weighted probability value is obtained by combining the weighted terms λ_t·p_t, λ_f·p_f, λ_c·p_c, … with the average matching probability above.
The correlation coefficients λ_t, λ_f, and λ_c are obtained by solving with the EM algorithm, and p_t, p_f, and p_c come from the string similarity calculation.
Finally, according to the weighted probability value, part of the data is listed from largest to smallest, as actually needed, for the user to choose from.
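Combining the pieces above, a minimal sketch of the scoring and ranking could look as follows; how the scene terms are combined with the average matching probability (here simply added) is an assumption, since the exact formula is not reproduced in this text.

```python
def average_match_probability(tokens, candidate_text, similarity):
    """(1/n) * sum of similarity(l_i, candidate_text) over the n segmented tokens."""
    return sum(similarity(tok, candidate_text) for tok in tokens) / len(tokens)

def weighted_score(p_base, scene_probs, scene_weights):
    """Assumed additive combination: p_base plus lambda_k * p_k for each scene factor
    (weather, fatigue state, current screen content, ...)."""
    return p_base + sum(scene_weights[k] * scene_probs[k] for k in scene_probs)

def rank_candidates(tokens, candidates, similarity, scene_probs, scene_weights, top_n=5):
    """candidates: iterable of (name, text) drawn from play history, collection
    information, and last-played content; returns the top_n by weighted score."""
    scored = []
    for name, text in candidates:
        p = average_match_probability(tokens, text, similarity)
        scored.append((weighted_score(p, scene_probs, scene_weights), name))
    scored.sort(reverse=True)
    return scored[:top_n]
```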
Fig. 10 is a block diagram of an electronic device according to one or more embodiments of the present application.
As shown in fig. 10, the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the vehicle-mounted multimedia voice instruction processing method.
The present application also provides a computer-readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of a vehicle-mounted multimedia voice instruction processing method.
The present application also provides a vehicle including:
the electronic equipment is used for realizing the steps of the vehicle-mounted multimedia voice instruction processing method;
a processor that runs a program, wherein when the program runs, the steps of the vehicle-mounted multimedia voice instruction processing method are performed on data output from the electronic device;
a storage medium storing a program that, when run, performs the steps of the vehicle-mounted multimedia voice instruction processing method on data output from the electronic device.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration only one bold line is shown in the figures, but this does not mean that there is only one bus or only one type of bus.
The electronic device includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on top of the operating system. The hardware layer includes hardware such as a central processing unit (CPU, central Processing Unit), a memory management unit (MMU, memory Management Unit), and a memory. The operating system may be any one or more computer operating systems that implement electronic device control via processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system, etc. In addition, in the embodiment of the present invention, the electronic device may be a handheld device such as a smart phone, a tablet computer, or an electronic device such as a desktop computer, a portable computer, which is not particularly limited in the embodiment of the present invention.
The execution body controlled by the electronic device in the embodiment of the invention can be the electronic device or a functional module in the electronic device, which can call a program and execute the program. The electronic device may obtain firmware corresponding to the storage medium, where the firmware corresponding to the storage medium is provided by the vendor, and the firmware corresponding to different storage media may be the same or different, which is not limited herein. After the electronic device obtains the firmware corresponding to the storage medium, the firmware corresponding to the storage medium can be written into the storage medium, specifically, the firmware corresponding to the storage medium is burned into the storage medium. The process of burning the firmware into the storage medium may be implemented by using the prior art, and will not be described in detail in the embodiment of the present invention.
The electronic device may further obtain a reset command corresponding to the storage medium, where the reset command corresponding to the storage medium is provided by the provider, and the reset commands corresponding to different storage media may be the same or different, which is not limited herein.
At this time, the storage medium of the electronic device is a storage medium in which the corresponding firmware is written, and the electronic device may respond to a reset command corresponding to the storage medium in which the corresponding firmware is written, so that the electronic device resets the storage medium in which the corresponding firmware is written according to the reset command corresponding to the storage medium. The process of resetting the storage medium according to the reset command may be implemented in the prior art, and will not be described in detail in the embodiments of the present application.
For convenience of description, the above devices are described as being functionally divided into various units and modules. Of course, the functions of the units, modules may be implemented in one or more pieces of software and/or hardware when implementing the application.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated by one of ordinary skill in the art that the methodologies are not limited by the order of acts, as some acts may, in accordance with the methodologies, take place in other order or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform the method according to the embodiments or some parts of the embodiments of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A vehicle-mounted multimedia voice instruction processing method, characterized by comprising the following steps:
collecting user voice information;
parsing an instruction for controlling an in-vehicle application from the user voice;
acquiring history information and current state information of in-vehicle application control;
matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control;
executing the instruction with the in-vehicle application according to the matching result of the in-vehicle application control instruction.
2. The vehicle-mounted multimedia voice instruction processing method according to claim 1, wherein matching the parsed in-vehicle application control instruction according to the history information and current state information of in-vehicle application control comprises:
acquiring an instruction rule list;
matching the instruction for controlling the in-vehicle application in the instruction rule list according to a preset matching rule;
if a corresponding instruction is matched in the instruction rule list, executing the instruction with the in-vehicle application according to the instruction rule list;
if no corresponding instruction is matched in the instruction rule list, broadcasting the voice corresponding to the in-vehicle application control instruction.
3. The vehicle-mounted multimedia voice instruction processing method according to claim 2, wherein, when no corresponding instruction is matched in the instruction rule list, the method further comprises:
acquiring vehicle-machine application scene information;
wherein the vehicle-machine application scene information comprises history information and collection information of instructions executed by the vehicle-machine application;
the vehicle-machine application scene information further comprises current state information of the vehicle-machine application;
matching, according to the vehicle-machine application scene information, a historical vehicle-machine application control instruction similar to the current vehicle-machine application control instruction;
and if the similarity between the current vehicle-machine application control instruction and the historical vehicle-machine application control instruction exceeds a preset threshold, refreshing the instruction rule list according to the current vehicle-machine application control instruction and the similar historical vehicle-machine application control instruction.
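The history-based refresh in claim 3 could, under one possible reading, be sketched as follows; the use of difflib's SequenceMatcher as the similarity measure and the 0.8 threshold are assumptions for illustration only:

```python
from difflib import SequenceMatcher


def refresh_from_history(instruction, scene_history, rule_list, actions, threshold=0.8):
    """scene_history: past instructions; actions: maps a past instruction to its action."""
    best, best_score = None, 0.0
    for past in scene_history:
        score = SequenceMatcher(None, instruction, past).ratio()
        if score > best_score:
            best, best_score = past, score
    if best is None or best_score < threshold:
        return False                        # no sufficiently similar historical instruction
    rule_list[instruction] = actions[best]  # refresh: the new phrasing reuses the matched action
    return True
```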
4. The vehicle-mounted multimedia voice instruction processing method according to claim 3, wherein refreshing the instruction rule list according to the currently parsed vehicle-machine application control instruction comprises:
refreshing the instruction rule list according to the vehicle-machine application scene information corresponding to the vehicle-machine application control instruction;
wherein the instruction rule list comprises the vehicle-machine application scene information, the corresponding voice information, and the vehicle-machine application control instruction.
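One assumed shape for an entry of the instruction rule list described in claim 4 (the field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class RuleListEntry:
    scene_info: dict      # vehicle-machine application scene information
    voice_text: str       # the corresponding voice information (user utterance)
    instruction: str      # the vehicle-machine application control instruction
```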
5. A vehicle-mounted multimedia voice instruction processing system, characterized in that the system comprises: a voice analysis module, an instruction processing module, and a vehicle-machine application module; wherein
the voice analysis module is configured to parse the collected user voice into a vehicle-machine application control instruction;
the instruction processing module is configured to match the vehicle-machine application control instruction against an instruction rule list;
the vehicle-machine application module is configured to execute the vehicle-machine application control instruction;
the voice analysis module sends the parsed vehicle-machine application control instruction;
the instruction processing module receives the data from the voice analysis module and matches the vehicle-machine application control instruction according to the instruction rule list;
if the matching is successful, the instruction processing module sends the vehicle-machine application control instruction;
and the vehicle-machine application module receives the data from the instruction processing module and executes the vehicle-machine application control instruction.
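A minimal, assumed wiring of the three modules in claim 5; the plain method-call message passing and all class names are illustrative, not the patented implementation:

```python
class VoiceAnalysisModule:
    def parse(self, utterance):
        # Stand-in parsing: normalize the recognized text into an instruction key.
        return utterance.strip().lower()


class VehicleApplicationModule:
    def execute(self, instruction):
        print("executing:", instruction)   # placeholder for the real application action


class InstructionProcessingModule:
    def __init__(self, rule_list, app):
        self.rule_list = rule_list         # maps instruction text to an app-level command
        self.app = app

    def handle(self, instruction):
        if instruction in self.rule_list:  # match under the instruction rule list
            self.app.execute(self.rule_list[instruction])
            return True
        return False                       # caller may fall back to the cloud path (claim 6)


# Example wiring: voice analysis -> instruction processing -> vehicle-machine application
processor = InstructionProcessingModule({"play music": "music.play"}, VehicleApplicationModule())
processor.handle(VoiceAnalysisModule().parse("Play music"))
```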
6. The vehicle-mounted multimedia voice instruction processing system according to claim 5, wherein the instruction processing module comprises: a scene data module and a media cloud module; wherein
the scene data module is configured to acquire history information, collection information, and current state information of instructions executed by the vehicle-machine application;
the media cloud module is configured to search for a historical vehicle-machine application control instruction similar to the current vehicle-machine application control instruction;
if the instruction rule list fails to match the vehicle-machine application control instruction, the scene data module sends the history information, the collection information, and the current state information of the vehicle-machine application;
the media cloud module receives the data from the scene data module and searches for a historical vehicle-machine application control instruction similar to the current vehicle-machine application control instruction according to the history information, the collection information, and the current state information of the instructions executed by the vehicle-machine application;
and if the similarity between the historical vehicle-machine application control instruction and the current vehicle-machine application control instruction exceeds the preset threshold, the instruction rule list is refreshed according to the current vehicle-machine application control instruction and the similar historical vehicle-machine application control instruction.
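Claim 6's fallback path could be sketched as follows; the scene-data snapshot format, the local stand-in for the media cloud module, and the similarity measure are all assumptions for illustration:

```python
from difflib import SequenceMatcher


class SceneDataModule:
    def __init__(self, history, collection, current_state):
        self.history = history              # instructions the application has executed
        self.collection = collection        # the user's collection information
        self.current_state = current_state  # current state of the vehicle-machine application

    def snapshot(self):
        return {"history": self.history,
                "collection": self.collection,
                "state": self.current_state}


class MediaCloudModule:
    def find_similar(self, instruction, snapshot, threshold=0.8):
        """Return the most similar historical/collection instruction above the threshold, or None."""
        candidates = snapshot["history"] + snapshot["collection"]
        best_score, best = 0.0, None
        for candidate in candidates:
            score = SequenceMatcher(None, instruction, candidate).ratio()
            if score > best_score:
                best_score, best = score, candidate
        return best if best_score >= threshold else None
```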
7. A vehicle-mounted multimedia voice instruction processing device, characterized in that the device comprises:
a voice acquisition module, configured to collect voice information of a user;
an instruction parsing module, configured to parse a vehicle-machine application control instruction according to the user voice information;
a vehicle information module, configured to acquire history information and current state information of vehicle-machine application control;
an instruction matching module, configured to match the parsed vehicle-machine application control instruction according to the history information and the current state information of the vehicle-machine application control;
and an instruction execution module, configured to execute, by the vehicle-machine application, the instruction according to the matching result of the instruction.
8. An electronic device, comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the vehicle-mounted multimedia voice instruction processing method of any one of claims 1 to 4.
9. A computer-readable storage medium storing a computer program executable by an electronic device, wherein the computer program, when run on the electronic device, causes the electronic device to perform the steps of the vehicle-mounted multimedia voice instruction processing method according to any one of claims 1 to 4.
10. A vehicle, characterized by comprising:
an electronic device configured to implement the steps of the vehicle-mounted multimedia voice instruction processing method according to any one of claims 1 to 4;
a processor that runs a program, wherein, when the program runs, the steps of the vehicle-mounted multimedia voice instruction processing method according to any one of claims 1 to 4 are performed on data output from the electronic device;
and a storage medium storing a program that, when executed, performs the steps of the vehicle-mounted multimedia voice instruction processing method according to any one of claims 1 to 4 on data output from the electronic device.
CN202310734568.5A 2023-06-20 2023-06-20 Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment Pending CN116863927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310734568.5A CN116863927A (en) 2023-06-20 2023-06-20 Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310734568.5A CN116863927A (en) 2023-06-20 2023-06-20 Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116863927A true CN116863927A (en) 2023-10-10

Family

ID=88220723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310734568.5A Pending CN116863927A (en) 2023-06-20 2023-06-20 Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116863927A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117590978A (en) * 2023-12-02 2024-02-23 江门市冠鑫电子有限公司 Vehicle-mounted multimedia integrated management system and method based on artificial intelligence
CN117590978B (en) * 2023-12-02 2024-04-09 江门市冠鑫电子有限公司 Vehicle-mounted multimedia integrated management system and method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN110473531B (en) Voice recognition method, device, electronic equipment, system and storage medium
US8676583B2 (en) Belief tracking and action selection in spoken dialog systems
US20230186912A1 (en) Speech recognition method, apparatus and device, and storage medium
CN111191016A (en) Multi-turn conversation processing method and device and computing equipment
CN108010527B (en) Speech recognition method, computer device, and storage medium
US11657225B2 (en) Generating summary content tuned to a target characteristic using a word generation model
CN110782881A (en) Video entity error correction method after speech recognition and entity recognition
WO2021000403A1 (en) Voice matching method for intelligent dialogue system, electronic device and computer device
CN111079418B (en) Named entity recognition method, device, electronic equipment and storage medium
JP2020004382A (en) Method and device for voice interaction
CN112509560B (en) Voice recognition self-adaption method and system based on cache language model
CN116863927A (en) Vehicle-mounted multimedia voice instruction processing method and device and electronic equipment
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN114067786A (en) Voice recognition method and device, electronic equipment and storage medium
CN112767921A (en) Voice recognition self-adaption method and system based on cache language model
CN111914561A (en) Entity recognition model training method, entity recognition device and terminal equipment
CN114676689A (en) Sentence text recognition method and device, storage medium and electronic device
CN114420102B (en) Method and device for speech sentence-breaking, electronic equipment and storage medium
CN114822533B (en) Voice interaction method, model training method, electronic device and storage medium
CN114822532A (en) Voice interaction method, electronic device and storage medium
CN111243604A (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN112487813B (en) Named entity recognition method and system, electronic equipment and storage medium
CN111128172B (en) Voice recognition method, electronic equipment and storage medium
CN111090970B (en) Text standardization processing method after voice recognition
CN115294974A (en) Voice recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination