CN115440221B - Vehicle-mounted intelligent voice interaction method and system based on cloud computing - Google Patents


Info

Publication number
CN115440221B
CN115440221B
Authority
CN
China
Prior art keywords: information, instruction, interactive, vehicle, target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211395643.1A
Other languages
Chinese (zh)
Other versions
CN115440221A (en)
Inventor
徐俊
Current Assignee
Foshan Tiandixing Technology Co., Ltd.
Original Assignee
Foshan Tiandixing Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Foshan Tiandixing Technology Co., Ltd.
Priority to CN202211395643.1A
Publication of CN115440221A
Application granted
Publication of CN115440221B

Classifications

    • G10L 15/22 — Speech recognition: procedures used during a speech recognition process, e.g. man-machine dialogue
    • B60R 16/0373 — Vehicle electric circuits for occupant comfort: voice control
    • B60W 50/08 — Road vehicle drive control systems: interaction between the driver and the control system
    • G06F 16/3343 — Information retrieval of unstructured textual data: query execution using phonetics
    • G06F 16/3344 — Information retrieval of unstructured textual data: query execution using natural language analysis
    • G06F 16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F 16/367 — Creation of semantic tools: ontology
    • G06F 40/30 — Handling natural language data: semantic analysis
    • G10L 15/063 — Speech recognition: creation of reference templates; training of speech recognition systems
    • G10L 15/16 — Speech classification or search using artificial neural networks
    • G10L 15/34 — Speech recognition: adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G10L 17/02 — Speaker identification or verification: preprocessing operations; pattern representation or modelling; feature selection or extraction


Abstract

The invention discloses a vehicle-mounted intelligent voice interaction method and system based on cloud computing, comprising the following steps: acquiring attention information of a driving user during driving in the vehicle-mounted environment, and analyzing the driving user's state information together with the vehicle-mounted environment information to generate a driving scene for the vehicle's current driving process; acquiring the position and identity information of a target user through voice information, and initializing an instruction hierarchical graph; performing semantic recognition on the voice information based on machine learning, searching the instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraints from the driving scene, and correcting the interactive instruction through those constraints; and acquiring the target user's feedback on the interactive instruction, analyzing the target user's interactive instruction habit information, and compensating the correction of the interactive instruction accordingly. The method and system intelligently analyze interactive instructions on the basis of the driving scene, better match the voice-interaction behavior of vehicle users without sacrificing recognition efficiency, and improve the in-vehicle interaction experience.

Description

Vehicle-mounted intelligent voice interaction method and system based on cloud computing
Technical Field
The invention relates to the technical field of voice interaction, and in particular to a vehicle-mounted intelligent voice interaction method and system based on cloud computing.
Background
With the rise of the Internet of Vehicles and intelligent automobiles, intelligent transportation has become a topic of wide concern, and vehicle head units carry ever more functions. Voice interaction technology has already been applied successfully in scenarios such as smart speakers and input methods; in the vehicle it helps reduce the driver's manual dependence on in-car equipment and improves driving safety. Against the background of rapidly developing intelligent technologies, the key technologies of voice interaction comprise speech recognition, semantic understanding and speech synthesis. Speech recognition converts the human voice signal, i.e. natural language, into corresponding text or instructions. Semantic understanding processes the received text or instructions so as to convert natural language into a language the machine can understand, and thereby grasp the user's intent.
Besides text content, speech recognition in vehicle-mounted voice interaction is also used to recognize the speaker's identity, so that differentiated services can be provided to the driver and passengers according to the application scenario. Compared with the voice interaction already widely deployed in vehicles, voiceprint recognition is a field with a relatively high technical threshold. The problem is therefore how to use voiceprint recognition to acquire user habit information more accurately, and to construct an intelligent voice interaction model for the special vehicle-mounted scenario, so as to better match the voice-interaction behavior of vehicle users, improve the in-vehicle interaction experience, strengthen vehicle safety protection, guarantee driver safety, and improve the interaction experience during driving.
Disclosure of Invention
In order to solve the technical problems, the invention provides a vehicle-mounted intelligent voice interaction method and system based on cloud computing.
The invention provides a vehicle-mounted intelligent voice interaction method based on cloud computing, which comprises the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in the vehicle-mounted environment, acquiring the position and identity information of a target user through the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchical graph;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
In this scheme, the state information of the driving user is analyzed by combining the attention information with the vehicle-mounted environment information, and the driving scene of the current vehicle driving process is generated, specifically:
acquiring facial frame image data of a driver through an in-vehicle camera, preprocessing the facial frame image data, and extracting a key frame of the facial frame image data;
extracting human face characteristic points of a driving user according to key frames of the facial frame image data, and acquiring human face orientation information, human eye closing degree and sight line direction according to the human face characteristic points;
comparing and analyzing the acquired face orientation information, the eye closing degree and the sight line direction with a preset threshold value, reading the attention information of a driving user, setting weight information according to road condition information of a current driving road section, and adjusting the attention threshold value by using the weight information;
evaluating the attention information of the driving user according to the attention threshold value at the current moment, acquiring vehicle-mounted environment information and an attention evaluation result, and performing matching analysis on the state information of the driving user;
when the state information of the driver is in a fatigue state, generating voice information to remind the driver, making a decision according to the vehicle-mounted environment information to generate a vehicle-mounted environment change suggestion, and acquiring voice feedback of the driver to execute a corresponding instruction;
in addition, the driving scene in the current vehicle driving process is generated through the state information of the driving user, the vehicle-mounted environment information and the vehicle driving information.
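The threshold-weighting step above can be sketched as follows. This is an illustrative reading only, not part of the disclosure: the condition names, weights, and the eye-closure interpretation of the attention threshold are all assumptions.

```python
# Illustrative sketch: the tolerated eye-closure ratio is scaled down on
# congested sections, so the same closure reading is judged more strictly.
# Weights and condition names are invented for this example.
ROAD_WEIGHTS = {
    "open": 1.0,        # clear road: no adjustment
    "busy": 0.85,       # somewhat stricter
    "congested": 0.7,   # reduced threshold, as in the congestion example above
}

def adjusted_closure_threshold(base: float, road_condition: str) -> float:
    """Scale the tolerated eye-closure ratio by the road-condition weight."""
    return base * ROAD_WEIGHTS.get(road_condition, 1.0)

def driver_state(eye_closure_ratio: float, base: float, road_condition: str) -> str:
    """Classify the driving user against the adjusted attention threshold."""
    if eye_closure_ratio > adjusted_closure_threshold(base, road_condition):
        return "fatigued"
    return "attentive"
```

Under this sketch, a closure ratio of 0.3 counts as attentive on an open road but as fatigued on a congested one, matching the idea that congestion lowers the tolerated threshold.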
In this scheme, the interactive voice information in the vehicle-mounted environment is acquired, and the position and identity information of the target user are acquired through the interactive voice information, specifically:
acquiring interactive voice information in the vehicle-mounted environment through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-areas;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then carrying out voiceprint recognition, retrieving identity information from big data at the cloud according to the voiceprint recognition result, and computing the similarity between the voiceprint corresponding to the interactive voice information and the cloud-stored data;
acquiring the data whose similarity meets a preset similarity standard, extracting the corresponding identity information as the identity information of the target user, and reading the matched stored voice habit characteristics through that identity information; if no cloud-stored data meets the preset similarity standard, creating a voiceprint sequence and storing it in the cloud;
and matching an interactive instruction set corresponding to the function information according to the position information of the target user, and initializing an instruction hierarchical graph from the interactive instruction set based on the identity information.
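The energy-plus-arrival-time judgment of the source sub-area might be sketched as below; the tuple layout of the readings and the simple lexicographic decision rule (loudest first, earliest arrival as tie-break) are assumptions made for illustration.

```python
def locate_source(readings: dict) -> str:
    """Pick the source sub-area of an utterance.

    readings maps a sub-area name to (sound_energy, arrival_time_s) as
    measured by that sub-area's microphone.  The loudest signal wins, with
    the earliest arrival breaking ties; a deliberately simple stand-in for
    the combined energy / arrival-time-difference judgment in the text.
    """
    return max(readings, key=lambda zone: (readings[zone][0], -readings[zone][1]))
```

For example, a louder, earlier-arriving signal at the driver's seat identifies the driver sub-area as the source, after which voiceprint recognition proceeds on the located utterance.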
In the scheme, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically comprising the following steps:
preprocessing interactive voice information, extracting Word vectors from the preprocessed interactive voice information through a Word2vec model, performing weighted average construction on sentence vector expression according to the Word vectors, and taking the Word vectors and the sentence vector expression as semantic features;
establishing a key information extraction model based on a bidirectional long-short term memory neural network model, inputting semantic features into the key information extraction model, and configuring differentiated weights by combining an attention mechanism and context to obtain key information in interactive voice information;
classifying the key information and attaching category labels, retrieving in the initialized instruction hierarchical graph to obtain the instruction corresponding to the key information, and thereby obtaining the intention of the target user;
when the retrieval path in the instruction hierarchical graph corresponds to more than one instruction, setting question-back voice information according to the retrieved content, updating the intention according to the target user's feedback, and matching the corresponding interactive instruction through the updated intention;
and setting comprehensive constraints based on the current driving scene, judging whether the matched interactive instruction falls within the range of the comprehensive constraints, and, if not, modifying the interactive instruction and querying the target user for feedback by voice.
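A minimal sketch of the comprehensive-constraint correction follows. The scene keys (`speed_kmh`, `driver_is_target`) and both rules are invented for illustration; in the disclosure the constraints are derived from the driving scene itself.

```python
def correct_instruction(instruction: str, scene: dict):
    """Check a matched instruction against scene constraints and correct it.

    Returns (possibly corrected instruction, needs_voice_confirmation).
    The rules below are hypothetical examples, not the patent's constraints.
    """
    if instruction == "open_sunroof" and scene.get("speed_kmh", 0) > 100:
        # unsafe at highway speed: downgrade and ask the target user by voice
        return "open_window_gap", True
    if instruction == "play_video" and scene.get("driver_is_target", False):
        # video playback for the driver conflicts with the driving scene
        return "play_audio_only", True
    return instruction, False
```

An out-of-range instruction is thus replaced with a safer variant and flagged so the system queries the target user for feedback before executing it.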
This scheme further includes monitoring the attention information of the driving user during voice interaction, specifically:
after receiving the interactive instruction, acquiring the state information of the current timestamp of the driving user, and creating a temporary attention monitoring task based on the state information of the current timestamp;
acquiring the sight line drop point frequency of a driving user in the driving scene of the current timestamp to acquire a watching hot spot area, acquiring the sight line drop point of each timestamp in the attention monitoring task through the sight line direction of the driving user, and marking the watching duration of the sight line drop point;
judging whether the sight line drop point of each timestamp in the attention monitoring task falls in the watching hotspot area; if the sight line drop point stays outside the watching hotspot area for longer than a preset threshold, judging whether to suspend voice interaction according to the type of the interaction instruction, and generating a voice prompt;
and after voice interaction is suspended, when it is detected that the driving user's sight line drop point has returned to the watching hotspot area, restoring the voice interaction scene and carrying out the operation corresponding to the instruction according to the historical interaction instruction.
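The dwell-time check in the attention monitoring task can be sketched as below, assuming time-ordered gaze samples and a caller-supplied hotspot predicate; the sample shape and the "critical instruction" flag are assumptions.

```python
def longest_off_hotspot_run(samples, in_hotspot):
    """samples: time-ordered (timestamp_s, gaze_point) pairs; in_hotspot is a
    predicate over gaze points.  Returns the longest continuous time the
    sight line drop point stayed outside the watching hotspot area."""
    longest, away_since = 0.0, None
    for t, point in samples:
        if in_hotspot(point):
            away_since = None          # gaze returned: close the current run
        else:
            if away_since is None:
                away_since = t         # gaze just left the hotspot
            longest = max(longest, t - away_since)
    return longest

def should_pause(samples, in_hotspot, limit_s, instruction_is_critical):
    """Suspend non-critical voice interaction when the gaze has left the
    hotspot for longer than the preset threshold."""
    return (not instruction_is_critical
            and longest_off_hotspot_run(samples, in_hotspot) > limit_s)
```

With a 2-second limit, a gaze that lingers 2.5 seconds on the screen triggers a pause for a non-critical instruction but not for a critical one.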
In this scheme, feedback information of the target user on the interactive instruction is acquired, the instruction habit information of the target user is analyzed by matching the target user's voiceprint information with the feedback information, and the correction of the interactive instruction is compensated based on the instruction habit information, specifically:
after the interactive instruction is executed, feedback information of the target user on the interactive instruction is obtained, a supplementary data set of each interactive instruction is set through the feedback information, and the supplementary data set is set as a voiceprint information tag of the target user;
performing supplementary correction on the instruction hierarchical graph based on the supplementary data set of each interactive instruction, extracting a graph structure from the corrected instruction hierarchical graph, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
establishing a personalized database of the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized data, and compensating the correction precision of the interactive instruction so that the interactive instruction achieves the target user's expected effect in one pass;
and presetting a cloud storage time threshold, and deleting the personalized database of the target user when the time for which the personalized database corresponding to the target user's voiceprint information has not been called exceeds the preset storage time threshold.
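The storage-time threshold cleanup might look like the following sketch, assuming the cloud keeps a last-called timestamp per voiceprint identity; the dictionary shape is an assumption.

```python
def purge_stale_profiles(profiles: dict, now_s: float, ttl_s: float) -> dict:
    """profiles maps a voiceprint id to the last time (in seconds) its
    personalized database was called.  Entries idle longer than the preset
    storage-time threshold are deleted; everything else is retained."""
    return {vid: last for vid, last in profiles.items() if now_s - last <= ttl_s}
```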
The second aspect of the invention further provides a cloud-computing-based vehicle-mounted intelligent voice interaction system, comprising a memory and a processor, wherein the memory stores a cloud-computing-based vehicle-mounted intelligent voice interaction program, and when the processor executes the program, the following steps are implemented:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in a vehicle-mounted environment, acquiring position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user through a judgment result, and initializing an instruction hierarchical graph;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
In summary, the invention discloses a vehicle-mounted intelligent voice interaction method and system based on cloud computing, comprising: acquiring attention information of a driving user during driving in the vehicle-mounted environment, and analyzing the driving user's state information together with the vehicle-mounted environment information to generate a driving scene for the current vehicle driving process; acquiring the position and identity information of a target user through voice information, and initializing an instruction hierarchical graph; performing semantic recognition on the voice information based on machine learning, searching the instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraints from the driving scene, and correcting the interactive instruction through those constraints; and acquiring the target user's feedback on the interactive instruction, analyzing the target user's interactive instruction habit information, and compensating the correction of the interactive instruction. The method and system intelligently analyze interactive instructions on the basis of the driving scene, better match the voice-interaction behavior of vehicle users without sacrificing recognition efficiency, and improve the in-vehicle interaction experience.
Drawings
FIG. 1 is a flow chart of a cloud computing-based vehicle-mounted intelligent voice interaction method according to the invention;
FIG. 2 is a flowchart illustrating a method for obtaining location and identity information of a target user through interactive voice information according to the present invention;
FIG. 3 is a flow chart of a method for semantic recognition of interactive voice information based on machine learning according to the present invention;
FIG. 4 shows a block diagram of a cloud-computing-based vehicle-mounted intelligent voice interaction system of the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a flow chart of a cloud computing-based vehicle-mounted intelligent voice interaction method of the invention.
As shown in fig. 1, a first aspect of the present invention provides a cloud-computing-based vehicle-mounted intelligent voice interaction method, including:
s102, acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current vehicle driving process;
s104, acquiring interactive voice information in a vehicle-mounted environment, acquiring position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user through a judgment result, and initializing an instruction hierarchical graph;
s106, performing semantic recognition on interactive voice information by the cloud based on machine learning, searching in an instruction hierarchical graph to generate an interactive instruction, generating comprehensive constraint according to the driving scene, and correcting the interactive instruction through the comprehensive constraint;
and S108, acquiring feedback information of the target user on the interactive instruction, analyzing the instruction habit information of the target user by matching the target user's voiceprint information with the feedback information, and compensating the correction of the interactive instruction based on the instruction habit information.
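The Word2vec step of S106, building a sentence representation as a weighted average of word vectors, might be sketched as follows; the shapes of the vector store and weight dictionary are assumptions, and a trained Word2vec model would supply the vectors in practice.

```python
def sentence_vector(tokens, word_vectors, weights=None):
    """Weighted average of word vectors as a sentence representation.

    word_vectors maps a token to its vector; weights optionally maps a token
    to its averaging weight (default 1.0).  Out-of-vocabulary tokens are
    skipped.  This mirrors the weighted-average construction in S106."""
    dims = len(next(iter(word_vectors.values())))
    total, weight_sum = [0.0] * dims, 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        for i, v in enumerate(word_vectors[tok]):
            total[i] += w * v
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [v / weight_sum for v in total]
```

The resulting sentence vector, together with the word vectors themselves, serves as the semantic feature fed to the key information extraction model.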
Specifically, facial frame image data of the driving user are acquired through the in-vehicle camera, the frame image data are preprocessed, and key frames of the facial frame image data are extracted. Face characteristic points of the driving user are then extracted from the key frames, and face orientation information, eye closing degree and sight line direction are obtained from them. Corresponding threshold intervals are set for the extracted data and matched to attention levels; the acquired face orientation information, eye closing degree and sight line direction are compared with the preset thresholds, the attention information of the driving user is read, and the driving user's current attention level is judged.
Weight information is set according to the road condition information of the current driving section and used to adjust the attention threshold; for example, when the current section is congested, the corresponding attention threshold is reduced so that the driving user stays more concentrated while driving. The attention information of the driver is evaluated against the attention threshold at the current moment, and the vehicle-mounted environment information and the attention evaluation result are matched against the driver's state information, where the vehicle-mounted environment information includes the number of people in the vehicle and the sound, temperature and air quality of the vehicle-mounted environment. When the driver's state information indicates a fatigue state, voice information is generated to remind the driver, a decision is made according to the vehicle-mounted environment information to generate a vehicle-mounted environment change suggestion, and the driver's voice feedback is acquired to execute the corresponding instruction; for example, when the driver is in a slight fatigue state, voice interaction asks whether to open a window, play music or lower the air-conditioner temperature. In addition, the driving scene of the current vehicle driving process is generated from the driving user's state information, the vehicle-mounted environment information and the vehicle driving information.
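The fatigue-driven environment-change suggestion could be sketched as below; the fatigue levels, the 24 °C cutoff and the suggestion names are invented for the example, with only the window / music / air-conditioner options taken from the text above.

```python
def environment_suggestions(fatigue_level: str, cabin_temp_c: float):
    """Map a fatigue assessment to cabin-change suggestions offered by voice.

    fatigue_level is one of "none", "slight", "severe" (assumed scale)."""
    if fatigue_level == "none":
        return []
    suggestions = ["open_window", "play_music"]
    if cabin_temp_c > 24.0:
        suggestions.append("lower_ac_temperature")  # a warm cabin worsens drowsiness
    if fatigue_level == "severe":
        suggestions.insert(0, "recommend_rest_stop")  # escalate beyond comfort tweaks
    return suggestions
```

Each suggestion would then be offered by voice, and the driver's spoken feedback decides which corresponding instruction is executed.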
Fig. 2 is a flowchart illustrating a method for obtaining location and identity information of a target user through interactive voice information according to the present invention.
According to the embodiment of the invention, the interactive voice information in the vehicle-mounted environment is acquired, and the position and identity information of the target user is acquired through the interactive voice information, which specifically comprises the following steps:
s202, acquiring interactive voice information in the vehicle-mounted environment according to a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-regions;
s204, acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
s206, determining the position of the interactive voice information, then performing voiceprint recognition, retrieving identity information through big data at the cloud according to the voiceprint recognition result, and calculating the similarity between the voiceprint corresponding to the interactive voice information and cloud storage data;
s208, acquiring data with the similarity meeting a preset similarity standard, extracting corresponding identity information as the identity information of a target user, reading the voice habit characteristics matched with the stored voice habit characteristics through the identity information, and if the cloud storage data does not meet the preset similarity standard, creating a voiceprint sequence and storing the voiceprint sequence in the cloud;
and S210, matching an interactive instruction set corresponding to the function information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
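Steps S204 and S206 can be sketched as a combination of a per-region energy vote and an arrival-time-difference (TDOA) check. The two-microphone layout, the sign convention and the scoring rule are assumptions for illustration only:

```python
# Minimal sketch: decide the source sub-region of an utterance from the
# per-region sound energy plus the sign of the arrival-time difference.

def rms_energy(samples):
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def locate_subregion(region_samples, tdoa_seconds):
    """region_samples: {region_name: samples from that region's microphone}.
    tdoa_seconds: left-minus-right arrival-time difference; negative means
    the sound reached the left microphone first (hypothetical convention)."""
    energies = {r: rms_energy(s) for r, s in region_samples.items()}
    candidate = max(energies, key=energies.get)   # loudest region
    side = "left" if tdoa_seconds < 0 else "right"
    # Accept the energy candidate only if it agrees with the TDOA side;
    # otherwise fall back to a region on the TDOA side (assumes region
    # names contain "left"/"right").
    if side in candidate:
        return candidate
    return next(r for r in energies if side in r)

regions = {
    "front-left": [0.9, -0.8, 0.7],
    "front-right": [0.1, -0.1, 0.1],
}
print(locate_subregion(regions, -0.0002))  # "front-left"
```

In practice the TDOA would be estimated by cross-correlating the microphone signals rather than supplied directly.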
It should be noted that the interactive instruction set corresponding to the function information is matched according to the position information of the target user, and the instruction hierarchical graph is initialized from the interactive instruction set based on the identity information, specifically: the interactive instructions are classified based on position information within the vehicle-mounted environment, keyword information corresponding to each interactive instruction is obtained through big-data retrieval, and an interactive instruction knowledge graph is constructed from the interactive instructions, the keyword information and the classification results. A corresponding interactive instruction set is extracted from the interactive instruction knowledge graph according to the source sub-region of the target user's interactive voice information. Similar historical driving scenes are retrieved according to the current driving scene, the usage frequency of each interactive instruction in those similar scenes is extracted, and the instructions in the interactive instruction set are prioritized by usage frequency. If the identity information of the target user is already stored in the cloud, a user portrait is constructed from the historical interactive instructions corresponding to that identity, the priorities are adjusted through the user portrait, and the instruction hierarchical graph is generated from the interactive instruction set according to the adjusted priorities.
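The frequency-based prioritization with a user-portrait adjustment can be sketched as a simple scoring sort. The additive boost and all names are hypothetical:

```python
# Hedged sketch: rank interactive instructions by their usage frequency in
# similar historical driving scenes, then re-weight with a user-portrait boost.

def build_instruction_hierarchy(usage_freq, portrait_boost=None):
    """usage_freq: {instruction: count in similar historical scenes}.
    portrait_boost: {instruction: additive weight from the user portrait}."""
    portrait_boost = portrait_boost or {}
    score = {i: f + portrait_boost.get(i, 0) for i, f in usage_freq.items()}
    # Higher score = higher priority = earlier in the hierarchy.
    return sorted(score, key=score.get, reverse=True)

freq = {"navigate": 12, "play music": 30, "open window": 5}
boost = {"navigate": 25}  # this user issues navigation commands often
print(build_instruction_hierarchy(freq, boost))
# ['navigate', 'play music', 'open window']
```

Without the boost, "play music" would rank first; the portrait lifts "navigate" ahead of it, mirroring the priority adjustment described above.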
FIG. 3 is a flow chart illustrating a method for semantic recognition of interactive voice information based on machine learning according to the present invention.
According to the embodiment of the invention, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically:
s302, preprocessing interactive voice information, extracting Word vectors from the preprocessed interactive voice information through a Word2vec model, performing weighted average according to the Word vectors to construct sentence vector expression, and taking the Word vectors and the sentence vector expression as semantic features;
s304, establishing a key information extraction model based on the bidirectional long-short term memory neural network model, inputting semantic features into the key information extraction model, and configuring differentiation weights by combining an attention mechanism and context to obtain key information in the interactive voice information;
s306, classifying by using the key information, labeling category labels, retrieving the corresponding instructions of the key information in the initialized instruction hierarchical diagram, and determining the intention of the target user;
s308, when the retrieval path in the instruction hierarchical graph corresponds to an instruction which is not unique, setting question-back voice information according to the retrieval content, updating the intention according to the feedback of the target user, and matching the corresponding interactive instruction according to the updated intention;
s310, comprehensive constraints are set based on the current driving situation, whether the matched interactive instruction meets the range of the comprehensive constraints or not is judged, if not, the interactive instruction is corrected, and then feedback information of a target user is inquired through voice.
It should be noted that after an interactive instruction is received, the state information of the driving user at the current timestamp is obtained, and a temporary attention monitoring task is created based on it. The gaze-point frequency of the driving user in the driving scene of the current timestamp is acquired to obtain a watching hotspot region; the gaze point at each timestamp of the monitoring task is acquired from the driving user's gaze direction, and the watching duration of each gaze point is recorded. Whether the gaze point at each timestamp falls inside the watching hotspot region is judged; if the duration for which the gaze point stays outside the hotspot region exceeds a preset threshold, whether to suspend the voice interaction is judged according to the type of the interactive instruction, and a voice prompt is generated. After the voice interaction is suspended, when the driving user's gaze point is detected to return to the watching hotspot region, the voice interaction scene is restored and the operation corresponding to the instruction is carried out according to the historical interactive instruction, where the target user's gaze point can be detected by eye-tracking hardware or by a pupil projection space method.
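The dwell-time check of the monitoring task can be sketched as follows. The rectangular hotspot region, the sampling format and the timings are assumptions:

```python
# Sketch of the temporary attention-monitoring task: flag the voice
# interaction for suspension when the gaze point stays outside the watching
# hotspot region (e.g. the road area) longer than a threshold.

def monitor_gaze(samples, hotspot, max_away_s):
    """samples: list of (timestamp_s, (x, y)) gaze points, in time order.
    hotspot: ((x_min, y_min), (x_max, y_max)) rectangle."""
    (x0, y0), (x1, y1) = hotspot
    away_start = None
    for t, (x, y) in samples:
        inside = x0 <= x <= x1 and y0 <= y <= y1
        if inside:
            away_start = None              # gaze returned to the hotspot
        elif away_start is None:
            away_start = t                 # gaze just left the hotspot
        elif t - away_start > max_away_s:
            return "suspend"               # away too long
    return "continue"

gaze = [(0.0, (5, 5)), (1.0, (20, 20)), (3.5, (20, 20))]
print(monitor_gaze(gaze, ((0, 0), (10, 10)), 2.0))  # "suspend"
```

A real system would run this continuously per timestamp and also consult the interaction-instruction type before suspending, as the text describes.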
The context features of the interactive voice information are integrated through the bidirectional long short-term memory (BiLSTM) neural network model, ensuring the semantic integrity of the interactive voice information. A data set is constructed by analyzing interactive instruction keywords through big data and is divided into a training set and a validation set; the training set is represented as word vectors and fed into the BiLSTM network combined with an attention mechanism for training, and the key information in the interactive voice information is extracted by the trained model.
In addition, comprehensive constraints are set based on the current driving scene: risk factors in the current scene are analyzed, and constraint information is attached to some interactive instructions accordingly. When an interactive instruction corresponding to the target user's interactive voice information falls outside the preset constraint range, a voice prompt is generated and the corresponding instruction is executed according to the target user's feedback; for example, when a child is detected in the vehicle from the vehicle-mounted environment information, the opening height of the window beside the child is constrained.
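The child-window example above can be sketched as a simple constraint filter over an instruction before execution. The instruction schema, the 0.3 opening limit and the action names are illustrative assumptions:

```python
# Hedged sketch of the comprehensive-constraint check: clamp window-height
# instructions at seats where a child was detected and ask the user first.

def apply_constraints(instruction, environment):
    """instruction: {'action', 'seat', 'value'}; value = window opening (0..1).
    environment: {'child_seats': set of seats where a child was detected}."""
    if (instruction["action"] == "open_window"
            and instruction["seat"] in environment["child_seats"]):
        max_open = 0.3                       # hypothetical safety limit
        if instruction["value"] > max_open:
            corrected = dict(instruction, value=max_open)
            return corrected, "prompt_user"  # voice prompt before executing
    return instruction, "execute"

cmd = {"action": "open_window", "seat": "rear-left", "value": 1.0}
env = {"child_seats": {"rear-left"}}
print(apply_constraints(cmd, env))
# ({'action': 'open_window', 'seat': 'rear-left', 'value': 0.3}, 'prompt_user')
```

Instructions that stay inside the constraint range pass through unchanged and are executed directly.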
After the interactive instruction is executed, feedback information of the target user on the interactive instruction is acquired; a supplementary data set is built for each interactive instruction from the feedback information and attached as a tag to the target user's voiceprint information. The instruction hierarchical graph is supplemented and corrected based on the supplementary data set of each interactive instruction; a graph structure representing the target user's habits with respect to the interactive instructions is extracted from the corrected instruction hierarchical graph; and a graph convolutional neural network is trained on the extracted graph structure to obtain the instruction habit information of the target user. Message propagation is performed between target user nodes and interactive instruction nodes that are connected in the graph structure; the propagation process consists of feature transformation, neighborhood aggregation and nonlinear activation, so that a target user node carrying its own attribute features also encodes local neighborhood information, expressed in vector form, specifically:
h_u^{(k)} = σ( Σ_{v ∈ N(u)} (1 / √(d_v · d_u)) · W^{(k)} · h_v^{(k-1)} )
wherein h_u^{(k)} represents the feature-vector representation of the target user node u after the k-th convolution, N(u) represents the set of interactive-instruction neighbor nodes of u, d_v and d_u represent the degrees of the historical interactive-instruction node v and of the target user node u respectively, W^{(k)} and σ are the feature transformation and nonlinear activation of the k-th convolution, and h_v^{(k-1)} represents the feature-vector representation of the interactive-instruction node v after the (k-1)-th convolution;
The instruction habit information is combined with the corresponding vehicle-mounted environment to construct a personalized database for the target user; learning on the personalized data compensates the correction precision of the interactive instruction, so that an interactive instruction achieves the effect the target user expects in one pass. A cloud storage time threshold is preset, and when the time for which the personalized database corresponding to the target user's voiceprint information has not been called exceeds this threshold, the personalized database of that target user is deleted.
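One round of the message propagation described above can be sketched in a few lines. This simplified version uses an identity feature transformation and ReLU activation (no learned weight matrix), so it illustrates only the degree-normalized neighborhood aggregation:

```python
# Minimal one-round sketch of degree-normalized GCN message passing between
# target user nodes and interactive instruction nodes.
import math

def gcn_step(features, edges):
    """features: {node: [floats]}; edges: list of (user_node, instr_node)."""
    degree = {n: 0 for n in features}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    out = {}
    for node, feat in features.items():
        acc = [0.0] * len(feat)
        for u, v in edges:
            if node in (u, v):
                nbr = v if u == node else u
                norm = 1.0 / math.sqrt(degree[node] * degree[nbr])
                for i, x in enumerate(features[nbr]):
                    acc[i] += norm * x      # neighborhood aggregation
        out[node] = [max(0.0, x) for x in acc]  # nonlinear activation (ReLU)
    return out

feats = {"user": [0.0, 0.0], "navigate": [1.0, 0.0], "music": [0.0, 2.0]}
edges = [("user", "navigate"), ("user", "music")]
print(gcn_step(feats, edges)["user"])  # ≈ [0.707, 1.414]
```

After one round, the user node's vector is a degree-normalized mix of its instruction neighbors, which is the "local neighborhood information" the text refers to.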
Fig. 4 shows a block diagram of a cloud computing-based vehicle-mounted intelligent voice interaction system.
The second aspect of the present invention further provides a cloud computing-based vehicle-mounted intelligent voice interaction system 4, which includes: the memory 41 and the processor 42, where the memory includes a cloud-computing-based vehicle-mounted intelligent voice interaction method program, and when executed by the processor, the cloud-computing-based vehicle-mounted intelligent voice interaction method program implements the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in a vehicle-mounted environment, acquiring position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user through a judgment result, and initializing an instruction hierarchical diagram;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user on the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
It should be noted that the method includes acquiring facial frame image data of the driving user through an in-vehicle camera, preprocessing the facial frame image data, and extracting key frames from it. Facial feature points of the driving user are extracted from the key frames, and face orientation information, eye-closure degree and gaze direction are obtained. Corresponding threshold intervals are set for the extracted data and attention levels are matched to those intervals; the acquired face orientation, eye-closure degree and gaze direction are compared with the preset thresholds to read the attention information of the driving user and judge the current attention level. Weight information is set according to the road condition of the current driving section and is used to adjust the attention threshold; for example, when the current section is congested, the corresponding attention threshold is lowered so that the driving user stays more focused while driving. The attention information of the driving user is evaluated against the attention threshold at the current moment, the vehicle-mounted environment information is acquired, and the state information of the driving user is analyzed by matching it with the attention evaluation result, where the vehicle-mounted environment information includes the number of occupants, the in-vehicle sound, the in-vehicle temperature and the in-vehicle air quality. When the state information indicates that the driving user is fatigued, voice information is generated to remind the driver, a decision is made from the vehicle-mounted environment information to generate a suggestion for changing the vehicle-mounted environment, and the driver's voice feedback is acquired to execute the corresponding instruction; for example, when the driver is slightly fatigued, the system asks through voice interaction whether to open a window, play music or lower the air-conditioner temperature. In addition, the driving scene of the current trip is generated from the state information of the driving user, the vehicle-mounted environment information and the vehicle driving information.
According to the embodiment of the invention, the interactive voice information in the vehicle-mounted environment is acquired, and the position and identity information of the target user is acquired through the interactive voice information, which specifically comprises the following steps:
acquiring interactive voice information in the vehicle-mounted environment according to a voice receiving module in the vehicle-mounted environment, carrying out filtering and denoising on the interactive voice information, and dividing the vehicle-mounted environment into sub-areas with preset quantity;
acquiring sound energy information and arrival time difference of the received interactive voice information in each subarea, and judging a source subarea of the interactive voice information according to the sound energy information and the arrival time difference;
determining the position of the interactive voice information and then performing voiceprint recognition, retrieving identity information at the cloud through big data according to the voiceprint recognition result, and calculating the similarity between the voiceprint corresponding to the interactive voice information and the cloud-stored data;
acquiring data with the similarity meeting a preset similarity standard, extracting corresponding identity information as the identity information of a target user, reading the voice habit characteristics matched with the stored voice habit characteristics through the identity information, and if the cloud storage data do not meet the preset similarity standard, creating a voiceprint sequence and storing the voiceprint sequence in the cloud;
and matching an interactive instruction set corresponding to the functional information according to the position information of the target user, and initializing an instruction hierarchical graph based on the identity information through the interactive instruction set.
It should be noted that the interactive instruction set corresponding to the function information is matched according to the position information of the target user, and the instruction hierarchical graph is initialized from the interactive instruction set based on the identity information, specifically: the interactive instructions are classified based on position information within the vehicle-mounted environment, keyword information corresponding to each interactive instruction is obtained through big-data retrieval, and an interactive instruction knowledge graph is constructed from the interactive instructions, the keyword information and the classification results. A corresponding interactive instruction set is extracted from the interactive instruction knowledge graph according to the source sub-region of the target user's interactive voice information. Similar historical driving scenes are retrieved according to the current driving scene, the usage frequency of each interactive instruction in those similar scenes is extracted, and the instructions in the interactive instruction set are prioritized by usage frequency. If the identity information of the target user is already stored in the cloud, a user portrait is constructed from the historical interactive instructions corresponding to that identity, the priorities are adjusted through the user portrait, and the instruction hierarchical graph is generated from the interactive instruction set according to the adjusted priorities.
According to the embodiment of the invention, the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints, specifically:
preprocessing interactive voice information, extracting Word vectors from the preprocessed interactive voice information through a Word2vec model, performing weighted average construction on sentence vector expression according to the Word vectors, and taking the Word vectors and the sentence vector expression as semantic features;
establishing a key information extraction model based on a bidirectional long-short term memory neural network model, inputting semantic features into the key information extraction model, and configuring differentiated weights by combining an attention mechanism and context to obtain key information in interactive voice information;
classifying by using the key information, labeling category labels, retrieving the corresponding instructions of the key information in the initialized instruction hierarchical diagram, and determining the intention of the target user;
when the retrieval path in the instruction hierarchical graph corresponds to a non-unique instruction, setting question-back voice information according to retrieval contents, updating intentions according to feedback of a target user, and matching corresponding interactive instructions according to the updated intentions;
and setting comprehensive constraints based on the current driving situation, judging whether the matched interactive instruction accords with the range of the comprehensive constraints, and if not, modifying the interactive instruction and inquiring feedback information of a target user through voice.
It should be noted that after an interactive instruction is received, the state information of the driving user at the current timestamp is obtained, and a temporary attention monitoring task is created based on it. The gaze-point frequency of the driving user in the driving scene of the current timestamp is acquired to obtain a watching hotspot region; the gaze point at each timestamp of the monitoring task is acquired from the driving user's gaze direction, and the watching duration of each gaze point is recorded. Whether the gaze point at each timestamp falls inside the watching hotspot region is judged; if the duration for which the gaze point stays outside the hotspot region exceeds a preset threshold, whether to suspend the voice interaction is judged according to the type of the interactive instruction, and a voice prompt is generated. After the voice interaction is suspended, when the driving user's gaze point is detected to return to the watching hotspot region, the voice interaction scene is restored and the operation corresponding to the instruction is carried out according to the historical interactive instruction, where the target user's gaze point can be detected by eye-tracking hardware or by a projection space method.
The context features of the interactive voice information are integrated through the bidirectional long short-term memory (BiLSTM) neural network model, ensuring the semantic integrity of the interactive voice information. A data set is constructed by analyzing interactive instruction keywords through big data and is divided into a training set and a validation set; the training set is represented as word vectors and fed into the BiLSTM network combined with an attention mechanism for training, and the key information in the interactive voice information is extracted by the trained model.
In addition, comprehensive constraints are set based on the current driving scene: risk factors in the current scene are analyzed, and constraint information is attached to some interactive instructions accordingly. When an interactive instruction corresponding to the target user's interactive voice information falls outside the preset constraint range, a voice prompt is generated and the corresponding instruction is executed according to the target user's feedback; for example, when a child is detected in the vehicle from the vehicle-mounted environment information, the opening height of the window beside the child is constrained.
After the interactive instruction is executed, feedback information of the target user on the interactive instruction is acquired; a supplementary data set is built for each interactive instruction from the feedback information and attached as a tag to the target user's voiceprint information. The instruction hierarchical graph is supplemented and corrected based on the supplementary data set of each interactive instruction; a graph structure representing the target user's habits with respect to the interactive instructions is extracted from the corrected instruction hierarchical graph; and a graph convolutional neural network is trained on the extracted graph structure to obtain the instruction habit information of the target user. Message propagation is performed between target user nodes and interactive instruction nodes that are connected in the graph structure; the propagation process consists of feature transformation, neighborhood aggregation and nonlinear activation, so that a target user node carrying its own attribute features also encodes local neighborhood information, expressed in vector form, specifically:
h_u^{(k)} = σ( Σ_{v ∈ N(u)} (1 / √(d_v · d_u)) · W^{(k)} · h_v^{(k-1)} )
wherein h_u^{(k)} represents the feature-vector representation of the target user node u after the k-th convolution, N(u) represents the set of interactive-instruction neighbor nodes of u, d_v and d_u represent the degrees of the historical interactive-instruction node v and of the target user node u respectively, W^{(k)} and σ are the feature transformation and nonlinear activation of the k-th convolution, and h_v^{(k-1)} represents the feature-vector representation of the interactive-instruction node v after the (k-1)-th convolution;
The instruction habit information is combined with the corresponding vehicle-mounted environment to construct a personalized database for the target user; learning on the personalized data compensates the correction precision of the interactive instruction, so that an interactive instruction achieves the effect the target user expects in one pass. A cloud storage time threshold is preset, and when the time for which the personalized database corresponding to the target user's voiceprint information has not been called exceeds this threshold, the personalized database of that target user is deleted.
The third aspect of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a cloud-computing-based vehicle-mounted intelligent voice interaction method program, and when the cloud-computing-based vehicle-mounted intelligent voice interaction method program is executed by a processor, the steps of the cloud-computing-based vehicle-mounted intelligent voice interaction method are implemented.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A vehicle-mounted intelligent voice interaction method based on cloud computing is characterized by comprising the following steps:
acquiring attention information of a driving user in a driving process in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating a driving scene in the current driving process of a vehicle;
acquiring interactive voice information in a vehicle-mounted environment, acquiring position and identity information of a target user through the interactive voice information, judging whether the target user is a driving user, matching an interactive instruction set corresponding to the target user through a judgment result, and initializing an instruction hierarchical diagram;
the cloud carries out semantic recognition on interactive voice information based on machine learning, searches in an instruction hierarchical graph to generate an interactive instruction, generates comprehensive constraints according to the driving scene, and corrects the interactive instruction through the comprehensive constraints;
feedback information of the target user to the interactive instruction is obtained, instruction habit information of the target user is analyzed through matching of voiceprint information and the feedback information of the target user, and correction of the interactive instruction is compensated based on the instruction habit information.
2. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that analyzing the state information of the driving user according to the attention information and the vehicle-mounted environment information, and generating the driving scene for the current driving process of the vehicle, specifically comprises:
acquiring facial frame image data of the driving user through an in-vehicle camera, preprocessing the facial frame image data, and extracting key frames from the facial frame image data;
extracting facial feature points of the driving user from the key frames, and obtaining face orientation information, eye-closure degree, and gaze direction from the facial feature points;
comparing the obtained face orientation information, eye-closure degree, and gaze direction against preset thresholds to read the attention information of the driving user; setting weight information according to road condition information of the current road section, and adjusting the attention threshold with the weight information;
evaluating the attention information of the driving user against the attention threshold at the current moment, and analyzing the state information of the driving user by matching the vehicle-mounted environment information with the attention evaluation result;
when the state information indicates that the driving user is fatigued, generating voice information to alert the driving user, making a decision according to the vehicle-mounted environment information to generate a vehicle-mounted environment adjustment suggestion, and acquiring the driving user's voice feedback to execute the corresponding instruction;
generating the driving scene for the current driving process of the vehicle from the state information of the driving user, the vehicle-mounted environment information, and the vehicle driving information.
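A minimal sketch of the eye-closure evaluation described in claim 2, assuming dlib-style six-point eye landmarks and a PERCLOS-style closed-frame fraction. The threshold values and the road-condition weighting scheme are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

def eye_closure_ratio(eye_landmarks):
    """Eye aspect ratio from six (x, y) landmarks in dlib's ordering
    (assumed layout): small values indicate a nearly closed eye."""
    p = np.asarray(eye_landmarks, dtype=float)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = 2.0 * np.linalg.norm(p[0] - p[3])
    return vertical / horizontal

def evaluate_attention(closure_ratios, base_threshold=0.2, road_weight=1.0):
    """PERCLOS-style check: fraction of frames counted as eyes-closed,
    compared with a fatigue limit scaled by a road-condition weight
    (larger weight on complex road sections -> stricter limit)."""
    closed_fraction = sum(r < base_threshold for r in closure_ratios) / len(closure_ratios)
    fatigue_limit = 0.3 / road_weight  # illustrative value
    return "fatigued" if closed_fraction > fatigue_limit else "alert"
```

A "fatigued" result would then trigger the voice alert and environment-adjustment suggestion described above.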
3. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that acquiring the interactive voice information in the vehicle-mounted environment, and obtaining the position and identity information of the target user from the interactive voice information, specifically comprises:
acquiring the interactive voice information through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-regions;
acquiring the sound energy information and the arrival time difference of the received interactive voice information in each sub-region, and determining the source sub-region of the interactive voice information from the sound energy information and the arrival time difference;
after determining the position of the interactive voice information, performing voiceprint recognition, retrieving identity information at the cloud through big data according to the voiceprint recognition result, and computing the similarity between the voiceprint of the interactive voice information and cloud-stored data;
obtaining the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user; retrieving the stored voice habit features matched to the identity information; if no cloud-stored data meets the preset similarity standard, creating a new voiceprint sequence and storing it at the cloud;
matching, according to the position information of the target user, an interactive instruction set corresponding to the available functions, and initializing the instruction hierarchy graph based on the identity information through the interactive instruction set.
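The sub-region decision and the cloud voiceprint matching of claim 3 can be sketched as follows. Ranking sub-regions by sound energy with arrival time as a tie-breaker, and matching voiceprints by cosine similarity against stored embeddings, are simplifying assumptions; the patent names the cues but not the exact formulas:

```python
import numpy as np

def locate_subregion(energies, arrival_times):
    """energies[i] and arrival_times[i] are measurements for sub-region i.
    Pick the sub-region with the highest energy; earlier arrival wins ties."""
    return max(range(len(energies)),
               key=lambda i: (energies[i], -arrival_times[i]))

def match_identity(query_embedding, stored, threshold=0.8):
    """Cosine similarity of the query voiceprint against cloud-stored
    embeddings (dict of user_id -> vector, an assumed format). Returns the
    best-matching id above the similarity standard, or None, in which case
    a new voiceprint sequence would be created and stored."""
    q = np.asarray(query_embedding, dtype=float)
    q = q / np.linalg.norm(q)
    best_id, best_sim = None, threshold
    for user_id, emb in stored.items():
        e = np.asarray(emb, dtype=float)
        sim = float(q @ (e / np.linalg.norm(e)))
        if sim >= best_sim:
            best_id, best_sim = user_id, sim
    return best_id
```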
4. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that the cloud performing semantic recognition on the interactive voice information based on machine learning, searching the instruction hierarchy graph to generate the interactive instruction, generating the comprehensive constraint according to the driving scene, and correcting the interactive instruction through the comprehensive constraint, specifically comprises:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing sentence vector representations as weighted averages of the word vectors, and taking the word vectors and the sentence vector representations as semantic features;
building a key information extraction model based on a bidirectional long short-term memory (BiLSTM) neural network, inputting the semantic features into the key information extraction model, and configuring differentiated weights by combining an attention mechanism with context to obtain the key information in the interactive voice information;
classifying the key information, attaching category labels, retrieving the instructions corresponding to the key information in the initialized instruction hierarchy graph, and determining the intent of the target user;
when the instruction corresponding to a retrieval path in the instruction hierarchy graph is not unique, generating follow-up query voice information according to the retrieved content, updating the intent according to the target user's feedback, and matching the corresponding interactive instruction through the updated intent;
setting the comprehensive constraint based on the current driving scene, and judging whether the matched interactive instruction falls within the range of the comprehensive constraint; if not, modifying the interactive instruction and querying the target user for feedback by voice.
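The sentence-vector construction and the hierarchy-graph retrieval of claim 4 can be sketched as below. Representing the instruction hierarchy graph as a nested dict, and using uniform (or TF-IDF-style) averaging weights, are assumptions for illustration only:

```python
import numpy as np

def sentence_vector(word_vectors, weights=None):
    """Sentence representation as the weighted average of Word2vec word
    vectors, per the claim; the weights (e.g. TF-IDF) are hypothetical
    and default to uniform."""
    vecs = np.asarray(word_vectors, dtype=float)
    w = np.ones(len(vecs)) if weights is None else np.asarray(weights, dtype=float)
    return (w[:, None] * vecs).sum(axis=0) / w.sum()

def retrieve_instruction(hierarchy, labels):
    """Walk the instruction hierarchy graph (nested dict here) along the
    predicted category labels. A non-dict leaf is a concrete instruction;
    stopping at a dict means the instruction is not unique, so the
    candidates are returned for a follow-up voice query to the user."""
    node = hierarchy
    for label in labels:
        node = node[label]
    if isinstance(node, dict):
        return None, list(node)  # ambiguous path -> ask back
    return node, []
```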
5. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, further comprising monitoring the attention information of the driving user during voice interaction, specifically:
after an interactive instruction is received, acquiring the state information of the driving user at the current timestamp, and creating a temporary attention monitoring task based on that state information;
acquiring the gaze-point frequency of the driving user in the driving scene at the current timestamp to obtain a gaze hotspot region; obtaining the gaze point at each timestamp of the attention monitoring task from the driving user's gaze direction, and recording the dwell duration of each gaze point;
judging whether the gaze point at each timestamp of the attention monitoring task falls within the gaze hotspot region; if the dwell duration outside the gaze hotspot region exceeds a preset threshold, deciding whether to suspend the voice interaction according to the type of the interactive instruction, and generating a voice prompt;
after the voice interaction is suspended, resuming the voice interaction scene when the driving user's gaze point is detected to return to the gaze hotspot region, and executing the corresponding operation according to the historical interactive instruction.
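The gaze-dwell check of claim 5 that decides whether to suspend the interaction can be sketched as follows. The rectangular hotspot region, the sample format, and the dwell limit are illustrative assumptions:

```python
def should_pause(gaze_samples, hotspot, dwell_limit=1.5):
    """gaze_samples: list of (timestamp_s, x, y) gaze points, in time order.
    hotspot: (x0, y0, x1, y1) axis-aligned gaze hotspot region.
    Returns True when the gaze has stayed outside the hotspot region for
    at least dwell_limit seconds (value is a placeholder threshold)."""
    outside_since = None
    for t, x, y in gaze_samples:
        inside = hotspot[0] <= x <= hotspot[2] and hotspot[1] <= y <= hotspot[3]
        if inside:
            outside_since = None          # gaze returned; reset the timer
        elif outside_since is None:
            outside_since = t             # gaze just left the hotspot
        elif t - outside_since >= dwell_limit:
            return True                   # dwell outside exceeded the threshold
    return False
```

In the claimed flow, a True result would be weighed against the interaction-instruction type before actually suspending and issuing the voice prompt.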
6. The cloud-computing-based vehicle-mounted intelligent voice interaction method according to claim 1, characterized in that acquiring the feedback information of the target user on the interactive instruction, analyzing the instruction habit information of the target user by matching the voiceprint information with the feedback information of the target user, and compensating the correction of the interactive instruction based on the instruction habit information, specifically comprises:
after the interactive instruction is executed, acquiring the feedback information of the target user on the interactive instruction, building a supplementary data set for each interactive instruction from the feedback information, and labeling the supplementary data set with the target user's voiceprint information;
supplementing and correcting the instruction hierarchy graph based on the supplementary data set of each interactive instruction, extracting the graph structure from the corrected instruction hierarchy graph, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
constructing a personalized database for the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized data, and compensating the correction precision of the interactive instruction, so that the interactive instruction achieves the effect expected by the target user in a single attempt;
presetting a cloud storage time threshold, and deleting the personalized database of the target user when the idle time of the personalized database corresponding to the target user's voiceprint information exceeds the preset storage time threshold.
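The storage-time-threshold deletion at the end of claim 6 amounts to an idle-time purge. A minimal sketch, assuming the personalized databases are tracked as a mapping from voiceprint id to last-access time (that structure is an assumption, not from the patent):

```python
import time

def purge_stale_profiles(profiles, ttl_seconds, now=None):
    """profiles: {voiceprint_id: last_access_epoch_seconds}.
    Deletes every personalized database whose idle time exceeds the preset
    cloud storage threshold; returns the ids that were removed."""
    now = time.time() if now is None else now
    stale = [vid for vid, last in profiles.items() if now - last > ttl_seconds]
    for vid in stale:
        del profiles[vid]
    return stale
```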
7. A vehicle-mounted intelligent voice interaction system based on cloud computing, characterized in that the system comprises a memory and a processor, the memory storing a program for the cloud-computing-based vehicle-mounted intelligent voice interaction method; when the program is executed by the processor, the following steps are implemented:
acquiring attention information of a driving user during driving in a vehicle-mounted environment, analyzing state information of the driving user according to the attention information and vehicle-mounted environment information, and generating a driving scene for the current driving process of the vehicle;
acquiring interactive voice information in the vehicle-mounted environment, obtaining position and identity information of a target user from the interactive voice information, judging whether the target user is the driving user, matching an interactive instruction set corresponding to the target user according to the judgment result, and initializing an instruction hierarchy graph;
performing, at the cloud, semantic recognition on the interactive voice information based on machine learning, searching the instruction hierarchy graph to generate an interactive instruction, generating a comprehensive constraint according to the driving scene, and correcting the interactive instruction through the comprehensive constraint;
acquiring feedback information of the target user on the interactive instruction, analyzing instruction habit information of the target user by matching the target user's voiceprint information with the feedback information, and compensating the correction of the interactive instruction based on the instruction habit information.
8. The cloud-computing-based vehicle-mounted intelligent voice interaction system according to claim 7, characterized in that acquiring the interactive voice information in the vehicle-mounted environment, and obtaining the position and identity information of the target user from the interactive voice information, specifically comprises:
acquiring the interactive voice information through a voice receiving module in the vehicle-mounted environment, filtering and denoising the interactive voice information, and dividing the vehicle-mounted environment into a preset number of sub-regions;
acquiring the sound energy information and the arrival time difference of the received interactive voice information in each sub-region, and determining the source sub-region of the interactive voice information from the sound energy information and the arrival time difference;
after determining the position of the interactive voice information, performing voiceprint recognition, retrieving identity information at the cloud through big data according to the voiceprint recognition result, and computing the similarity between the voiceprint of the interactive voice information and cloud-stored data;
obtaining the data whose similarity meets a preset similarity standard and extracting the corresponding identity information as the identity information of the target user; retrieving the stored voice habit features matched to the identity information; if no cloud-stored data meets the preset similarity standard, creating a new voiceprint sequence and storing it at the cloud;
matching, according to the position information of the target user, an interactive instruction set corresponding to the available functions, and initializing the instruction hierarchy graph based on the identity information through the interactive instruction set.
9. The cloud-computing-based vehicle-mounted intelligent voice interaction system according to claim 7, characterized in that the cloud performing semantic recognition on the interactive voice information based on machine learning, searching the instruction hierarchy graph to generate the interactive instruction, generating the comprehensive constraint according to the driving scene, and correcting the interactive instruction through the comprehensive constraint, specifically comprises:
preprocessing the interactive voice information, extracting word vectors from the preprocessed interactive voice information through a Word2vec model, constructing sentence vector representations as weighted averages of the word vectors, and taking the word vectors and the sentence vector representations as semantic features;
building a key information extraction model based on a bidirectional long short-term memory (BiLSTM) neural network, inputting the semantic features into the key information extraction model, and configuring differentiated weights by combining an attention mechanism with context to obtain the key information in the interactive voice information;
classifying the key information, attaching category labels, retrieving the instructions corresponding to the key information in the initialized instruction hierarchy graph, and determining the intent of the target user;
when the instruction corresponding to a retrieval path in the instruction hierarchy graph is not unique, generating follow-up query voice information according to the retrieved content, updating the intent according to the target user's feedback, and matching the corresponding interactive instruction through the updated intent;
setting the comprehensive constraint based on the current driving scene, and judging whether the matched interactive instruction falls within the range of the comprehensive constraint; if not, modifying the interactive instruction and querying the target user for feedback by voice.
10. The cloud-computing-based vehicle-mounted intelligent voice interaction system according to claim 7, characterized in that acquiring the feedback information of the target user on the interactive instruction, analyzing the instruction habit information of the target user by matching the voiceprint information with the feedback information of the target user, and compensating the correction of the interactive instruction based on the instruction habit information, specifically comprises:
after the interactive instruction is executed, acquiring the feedback information of the target user on the interactive instruction, building a supplementary data set for each interactive instruction from the feedback information, and labeling the supplementary data set with the target user's voiceprint information;
supplementing and correcting the instruction hierarchy graph based on the supplementary data set of each interactive instruction, extracting the graph structure from the corrected instruction hierarchy graph, and training a graph convolutional neural network on the extracted graph structure to obtain the instruction habit information of the target user;
constructing a personalized database for the target user by combining the instruction habit information with the corresponding vehicle-mounted environment, learning from the personalized data, and compensating the correction precision of the interactive instruction, so that the interactive instruction achieves the effect expected by the target user in a single attempt;
presetting a cloud storage time threshold, and deleting the personalized database of the target user when the idle time of the personalized database corresponding to the target user's voiceprint information exceeds the preset storage time threshold.
CN202211395643.1A 2022-11-09 2022-11-09 Vehicle-mounted intelligent voice interaction method and system based on cloud computing Active CN115440221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211395643.1A CN115440221B (en) 2022-11-09 2022-11-09 Vehicle-mounted intelligent voice interaction method and system based on cloud computing


Publications (2)

Publication Number | Publication Date
CN115440221A (en) | 2022-12-06
CN115440221B (en) | 2023-03-24

Family

ID=84252910



Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118248135A (en) * 2022-12-23 2024-06-25 北京罗克维尔斯科技有限公司 Voice interaction method and device of intelligent equipment and vehicle
CN116741175B (en) * 2023-08-14 2023-11-03 深圳市实信达科技开发有限公司 Block chain-based intelligent data transmission supervision system and method
CN117115788B (en) * 2023-10-19 2024-01-02 天津所托瑞安汽车科技有限公司 Intelligent interaction method for vehicle, back-end server and front-end equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9085303B2 (en) * 2012-11-15 2015-07-21 Sri International Vehicle personal assistant
CN108896061A (en) * 2018-05-11 2018-11-27 京东方科技集团股份有限公司 A kind of man-machine interaction method and onboard navigation system based on onboard navigation system
CN110019740B (en) * 2018-05-23 2021-10-01 京东方科技集团股份有限公司 Interaction method of vehicle-mounted terminal, server and storage medium
CN110874202B (en) * 2018-08-29 2024-04-19 斑马智行网络(香港)有限公司 Interaction method, device, medium and operating system
CN111653277A (en) * 2020-06-10 2020-09-11 北京百度网讯科技有限公司 Vehicle voice control method, device, equipment, vehicle and storage medium
US11897331B2 (en) * 2021-01-14 2024-02-13 Baidu Usa Llc In-vehicle acoustic monitoring system for driver and passenger
CN115294976A (en) * 2022-06-23 2022-11-04 中国第一汽车股份有限公司 Error correction interaction method and system based on vehicle-mounted voice scene and vehicle thereof
CN115273797A (en) * 2022-06-28 2022-11-01 智己汽车科技有限公司 Sound-based automobile interaction method and system and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant