CN111415656A - Voice semantic recognition method and device and vehicle

Voice semantic recognition method and device and vehicle

Info

Publication number: CN111415656A (application CN201910009490.4A)
Authority: CN (China)
Prior art keywords: user, information, voice, speech, voice information
Legal status: granted; active
Other languages: Chinese (zh)
Other versions: CN111415656B (granted publication)
Inventor: 刘磊
Current and original assignee: Shanghai Qinggan Intelligent Technology Co Ltd
Application filed: 2019-01-04, by Shanghai Qinggan Intelligent Technology Co Ltd
Priority date: 2019-01-04
Publication dates: CN111415656A published 2020-07-14; CN111415656B granted 2024-04-30

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/005 - Language recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/222 - Barge in, i.e. overridable guidance for interrupting prompts
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a speech semantic recognition method comprising: judging in real time whether voice information of a user is received; when voice information is received, judging whether it conforms to a preset phrasing; if so, performing the corresponding response operation according to the voice information; if not, parsing the voice information, obtaining the keywords in it, obtaining the user's target intention from the keywords and/or combinations of the keywords, and obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings. The application also relates to a speech semantic recognition device and a vehicle. The method brings voice interaction technology to in-vehicle head units, reduces the user's manual operation through speech recognition, and provides voice guidance when the user has not yet mastered the voice commands, offering more appropriate help, accelerating the user's mastery of the voice functions, and improving the user experience.

Description

Voice semantic recognition method and device and vehicle
Technical Field
The application relates to the technical field of voice recognition, in particular to a voice semantic recognition method, a voice semantic recognition device and a vehicle.
Background
Speech recognition is the technology by which a machine correctly recognizes human speech and converts its vocabulary content into corresponding computer-readable text or commands. With continuing technological progress, the application field of speech recognition keeps broadening. Compared with other input modes such as keyboard input, speech recognition better matches users' daily habits, and it has therefore become one of the most important human-computer interaction technologies.
However, existing voice functions are not as intelligent as a real person: specific phrasings and usage methods must be learned before the voice functions can be used well, users are unwilling to spend time and energy reading manuals, and even users who do read them find many phrasings hard to remember.
To address these shortcomings of the prior art, the application provides a speech semantic recognition method, a speech semantic recognition device, and a vehicle.
Disclosure of Invention
The application aims to provide a speech semantic recognition method and device and a vehicle that introduce voice interaction technology into in-vehicle head units, reduce the user's manual operation through speech recognition, and provide voice guidance when the user has not yet mastered the voice commands, offering more appropriate help while accelerating the user's mastery of the voice functions and improving the user experience.
To solve the above technical problem, the application provides a speech semantic recognition method including the following steps: judging in real time whether voice information of a user is received; when voice information is received, judging whether it conforms to a preset phrasing; if so, performing the corresponding response operation according to the voice information; if not, parsing the voice information, obtaining the keywords in it, obtaining the user's target intention from the keywords and/or combinations of the keywords, and obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings.
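Read as pseudocode, the claimed method is a three-way dispatch. The sketch below is a minimal, hypothetical Python rendering; the function names, the prefix-based phrasing match, and the toy keyword filter are illustrative assumptions, not the patent's prescribed implementation.
```python
# A minimal, hypothetical sketch of the claimed control flow; all names and
# the toy matching logic are illustrative assumptions only.
PRESET_PHRASINGS = [
    "please help me navigate to",
    "turn on the air conditioner",
    "turn on the radio",
]

def matches_preset(utterance: str) -> bool:
    """Voice information 'conforms to a preset phrasing' if it starts with one."""
    return any(utterance.startswith(p) for p in PRESET_PHRASINGS)

def handle(utterance: str | None) -> str:
    if utterance is None:                      # no voice information received
        return "do nothing"
    if matches_preset(utterance):              # conforms: respond directly
        return f"respond to: {utterance!r}"
    # Otherwise: parse, extract keywords, infer intent, show demonstrations.
    keywords = [w for w in utterance.split() if len(w) > 2]
    return f"display input demonstrations for keywords {keywords}"

print(handle("turn on the radio"))
print(handle("navigate me to turn on the air conditioner and the radio"))
```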
In one embodiment, the step of parsing the voice information, obtaining keywords in the voice information, and obtaining the user's target intention according to the keywords and/or combinations of the keywords comprises: converting the received voice information into at least one piece of text information; segmenting the text information using lexicon-based segmentation; identifying the keywords from the segmented text; and obtaining the user's target intention from the keywords and/or combinations of the keywords.
In one embodiment, the step of converting the received voice information into at least one piece of text information includes performing feature recognition on the voice information to obtain the user's voice features, the voice features at least including regional feature data of the user's location; judging, from the user's voice features, the official language type of the region corresponding to the language type the user uses; and converting the voice information into at least one piece of text information matching that official language type.
In one embodiment, the step of converting the received voice information into at least one piece of text information is followed by error correction of the at least one piece of text information through near-synonym matching and common-homophone replacement.
In one embodiment, lexicon-based segmentation segments the text information with the aid of a Chinese dictionary database, a historical-behavior lexicon, and a popular-search lexicon.
In one embodiment, the step of obtaining and presenting at least one piece of input demonstration information matching the user's target intention and the preset phrasings comprises classifying the input demonstration information according to preset rules.
In one embodiment, the step of obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings comprises weighted scoring of the input demonstration information according to its degree of match with the user's target intention and the preset phrasings, then obtaining and displaying the input demonstration information ranked in the top n, where n is a positive integer greater than or equal to 1.
To solve the above technical problem, the application further provides a speech semantic recognition apparatus comprising a memory and a processor, the memory storing executable program code and the processor being configured to call the executable program code in the memory to perform the following steps: judging in real time whether voice information of a user is received; when voice information is received, judging whether it conforms to a preset phrasing; if so, performing the corresponding response operation according to the voice information; if not, parsing the voice information, obtaining the keywords in it, obtaining the user's target intention from the keywords and/or combinations of the keywords, and obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings.
In one embodiment, the processor is further configured to convert the received voice information into at least one piece of text information; segment the text information using lexicon-based segmentation; identify the keywords from the segmented text; and obtain the user's target intention from the keywords and/or combinations of the keywords.
To solve the above technical problem, the application further provides a vehicle equipped with the above speech semantic recognition device, the vehicle being an unmanned vehicle, a manually driven vehicle, or an intelligent vehicle that switches freely between unmanned and manual driving.
The speech semantic recognition method and device and the vehicle bring voice interaction technology to in-vehicle head units, reduce the user's manual operation through speech recognition, and provide voice guidance when the user has not yet mastered the voice commands, offering more appropriate help while accelerating the user's mastery of the voice functions and improving the user experience.
The foregoing is only an overview of the technical solutions of the application. So that the technical means of the application may be understood more clearly and implemented according to this description, and so that the above and other objects, features, and advantages of the application become more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic flow chart of a speech semantic recognition method according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating step S15 in the speech semantic recognition method shown in fig. 1 according to an embodiment.
Fig. 3 is a flowchart illustrating step S16 in the speech semantic recognition method shown in fig. 1 according to an embodiment.
Fig. 4 is a schematic structural diagram of a speech semantic recognition apparatus according to an embodiment of the present application.
Detailed Description
To further explain the technical means the application takes to achieve its intended purpose and the effects of those means, the application is described in detail below with reference to the accompanying drawings and preferred embodiments.
While the application is described in terms of specific embodiments for achieving the intended objects, the invention is not limited to the disclosed embodiments and is to be accorded the widest scope consistent with the principles and novel features defined by the appended claims.
Fig. 1 is a schematic flow chart of a speech semantic recognition method according to a first embodiment of the application. As shown in fig. 1, the speech semantic recognition method includes the following steps.
Step S11: judge in real time whether voice information of the user is received.
Specifically, the user's voice information may be received through a microphone or another voice input device.
If no voice information of the user is received, step S12 is executed: do not process. If voice information of the user is received, step S13 is executed: judge whether it conforms to a preset phrasing.
If a preset phrasing is conformed to, step S14 is executed: perform the corresponding response operation according to the voice information.
Specifically, a preset phrasing is a phrasing mastered in advance through machine language learning; that is, when voice information conforming to a preset phrasing is received, the corresponding response operation can be performed directly, without further processing. For example, the preset phrasings of this embodiment may be "please help me navigate to XXX", "turn on the air conditioner", "turn on the radio", and so on.
If no preset phrasing is matched, for example "navigate me to turn on the air conditioner and also turn on the radio", "play a song for me to listen to", or "it's mealtime, find a parking lot to park, I want to go eat", the received information cannot be recognized and no corresponding response operation can be performed, so step S15 is executed: parse the voice information, obtain the keywords in it, and obtain the user's target intention from the keywords and/or combinations of the keywords.
Specifically, in an embodiment, to make operation easy, the user needs neither to train phrasings in advance nor to use fixed phrasings: the method can directly recognize and process ordinary natural language, parsing the received voice information, obtaining the keywords in it, and then obtaining the user's target intention from the keywords and/or combinations of the keywords.
Specifically, in one embodiment, step S15 may convert the voice information into plain text information, obtain the keywords of the voice information by segmenting the plain text, and obtain the user's target intention from the keywords and/or combinations of the keywords. In another embodiment, the user's target intention may instead be obtained by extracting voice feature information from the voice information, generating a recognition result of the voice information from the voice feature information and a preset acoustic model, and then applying a preset algorithm to that recognition result.
Specifically, the user's target intention may include the function to be used, such as the navigation function or control of devices on the vehicle, for example in-vehicle multimedia, windows, and lighting. The user's target intention may also include the destination to reach, the songs to listen to, the person to talk to, and so on.
Step S16: obtain and display at least one piece of input demonstration information matching the user's target intention and the preset phrasings.
Specifically, in this embodiment, the input demonstration information may be a preset phrasing mastered in advance through machine language learning, or information generated from the combination of the user's target intention and a preset phrasing. For example, for the navigation function the preset phrasing is "please help me navigate to XXX"; when the user's target intention includes a destination to reach, for example "Tiananmen Square", the generated input demonstration information may include "please help me navigate to Tiananmen Square".
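A minimal sketch of this generation step, assuming a simple slot-filling template per intent (the template texts and slot names are illustrative, not taken from the patent):
```python
# Hypothetical sketch: generate input demonstration information by filling the
# inferred slot values of the target intention into a preset phrasing template.
PRESET_TEMPLATES = {
    "navigation": "please help me navigate to {destination}",
    "multimedia": "turn on the radio",
}

def build_demonstration(intent: str, slots: dict[str, str]) -> str:
    # str.format ignores unused keyword arguments, so slot-free templates work too.
    return PRESET_TEMPLATES[intent].format(**slots)

print(build_demonstration("navigation", {"destination": "Tiananmen Square"}))
# -> please help me navigate to Tiananmen Square
```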
Specifically, in this embodiment, the generated input demonstration information is acquired and presented, and at the same time broadcast by voice.
Specifically, in this embodiment, the input demonstration information may be classified by function, such as a multimedia playback function, a navigation function, and the like.
Fig. 2 is a flowchart illustrating an embodiment of step S15 in the speech semantic recognition method shown in fig. 1. As shown in fig. 2, the step of parsing the voice information, obtaining the keywords in it, and obtaining the user's target intention from the keywords and/or combinations of the keywords may specifically include the following process.
Step S21: perform feature recognition on the received voice information to obtain the user's voice features.
Specifically, the user's voice features at least include regional feature data of the user's location.
Specifically, the user's regional feature refers to the user's current location or native region and can be determined from the language type the user uses. Language types may include different languages and dialects, such as English, Japanese, Korean, Arabic, Cantonese, and Sichuan dialect. Specifically, semantic analysis can be performed on the received voice information to obtain the language type it belongs to, and the regional feature data of the user can be derived from that language type.
Specifically, in this embodiment, semantic analysis of the voice information yields the specific content of the speech. The vocabulary, semantics, and so on of this content are then compared against a pre-established language vocabulary database that contains lexicons for the different language types. The corresponding language type can thus be matched from the vocabulary in the user's voice information, and the user's regional feature data further predicted. For example, if the user uses Portuguese, the user may come from, or be located in, a Portuguese-speaking country; if the user uses Cantonese, the user may come from, or be located in, Guangdong, Hong Kong, and similar regions.
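A minimal sketch of this vocabulary-overlap comparison, assuming toy per-language lexicons and a toy region table in place of the pre-established language vocabulary database:
```python
# Hypothetical sketch: match the language type by vocabulary overlap against
# per-language lexicons, then look up the predicted region. The tiny lexicons
# and the region table are illustrative assumptions only.
LEXICONS = {
    "Cantonese": {"唔该", "係", "几多"},
    "Mandarin":  {"谢谢", "是", "多少"},
}
REGIONS = {"Cantonese": "Guangdong / Hong Kong", "Mandarin": "northern China"}

def guess_language(tokens: set[str]) -> str:
    # Pick the language type whose lexicon overlaps the utterance most.
    return max(LEXICONS, key=lambda lang: len(tokens & LEXICONS[lang]))

lang = guess_language({"唔该", "几多", "钱"})
print(lang, "->", REGIONS[lang])   # Cantonese -> Guangdong / Hong Kong
```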
Step S22: judge, from the user's voice features, the official language type of the region corresponding to the language type the user uses.
Specifically, in this embodiment, the official language type of the corresponding region can be determined from the regional feature data of the user's location. For example, if the user's regional feature data corresponds to Sichuan, the language type the user uses is Sichuan dialect and the corresponding official language is Mandarin.
Specifically, in another embodiment, the user may also trigger a language button and select the language type of the speech information to be recognized. The language type may be, but is not limited to, Chinese (Mandarin and local dialects such as Cantonese, Northeastern dialect, and Sichuanese), English, French, German, Korean, and so on; the corresponding official language type is then obtained after processing.
Step S23: convert the voice information into at least one piece of text information matching the official language type.
Specifically, in this embodiment, to improve the reliability of voice information recognition, words and phrases related to the voice information may be acquired through big-data learning and composed into several pieces of text information. In another embodiment, the user's voice information may instead be converted directly into one piece of plain text information.
Specifically, to guard against errors introduced when converting the voice information into text information, in an embodiment the step of converting the received voice information into at least one piece of text information is followed by error correction processing of the at least one piece of text information through near-synonym matching and common-homophone replacement.
Specifically, in this embodiment, error correction first proceeds by near-synonym matching; common homophones are then used to judge whether a valid phrase exists and, if so, the corrective replacement is performed. For example, in "I want to eat XX food, please help me recommend a restaurant nearby", converting the voice information to text may render "food" as one of its homophones; after error correction the erroneous word is replaced with the correct "food".
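A minimal sketch of the common-homophone replacement half of this step, assuming a toy homophone table and lexicon (near-synonym matching is omitted for brevity):
```python
# Hypothetical sketch of common-homophone replacement: a segmented word that is
# not in the lexicon but is a known homophone of a lexicon word gets replaced.
# The tiny tables are illustrative assumptions, not the patent's dictionaries.
HOMOPHONES = {"每时": "美食"}        # a "meishi" mis-transcription -> "fine food"
LEXICON = {"我", "想", "吃", "美食", "餐厅", "附近"}

def correct(words: list[str]) -> list[str]:
    # Keep in-lexicon words; replace out-of-lexicon words that have a known
    # homophone correction; leave everything else untouched.
    return [w if w in LEXICON else HOMOPHONES.get(w, w) for w in words]

print(correct(["我", "想", "吃", "每时"]))   # -> ['我', '想', '吃', '美食']
```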
Step S24: segment the text information.
Specifically, in this embodiment, segmentation is lexicon-based: the text information is segmented with the aid of a Chinese dictionary database, a historical-behavior lexicon, and a popular-search lexicon.
Specifically, the accuracy of segmentation depends on the algorithm and the lexicon, and different languages need different segmentation techniques because they are composed differently: English takes the word as its unit, with words separated by spaces, whereas Chinese takes the character as its unit, with adjacent characters joining to form words. In another embodiment, rule-based segmentation and the dictionary-based segmentation algorithm MMSEG (a word identification system for Mandarin Chinese text based on two variants of the maximum matching algorithm) can be used, thereby segmenting both English and Chinese.
Specifically, in this embodiment, the segmentation principle is to split the keywords with the fewest possible segmentations. Segmentation reduces recognition complexity and improves recognition efficiency.
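As one concrete stand-in for dictionary-based segmentation, the sketch below uses plain forward maximum matching over a toy lexicon, which naturally favors fewer, longer segments; MMSEG itself adds further ambiguity-resolution rules that this hypothetical sketch omits:
```python
# Forward-maximum-matching segmentation over a toy lexicon; the lexicon and
# maximum word length are illustrative assumptions.
LEXICON = {"打开", "空调", "收音机", "导航", "天安门", "广场"}
MAX_WORD_LEN = 4

def segment(text: str) -> list[str]:
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(len(text), i + MAX_WORD_LEN), i, -1):
            if text[i:j] in LEXICON or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

print(segment("打开空调导航天安门广场"))
# -> ['打开', '空调', '导航', '天安门', '广场']
```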
Step S25: obtain the keywords from the segmented text.
Specifically, in this embodiment, the keywords are identified from the segmented text, and text that cannot be identified is matched against a pre-established lexicon of words the user has used. In another embodiment, text that cannot be identified may instead be discarded.
Step S26: obtain the user's target intention from the keywords and/or combinations of the keywords.
Specifically, in this embodiment, the user's target intention is obtained from the keywords and/or combinations of the keywords, and the operation the user may want to perform is inferred, so that guidance and help can be provided.
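A minimal sketch of mapping keywords and keyword combinations to target intentions, assuming a simple rule table (the rules are illustrative only):
```python
# Hypothetical sketch: a keyword combination maps to an intent when all of its
# required keywords are present. The rule table is an illustrative assumption.
INTENT_RULES = [
    ({"navigate"}, "navigation"),
    ({"air", "conditioner"}, "climate control"),
    ({"radio"}, "multimedia"),
]

def infer_intents(keywords: set[str]) -> list[str]:
    return [intent for required, intent in INTENT_RULES if required <= keywords]

# A compound utterance can yield several target intentions at once:
print(infer_intents({"navigate", "radio", "air", "conditioner"}))
# -> ['navigation', 'climate control', 'multimedia']
```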
Fig. 3 is a flowchart illustrating an embodiment of step S16 in the speech semantic recognition method shown in fig. 1. As shown in fig. 3, the step of obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings specifically includes the following steps.
Step S31: classify the input demonstration information according to preset rules.
Specifically, in this embodiment, the preset rules may classify by function, such as the vehicle's navigation function, the in-vehicle multimedia playback function, and the like.
Specifically, as machine language learning continues, the volume of input demonstration information grows ever larger; classifying it according to preset rules improves the response rate, so that the user obtains the input demonstration information faster and the user experience improves.
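A minimal sketch of such classification, assuming the demonstrations are bucketed by function category so a query scans only one bucket:
```python
# Hypothetical sketch: pre-index demonstrations by function category so lookup
# at query time touches one bucket instead of the whole corpus. The demo data
# and category names are illustrative assumptions.
from collections import defaultdict

DEMOS = [
    ("navigation", "please help me navigate to Tiananmen Square"),
    ("multimedia", "turn on the radio"),
    ("multimedia", "play my favourite playlist"),
]

index: dict[str, list[str]] = defaultdict(list)
for category, text in DEMOS:
    index[category].append(text)

print(index["multimedia"])   # only the multimedia bucket is scanned
```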
Step S32: perform weighted scoring of the input demonstration information according to its degree of match with the user's target intention and the preset phrasings, then obtain and display the top-n ranked input demonstration information.
Specifically, in this embodiment, the terminal displays the n pieces of input demonstration information that best match the user's target intention and the preset phrasings. In other embodiments, the terminal may display the matching input demonstration information that the user has historically used most often.
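A minimal sketch of the weighted scoring and top-n selection, with assumed weights and a token-overlap score standing in for the matching degree:
```python
# Hypothetical weighted scoring: combine overlap with the target-intention
# keywords and with the preset phrasing under assumed weights, then keep the
# top n. Weights and the token-overlap score are illustrative assumptions.
def score(demo: str, intent_kw: set[str], preset_kw: set[str],
          w_intent: float = 0.7, w_preset: float = 0.3) -> float:
    tokens = set(demo.lower().split())
    return w_intent * len(tokens & intent_kw) + w_preset * len(tokens & preset_kw)

demos = ["please help me navigate to Tiananmen Square", "turn on the radio"]
intent_kw = {"navigate", "tiananmen"}
preset_kw = {"please", "help", "me"}

n = 1  # n is a positive integer >= 1
top_n = sorted(demos, key=lambda d: score(d, intent_kw, preset_kw), reverse=True)[:n]
print(top_n)   # -> ['please help me navigate to Tiananmen Square']
```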
Fig. 4 is a schematic structural diagram of an embodiment of the speech semantic recognition apparatus of the application. As shown in fig. 4, the speech semantic recognition apparatus 40 of this embodiment includes a memory 401 and a processor 402. The memory 401 stores executable program code; the processor 402 is configured to call the executable program code in the memory 401 to perform the following steps: judging in real time whether voice information of a user is received; when voice information is received, judging whether it conforms to a preset phrasing; if so, performing the corresponding response operation according to the voice information; if not, parsing the voice information, obtaining the keywords in it, obtaining the user's target intention from the keywords and/or combinations of the keywords, and obtaining and displaying at least one piece of input demonstration information matching the user's target intention and the preset phrasings.
In one embodiment, the processor 402 is further configured to convert the received voice information into at least one piece of text information; segment the text information using lexicon-based segmentation; identify the keywords from the segmented text; and obtain the user's target intention from the keywords and/or combinations of the keywords.
The application also provides a vehicle equipped with the above speech semantic recognition device; the vehicle is an unmanned vehicle, a manually driven vehicle, or an intelligent vehicle that switches freely between unmanned and manual driving.
The speech semantic recognition method and device and the vehicle bring voice interaction technology to in-vehicle head units, reduce the user's manual operation through speech recognition, and provide voice guidance when the user has not yet mastered the voice commands, offering more appropriate help while accelerating the user's mastery of the voice functions and improving the user experience.
Although the application has been described with reference to preferred embodiments, various changes, substitutions, and alterations can be made without departing from the spirit and scope of the application, and all such changes, substitutions, and alterations are to be understood as covered by the following claims.

Claims (10)

1. A speech semantic recognition method, characterized in that the speech semantic recognition method comprises:
judging whether voice information of a user is received in real time;
when the voice information is received, judging whether the voice information conforms to a preset phrasing;
if yes, performing corresponding response operation according to the voice information;
if not, parsing the voice information, acquiring keywords in the voice information, acquiring a user target intention according to the keywords and/or the combination of the keywords, and acquiring and displaying at least one piece of input demonstration information matching the user target intention and the preset phrasing.
2. The speech semantic recognition method according to claim 1, wherein the step of parsing the voice information, obtaining keywords in the voice information, and obtaining the user target intention according to the keywords and/or the combination of the keywords comprises:
converting the received voice information into at least one piece of text information;
performing word segmentation on the text information, wherein lexicon-based word segmentation is adopted;
recognizing the keywords according to the segmented text;
and acquiring the target intention of the user according to the keywords and/or the combination of the keywords.
3. The speech semantic recognition method of claim 2, wherein the step of converting the received voice information into at least one piece of text information comprises:
performing feature recognition on the voice information to acquire voice features of the user, wherein the voice features of the user at least comprise regional feature data of the user;
judging the official language type of the region corresponding to the language type used by the user according to the voice characteristics of the user;
converting the voice information into the at least one piece of text information matching the official language type.
4. The speech semantic recognition method of claim 2, wherein the step of converting the received voice information into at least one piece of text information is followed by:
carrying out error correction processing on the at least one piece of text information through near-synonym matching and common homophone replacement.
5. The speech semantic recognition method of claim 2, wherein the lexicon-based word segmentation segments the text information with the aid of a Chinese dictionary database, a historical-behavior lexicon, and a popular-search lexicon.
6. The speech semantic recognition method of claim 1, wherein the step of obtaining and presenting at least one piece of input demonstration information matching the user target intention and the preset phrasing is preceded by the step of:
and classifying the input demonstration information according to a preset rule to improve the response rate.
7. The speech semantic recognition method of claim 1, wherein the step of obtaining and presenting at least one piece of input demonstration information matching the user target intention and the preset phrasing comprises:
carrying out weighted scoring on the input demonstration information according to the degree of match with the user target intention and the preset phrasing, and acquiring and displaying the input demonstration information ranked in the top n, wherein n is a positive integer greater than or equal to 1.
8. A speech semantic recognition device is characterized by comprising a memory and a processor,
the memory is used for storing executable program codes;
the processor is configured to call the executable program code in the memory to perform the steps of:
judging whether voice information of a user is received in real time;
when voice information is received, judging whether the voice information conforms to a preset phrasing;
if so, making a corresponding response operation according to the voice information;
if not, parsing the voice information, acquiring keywords in the voice information, acquiring a user target intention according to the keywords and/or the combination of the keywords, and acquiring and displaying at least one piece of input demonstration information matching the user target intention and the preset phrasing.
9. The speech semantic recognition apparatus of claim 8, wherein the processor is further configured to convert the received voice information into at least one piece of text information; perform lexicon-based word segmentation on the text information; identify keywords from the segmented text; and acquire the user target intention according to the keywords and/or the combination of the keywords.
10. A vehicle equipped with the speech semantic recognition device according to claim 9, characterized in that the vehicle is an unmanned vehicle, a human-driven vehicle, or an intelligent vehicle that switches freely between unmanned and human-driven modes.
CN201910009490.4A, filed 2019-01-04 (priority 2019-01-04): Speech semantic recognition method, device and vehicle. Status: Active. Granted as CN111415656B.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910009490.4A | 2019-01-04 | 2019-01-04 | Speech semantic recognition method, device and vehicle

Publications (2)

Publication Number | Publication Date
CN111415656A (this application) | 2020-07-14
CN111415656B (grant) | 2024-04-30

Family

ID: 71494055

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910009490.4A | Speech semantic recognition method, device and vehicle (Active; granted as CN111415656B) | 2019-01-04 | 2019-01-04

Country Status (1)

CN: CN111415656B


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052062A1 (en) * 2003-10-28 2008-02-28 Joey Stanford System and Method for Transcribing Audio Files of Various Languages
US20120035915A1 (en) * 2009-04-30 2012-02-09 Tasuku Kitade Language model creation device, language model creation method, and computer-readable storage medium
US20110087491A1 (en) * 2009-10-14 2011-04-14 Andreas Wittenstein Method and system for efficient management of speech transcribers
US20120109649A1 (en) * 2010-11-01 2012-05-03 General Motors Llc Speech dialect classification for automatic speech recognition
CN103578472A (en) * 2012-08-10 2014-02-12 海尔集团公司 Method and device for controlling electrical equipment
CN105047198A (en) * 2015-08-24 2015-11-11 百度在线网络技术(北京)有限公司 Voice error correction processing method and apparatus
CN105206266A (en) * 2015-09-01 2015-12-30 重庆长安汽车股份有限公司 Vehicle-mounted voice control system and method based on user intention guess
CN106847276A (en) * 2015-12-30 2017-06-13 昶洧新能源汽车发展有限公司 A kind of speech control system with accent recognition
CN105654954A (en) * 2016-04-06 2016-06-08 普强信息技术(北京)有限公司 Cloud voice recognition system and method
CN107155121A (en) * 2017-04-26 2017-09-12 海信集团有限公司 The display methods and device of Voice command text
CN108053823A (en) * 2017-11-28 2018-05-18 广西职业技术学院 A kind of speech recognition system and method
CN108121528A (en) * 2017-12-06 2018-06-05 深圳市欧瑞博科技有限公司 Sound control method, device, server and computer readable storage medium
CN108447473A (en) * 2018-03-06 2018-08-24 深圳市沃特沃德股份有限公司 Voice translation method and device
CN108877791A (en) * 2018-05-23 2018-11-23 百度在线网络技术(北京)有限公司 Voice interactive method, device, server, terminal and medium based on view

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017663A (en) * 2020-08-14 2020-12-01 博泰车联网(南京)有限公司 Voice generalization method and device and computer storage medium
CN112017663B (en) * 2020-08-14 2024-04-30 博泰车联网(南京)有限公司 Voice generalization method and device and computer storage medium
CN112102840A (en) * 2020-09-09 2020-12-18 中移(杭州)信息技术有限公司 Semantic recognition method, device, terminal and storage medium
CN112102840B (en) * 2020-09-09 2024-05-03 中移(杭州)信息技术有限公司 Semantic recognition method, semantic recognition device, terminal and storage medium
CN112346697A (en) * 2020-09-14 2021-02-09 北京沃东天骏信息技术有限公司 Method, device and storage medium for controlling equipment
CN112896189A (en) * 2021-02-26 2021-06-04 江西江铃集团新能源汽车有限公司 Automatic driving vehicle control method and device, readable storage medium and vehicle-mounted terminal
CN113205817A (en) * 2021-07-06 2021-08-03 明品云(北京)数据科技有限公司 Speech semantic recognition method, system, device and medium
CN114842847A (en) * 2022-04-27 2022-08-02 中国第一汽车股份有限公司 Vehicle-mounted voice control method and device
CN115457959A (en) * 2022-11-08 2022-12-09 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN115457959B (en) * 2022-11-08 2023-02-10 广州小鹏汽车科技有限公司 Voice interaction method, server and computer readable storage medium
CN117292688A (en) * 2023-11-24 2023-12-26 深圳市华南英才科技有限公司 Control method based on intelligent voice mouse and intelligent voice mouse
CN117292688B (en) * 2023-11-24 2024-02-06 深圳市华南英才科技有限公司 Control method based on intelligent voice mouse and intelligent voice mouse

Also Published As

Publication number Publication date
CN111415656B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN111415656A (en) Voice semantic recognition method and device and vehicle
CN107016994B (en) Voice recognition method and device
CN108304372B (en) Entity extraction method and device, computer equipment and storage medium
JP4666648B2 (en) Voice response system, voice response program
US8606581B1 (en) Multi-pass speech recognition
US5787230A (en) System and method of intelligent Mandarin speech input for Chinese computers
WO2017071182A1 (en) Voice wakeup method, apparatus and system
KR100679042B1 (en) Method and apparatus for speech recognition, and navigation system using for the same
US20110093261A1 (en) System and method for voice recognition
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
WO2020123227A1 (en) Speech processing system
CN109801628B (en) Corpus collection method, apparatus and system
JP5703491B2 (en) Language model / speech recognition dictionary creation device and information processing device using language model / speech recognition dictionary created thereby
WO2005077098B1 (en) Handwriting and voice input with automatic correction
CN114328881A (en) Short text matching-based voice question-answering method and system
CN112927679A (en) Method for adding punctuation marks in voice recognition and voice recognition device
CN112015872A (en) Question recognition method and device
CN112669842A (en) Man-machine conversation control method, device, computer equipment and storage medium
Kurzekar et al. Continuous speech recognition system: A review
CN110852075A (en) Voice transcription method and device for automatically adding punctuation marks and readable storage medium
CN115240655A (en) Chinese voice recognition system and method based on deep learning
US11783824B1 (en) Cross-assistant command processing
CN116052655A (en) Audio processing method, device, electronic equipment and readable storage medium
CN115132178B (en) Semantic endpoint detection system based on deep learning
CN104424942A (en) Method for improving character speed input accuracy

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant