CN115938365B - Voice interaction method, vehicle and computer readable storage medium - Google Patents

Voice interaction method, vehicle and computer readable storage medium

Info

Publication number: CN115938365B (granted); earlier publication: CN115938365A
Application number: CN202310229334.5A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 何澍义
Original and current assignee: Guangzhou Xiaopeng Motors Technology Co Ltd
Legal status: Active
Prior art keywords: word, voice request, vocabulary, user voice, embedding
Application filed by Guangzhou Xiaopeng Motors Technology Co Ltd; priority to CN202310229334.5A.

Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02T — Climate change mitigation technologies related to transportation
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems


Abstract

The application discloses a voice interaction method comprising the following steps: receiving a user voice request in a vehicle cabin; performing word embedding extraction on the user voice request to obtain word embedding features; performing character embedding extraction on the user voice request to obtain character embedding features; splicing the character embedding features and the word embedding features to obtain the splicing features of the user voice request; and performing named entity recognition according to the splicing features of the user voice request, so as to complete voice interaction according to the result of the named entity recognition. In this method, word embedding extraction and character embedding extraction are performed on the user voice request in the vehicle cabin, named entity recognition is performed on the splice of the resulting word embedding features and character embedding features, and voice interaction is finally completed. With this voice interaction method, named entities can be recognized from the type and boundary information of the words in the voice request issued by the user, improving the accuracy of named entity recognition in the voice request and the user's interaction experience.

Description

Voice interaction method, vehicle and computer readable storage medium
Technical Field
The present disclosure relates to the field of natural language understanding technologies, and in particular, to a voice interaction method, a vehicle, and a computer readable storage medium.
Background
Currently, in-vehicle voice technology can support user interaction within the vehicle cabin via voice, such as controlling vehicle components or interacting with elements of the in-vehicle system user interface. However, current in-vehicle voice technology often recognizes named entities in user voice requests inaccurately, which affects the accuracy of the voice interaction process and results in a poor user experience.
Disclosure of Invention
The application provides a voice interaction method, a vehicle and a computer readable storage medium.
The voice interaction method comprises the following steps:
receiving a user voice request in a vehicle cabin;
performing word embedding extraction on the user voice request to obtain word embedding features;
performing character embedding extraction on the user voice request to obtain character embedding features;
splicing the character embedding features and the word embedding features to obtain the splicing features of the user voice request;
and performing named entity recognition according to the splicing features of the user voice request, so as to complete voice interaction according to the result of the named entity recognition.
In this way, word embedding extraction and character embedding extraction are performed on the user voice request in the vehicle cabin, the resulting character embedding features and word embedding features are spliced, and named entity recognition is performed according to the relations between the characters and the words of the voice request contained in the splicing result, so that voice interaction is finally completed. With this voice interaction method, named entities can be recognized from the type and boundary information of the words in the voice request issued by the user, improving the accuracy of named entity recognition in the voice request and the user's interaction experience.
Performing word embedding extraction on the user voice request to obtain word embedding features comprises the following steps:
initializing the vocabulary embedded representations of the preset vocabularies;
determining, for each character in the user voice request, the preset vocabularies that contain it, so as to generate vocabulary sequences;
and constructing the word embedding of each character in the user voice request according to the vocabulary embedded representations and the vocabulary sequences, so as to obtain the word embedding feature of each character in the user voice request.
In this way, the vocabulary embedded representations of the preset vocabularies corresponding to the named entities in the user voice request can be initialized, the preset vocabularies containing each character can be determined, and the word embedding feature of each character can be computed, so that the natural language understanding model can recognize the named entities at a reduced computation cost.
Before the step of performing word embedding extraction on the user voice request to obtain word embedding features, the voice interaction method comprises:
constructing a first type vocabulary and a second type vocabulary according to the service type, wherein the preset character occupies the initial position of the entity words in the first type vocabulary and an intermediate position of the entity words in the second type vocabulary, and the preset vocabularies comprise the first type vocabulary and the second type vocabulary.
Thus, the preset vocabularies can be classified by service type, so that vocabulary sequences are generated from the vocabulary sequence numbers and combined with the vocabulary embedded representations to obtain the word embedding feature of each character in the voice request.
Determining the preset vocabularies containing each character in the user voice request to generate vocabulary sequences comprises:
determining the first type vocabularies containing each character in the user voice request, and generating a first vocabulary sequence according to the sequence numbers of the first type vocabularies;
and determining the second type vocabularies containing each character in the user voice request, and generating a second vocabulary sequence according to the sequence numbers of the second type vocabularies.
Thus, the sequence numbers of the preset vocabularies in which each character of the user voice request appears can be combined into the corresponding vocabulary sequences, so that the word embedding of each character is computed in combination with the vocabulary embedded representations, finally yielding the word embedding features.
Constructing the word embedding of each character in the user voice request according to the vocabulary embedded representations and the vocabulary sequences, to obtain the word embedding feature of each character in the user voice request, comprises:
performing calculation processing on the vocabulary embedded representations corresponding to the elements of the first vocabulary sequence to obtain the first word embedding of each character in the user voice request;
performing calculation processing on the vocabulary embedded representations corresponding to the elements of the second vocabulary sequence to obtain the second word embedding of each character in the user voice request;
and splicing the first word embedding and the second word embedding of each character to obtain the word embedding feature of the corresponding character in the user voice request.
Thus, the word embedding feature can be computed from the vocabulary embedded representations indexed by each character's vocabulary sequences, so that the specific named entities in which the character appears can be determined, improving the accuracy of the named entity recognition process.
Performing character embedding extraction on the user voice request to obtain character embedding features comprises:
extracting the character embedding feature of each character in the user voice request according to a character embedding vocabulary.
Thus, the character embedding feature of each character in the user voice request can be extracted from an existing character embedding vocabulary, giving the natural language understanding model the ability to recognize specific characters.
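As a hedged illustration of this lookup, character embedding extraction can be sketched as indexing a pre-trained table; the vocabulary contents, toy embedding values, and function name below are invented for illustration and are not taken from the patent.

```python
# Hypothetical sketch of character embedding lookup from a pre-trained
# character vocabulary; the table contents are invented toy values.
char_vocab = {"<unk>": 0, "广": 1, "州": 2, "南": 3, "站": 4}
# one embedding row per character, 4 dimensions each (toy values)
char_embeddings = [[0.1 * i + 0.01 * j for j in range(4)] for i in range(5)]

def char_feature(ch):
    """Return the embedding row for ch, falling back to the <unk> row."""
    return char_embeddings[char_vocab.get(ch, char_vocab["<unk>"])]
```

In practice the table would come from a pre-trained model rather than being hand-built as here.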
Splicing the character embedding features and the word embedding features to obtain the splicing features of the user voice request comprises:
splicing the character embedding feature and the word embedding feature corresponding to each character in the user voice request to obtain the splicing feature of each character;
and splicing the splicing features of the characters in the user voice request in order to obtain the splicing features of the user voice request.
Thus, the character embedding feature and the word embedding feature of each character in the user request can be spliced in order to obtain the splicing features of the user voice request, which serve as the input of the named entity recognition model, from which the corresponding named entities are finally recognized.
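A minimal sketch of the splicing step, assuming the per-character features are plain Python lists; the function name is an illustrative assumption.

```python
# Minimal sketch of the splicing step: concatenate each character's
# character embedding with its word embedding feature, then keep the
# per-character order as the request-level splice feature (a sequence).
def splice_features(char_feats, word_feats):
    assert len(char_feats) == len(word_feats)
    return [c + w for c, w in zip(char_feats, word_feats)]

# two characters: 1-dim character features, 2-dim word features
spliced = splice_features([[1.0], [2.0]], [[0.1, 0.2], [0.3, 0.4]])
```

The resulting sequence of concatenated vectors is what would be fed to the named entity recognition model.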
Performing named entity recognition according to the splicing features of the user voice request, to complete voice interaction according to the result of the named entity recognition, comprises:
taking the splicing features of the user voice request as the input of a named entity recognition model to perform named entity recognition;
and performing natural language understanding according to the result of the named entity recognition to complete the voice interaction.
Thus, named entity recognition can be performed on the spliced features of the user voice request, improving recognition capability, making named entity recognition in the natural language understanding process more accurate, and improving the user's interaction experience.
The vehicle of the present application comprises a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the method described above.
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the method described above.
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a first schematic flowchart of the voice interaction method of the present application;
FIG. 2 is a second schematic flowchart of the voice interaction method of the present application;
FIG. 3 is a third schematic flowchart of the voice interaction method of the present application;
FIG. 4 is a fourth schematic flowchart of the voice interaction method of the present application;
FIG. 5 is a fifth schematic flowchart of the voice interaction method of the present application;
FIG. 6 is a schematic illustration of a calculation process of the voice interaction method of the present application;
FIG. 7 is a sixth schematic flowchart of the voice interaction method of the present application;
FIG. 8 is a seventh schematic flowchart of the voice interaction method of the present application;
FIG. 9 is an eighth schematic flowchart of the voice interaction method of the present application;
FIG. 10 is a ninth schematic flowchart of the voice interaction method of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application and are not to be construed as limiting the embodiments of the present application.
With the development and popularization of vehicle electronics, a vehicle can conduct voice interaction with a user: it can recognize the user's voice request and ultimately fulfill the intention expressed in it. Voice interaction between occupants and the vehicle serves the diverse needs of driver and passengers while driving. However, in a Chinese voice request, one character may appear in multiple named entity words with different meanings. The boundaries and types of named entities therefore need to be extracted more accurately to strengthen named entity recognition. In the related art, recognition frameworks tied to a particular model design have been introduced, such as the Lattice LSTM framework built on the LSTM model; these do not transfer to other models and cannot be computed in parallel, so their reusability is poor and their computation cost is high. Other related technologies use word boundary information for named entity recognition but cannot guarantee the accuracy of the extracted named entities.
Based on the above problems that may be encountered, referring to fig. 1, the present application provides a voice interaction method, including:
01: receiving a user voice request in a vehicle cabin;
02: performing word embedding extraction on the user voice request to obtain word embedding features;
03: performing character embedding extraction on the user voice request to obtain character embedding features;
04: splicing the character embedding features and the word embedding features to obtain the splicing features of the user voice request;
05: performing named entity recognition according to the splicing features of the user voice request, so as to complete voice interaction according to the result of the named entity recognition.
The application also provides a vehicle including a memory and a processor. The voice interaction method of the present application may be implemented by the vehicle of the present application. Specifically, the memory stores a computer program, and the processor is configured to receive a user voice request in the vehicle cabin, perform word embedding extraction and character embedding extraction on the user voice request to obtain word embedding features and character embedding features, splice the character embedding features and the word embedding features to obtain the splicing features of the user voice request, and perform named entity recognition according to the splicing features of the user voice request, so as to complete voice interaction according to the result of the named entity recognition.
Specifically, in the voice interaction method of the present application, as shown in fig. 2, upon receiving a user voice request in the vehicle cabin, the character embedding features and word embedding features of the request can be extracted and spliced to obtain the splicing features of the user voice request.
In the word embedding process, vocabularies are constructed for the named entities in the voice request according to the service type. When a user issues a voice request, each character is represented by the type vocabularies of the entity words in which it appears; this representation is called word embedding. Finally, the word embedding features are obtained by performing calculation processing on the word embeddings over all the vocabularies.
In the character embedding process, the character embedding feature of each character can be extracted directly from a pre-trained character embedding vocabulary.
After the character embedding features and the word embedding features are obtained, they can be spliced to obtain the splicing features of the user voice request. The splicing features contain the information of each character in the request together with the information of all the named entity words in which that character appears.
Named entity recognition is then performed according to the character-word relations embodied in the splicing features of the user voice request, yielding the types and boundary information of the named entity words. Finally, voice interaction is completed according to the result of the named entity recognition.
In summary, in the present application, word embedding extraction and character embedding extraction are performed on the user voice request in the vehicle cabin, the resulting character embedding features and word embedding features are spliced, named entity recognition is performed according to the character-word relations of the voice request contained in the splicing result, and voice interaction is finally completed. With this voice interaction method, named entities can be recognized from the type and boundary information of the words in the voice request issued by the user, improving the accuracy of named entity recognition in the voice request and the user's interaction experience.
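The five claimed steps can be sketched end to end as follows; every helper below is a trivial stand-in invented so that the control flow runs, not the patent's actual models.

```python
# High-level sketch of steps 01-05. All helpers are placeholders invented
# for illustration; none reflects the patent's actual implementation.
def asr(audio):
    return audio  # 01: stand-in for speech-to-text on the cabin request

def extract_word_embeddings(query):
    return [[0.0, 0.0] for _ in query]  # 02: vocabulary-based word features

def extract_char_embeddings(query):
    return [[1.0] for _ in query]       # 03: per-character features

def ner_model(features):
    return ["O"] * len(features)        # 05: placeholder sequence tagger

def voice_interaction(audio):
    query = asr(audio)
    word_feats = extract_word_embeddings(query)
    char_feats = extract_char_embeddings(query)
    spliced = [c + w for c, w in zip(char_feats, word_feats)]  # 04: splice
    tags = ner_model(spliced)
    return list(zip(query, tags))
```

A real system would replace each stub with its trained component; the point here is only the data flow between the five steps.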
Referring to fig. 3, step 02 includes:
021: initializing the vocabulary embedded representations of the preset vocabularies;
022: determining, for each character in the user voice request, the preset vocabularies that contain it, so as to generate vocabulary sequences;
023: and constructing the word embedding of each character in the user voice request according to the vocabulary embedded representations and the vocabulary sequences, so as to obtain the word embedding feature of each character.
The processor is configured to initialize the vocabulary embedded representations of the preset vocabularies, determine the preset vocabularies containing each character in the user voice request to generate vocabulary sequences, and construct the word embedding of each character according to the vocabulary embedded representations and the vocabulary sequences to obtain the word embedding feature of each character in the user voice request.
Specifically, for user voice requests in different service domains, preset vocabularies can be constructed according to the types of named entities (NER) present. Each vocabulary embedded representation is initialized according to the identity information of the vocabulary, namely

E_gaz(x), 0 < x ≤ L ……(1)

That is, the vocabulary embedded representation E_gaz(x), a multidimensional vector, is given an initial value; the specific values are not limited here. The initialized vocabulary embedded representation E_gaz(x) can serve as a trainable variable of the subsequent natural language understanding model.

The parameter x indexes each preset vocabulary. The parameter L, which defines the value range of x, is the number of named entities that can be constructed from the user voice request, i.e., the number of preset vocabularies.
In one example, the user voice request is "navigate to Guangzhou South Station underground parking garage" ("导航到广州南站地下停车场"), which belongs to the navigation service domain. A vocabulary is constructed for each type of named entity in the sentence, as shown in Table 1:

TABLE 1

Vocabulary type | Entity word
DISTRICT | 广州 (Guangzhou)
POI_GENERIC | 广州南站 (Guangzhou South Station)
PARK | 广州南站地下停车场 (Guangzhou South Station underground parking garage)
POI_TYPE | 地下停车场 (underground parking lot)

The number of vocabularies that can be constructed from this voice request is therefore L = 4. In the vocabulary embedded representation E_gaz(x), the parameter x takes values 0 < x ≤ 4.
Further, the vocabularies can be distinguished by sequence numbers according to the order in which they appear in the voice request. In the preset vocabularies constructed for this request, the four vocabularies are numbered by type and order of appearance: DISTRICT is number 1, POI_GENERIC is number 2, PARK is number 3, and POI_TYPE is number 4. In other examples, two or more vocabularies of the same type may exist in one voice request; for example, if a first and a second vocabulary are of the same type, both carry the sequence number 1.
According to the vocabulary sequence numbers determined above, the preset vocabularies containing each character of the user voice request can be determined, and the vocabulary sequences of that character generated from them. The vocabulary sequence represents the position information of the character in the preset vocabularies.
Finally, from the vocabulary embedded representation E_gaz(x) of each vocabulary containing a character and the position information of that character in the preset vocabularies, the word embedding feature of the character can be computed, enabling the natural language understanding model to recognize the named entities.
In particular, when a new named entity appears in a voice request issued by a user, its type is identified first, and it is decided whether a new vocabulary type must be created for it or whether it can be incorporated into a vocabulary of an existing type. After the type is determined, the new named entity is added to the subsequent word embedding extraction process, so that the natural language understanding model gains the ability to recognize it while the computation cost of the model is reduced.
Therefore, the vocabulary embedded representation of the preset vocabulary corresponding to each named entity in the user voice request can be initialized, the preset vocabulary corresponding to each word in the user voice request is determined, and the word embedded characteristics of each word are obtained through calculation, so that the named entity can be identified by the natural language understanding model, and the calculation cost is reduced.
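As a hedged sketch of the initialization in formula (1), the table E_gaz(x) can be built as a small trainable array. The random Gaussian initialization, the embedding width, and the zero row reserved for padding are assumptions, since the description leaves the initial values open.

```python
# Hedged sketch of initializing E_gaz(x) for 0 < x <= L as a trainable
# table; random Gaussian init and the zero padding row are assumptions.
import random

random.seed(42)  # reproducible toy initialization
L, DIM = 4, 16   # number of preset vocabularies and embedding width (assumed)

# row 0 is a zero vector reserved for padding elements of vocabulary sequences
E_gaz = [[0.0] * DIM] + [
    [random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(L)
]
```

In a neural framework this table would simply be a trainable embedding matrix updated along with the rest of the natural language understanding model.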
Referring to fig. 4, step 02 further includes:
024: constructing a first type vocabulary and a second type vocabulary according to the service type.
The processor is configured to construct a first type vocabulary and a second type vocabulary according to the service type, wherein the preset character occupies the initial position of the entity words in the first type vocabulary and an intermediate position of the entity words in the second type vocabulary, and the preset vocabularies comprise the first type vocabulary and the second type vocabulary.
Specifically, for the preset vocabularies of each character in the user voice request, a first type vocabulary and a second type vocabulary can be constructed according to the service type. The character currently being processed is regarded as the preset character. The first type vocabulary, which may be called the "Begin vocabulary", contains the entity words that begin with the preset character. The second type vocabulary, which may be called the "Inter vocabulary", contains the entity words in which the preset character occupies an intermediate position.
In one example, the user voice request is "navigate to Guangzhou South Station underground parking garage", which belongs to the navigation service domain. Given the preset vocabularies shown in Table 1, when the preset character is "广" ("Guang"), the first type vocabulary includes all the vocabularies whose entity words begin with "广", namely "DISTRICT: 广州", "POI_GENERIC: 广州南站", and "PARK: 广州南站地下停车场". The character "广" has no second type vocabulary.
Thus, the vocabulary can be preset according to the service type classification, so that a vocabulary sequence is generated according to the sequence number of the vocabulary, and the vocabulary embedded representation is combined to obtain the embedded characteristics of each word in the voice request.
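The Begin/Inter split described above can be sketched as follows; the dict layout and function name are illustrative assumptions, with the entity words taken from the Table 1 example.

```python
# Sketch of splitting the preset vocabularies into the first ("Begin") and
# second ("Inter") types for one preset character, mirroring the "广"
# example; the data layout is an assumption for illustration.
vocab = {
    1: ("DISTRICT", "广州"),
    2: ("POI_GENERIC", "广州南站"),
    3: ("PARK", "广州南站地下停车场"),
    4: ("POI_TYPE", "地下停车场"),
}

def split_by_position(ch):
    """Return (Begin, Inter) vocabulary sequence numbers for character ch."""
    begin = [n for n, (_, word) in vocab.items() if word.startswith(ch)]
    inter = [n for n, (_, word) in vocab.items() if ch in word[1:]]
    return begin, inter
```

For "广" this yields Begin vocabularies 1, 2, and 3 and no Inter vocabulary, matching the example in the description.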
Referring to fig. 5, step 022 includes:
0221: determining the first type vocabularies containing each character in the user voice request, so as to generate a first vocabulary sequence according to the sequence numbers of the first type vocabularies;
0222: determining the second type vocabularies containing each character in the user voice request, so as to generate a second vocabulary sequence according to the sequence numbers of the second type vocabularies.
The processor is configured to determine the first type vocabularies containing each character in the user voice request, to generate a first vocabulary sequence based on their sequence numbers, and to determine the second type vocabularies containing each character, to generate a second vocabulary sequence based on their sequence numbers.
Specifically, for each character of the user voice request, the sequence numbers of the first type vocabularies in which the character appears can be combined into a first vocabulary sequence; similarly, the sequence numbers of the second type vocabularies in which it appears can be combined into a second vocabulary sequence.
For a voice request of length n, denoted query_n, the characters c_i (1 ≤ i ≤ n) are arranged in order:

query_n = {c_1, c_2, …, c_n} ……(2)

The first vocabulary sequence of c_i, formed from the sequence numbers of its "Begin vocabularies", can be obtained as:

B(c_i) = [j_1, j_2, …, j_t] ……(3)

wherein the parameters j_1 ~ j_t are the sequence numbers of the vocabularies.

The second vocabulary sequence of c_i, formed from the sequence numbers of its "Inter vocabularies", is:

I(c_i) = [k_1, k_2, …, k_t] ……(4)

wherein the parameters k_1 ~ k_t are the sequence numbers of the vocabularies.
In one example, the user voice request is "navigate to Guangzhou South Station underground parking garage". As shown in fig. 6, when the preset character is "广", the 4th character in the sentence, it can be written as c_4. The first type vocabularies include all the vocabularies whose entity words begin with "广", namely "DISTRICT: 广州", "POI_GENERIC: 广州南站", and "PARK: 广州南站地下停车场". Numbering these three vocabularies by type and order of appearance gives DISTRICT the number 1, POI_GENERIC the number 2, and PARK the number 3.
In this voice request, the character "广" does not appear in the vocabulary "POI_TYPE", whose sequence number is 4. Assuming no other character of the request appears in all four vocabularies at once, the sequence numbers of the first type vocabularies can be combined into a first vocabulary sequence containing three elements:

B(c_4) = [1, 2, 3] ……(5)

Since no second type vocabulary for "广" exists among the four preset vocabularies, the second vocabulary sequence can be generated as:

I(c_4) = [0, 0, 0] ……(6)
For the same voice request, when the preset character is "南" ("Nan"), the 6th character in the sentence, it can be written as c_6. The first type vocabularies would include all the vocabularies whose entity words begin with "南"; since none exists among the preset vocabularies, the first vocabulary sequence can be generated as:

B(c_6) = [0, 0, 0] ……(7)

The second type vocabularies include the vocabularies whose entity words contain "南" in an intermediate position, i.e., "POI_GENERIC: 广州南站" and "PARK: 广州南站地下停车场", where DISTRICT has number 1, POI_GENERIC number 2, and PARK number 3. The sequence numbers of the second type vocabularies can be combined into the second vocabulary sequence:

I(c_6) = [2, 3, 0] ……(8)
For the same voice request, when the preset character is "地" ("Di"), the 8th character in the sentence, it can be written as c_8. The first type vocabulary includes all the vocabularies whose entity words begin with "地", i.e., "POI_TYPE: 地下停车场", whose sequence number is 4. The first vocabulary sequence is:

B(c_8) = [4, 0, 0] ……(9)

The second type vocabulary includes the vocabularies whose entity words contain "地" in an intermediate position, i.e., "PARK: 广州南站地下停车场", whose sequence number is 3. The second vocabulary sequence is:

I(c_8) = [3, 0, 0] ……(10)
the vocabulary sequence generating process can also select words except for the "Guangdong", "nan" and "Di" words as examples, and the specific selection is based on the calculation cost and the representativeness of the words. If the first vocabulary sequence and the second vocabulary sequence corresponding to the "south" of the "station" are completely consistent, in order to save the calculation cost, the "station" may not be calculated after the vocabulary sequence corresponding to the "south" is obtained.
In particular, in other examples, two or more vocabularies of the same type may exist in one voice request; for example, if a first and a second vocabulary are of the same type, both carry the sequence number 1. For instance, when the vocabularies whose entity words begin with the character c_4 comprise two DISTRICT vocabularies with sequence number 1 and one PARK vocabulary with sequence number 2, the first vocabulary sequence is:

B(c_4) = [1, 1, 2] ……(11)

When two or more vocabularies of the same type exist in the voice request, the process of obtaining the second vocabulary sequence from the second type vocabularies is essentially the same as that of obtaining the first vocabulary sequence and is not repeated here.
Therefore, the sequence numbers of the preset vocabularies in which each word of the user voice request appears can be combined to form the corresponding vocabulary sequences, so that the word embedding of each word in the voice request can be calculated from the vocabulary embedded representation, finally obtaining the word embedding features.
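The sequence-generation procedure above can be sketched in Python as follows. The gazetteer entries, type names, sequence numbers, and the zero-padding convention come from the examples in the text (B(c4) = [1, 2, 3], I(c4) = [0, 0, 0]); everything else, including the function name and data layout, is an illustrative assumption rather than the patent's code.

```python
# Hypothetical numbered gazetteer mirroring the text's example.
gazetteer = {
    1: ("DISTRICT", "广州"),           # "Guangzhou"
    2: ("POI_GENERIC", "广州南站"),     # "Guangzhou south station"
    3: ("PARK", "广州南站地下停车场"),   # "Guangzhou south station underground parking garage"
    4: ("POI_TYPE", "地下停车场"),      # "underground parking lot"
}

def vocab_sequences(char):
    """Return (first, second) sequences of gazetteer numbers for one character."""
    first, second = [], []
    for number, (_type, entry) in sorted(gazetteer.items()):
        if entry.startswith(char):
            first.append(number)        # entry BEGINS with the character
        elif char in entry[1:-1]:
            second.append(number)       # character in the MIDDLE of the entry
    # An empty sequence is padded with 0s to the other's length, matching
    # the text's example I(c4) = [0, 0, 0].
    if not first and second:
        first = [0] * len(second)
    if not second and first:
        second = [0] * len(first)
    return first, second

print(vocab_sequences("广"))   # → ([1, 2, 3], [0, 0, 0])
print(vocab_sequences("南"))   # → ([0, 0], [2, 3])
print(vocab_sequences("地"))   # → ([4], [3])
```

The three printed pairs reproduce the sequences in equations (8) through (10) and the "wide" example.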
Referring to fig. 7, step 023 includes:
0231: performing calculation processing on the vocabulary embedded representations corresponding to the elements in the first vocabulary sequence to obtain the first word embedding of each word in the user voice request;
0232: performing calculation processing on the vocabulary embedded representations corresponding to the elements in the second vocabulary sequence to obtain the second word embedding of each word in the user voice request;
0233: splicing the first word embedding and the second word embedding of each word in the user voice request to obtain the word embedding feature of the corresponding word in the user voice request.
The processor is configured to perform calculation processing on the vocabulary embedded representations corresponding to the elements in the first vocabulary sequence to obtain the first word embedding of each word in the user voice request, to perform calculation processing on the vocabulary embedded representations corresponding to the elements in the second vocabulary sequence to obtain the second word embedding of each word in the user voice request, and to splice the first word embedding and the second word embedding of each word in the user voice request to obtain the word embedding feature of the corresponding word.
Specifically, the first word embedding and the second word embedding may be obtained by calculating the corresponding vocabulary embedded representations according to the first vocabulary sequence and the second vocabulary sequence, respectively.
In one example, the user voice request is "navigate to Guangzhou south station underground parking garage". When the preset word is "wide", the "wide" word is the 4th word in the sentence and can be noted as c4. The first vocabulary sequence of "wide" is B(c4) = [1, 2, 3], and the second vocabulary sequence is I(c4) = [0, 0, 0].
A calculation process may be performed on E_gaz; generally, the average value method over E_gaz is selected. The first word embedding is calculated as follows:
E_B(ci) = (1/t) · Σ E_gaz(x), x ∈ B(ci) ……(12)
The parameter t is the total number of vocabulary embedded representations E_gaz(x) participating in the average calculation.
In the above example, substituting the word "wide", i.e. c4, and its corresponding first vocabulary sequence gives:
E_B(c4) = (E_gaz(1) + E_gaz(2) + E_gaz(3)) / 3 ……(13)
the second word embedding calculation process is as follows:
E_I(ci) = (1/t) · Σ E_gaz(x), x ∈ I(ci) ……(14)
in the above example, substituting the word "wide", i.e. c4, and its corresponding second vocabulary sequence gives:
E_I(c4) = (E_gaz(0) + E_gaz(0) + E_gaz(0)) / 3 ……(15)
In other examples, the user voice request may be "navigate to Guangzhou south", and when the preset word is "wide", the corresponding first vocabulary sequence is B(c4) = [1, 2] and the second vocabulary sequence is I(c4) = [0, 0]. Because these sequences differ from those of "wide" in the voice request "navigate to Guangzhou south station underground parking garage", the calculation results of the corresponding first word embedding and second word embedding will also differ. Together, the first word embedding and the second word embedding can represent the characteristics of a word in different named entities, and in particular distinguish boundary information in a composite named entity formed by combining multiple named entities, e.g. determining whether the named entity is "Guangzhou", "Guangzhou south station", or "Guangzhou south station underground parking garage".
And splicing the first word embedding and the second word embedding of each word in the voice request to obtain the word embedding characteristics of the corresponding word in the voice request. The word embedding comprises related information of the first word embedding and the second word embedding, and can represent the characteristics of a certain word in different named entities. That is, when a word may appear in the same location in different named entities, the specific named entity corresponding to the word in the voice request may be determined by word embedding of the word at this time.
Therefore, word embedding characteristics can be obtained through calculation processing according to word list embedding representation corresponding to elements in word list sequences of each word in the voice request, so that the specific named entity where the word is located is judged, and accuracy of a named entity identification process is improved.
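Equations (12) through (15) can be sketched as follows. It is assumed here that E_gaz is a lookup table of vocabulary embedded representations whose row 0 serves the padding index, and that the padding row is zero; both are illustrative assumptions, as are all dimensions.

```python
import numpy as np

# E_gaz: lookup table of vocabulary embedded representations (assumed shape).
rng = np.random.default_rng(0)
dim = 8
E_gaz = rng.normal(size=(5, dim))   # rows 0..4 for sequence numbers 0..4
E_gaz[0] = 0.0                      # assumption: padding index 0 embeds to zeros

def word_embedding(sequence):
    """Average E_gaz(x) over x in the sequence (equations (12) and (14))."""
    return E_gaz[sequence].mean(axis=0)

B_c4, I_c4 = [1, 2, 3], [0, 0, 0]   # the "wide" example from the text
E_B = word_embedding(B_c4)           # equation (13)
E_I = word_embedding(I_c4)           # equation (15): three copies of E_gaz(0)

# The two word embeddings are later spliced into the word embedding feature.
word_feature = np.concatenate([E_B, E_I])
print(word_feature.shape)            # → (16,)
```

Because I(c4) selects only the padding row, E_I collapses to a fixed vector, which is what lets the splice distinguish "no middle-position match" from a real one.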
Referring to fig. 8, step 03 includes:
031: and extracting character embedding characteristics of each character in the user voice request according to the character embedding vocabulary.
The processor is used for extracting the character embedding characteristics of each character in the user voice request according to the character embedding vocabulary.
Specifically, in previous natural language understanding processes, each time a word is recognized and understood, the word and its word embedding representation information may be added to a corresponding vocabulary, which is referred to as the word embedding vocabulary. When the natural language understanding model needs to recognize the word again, the word embedding feature of the word can be extracted from the word embedding vocabulary.
The word embedding feature may be denoted E_C(ci), where E_C(ci) is a multidimensional vector. In the natural language model, given a fixed number of training iterations, the correspondence between a word and its word embedding feature is unique; that is, the word embedding vocabulary gives the natural language understanding model the ability to identify specific words.
Thus, the character embedding characteristics of each character in the user voice request can be extracted from the existing character embedding word list, so that the natural language understanding model has the capability of recognizing specific characters.
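A minimal sketch of the word embedding vocabulary described above: once a character has been seen, its embedding E_C(ci) is stored and reused on later lookups, keeping the character-to-embedding correspondence unique. The in-memory dictionary, the random initializer, and the class name are all assumptions, not the patent's implementation.

```python
import numpy as np

class CharEmbeddingVocab:
    """Stores E_C(c_i) for each character the model has seen."""

    def __init__(self, dim=16, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.table = {}                      # char -> embedding vector

    def lookup(self, char):
        """Return E_C(char), creating and remembering it on first sight."""
        if char not in self.table:
            self.table[char] = self.rng.normal(size=self.dim)
        return self.table[char]

vocab = CharEmbeddingVocab()
first = vocab.lookup("广")
assert vocab.lookup("广") is first           # same vector on every lookup
```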
Referring to fig. 9, step 04 includes:
041: splicing the word embedding feature and the character embedding feature corresponding to each word in the user voice request to obtain the splicing feature of each word in the user voice request;
042: splicing the splicing features of each word in the user voice request in sequence to obtain the splicing feature of the user voice request.
The processor is configured to splice the word embedding feature and the character embedding feature corresponding to each word in the user voice request to obtain the splicing feature of each word, and to splice the splicing features of each word in the user voice request in sequence to obtain the splicing feature of the user voice request.
Specifically, a splicing function "concat()" may be used to connect the character embedding feature E_C(ci) and the word features E_B(ci) and E_I(ci) in sequence to obtain the splicing feature of each word in the user voice request. The splicing feature is expressed as E(ci) = concat(E_C(ci), E_B(ci), E_I(ci)).
Therefore, the word embedding feature and the character embedding feature corresponding to each word in the user voice request can be spliced in sequence to obtain the splicing feature of the user voice request, which serves as the input of the named entity recognition model, so that the corresponding named entity is finally recognized from the splicing feature.
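The per-character splice E(ci) = concat(E_C(ci), E_B(ci), E_I(ci)) and the request-level feature obtained by stacking the splices in order can be sketched as follows; all dimensions are illustrative assumptions.

```python
import numpy as np

char_dim, gaz_dim, n_chars = 16, 8, 12       # assumed dimensions

rng = np.random.default_rng(0)
E_C = rng.normal(size=(n_chars, char_dim))   # character embedding features
E_B = rng.normal(size=(n_chars, gaz_dim))    # first word embeddings
E_I = rng.normal(size=(n_chars, gaz_dim))    # second word embeddings

def splice(e_c, e_b, e_i):
    """concat(E_C, E_B, E_I) for one character."""
    return np.concatenate([e_c, e_b, e_i])

# Stack the per-character splices in sentence order.
request_feature = np.stack(
    [splice(E_C[i], E_B[i], E_I[i]) for i in range(n_chars)]
)
print(request_feature.shape)   # → (12, 32)
```

The resulting (characters × feature) matrix is what step 05 feeds to the recognition model.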
Referring to fig. 10, step 05 includes:
051: taking the spliced features of the user voice request as the input of a named entity recognition model to carry out named entity recognition;
052: and carrying out natural language understanding according to the result of the named entity recognition to complete voice interaction.
The processor is used for carrying out named entity recognition by taking the spliced characteristic of the user voice request as the input of the named entity recognition model, and carrying out natural language understanding according to the result of the named entity recognition so as to complete voice interaction.
Specifically, after the splicing feature of the user voice request is obtained, it can be used as the input of the named entity recognition model to recognize the named entity.
Given a fixed number of training iterations of the natural language model, the correspondence between words and word embedding features is unique, and the word embedding features of words likewise correspond one-to-one with named entities. The splicing features can therefore enhance the ability of the natural language understanding model to identify named entities, and natural language understanding can be performed according to the recognition result of the named entities, finally completing the voice interaction.
Therefore, named entity recognition can be performed according to the splicing feature of the user voice request, which improves the named entity recognition capability, makes named entity recognition in the natural language understanding process more accurate, and improves the user's interactive experience.
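The recognition step can be illustrated with a deliberately simplified stand-in: the splice features are scored by an untrained linear layer over BIO-style tags. The tag set, the dimensions, and the linear scorer are all assumptions made for illustration; the text specifies only "a named entity recognition model", not its architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
tags = ["O", "B-POI", "I-POI"]               # assumed BIO-style tag set
feat_dim = 32                                # matches the splice width above
W = rng.normal(size=(feat_dim, len(tags)))   # untrained stand-in parameters

def recognize(request_feature):
    """Return one tag per character from its splice feature."""
    scores = request_feature @ W             # (n_chars, n_tags)
    return [tags[i] for i in scores.argmax(axis=1)]

features = rng.normal(size=(12, feat_dim))   # a 12-character request
print(recognize(features))                   # one tag per character
```

A trained model would replace the random W; the downstream natural language understanding step then consumes the tagged entity spans.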
The computer readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the methods described above.
In the description of the present specification, reference to the terms "above," "specifically," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.

Claims (9)

1. A method of voice interaction, comprising:
receiving a user voice request in a vehicle cabin;
initializing a vocabulary embedded representation of a preset vocabulary;
respectively determining the preset word list comprising each word in the user voice request to generate a word list sequence;
constructing word embedding of each word in the user voice request according to the word list embedding representation and the word list sequence to obtain word embedding characteristics of each word in the user voice request;
performing character embedding extraction on the user voice request to obtain character embedding features;
splicing the word embedding features and the character embedding features to obtain a splicing feature of the user voice request;
and carrying out named entity recognition according to the splicing characteristics of the user voice request so as to complete voice interaction according to the recognition result of the named entity.
2. The voice interaction method according to claim 1, wherein before the step of word embedding and extracting the user voice request to obtain word embedding characteristics, the voice interaction method comprises:
constructing a first type word list and a second type word list according to the service type, wherein the initial position of the first type word list is a preset word, the middle position of the second type word list is a preset word, and the preset word list comprises the first type word list and the second type word list.
3. The voice interaction method of claim 2, wherein the determining the preset vocabulary including each word in the user voice request to generate a vocabulary sequence comprises:
determining a first type vocabulary comprising each word in the user voice request, and generating a first vocabulary sequence according to the sequence number of the first type vocabulary;
and determining a second type vocabulary comprising each word in the user voice request, and generating a second vocabulary sequence according to the sequence number of the second type vocabulary.
4. The method of claim 3, wherein the constructing of the word embedding of each word in the user voice request according to the vocabulary embedded representation and the vocabulary sequence to obtain the word embedding features of each word in the user voice request comprises:
performing calculation processing on the vocabulary embedded representation corresponding to the elements in the first vocabulary sequence to obtain the first word embedding of each word in the user voice request;
performing calculation processing on the vocabulary embedded representation corresponding to the elements in the second vocabulary sequence to obtain the second word embedding of each word in the user voice request;
and splicing the first word embedding and the second word embedding of each word in the user voice request to obtain the word embedding features of the corresponding word in the user voice request.
5. The voice interaction method according to claim 1, wherein the word embedding extraction of the user voice request includes:
and extracting the character embedding characteristics of each character in the user voice request according to a character embedding word list.
6. The voice interaction method of claim 5, wherein the splicing of the word embedding feature and the character embedding feature to obtain the splicing feature of the user voice request comprises:
splicing the word embedding features and the character embedding features corresponding to each word in the user voice request to obtain the splicing feature of each word in the user voice request;
and splicing the splicing characteristics of each word in the user voice request in sequence to obtain the splicing characteristics of the user voice request.
7. The voice interaction method according to claim 1, wherein the performing named entity recognition according to the concatenation feature of the user voice request to complete voice interaction according to a result of the named entity recognition includes:
taking the spliced features of the user voice request as the input of a named entity recognition model to carry out named entity recognition;
and carrying out natural language understanding according to the result of the named entity recognition to complete voice interaction.
8. A vehicle comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, implements the method of any of claims 1-7.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by one or more processors, implements the method according to any of claims 1-7.
CN202310229334.5A 2023-03-09 2023-03-09 Voice interaction method, vehicle and computer readable storage medium Active CN115938365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310229334.5A CN115938365B (en) 2023-03-09 2023-03-09 Voice interaction method, vehicle and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115938365A CN115938365A (en) 2023-04-07
CN115938365B true CN115938365B (en) 2023-06-30

Family

ID=86652739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310229334.5A Active CN115938365B (en) 2023-03-09 2023-03-09 Voice interaction method, vehicle and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115938365B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109741732B (en) * 2018-08-30 2022-06-21 京东方科技集团股份有限公司 Named entity recognition method, named entity recognition device, equipment and medium
US10867132B2 (en) * 2019-03-29 2020-12-15 Microsoft Technology Licensing, Llc Ontology entity type detection from tokenized utterance
CN110570853A (en) * 2019-08-12 2019-12-13 阿里巴巴集团控股有限公司 Intention recognition method and device based on voice data
CN111783462B (en) * 2020-06-30 2023-07-04 大连民族大学 Chinese named entity recognition model and method based on double neural network fusion
CN112632999A (en) * 2020-12-18 2021-04-09 北京百度网讯科技有限公司 Named entity recognition model obtaining method, named entity recognition device and named entity recognition medium
CN112668337B (en) * 2020-12-23 2022-08-19 广州橙行智动汽车科技有限公司 Voice instruction classification method and device
CN113836929A (en) * 2021-09-28 2021-12-24 平安科技(深圳)有限公司 Named entity recognition method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant