WO2024078460A1 - Speech processing method, voice interaction method, server and storage medium - Google Patents

Info

Publication number
WO2024078460A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
zone
rejection
rejection mode
label
Prior art date
Application number
PCT/CN2023/123601
Other languages
English (en)
French (fr)
Inventor
韩传宇
李东恒
易晖
翁志伟
王天一
Original Assignee
广州小鹏汽车科技有限公司
Priority date
Filing date
Publication date
Application filed by 广州小鹏汽车科技有限公司
Publication of WO2024078460A1

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60R: VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R16/037: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R16/0373: Voice control

Definitions

  • the present application relates to the field of speech technology, and in particular to a speech processing method, a speech interaction method, a server and a computer-readable storage medium.
  • vehicles can support voice control services, such as voice control of window opening, etc.
  • users may speak from multiple sound zones in the car, and not all the voices are requests to the vehicle system. This requires the vehicle voice processor to reject useless information from all voices, extract voice requests for itself and respond.
  • the rejection processing of voice requests can usually only be applied to single-sound-zone scenarios.
  • the present application provides a speech processing method, a speech interaction method, a server and a computer-readable storage medium.
  • the speech processing method of the present application comprises:
  • the rejection mode of the corresponding voice zone is updated according to the user voice request and the dialogue voice zone information to determine the rejection mode of each voice zone.
  • the vehicle cabin is divided into multiple audio zones, and upon receiving a voice request, the rejection mode corresponding to each audio zone is confirmed according to the voice request and its audio zone information, thereby meeting the rejection requirements for multi-audio zone voice interaction in the vehicle cabin.
  • the rejection mode of each audio zone will be updated, so that in the multi-audio zone interaction scenario, the rejection accuracy of the voice request is higher and the user experience is better.
  • the determining, according to the wake-up sound zone information, an initial rejection mode of each sound zone in the plurality of sound zones in the vehicle cabin includes:
  • the initial rejection mode of each sound zone in the vehicle cabin except the wake-up sound zone is a second rejection mode, and the second rejection mode has a higher degree of rejection of voice requests than the first rejection mode.
  • the initial rejection mode of each sound zone can be confirmed according to the wake-up sound zone information.
  • the initial rejection mode of the wake-up sound zone is the first rejection mode, and the initial rejection mode of each non-wake-up sound zone is the second rejection mode, which has a higher degree of rejection.
  • the updating of the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information to determine the rejection mode of each sound zone includes:
  • when the rejection mode of the conversation voice zone is the first rejection mode and the user voice request is a non-vehicle interaction voice request, the rejection mode of the conversation voice zone is updated to the second rejection mode.
  • that is, if the rejection mode of a certain dialogue voice zone is the first rejection mode and the voice request from that voice zone is a non-vehicle interaction voice request, the rejection mode of the voice zone is updated to the second rejection mode.
  • the updating of the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information to determine the rejection mode of each sound zone includes:
  • the rejection mode of the corresponding voice zone is updated to the second rejection mode.
  • if the rejection mode of a certain dialogue voice zone is the first rejection mode but the voice zone does not receive a valid voice request within the preset time, it can be considered that the voice zone has no real interaction intention for the time being, and the rejection mode of the voice zone is updated to the second rejection mode.
  • the updating of the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information to determine the rejection mode of each sound zone includes:
  • when it is confirmed according to the conversation voice zone information that the rejection mode of the conversation voice zone is the second rejection mode, and it is determined according to the user voice request that a valid voice request is executed in the conversation voice zone within a second preset time period, the rejection mode of the conversation voice zone is updated to the first rejection mode.
  • that is, if the rejection mode of a certain dialogue voice zone is the second rejection mode but the voice zone receives a valid voice request within the preset time length, it can be considered that there is a real interaction intention in the voice zone, and the rejection mode of the voice zone can be updated to the first rejection mode, that is, a rejection mode with a lower degree of rejection.
  • the speech processing method comprises:
  • the method further comprises:
  • the voice request is processed according to the rejection mode of the dialogue voice zone, the speaking object label and the intention classification label to obtain a rejection result.
  • the user's voice request is annotated with the speaking object label and the intention classification label, and the rejection result of the voice request, that is, whether it is recalled as a clear request or filtered as noise, is determined in combination with the rejection mode of the sound zone where the voice request originates.
  • the processing of the voice request according to the rejection mode of the dialogue voice zone, the speaking object label and the intention classification label to obtain a rejection result includes:
  • when the rejection mode of the dialogue voice zone is the first rejection mode: if the speaking object label is a voice assistant label and the intention classification label is a first-level label or a second-level label, the rejection result obtained by processing the user voice request is a clear result; if the speaking object label is a non-voice assistant label and the intention classification label is a third-level label, the rejection result obtained by processing the user voice request is a noise result.
  • the intention classification label represents the effectiveness of the user voice request, wherein the effectiveness indicated by the first-level label is higher than that of the second-level label, which in turn is higher than that of the third-level label.
  • that is, in the first rejection mode, for a voice request whose speaking object label is a voice assistant class label and whose intention classification label is a first-level label or a second-level label, the rejection result is confirmed to be a clear result; for a voice request whose speaking object label is not a voice assistant class label and whose intention classification label is a third-level label, the rejection result is confirmed to be a noise result.
  • the step of processing the voice request according to the rejection mode, the speaking object label and the intention classification label to obtain a rejection result includes:
  • when the rejection mode of the dialogue voice zone is the second rejection mode: if the speaking object label is a voice assistant label and the intention classification label is a first-level label, the rejection result obtained by processing the user voice request is a clear result; if the speaking object label is a non-voice assistant label and the intention classification label is a second-level label or a third-level label, the rejection result obtained by processing the user voice request is a noise result.
  • that is, in the second rejection mode, for a voice request whose speaking object label is a voice assistant class label and whose intention classification label is a first-level label, the rejection result is confirmed to be a clear result; for a voice request whose speaking object label is not a voice assistant class label and whose intention classification label is a second-level label or a third-level label, the rejection result is confirmed to be a noise result.
  • compared with the first rejection mode, the second rejection mode rejects voice requests carrying second-level intention classification labels more strictly.
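  • as an illustrative sketch only (not part of the application), the two decision rules stated above can be written as one small Python function. The names `RejectionMode`, `Speaker`, `Result` and `rejection_result` are hypothetical, and intention levels are encoded as the integers 1 to 3. The text only states the outcome for some label combinations; for the remaining ones this sketch returns `None` instead of guessing.

```python
from enum import Enum
from typing import Optional


class RejectionMode(Enum):
    FIRST = 1   # lower degree of rejection
    SECOND = 2  # higher degree of rejection


class Speaker(Enum):
    ASSISTANT = "voice assistant class"
    NON_ASSISTANT = "non-voice assistant class"


class Result(Enum):
    CLEAR = "clear"  # request is recalled
    NOISE = "noise"  # request is filtered out


def rejection_result(mode: RejectionMode, speaker: Speaker, level: int) -> Optional[Result]:
    """Apply the stated rules; return None where the text gives no rule."""
    if level not in (1, 2, 3):
        raise ValueError("intention level must be 1, 2 or 3")
    # Levels that are recalled / rejected depend on the zone's rejection mode.
    clear_levels = {1, 2} if mode is RejectionMode.FIRST else {1}
    noise_levels = {3} if mode is RejectionMode.FIRST else {2, 3}
    if speaker is Speaker.ASSISTANT and level in clear_levels:
        return Result.CLEAR
    if speaker is Speaker.NON_ASSISTANT and level in noise_levels:
        return Result.NOISE
    return None  # combination not specified in the text (assumption: left undecided)
```

  • under this sketch, a second-level request addressed to the voice assistant is a clear result in the first rejection mode but is no longer recalled by the stated rules in the second rejection mode, which mirrors the stricter handling of second-level labels.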
  • the voice interaction method of the present application includes:
  • the rejection result is sent to the vehicle to complete the voice interaction.
  • the vehicle cabin is divided into multiple audio zones, and upon receiving a voice request, the rejection mode corresponding to each audio zone is confirmed according to the voice request and its audio zone information, thereby meeting the rejection requirements for multi-audio zone voice interaction in the vehicle cabin.
  • the rejection mode of each audio zone will be updated, so that in the multi-audio zone interaction scenario, the rejection accuracy of the voice request is higher and the user experience is better.
  • the server of the present application includes a processor and a memory, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the above method is implemented.
  • the computer-readable storage medium of the present application stores a computer program, and when the computer program is executed by one or more processors, the above method is implemented.
  • FIG. 1 is a flow chart of the speech processing method of the present application.
  • FIG. 2 is a schematic diagram of a vehicle cockpit of the present application.
  • FIG. 3 is a first state diagram of the speech processing method of the present application.
  • FIG. 4 is a second state diagram of the speech processing method of the present application.
  • FIG. 5 is a third state diagram of the speech processing method of the present application.
  • FIG. 6 is a fourth state diagram of the speech processing method of the present application.
  • FIG. 7 is a fifth state diagram of the speech processing method of the present application.
  • FIG. 8 is a sixth state diagram of the speech processing method of the present application.
  • FIG. 9 is a second flow chart of the speech processing method of the present application.
  • FIG. 10 is a seventh state diagram of the speech processing method of the present application.
  • FIG. 11 is a state diagram of the speech processing method of the present application.
  • FIG. 12 is a flow chart of the voice interaction method of the present application.
  • the present application provides a speech processing method, including:
  • the present application also provides a server, which includes a memory and a processor.
  • the speech processing method of the present application can be implemented by the server of the present application.
  • a computer program is stored in the memory, and the processor is used to receive the wake-up sound zone information forwarded by the vehicle for the user to wake up the vehicle voice function in the vehicle cabin, and to determine the initial rejection mode of each of the multiple sound zones in the vehicle cabin according to the wake-up sound zone information, and to receive the user voice request forwarded by the vehicle after the vehicle voice function is awakened and the dialogue sound zone information confirmed according to the user voice request, and to update the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information to determine the rejection mode of each sound zone.
  • the voice assistant of the vehicle-mounted system provides many conveniences for users in the cockpit, and users can control the software or vehicle components in the cockpit through voice interaction.
  • the voice assistant can support continuous dialogue, that is, after one wake-up, the user and the voice assistant can have multiple rounds of dialogue similar to natural language communication until the end of the dialogue, without having to perform a wake-up operation every time they interact with the voice assistant.
  • some related technologies only grant the main driver the authority to conduct voice interaction, that is, only the main driver can conduct voice interaction in the cockpit, and users in other seats who want to use related functions can only relay their wishes through the main driver. However, this may distract the main driver and affect driving safety.
  • the voice assistant may be faced with receiving conversations between different users and the voice assistant, conversations between different users, etc. How to process the received voice requests as accurately as possible without limiting the interaction environment, and determine which voice requests need to be fed back so as to better serve users will determine the user experience of voice interaction.
  • waking up the vehicle voice function means waking up the vehicle's voice assistant.
  • the wake-up voice request can be a wake-up word set by the manufacturer or customized by the user.
  • after the voice assistant is woken up, the user in the cabin can have multiple consecutive conversations with the voice assistant. The conversation ends when the number of rounds reaches the set threshold or when no voice request from the user is received within the predetermined time.
  • the cockpit is divided into different sound zones according to the areas where the user may make sounds. Please refer to FIG. 2.
  • the vehicle cockpit can be divided into five sound zones including the main driver's sound zone 101, the co-driver's sound zone 102, the left side of the rear row, i.e., the left rear sound zone 103, the middle of the rear row, i.e., the middle sound zone 104, and the right side of the rear row, i.e., the right rear sound zone 105.
  • Multiple voice pickup devices can be provided in the cockpit, so as to determine the sound zone position information of the user who made the voice request according to the acquired state information of the voice request.
  • the wake-up audio zone is the audio zone where the user who issued the wake-up voice request is located. For example, if the driver wakes up the voice assistant, then the wake-up audio zone is the driver's audio zone.
  • the wake-up audio zone information is the audio zone location information corresponding to the wake-up audio zone.
  • the conversation audio zone is the audio zone in which a user who is performing voice interaction with the voice assistant is located; in other words, an audio zone where a conversation is in progress is a conversation audio zone.
  • for example, if the main driver user and the co-driver user interact with the voice assistant successively, the voice requests they issue are successively obtained by the voice assistant, and the audio zones where the main driver user and the co-driver user are located both belong to the conversation audio zone.
  • the conversation audio zone and the awakening audio zone can be the same or different.
  • Rejection processing is used to identify during the interaction which of the user's voice requests are directed to the voice assistant, and recall and execute them, and which are not directed to the voice assistant and are filtered out as noise.
  • multiple rejection modes are provided, and each rejection mode recalls or rejects voice requests based on their annotations. Under different rejection modes, the same voice request may yield different rejection results. The details are expanded below.
  • a state machine is introduced to record the rejection mode of each sound zone during the voice interaction process. The state machine is continuously updated according to the received sound zone information and the user's voice requests.
  • the user's voice request has a certain randomness.
  • the rejection mode of each sound zone needs to be updated with the progress of the voice interaction, so as to ensure that every voice request with a clear interaction intention with the voice assistant can be accurately recognized, and other interactions not with the voice assistant can be accurately rejected.
  • the vehicle cabin is divided into multiple audio zones, and upon receiving a voice request, the rejection mode corresponding to each audio zone is confirmed according to the voice request and its audio zone information, thereby meeting the rejection requirements for multi-audio zone voice interaction in the vehicle cabin.
  • the rejection mode of each audio zone will be updated, so that in the multi-audio zone interaction scenario, it has a higher accuracy of voice request rejection and a better user experience.
  • step 02 includes:
  • the processor is used to determine that the initial rejection mode of the wake-up sound zone in the vehicle cabin is the first rejection mode according to the wake-up sound zone information, and is used to determine that the initial rejection mode of each sound zone in the vehicle cabin except the wake-up sound zone is the second rejection mode.
  • two rejection modes with different degrees of rejection are provided, namely a first rejection mode and a second rejection mode, wherein the second rejection mode has a higher degree of rejection of voice requests than the first rejection mode.
  • when different rejection modes are adopted, the rejection results also differ. For example, the voice request "Will it rain tomorrow?" may be unclear in intent, somewhat ambiguous, and relatively non-standard in expression.
  • if the first rejection mode is adopted, this request can be recalled to confirm the intention to query the weather; if the second rejection mode is adopted, it will be directly rejected.
  • an initial rejection mode will be configured for each audio zone in the cabin, and subsequent rejection mode updates will be performed based on the initial rejection mode. It can be understood that, in general, users who wake up the voice assistant usually have a strong intention to interact. Therefore, the initial rejection mode of the wake-up audio zone is set to the first rejection mode, and the initial rejection mode of the other audio zones is set to the second rejection mode, to prevent other audio zones from interfering with the interaction of the wake-up audio zone.
  • for example, if the main driver wakes up the voice assistant, the rejection mode of the main driver's voice zone 101 will be set to the first rejection mode.
  • the rejection modes of the other voice zones in the cockpit, such as the co-driver voice zone 102, the left rear voice zone 103, the middle voice zone 104, and the right rear voice zone 105 in the previous example, will be set to the second rejection mode.
  • the initial rejection mode of each sound zone can be confirmed according to the wake-up sound zone information.
  • the initial rejection mode of the wake-up sound zone is the first rejection mode, and the initial rejection mode of each non-wake-up sound zone is the second rejection mode, which has a higher degree of rejection.
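  • as an illustrative sketch only (not part of the application), the initial assignment described above could look as follows in Python. The zone names in `ZONES` are hypothetical identifiers loosely following the five sound zones of FIG. 2, and `initial_modes` is an assumed helper name.

```python
from enum import Enum


class RejectionMode(Enum):
    FIRST = 1   # lower degree of rejection
    SECOND = 2  # higher degree of rejection


# Hypothetical identifiers for the five sound zones of the cockpit.
ZONES = ("driver", "co_driver", "rear_left", "rear_middle", "rear_right")


def initial_modes(wake_zone, zones=ZONES):
    """Wake-up zone starts in FIRST mode; every other zone starts in SECOND mode."""
    if wake_zone not in zones:
        raise ValueError(f"unknown sound zone: {wake_zone}")
    return {
        zone: RejectionMode.FIRST if zone == wake_zone else RejectionMode.SECOND
        for zone in zones
    }


# Example: the main driver wakes up the voice assistant.
modes = initial_modes("driver")
```

  • the returned mapping can serve as the starting state of the per-zone record that later updates operate on.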
  • step 04 includes:
  • the rejection mode of the dialogue voice zone is updated to the second rejection mode.
  • the processor is used for updating the rejection mode of the dialogue voice zone to the second rejection mode when it is confirmed that the rejection mode of the dialogue voice zone is the first rejection mode according to the dialogue voice zone information and the user voice request is a non-vehicle interaction voice request.
  • the rejection mode of the dialogue voice zone can be confirmed according to the dialogue voice zone information. For example, if the dialogue voice zone is a wake-up voice zone, then the rejection mode of the dialogue voice zone is confirmed to be the first rejection mode.
  • the user voice request may be a non-vehicle interaction voice request. For example, if the acquired voice request is "Hello, who are you?", it can be inferred that the user is making a phone call.
  • if the acquired voice request is "I don't know", it can be inferred that the user is currently chatting.
  • voice requests like these can be considered non-vehicle interaction voice requests. In this case, it can be considered that the user in the voice zone has no real intention of interaction for the time being, and the rejection mode of the voice zone can be updated to the second rejection mode to perform a higher degree of rejection.
  • for example, the main driver user wakes up the vehicle voice assistant, and the main driver voice zone 101 is set to the first rejection mode.
  • suppose the voice request subsequently obtained from the main driver voice zone 101 is a non-vehicle interaction voice request.
  • the rejection mode of the main driver voice zone 101 is then updated to the second rejection mode, that is, it is determined that the main driver voice zone 101 has no clear interaction intention for the time being, and the rejection degree is increased so that voice requests with low interaction intention are not mistakenly recalled.
  • in summary, if the rejection mode of a certain dialogue voice zone is the first rejection mode and the voice request from that voice zone is a non-vehicle interaction voice request, the rejection mode of the voice zone is updated to the second rejection mode.
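  • as an illustrative sketch only (not part of the application), this transition can be expressed as a small pure function in Python. How a request is recognised as a non-vehicle interaction request is outside the scope of the sketch, so it is passed in as a boolean; `update_on_request` and the zone names are hypothetical.

```python
from enum import Enum


class RejectionMode(Enum):
    FIRST = 1   # lower degree of rejection
    SECOND = 2  # higher degree of rejection


def update_on_request(modes, dialogue_zone, is_vehicle_interaction):
    """Return a new zone->mode mapping after one voice request.

    A zone in FIRST mode is moved to SECOND mode when the request coming
    from it is a non-vehicle interaction request; other zones are untouched.
    """
    updated = dict(modes)  # do not mutate the caller's state
    if updated[dialogue_zone] is RejectionMode.FIRST and not is_vehicle_interaction:
        updated[dialogue_zone] = RejectionMode.SECOND
    return updated


# Example: the driver's zone is in FIRST mode and the driver says "I don't know".
state = {"driver": RejectionMode.FIRST, "co_driver": RejectionMode.SECOND}
state = update_on_request(state, "driver", is_vehicle_interaction=False)
```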
  • step 04 includes:
  • the rejection mode of the corresponding audio zone is updated to the second rejection mode.
  • the processor is used to update the rejection mode of the corresponding sound zone to the second rejection mode when a sound zone in the vehicle cabin whose rejection mode is the first rejection mode fails to obtain a valid voice request within a first preset time period.
  • the rejection mode of the dialogue voice zone can be confirmed based on the dialogue voice zone information. For example, if the dialogue voice zone is a wake-up voice zone, then the rejection mode of the dialogue voice zone is confirmed to be the first rejection mode. Suppose, however, that the voice zone does not obtain a valid voice request within a period of time; for example, the rejection mode of a certain voice zone is the first rejection mode, but no valid voice request is obtained within 20 seconds. In this case, it can be considered that the user of the voice zone has no real interaction intention for the time being, and the rejection mode of the voice zone can be updated to the second rejection mode for a higher degree of rejection. Here, failure to obtain a valid voice request may mean either that no voice request is obtained, or that a voice request is obtained but is not related to vehicle interaction.
  • the first preset duration is a time limit for the interval of valid voice requests issued by the user, and can be set to an appropriate value according to actual conditions, such as 20s, 30s, 50s, 1min, etc. It can be understood that if the first preset duration is too short, the rejection mode of the voice zone will be frequently switched, while if it is too long, the false recall rate of the voice request may be high.
  • the first preset duration can be set to 20 seconds.
  • suppose the main driver voice zone 101 is set to the first rejection mode. If no valid voice request is received from the main driver voice zone 101 within the first preset duration, that is, within 20 seconds either no voice request is received or no voice request related to vehicle interaction is received, then the rejection mode of the main driver voice zone 101 is updated to the second rejection mode. In other words, it is determined that the main driver voice zone 101 has no clear interaction intention for the time being, and the rejection degree is increased so that voice requests with low interaction intention are not mistakenly recalled.
  • conversely, if a valid voice request is received within the first preset duration, the first rejection mode of the sound zone will continue to be maintained.
  • in summary, if the rejection mode of a certain dialogue voice zone is the first rejection mode but the voice zone does not receive a valid voice request within the preset time, it can be considered that the voice zone has no real interaction intention for the time being, and the rejection mode of the voice zone is updated to the second rejection mode.
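  • as an illustrative sketch only (not part of the application), the timeout rule can be checked periodically as shown below in Python. `demote_idle_zones` is a hypothetical helper; `last_valid_at` is assumed to hold, per zone, the time in seconds of the most recent valid voice request (or the time the zone entered the first rejection mode if there has been none), and treating exactly 20 seconds as expired is also an assumption.

```python
from enum import Enum


class RejectionMode(Enum):
    FIRST = 1   # lower degree of rejection
    SECOND = 2  # higher degree of rejection


FIRST_PRESET_S = 20.0  # example value of the first preset duration


def demote_idle_zones(modes, last_valid_at, now, timeout=FIRST_PRESET_S):
    """Move FIRST-mode zones to SECOND mode once they have been idle for `timeout` seconds."""
    updated = dict(modes)
    for zone, mode in modes.items():
        idle_for = now - last_valid_at[zone]
        if mode is RejectionMode.FIRST and idle_for >= timeout:
            updated[zone] = RejectionMode.SECOND
    return updated


# Example: the driver's last valid request was at t = 0 s.
modes = {"driver": RejectionMode.FIRST, "co_driver": RejectionMode.SECOND}
last_valid_at = {"driver": 0.0, "co_driver": 0.0}
modes_at_10s = demote_idle_zones(modes, last_valid_at, now=10.0)  # driver stays FIRST
modes_at_20s = demote_idle_zones(modes, last_valid_at, now=20.0)  # driver becomes SECOND
```

  • the trade-off noted above applies directly to `timeout`: a small value makes zones flip modes often, while a large value keeps a lenient mode active longer and may raise the false recall rate.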
  • step 04 includes:
  • when it is confirmed according to the dialogue voice zone information that the rejection mode of the dialogue voice zone is the second rejection mode, and it is determined according to the user voice request that a valid voice request is executed in the dialogue voice zone within a second preset time period, the rejection mode of the dialogue voice zone is updated to the first rejection mode.
  • the processor is used to update the rejection mode of the conversation voice zone to the first rejection mode when it confirms, according to the conversation voice zone information, that the rejection mode of the conversation voice zone is the second rejection mode and determines, according to the user voice request, that a valid voice request is executed in the conversation voice zone within a second preset time period.
  • the effective voice request is executed, that is, the effective voice request is obtained, and the corresponding vehicle execution instruction is generated.
  • the rejection mode of the dialogue voice zone can be confirmed according to the dialogue voice zone information. For example, if the dialogue voice zone is a non-wake-up voice zone, then it can be confirmed that the initial rejection mode of the dialogue voice zone is the second rejection mode. Suppose the voice zone then receives a valid voice request, that is, a voice request related to vehicle interaction, within a period of time; for example, the rejection mode of a certain voice zone is the second rejection mode, and a valid voice request "open the window" is obtained within the second preset time period. In this case, it can be considered that the user of the voice zone has a real intention to interact, and the rejection mode of the voice zone can be updated to the first rejection mode to perform a lower degree of rejection.
  • the second preset duration is similar to the first preset duration: it is a limit on the interval within which the user issues a valid voice request, and can be set to an appropriate value according to actual conditions, such as 20s, 30s, 50s, 1min, etc. It can be understood that if the second preset duration is too short, the rejection mode of the sound zone will be frequently switched, while if it is too long, the false recall rate of the voice request may be high.
  • for example, the second preset duration can be set to 20 seconds.
  • suppose the main driving sound zone 101 is the wake-up sound zone, and the left rear sound zone 103 is a non-wake-up sound zone whose initial rejection mode is the second rejection mode.
  • if a valid voice request from the left rear sound zone 103 is obtained and executed within 20 seconds, the rejection mode of the left rear sound zone 103 is updated to the first rejection mode with a lower degree of rejection. That is, it is judged that the left rear sound zone 103 now has a clearer interaction intention, so the degree of rejection is reduced to prevent its voice requests from being mistakenly rejected.
  • in summary, if the rejection mode of a certain dialogue voice zone is the second rejection mode but the voice zone receives a valid voice request within the preset time length, it can be considered that there is a real interaction intention in the voice zone, and the rejection mode of the voice zone can be updated to the first rejection mode, that is, a rejection mode with a lower degree of rejection.
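  • as an illustrative sketch only (not part of the application), the upgrade from the second to the first rejection mode could be written as follows in Python. The text does not say from which moment the second preset duration is measured, so the `window_start` parameter is an assumption, as are the helper name `promote_on_valid_request` and the zone names.

```python
from enum import Enum


class RejectionMode(Enum):
    FIRST = 1   # lower degree of rejection
    SECOND = 2  # higher degree of rejection


SECOND_PRESET_S = 20.0  # example value of the second preset duration


def promote_on_valid_request(modes, zone, executed_at, window_start, window=SECOND_PRESET_S):
    """Move `zone` from SECOND to FIRST mode if a valid request was executed inside the window.

    `window_start` is the assumed reference time (in seconds) of the window.
    """
    updated = dict(modes)
    in_window = 0.0 <= executed_at - window_start <= window
    if updated[zone] is RejectionMode.SECOND and in_window:
        updated[zone] = RejectionMode.FIRST
    return updated


# Example: "open the window" from the left rear zone is executed 12 s into the window.
modes = {"driver": RejectionMode.FIRST, "rear_left": RejectionMode.SECOND}
modes = promote_on_valid_request(modes, "rear_left", executed_at=12.0, window_start=0.0)
```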
  • the speech processing method of the present application further includes:
  • the processor is used to exit the vehicle voice function if no user voice request is obtained within a third preset time period after the vehicle voice function is awakened.
  • each sound zone can be timed separately; once the last sound zone has also failed to obtain a user voice request within the third preset time length, the vehicle voice function is exited and waits for the next wake-up.
  • the third preset time is a limit for the time to exit the vehicle voice function, and an appropriate value can be set according to the actual situation, such as 100s, 120s, 150s, etc. It can be understood that if the third preset time is too short, the vehicle voice function will be frequently exited, affecting the user experience, while if it is set too long, there may be a long invalid working time, which increases the processing load.
  • the third preset time length can be set to 120 seconds. After the vehicle voice function is awakened, after multiple rounds of interaction, if each sound zone does not receive any voice request from the user within 120 seconds, the vehicle voice function is exited and waits for the next awakening.
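  • as an illustrative sketch only (not part of the application), the exit condition with separate per-zone timers reduces to a single check in Python. `should_exit` is a hypothetical helper, and `last_request_at` is assumed to map each sound zone to the time in seconds of its most recent user voice request (or the wake-up time if the zone has not spoken).

```python
THIRD_PRESET_S = 120.0  # example value of the third preset time length


def should_exit(last_request_at, now, timeout=THIRD_PRESET_S):
    """True once every sound zone has been silent for at least `timeout` seconds."""
    return all(now - t >= timeout for t in last_request_at.values())


# Example: the left rear zone last spoke at t = 50 s, the driver at t = 0 s.
last_request_at = {"driver": 0.0, "rear_left": 50.0}
keep_running = not should_exit(last_request_at, now=120.0)  # rear_left silent for only 70 s
exit_now = should_exit(last_request_at, now=170.0)          # every zone silent for >= 120 s
```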
  • the voice processing method further includes:
  • the voice request is processed according to the rejection mode of the dialogue voice zone, the speaking object label and the intention classification label to obtain the rejection result.
  • the processor is used to process the user voice request to determine the speaking object label and the intention classification label of the user voice request; and to process the voice request according to the rejection mode of the dialogue voice zone, the speaking object label and the intention classification label to obtain the rejection result.
  • the speaking object label is used to mark whether the voice request issued by the user is issued to the voice assistant, and may include voice assistant type labels and non-voice assistant type labels.
  • Intent classification labels are used to characterize the effectiveness of the user's voice request's intention to interact with the vehicle. They can be divided into first-level labels, second-level labels, and third-level labels from high to low effectiveness.
  • each voice request of the user can be annotated with these two labels, and then combined with the previously determined rejection mode of the corresponding sound zone to obtain the final rejection result, that is, recall or rejection.
  • in this way, the user's voice request is annotated with the speaking object label and the intention classification label, and the rejection result of the voice request, that is, whether it is recalled as a clear request or filtered as noise, is determined in combination with the rejection mode of the sound zone where the voice request originates.
  • Step 06 includes:
  • when the rejection mode of the dialogue voice zone is the first rejection mode: if the speaking object label is a voice assistant label and the intention classification label is a first-level label or a second-level label, the rejection result obtained by processing the user voice request is a clear result; if the speaking object label is a non-voice assistant label and the intention classification label is a third-level label, the rejection result obtained by processing the user voice request is a noise result.
  • the processor is used, when the rejection mode of the conversation voice zone is the first rejection mode, to process the user voice request to obtain a clear result if the speaking object label is a voice assistant type label and the intention classification label is a first-level label or a second-level label, and to process the user voice request to obtain a noise result if the speaking object label is a non-voice assistant type label and the intention classification label is a third-level label.
  • the speaking object label is used to mark whether the voice request issued by the user is issued to the voice assistant, and may include, for example, “explicitly said to the voice assistant", “most likely said to the voice assistant”, “explicitly not said to the voice assistant”, “most likely not said to the voice assistant”, “unable to determine", "no speaker”, etc., among which the voice assistant class labels include “explicitly said to the voice assistant” and “most likely said to the voice assistant", and the non-voice assistant class labels include “explicitly not said to the voice assistant", “most likely not said to the voice assistant", “unable to determine” and "no speaker”.
  • For example, for the voice request "open the car window", it can be considered that the voice request is "most likely said to the voice assistant", and its speaking object label can be confirmed to be a voice assistant type label.
  • For another example, for the voice request "Hahahaha", it can be considered that the voice request is "most likely not said to the voice assistant", and its speaking object label can be confirmed to be a non-voice assistant type label.
  • the intent grading label is used to characterize the effectiveness of the user voice request, which may include: “strong effectiveness”, “weak effectiveness”, “no intention” and “unable to judge”, etc.
  • the labels can be divided according to the effectiveness of the user voice request: the first-level label “strong effectiveness”, the second-level label “weak effectiveness” and the third-level label "no intention or unable to judge”.
  • Weakly effective voice requests usually have unclear intent, may contain ambiguity, have irregular sentence structures, and are less relevant to vehicle functions. For example: Will it rain tomorrow?, Why is the battery out?, What song is this?, Turn up the volume, Air conditioning, etc.
  • Unintentional voice requests usually have unclear intentions, may be ambiguous, have random sentence structures, and are weakly related to or irrelevant to vehicle functions. For example: Whatever, our family, I can get a loan if I want to buy this car, please get out quickly, open the window, change speed.
  • For example, for the voice request "open the car window", it can be considered that the voice request is "most likely said to the voice assistant", and its speaking object label can be confirmed to be a voice assistant type label. The voice request is also a strongly effective voice request, so its intent classification label can be confirmed to be a first-level label. If the sound zone is in the first rejection mode, the rejection result is a clear result.
  • For another example, for the voice request "Hahahaha", it can be considered that the voice request is "most likely not said to the voice assistant", and its speaking object label can be confirmed to be a non-voice assistant type label. The voice request is also an unintentional voice request, so its intent classification label can be confirmed to be a third-level label. If the sound zone is in the first rejection mode, the rejection result is a noise result.
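  • when the dialogue sound zone is in the first rejection mode, if the speaking object label is a voice assistant type label (the request is, or is most likely, addressed to the voice assistant) and the intent classification label is a first-level or second-level label (a strongly or weakly effective request), the rejection result obtained by processing the user's voice request is a clear result, that is, the voice request is recalled. Conversely, if the speaking object label is a non-voice assistant type label and the intent classification label is a third-level label, the rejection result is a noise result, that is, the voice request is rejected.
  • in the first rejection mode, for voice requests whose speaking object label is a voice assistant type label and whose intent classification label is a first-level or second-level label, the rejection result is confirmed to be a clear result; for voice requests whose speaking object label is a non-voice assistant type label and whose intent classification label is a third-level label, the rejection result is confirmed to be a noise result.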
  • Step 06 also includes:
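  • 063: when the rejection mode of the dialogue sound zone is the second rejection mode, if the speaking object label is a voice assistant type label and the intent classification label is a first-level label, the rejection result obtained by processing the user voice request is a clear result;
  • 064: if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level label or a third-level label, the rejection result obtained by processing the user voice request is a noise result.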
  • the processor is used to process the user voice request to obtain a rejection result as a clear result when the rejection mode for the conversation voice zone is the second rejection mode and if the speaking object label is a voice assistant type label and the intention classification label is a first-level label; and to process the user voice request to obtain a rejection result as a noise result when the speaking object label is a non-voice assistant type label and the intention classification label is a second-level label or a third-level label.
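  • when the dialogue sound zone is in the second rejection mode, if the speaking object label is a voice assistant type label (the request is, or is most likely, addressed to the voice assistant) and the intent classification label is a first-level label (a strongly effective request), the rejection result obtained by processing the user's voice request is a clear result, that is, the voice request is recalled. Conversely, if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level label or a third-level label, the rejection result is a noise result, that is, the voice request is rejected.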
  • For example, for the voice request "open the car window", it can be considered that the voice request is "most likely said to the voice assistant", and its speaking object label can be confirmed to be a voice assistant type label. The voice request is also a strongly effective voice request, so its intent classification label can be confirmed to be a first-level label. If the sound zone is in the second rejection mode, the rejection result is a clear result.
  • For another example, for the voice request "Hahahaha", it can be considered that the voice request is "most likely not said to the voice assistant", and its speaking object label can be confirmed to be a non-voice assistant type label. The voice request is also an unintentional voice request, so its intent classification label can be confirmed to be a third-level label. If the sound zone is in the second rejection mode, the rejection result is a noise result.
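  • in the second rejection mode, for voice requests whose speaking object label is a voice assistant type label and whose intent classification label is a first-level label, the rejection result is confirmed to be a clear result; for voice requests whose speaking object label is a non-voice assistant type label and whose intent classification label is a second-level or third-level label, the rejection result is confirmed to be a noise result.
  • compared with the first rejection mode, the second rejection mode is stricter in rejecting voice requests whose intent classification label is a second-level label.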
  • Example 1 Please refer to Table 1.
  • the user of the main driving sound zone 101 wakes up the vehicle voice function.
  • the main driving sound zone 101 is confirmed as the wake-up sound zone, and the initial rejection mode is the first rejection mode.
  • the other sound zones are non-wake-up sound zones, and the initial rejection mode is the second rejection mode.
  • the user of the main driving sound zone 101 issued a voice request of "turn on the air conditioner".
  • the speaking object label of the voice request is the voice assistant class, and the intention classification label is the first-level label, and a clear rejection result is obtained.
  • the user of the main driving sound zone 101 issued a voice request of "20 degrees 3rd gear wind", the speaking object label of the voice request is the voice assistant class, and the intention classification label is the first-level label, and a clear rejection result is obtained.
  • the user of the left rear sound zone 103 issued a voice request of "a little low”, the speaking object label of the voice request is the non-voice assistant class, and the intention classification label is the second-level label, and a noise rejection result is obtained.
  • the user in the left rear audio zone 103 issues voice requests "the vehicle temperature should be higher” and "a little higher”, and the speaking object labels are both voice assistant types, and the intention classification label is a first-level label. Since there is a valid voice request executed within the preset time, the rejection mode of the left rear audio zone 103 will be updated to the first rejection mode, and a clear rejection result will be obtained.
  • Example 2 Please refer to Table 2.
  • the user of the left rear audio zone 103 wakes up the vehicle voice function.
  • the left rear audio zone 103 is confirmed as the wake-up audio zone, and the initial rejection mode is the first rejection mode.
  • the other audio zones are non-wake-up audio zones, and the initial rejection mode is the second rejection mode.
  • the user of the left rear audio zone 103 sends a voice request of "How is the weather today?"
  • the speaker object label of the voice request is the voice assistant class, and the intent classification label is the first-level label, and a clear rejection result is obtained.
  • the user of the left rear audio zone 103 sends a voice request of "What about tomorrow?"; the speaking object label of the voice request is the voice assistant class, the intent classification label is the first-level label, and a clear rejection result is obtained.
  • the user in the left rear audio zone 103 and the user in the right rear audio zone start chatting, and the user in the left rear audio zone 103 issues a voice request "The weather is good, why don't we go hiking tomorrow?" Since there is a valid instruction executed in the left rear audio zone 103 within the preset time, the rejection mode of the left rear audio zone 103 remains in the first rejection mode, and the speaking object label of the voice request is a non-voice assistant class, and the intention classification label is a third-level label, and a noise rejection result is obtained.
  • the user in the right rear audio zone 105 issues a voice request "Sure", the speaking object label of the voice request is a non-voice assistant class, the intention classification label is a third-level label, and a noise rejection result is obtained.
  • the user in the left rear audio zone 103 issues a voice request "Do you want to go to the Badaling Great Wall?"
  • the speaking object label of the voice request is a non-voice assistant class, the intention classification label is a third-level label, and a noise rejection result is obtained.
  • the user in the right rear audio zone 105 issues a voice request "See how long it will take to get there", the speaking object label of the voice request is a non-voice assistant class, the intention classification label is a third-level label, and a noise rejection result is obtained.
  • the user in the left rear audio zone 103 sends a voice request "Help me navigate to the Badaling Great Wall". Since a valid instruction is executed in the left rear audio zone 103 within the preset time, the rejection mode of the left rear audio zone 103 remains in the first rejection mode, and the intention classification label of the voice request is determined to be the first level label, and a clear rejection result is obtained.
  • Example 3 Please refer to Table 3. After the user in the main driver's voice zone 101 wakes up the vehicle voice function, the main driver's voice zone 101 is confirmed as the wake-up voice zone, and the initial rejection mode is the first rejection mode. The other voice zones are non-wake-up voice zones, and the initial rejection mode is the second rejection mode. At this time, the user in the main driver's voice zone 101 starts to make a call and issues voice requests such as "Hello, hello”, “I'm going to work now”, “I'm on the way and haven't arrived yet". The speaking object labels of these voice requests are all non-voice assistant categories, and the intent classification label is determined to be a third-level label, and a noise rejection result is obtained.
  • the user in the co-pilot voice zone 102 issues a voice request "Turn down the volume a little", and the rejection mode of the co-pilot voice zone 102 is updated to the first rejection mode.
  • the speaking object label of the voice request is a voice assistant category, and the intent classification label is determined to be a first-level label, and a clear rejection result is obtained.
  • the left rear audio zone 103 sends a voice request "Turn off the music", and the rejection mode of the left rear audio zone 103 is updated to the first rejection mode.
  • the speaker label of the voice request is the voice assistant class, and the intent classification label is determined to be the first level label, and a clear rejection result is obtained.
  • the present application also provides a voice interaction method, including:
  • the voice interaction method of the present application can be implemented by the server of the present application, and the server includes a memory and a processor.
  • a computer program is stored in the memory, and the processor is used to: receive the wake-up sound zone information, forwarded by the vehicle, of the user waking up the vehicle voice function in the vehicle cabin; determine the initial rejection mode of each of the multiple sound zones in the vehicle cabin according to the wake-up sound zone information; receive the user voice request forwarded by the vehicle after the vehicle voice function is awakened, together with the dialogue sound zone information confirmed according to the user voice request; update the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone; after determining the rejection mode of each sound zone, process the user voice request to obtain the speaking object label and the intent classification label; process the voice request according to the rejection mode, the speaking object label and the intent classification label to obtain the rejection result; and send the rejection result to the vehicle to complete the voice interaction.
  • the rejection result is sent to the vehicle, and the vehicle can execute the control instruction generated by the voice request or make no response to complete the voice interaction.
  • the vehicle cabin is divided into multiple sound zones, and upon receiving a voice request, the rejection mode corresponding to each sound zone is confirmed according to the voice request and its sound zone information, thereby meeting the rejection requirements for multi-zone voice interaction in the vehicle cabin.
  • the rejection mode of each audio zone will be updated, so that in the multi-audio zone interaction scenario, the rejection accuracy of the voice request is higher and the user experience is better.
  • the computer-readable storage medium of the present application stores a computer program, and when the computer program is executed by one or more processors, the above method is implemented.
  • Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, fragment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of a process. The scope of some embodiments of the present application includes additional implementations in which functions are not performed in the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the technical field to which the embodiments of the present application belong.

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)
  • Navigation (AREA)

Abstract

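A voice processing method, comprising: receiving wake-up sound zone information of a user waking up the vehicle voice function in the vehicle cabin; determining, according to the wake-up sound zone information, the initial rejection mode of each sound zone in the vehicle's multi-zone cabin; receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, together with dialogue sound zone information confirmed according to the user voice request; and updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone.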

Description

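Voice processing method, voice interaction method, server and storage medium
This application claims priority to Chinese patent application No. 202211255729.4, filed on October 13, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of voice technology, and in particular to a voice processing method, a voice interaction method, a server and a computer-readable storage medium.
Background
With the development of autonomous driving technology, vehicles can support voice control services, such as opening the windows by voice. In real driving scenarios, users may speak from multiple sound zones in the vehicle, and not everything they say is a request to the in-vehicle system. The in-vehicle voice processor therefore has to reject irrelevant speech, extract the voice requests addressed to it, and respond to them.
In the related art, rejection processing of voice requests usually works only in single-sound-zone scenarios: irrelevant voice requests are rejected by combining the current text information, automatic speech recognition, confidence-based speech features and the like. This cannot meet the needs of multi-sound-zone voice interaction in a vehicle.
Technical Problem
The present application provides a voice processing method, a voice interaction method, a server and a computer-readable storage medium.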
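Technical Solution
The voice processing method of the present application includes:
receiving wake-up sound zone information, forwarded by a vehicle, of a user waking up the vehicle voice function in the vehicle cabin;
determining, according to the wake-up sound zone information, the initial rejection mode of each of multiple sound zones in the vehicle cabin;
receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and dialogue sound zone information confirmed according to the user voice request;
updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone.
In this way, the vehicle cabin is divided into multiple sound zones and, for each received voice request, the rejection mode of each sound zone is confirmed according to the voice request and its sound zone information, which meets the rejection requirements of multi-zone voice interaction in the vehicle cabin. In addition, the rejection mode of each sound zone is updated as the voice interaction proceeds, so that voice requests are rejected more accurately in multi-zone interaction scenarios and the user experience is better.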
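Determining, according to the wake-up sound zone information, the initial rejection mode of each of the multiple sound zones in the vehicle cabin includes:
determining, according to the wake-up sound zone information, that the initial rejection mode of the wake-up sound zone in the vehicle cabin is a first rejection mode;
determining that the initial rejection mode of every sound zone in the vehicle cabin other than the wake-up sound zone is a second rejection mode, where the second rejection mode rejects voice requests to a higher degree than the first rejection mode.
In this way, the initial rejection mode of each sound zone can be confirmed from the wake-up sound zone information: the wake-up sound zone starts in the first rejection mode, and the non-wake-up sound zones start in the stricter second rejection mode.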
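Updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, includes:
if it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the first rejection mode and the user voice request is a non-vehicle-interaction voice request, updating the rejection mode of the dialogue sound zone to the second rejection mode.
In this way, if during the interaction a dialogue sound zone is in the first rejection mode and a voice request from that zone is a non-vehicle-interaction voice request, the zone can be considered to have no real interaction intent for the time being, and its rejection mode is updated to the second rejection mode.
Updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, includes:
if a sound zone of the vehicle cabin whose rejection mode is the first rejection mode obtains no valid voice request within a first preset duration, updating the rejection mode of that sound zone to the second rejection mode.
In this way, if during the interaction a dialogue sound zone is in the first rejection mode but receives no valid voice request within the preset duration, the zone can be considered to have no real interaction intent for the time being, and its rejection mode is updated to the second rejection mode.
Updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, includes:
when it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the second rejection mode, if it is determined from the user voice request that a valid voice request has been executed in the dialogue sound zone within a second preset duration, updating the rejection mode of the dialogue sound zone to the first rejection mode.
In this way, if during the interaction a dialogue sound zone is in the second rejection mode but receives a valid voice request within the preset duration, the zone can be considered to have a real interaction intent, and its rejection mode can be updated to the first rejection mode, that is, the mode with the lower degree of rejection.
The voice processing method includes:
exiting the vehicle voice function if no user voice request is obtained within a third preset duration after the vehicle voice function is woken up.
In this way, if no user in the cabin issues any voice request within the preset time, the vehicle voice function exits temporarily and waits for the next wake-up.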
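The method further includes:
processing the user voice request to determine the speaking object label and the intent classification label of the user voice request;
processing the voice request according to the rejection mode of the dialogue sound zone, the speaking object label and the intent classification label to obtain a rejection result.
In this way, the user voice request is calibrated with the speaking object label and the intent classification label, and the rejection result of the voice request, that is, clear and recallable or filtered as noise, is determined in combination with the rejection mode of the sound zone in which the voice request originated.
Processing the voice request according to the rejection mode of the dialogue sound zone, the speaking object label and the intent classification label to obtain a rejection result includes:
when the rejection mode of the dialogue sound zone is the first rejection mode, if the speaking object label is a voice assistant type label and the intent classification label is a first-level label or a second-level label, processing the user voice request to obtain a clear result as the rejection result;
if the speaking object label is a non-voice assistant type label and the intent classification label is a third-level label, processing the user voice request to obtain a noise result as the rejection result, where the intent classification label characterizes the degree of validity of the user voice request, the first-level label ranking higher than the second-level label and the second-level label ranking higher than the third-level label.
In this way, in the first rejection mode, a voice request whose speaking object label is a voice assistant type label and whose intent classification label is a first-level or second-level label is confirmed as a clear result, and a voice request with a non-voice assistant type label and a third-level intent classification label is confirmed as a noise result.
Processing the voice request according to the rejection mode, the speaking object label and the intent classification label to obtain a rejection result includes:
when the rejection mode of the dialogue sound zone is the second rejection mode, if the speaking object label is a voice assistant type label and the intent classification label is a first-level label, processing the user voice request to obtain a clear result as the rejection result;
if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level label or a third-level label, processing the user voice request to obtain a noise result as the rejection result.
In this way, in the second rejection mode, a voice request whose speaking object label is a voice assistant type label and whose intent classification label is a first-level label is confirmed as a clear result, and a voice request with a non-voice assistant type label and a second-level or third-level intent classification label is confirmed as a noise result. Compared with the first rejection mode, the second rejection mode is stricter in rejecting voice requests whose intent classification label is a second-level label.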
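The voice interaction method of the present application includes:
receiving wake-up sound zone information, forwarded by a vehicle, of a user waking up the vehicle voice function in the vehicle cabin;
determining, according to the wake-up sound zone information, the initial rejection mode of each of multiple sound zones in the vehicle cabin;
receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and dialogue sound zone information confirmed according to the user voice request;
updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone;
after determining the rejection mode of each sound zone, processing the user voice request to obtain a speaking object label and an intent classification label;
processing the voice request according to the rejection mode, the speaking object label and the intent classification label to obtain a rejection result;
sending the rejection result to the vehicle to complete the voice interaction.
In this way, the vehicle cabin is divided into multiple sound zones and, for each received voice request, the rejection mode of each sound zone is confirmed according to the voice request and its sound zone information, which meets the rejection requirements of multi-zone voice interaction in the vehicle cabin. In addition, the rejection mode of each sound zone is updated as the voice interaction proceeds, so that voice requests are rejected more accurately in multi-zone interaction scenarios and the user experience is better.
The server of the present application includes a processor and a memory; the memory stores a computer program which, when executed by the processor, implements the above method.
The computer-readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the above method.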
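Beneficial Effects
Additional aspects and advantages of the embodiments of the present application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the embodiments.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is the first schematic flowchart of the voice processing method of the present application;
Figure 2 is a schematic diagram of the vehicle cabin of the present application;
Figure 3 is the first state diagram of the voice processing method of the present application;
Figure 4 is the second state diagram of the voice processing method of the present application;
Figure 5 is the third state diagram of the voice processing method of the present application;
Figure 6 is the fourth state diagram of the voice processing method of the present application;
Figure 7 is the fifth state diagram of the voice processing method of the present application;
Figure 8 is the sixth state diagram of the voice processing method of the present application;
Figure 9 is the second schematic flowchart of the voice processing method of the present application;
Figure 10 is the seventh state diagram of the voice processing method of the present application;
Figure 11 is the eighth state diagram of the voice processing method of the present application;
Figure 12 is a schematic flowchart of the voice interaction method of the present application.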
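Embodiments of the Invention
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the embodiments of the present application and are not to be understood as limiting them.
Referring to Figure 1, the present application provides a voice processing method, including:
01: receiving wake-up sound zone information, forwarded by a vehicle, of a user waking up the vehicle voice function in the vehicle cabin;
02: determining, according to the wake-up sound zone information, the initial rejection mode of each of multiple sound zones in the vehicle cabin;
03: receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and dialogue sound zone information confirmed according to the user voice request;
04: updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone.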
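The present application also provides a server that includes a memory and a processor. The voice processing method of the present application can be implemented by this server. Specifically, the memory stores a computer program, and the processor is used to receive the wake-up sound zone information, forwarded by the vehicle, of the user waking up the vehicle voice function in the vehicle cabin; to determine, according to the wake-up sound zone information, the initial rejection mode of each of the multiple sound zones in the vehicle cabin; to receive the user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and the dialogue sound zone information confirmed according to the user voice request; and to update the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone.
Specifically, the voice assistant of the in-vehicle system offers many conveniences to users in the cabin, who can control software or vehicle components in the cabin through voice interaction. For convenience, the voice assistant can support continuous dialogue: after a single wake-up, the user and the voice assistant can hold a multi-turn conversation, much like natural conversation, until the dialogue ends, without a wake-up operation before every interaction. To ensure driving safety, some related technologies grant voice interaction permission only to the driver, so that only the driver can interact by voice in the cabin, and users in other seats who want a function have to ask the driver to relay the request. This may distract the driver and thus affect driving safety. If permission is opened to all users in the cabin, so that every user can talk after the voice assistant is woken up, the voice assistant may receive both conversations between different users and the voice assistant and conversations among the users themselves, because the cabin is a shared environment. How to process the received voice requests as accurately as possible without restricting the interaction environment, and to decide which voice requests need a response so as to serve users better, determines the user's experience of voice interaction.
It can be understood that a multi-zone continuous dialogue scenario is one in which, after the voice assistant is woken up, users at different positions in the cabin can jointly hold a multi-turn dialogue with the voice assistant. Multiple users may interact quite freely around the same topic; some of these interactions are with the voice assistant and some are between users, which is more complex than the single-sound-zone case.
Waking up the vehicle voice function means waking up the vehicle's voice assistant, and the wake-up voice request may be a wake-up word set by the manufacturer or customized by the user. After the voice assistant is woken up, users in the cabin can hold a continuous multi-turn dialogue with it. The dialogue ends when it reaches a set threshold of turns, when no user voice request is received within a predetermined time, or in similar situations.
The cabin is divided into different sound zones according to the areas from which users may speak. Referring to Figure 2 and taking a five-seat vehicle 100 as an example, the vehicle cabin can be divided into five sound zones: the main driving sound zone 101, the co-pilot sound zone 102, the rear left or left rear sound zone 103, the rear middle or middle sound zone 104, and the rear right or right rear sound zone 105. Multiple voice pickup devices can be arranged in the cabin, so that the sound zone in which the user who issued a voice request is located can be determined from the state information of the acquired voice request.
The wake-up sound zone is the sound zone in which the user who issued the wake-up voice request is located. For example, if the driver wakes up the voice assistant, the wake-up sound zone is the main driving sound zone. The wake-up sound zone information is the position information of the wake-up sound zone.
The dialogue sound zone is the sound zone, as acquired by the voice assistant, in which a user currently engaged in voice interaction is located; a sound zone in which a dialogue is taking place is a dialogue sound zone. For example, if, after the voice assistant is woken up, the driver and the front passenger interact with the voice assistant one after the other, their voice requests are acquired by the voice assistant in turn, and the sound zones of both users are dialogue sound zones. The dialogue sound zone and the wake-up sound zone may be the same or different.
Rejection processing is used during the interaction to identify which of the user's voice requests are addressed to the voice assistant, so that they are recalled and executed, and which are not, so that they are filtered out as noise.
The present application provides multiple rejection modes. Different rejection modes recall or reject a voice request based on the labels assigned to it, and the same voice request may obtain different rejection results in different rejection modes. Details are given below.
The present application introduces a state machine that records the rejection mode of each sound zone during the voice interaction and is continuously updated according to the received sound zone information and the user's voice requests. In real driving scenarios, users' voice requests are somewhat random. After the voice assistant is woken up, the rejection mode of each sound zone needs to follow the progress of the voice interaction, so that every voice request with a clear intent to interact with the voice assistant is accurately recognized, while other speech that is not an interaction with the voice assistant is accurately rejected.
In summary, in the present application the vehicle cabin is divided into multiple sound zones and, for each received voice request, the rejection mode of each sound zone is confirmed according to the voice request and its sound zone information, which meets the rejection requirements of multi-zone voice interaction in the vehicle cabin. In addition, the rejection mode of each sound zone is updated as the voice interaction proceeds, so that voice requests are rejected more accurately in multi-zone interaction scenarios and the user experience is better.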
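Referring to Figures 3 and 4, step 02 includes:
021: determining, according to the wake-up sound zone information, that the initial rejection mode of the wake-up sound zone in the vehicle cabin is the first rejection mode;
022: determining that the initial rejection mode of every sound zone in the vehicle cabin other than the wake-up sound zone is the second rejection mode.
The processor is used to determine, according to the wake-up sound zone information, that the initial rejection mode of the wake-up sound zone in the vehicle cabin is the first rejection mode, and to determine that the initial rejection mode of every sound zone other than the wake-up sound zone is the second rejection mode.
Specifically, the present application provides two rejection modes with different degrees of rejection, the first rejection mode and the second rejection mode, where the second rejection mode rejects voice requests to a higher degree than the first. The same voice request may obtain different rejection results under different rejection modes. For example, the voice request "Will it rain tomorrow" may not be clear enough in intent, may be somewhat ambiguous, and is not phrased in a standard way. In the first rejection mode it can be recalled and the intent of querying the weather confirmed, whereas in the second rejection mode it is rejected directly.
During the interaction, after the voice assistant is woken up, an initial rejection mode is configured for each sound zone in the cabin, and subsequent rejection mode updates start from this initial mode. In general, the user who wakes up the voice assistant has a strong intent to interact. The initial rejection mode of the wake-up sound zone is therefore set to the first rejection mode, and the initial rejection mode of the other sound zones is set to the second rejection mode, to prevent the other sound zones from interfering with the interaction in the wake-up sound zone.
In one example, if the vehicle voice assistant is woken up by the user in the main driving sound zone 101, the main driving sound zone 101 is confirmed as the wake-up sound zone and its rejection mode is set to the first rejection mode. The other sound zones in the cabin, namely the co-pilot sound zone 102, the left rear sound zone 103, the middle sound zone 104 and the right rear sound zone 105 of the earlier example, are set to the second rejection mode.
In this way, the initial rejection mode of each sound zone can be confirmed from the wake-up sound zone information: the wake-up sound zone starts in the first rejection mode, and the non-wake-up sound zones start in the stricter second rejection mode.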
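The initial assignment in steps 021 and 022 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `Zone` and `RejectionMode` enums and the function `initial_rejection_modes` are hypothetical names, and the zone values simply reuse the reference numerals 101 to 105 of Figure 2.

```python
from enum import Enum


class Zone(Enum):
    """Cabin sound zones; values reuse the reference numerals of Figure 2."""
    DRIVER = 101
    FRONT_PASSENGER = 102
    REAR_LEFT = 103
    REAR_MIDDLE = 104
    REAR_RIGHT = 105


class RejectionMode(Enum):
    FIRST = "first"    # lower degree of rejection
    SECOND = "second"  # higher degree of rejection


def initial_rejection_modes(wake_zone: Zone) -> dict:
    """Steps 021/022: wake-up zone gets the first mode, every other zone the second."""
    return {
        zone: RejectionMode.FIRST if zone is wake_zone else RejectionMode.SECOND
        for zone in Zone
    }


# Example: the driver wakes up the assistant.
modes = initial_rejection_modes(Zone.DRIVER)
```

Keeping the modes in a dictionary keyed by zone gives later update steps a single record to modify, which is one simple way to play the role of the state machine mentioned above.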
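Referring to Figures 3 and 5, step 04 includes:
041: if it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the first rejection mode and the user voice request is a non-vehicle-interaction voice request, updating the rejection mode of the dialogue sound zone to the second rejection mode.
The processor is used to update the rejection mode of the dialogue sound zone to the second rejection mode when it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the first rejection mode and the user voice request is a non-vehicle-interaction voice request.
Specifically, during the interaction the rejection mode of the dialogue sound zone can be confirmed from the dialogue sound zone information. For example, if the dialogue sound zone is the wake-up sound zone, its rejection mode is confirmed to be the first rejection mode. The user voice request may nevertheless be a non-vehicle-interaction voice request. For example, if the acquired voice request is "Hello, who is this", it can be confirmed that the user is on the phone; if the acquired request is "I don't know", it can be confirmed that the user is chatting. Voice requests of this kind can be regarded as non-vehicle-interaction voice requests. In this case the user in that sound zone can be considered to have no real interaction intent for the time being, and the rejection mode of the zone can be updated to the second rejection mode, which rejects to a higher degree.
In one example, the driver wakes up the vehicle voice assistant and the main driving sound zone 101 is set to the first rejection mode. If a voice request acquired from the main driving sound zone 101 is confirmed to be a non-vehicle-interaction voice request, the rejection mode of the main driving sound zone 101 is updated to the second rejection mode. That is, the main driving sound zone 101 is judged to have no clear interaction intent for now, and the degree of rejection is raised to prevent voice requests with low interaction intent from slipping through.
In this way, if during the interaction a dialogue sound zone is in the first rejection mode and a voice request from that zone is a non-vehicle-interaction voice request, the zone can be considered to have no real interaction intent for the time being, and its rejection mode is updated to the second rejection mode.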
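Step 041 amounts to a single state transition. The sketch below is illustrative only: `RejectionMode` and `update_after_request` are hypothetical names, and the flag `is_vehicle_interaction` is assumed to come from an upstream classifier that the text does not describe.

```python
from enum import Enum


class RejectionMode(Enum):
    FIRST = "first"    # lower degree of rejection
    SECOND = "second"  # higher degree of rejection


def update_after_request(mode: RejectionMode, is_vehicle_interaction: bool) -> RejectionMode:
    """Step 041: a first-mode zone that produces a non-vehicle-interaction
    request (phone call, small talk) is moved to the second mode."""
    if mode is RejectionMode.FIRST and not is_vehicle_interaction:
        return RejectionMode.SECOND
    # All other cases are left unchanged by this particular rule.
    return mode


# "Hello, who is this" from the wake-up zone: the zone becomes stricter.
new_mode = update_after_request(RejectionMode.FIRST, is_vehicle_interaction=False)
```

How this rule interacts with the timer-based rules of the other embodiments is not spelled out in the text, so the function deliberately covers this one transition only.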
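Referring to Figures 3 and 6, step 04 includes:
042: if a sound zone of the vehicle cabin whose rejection mode is the first rejection mode obtains no valid voice request within a first preset duration, updating the rejection mode of that sound zone to the second rejection mode.
The processor is used to update the rejection mode of the corresponding sound zone to the second rejection mode when a sound zone of the vehicle cabin whose rejection mode is the first rejection mode obtains no valid voice request within the first preset duration.
Specifically, during the interaction the rejection mode of the dialogue sound zone can be confirmed from the dialogue sound zone information. For example, if the dialogue sound zone is the wake-up sound zone, its rejection mode is confirmed to be the first rejection mode. The zone may nevertheless obtain no valid voice request for a period of time, for example a zone in the first rejection mode that obtains no valid voice request within 20 s. In this case the user in that sound zone can be considered to have no real interaction intent for the time being, and the rejection mode of the zone can be updated to the second rejection mode, which rejects to a higher degree. Here, obtaining no valid voice request means either that no voice request was obtained at all, or that a voice request was obtained but was unrelated to vehicle interaction.
The first preset duration limits the interval between valid voice requests from the user and can be set to a suitable value according to the actual situation, for example 20 s, 30 s, 50 s or 1 min. If the first preset duration is too short, the rejection mode of the sound zone switches frequently; if it is too long, the false recall rate of voice requests may be high.
In one example, the first preset duration is set to 20 seconds. The driver wakes up the vehicle voice assistant and the main driving sound zone 101 is set to the first rejection mode. If no valid voice request is obtained from the main driving sound zone 101 within the first preset duration, that is, no voice request, or no voice request related to vehicle interaction, is received within 20 s, the rejection mode of the main driving sound zone 101 is updated to the second rejection mode. That is, the main driving sound zone 101 is judged to have no clear interaction intent for now, and the degree of rejection is raised to prevent voice requests with low interaction intent from slipping through.
If a valid instruction is obtained within the first preset duration, the zone remains in the first rejection mode.
In this way, if during the interaction a dialogue sound zone is in the first rejection mode but receives no valid voice request within the preset duration, the zone can be considered to have no real interaction intent for the time being, and its rejection mode is updated to the second rejection mode.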
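The timeout rule of step 042 can be sketched with explicit timestamps. The names `ZoneState` and `demote_if_idle` are hypothetical, the 20-second value is just the example value from the text, and treating exactly 20 s of silence as a timeout is an assumption, since the text does not say how the boundary is handled.

```python
from dataclasses import dataclass
from enum import Enum


class RejectionMode(Enum):
    FIRST = "first"
    SECOND = "second"


FIRST_PRESET_SECONDS = 20.0  # example value from the text; tune per deployment


@dataclass
class ZoneState:
    mode: RejectionMode
    # Time (seconds) of the zone's last valid request, or of the wake-up.
    last_valid_request_at: float


def demote_if_idle(state: ZoneState, now: float) -> RejectionMode:
    """Step 042: a first-mode zone with no valid request for the first
    preset duration falls back to the second mode."""
    idle = now - state.last_valid_request_at
    if state.mode is RejectionMode.FIRST and idle >= FIRST_PRESET_SECONDS:
        state.mode = RejectionMode.SECOND
    return state.mode


driver = ZoneState(RejectionMode.FIRST, last_valid_request_at=0.0)
demote_if_idle(driver, now=10.0)  # still within the window, stays in first mode
demote_if_idle(driver, now=25.0)  # idle too long, switches to second mode
```

Passing `now` in explicitly, rather than reading a clock inside the function, keeps the rule easy to test and replay.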
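Referring to Figures 3 and 7, step 04 includes:
043: when it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the second rejection mode, if it is determined from the user voice request that a valid voice request has been executed in the dialogue sound zone within a second preset duration, updating the rejection mode of the dialogue sound zone to the first rejection mode.
The processor is used to update the rejection mode of the dialogue sound zone to the first rejection mode when it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the second rejection mode and it is determined from the user voice request that a valid voice request has been executed in the dialogue sound zone within the second preset duration.
Specifically, a valid voice request being executed means that a valid voice request has been obtained and a corresponding vehicle execution instruction has been generated. During the interaction the rejection mode of the dialogue sound zone can be confirmed from the dialogue sound zone information. For example, if the dialogue sound zone is a non-wake-up sound zone, its initial rejection mode is confirmed to be the second rejection mode. The zone may then receive a valid voice request, that is, a voice request related to vehicle interaction, within a period of time. For example, a zone in the second rejection mode obtains the valid voice request "open the car window" within the second preset duration. In this case the user in that sound zone can be considered to have a real interaction intent, and the rejection mode of the zone can be updated to the first rejection mode, which rejects to a lower degree.
Like the first preset duration, the second preset duration limits the interval between valid voice requests from the user and can be set to a suitable value according to the actual situation, for example 20 s, 30 s, 50 s or 1 min. If the preset duration is too short, the rejection mode of the sound zone switches frequently; if it is too long, the false recall rate of voice requests may be high.
In one example, the second preset duration is set to 20 seconds, the main driving sound zone 101 is the wake-up sound zone, and the left rear sound zone 103 is a non-wake-up sound zone whose initial rejection mode is the second rejection mode. If a valid voice request from the left rear sound zone 103 is executed within 20 seconds, the rejection mode of the left rear sound zone 103 is updated to the less strict first rejection mode. That is, the left rear sound zone 103 is judged to have a fairly clear interaction intent, and the degree of rejection is lowered to prevent voice requests from being wrongly rejected.
It can be understood that if a sound zone in the second rejection mode obtains no valid instruction within the second preset duration, it remains in the second rejection mode.
In this way, if during the interaction a dialogue sound zone is in the second rejection mode but receives a valid voice request within the preset duration, the zone can be considered to have a real interaction intent, and its rejection mode can be updated to the first rejection mode, that is, the mode with the lower degree of rejection.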
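The promotion rule of step 043 can be sketched in the same style. `promote_if_recently_valid` is a hypothetical name, and reading "within the second preset duration" as "within the last 20 s before the current time" is an assumption, since the text does not state what the window is measured from.

```python
from enum import Enum
from typing import Iterable


class RejectionMode(Enum):
    FIRST = "first"
    SECOND = "second"


SECOND_PRESET_SECONDS = 20.0  # example value from the text


def promote_if_recently_valid(
    mode: RejectionMode,
    executed_valid_request_times: Iterable[float],
    now: float,
) -> RejectionMode:
    """Step 043: a second-mode zone in which a valid request was executed
    within the second preset duration moves to the first mode."""
    recently_valid = any(
        0.0 <= now - t <= SECOND_PRESET_SECONDS
        for t in executed_valid_request_times
    )
    if mode is RejectionMode.SECOND and recently_valid:
        return RejectionMode.FIRST
    return mode


# Rear-left zone: "open the car window" was executed 5 s ago.
rear_left = promote_if_recently_valid(RejectionMode.SECOND, [95.0], now=100.0)
```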
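Referring to Figures 3 and 8, the voice processing method of the present application further includes:
044: exiting the vehicle voice function if no user voice request is obtained within a third preset duration after the vehicle voice function is woken up.
The processor is used to exit the vehicle voice function if no user voice request is obtained within the third preset duration after the vehicle voice function is woken up.
Specifically, during the interaction the voice function exits when the time since the voice assistant last obtained a user voice request exceeds the third preset duration. Each sound zone may be timed separately; once the last sound zone has also obtained no user voice request within the third preset duration, the vehicle voice function exits and waits for the next wake-up.
The third preset duration limits the time before the vehicle voice function exits and can be set to a suitable value according to the actual situation, for example 100 s, 120 s or 150 s. If the third preset duration is too short, the vehicle voice function exits frequently, which hurts the user experience; if it is too long, there may be long periods of useless operation, which increases the processing load.
In one example, the third preset duration is set to 120 seconds. After the vehicle voice function is woken up and several rounds of interaction have taken place, if no sound zone obtains any further user voice request within 120 seconds, the vehicle voice function exits and waits for the next wake-up.
In this way, if no user in the cabin issues any voice request within the preset time, the vehicle voice function exits temporarily and waits for the next wake-up.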
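The exit condition of step 044 can be sketched as a check over per-zone timestamps. `should_exit_voice_function` and its parameters are hypothetical names; the 120-second value is the example value from the text, and exiting at exactly 120 s of silence is an assumed boundary choice.

```python
from typing import Mapping

THIRD_PRESET_SECONDS = 120.0  # example value from the text


def should_exit_voice_function(
    woke_at: float,
    last_request_at_by_zone: Mapping[str, float],
    now: float,
) -> bool:
    """Step 044: exit once every zone has been silent for the third preset
    duration, counted from the wake-up if a zone never spoke."""
    # The most recent activity across all zones decides; zones are timed
    # separately, and the last one to go quiet triggers the exit.
    latest_activity = max([woke_at, *last_request_at_by_zone.values()])
    return now - latest_activity >= THIRD_PRESET_SECONDS


# Woken at t=0, driver last spoke at t=50: exit is due from t=170 onwards.
exit_now = should_exit_voice_function(0.0, {"driver": 50.0}, now=170.0)
```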
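Referring to Figure 9, the voice processing method further includes:
05: processing the user voice request to determine the speaking object label and the intent classification label of the user voice request;
06: processing the voice request according to the rejection mode of the dialogue sound zone, the speaking object label and the intent classification label to obtain a rejection result.
The processor is used to process the user voice request to determine its speaking object label and intent classification label, and to process the voice request according to the rejection mode of the dialogue sound zone, the speaking object label and the intent classification label to obtain a rejection result.
Specifically, the speaking object label marks whether the voice request issued by the user is addressed to the voice assistant, and may include voice assistant type labels and non-voice assistant type labels.
The intent classification label characterizes how valid the user voice request is as an intent to interact with the vehicle. From high to low validity, it can be divided into first-level, second-level and third-level labels.
In the present application, each voice request of the user can be calibrated with these two labels and then combined with the previously determined rejection mode of the corresponding sound zone to obtain the final rejection result, that is, recall or rejection.
In this way, the user voice request is calibrated with the speaking object label and the intent classification label, and the rejection result of the voice request, that is, clear and recallable or filtered as noise, is determined in combination with the rejection mode of the sound zone in which the voice request originated.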
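Step 06 includes:
061: when the rejection mode of the dialogue sound zone is the first rejection mode, if the speaking object label is a voice assistant type label and the intent classification label is a first-level label or a second-level label, processing the user voice request to obtain a clear result as the rejection result;
062: if the speaking object label is a non-voice assistant type label and the intent classification label is a third-level label, processing the user voice request to obtain a noise result as the rejection result.
The processor is used, when the rejection mode of the dialogue sound zone is the first rejection mode, to process the user voice request to obtain a clear result as the rejection result if the speaking object label is a voice assistant type label and the intent classification label is a first-level or second-level label, and to process the user voice request to obtain a noise result as the rejection result if the speaking object label is a non-voice assistant type label and the intent classification label is a third-level label.
Specifically, referring to Figure 10, in the present application the speaking object label marks whether the voice request issued by the user is addressed to the voice assistant. It may include, for example, "explicitly said to the voice assistant", "most likely said to the voice assistant", "explicitly not said to the voice assistant", "most likely not said to the voice assistant", "unable to determine" and "no speaker". The voice assistant type labels are "explicitly said to the voice assistant" and "most likely said to the voice assistant"; the non-voice assistant type labels are "explicitly not said to the voice assistant", "most likely not said to the voice assistant", "unable to determine" and "no speaker".
For example, the voice request "open the car window" can be considered "most likely said to the voice assistant", so its speaking object label is confirmed to be a voice assistant type label.
As another example, the voice request "Hahahaha" can be considered "most likely not said to the voice assistant", so its speaking object label is confirmed to be a non-voice assistant type label.
The intent classification label characterizes the degree of validity of the user voice request and may include "strongly valid", "weakly valid", "no intent" and "unable to judge". According to the degree of validity, the labels are divided into the first-level label "strongly valid", the second-level label "weakly valid" and the third-level label "no intent or unable to judge".
Strongly valid voice requests usually have a clear intent, are mostly unambiguous, are phrased in a fairly standard way, and are strongly related to vehicle functions. Examples: turn on the air conditioner, straighten the seat back, make the instrument panel a bit brighter, play a song, open the music interface, turn the volume up.
Weakly valid voice requests usually have an intent that is not clear enough, may be ambiguous, are not phrased in a standard way, and are weakly related to vehicle functions. Examples: will it rain tomorrow, how come it is out of power, what song is this, louder, air conditioner.
No-intent voice requests usually have an intent that is not clear enough, may be ambiguous, are phrased casually, and are weakly related or unrelated to vehicle functions. Examples: whatever, our family, you can get a loan if you want to buy this car, it is open so come out quickly, open the glass, change the speed.
"Unable to judge" serves as a supplement to the above cases.
For example, the voice request "open the car window" can be considered "most likely said to the voice assistant", so its speaking object label is confirmed to be a voice assistant type label. It is also a strongly valid voice request, so its intent classification label is confirmed to be a first-level label. If the sound zone is in the first rejection mode, the rejection result is a clear result.
As another example, the voice request "Hahahaha" can be considered "most likely not said to the voice assistant", so its speaking object label is confirmed to be a non-voice assistant type label. It is also a no-intent voice request, so its intent classification label is confirmed to be a third-level label. If the sound zone is in the first rejection mode, the rejection result is a noise result.
In practice, when the dialogue sound zone is in the first rejection mode, if the speaking object label is a voice assistant type label, meaning that the request is, or is most likely, addressed to the voice assistant, and the intent classification label is a first-level or second-level label, meaning a strongly or weakly valid voice request, the user voice request is processed to obtain a clear result as the rejection result, that is, the voice request is recalled. Conversely, if the speaking object label is a non-voice assistant type label and the intent classification label is a third-level label, the rejection result obtained is a noise result, that is, the voice request is rejected.
In this way, in the first rejection mode, a voice request whose speaking object label is a voice assistant type label and whose intent classification label is a first-level or second-level label is confirmed as a clear result, and a voice request with a non-voice assistant type label and a third-level intent classification label is confirmed as a noise result.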
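Step 06 further includes:
063: when the rejection mode of the dialogue sound zone is the second rejection mode, if the speaking object label is a voice assistant type label and the intent classification label is a first-level label, processing the user voice request to obtain a clear result as the rejection result;
064: if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level label or a third-level label, processing the user voice request to obtain a noise result as the rejection result.
The processor is used, when the rejection mode of the dialogue sound zone is the second rejection mode, to process the user voice request to obtain a clear result as the rejection result if the speaking object label is a voice assistant type label and the intent classification label is a first-level label, and to process the user voice request to obtain a noise result as the rejection result if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level or third-level label.
Referring to Figure 11, in practice, when the dialogue sound zone is in the second rejection mode, if the speaking object label is a voice assistant type label, meaning that the request is, or is most likely, addressed to the voice assistant, and the intent classification label is a first-level label, meaning a strongly valid voice request, the user voice request is processed to obtain a clear result as the rejection result, that is, the voice request is recalled. Conversely, if the speaking object label is a non-voice assistant type label and the intent classification label is a second-level or third-level label, the rejection result obtained is a noise result, that is, the voice request is rejected.
For example, the voice request "open the car window" can be considered "most likely said to the voice assistant", so its speaking object label is confirmed to be a voice assistant type label. It is also a strongly valid voice request, so its intent classification label is confirmed to be a first-level label. If the sound zone is in the second rejection mode, the rejection result is a clear result.
As another example, the voice request "Hahahaha" can be considered "most likely not said to the voice assistant", so its speaking object label is confirmed to be a non-voice assistant type label. It is also a no-intent voice request, so its intent classification label is confirmed to be a third-level label. If the sound zone is in the second rejection mode, the rejection result is a noise result.
In this way, in the second rejection mode, a voice request whose speaking object label is a voice assistant type label and whose intent classification label is a first-level label is confirmed as a clear result, and a voice request with a non-voice assistant type label and a second-level or third-level intent classification label is confirmed as a noise result. Compared with the first rejection mode, the second rejection mode is stricter in rejecting voice requests whose intent classification label is a second-level label.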
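The decision rules of steps 061 to 064 can be collected into one function. This is a sketch under stated assumptions: all names are hypothetical, and the text only specifies the two label combinations per mode listed above. Treating every other combination (for example a voice assistant type label with a third-level label) as noise is an assumption made here for completeness, not something the text states.

```python
from enum import Enum, IntEnum


class RejectionMode(Enum):
    FIRST = "first"    # lower degree of rejection
    SECOND = "second"  # higher degree of rejection


class Addressee(Enum):
    ASSISTANT = "voice assistant type"
    NON_ASSISTANT = "non-voice assistant type"


class IntentLevel(IntEnum):
    FIRST = 1   # strongly valid
    SECOND = 2  # weakly valid
    THIRD = 3   # no intent or unable to judge


def rejection_result(mode: RejectionMode, addressee: Addressee, level: IntentLevel) -> str:
    """Return "clear" (recall the request) or "noise" (reject it)."""
    if addressee is Addressee.ASSISTANT:
        # First mode accepts strongly and weakly valid requests (061);
        # second mode accepts only strongly valid ones (063).
        if mode is RejectionMode.FIRST:
            accepted = {IntentLevel.FIRST, IntentLevel.SECOND}
        else:
            accepted = {IntentLevel.FIRST}
        if level in accepted:
            return "clear"
    # Covers 062 and 064, plus (by assumption) all unspecified combinations.
    return "noise"


# "open the car window": assistant-directed, strongly valid.
window = rejection_result(RejectionMode.SECOND, Addressee.ASSISTANT, IntentLevel.FIRST)
# "Hahahaha": not assistant-directed, no intent.
laugh = rejection_result(RejectionMode.FIRST, Addressee.NON_ASSISTANT, IntentLevel.THIRD)
```

The only input on which the two modes disagree in this sketch is a voice assistant type label with a second-level intent, which matches the statement that the second mode is stricter for second-level labels.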
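The following three scenario examples illustrate how a voice request is processed according to the rejection mode, the speaking object label and the intent classification label to obtain a rejection result:
Example 1: Referring to Table 1, the user in the main driving sound zone 101 wakes up the vehicle voice function. The main driving sound zone 101 is confirmed as the wake-up sound zone with the first rejection mode as its initial rejection mode; the other sound zones are non-wake-up sound zones with the second rejection mode as their initial rejection mode. The user in the main driving sound zone 101 issues the voice request "Turn on the air conditioner"; its speaking object label is the voice assistant type, its intent classification label is a first-level label, and a clear rejection result is obtained. The user in the main driving sound zone 101 then issues the voice request "20 degrees, fan level 3"; its speaking object label is the voice assistant type, its intent classification label is a first-level label, and a clear rejection result is obtained. Next, the user in the left rear sound zone 103 issues the voice request "That's a bit low, isn't it"; its speaking object label is the non-voice assistant type, its intent classification label is a second-level label, and a noise rejection result is obtained. The user in the left rear sound zone 103 then issues the voice requests "Raise the temperature a little" and "A little higher"; both have a voice assistant type speaking object label and a first-level intent classification label. Because a valid voice request has been executed within the preset duration, the rejection mode of the left rear sound zone 103 is updated to the first rejection mode, and a clear rejection result is obtained.
Wake-up sound zone | Dialogue sound zone | Voice request | Speaking object label | Intent classification label | Rejection mode | Rejection result
Driver | Driver | Turn on the air conditioner | Voice assistant type | First level | First rejection mode | Clear
Driver | Driver | 20 degrees, fan level 3 | Voice assistant type | First level | First rejection mode | Clear
Driver | Left rear | That's a bit low, isn't it | Non-voice assistant type | Second level | Second rejection mode | Noise
Driver | Left rear | Raise the temperature a little | Voice assistant type | First level | First rejection mode | Clear
Driver | Left rear | A little higher | Voice assistant type | First level | First rejection mode | Clear
Table 1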
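Example 2: Referring to Table 2, the user in the left rear sound zone 103 wakes up the vehicle voice function. The left rear sound zone 103 is confirmed as the wake-up sound zone with the first rejection mode as its initial rejection mode; the other sound zones are non-wake-up sound zones with the second rejection mode as their initial rejection mode. The user in the left rear sound zone 103 issues the voice request "How is the weather today"; its speaking object label is the voice assistant type, its intent classification label is a first-level label, and a clear rejection result is obtained. The user in the left rear sound zone 103 then issues the voice request "What about tomorrow"; its speaking object label is the voice assistant type, its intent classification label is a first-level label, and a clear rejection result is obtained. The users in the left rear sound zone 103 and the right rear sound zone 105 then start chatting. The user in the left rear sound zone 103 issues the voice request "The weather is nice, why don't we go hiking tomorrow". Because a valid instruction has been executed in the left rear sound zone 103 within the preset time, the zone remains in the first rejection mode; the speaking object label of the request is the non-voice assistant type, its intent classification label is a third-level label, and a noise rejection result is obtained. The user in the right rear sound zone 105 issues the voice request "Sure"; its speaking object label is the non-voice assistant type, its intent classification label is a third-level label, and a noise rejection result is obtained. The user in the left rear sound zone 103 issues the voice request "Shall we go to the Badaling Great Wall"; its speaking object label is the non-voice assistant type, its intent classification label is a third-level label, and a noise rejection result is obtained. The user in the right rear sound zone 105 issues the voice request "See how long it takes to get there"; its speaking object label is the non-voice assistant type, its intent classification label is a third-level label, and a noise rejection result is obtained. Finally, the chat ends and the user in the left rear sound zone 103 issues the voice request "Navigate to the Badaling Great Wall for me". Because a valid instruction has been executed in the left rear sound zone 103 within the preset time, the zone remains in the first rejection mode; the intent classification label of the request is determined to be a first-level label, and a clear rejection result is obtained.
Wake-up sound zone | Dialogue sound zone | Voice request | Speaking object label | Intent classification label | Rejection mode | Rejection result
Left rear | Left rear | How is the weather today | Voice assistant type | First level | First rejection mode | Clear
Left rear | Left rear | What about tomorrow | Voice assistant type | First level | First rejection mode | Clear
Left rear | Left rear | The weather is nice, why don't we go hiking tomorrow | Non-voice assistant type | Third level | First rejection mode | Noise
Left rear | Right rear | Sure | Non-voice assistant type | Third level | Second rejection mode | Noise
Left rear | Left rear | Shall we go to the Badaling Great Wall | Non-voice assistant type | Third level | First rejection mode | Noise
Left rear | Right rear | See how long it takes to get there | Non-voice assistant type | Third level | Second rejection mode | Noise
Left rear | Left rear | Navigate to the Badaling Great Wall for me | Voice assistant type | First level | First rejection mode | Clear
Table 2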
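Example 3: Referring to Table 3, after the user in the main driving sound zone 101 wakes up the vehicle voice function, the main driving sound zone 101 is confirmed as the wake-up sound zone with the first rejection mode as its initial rejection mode; the other sound zones are non-wake-up sound zones with the second rejection mode as their initial rejection mode. The user in the main driving sound zone 101 then starts a phone call and issues voice requests such as "Hello, hello", "I'm going to work now" and "I'm still on the way, I haven't arrived yet". The speaking object labels of these voice requests are all the non-voice assistant type, their intent classification labels are determined to be third-level labels, and noise rejection results are obtained. Next, the user in the co-pilot sound zone 102 issues the voice request "Turn the volume down a little", and the rejection mode of the co-pilot sound zone 102 is updated to the first rejection mode; the speaking object label of the request is the voice assistant type, its intent classification label is determined to be a first-level label, and a clear rejection result is obtained. The left rear sound zone 103 issues the voice request "Turn off the music", and the rejection mode of the left rear sound zone 103 is updated to the first rejection mode; the speaking object label of the request is the voice assistant type, its intent classification label is determined to be a first-level label, and a clear rejection result is obtained.
Wake-up sound zone | Dialogue sound zone | Voice request | Speaking object label | Intent classification label | Rejection mode | Rejection result
Driver | Driver | Hello, hello | Non-voice assistant type | Third level | First rejection mode | Noise
Driver | Driver | I'm going to work now | Non-voice assistant type | Third level | First rejection mode | Noise
Driver | Driver | I'm still on the way, I haven't arrived yet | Non-voice assistant type | Third level | First rejection mode | Noise
Driver | Co-pilot | Turn the volume down a little | Voice assistant type | First level | First rejection mode | Clear
Driver | Left rear | Turn off the music | Voice assistant type | First level | Second rejection mode | Clear
Table 3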
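Referring to Figure 12, the present application also provides a voice interaction method, including:
01: receiving wake-up sound zone information, forwarded by a vehicle, of a user waking up the vehicle voice function in the vehicle cabin;
02: determining, according to the wake-up sound zone information, the initial rejection mode of each of multiple sound zones in the vehicle cabin;
03: receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and dialogue sound zone information confirmed according to the user voice request;
04: updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone;
07: after determining the rejection mode of each sound zone, processing the user voice request to obtain a speaking object label and an intent classification label;
08: processing the voice request according to the rejection mode, the speaking object label and the intent classification label to obtain a rejection result;
09: sending the rejection result to the vehicle to complete the voice interaction.
The voice interaction method of the present application can be implemented by the server of the present application, which includes a memory and a processor. Specifically, the memory stores a computer program, and the processor is used to receive the wake-up sound zone information, forwarded by the vehicle, of the user waking up the vehicle voice function in the vehicle cabin; to determine, according to the wake-up sound zone information, the initial rejection mode of each of the multiple sound zones in the vehicle cabin; to receive the user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and the dialogue sound zone information confirmed according to the user voice request; to update the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone; after determining the rejection mode of each sound zone, to process the user voice request to obtain the speaking object label and the intent classification label; to process the voice request according to the rejection mode, the speaking object label and the intent classification label to obtain the rejection result; and to send the rejection result to the vehicle to complete the voice interaction.
Specifically, after the rejection result for the voice request is confirmed, it is sent to the vehicle, and the vehicle either executes the control instruction generated from the voice request or makes no response, which completes the voice interaction.
For how the rejection mode and the rejection result are confirmed, refer to the explanations of the embodiments of the processing method above; they are not repeated here.
In this way, the vehicle cabin is divided into multiple sound zones and, for each received voice request, the rejection mode of each sound zone is confirmed according to the voice request and its sound zone information, which meets the rejection requirements of multi-zone voice interaction in the vehicle cabin. In addition, the rejection mode of each sound zone is updated as the voice interaction proceeds, so that voice requests are rejected more accurately in multi-zone interaction scenarios and the user experience is better.
The computer-readable storage medium of the present application stores a computer program which, when executed by one or more processors, implements the above method.
In the description of this specification, descriptions that refer to terms such as "the above" and "specifically" mean that the specific features, structures, materials or characteristics described in connection with an embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, fragment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of a process. The scope of some embodiments of the present application includes additional implementations in which functions are not performed in the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the technical field to which the embodiments of the present application belong.
Although embodiments of the present application have been shown and described above, it can be understood that these embodiments are exemplary and are not to be understood as limiting the present application; those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the present application.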

Claims (12)

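  1. A voice processing method, wherein the voice processing method comprises:
    receiving wake-up sound zone information, forwarded by a vehicle, of a user waking up the vehicle voice function in the vehicle cabin;
    determining, according to the wake-up sound zone information, the initial rejection mode of each of multiple sound zones in the vehicle cabin;
    receiving a user voice request forwarded by the vehicle after the vehicle voice function has been woken up, and dialogue sound zone information confirmed according to the user voice request;
    updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone.
  2. The voice processing method according to claim 1, wherein determining, according to the wake-up sound zone information, the initial rejection mode of each of the multiple sound zones in the vehicle cabin comprises:
    determining, according to the wake-up sound zone information, that the initial rejection mode of the wake-up sound zone in the vehicle cabin is a first rejection mode;
    determining that the initial rejection mode of every sound zone in the vehicle cabin other than the wake-up sound zone is a second rejection mode, the second rejection mode rejecting voice requests to a higher degree than the first rejection mode.
  3. The voice processing method according to claim 2, wherein updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, comprises:
    if it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the first rejection mode and the user voice request is a non-vehicle-interaction voice request, updating the rejection mode of the dialogue sound zone to the second rejection mode.
  4. The voice processing method according to claim 2, wherein updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, comprises:
    if a sound zone of the vehicle cabin whose rejection mode is the first rejection mode obtains no valid voice request within a first preset duration, updating the rejection mode of that sound zone to the second rejection mode.
  5. The voice processing method according to claim 2, wherein updating the rejection mode of the corresponding sound zone according to the user voice request and the dialogue sound zone information, so as to determine the rejection mode of each sound zone, comprises:
    when it is confirmed from the dialogue sound zone information that the rejection mode of the dialogue sound zone is the second rejection mode, if it is determined from the user voice request that a valid voice request has been executed in the dialogue sound zone within a second preset duration, updating the rejection mode of the dialogue sound zone to the first rejection mode.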
  6. 根据权利要求1所述的语音处理方法,其中,所述语音处理方法包括:
    在所述车辆语音功能被唤醒后的第三预设时长内未获取到用户语音请求的情况下,退出所述车辆语音功能。
  7. 根据权利要求1至6中任一项所述的语音处理方法,其中,所述语音处理方法包括:
    处理所述用户语音请求确定所述用户语音请求的说话对象标签和意图分级标签;
    根据对话音区的拒识模式、所述说话对象标签和所述意图分级标签对所述语音请求进行处理得到拒识结果。
  8. 根据权利要求7所述的语音处理方法,其中,所述根据对话音区的拒识模式、所述说话对象标签和所述意图分级标签对所述语音请求进行处理得到拒识结果,包括:
    在所述对话音区的拒识模式为第一拒识模式的情况下,若所述说话对象标签为语音助手类标签且所述意图分级标签为第一级标签或第二级标签,则对所述用户语音请求进行处理得到所述拒识结果为清晰结果;
    若所述说话对象标签为非语音助手类标签且所述意图分级标签为第三级标签,则对所述用户语音请求进行处理得到所述拒识结果为噪声结果,所述意图分级标签表征所述用户语音请求的有效程度,其中所述第一级标签大于所述第二级标签且所述第二级标签大于所述第三级标签。
  9. 根据权利要求8所述的语音处理方法,其中,所述根据所述拒识模式、所述说话对象标签和所述意图分级标签对所述语音请求进行处理得到拒识结果,包括:
    在所述对话音区的拒识模式为第二拒识模式的情况下,若所述说话对象标签是语音助手类标签且所述意图分级标签为第一级标签,则对所述用户语音请求进行处理得到所述拒识结果为清晰结果;
    若所述说话对象标签为非语音助手类标签且所述意图分级标签为第二级标签或第三级标签,则对所述用户语音请求进行处理得到所述拒识结果为噪声结果。
  10. 一种语音交互方法,其中,所述语音交互方法包括:
    接收车辆转发的用户在车辆座舱内唤醒车辆语音功能的唤醒音区信息;
    根据所述唤醒音区信息确定所述车辆座舱内多个音区中每个音区初始的拒识模式;
    接收所述车辆转发的在所述车辆语音功能被唤醒后的用户语音请求以及根据所述用户语音请求确认的对话音区信息;
    根据所述用户语音请求和所述对话音区信息更新对应音区的所述拒识模式,以确定每个所述音区的拒识模式;
    确定每个所述音区的拒识模式后,处理所述用户语音请求得到说话对象标签和意图分级标签;
    根据所述拒识模式、所述说话对象标签和所述意图分级标签对所述语音请求进行处理得到拒识结果;
    将所述拒识结果下发至所述车辆以完成语音交互。
  11. 一种服务器,其中,所述服务器包括存储器和处理器,所述存储器中存储有计算机程序,所述计算机程序被所述处理器执行时,实现如权利要求1至10中任一项所述的方法。
  12. 一种计算机可读存储介质,其中,所述计算机可读存储介质存储有计算机程序,当所述计算机程序被一个或多个处理器执行时,实现如权利要求1至10中任意一项所述的方法。
PCT/CN2023/123601 2022-10-13 2023-10-09 Voice processing method, voice interaction method, server, and storage medium WO2024078460A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211255729.4 2022-10-13
CN202211255729.4A CN115503639A (zh) 2022-10-13 2022-10-13 Voice processing method, voice interaction method, server, and storage medium

Publications (1)

Publication Number Publication Date
WO2024078460A1 (zh) 2024-04-18

Family

ID=84510697

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/123601 WO2024078460A1 (zh) 2022-10-13 2023-10-09 Voice processing method, voice interaction method, server, and storage medium

Country Status (2)

Country Link
CN (1) CN115503639A (zh)
WO (1) WO2024078460A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115503639A (zh) * 2022-10-13 2022-12-23 广州小鹏汽车科技有限公司 Voice processing method, voice interaction method, server, and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181500A (ja) * 1998-12-15 2000-06-30 Equos Research Co Ltd Speech recognition device and agent device
CN107430524A (zh) * 2015-05-20 2017-12-01 华为技术有限公司 Method for locating a sound emission position, and terminal device
CN108520747A (zh) * 2018-03-29 2018-09-11 浙江吉利汽车研究院有限公司 Vehicle-mounted control device with speech recognition function
CN110562260A (zh) * 2018-05-17 2019-12-13 现代自动车株式会社 Dialogue system and dialogue processing method
CN111161720A (zh) * 2018-11-08 2020-05-15 现代自动车株式会社 Vehicle and control method thereof
CN111583907A (zh) * 2020-04-15 2020-08-25 北京小米松果电子有限公司 Information processing method, apparatus, and storage medium
DE102020207143A1 (de) * 2020-06-08 2021-12-09 Volkswagen Aktiengesellschaft Motor vehicle with a speech dialogue system, and speech dialogue system
CN113990300A (zh) * 2021-12-27 2022-01-28 广州小鹏汽车科技有限公司 Voice interaction method, vehicle, server, and computer-readable storage medium
CN114155853A (zh) * 2021-12-08 2022-03-08 斑马网络技术有限公司 Rejection method, apparatus, device, and storage medium
CN115503639A (zh) * 2022-10-13 2022-12-23 广州小鹏汽车科技有限公司 Voice processing method, voice interaction method, server, and storage medium

Also Published As

Publication number Publication date
CN115503639A (zh) 2022-12-23

Similar Documents

Publication Publication Date Title
US20230178077A1 (en) Techniques for wake-up work recognition and related systems and methods
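CN106816149B (zh) Prioritized content loading for vehicle automatic speech recognition systems
WO2024078460A1 (zh) Voice processing method, voice interaction method, server, and storage medium
US20050216271A1 (en) Speech dialogue system for controlling an electronic device
CN109545219A (zh) In-vehicle voice interaction method, system, device, and computer-readable storage medium
US9601111B2 (en) Methods and systems for adapting speech systems
US9558739B2 (en) Methods and systems for adapting a speech system based on user competance
US20100088093A1 (en) Voice Command Acquisition System and Method
US9202459B2 (en) Methods and systems for managing dialog of speech systems
CN112614491B (zh) In-vehicle voice interaction method and apparatus, vehicle, and readable medium
US20140136214A1 (en) Adaptation methods and systems for speech systems
US11521612B2 (en) Vehicle control apparatus and method using speech recognition
US11069351B1 (en) Vehicle voice user interface
US11929065B2 (en) Coordinating electronic personal assistants
CN114360527B (zh) In-vehicle voice interaction method, apparatus, device, and storage medium
WO2024088085A1 (zh) Voice interaction method, voice interaction apparatus, vehicle, and readable storage medium
CN114724564A (zh) Voice processing method, apparatus, and system
CN113879235A (zh) Method, system, device, and storage medium for automotive multi-screen control
JP2020095121A (ja) Speech recognition system, method for generating a trained model, method for controlling a speech recognition system, program, and mobile body
CN110211579B (zh) Voice command recognition method, apparatus, and system
US9715878B2 (en) Systems and methods for result arbitration in spoken dialog systems
WO2024083128A1 (zh) Voice interaction method, server, and computer-readable storage medium
WO2023168895A1 (zh) In-vehicle robot and operating method thereof, medium, and computer program product
JP5074759B2 (ja) Dialogue control device, dialogue control method, and dialogue control program
CN115457943A (zh) Broadcast method, apparatus, and device for speech recognition, and computer-readable storage medium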

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876662

Country of ref document: EP

Kind code of ref document: A1