CN114446300B - Multi-sound zone identification method, device, equipment and storage medium - Google Patents


Publication number
CN114446300B
CN114446300B (application CN202210144215.5A)
Authority
CN
China
Prior art keywords
identification
inquiry
awakening
audio
determining
Prior art date
Legal status
Active
Application number
CN202210144215.5A
Other languages
Chinese (zh)
Other versions
CN114446300A (en)
Inventor
杜春明
李峥
徐木水
王丹
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210144215.5A
Publication of CN114446300A
Application granted
Publication of CN114446300B
Legal status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/28 — Constructional details of speech recognition systems
    • G10L15/30 — Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 — Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L2015/223 — Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure provides a multi-sound zone identification method, device, equipment, and storage medium, relating to the technical field of the Internet of Vehicles and in particular to the field of speech recognition. The specific implementation scheme is as follows: respectively determining azimuth information corresponding to the awakening identification and the inquiry identification; determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the two; and, according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification, and uploading the audio collected from the target sound zone. According to the technical solution of the present disclosure, the accuracy and efficiency of inquiry content identification can be improved.

Description

Multi-sound zone identification method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of the Internet of Vehicles, and in particular to the field of speech recognition.
Background
To meet people's increasing demand for intelligence, some vehicles support a voice interaction function. With the upgrade of vehicle hardware, the original single sound zone has been upgraded to multiple sound zones; because of the limited space inside a vehicle, sound inevitably spreads into every sound zone, so the accuracy of inquiry (query) content identification is low in this scenario.
Disclosure of Invention
The disclosure provides a method, a device, equipment and a storage medium for identifying multiple sound zones.
According to a first aspect of the present disclosure, a multi-sound zone identification method is provided, which is applied to a vehicle-end device and includes:
respectively determining azimuth information corresponding to the awakening identification and the inquiry identification;
determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification;
and according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification, and uploading the audio collected by the target sound zone.
According to a second aspect of the present disclosure, there is provided a multi-sound zone identification method applied to a cloud server, including:
receiving audio uploaded by vehicle-end equipment, wherein the audio comprises audio collected by a target sound zone; the target sound zone is determined by the vehicle-end equipment from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification;
determining target audio from the audio; and identifying the target audio to obtain the inquiry content.
According to a third aspect of the present disclosure, there is provided a multi-sound zone identification method, comprising:
respectively determining azimuth information corresponding to the awakening identification and the inquiry identification; determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification; according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification, and uploading the audio collected by the target sound zone;
determining target audio from the audio; and identifying the target audio to obtain the inquiry content.
According to a fourth aspect of the present disclosure, there is provided a multi-sound zone identification apparatus applied to a vehicle-end device, including:
the first determining module is used for respectively determining the azimuth information corresponding to the awakening identification and the inquiry identification;
the second determination module is used for determining the direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification;
and the control module is used for determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result and uploading the audio collected by the target sound zone.
According to a fifth aspect of the present disclosure, there is provided a multi-sound zone identification apparatus applied to a cloud server, including:
the receiving module is used for receiving audio uploaded by the vehicle-end equipment, wherein the audio comprises audio collected by a target sound zone; the target sound zone is determined by the vehicle-end equipment from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification;
the third determining module is used for determining target audio from the audio;
and the identification module is used for identifying the target audio to obtain the inquiry content.
According to a sixth aspect of the present disclosure, there is provided a multi-sound zone identification system comprising:
the vehicle-end equipment is used for respectively determining the direction information corresponding to the awakening identification and the inquiry identification; determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification; according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification, and uploading the audio collected by the target sound zone;
the cloud server is used for determining a target audio from the audio; and identifying the target audio to obtain the inquiry content.
According to a seventh aspect of the present disclosure, there is provided an electronic apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method applied to any embodiment of the vehicle-end equipment side.
According to an eighth aspect of the present disclosure, there is provided an electronic apparatus comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method applied to any embodiment of the cloud server side.
According to a ninth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the present disclosure as applied to the system-side embodiment.
According to a tenth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute a method of the present disclosure applied to any embodiment of the vehicle-side device side.
According to an eleventh aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the present disclosure applied in any embodiment of the cloud server side.
According to a twelfth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute a method in which the present disclosure is applied to any embodiment on the system side.
According to a thirteenth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the present disclosure as applied to any one of the embodiments of the vehicle-end device side.
According to a fourteenth aspect of the present disclosure, a computer program product is provided, which includes a computer program that, when being executed by a processor, implements the method of the present disclosure applied to any embodiment of the cloud server side.
According to a fifteenth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the present disclosure applied to any embodiment on the system side.
According to a sixteenth aspect of the present disclosure, there is provided a vehicle comprising the electronic device of the seventh aspect.
According to a seventeenth aspect of the present disclosure, there is provided a server comprising the electronic device according to the eighth aspect.
According to the technical solution of the present disclosure, the accuracy and efficiency of inquiry content identification can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a first flowchart of a multi-sound zone identification method according to an embodiment of the present disclosure;
FIG. 2 is an audio schematic in a scenario where the wake-up and the query are in the same sound zone according to an embodiment of the present disclosure;
FIG. 3 is an audio schematic in a scenario where the wake-up and the query are not in the same sound zone according to an embodiment of the disclosure;
FIG. 4 is a second flowchart of a multi-sound zone identification method according to an embodiment of the disclosure;
FIG. 5 is a third flowchart of a multi-sound zone identification method according to an embodiment of the disclosure;
FIG. 6 is a first schematic diagram of a multi-sound zone identification apparatus according to an embodiment of the present disclosure;
FIG. 7 is a second schematic diagram of a multi-sound zone identification apparatus according to an embodiment of the present disclosure;
FIG. 8 is an interaction diagram of a multi-sound zone identification system according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a multi-sound zone identification method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terms "first," "second," "third," etc., in the description, claims, and drawings of the present disclosure are used to distinguish between similar elements and do not necessarily describe a particular sequence or chronological order. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to it.
The present disclosure provides a multi-sound zone identification method, which can be applied to a vehicle-end device. The vehicle-end equipment includes, but is not limited to, a vehicle-mounted terminal with a voice interaction function; the present disclosure does not limit the type of the vehicle-end device. The vehicle-end device is mounted on, or connectable with, a vehicle that includes a plurality of sound zones. As shown in FIG. 1, the multi-sound zone identification method includes:
s101, respectively determining orientation information corresponding to awakening identification and inquiry identification;
s102, determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification;
s103, according to the direction comparison result, a target sound area is determined from the awakening sound area corresponding to the awakening identification and the inquiry sound area corresponding to the inquiry identification, and the audio collected by the target sound area is uploaded.
In the disclosed embodiment, the vehicle includes two or more sound zones. For example, the driver seat of the vehicle corresponds to one sound zone and the front passenger seat corresponds to another; likewise, the left rear seat and the right rear seat may each correspond to a sound zone. The positions of the sound zones are not mandated; the specific positions can be set according to the design requirements of the vehicle. Each sound zone is provided with an audio collector, such as a recording microphone, responsible for collecting the audio of the sound zone in which it is located.
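As an illustrative sketch (not part of the patent), the zone-to-microphone mapping described above might be represented as follows; the zone names, seat labels, and channel indices are assumptions for a four-zone cabin:

```python
# Illustrative four-zone cabin layout (not from the patent): each sound
# zone maps to a seat and the microphone channel that captures it.
ZONES = {
    "front_left":  {"seat": "driver",          "mic_channel": 0},
    "front_right": {"seat": "front_passenger", "mic_channel": 1},
    "rear_left":   {"seat": "rear_left",       "mic_channel": 2},
    "rear_right":  {"seat": "rear_right",      "mic_channel": 3},
}

def mic_channel_for(zone: str) -> int:
    """Return the microphone channel responsible for the given sound zone."""
    return ZONES[zone]["mic_channel"]
```

Any actual deployment would derive this mapping from the vehicle's own hardware configuration.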
In the disclosed embodiment, the orientation information is used to characterize the direction and position. For example, the direction information of the wake recognition is the direction in which the voice triggering the wake recognition is located. As another example, the location information of the query recognition is the location where the voice triggered the query recognition is located.
In the embodiment of the present disclosure, the direction comparison result is used to represent whether the wake-up identification and the query identification are located in the same direction, or to represent whether the wake-up identification and the query identification are located in the same sound zone.
For example, in the case that the direction comparison result indicates that the wake-up identification and the query identification are located in the same direction, the wake-up identification and the query identification correspond to the same sound zone.
For example, in the case that the direction comparison result indicates that the wake-up identification and the query identification are located in different directions, the wake-up identification and the query identification correspond to different sound zones.
In some embodiments, determining a target sound zone from a wake-up sound zone corresponding to a wake-up identification and a query sound zone corresponding to a query identification according to the direction comparison result includes:
and under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in the same direction, determining the awakening sound zone as a target sound zone.
In some embodiments, determining a target sound zone from a wake-up sound zone corresponding to a wake-up identification and a query sound zone corresponding to a query identification according to the direction comparison result includes:
and under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions, determining the inquiry sound zone as a target sound zone.
According to the technical solution of the embodiments of the present disclosure, the direction information corresponding to the awakening identification and the inquiry identification is respectively determined, and the direction comparison result of the two is determined from that direction information. Because the target sound zone is determined according to the direction comparison result and only the audio of the target sound zone is uploaded, the requirements of both directional and non-directional identification are met and the correctness of the uploaded audio is ensured. This avoids the problem that the inquiry content cannot be accurately identified due to sound interference from other sound zones, and thus improves the accuracy of inquiry content identification.
In some embodiments, determining a target sound zone from a wakeup sound zone corresponding to the wakeup identification and a query sound zone corresponding to the query identification according to the direction comparison result, and uploading audio collected by the target sound zone includes:
determining the awakening sound zone as a target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in the same direction;
and uploading audio collected by the awakening sound zone from a preset time point before the awakening point to the arrival of the voice tail point.
Here, the voice end point may be determined using Voice Activity Detection (VAD) technology, also referred to as voice activity detection or silence suppression. Its purpose is to identify and eliminate long periods of silence from the sound signal stream; silence suppression saves valuable bandwidth and helps reduce the end-to-end delay perceived by users. Because users often pause briefly during voice interaction, which can truncate the speech and return unexpected results, VAD technology can be used to determine the voice start point (VAD start point) and the voice end point (VAD end point). In FIGS. 2 and 3, VAD-BEGIN indicates the VAD start point and VAD-END indicates the VAD end point.
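The start/end-point idea can be shown with a deliberately simplified energy-threshold VAD; production systems typically use trained model-based detectors, and the threshold and silence-frame count below are illustrative assumptions only.

```python
# A deliberately simplified energy-threshold VAD over per-frame energies.
# Threshold and silence-run length are illustrative assumptions.

def vad_endpoints(frame_energies, energy_threshold=0.01,
                  min_silence_frames=3):
    """Return (start_frame, end_frame) of detected speech, or None.

    Speech starts at the first frame whose energy reaches the threshold
    and ends at the last such frame before a long-enough silence run.
    """
    start = end = None
    silence = 0
    for i, energy in enumerate(frame_energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            end = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                break  # end point reached: sustained silence after speech
    return None if start is None else (start, end)
```

Tolerating a few silent frames before declaring the end point is what prevents short pauses from truncating the utterance, as described above.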
Here, the preset time point is determined by a preset time interval before the wake-up point. Illustratively, if the preset time interval is 2080 ms, the audio collected from 2080 ms before the wake-up point is uploaded. It should be noted that the preset time point can be set or adjusted according to design or user requirements; for example, the value of the preset time point or the preset time interval may be adjusted according to the desired response speed, recognition accuracy, and the like.
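The computation of the upload start point is simple arithmetic. The sketch below uses the 2080 ms interval from the example above (the value is tunable, as noted) and clamps at zero for wake-ups near the start of capture:

```python
def upload_start_ms(wake_point_ms: int, preset_interval_ms: int = 2080) -> int:
    """Where uploading begins: a preset interval before the wake-up point.

    2080 ms follows the example in the text; the value is adjustable.
    Clamped at zero so a wake-up early in the capture stays valid.
    """
    return max(0, wake_point_ms - preset_interval_ms)
```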
FIG. 2 is an audio schematic of a scene in which the wake-up voice and the inquiry voice are in the same sound zone. As shown in FIG. 2, the driver speaks the wake-up voice and, after a certain interval, speaks the inquiry voice. Both voices spread to the front passenger zone, but the energy of the voice collected in the driver zone is greater than that collected in the front passenger zone. In this case, if the voice collected in the front passenger zone were uploaded to the cloud server, the cloud server might fail to recognize it, or fail to recognize it accurately, because its energy is too low.
For example, when triggering of the wake-up identification is detected, the audio collected from the preset time point before the wake-up point is uploaded; when triggering of the inquiry identification is detected, if the direction comparison result indicates that the awakening identification and the inquiry identification are located in the same direction, the awakening sound zone is determined as the target sound zone, the ongoing upload is not interrupted, and the audio collected by the awakening sound zone continues to be uploaded until the voice end point arrives.
In this way, the wake-up and inquiry audio from the same sound zone can be uploaded to the cloud server, the correctness of the uploaded audio is guaranteed, and the accuracy of the inquiry content identified by the cloud server is improved.
In some embodiments, determining a target sound zone from a wakeup sound zone corresponding to the wakeup identification and a query sound zone corresponding to the query identification according to the direction comparison result, and uploading audio collected by the target sound zone includes:
determining the inquiry sound zone as a target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions;
and uploading the audio collected by the query sound area from the voice starting point to the voice tail point.
FIG. 3 shows an audio schematic in a scene in which the wake-up and the inquiry are not in the same sound zone. As shown in FIG. 3, the driver speaks the wake-up voice, which spreads to the front passenger zone, but the energy of the wake-up voice collected in the driver zone is greater than that collected in the front passenger zone. After a period of time, the front passenger speaks the inquiry voice, which spreads to the driver zone, but the energy of the inquiry voice collected in the front passenger zone is greater than that collected in the driver zone. In this scenario, if the voice collected in the driver zone were uploaded to the cloud server throughout, the cloud server might fail to recognize the inquiry voice, or fail to recognize it accurately.
For example, when triggering of the wake-up identification is detected, the audio collected from the preset time point before the wake-up point is uploaded; when triggering of the inquiry identification is detected, if the direction comparison result indicates that the awakening identification and the inquiry identification are located in different directions, the inquiry sound zone is determined as the target sound zone: the target sound zone is switched from the awakening sound zone to the inquiry sound zone (i.e., the target sound zone corresponding to the voice start point), and the audio collected by the inquiry sound zone from the voice start point to the voice end point is uploaded, so that the requirement of non-directional identification (the awakening identification and the inquiry identification being in different directions) can be met.
In this way, the wake-up and inquiry audio from different sound zones can be uploaded to the cloud server, the correctness of the uploaded audio is guaranteed, and the accuracy of the inquiry content identified by the cloud server is improved.
In some embodiments, the multi-sound zone identification method further includes: sending a reset request, wherein the reset request is used to instruct that the audio from the voice start point to the voice end point is to be identified.
For example, when triggering of the wake-up identification is detected, the audio collected from the preset time point before the wake-up point is uploaded; when triggering of the inquiry identification is detected, if the direction comparison result indicates that the awakening identification and the inquiry identification are located in different directions, the inquiry sound zone is determined as the target sound zone, the target sound zone is switched from the awakening sound zone to the inquiry sound zone (i.e., the target sound zone corresponding to the voice start point), and the audio collected by the inquiry sound zone from the voice start point to the voice end point is uploaded; a reset request is then sent to the cloud server to instruct it to identify the audio from the voice start point to the voice end point, so that the cloud server knows exactly which audio corresponds to the inquiry content.
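A sketch of what such a reset request might look like. The patent does not specify a wire format, so the use of JSON and all field names here (`type`, `session_id`, `recognize_from_ms`) are assumptions for illustration:

```python
import json

def build_reset_request(session_id: str, voice_start_ms: int) -> str:
    """Build a reset request telling the cloud server to discard audio
    received before the query's voice start point and recognize only from
    that point onward. JSON and the field names are illustrative; the
    patent does not define a wire format."""
    return json.dumps({
        "type": "reset",
        "session_id": session_id,
        "recognize_from_ms": voice_start_ms,
    })
```

On receipt, the server would restart recognition from `recognize_from_ms`, which is exactly the behavior the reset request conveys in the text above.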
Therefore, the audio to be identified of the cloud server can be definitely indicated, and the accuracy and efficiency of inquiry content identified by the cloud server are improved.
The present disclosure provides a multi-sound zone identification method, which can be applied to a cloud server. The present disclosure does not restrict the type of the cloud server; for example, it may be an ordinary server or a cloud-hosted server. The vehicle-end equipment of the present disclosure includes, but is not limited to, a terminal with a voice interaction function. As shown in FIG. 4, the multi-sound zone identification method includes:
s401, receiving audio uploaded by vehicle-end equipment, wherein the audio comprises audio collected by a target sound zone; the target sound zone is determined by the vehicle-end equipment from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification;
s402, determining target audio from the audio;
and S403, identifying the target audio to obtain inquiry content.
In the embodiment of the present disclosure, the direction comparison result is used to represent whether the wake-up identification and the query identification are located in the same direction, or whether the wake-up identification and the query identification are located in the same sound zone.
For example, in the case that the direction comparison result indicates that the wake-up identification and the query identification are located in the same direction, the wake-up identification and the query identification correspond to the same sound zone.
For example, in the case that the direction comparison result indicates that the wake-up identification and the query identification are located in different directions, the wake-up identification and the query identification correspond to different sound zones.
In some embodiments, recognizing the target audio to obtain the query content includes: decoding the target audio with a decoder to obtain the query content.
In some embodiments, recognizing the target audio to obtain the query content further includes: returning the query content to the vehicle-end device.
In some embodiments, recognizing the target audio to obtain the query content further includes: performing a query based on the query content to obtain a query result; and returning the query result to the vehicle-end device.
For example, if the query content is "what is the weather like today", a query result about the weather is returned to the vehicle-end device, such as "cloudy turning sunny; northeast wind, force 2 to 3; maximum temperature 9 °C".
For another example, if the query content is "how to go to XXX building", the query result is returned to the vehicle-end device, such as displaying, on the display device, a map route from the current location to XXX building.
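The weather and navigation examples above can be imitated by a toy cloud-side dispatcher; the function and its hard-coded answers are stand-ins for real weather and map services, not the actual implementation.

```python
def answer_query(query: str) -> str:
    """Toy mapping from recognized query content to a query result.

    A real cloud server would call weather or map services here;
    the canned strings below only mirror the examples in the text.
    """
    if "weather" in query:
        return "cloudy turning sunny; northeast wind, force 2 to 3; maximum temperature 9 degrees C"
    if query.startswith("how to go to "):
        destination = query[len("how to go to "):]
        return "showing route to " + destination
    return "sorry, I did not understand"
```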
In this solution, the audio uploaded by the vehicle-end device is recognized using the computing power of the cloud server, which guarantees the recognition speed of the query speech, improves recognition efficiency, and improves the accuracy of the recognized query content.
In some embodiments, determining the target audio from the audio includes: determining the audio as the target audio in the case that no reset request is received, wherein the reset request is sent by the vehicle-end device when it determines that the wake-up identification and the query identification are located in different directions. Correspondingly, recognizing the target audio to obtain the query content includes: recognizing the target audio to obtain a recognition result; and taking the content after the target wake-up word in the recognition result as the query content.
For example, while the vehicle is running, if the driver says "Xiaodu, what is the weather like today", the recognition result is "Xiaodu, what is the weather like today", and the cloud server takes the content after the wake-up word "Xiaodu", namely "what is the weather like today", as the query content.
In this way, the query content can be recognized in the scenario where the wake-up identification and the query identification are located in the same direction, which improves both the recognition accuracy and the recognition precision.
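A minimal sketch of the wake-word-stripping step for the same-direction case follows; the function name is an assumption, and "Xiaodu" merely stands in for the target wake-up word.

```python
def extract_query_content(recognition_result: str, wake_word: str) -> str:
    """Return the content after the target wake-up word.

    Used when no reset request was received, i.e. the wake-up and
    the query came from the same direction in a single utterance.
    If the wake word is absent, the whole result is returned.
    """
    idx = recognition_result.find(wake_word)
    if idx == -1:
        return recognition_result.strip()
    # Drop the wake word plus any separating punctuation/spaces.
    return recognition_result[idx + len(wake_word):].lstrip(" ,")
```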
In some embodiments, determining the target audio from the audio includes: determining the audio from the voice starting point to the voice tail point in the audio as the target audio in the case that a reset request is received, wherein the reset request is sent by the vehicle-end device when it determines that the wake-up identification and the query identification are located in different directions. Correspondingly, recognizing the target audio to obtain the query content includes: recognizing the target audio to obtain a recognition result; and taking the entire content of the recognition result as the query content.
For example, while the vehicle is running, the driver says "Xiaodu", and a few seconds later the front passenger says "what is the weather like today"; the target audio is the "what is the weather like today" collected in the front-passenger sound zone, and the cloud server takes "what is the weather like today" as the query content.
In this way, through the target audio and the reset request, the query content can be recognized in the scenario where the wake-up identification and the query identification are located in different directions, which improves both the recognition accuracy and the recognition precision.
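The cloud-side choice between the no-reset and reset branches above can be sketched as follows; representing the audio as a byte string with integer start/end offsets is a simplification for illustration only.

```python
def determine_target_audio(audio: bytes, reset_received: bool,
                           speech_start: int, speech_end: int) -> bytes:
    """Choose the target audio on the cloud server.

    No reset request: the whole uploaded audio (which begins at a
    preset time before the wake-up point) is the target audio.
    Reset request received: only the span from the voice starting
    point to the voice tail point is the target audio.
    """
    if not reset_received:
        return audio
    return audio[speech_start:speech_end]
```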
The present disclosure further provides a multi-sound-zone identification method, which can be applied to a multi-sound-zone identification system including a vehicle-end device and a cloud server. The present disclosure does not limit the type of the cloud server; for example, the cloud server may be an ordinary server or a cloud-based server. The vehicle-end device in the present disclosure includes a device having a voice interaction function. As shown in fig. 5, the multi-sound-zone identification method includes: S501: the vehicle-end device respectively determines the orientation information corresponding to the wake-up identification and the query identification; determines the direction comparison result of the wake-up identification and the query identification according to the orientation information corresponding to each; and determines a target sound zone from the wake-up sound zone corresponding to the wake-up identification and the query sound zone corresponding to the query identification according to the direction comparison result. S502: the vehicle-end device uploads the audio collected by the target sound zone. S503: the cloud server determines target audio from the audio, and recognizes the target audio to obtain the query content.
In this way, by first collecting the audio at the vehicle end and then performing recognition with the greater computing power of the cloud, the query content is recognized accurately and efficiently.
The embodiment of the present disclosure discloses a multi-tone zone identification apparatus, which is applied to a vehicle-end device, as shown in fig. 6, the multi-tone zone identification apparatus may include:
a first determining module 610, configured to determine orientation information corresponding to the wake-up identification and the query identification, respectively;
a second determining module 620, configured to determine a direction comparison result between the wake-up identification and the query identification according to the direction information corresponding to the wake-up identification and the query identification respectively;
and the control module 630 is configured to determine a target sound zone from the wakeup sound zone corresponding to the wakeup identification and the query sound zone corresponding to the query identification according to the direction comparison result, and upload the audio collected by the target sound zone.
In some embodiments, the control module 630 is specifically configured to:
determining the awakening sound zone as a target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in the same direction;
and uploading audio collected by the awakening sound zone from a preset time point before the awakening point until a voice tail point arrives.
In some embodiments, the control module 630 is specifically configured to:
determining the inquiry sound zone as a target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions;
and uploading the audio collected by the query sound area from the voice starting point to the voice tail point.
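The two upload windows handled by the control module can be summarized in one sketch; the millisecond time offsets and the function signature are illustrative assumptions, not the disclosed implementation.

```python
def upload_window(same_direction: bool, wake_point_ms: int,
                  pre_roll_ms: int, speech_start_ms: int,
                  speech_end_ms: int) -> tuple:
    """Return the (start, end) of the audio span to upload, in ms.

    Same direction: the wake-up zone's audio from a preset time
    before the wake-up point until the voice tail point.
    Different directions: the query zone's audio from the voice
    starting point to the voice tail point.
    """
    if same_direction:
        return (max(0, wake_point_ms - pre_roll_ms), speech_end_ms)
    return (speech_start_ms, speech_end_ms)
```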
In some embodiments, the multi-sound-zone identification apparatus may further include: a sending module (not shown in fig. 6), configured to send a reset request, where the reset request is used to instruct recognition of the audio from the voice starting point to the voice tail point.
It should be understood by those skilled in the art that the functions of the processing modules in the multi-tone region identification apparatus according to the embodiments of the present disclosure may be realized by an analog circuit that implements the functions described in the embodiments of the present disclosure, or by running software that implements the functions described in the embodiments of the present disclosure on an electronic device.
The multi-tone-zone identification device can improve the accuracy and efficiency of identifying the inquiry content in each scene.
The embodiment of the disclosure discloses a multi-zone identification device, which is applied to a cloud server, and as shown in fig. 7, the multi-zone identification device can comprise:
the receiving module 710 is configured to receive audio uploaded by the vehicle-end device, where the audio includes audio collected by a target sound zone; the target sound zone is determined by the vehicle-end equipment from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification;
a third determining module 720, configured to determine a target audio from the audios;
and the identifying module 730 is used for identifying the target audio to obtain the query content.
In some embodiments, the third determining module 720 is configured to determine the audio as the target audio if a reset request is not received, where the reset request is sent by the vehicle-end device when the wake-up identifier and the query identifier are determined to be located in different directions; correspondingly, the identifying module 730 is configured to identify the target audio to obtain an identification result; and taking the content behind the target wake-up word in the recognition result as the inquiry content.
In some embodiments, the third determining module 720 is configured to determine, as the target audio, the audio starting from the voice starting point to the voice ending point in the audio if a reset request is received, where the reset request is issued by the vehicle-end device when it is determined that the wake-up identification and the query identification are located in different directions; correspondingly, the identifying module 730 is configured to identify the target audio to obtain an identification result; and taking the whole content of the identification result as the inquiry content.
It should be understood by those skilled in the art that the functions of the processing modules in the multi-tone region identification apparatus according to the embodiments of the present disclosure may be realized by an analog circuit that implements the functions described in the embodiments of the present disclosure, or by running software that implements the functions described in the embodiments of the present disclosure on an electronic device.
The multi-tone area recognition device of the embodiment of the disclosure can improve the accuracy and efficiency of recognizing the inquiry content in each scene.
An embodiment of the present disclosure further provides a multi-sound-zone identification system. As shown in fig. 8, the multi-sound-zone identification system includes: a vehicle-end device and a cloud server. The vehicle-end device is configured to: respectively determine the orientation information corresponding to the wake-up identification and the query identification; determine the direction comparison result of the wake-up identification and the query identification according to the orientation information corresponding to each; and, according to the direction comparison result, determine a target sound zone from the wake-up sound zone corresponding to the wake-up identification and the query sound zone corresponding to the query identification, and upload the audio collected by the target sound zone. The cloud server is configured to determine target audio from the audio, and recognize the target audio to obtain the query content.
The present disclosure does not limit the number of vehicle-end devices or cloud servers; in practical applications, multiple vehicle-end devices and multiple cloud servers may be included.
Further, the multi-sound-zone identification system further includes at least one output device, configured to receive a query result returned by the cloud server based on the query content and present the query result.
Here, the output device includes, but is not limited to, a navigation device, a sound device, a display device, and the like.
The multi-tone area identification system of the embodiment of the disclosure can improve the accuracy and efficiency of identifying the inquiry content in each scene.
In the technical solution of the present disclosure, the collection, storage, and application of the personal information of the users involved comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, a computer program product, a vehicle, and a server according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The calculation unit 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An Input/Output (I/O) interface 905 is also connected to bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 901 performs the respective methods and processes described above, such as the multi-sound-zone identification method. For example, in some embodiments, the multi-sound-zone identification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the multi-sound-zone identification method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the multi-sound-zone identification method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a Display device (e.g., a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A multi-tone zone identification method is applied to vehicle-end equipment and comprises the following steps:
respectively determining orientation information corresponding to an awakening identification and an inquiry identification, wherein the inquiry identification is used for instructing retrieval of the received content to be queried; the direction of the awakening identification is the direction of the voice triggering the awakening identification; and the direction of the inquiry identification is the direction of the voice triggering the inquiry identification;
determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification;
according to the direction comparison result, determining a target sound area from the awakening sound area corresponding to the awakening identification and the inquiry sound area corresponding to the inquiry identification, and uploading the audio collected by the target sound area;
wherein, according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification comprises:
determining the awakening sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located at the same direction;
and under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions, determining the inquiry sound zone as the target sound zone.
2. The method of claim 1, wherein the uploading the target soundzone captured audio comprises:
in case the orientation comparison result indicates that the wake-up identification is in the same orientation as the query identification,
and uploading the audio collected by the awakening sound zone from a preset time point before the awakening point to a voice tail point.
3. The method of claim 1, wherein the uploading the target soundzone captured audio comprises:
in case the orientation comparison result indicates that the wake-up identification and the query identification are located in different orientations,
and uploading the audio collected by the query sound area from the voice starting point to the voice tail point.
4. The method of claim 3, further comprising:
and sending a reset request, wherein the reset request is used for indicating that the audio from the voice starting point to the voice ending point is recognized.
5. A multi-tone-zone identification method is applied to a cloud server and comprises the following steps:
receiving audio uploaded by vehicle-end equipment, wherein the audio comprises audio collected by a target sound zone; the target sound zone is determined from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification, and the inquiry identification is used for indicating and searching the received content to be inquired; the direction of the awakening recognition is the direction of the voice triggering the awakening recognition; the direction of inquiry recognition is the direction of the voice triggering inquiry recognition;
determining target audio from the audio;
identifying the target audio to obtain inquiry content;
wherein, determining the target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification comprises:
determining the awakening sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located at the same direction;
and under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions, determining the inquiry sound zone as the target sound zone.
6. The method of claim 5, wherein the determining a target audio from the audio comprises:
determining the audio as the target audio under the condition that a reset request is not received, wherein the reset request is sent by the vehicle-end equipment under the condition that the awakening identification and the inquiry identification are determined to be positioned in different directions;
wherein, the identifying the target audio to obtain the query content comprises:
identifying the target audio to obtain an identification result;
and taking the content behind the target wake-up word in the recognition result as the query content.
7. The method of claim 5, wherein the determining a target audio from the audio comprises:
determining audio frequency from a voice starting point to a voice tail point in the audio frequency as the target audio frequency under the condition of receiving a reset request, wherein the reset request is sent by the vehicle-end equipment under the condition of determining that the awakening identification and the inquiry identification are positioned in different directions;
wherein, the identifying the target audio to obtain the query content comprises:
identifying the target audio to obtain an identification result;
and taking the whole content of the identification result as the inquiry content.
8. A multi-tone region identification method comprises the following steps:
the vehicle-end equipment respectively determines orientation information corresponding to the awakening identification and the inquiry identification; determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification; according to the direction comparison result, determining a target sound area from the awakening sound area corresponding to the awakening identification and the inquiry sound area corresponding to the inquiry identification, uploading the audio collected by the target sound area, wherein the inquiry identification is used for indicating and searching the received content to be inquired; the direction of the awakening recognition is the direction of the voice triggering the awakening recognition; the direction of inquiry recognition is the direction of the voice triggering inquiry recognition; wherein, according to the direction comparison result, determining a target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification comprises: determining the awakening sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located at the same direction; determining the inquiry sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions;
the cloud server determines a target audio from the audio; and identifying the target audio to obtain inquiry content.
9. A multi-tone zone recognition device is applied to vehicle-end equipment and comprises:
the first determining module is used for respectively determining orientation information corresponding to the awakening identification and the inquiry identification, wherein the inquiry identification is used for instructing retrieval of the received content to be queried; the direction of the awakening identification is the direction of the voice triggering the awakening identification; and the direction of the inquiry identification is the direction of the voice triggering the inquiry identification;
the second determination module is used for determining a direction comparison result of the awakening identification and the inquiry identification according to the direction information respectively corresponding to the awakening identification and the inquiry identification;
the control module is used for determining a target sound area from the awakening sound area corresponding to the awakening identification and the inquiry sound area corresponding to the inquiry identification according to the direction comparison result and uploading the audio collected by the target sound area;
the control module is configured to:
determining the awakening sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located at the same direction;
and under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are positioned in different directions, determining the inquiry sound zone as the target sound zone.
10. The apparatus of claim 9, wherein the control module is further configured to:
in case the orientation comparison result indicates that the wake-up identification and the query identification are located in the same orientation,
and uploading the audio collected by the awakening sound zone from a preset time point before the awakening point to a voice tail point.
11. The apparatus of claim 9, wherein the control module is further configured to:
in case the orientation comparison result indicates that the wake-up identification and the query identification are located in different orientations,
and uploading the audio collected by the query sound area from the voice starting point to the voice tail point.
12. The apparatus of claim 11, further comprising:
the device comprises a sending module and a resetting module, wherein the sending module is used for sending a resetting request, and the resetting request is used for indicating and identifying the audio frequency from the voice starting point to the voice tail point.
13. A multi-tone zone recognition device, applied to a cloud server, comprising:
the receiving module is used for receiving audio uploaded by the vehicle-end equipment, wherein the audio comprises audio collected by a target sound zone; the target sound zone is determined by the vehicle-end equipment from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result of the awakening identification and the inquiry identification; the inquiry identification is used for instructing retrieval of the received content to be queried; the direction of the awakening identification is the direction of the voice triggering the awakening identification; the direction of the inquiry identification is the direction of the voice triggering the inquiry identification; wherein, determining the target sound zone from the awakening sound zone corresponding to the awakening identification and the inquiry sound zone corresponding to the inquiry identification according to the direction comparison result comprises: determining the awakening sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located in the same direction; and determining the inquiry sound zone as the target sound zone under the condition that the direction comparison result represents that the awakening identification and the inquiry identification are located in different directions;
a third determining module, configured to determine target audio from the audio;
and a recognition module, configured to recognize the target audio to obtain query content.
14. The apparatus of claim 13, wherein the third determining module is configured to:
determine the audio as the target audio when no reset request has been received, wherein the reset request is sent by the vehicle-end device when it is determined that the wake-up recognition and the query recognition are located in different directions;
and wherein the recognition module is configured to:
recognize the target audio to obtain a recognition result;
and take the content after the target wake-up word in the recognition result as the query content.
15. The apparatus of claim 13, wherein the third determining module is configured to:
determine the audio from the voice starting point to the voice end point in the audio as the target audio when a reset request has been received, wherein the reset request is sent by the vehicle-end device when the wake-up recognition and the query recognition are located in different directions;
and wherein the recognition module is configured to:
recognize the target audio to obtain a recognition result;
and take the entire content of the recognition result as the query content.
16. A multi-sound zone recognition system, comprising:
a vehicle-end device, configured to: determine direction information corresponding to a wake-up recognition and a query recognition respectively; determine a direction comparison result between the wake-up recognition and the query recognition according to the respective direction information; determine a target sound zone, according to the direction comparison result, from a wake-up sound zone corresponding to the wake-up recognition and a query sound zone corresponding to the query recognition; and upload the audio collected by the target sound zone; wherein the query recognition is used to instruct searching for received content to be queried; the direction of the wake-up recognition is the direction of the voice that triggered the wake-up recognition, and the direction of the query recognition is the direction of the voice that triggered the query recognition; and wherein determining the target sound zone from the wake-up sound zone and the query sound zone according to the direction comparison result comprises: determining the wake-up sound zone as the target sound zone when the direction comparison result indicates that the wake-up recognition and the query recognition are located in the same direction; and determining the query sound zone as the target sound zone when the direction comparison result indicates that the wake-up recognition and the query recognition are located in different directions; and
a cloud server, configured to determine target audio from the audio, and recognize the target audio to obtain query content.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 5-7.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of claim 8.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
22. A vehicle, comprising the electronic device of claim 17.
23. A server, comprising the electronic device of claim 18.
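The zone-selection and query-extraction logic described in claims 13 through 16 can be sketched as follows. All identifiers (function names, zone labels, the wake word) are illustrative assumptions for this sketch, not terms from the patent itself:

```python
# Hypothetical sketch of the decision flow in claims 13-16.
# Zone labels, function names, and the wake word are illustrative only.

def choose_target_zone(wake_direction, query_direction, wake_zone, query_zone):
    """Pick the sound zone whose audio the vehicle-end device uploads.

    Per the claims: if the wake-up recognition and the query recognition
    come from the same direction, keep the wake-up sound zone; otherwise
    switch to the query sound zone, and the vehicle-end device also sends
    a reset request to the cloud server.
    """
    same_direction = (wake_direction == query_direction)
    target_zone = wake_zone if same_direction else query_zone
    send_reset = not same_direction
    return target_zone, send_reset


def extract_query_content(recognition_text, wake_word, reset_received):
    """Derive the query content from the recognition result (claims 14 and 15).

    Without a reset request, the uploaded audio still contains the wake-up
    word, so only the text after the target wake-up word is the query;
    after a reset, the entire recognition result is the query.
    """
    if reset_received:
        return recognition_text
    idx = recognition_text.find(wake_word)
    if idx < 0:
        return recognition_text
    return recognition_text[idx + len(wake_word):].strip()
```

This mirrors the two branches of the claims: the same-direction case reuses the wake-up zone's buffered audio (so the wake word must be stripped from the recognition result), while the different-direction case restarts collection in the query zone, making the whole result the query.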
CN202210144215.5A 2022-02-17 2022-02-17 Multi-sound zone identification method, device, equipment and storage medium Active CN114446300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210144215.5A CN114446300B (en) 2022-02-17 2022-02-17 Multi-sound zone identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210144215.5A CN114446300B (en) 2022-02-17 2022-02-17 Multi-sound zone identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114446300A CN114446300A (en) 2022-05-06
CN114446300B true CN114446300B (en) 2023-03-24

Family

ID=81374055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210144215.5A Active CN114446300B (en) 2022-02-17 2022-02-17 Multi-sound zone identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114446300B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6520878B2 (en) * 2016-09-21 2019-05-29 トヨタ自動車株式会社 Voice acquisition system and voice acquisition method
CN107146614B (en) * 2017-04-10 2020-11-06 北京猎户星空科技有限公司 Voice signal processing method and device and electronic equipment
CN107277699A (en) * 2017-07-21 2017-10-20 歌尔科技有限公司 A kind of sound pick-up method and device
CN110310633B (en) * 2019-05-23 2022-05-20 阿波罗智联(北京)科技有限公司 Multi-vocal-zone voice recognition method, terminal device and storage medium

Also Published As

Publication number Publication date
CN114446300A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
CN107122866B (en) Method, equipment and storage medium for predicting order cancelling behavior of passenger
CN109961792B (en) Method and apparatus for recognizing speech
CN104535071A (en) Voice navigation method and device
CN113479192B (en) Vehicle parking-out method, vehicle parking-in method, device, equipment and storage medium
KR20210098880A (en) Voice processing method, apparatus, device and storage medium for vehicle-mounted device
CN113327344B (en) Fusion positioning method, device, equipment, storage medium and program product
CN111402877A (en) Noise reduction method, device, equipment and medium based on vehicle-mounted multi-sound zone
CN111711921A (en) Method, device and equipment for searching vehicle and storage medium
CN112698872A (en) Voice data processing method, device, equipment and storage medium
CN117334072B (en) Bus arrival time prediction method and device
CN114446300B (en) Multi-sound zone identification method, device, equipment and storage medium
CN112527235A (en) Voice playing method, device, equipment and storage medium
CN111916079A (en) Voice response method, system, equipment and storage medium of electronic equipment
AU2017435621B2 (en) Voice information processing method and device, and terminal
CN112509567B (en) Method, apparatus, device, storage medium and program product for processing voice data
CN112669839B (en) Voice interaction method, device, equipment and storage medium
CN115705844A (en) Voice interaction configuration method, electronic device and computer readable medium
CN113899359A (en) Navigation method, device, equipment and storage medium
CN111883126A (en) Data processing mode selection method and device and electronic equipment
CN111460373A (en) Arrival reminding method and device, storage medium and terminal equipment
CN113868532B (en) Location recommendation method and device, electronic equipment and storage medium
CN112786055A (en) Resource mounting method, device, equipment, storage medium and computer program product
CN113643696A (en) Voice processing method, device, equipment, storage medium and program
CN116682424A (en) Vehicle-mounted voice interaction method and device, electronic equipment and storage medium
CN117238284A (en) Voice processing method and device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant