CN113113005A - Voice data processing method and device, computer equipment and storage medium - Google Patents

Voice data processing method and device, computer equipment and storage medium

Info

Publication number
CN113113005A
CN113113005A
Authority
CN
China
Prior art keywords
voice service
current
user
target
service instance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110295407.1A
Other languages
Chinese (zh)
Inventor
张磊嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd filed Critical Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN202110295407.1A priority Critical patent/CN113113005A/en
Publication of CN113113005A publication Critical patent/CN113113005A/en
Pending legal-status Critical Current

Classifications

    • G10L15/22 (Speech recognition): Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L17/22 (Speaker identification or verification): Interactive procedures; man-machine interfaces
    • H04L67/52 (Network services): Network services specially adapted for the location of the user terminal
    • G10L2015/223: Execution procedure of a spoken command
    • G10L2015/226: Procedures used during a speech recognition process using non-speech characteristics
    • G10L2015/227: Procedures used during a speech recognition process using non-speech characteristics of the speaker; human-factor methodology

Abstract

The application relates to a voice data processing method and device, computer equipment and a storage medium. The method comprises: receiving a current user instruction corresponding to a current user, the instruction comprising a current area position and a target area position; acquiring a current voice service instance corresponding to the current area position, the instance comprising a current voice service software unit and a corresponding current voice service hardware unit; acquiring a target voice service hardware unit corresponding to the target area position; switching the current voice service hardware unit controlled by the current voice service instance to the target voice service hardware unit to obtain a target voice service instance; receiving a target user voiceprint feature corresponding to the target user at the target area position; binding the target user voiceprint feature with the target voice service instance; and providing voice service for the target user through the target voice service instance. The method can improve the accuracy of identifying the voice service object.

Description

Voice data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing voice data, a computer device, and a storage medium.
Background
At present, in-vehicle intelligent voice services operate in a single-task mode: only one person can be served at a time, and per-user customization depends solely on the login account. The hardware likewise relies on the most basic speech architecture (e.g., one display and one set of mono- or multi-channel microphones). If several people are in a vehicle at the same time, a single in-vehicle device can neither serve them simultaneously nor adapt precisely to the characteristics (such as gender and age) of different occupants to provide correspondingly tailored voice services.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a voice data processing method, apparatus, computer device and storage medium in which multiple voice service instances can each be independently bound to regional hardware and a specified user's voiceprint, so that voice services can be provided to multiple users without mutual interference.
A method of processing speech data, the method comprising:
receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position;
acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
acquiring a target voice service hardware unit corresponding to the position of a target area;
switching a current voice service hardware unit controlled by a current voice service instance into a target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises a current voice service software unit and a target voice service hardware unit;
and receiving the voiceprint characteristics of the target user corresponding to the target user at the target area position, binding the voiceprint characteristics of the target user with the target voice service instance, and providing voice service for the target user through the target voice service instance.
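The claimed steps can be illustrated with a minimal Python sketch (all class, field and function names here are hypothetical illustrations, not taken from the patent): a voice service instance pairs a software unit with the hardware unit it controls; switching re-points the instance at the target seat's hardware, and binding records the target user's voiceprint so the instance serves only that user.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VoiceServiceInstance:
    """Pairs a software unit with the hardware unit it controls."""
    software_unit: str                 # e.g. the ASR/NLU engine serving one seat
    hardware_unit: str                 # e.g. mic/speaker/display at one seat
    voiceprint: Optional[str] = None   # voiceprint the instance is bound to

def switch_and_bind(instance: VoiceServiceInstance,
                    target_hardware: str,
                    target_voiceprint: str) -> VoiceServiceInstance:
    """Re-point the instance at the target seat's hardware, then bind
    it to the target user's voiceprint so it serves only that user."""
    instance.hardware_unit = target_hardware   # switch the hardware unit
    instance.voiceprint = target_voiceprint    # bind the voiceprint
    return instance

# A current instance serving the driver's seat...
inst = VoiceServiceInstance("sw-unit-1", "hw-driver-seat")
# ...is switched to the rear-left seat and bound to that passenger.
inst = switch_and_bind(inst, "hw-rear-left", "vp-passenger-3")
```

Note that the software unit is retained across the switch, which is the point of the claim: only the controlled hardware and the bound voiceprint change.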
In one embodiment, before receiving the current user instruction corresponding to the current user, the method includes: when it is detected that the current vehicle machine corresponding to the current vehicle is started, acquiring a default voice service instance, wherein the default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit; receiving a super user voiceprint feature corresponding to a super user; binding the super user voiceprint feature with the default voice service instance; and providing voice service for the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
In one embodiment, receiving a current user instruction corresponding to a current user includes: receiving a first super user instruction corresponding to a super user, wherein the first super user instruction comprises the current area position where the current user is located; acquiring a current voice service hardware unit corresponding to the current area position; establishing an association relationship between the default voice service software unit controlled by the default voice service instance and the current voice service hardware unit according to the first super user instruction to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user; binding the current user voiceprint feature with the current voice service instance; providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In one embodiment, receiving a current user instruction corresponding to a current user includes: receiving a second super user instruction corresponding to a super user, wherein the second super user instruction comprises the current area position where the current user is located; acquiring a current voice service hardware unit corresponding to the current area position; adding a new voice service software unit according to the second super user instruction; establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user; binding the current user voiceprint feature with the current voice service instance; providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In one embodiment, the voice data processing method further includes: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user sharing instruction corresponding to the target user, wherein the current user sharing instruction comprises the area position where the shared user is located and the current user sharing content; acquiring a corresponding first voice service instance according to the shared user's area position, wherein the first voice service instance comprises a first voice service software unit and a corresponding first voice service hardware unit; copying the current user sharing content to the first voice service instance; and displaying the current user sharing content to the shared user through the first voice service hardware unit controlled by the first voice service instance.
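A minimal sketch of this sharing embodiment (the function name and the dict-based representation of an instance are assumptions for illustration): the shared content is copied into the first voice service instance at the shared user's seat, and that seat's hardware unit then presents it.

```python
def share_content(instances: dict, shared_seat: str, content: str) -> str:
    """Copy content to the instance at shared_seat and 'display' it."""
    first_instance = instances[shared_seat]      # first voice service instance
    first_instance["shared_content"] = content   # copy the shared content
    # In a real system this would render on the seat's display screen;
    # here we just report which hardware unit shows what.
    return f"display@{first_instance['hardware']}: {content}"

instances = {"rear-right": {"hardware": "hw-rear-right"}}
shown = share_content(instances, "rear-right", "navigation route")
```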
In one embodiment, the voice data processing method further includes: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user statement corresponding to the target user; performing voice recognition on the current user statement to obtain the current user field (domain) corresponding to it; determining a target feedback statement corresponding to the current user statement according to the current user field; and responding to the target user with the target feedback statement through the target voice service hardware unit.
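The field-then-feedback step can be sketched as follows; simple keyword matching stands in for real speech recognition and domain classification, and the domain table and reply strings are invented for illustration.

```python
# Toy domain lexicon and canned feedback statements (illustrative only).
DOMAIN_KEYWORDS = {"music": ["play", "song"], "navigation": ["route", "drive"]}
FEEDBACK = {"music": "Playing your playlist.",
            "navigation": "Starting route guidance.",
            "unknown": "Sorry, could you rephrase that?"}

def classify_domain(sentence: str) -> str:
    """Determine the user field (domain) of a recognized statement."""
    words = sentence.lower().split()
    for domain, keys in DOMAIN_KEYWORDS.items():
        if any(k in words for k in keys):
            return domain
    return "unknown"

def feedback_for(sentence: str) -> str:
    """Pick the target feedback statement for the user statement."""
    return FEEDBACK[classify_domain(sentence)]

reply = feedback_for("please play a song")
```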
In one embodiment, the voice data processing method further includes: receiving a plurality of user input sentences, carrying out voiceprint recognition on each user input sentence to obtain user voiceprint characteristics corresponding to each user input sentence, determining a user voice service instance corresponding to each user input sentence according to each user voiceprint characteristic, and responding to the corresponding user input sentence through the user voice service instance.
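Routing concurrent utterances by voiceprint can be sketched like this (the stub recognizer and all names are hypothetical): each recognized voiceprint feature looks up the voice service instance bound to it, which then responds to that sentence.

```python
def recognize_voiceprint(utterance: dict) -> str:
    # Stand-in for real voiceprint recognition on the audio signal.
    return utterance["speaker_vp"]

def dispatch(utterances: list, bindings: dict) -> list:
    """Route each utterance to the instance bound to its voiceprint."""
    routed = []
    for utt in utterances:
        vp = recognize_voiceprint(utt)
        routed.append((bindings[vp], utt["text"]))
    return routed

# Two users speak at once; each sentence goes to its bound instance.
bindings = {"vp-driver": "instance-0", "vp-rear-left": "instance-1"}
routed = dispatch(
    [{"speaker_vp": "vp-rear-left", "text": "open the window"},
     {"speaker_vp": "vp-driver", "text": "navigate home"}],
    bindings)
```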
A speech data processing apparatus, the apparatus comprising:
the user instruction receiving module is used for receiving a current user instruction corresponding to a current user, and the current user instruction comprises a current area position and a target area position;
the current voice service instance acquisition module is used for acquiring a current voice service instance corresponding to the current area position, and the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
the voice service hardware unit acquisition module is used for acquiring a target voice service hardware unit corresponding to the position of a target area;
the target voice service instance generation module is used for switching a current voice service hardware unit controlled by a current voice service instance into a target voice service hardware unit to obtain a target voice service instance, and the target voice service instance comprises a current voice service software unit and a target voice service hardware unit;
and the target voice service instance processing module is used for receiving the voiceprint characteristics of the target user corresponding to the target user where the target area is located, binding the voiceprint characteristics of the target user with the target voice service instance, and providing voice service for the target user through the target voice service instance.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position;
acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
acquiring a target voice service hardware unit corresponding to the position of a target area;
switching a current voice service hardware unit controlled by a current voice service instance into a target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises a current voice service software unit and a target voice service hardware unit;
and receiving the voiceprint characteristics of the target user corresponding to the target user at the target area position, binding the voiceprint characteristics of the target user with the target voice service instance, and providing voice service for the target user through the target voice service instance.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position;
acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
acquiring a target voice service hardware unit corresponding to the position of a target area;
switching a current voice service hardware unit controlled by a current voice service instance into a target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises a current voice service software unit and a target voice service hardware unit;
and receiving the voiceprint characteristics of the target user corresponding to the target user at the target area position, binding the voiceprint characteristics of the target user with the target voice service instance, and providing voice service for the target user through the target voice service instance.
According to the voice data processing method, apparatus, computer device and storage medium, the current voice service hardware unit controlled by the current user's voice service instance is replaced: the target voice service hardware unit at the target user's area position is bound in its place, together with the target user's voiceprint, so that the modified voice service instance can provide voice service to the target user independently, unaffected by the current user. In this way, multiple voice service instances can each be independently bound to regional hardware and a specified user's voiceprint, and voice services can be provided to multiple users without mutual interference. The voice service object can be switched simply by changing the regional hardware a voice service instance controls, the service object can be accurately identified, and the accuracy of identifying the voice service object is improved.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a method for processing speech data;
FIG. 2 is a flow diagram illustrating a method for processing speech data in one embodiment;
FIG. 3 is a flow diagram illustrating a method for processing voice data in one embodiment;
FIG. 4 is a flowchart illustrating a current user instruction receiving step in one embodiment;
FIG. 5 is a flowchart illustrating a current user instruction receiving step in one embodiment;
FIG. 6 is a flow diagram illustrating a method for processing speech data in one embodiment;
FIG. 7 is a flowchart illustrating a method of processing voice data according to one embodiment;
FIG. 8 is a flow diagram illustrating a method for processing speech data in one embodiment;
FIG. 9 is a block diagram showing the structure of a speech data processing apparatus according to an embodiment;
FIG. 10 is a diagram showing an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The voice data processing method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented by an independent server or by a server cluster formed of a plurality of servers.
Specifically, the terminal 102 receives a current user instruction corresponding to a current user, where the current user instruction includes a current area position and a target area position, and sends it to the server 104 over the network. The server 104 obtains a current voice service instance corresponding to the current area position, comprising a current voice service software unit and a corresponding current voice service hardware unit; obtains a target voice service hardware unit corresponding to the target area position; switches the current voice service hardware unit controlled by the current voice service instance to the target voice service hardware unit to obtain a target voice service instance, comprising the current voice service software unit and the target voice service hardware unit; receives a target user voiceprint feature corresponding to the target user at the target area position; binds the target user voiceprint feature with the target voice service instance; and provides voice service for the target user through the target voice service instance.
In one embodiment, as shown in fig. 2, a method for processing voice data is provided, which is described by taking the method as an example applied to the terminal in fig. 1, and includes the following steps:
step 202, receiving a current user instruction corresponding to a current user, where the current user instruction includes a current area position and a target area position.
The terminal may be the vehicle-mounted terminal of the current vehicle. The current vehicle comprises a front-row driving position, a front-row co-driving (passenger) position, a rear-row left position and a rear-row right position; each area position of the current vehicle is provided with a corresponding voice service hardware unit, and each voice service hardware unit comprises the microphone, loudspeaker and display screen at that area position.
The current user is the user currently speaking in the current vehicle, who can interact by voice with the corresponding voice service hardware unit. The current voice service hardware unit corresponding to the current area position where the current user is located receives the current user instruction, which comprises the current area position and the target area position. The current area position refers to the seat where the current user is located in the current vehicle, and the target area position refers to a designated area position in the current vehicle.
The current user can perform voice interaction with the current voice service hardware unit corresponding to the current area position; because the current user's voiceprint feature is bound with that hardware unit in advance, the current voice service hardware unit recognizes only the current user's voice.
Step 204, obtaining a current voice service instance corresponding to the current area position, where the current voice service instance includes a current voice service software unit and a corresponding current voice service hardware unit.
A voice service instance is composed of a voice service hardware unit and a voice service software unit. The voice service hardware unit is the hardware device of the voice service, and the voice service software unit is the software entity that processes the audio data collected by the voice service hardware unit and provides the voice service.
The corresponding voice service instance exists in each area position of the current vehicle, so that the corresponding current voice service instance can be obtained according to the current area position, and the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit.
Step 206, a target voice service hardware unit corresponding to the target area position is obtained.
A target voice service hardware unit corresponding to the area position specified by the current user is acquired.
And step 208, switching the current voice service hardware unit controlled by the current voice service instance into a target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises a current voice service software unit and a target voice service hardware unit.
The area served by the voice service instance is switched, and voice service is provided to the new service area through the instance. Specifically, the voice service hardware unit controlled by the voice service instance is changed, and the user in the new service area is bound so that the instance serves that user: the current voice service hardware unit controlled by the current voice service instance is replaced with the target voice service hardware unit, that is, the current voice service software unit is bound with the target voice service hardware unit, so as to obtain the target voice service instance. Voice service is then performed, through the target voice service instance, for the target user at the area position of the target voice service hardware unit, where the target user is the user at that area position.
Step 210, receiving a voiceprint feature of a target user corresponding to a target user where the target area is located, binding the voiceprint feature of the target user with a target voice service instance, and providing voice service for the target user through the target voice service instance.
After the target voice service instance is obtained, it must be bound with the target user; once bound, the target voice service instance serves only the target user. Specifically, the target user's audio information is collected through the microphone in the target voice service hardware unit at the target area position, and voiceprint recognition is performed on it by the current voice service software unit associated with that hardware unit to obtain the target user voiceprint feature. The target user voiceprint feature is bound with the target voice service instance, meaning the instance recognizes only the user matching that voiceprint feature. Finally, voice service is provided to the target user through the target voice service instance: user statements uttered by the target user are collected through the target voice service hardware unit controlled by the instance, voice recognition is performed on them by the current voice service software unit controlled by the instance, the corresponding user feedback statements are determined, and the response is delivered through the target voice service hardware unit.
In the voice data processing method, by changing the current voice service hardware unit controlled by the current user's voice service instance, the target voice service hardware unit at the target user's area position is bound in its place, together with the target user's voiceprint, so that the modified voice service instance can provide voice service to the target user independently, unaffected by the current user. In this way, multiple voice service instances can each be independently bound to regional hardware and a specified user's voiceprint, and voice services can be provided to multiple users without mutual interference. The voice service object can be switched simply by changing the regional hardware a voice service instance controls, the service object can be accurately identified, and the accuracy of identifying the voice service object is improved.
In one embodiment, as shown in fig. 3, before receiving the current user instruction corresponding to the current user, the method includes:
step 302, when detecting that a current vehicle machine corresponding to a current vehicle is started, acquiring a default voice service instance, where the default voice service instance includes a default voice service software unit and a corresponding default voice service hardware unit.
And 304, receiving the voiceprint characteristics of the super user corresponding to the super user, binding the voiceprint characteristics of the super user with a default voice service instance, and performing voice service on the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
The hardware equipment of each vehicle includes a vehicle machine (head unit), i.e., the in-vehicle infotainment system installed in the automobile, which functionally enables information communication between people and the vehicle, and between the vehicle and the outside world (vehicle to vehicle). First, the state of the current vehicle machine corresponding to the current vehicle is detected; when the current vehicle machine is started, indicating that the entire vehicle machine of the current vehicle has been started, a default voice service instance is acquired. The default voice service instance may have been downloaded from the server to the vehicle-mounted terminal in advance and then obtained locally, or it may be downloaded from the server in real time. The default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit.
The super user is the user with the highest authority in the current vehicle, which is usually the primary driver; that is, the driver is the super user of the current vehicle. Super user audio information corresponding to the super user is received, and voiceprint recognition is performed on it by the default voice service software unit to obtain the super user voiceprint feature. The super user voiceprint feature is bound with the default voice service instance, so that the default voice service instance provides voice service exclusively to the super user; that is, the default voice service instance recognizes only the driver's voice.
If the current vehicle has a plurality of users, the user whose voice is closest to the default voice service hardware unit is determined to be the super user. That is, the default voice service software unit in the default voice service instance may determine the closest user voice according to the decibel levels of the several user voices.
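The nearest-user heuristic described above can be sketched in a few lines (the function name and the measurement format are assumptions): among concurrent voices, the one with the highest decibel level at the default hardware unit is taken to be the super user, typically the driver.

```python
def pick_super_user(voices: list) -> str:
    """Return the user whose voice is loudest (closest) at the
    default voice service hardware unit."""
    return max(voices, key=lambda v: v["decibels"])["user"]

super_user = pick_super_user([
    {"user": "driver", "decibels": 68.0},
    {"user": "rear-passenger", "decibels": 55.5},
])
```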
In one embodiment, as shown in fig. 4, receiving a current user instruction corresponding to a current user includes:
step 402, receiving a first super user instruction corresponding to a super user, where the first super user instruction includes a current area location where a current user is located.
Step 404, obtaining a current voice service hardware unit corresponding to the current area position.
And step 406, establishing an association relationship between the default voice service software unit controlled by the default voice service instance and the current voice service hardware unit according to the first super user instruction, so as to obtain the current voice service instance.
The first super user instruction is an instruction sent by the super user to bind the current voice service instance to the current user, so that the instance recognizes only the current user's voice. Specifically, the default voice service hardware unit controlled by the default voice service instance acquires the first super user instruction corresponding to the super user, which comprises the current area position where the current user is located.
Further, the current voice service hardware unit at the current area position, that is, the voice service hardware device at that position, is obtained, and the default voice service software unit controlled by the default voice service instance is bound with it according to the first super user instruction, so as to obtain the current voice service instance.
And step 408, receiving the voiceprint feature of the current user corresponding to the current user, binding the voiceprint feature of the current user with the current voice service instance, and providing voice service for the current user through the current voice service instance.
Step 410, a current user instruction corresponding to a current user is collected through a current voice service hardware unit controlled by a current voice service instance.
Specifically, the current voice service hardware unit in the current voice service instance receives the current user audio information corresponding to the current user, and voiceprint recognition is performed on that audio information by the current voice service software unit to obtain the current user's voiceprint feature. The voiceprint feature can then be bound to the current voice service instance; once bound, the current voice service instance provides voice service only for the current user and is not affected by other users. The current user instruction corresponding to the current user can therefore be collected through the current voice service hardware unit controlled by the current voice service instance.
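Steps 402 through 410 can be sketched as follows; the class, the area-to-hardware table, and all method names are hypothetical stand-ins for the patent's units, assumed only for illustration:

```python
# Sketch of the first variant: rebind the default software unit to the
# hardware unit of the requested area (steps 402-406), then bind the
# current user's voiceprint so the instance serves only that user
# (step 408). All names are illustrative assumptions.

class VoiceServiceInstance:
    def __init__(self, software_unit, hardware_unit):
        self.software_unit = software_unit
        self.hardware_unit = hardware_unit
        self.bound_voiceprint = None

    def bind_voiceprint(self, voiceprint):
        # After binding, only this user's voice is recognized and served.
        self.bound_voiceprint = voiceprint

    def accepts(self, voiceprint):
        return self.bound_voiceprint is None or voiceprint == self.bound_voiceprint

# Hypothetical mapping from area location to minimum hardware unit
HARDWARE_BY_AREA = {"front": "hw-front",
                    "rear-left": "hw-rear-left",
                    "rear-right": "hw-rear-right"}

def handle_first_super_instruction(default_instance, area):
    hw = HARDWARE_BY_AREA[area]                                      # step 404
    return VoiceServiceInstance(default_instance.software_unit, hw)  # step 406
```

In this variant the default software unit is reused; the alternative embodiment below spawns a new software unit instead.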
In one embodiment, as shown in fig. 5, receiving a current user instruction corresponding to a current user includes:
step 502, receiving a second super user instruction corresponding to a super user, where the second super user instruction includes a current area location where a current user is located.
Step 504, a current voice service hardware unit corresponding to the current area position is obtained.
Step 506, adding a new voice service software unit according to the second super user instruction, and establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain the current voice service instance.
The second super user instruction is an instruction sent by the super user for binding the current voice service instance to the current user, so that the current voice service instance recognizes only the current user's voice. Specifically, the second super user instruction corresponding to the super user is acquired through the default voice service hardware unit controlled by the default voice service instance, where the second super user instruction includes the current area location where the current user is located.
Further, the current voice service hardware unit at the current area location, that is, the voice service hardware device located in that area, is obtained; a new voice service software unit is added according to the second super user instruction, and the new voice service software unit is bound to the current voice service hardware unit, thereby obtaining the current voice service instance.
And step 508, receiving the voiceprint feature of the current user corresponding to the current user, binding the voiceprint feature of the current user with the current voice service instance, and providing voice service for the current user through the current voice service instance.
Step 510, a current voice service hardware unit controlled by a current voice service instance acquires a current user instruction corresponding to a current user.
Specifically, the current voice service hardware unit in the current voice service instance receives the current user audio information corresponding to the current user, and voiceprint recognition is performed on that audio information by the current voice service software unit to obtain the current user's voiceprint feature. The voiceprint feature can then be bound to the current voice service instance; once bound, the current voice service instance provides voice service only for the current user and is not affected by other users. The current user instruction corresponding to the current user can therefore be collected through the current voice service hardware unit controlled by the current voice service instance.
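The second variant (steps 502 through 510) differs in that a new voice service software unit is created rather than reusing the default one. A minimal sketch, with every name (`spawn_software_unit`, the `normal-N` code names, the returned dict) assumed for illustration only:

```python
# Sketch of the second variant: spawn a fresh voice service software unit
# and bind it to the requested area's hardware unit (steps 504-506).
# Names are hypothetical; "normal-N" echoes the code names used in the
# application scenario later in this description.
import itertools

_counter = itertools.count(1)

def spawn_software_unit():
    # Each new software unit gets a sequential code name: normal-1, normal-2, ...
    return f"normal-{next(_counter)}"

def handle_second_super_instruction(area, hardware_by_area):
    hw = hardware_by_area[area]   # step 504: hardware unit for the area
    sw = spawn_software_unit()    # step 506: add a new software unit
    return {"software_unit": sw, "hardware_unit": hw, "voiceprint": None}
```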
In one embodiment, as shown in fig. 6, the voice data processing method further includes:
step 602, a current user sharing instruction corresponding to a target user is received through a target voice service hardware unit controlled by a target voice service instance, where the current user sharing instruction includes a shared user area location where a shared user is located and a current user sharing content.
Step 604, a corresponding first voice service instance is obtained according to the shared user area location, and the first voice service instance includes a first voice service software unit and a corresponding first voice service hardware unit.
Step 606, the current user sharing content is copied to the first voice service instance, and the current user sharing content is displayed for the shared user through the first voice service hardware unit controlled by the first voice service instance.
After the target user's voiceprint feature is bound to the target voice service instance, the target voice service instance provides voice service only for the target user. Specifically, a current user sharing instruction corresponding to the target user can be received through the target voice service hardware unit controlled by the target voice service instance; the sharing instruction includes the shared user area location where the shared user is located and the current user sharing content. That is, the target user may share content with other users, and the current user sharing content refers to the content the target user wants to share.
Further, the first voice service instance at the shared user area location is obtained, where the first voice service instance includes a first voice service software unit and an associated first voice service hardware unit. Upon the target user's voice request, the target voice service instance interacts with the first voice service instance; specifically, the current user sharing content can be copied directly to the first voice service instance, so that it can be displayed by the first voice service hardware unit controlled by the first voice service instance. Sharing between voice service instances at different area locations is thus realized. For example, ordinary user B on the rear left is browsing a product and wants to share it with ordinary user C on the rear right; user B requests by voice to share it with the rear right, and the product then appears on the display of ordinary user C on the rear right.
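The sharing flow (steps 602 through 606) can be sketched as follows; the `Instance` class, the area registry, and `share_content` are illustrative assumptions rather than the patent's implementation:

```python
# Sketch of content sharing between voice service instances: look up the
# instance serving the shared user's area (step 604), copy the sharing
# content into it, and display it there (step 606). Names are hypothetical.
import copy

class Instance:
    def __init__(self, area):
        self.area = area
        self.content = None

    def display(self):
        return f"[{self.area}] showing: {self.content}"

INSTANCES = {}  # hypothetical registry: area location -> Instance

def share_content(shared_content, shared_area):
    first = INSTANCES[shared_area]               # step 604: first voice service instance
    first.content = copy.deepcopy(shared_content)  # step 606: copy, not move
    return first.display()
```

The deep copy mirrors the description: the receiving instance gets an independent copy of the task, which the shared user can then operate on without affecting the original.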
In one embodiment, as shown in fig. 7, the voice data processing method further includes:
step 702, a current user statement corresponding to a target user is received by a target voice service hardware unit controlled by a target voice service instance.
Step 704, performing voice recognition on the current user statement to obtain a current user field corresponding to the current user statement.
Step 706, determining a target feedback statement corresponding to the current user statement according to the current user field.
Step 708, the target feedback statement is responded to the target user through the target voice service hardware unit.
The current statement is sent by the target user; since the target voice service instance provides voice service only for the target user, the current user statement corresponding to the target user is received through the target voice service hardware unit controlled by the target voice service instance. The statement may be captured through a microphone in the target voice service hardware unit, or entered by the target user through a screen in the target voice service hardware unit.
Further, voice recognition is performed on the current user statement to obtain the current user field, which refers to the knowledge domain the statement belongs to. For example, if the current user statement is "What is the traffic like on Beijing's Second Ring Road?", voice recognition yields the corresponding current user field: navigation.
After the current user field corresponding to the current user statement is obtained, the target feedback statement corresponding to the current user statement can be determined according to the current user field; the correspondence between each user field and its feedback statements can be established in advance, and the target feedback statement determined according to that correspondence. For example, if the current user field is navigation, the target feedback statement may be: "Planning the best route navigation for you." Finally, the target feedback statement is returned to the target user through the target voice service hardware unit.
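Steps 702 through 708 can be sketched as below. Simple keyword matching stands in for the real speech-recognition and domain-classification models, and the keyword tables and function names are assumptions for illustration:

```python
# Sketch of domain lookup and feedback: classify the recognized statement
# into a user field (step 704), then return the pre-established feedback
# statement for that field (step 706). Tables are hypothetical.

DOMAIN_KEYWORDS = {
    "navigation": ["traffic", "route", "road"],
    "media": ["play", "song", "cartoon"],
}
FEEDBACK_BY_DOMAIN = {
    "navigation": "Planning the best route for you.",
    "media": "Starting playback.",
}

def classify_domain(statement: str) -> str:
    lowered = statement.lower()
    for domain, words in DOMAIN_KEYWORDS.items():
        if any(w in lowered for w in words):
            return domain
    return "chitchat"

def feedback_for(statement: str) -> str:
    domain = classify_domain(statement)                              # step 704
    return FEEDBACK_BY_DOMAIN.get(domain, "Sorry, I didn't catch that.")  # step 706
```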
In one embodiment, as shown in fig. 8, the voice data processing method further includes:
at step 802, a plurality of user input sentences are received.
And step 804, performing voiceprint recognition on each user input statement to obtain a user voiceprint characteristic corresponding to each user input statement.
Step 806, determining a user voice service instance corresponding to each user input statement according to each user voiceprint feature.
Step 808, responding to the corresponding user input statement through the user voice service instance.
If a plurality of users of the current vehicle speak at the same time, a plurality of user input statements are received. Voiceprint recognition is performed on each user input statement to obtain the user voiceprint feature corresponding to it. Because each voice service instance is bound to a corresponding user voiceprint feature, the matching user voice service instance can be determined from the voiceprint feature of each input statement, and each user voice service instance then responds to its corresponding user input statement.
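The dispatch of concurrent utterances (steps 802 through 808) can be sketched as follows; exact string matching of voiceprint identifiers stands in for real voiceprint recognition, and all names are assumptions:

```python
# Sketch of voiceprint-based routing: each utterance is matched to the
# voice service instance whose bound voiceprint it carries (step 806);
# unmatched voiceprints are ignored. Names are hypothetical.

def dispatch(utterances, instances_by_voiceprint):
    """utterances: list of (voiceprint, statement) pairs.
    Returns (instance_name, statement) pairs for matched utterances."""
    responses = []
    for voiceprint, statement in utterances:
        instance = instances_by_voiceprint.get(voiceprint)  # step 806
        if instance is not None:
            responses.append((instance, statement))         # step 808
    return responses
```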
In a specific application scenario, after the vehicle machine of the current vehicle is started, it is in an initial state with a single voice service instance, code-named super; the whole vehicle is in a single-voice-service-unit state, and a unique voice service process runs in the vehicle machine. This voice service instance is authorized as the super instance and controls the voice service hardware unit group (at this point, the minimum voice service hardware units of every area of the vehicle: front row, rear left, and rear right) to receive voice from the whole vehicle, providing unified voice service to all users, though only a single user can be served at any given moment. (When a user initiates a voice request, the system determines the user's direction and calls the nearest minimum voice service hardware unit to interact with that user and deliver the service.)
Further, in the single voice service instance state, driver user A requests by voice to start the split mode: after voice wake-up, user A asks voice service instance super to serve him personally, and the voice service instance requires user A to perform voiceprint binding. User A thereby becomes super user A, with control authority over the in-vehicle voice service instance super. Once voiceprint binding is completed, voice service instance super is bound to user A's voice, recognizes only user A's voice, and is neither woken up by nor provides service to anyone else. Voice service instance super then asks super user A to specify the work area location (for example, front row, rear left, or rear right) for the new voice service instance to be generated, namely voice service instance normal-1.
Assuming super user A assigns the rear-left area, the vehicle machine system runs a new voice service instance, normal-1, which controls the minimum voice service hardware unit of the designated area (rear left) and mainly provides voice service for ordinary user B in that area. Voice service instance normal-1 requires ordinary user B to perform voiceprint binding, and thereafter provides voice service only for user B. After this process is completed, voice service instance super controls the minimum voice service hardware unit group formed by the two remaining minimum voice service hardware units to interact with super user A and provide services, while voice service instance normal-1 controls the minimum voice service hardware unit of the designated area (rear left) to interact with ordinary user B and provide services. Because there are multiple voice instances, with area hardware bound separately and user voiceprints specified, both users can be served independently and simultaneously without interfering with each other.
For example: voice service instance normal-1 is providing voice service for ordinary user B on the rear left, and ordinary user B then asks normal-1 to provide service for ordinary user C on the rear right. Voice service instance normal-1 starts switching its controlled voice service hardware unit: it returns control of the original minimum voice service hardware unit (rear left) to voice service instance super, takes control of the minimum voice service hardware unit on the rear right, initiates a user voiceprint binding request with ordinary user C, and begins to provide service.
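The hand-over just described can be sketched as follows; the dictionary layout and the function name `switch_hardware` are illustrative assumptions:

```python
# Sketch of switching the controlled hardware unit: the instance returns
# its old minimum hardware unit to the super instance, takes over the
# target area's unit, and clears its voiceprint binding so the new user
# can bind. All names are hypothetical.

def switch_hardware(instance, super_instance, target_hw):
    # Hand control of the old unit back to the super instance
    super_instance["hardware_units"].append(instance["hardware_unit"])
    # Take control of the target area's minimum voice service hardware unit
    if target_hw in super_instance["hardware_units"]:
        super_instance["hardware_units"].remove(target_hw)
    instance["hardware_unit"] = target_hw
    # New area means a new user: voiceprint must be rebound before serving
    instance["voiceprint"] = None
    return instance
```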
The scenario may be: a mother on the rear left wants to pick a cartoon for her son on the rear right. The operation is as follows: ordinary user B, the mother on the rear left, initiates a voice search for cartoons with voice service instance normal-1 serving her position; after choosing one, she asks normal-1 to play it for the child, ordinary user C, whose area is the rear right. Normal-1 asks the child whether to start playing; the child answers and confirms, at which point voice service instance normal-1 performs voiceprint binding with ordinary user C and begins to serve him.
Voice service instances can also share with one another; specifically, voice service instances at different area locations can share the voice service content currently in progress. For example, voice service instance normal-1 is serving ordinary user B on the rear left with voice service content task 1, while voice service instance normal-2 is providing voice service for ordinary user C on the rear right. If ordinary user B wants to share and synchronize task 1 to ordinary user C, user B makes the request by voice; voice service instance normal-1 then interacts with voice service instance normal-2 and copies task 1 to it, so that normal-2 holds a task 1 copy identical to task 1. Ordinary user C can then start operating on the voice service content, namely the task 1 copy.
The scenario may be that ordinary user B on the rear left is browsing a product, wants to share it with ordinary user C on the rear right, and requests by voice to share it with the rear right; the product then appears on the display of ordinary user C on the rear right.
In a specific embodiment, a method for processing voice data is provided, which specifically includes the following steps:
1. when detecting that a current vehicle machine corresponding to a current vehicle is started, acquiring a default voice service instance, wherein the default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit.
2. Receiving the voiceprint characteristics of the super user corresponding to the super user, binding the voiceprint characteristics of the super user with a default voice service instance, and performing voice service on the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
3. And receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position.
3-1-1, receiving a first super user instruction corresponding to the super user, wherein the first super user instruction comprises the current region position where the current user is located.
And 3-1-2, acquiring the current voice service hardware unit corresponding to the current area position.
And 3-1-3, establishing an association relationship between a default voice service software unit controlled by the default voice service instance and a current voice service hardware unit according to the first super user instruction to obtain the current voice service instance.
And 3-1-4, receiving the voiceprint characteristics of the current user corresponding to the current user, binding the voiceprint characteristics of the current user with the current voice service instance, and providing voice service for the current user through the current voice service instance.
And 3-1-5, acquiring a current user instruction corresponding to the current user through a current voice service hardware unit controlled by the current voice service instance.
And 3-2-1, receiving a second super user instruction corresponding to the super user, wherein the second super user instruction comprises the current region position where the current user is located.
And 3-2-2, acquiring the current voice service hardware unit corresponding to the current area position.
3-2-3, adding a new voice service software unit according to the instruction of the second super user, and establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance.
And 3-2-4, receiving the voiceprint characteristics of the current user corresponding to the current user, binding the voiceprint characteristics of the current user with the current voice service instance, and providing voice service for the current user through the current voice service instance.
And 3-2-5, acquiring a current user instruction corresponding to the current user through a current voice service hardware unit controlled by the current voice service instance.
4. And acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit.
5. And acquiring a target voice service hardware unit corresponding to the position of the target area.
6. And switching the current voice service hardware unit controlled by the current voice service instance into a target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises a current voice service software unit and a target voice service hardware unit.
7. And receiving the voiceprint characteristics of the target user corresponding to the target user at the target area position, binding the voiceprint characteristics of the target user with the target voice service instance, and providing voice service for the target user through the target voice service instance.
8. And receiving a current user sharing instruction corresponding to the target user through a target voice service hardware unit controlled by the target voice service instance, wherein the current user sharing instruction comprises a shared user area position where the shared user is located and the current user sharing content.
9. And acquiring a corresponding first voice service instance according to the position of the shared user area, wherein the first voice service instance comprises a first voice service software unit and a corresponding first voice service hardware unit.
10. The current user sharing content is copied to a first voice service instance, and the current user sharing content is displayed for the shared user through a first voice service hardware unit controlled by the first voice service instance.
11. And receiving the current user statement corresponding to the target user through the target voice service hardware unit controlled by the target voice service instance.
12. And carrying out voice recognition on the current user sentence to obtain the current user field corresponding to the current user sentence.
13. And determining a target feedback statement corresponding to the current user statement according to the current user field.
14. And responding the target feedback statement to the target user through the target voice service hardware unit.
15. A plurality of user input sentences is received.
16. And carrying out voiceprint recognition on each user input statement to obtain the user voiceprint characteristics corresponding to each user input statement.
17. And determining the user voice service instance corresponding to each user input statement according to each user voiceprint feature.
18. Responding to the corresponding user input sentence through the user voice service instance.
It should be understood that, although the steps in the above flowcharts are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 9, there is provided a voice data processing apparatus 900 comprising: a user instruction receiving module 902, a current voice service instance obtaining module 904, a voice service hardware unit obtaining module 906, a target voice service instance generating module 908, and a target voice service instance processing module 910, wherein:
a user instruction receiving module 902, configured to receive a current user instruction corresponding to a current user, where the current user instruction includes a current area position and a target area position.
A current voice service instance obtaining module 904, configured to obtain a current voice service instance corresponding to the current area location, where the current voice service instance includes a current voice service software unit and a corresponding current voice service hardware unit.
A voice service hardware unit obtaining module 906, configured to obtain a target voice service hardware unit corresponding to the target area location.
The target voice service instance generation module 908 is configured to switch a current voice service hardware unit controlled by a current voice service instance to a target voice service hardware unit to obtain a target voice service instance, where the target voice service instance includes a current voice service software unit and a target voice service hardware unit.
And the target voice service instance processing module 910 is configured to receive a voiceprint feature of a target user corresponding to a target user at a target area, bind the voiceprint feature of the target user with the target voice service instance, and provide a voice service for the target user through the target voice service instance.
In an embodiment, when detecting that a current vehicle machine corresponding to a current vehicle is started, the voice data processing apparatus 900 obtains a default voice service instance, where the default voice service instance includes a default voice service software unit and a corresponding default voice service hardware unit, receives a super user voiceprint feature corresponding to a super user, binds the super user voiceprint feature with the default voice service instance, and performs voice service for the super user through the default voice service instance, where the super user is the user with the highest authority in the current vehicle.
In one embodiment, the voice data processing apparatus 900 receives a first super user instruction corresponding to a super user, where the first super user instruction includes a current area location where a current user is located, acquires a current voice service hardware unit corresponding to the current area location, establishes an association relationship between a default voice service software unit controlled by a default voice service instance and the current voice service hardware unit according to the first super user instruction, obtains a current voice service instance, receives a voiceprint feature of the current user corresponding to the current user, binds the voiceprint feature of the current user with the current voice service instance, provides voice service for the current user through the current voice service instance, and collects a current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In an embodiment, the user instruction receiving module 902 receives a second super user instruction corresponding to a super user, where the second super user instruction includes a current area location where a current user is located, obtains a current voice service hardware unit corresponding to the current area location, adds a new voice service software unit according to the second super user instruction, establishes an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance, receives a current user voiceprint feature corresponding to the current user, binds the current user voiceprint feature with the current voice service instance, provides a voice service for the current user through the current voice service instance, and collects the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In one embodiment, the voice data processing apparatus 900 receives a current user sharing instruction corresponding to a target user through a target voice service hardware unit controlled by a target voice service instance, where the current user sharing instruction includes a shared user area location where a shared user is located and a current user sharing content, and obtains a corresponding first voice service instance according to the shared user area location, where the first voice service instance includes a first voice service software unit and a corresponding first voice service hardware unit, copies the current user sharing content to the first voice service instance, and displays the current user sharing content for the shared user through the first voice service hardware unit controlled by the first voice service instance.
In one embodiment, the voice data processing apparatus 900 receives a current user statement corresponding to a target user through a target voice service hardware unit controlled by a target voice service instance, performs voice recognition on the current user statement to obtain a current user field corresponding to the current user statement, determines a target feedback statement corresponding to the current user statement according to the current user field, and responds to the target feedback statement to the target user through the target voice service hardware unit.
In one embodiment, the voice data processing apparatus 900 receives a plurality of user input sentences, performs voiceprint recognition on each user input sentence to obtain a user voiceprint feature corresponding to each user input sentence, determines a user voice service instance corresponding to each user input sentence according to each user voiceprint feature, and responds to the corresponding user input sentence through the user voice service instance. For the specific limitation of the voice data processing apparatus, reference may be made to the above limitation of the voice data processing method, which is not described herein again. The respective modules in the above-described voice data processing apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a speech data processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor. The processor implements the following steps when executing the computer program: receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position; acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit; acquiring a target voice service hardware unit corresponding to the target area position; switching the current voice service hardware unit controlled by the current voice service instance to the target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises the current voice service software unit and the target voice service hardware unit; and receiving a target user voiceprint feature corresponding to a target user located at the target area position, binding the target user voiceprint feature with the target voice service instance, and providing voice service for the target user through the target voice service instance.
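For illustration only (not the claimed implementation), the instance-switching steps above can be sketched as follows; the class, the zone names, and the hardware identifiers are all hypothetical stand-ins.

```python
# Hypothetical sketch of the embodiment above: a voice service instance keeps
# its software unit but is re-pointed at the hardware unit of the target area
# position, then bound to the target user's voiceprint feature.

class VoiceServiceInstance:
    def __init__(self, software_unit, hardware_unit):
        self.software_unit = software_unit    # e.g. an ASR/TTS pipeline
        self.hardware_unit = hardware_unit    # e.g. a seat's microphone/speaker
        self.voiceprint = None                # voiceprint feature bound to it

# Invented mapping from area positions to voice service hardware units.
hardware_by_zone = {"front-left": "mic/speaker FL", "rear-right": "mic/speaker RR"}

def move_instance(instance, target_zone, target_voiceprint):
    # Switch the controlled hardware unit to the target zone's unit ...
    instance.hardware_unit = hardware_by_zone[target_zone]
    # ... and bind the target user's voiceprint to the resulting target instance.
    instance.voiceprint = target_voiceprint
    return instance

current = VoiceServiceInstance("software unit A", hardware_by_zone["front-left"])
target = move_instance(current, "rear-right", "target-user-voiceprint")
# target now pairs "software unit A" with "mic/speaker RR".
```

The sketch makes the key design point concrete: the software unit travels with the user while only the hardware binding changes.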
In one embodiment, before receiving the current user instruction corresponding to the current user, the method comprises: when detecting that a vehicle head unit corresponding to a current vehicle is started, acquiring a default voice service instance, wherein the default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit; and receiving a super user voiceprint feature corresponding to a super user, binding the super user voiceprint feature with the default voice service instance, and providing voice service for the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
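A minimal sketch of this start-up flow, with invented names, might look like this: the default instance pairs the default software and hardware units and is then bound to the super user's voiceprint.

```python
# Illustrative sketch only: a default voice service instance is created at
# head-unit start-up, then bound to the super user's voiceprint feature.

class VoiceServiceInstance:
    def __init__(self, software_unit, hardware_unit):
        self.software_unit = software_unit
        self.hardware_unit = hardware_unit
        self.voiceprint = None            # no user bound yet

def on_head_unit_start():
    # Acquire the default instance when the head unit is detected as started.
    return VoiceServiceInstance("default software unit", "default hardware unit")

def bind_super_user(instance, super_user_voiceprint):
    # Bind the super user's voiceprint so the default instance serves them.
    instance.voiceprint = super_user_voiceprint
    return instance

default_instance = bind_super_user(on_head_unit_start(), "super-user-voiceprint")
```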
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a first super user instruction corresponding to the super user, wherein the first super user instruction comprises the current area position where the current user is located; acquiring the current voice service hardware unit corresponding to the current area position; establishing an association relationship between the default voice service software unit controlled by the default voice service instance and the current voice service hardware unit according to the first super user instruction to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
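The first-super-user-instruction flow above reuses the existing default software unit and associates it with the hardware unit of the current user's area. A toy sketch under those assumptions (all names invented):

```python
# Hypothetical sketch: the default software unit is reused and associated with
# the current area position's hardware unit, then bound to the current user.

hardware_by_zone = {"rear-left": "mic/speaker RL"}

def handle_first_super_user_instruction(default_software_unit, zone, voiceprint):
    hardware_unit = hardware_by_zone[zone]  # hardware unit for the current area
    return {
        "software_unit": default_software_unit,  # an association, not a new unit
        "hardware_unit": hardware_unit,
        "voiceprint": voiceprint,                # bound current user voiceprint
    }

current_instance = handle_first_super_user_instruction(
    "default software unit", "rear-left", "current-user-voiceprint")
```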
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a second super user instruction corresponding to the super user, wherein the second super user instruction comprises the current area position where the current user is located; acquiring the current voice service hardware unit corresponding to the current area position; adding a new voice service software unit according to the second super user instruction, and establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
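By contrast with the first super user instruction, the second one adds a brand-new software unit before the association. Sketched with invented names:

```python
# Illustrative sketch: a new voice service software unit is created and then
# associated with the current area position's hardware unit.

software_units = ["default software unit"]   # software units existing so far

def handle_second_super_user_instruction(zone_hardware_unit, voiceprint):
    new_unit = f"software unit #{len(software_units)}"  # add a new software unit
    software_units.append(new_unit)
    return {
        "software_unit": new_unit,
        "hardware_unit": zone_hardware_unit,
        "voiceprint": voiceprint,
    }

new_instance = handle_second_super_user_instruction("mic/speaker RR", "vp-user-2")
```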
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user sharing instruction corresponding to the target user, wherein the current user sharing instruction comprises a shared user area position where a shared user is located and current user sharing content; acquiring a corresponding first voice service instance according to the shared user area position, wherein the first voice service instance comprises a first voice service software unit and a corresponding first voice service hardware unit; and copying the current user sharing content to the first voice service instance, and displaying the current user sharing content to the shared user through the first voice service hardware unit controlled by the first voice service instance.
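The sharing flow above can be sketched as follows; all identifiers are invented for illustration, and the claimed system is not limited to this shape.

```python
# Toy sketch of the sharing flow: the sharing instruction names the shared
# user's area position; the shared content is copied into that area's instance
# and displayed through its hardware unit.

instances_by_zone = {
    "rear-left": {"hardware_unit": "display RL", "content": None},
}

def share_content(shared_zone, shared_content):
    first_instance = instances_by_zone[shared_zone]  # first voice service instance
    first_instance["content"] = shared_content       # copy the shared content
    # Display through the hardware unit controlled by the first instance.
    return f"{first_instance['hardware_unit']} shows: {shared_content}"

shown = share_content("rear-left", "restaurant suggestion")
```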
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user statement corresponding to the target user; performing voice recognition on the current user statement to obtain the current user field corresponding to the current user statement; determining a target feedback statement corresponding to the current user statement according to the current user field; and responding to the target user with the target feedback statement through the target voice service hardware unit.
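A toy sketch of the recognize, classify-field, feedback sequence above; the keyword table and canned replies are invented for illustration, and a real system would use trained recognition and domain-classification models.

```python
# Invented field (domain) keywords and feedback statements, for illustration.
DOMAIN_KEYWORDS = {"navigate": "navigation", "play": "music"}
FEEDBACK = {
    "navigation": "Starting route guidance.",
    "music": "Playing your playlist.",
}

def classify_field(user_statement):
    # Determine the current user field (domain) of the recognized statement.
    for keyword, field in DOMAIN_KEYWORDS.items():
        if keyword in user_statement:
            return field
    return "chitchat"

def feedback_for(user_statement):
    # Determine the target feedback statement according to the user field.
    return FEEDBACK.get(classify_field(user_statement), "Sorry, could you rephrase?")

reply = feedback_for("navigate to the office")  # → "Starting route guidance."
```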
In one embodiment, the processor, when executing the computer program, further performs the steps of: receiving a plurality of user input sentences; performing voiceprint recognition on each user input sentence to obtain the user voiceprint feature corresponding to each user input sentence; determining the user voice service instance corresponding to each user input sentence according to each user voiceprint feature; and responding to the corresponding user input sentence through that user voice service instance.
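The multi-user dispatch in this embodiment can be sketched as below; the speaker tag stands in for real voiceprint recognition, and all names are hypothetical.

```python
# Illustrative sketch of multi-user dispatch: each input sentence is matched to
# a voice service instance by its voiceprint, and only that instance responds.

class VoiceServiceInstance:
    def __init__(self, name):
        self.name = name
        self.responses = []

    def respond(self, text):
        # Respond only to the sentence routed to this instance.
        self.responses.append(f"{self.name} handled: {text}")

def recognize_voiceprint(sentence):
    # Stand-in: a real system would extract a voiceprint feature from audio.
    speaker, _text = sentence
    return speaker

def dispatch(sentences, instances_by_voiceprint):
    for sentence in sentences:
        voiceprint = recognize_voiceprint(sentence)
        instances_by_voiceprint[voiceprint].respond(sentence[1])

driver = VoiceServiceInstance("driver-instance")
passenger = VoiceServiceInstance("passenger-instance")
dispatch(
    [("driver-vp", "navigate home"), ("passenger-vp", "play music")],
    {"driver-vp": driver, "passenger-vp": passenger},
)
```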
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program, when executed by a processor, implements the following steps: receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position; acquiring a current voice service instance corresponding to the current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit; acquiring a target voice service hardware unit corresponding to the target area position; switching the current voice service hardware unit controlled by the current voice service instance to the target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises the current voice service software unit and the target voice service hardware unit; and receiving a target user voiceprint feature corresponding to a target user located at the target area position, binding the target user voiceprint feature with the target voice service instance, and providing voice service for the target user through the target voice service instance.
In one embodiment, before receiving the current user instruction corresponding to the current user, the method comprises: when detecting that a vehicle head unit corresponding to a current vehicle is started, acquiring a default voice service instance, wherein the default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit; and receiving a super user voiceprint feature corresponding to a super user, binding the super user voiceprint feature with the default voice service instance, and providing voice service for the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: receiving a first super user instruction corresponding to the super user, wherein the first super user instruction comprises the current area position where the current user is located; acquiring the current voice service hardware unit corresponding to the current area position; establishing an association relationship between the default voice service software unit controlled by the default voice service instance and the current voice service hardware unit according to the first super user instruction to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: receiving a second super user instruction corresponding to the super user, wherein the second super user instruction comprises the current area position where the current user is located; acquiring the current voice service hardware unit corresponding to the current area position; adding a new voice service software unit according to the second super user instruction, and establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance; receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance; and collecting the current user instruction corresponding to the current user through the current voice service hardware unit controlled by the current voice service instance.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user sharing instruction corresponding to the target user, wherein the current user sharing instruction comprises a shared user area position where a shared user is located and current user sharing content; acquiring a corresponding first voice service instance according to the shared user area position, wherein the first voice service instance comprises a first voice service software unit and a corresponding first voice service hardware unit; and copying the current user sharing content to the first voice service instance, and displaying the current user sharing content to the shared user through the first voice service hardware unit controlled by the first voice service instance.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: receiving, through the target voice service hardware unit controlled by the target voice service instance, a current user statement corresponding to the target user; performing voice recognition on the current user statement to obtain the current user field corresponding to the current user statement; determining a target feedback statement corresponding to the current user statement according to the current user field; and responding to the target user with the target feedback statement through the target voice service hardware unit.
In one embodiment, the computer program, when executed by the processor, further implements the steps of: receiving a plurality of user input sentences; performing voiceprint recognition on each user input sentence to obtain the user voiceprint feature corresponding to each user input sentence; determining the user voice service instance corresponding to each user input sentence according to each user voiceprint feature; and responding to the corresponding user input sentence through that user voice service instance.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above examples express only several embodiments of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of speech data processing, the method comprising:
receiving a current user instruction corresponding to a current user, wherein the current user instruction comprises a current area position and a target area position;
acquiring a current voice service instance corresponding to a current area position, wherein the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
acquiring a target voice service hardware unit corresponding to the target area position;
switching the current voice service hardware unit controlled by the current voice service instance into the target voice service hardware unit to obtain a target voice service instance, wherein the target voice service instance comprises the current voice service software unit and the target voice service hardware unit;
and receiving a target user voiceprint feature corresponding to a target user located at the target area position, binding the target user voiceprint feature with the target voice service instance, and providing voice service for the target user through the target voice service instance.
2. The method of claim 1, wherein before receiving the current user instruction corresponding to the current user, the method comprises:
when detecting that a vehicle head unit corresponding to a current vehicle is started, acquiring a default voice service instance, wherein the default voice service instance comprises a default voice service software unit and a corresponding default voice service hardware unit;
receiving a super user voiceprint feature corresponding to a super user, binding the super user voiceprint feature with the default voice service instance, and providing voice service for the super user through the default voice service instance, wherein the super user is the user with the highest authority in the current vehicle.
3. The method of claim 2, wherein receiving a current user instruction corresponding to a current user comprises:
receiving a first super user instruction corresponding to the super user, wherein the first super user instruction comprises a current region position where a current user is located;
acquiring a current voice service hardware unit corresponding to the current area position;
establishing an association relation between a default voice service software unit controlled by the default voice service instance and the current voice service hardware unit according to the first super user instruction to obtain a current voice service instance;
receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance;
and acquiring a current user instruction corresponding to the current user through a current voice service hardware unit controlled by the current voice service instance.
4. The method of claim 2, wherein receiving a current user instruction corresponding to a current user comprises:
receiving a second super user instruction corresponding to the super user, wherein the second super user instruction comprises the current region position of the current user;
acquiring a current voice service hardware unit corresponding to the current area position;
adding a new voice service software unit according to the second super user instruction, and establishing an association relationship between the new voice service software unit and the current voice service hardware unit to obtain a current voice service instance;
receiving a current user voiceprint feature corresponding to the current user, binding the current user voiceprint feature with the current voice service instance, and providing voice service for the current user through the current voice service instance;
and acquiring a current user instruction corresponding to the current user through a current voice service hardware unit controlled by the current voice service instance.
5. The method of claim 1, further comprising:
receiving a current user sharing instruction corresponding to the target user through a target voice service hardware unit controlled by the target voice service instance, wherein the current user sharing instruction comprises a shared user area position where a shared user is located and current user sharing content;
acquiring a corresponding first voice service instance according to the position of the shared user area, wherein the first voice service instance comprises a first voice service software unit and a corresponding first voice service hardware unit;
and copying the current user sharing content to the first voice service instance, and displaying the current user sharing content for the shared user through a first voice service hardware unit controlled by the first voice service instance.
6. The method of claim 1, further comprising:
receiving a current user statement corresponding to the target user through a target voice service hardware unit controlled by the target voice service instance;
performing voice recognition on the current user statement to obtain a current user field corresponding to the current user statement;
determining a target feedback statement corresponding to the current user statement according to the current user field;
and responding to the target user with the target feedback statement through the target voice service hardware unit.
7. The method of claim 1, further comprising:
receiving a plurality of user input sentences;
carrying out voiceprint recognition on each user input statement to obtain a user voiceprint characteristic corresponding to each user input statement;
determining a user voice service instance corresponding to each user input statement according to each user voiceprint feature;
responding to the corresponding user input sentence through the user voice service instance.
8. A speech data processing apparatus, characterized in that the apparatus comprises:
the user instruction receiving module is used for receiving a current user instruction corresponding to a current user, and the current user instruction comprises a current area position and a target area position;
the current voice service instance acquisition module is used for acquiring a current voice service instance corresponding to the current area position, and the current voice service instance comprises a current voice service software unit and a corresponding current voice service hardware unit;
a voice service hardware unit obtaining module, configured to obtain a target voice service hardware unit corresponding to the target area location;
a target voice service instance generation module, configured to switch the current voice service hardware unit controlled by the current voice service instance to the target voice service hardware unit, so as to obtain a target voice service instance, where the target voice service instance includes the current voice service software unit and the target voice service hardware unit;
and the target voice service instance processing module is used for receiving a target user voiceprint feature corresponding to a target user located at the target area position, binding the target user voiceprint feature with the target voice service instance, and providing voice service for the target user through the target voice service instance.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110295407.1A 2021-03-19 2021-03-19 Voice data processing method and device, computer equipment and storage medium Pending CN113113005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110295407.1A CN113113005A (en) 2021-03-19 2021-03-19 Voice data processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110295407.1A CN113113005A (en) 2021-03-19 2021-03-19 Voice data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113113005A true CN113113005A (en) 2021-07-13

Family

ID=76711971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110295407.1A Pending CN113113005A (en) 2021-03-19 2021-03-19 Voice data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113113005A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009020423A (en) * 2007-07-13 2009-01-29 Fujitsu Ten Ltd Speech recognition device and speech recognition method
CN104658536A (en) * 2015-03-09 2015-05-27 深圳酷派技术有限公司 Recording mode switching method, recording mode switching system and terminal
CN106023983A (en) * 2016-04-27 2016-10-12 广东欧珀移动通信有限公司 Multi-user voice interaction method and device based on virtual reality scene
CN205769175U * 2016-06-30 2016-12-07 扬州航盛科技有限公司 Vehicle control system based on voiceprint recognition and speech recognition
CN106326307A (en) * 2015-06-30 2017-01-11 芋头科技(杭州)有限公司 Language interaction method
CN106683673A (en) * 2016-12-30 2017-05-17 智车优行科技(北京)有限公司 Method, device and system for adjusting driving modes and vehicle
CN107767864A (en) * 2016-08-23 2018-03-06 阿里巴巴集团控股有限公司 Method, apparatus and mobile terminal based on voice sharing information
CN108133707A * 2017-11-30 2018-06-08 百度在线网络技术(北京)有限公司 Content sharing method and system
US20190073999A1 (en) * 2016-02-10 2019-03-07 Nuance Communications, Inc. Techniques for spatially selective wake-up word recognition and related systems and methods
JP2019176430A (en) * 2018-03-29 2019-10-10 トヨタ自動車株式会社 Voice recognition device
CN111739540A (en) * 2020-07-20 2020-10-02 天域全感音科技有限公司 Audio signal acquisition device, computer equipment and method
CN111886647A (en) * 2018-03-29 2020-11-03 松下知识产权经营株式会社 Speech processing apparatus, speech processing method, and speech processing system
CN112364143A (en) * 2020-11-13 2021-02-12 苏州思必驰信息科技有限公司 Intelligent multi-round interaction method and system


Similar Documents

Publication Publication Date Title
CN108284840B (en) Autonomous vehicle control system and method incorporating occupant preferences
CN110275748B (en) Popup display method and device for vehicle-mounted application and intelligent automobile
US10375526B2 (en) Sharing location information among devices
CN103688518B (en) For the method shown, device, computer and mobile equipment and the vehicle with this device
CN103373292B (en) Inter-vehicle information system, information terminal, application executing method
CN106843643B (en) Page processing method and mobile terminal
CN110149402A (en) Communication channel is provided between the example of automation assistant
CN110457034A (en) Generate the navigation user interface for being used for third party application
CN109491379A (en) Computing device and its control method
US10990618B2 (en) Computer-implemented method for question answering system
WO2018230314A1 (en) Control device, control method and computer program
WO2023165326A1 (en) Vehicle-based information exchange method, head unit end, storage medium, and vehicle
CN110764724B (en) Display equipment control method, device, equipment and storage medium
JP6761457B2 (en) Systems and methods for managing taxi dispatches, as well as programs for controlling taxi dispatch requests
JP2020160180A (en) Display control device, display control method, and program
CN113113005A (en) Voice data processing method and device, computer equipment and storage medium
CN117719429A (en) Control method, device, equipment and storage medium for vehicle
CN112596834A (en) Automobile screen display method and electronic equipment
CN109491732B (en) Virtual control display method and device and vehicle-mounted display screen
CN111131415A (en) Information pushing method and device, computer equipment and storage medium
CN115079680A (en) Vehicle control state processing method and device, storage medium and electronic equipment
CN114879923A (en) Multi-screen control method and device, electronic equipment and storage medium
CN113270096A (en) Voice response method and device, electronic equipment and computer readable storage medium
CN111857516A (en) Interface interchange method and device and computer storage medium
CN117555462A (en) Control method of vehicle-mounted terminal, control device of vehicle-mounted terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination