CN113407758A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN113407758A
CN113407758A (application CN202110789225.XA)
Authority
CN
China
Prior art keywords
target
information
voice information
determining
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110789225.XA
Other languages
Chinese (zh)
Inventor
袁志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp filed Critical FAW Group Corp
Priority to CN202110789225.XA
Publication of CN113407758A
Legal status: Pending

Classifications

    • G — Physics
    • G06 — Computing; calculating or counting
    • G06F — Electric digital data processing
    • G06F16/00 — Information retrieval; database structures therefor; file system structures therefor
    • G06F16/50 — Information retrieval of still image data
    • G06F16/58 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 — Retrieval using metadata automatically derived from the content
    • G06F16/5866 — Retrieval using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G06F16/60 — Information retrieval of audio data
    • G06F16/68 — Retrieval characterised by using metadata
    • G06F16/683 — Retrieval using metadata automatically derived from the content
    • G06F16/686 — Retrieval using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G06N — Computing arrangements based on specific computational models
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods

Abstract

An embodiment of the invention discloses a data processing method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: collecting voice information of a speaking user via a preset microphone array, and determining target position information corresponding to the voice information; determining target image information of the target speaking user corresponding to the target position information; determining the role authority of the target speaking user according to the target image information; and determining, based on the role authority, a target execution mode corresponding to the voice information, so as to execute the operation corresponding to the voice information in that mode. The technical scheme of this embodiment provides a human-machine voice interaction channel for users at every position, improves the efficiency and experience of human-machine interaction, and ensures operational safety.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data processing method and device, electronic equipment and a storage medium.
Background
With the development of automobile technology, user demand for in-vehicle entertainment equipment is steadily increasing. To meet this demand, an intelligent entertainment screen can be installed in the vehicle.
In the prior art, a central control screen is usually arranged in the cockpit, and users in the driver's seat and the front passenger seat can interact with the in-vehicle system through it. However, this approach does not consider the interaction needs of rear-row users, who cannot conveniently and effectively reach the control screen. Moreover, when an occupant issues an instruction to the in-vehicle system by voice, noise in the environment (such as speech from other users) may interfere with the system's recognition of the instruction; if the in-vehicle system incorrectly recognizes another user's speech as a control instruction, a safety accident may even result.
Therefore, the schemes provided by the related art cannot cover all users during interaction with the in-vehicle system and are easily disturbed by other sounds when instructions are issued by voice, so the intelligence of the in-vehicle system is limited.
Disclosure of Invention
The invention provides a data processing method and apparatus, an electronic device, and a storage medium, which provide a human-machine voice interaction channel for users at every position, improve the efficiency and experience of human-machine interaction, and ensure operational safety.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
collecting voice information of a speaking user based on a preset microphone array, and determining target position information corresponding to the voice information;
determining target image information of a target speaking user corresponding to the target position information;
determining the role authority of the target speaking user according to the target image information;
and determining a target execution mode corresponding to the voice information based on the role authority so as to execute the operation corresponding to the voice information based on the target execution mode.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, where the apparatus includes:
the voice information acquisition module is used for acquiring voice information of a speaking user based on a preset microphone array and determining target position information corresponding to the voice information;
the target image information determining module is used for determining target image information of a target speaking user corresponding to the target position information;
the role authority determining module is used for determining the role authority of the target speaking user according to the target image information;
and the target execution mode determining module is used for determining a target execution mode corresponding to the voice information based on the role authority so as to execute the operation corresponding to the voice information based on the target execution mode.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of the embodiments of the present invention.
In a fourth aspect, the embodiments of the present invention further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the data processing method according to any one of the embodiments of the present invention.
According to the above technical scheme, a microphone array is preset, voice information of a speaking user is collected through the array, and the target position information corresponding to the voice information is determined. This provides a human-machine voice interaction channel for occupants at every position and uses directional sound pickup to avoid interference from other sounds during voice interaction, improving the efficiency and experience of human-machine interaction. Further, by determining the target image information of the target speaking user corresponding to the target position information, the role authority of the target speaking user can be determined; an execution mode is then selected based on that authority to carry out the operation corresponding to the voice information. This prevents a low-authority user from issuing erroneous or dangerous voice commands to the system, ensuring operational safety.
Drawings
To illustrate the technical solutions of the exemplary embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. It should be clear that the drawings described below cover only some of the embodiments of the invention, not all of them; a person skilled in the art can derive other drawings from these without inventive effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a data processing method according to a second embodiment of the present invention;
FIG. 4 is a diagram illustrating screen actions presented to guide a user to participate in a voice interaction according to a second embodiment of the present invention;
FIG. 5 is an ambient light prompt presented to guide a user to participate in a voice interaction in accordance with a second embodiment of the present invention;
FIG. 6 is a diagram illustrating an orientation prompt of a physical robot presented to guide a user to engage in a voice interaction in accordance with a second embodiment of the present invention;
fig. 7 is a flowchart of a data processing method according to a second embodiment of the present invention;
fig. 8 is a block diagram of a data processing apparatus according to a third embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flow chart of a data processing method according to Embodiment One of the present invention. The method is applicable to scenarios in which users at different positions in a space interact with a system by voice, and is particularly applicable to an automobile cabin equipped with a microphone array, where occupants at different positions interact with the in-vehicle system by voice.
As shown in fig. 1, the method specifically includes the following steps:
and S110, collecting voice information of a speaking user based on a preset microphone array, and determining target position information corresponding to the voice information.
A speaking user is any user in the space who utters voice information. For example, when the driver in the driver's seat issues a voice instruction to the in-vehicle system, the driver acts as the speaking user; it will be appreciated that whenever any user in the cabin has the authority to issue voice instructions to the in-vehicle system, each such user can act as a speaking user. Correspondingly, the device receiving the speaking user's voice information is a microphone array: a set of acoustic sensors (e.g., multiple microphones) that collects speech signals from sound sources in multiple directions in the space. Each microphone contains a carbon film that vibrates and compresses when sound waves arrive, bringing the film into contact with the electrode beneath it; the duration and frequency of this contact track the amplitude and frequency of the sound waves, thereby converting the acoustic signal into an electrical signal. It should be noted that when the microphone array is deployed in an automobile, it is usually connected to the in-vehicle system.
It can be understood that compared with a single microphone, deploying a microphone array in a space not only can realize the collection of sound in the space, but also can further determine the position information of the sound source.
Optionally, when the target wake-up word is acquired based on the preset multi-sound-zone microphone array, the voice information of the speaking user is acquired.
Specifically, a multi-sound-zone microphone array is a microphone array whose constituent acoustic sensors are divided into multiple sound zones according to the position areas in the space; each zone corresponds to a specific position area and collects the voice information uttered by the user in that area. The target wake-up word is a word that triggers the system's voice assistant or puts the system into a standby state. For example, when a voice assistant is installed in the in-vehicle system, "assistant" may be set as the target wake-up word; when the speech collected by the microphone array contains "assistant", the voice assistant is triggered and waits for the user's further instructions. It should be noted that, in practice, the voice assistant is woken (or the in-vehicle system put into standby) only when the speaking user utters speech containing the target wake-up word; ordinary speech that does not contain it produces no reaction from the system.
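The wake-word gating described above can be sketched as a small state machine. This is an illustrative sketch, not the patent's implementation: real systems use acoustic keyword spotting on audio, while here the utterance is assumed to be already transcribed to text, and the wake word "assistant" is taken from the example.

```python
WAKE_WORD = "assistant"  # assumed wake word, taken from the example above

def contains_wake_word(transcript: str, wake_word: str = WAKE_WORD) -> bool:
    """Return True if the transcript should wake the voice assistant."""
    return wake_word in transcript.lower()

def handle_utterance(transcript: str, awake: bool):
    """Only react once the wake word has been heard.

    Returns (new_awake_state, action), where action is one of
    "wake_response", "ignore", or "process_command".
    """
    if not awake:
        if contains_wake_word(transcript):
            return True, "wake_response"   # assistant acknowledges the user
        return False, "ignore"             # ordinary speech: no reaction
    return True, "process_command"         # already awake: treat as command
```

Usage: ordinary speech before waking is ignored, while the same speech after waking is processed as a command.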
In this embodiment, each speaking user's position corresponds to a piece of position information; for example, the driver in the cabin corresponds to the driver's seat, and a user in the passenger seat corresponds to the front passenger seat. Those skilled in the art will understand that the different cabin positions are stored in the in-vehicle system with specific identifiers: for a four-seat car, for example, the driver's seat corresponds to identifier 1, the front passenger seat to identifier 2, the rear-left seat to identifier 3, and the rear-right seat to identifier 4.
In this embodiment, there are at least two ways to determine the speaking user target location information based on the microphone array. The first way is to determine target location information of a target speaking user corresponding to voice information based on a multi-zone microphone array.
Specifically, when the acoustic sensors forming the microphone array are deployed in the space, they can be divided into multiple sound zones according to position areas, and the sensors in each zone are associated with the corresponding position information and stored in the in-vehicle system database as a mapping table. On this basis, when a speaking user utters voice information, the acoustic sensor that perceives the voice most strongly (i.e., the sound zone with the strongest perception) can be identified, and the target position information of the speaking user determined by table lookup, achieving directional sound pickup. Continuing the example of a four-seat car with a microphone array in the cabin: the position information of the four seats is stored in the in-vehicle system under identifiers 1, 2, 3, and 4, each associated in a mapping table with the microphone identifiers 1', 2', 3', and 4' of the array. When the user in the rear-left seat (identifier 3) speaks, the microphone with identifier 3' perceives the voice most strongly, and the in-vehicle system determines by table lookup that the corresponding position information is identifier 3, i.e., the rear left of the cabin.
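The table lookup just described can be sketched as follows. The seat and microphone identifiers follow the four-seat example; the per-zone energy values passed in are hypothetical stand-ins for the "strength of perception" measured by the array.

```python
MIC_TO_SEAT = {"1'": 1, "2'": 2, "3'": 3, "4'": 4}   # microphone id -> seat id
SEAT_NAMES = {1: "driver", 2: "front passenger",
              3: "rear left", 4: "rear right"}

def locate_speaker(zone_energy):
    """Pick the zone whose microphone perceives the speech most strongly,
    then resolve its seat name via the stored mapping tables."""
    strongest_mic = max(zone_energy, key=zone_energy.get)
    return SEAT_NAMES[MIC_TO_SEAT[strongest_mic]]
```

For instance, if microphone 3' reports the highest energy, the lookup resolves to the rear-left seat, matching the example in the text.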
It should be noted that, after the target sound zone is determined for the speaking user, the other sound zones can be temporarily closed according to a preset rule so that only the target zone remains active, and the central control screen can visually guide the current speaking user to issue subsequent voice instructions, improving interaction efficiency and experience.
The second way is to pre-store in the system the distance range and angle range between each position in the space and the microphone array. When a speaking user utters voice information, the user's distance and angle relative to the microphone array are estimated by a sound-source localization algorithm, and the user's position information is determined by checking which stored ranges the estimates fall into. Taking the four-seat car with an in-cabin microphone array again as an example: when the user in the rear-left seat speaks, the microphone array estimates a distance of 80 cm and an angle of 200 degrees via sound-source localization; since the distance range and angle range pre-stored in the in-vehicle system for that seat are [30 cm, 120 cm] and [180°, 270°] respectively, the system determines that the current speaking user is at the rear left of the cabin.
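The second localization approach can be sketched as a range-matching function. The rear-left ranges reuse the example above; the driver-seat entry is a hypothetical placeholder, and a real sound-source localization algorithm (e.g., based on time differences of arrival) would supply the distance and angle estimates.

```python
# seat -> ((distance range in cm), (angle range in degrees)); values beyond
# the rear-left example are assumptions for illustration
SEAT_RANGES = {
    "rear left": ((30, 120), (180, 270)),
    "driver":    ((20, 80),  (0, 90)),
}

def seat_from_estimate(distance_cm, angle_deg):
    """Match a (distance, angle) estimate against the pre-stored ranges;
    return the seat name, or None if no range contains the estimate."""
    for seat, ((d_lo, d_hi), (a_lo, a_hi)) in SEAT_RANGES.items():
        if d_lo <= distance_cm <= d_hi and a_lo <= angle_deg <= a_hi:
            return seat
    return None
```

With the estimates from the text (80 cm, 200°), the function resolves to the rear-left seat.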
And S120, determining target image information of the target speaking user corresponding to the target position information.
The target image information is obtained by a camera device; it will be appreciated that one or more camera devices can be deployed in the space. If a deployed camera is static, a panoramic image of the space can be acquired and the target image information containing the target speaking user extracted from it; if the camera is movable, it can be moved along a slide rail to the vicinity of the target speaking user after the voice information is received, and the target image information then captured.
Optionally, the image to be processed of the target area is shot based on at least one preset image pickup device, and the target image information is determined from the image to be processed according to the target position information.
In this embodiment, at least one camera device may be deployed in advance and fixed at a position in the space; it should be noted that the deployment must ensure the camera can capture an image of the entire space. The image obtained when the camera photographs the space is the image to be processed.
To give targeted feedback to different types of users based on images, the system further processes the image to be processed captured by the camera. Specifically, a script can be configured in advance according to the correspondence between the camera's imaging regions and the positions in the space; when the image to be processed is obtained, the portion corresponding to the target position information is segmented out according to the pre-configured script, the other portions are discarded, and the cropped image is used as the target image information.
Illustratively, a panoramic camera is installed at the central control position of a four-seat car, its height adjusted so that it captures a panoramic image of the interior, and a script is configured in advance according to the relationship between the camera's imaging area and the driver's seat, the front passenger seat, and the two rear seats. When the driver speaks, the panoramic camera captures a panoramic image covering all four cabin positions as the image to be processed, and the target image information containing only the driver is then cropped from it according to the preset script.
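The cropping step can be sketched as below. The pixel regions per seat are hypothetical placeholders for the pre-configured script mapping imaging areas to positions; the frame is represented as nested lists of pixels for simplicity, whereas a real implementation would operate on an image array.

```python
# seat -> (x0, y0, x1, y1) crop rectangle in the panoramic frame (assumed values)
SEAT_REGIONS = {
    "driver":    (0, 0, 640, 720),
    "rear left": (0, 720, 640, 1440),
}

def crop_target(frame, seat):
    """Cut out the sub-image covering the target seat; discard the rest.

    `frame` is a list of rows, each row a list of pixel values."""
    x0, y0, x1, y1 = SEAT_REGIONS[seat]
    return [row[x0:x1] for row in frame[y0:y1]]
```

The cropped result, containing only the target speaking user's region, then serves as the target image information passed to the downstream analysis.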
Optionally, a shooting instruction is generated based on the target position information, and the shooting instruction is sent to the target image capturing device, so that the target image capturing device captures target image information corresponding to the target position information.
In the present embodiment, a plurality of image pickup apparatuses may be disposed at respective positions in space while associating the respective image pickup apparatuses with corresponding position information. After the target position information is determined, the corresponding camera device can be determined in a table look-up mode, and then a shooting instruction is issued to the camera device. And after receiving the shooting instruction, the target camera device can shoot the target image information comprising the target speaking user.
Continuing with the four-seat car example: four camera devices are deployed, one each for the driver's seat, the front passenger seat, the rear-left seat, and the rear-right seat, and a mapping table between each position's information and its camera is pre-stored in the in-vehicle system. After the driver speaks, the camera corresponding to the driver's seat is determined by table lookup, and that camera photographs the driver's seat to obtain target image information containing only the driver.
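The per-position camera dispatch can be sketched as follows. The camera identifiers are hypothetical, and the capture call is stubbed as a plain function passed in a dictionary, standing in for whatever driver interface the in-vehicle system actually uses.

```python
# position -> camera id mapping table (identifiers are assumed)
SEAT_TO_CAMERA = {"driver": "cam-1", "front passenger": "cam-2",
                  "rear left": "cam-3", "rear right": "cam-4"}

def shoot_target(seat, cameras):
    """Resolve the camera bound to the target seat via the mapping table,
    then issue it a shooting instruction and return the captured image."""
    cam_id = SEAT_TO_CAMERA[seat]
    return cameras[cam_id](seat)
```

In use, `cameras` maps each camera id to a capture function; looking up the target position and invoking the matching camera mirrors the table-lookup-then-shoot flow in the text.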
In this embodiment, deploying multiple camera devices further enhances the scheme's adaptability to the space. Meanwhile, those skilled in the art will understand that various camera devices may be used, such as an in-vehicle high-definition camera, a box camera, a dome camera, an integrated camera, a day/night infrared camera, a high-speed PTZ dome camera, or a network camera; the specific device and deployment mode should be chosen according to actual needs, and this embodiment is not specifically limited here.
In an application scene of the automobile cabin, after target image information corresponding to a speaking user is collected through the camera device, the target image information can be uploaded to the vehicle-mounted device system, and the vehicle-mounted device system can execute further operation by utilizing the target image information.
And S130, determining the role authority of the target speaking user according to the target image information.
The role authority represents the scope and degree to which a user can make decisions for the current system; it will be appreciated that different speaking users may have the same or different role authorities. The role authorities correspond to the characteristics of different types of users and are pre-stored in the system as a mapping table. On this basis, once the target image information of the target speaking user is determined, the image can be analyzed to determine the user characteristics of the target speaking user, and the role authority corresponding to those characteristics determined by table lookup.
Taking an automobile as an example, the role authorities of different types of users can be stored in the in-vehicle system in advance. Specifically, an adult in the driver's seat has the first-level (highest) authority, with the right to open, close, and adjust all vehicle functions, such as switching the autonomous driving function on and off by voice; an adult not in the driver's seat has second-level authority, with the right to open, close, and adjust some functions, such as raising, lowering, and adjusting the corresponding window by voice; a child has third-level authority, limited to specific rights that do not affect the safety of users or the vehicle. After the in-cabin camera captures the target image information of the target speaking user, the in-vehicle system can analyze the user characteristics in the image to determine the user's age bracket, and obtain the corresponding role authority in combination with the target position information.
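The tiered scheme above can be sketched as a small decision function. The three tiers follow the example in the text; the exact decision rules (age band from image analysis, seat from position information) are assumptions, and a real system would look these up from the pre-stored mapping table.

```python
def role_authority(age_band, seat):
    """Return the authority tier: 1 (highest), 2, or 3.

    `age_band` is assumed to come from image analysis ("adult"/"child"),
    `seat` from the target position information.
    """
    if age_band == "child":
        return 3          # limited rights that cannot affect safety
    if seat == "driver":
        return 1          # full control, e.g. toggling autonomous driving
    return 2              # partial control, e.g. adjusting own window
```

For example, an adult in the driver's seat resolves to tier 1, an adult in a rear seat to tier 2, and a child anywhere to tier 3.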
And S140, determining a target execution mode corresponding to the voice information based on the role authority, and executing the operation corresponding to the voice information based on the target execution mode.
In this embodiment, the voice message sent by the target speaking user may include an operation instruction, for example, for a car-mounted device system, the voice message sent by the car-mounted cabin main driving position user may include an instruction of "starting automatic driving". Those skilled in the art should understand that the voice information collected by the microphone array can be input into a pre-trained voice recognition algorithm model, so as to determine the corresponding operation instruction.
In this embodiment, for different role authorities, the system may select corresponding processing logic as the target execution manner. Optionally, a corresponding relationship between different role authorities, available voice information, and execution manners may be created in advance, so that when the role authorities and the voice information are determined, a target execution manner is determined based on the corresponding relationship.
Continuing the example above: when the speaking user's role authority is determined to be the highest, first-level authority, the in-vehicle system can, in combination with the received voice information, query the mapping table of role authorities, available voice information, and execution modes, and then perform the corresponding operation according to the query result, for example, starting the car's autonomous driving function. When the speaking user's role authority is the third level, the in-vehicle system must, according to the query result, feed an inquiry back to the user in the driver's seat and decide whether to execute the operation corresponding to the voice information based on the feedback; for example, when a child in the car issues an "open the window" command, the in-vehicle system asks whether the driver agrees to open the corresponding window.
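The correspondence between role authority, voice information, and execution mode can be sketched as a lookup table. The entries are illustrative, following the examples in the text: tier-1 instructions execute directly, a tier-3 window request is routed to the driver for confirmation, and anything not in the table is rejected (the rejection default is an assumption).

```python
# (authority tier, recognized instruction) -> execution mode (illustrative)
EXECUTION_TABLE = {
    (1, "start autonomous driving"): "execute",
    (2, "adjust own window"):        "execute",
    (3, "open window"):              "ask_driver",
}

def execution_mode(authority, instruction):
    """Look up the target execution mode for a recognized instruction;
    unknown (authority, instruction) pairs fall back to rejection."""
    return EXECUTION_TABLE.get((authority, instruction), "reject")
```

So a tier-1 "start autonomous driving" command executes directly, while the same command from a tier-3 user is rejected and a tier-3 "open window" request triggers a driver confirmation, matching the differentiated handling described above.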
To describe the technical solution of this embodiment clearly, the application scenario may be taken as an automobile cabin and explained with reference to the flow in Fig. 2, although the invention is not limited to this scenario and can be applied to various scenarios in which operations are executed based on collected voice information. Referring to Fig. 2, when a user in the cabin wakes the voice assistant of the in-vehicle system with a wake-up word, the microphone array identifies the collected voice information; if identification succeeds, the voice assistant feeds a wake-up response back to the user. If identification fails and the user no longer tries to wake the assistant, the flow ends; if the user keeps trying, voice information containing the wake-up word must be issued again. After the voice assistant returns the wake-up response, the microphone array continues to pick up the speaking user's voice information directionally, the user's role authority is determined, the corresponding instruction is extracted from the voice information, and the execution mode corresponding to the role authority is selected to execute the extracted instruction.
According to the technical scheme of this embodiment, a microphone array is preset, voice information of a speaking user is collected based on the microphone array, and target position information corresponding to the voice information is determined, thereby providing a human-machine voice interaction channel for the occupant at each position. Directional pickup avoids interference from other sounds during voice interaction and improves the efficiency and experience of human-machine interaction. Further, by determining target image information of the target speaking user corresponding to the target position information, the role authority of the target speaking user can be determined, and the execution mode and operation corresponding to the voice information are determined based on that role authority. This avoids the problem of a low-authority user issuing erroneous or dangerous voice information to the system and guarantees operation safety.
Example two
Fig. 3 is a flowchart illustrating a data processing method according to a second embodiment of the present invention. On the basis of the foregoing embodiment, the role authority of the target speaking user is determined based on age level information and target position information, and corresponding operations are executed for different role authorities in differentiated execution modes, further improving the security of operations executed based on voice information. For the case where a user with non-advanced user authority speaks, the inquiry user is guided in a visual manner to make a decision on the voice information, which avoids the sound-pickup confusion caused by deploying a multi-sound-zone microphone array in the space and further improves the efficiency and experience of voice interaction. For specific implementation, reference may be made to the technical scheme of this embodiment. Technical terms that are the same as or correspond to those in the above embodiments are not repeated herein.
As shown in fig. 3, the method specifically includes the following steps:
s210, collecting voice information of a speaking user based on a preset microphone array, and determining target position information corresponding to the voice information.
And S220, determining target image information of the target speaking user corresponding to the target position information.
And S230, inputting the target image information into a target user classification model obtained through pre-training to obtain age level information of the target speaking user in the target image information.
The target user classification model can determine the age level of a user based on user features in the target image information, for example determining whether the target speaking user is an adult or a minor. Illustratively, a convolutional neural network is used as the algorithm model: 500 images containing different users are randomly selected as a training set to train the model, and another 1000 images containing different users are randomly collected, of which 500 serve as a validation set to estimate model parameters and the remaining 500 serve as a test set to evaluate algorithm performance. After the optimal model parameters are found using the validation set, the 500 training images and the 500 validation images are merged into a new training set and the model is optimized over multiple iterations; when the target detection evaluation index of the algorithm model reaches a preset threshold, model training is considered complete. At this point, the system may take the target image information as input and determine the age level information of the target speaking user through the trained target user classification model. For example, after target image information containing an adult user's face is collected in the cabin and input to the model, the obtained age level information may be "20 to 30 years old"; after target image information containing a minor user's face is collected and input to the model, the obtained age level information may be "10 to 18 years old".
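The 500/500/500 split and the later train-validation merge described above can be sketched in a few lines. The set sizes follow the example in the text; the shuffling and the use of integer placeholders for images are illustrative assumptions:

```python
import random

def split_dataset(images, seed=0):
    """Shuffle 1500 labeled images and split them into 500 training,
    500 validation, and 500 test images, matching the sizes used in
    the example above."""
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    return shuffled[:500], shuffled[500:1000], shuffled[1000:]

train, val, test = split_dataset(range(1500))
# After the best model parameters are chosen on `val`, the training and
# validation images are merged into a new 1000-image training set for
# the final optimization passes; `test` stays held out for evaluation.
final_train = train + val
```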
And S240, determining the role authority of the target speaking user based on the age level information and the target position information.
In this embodiment, the role authority of the target speaking user needs to be determined based on the age level information and the target position information together. Taking an automobile application scenario as an example, when it is determined that the target speaking user is located at the main driving position, the in-vehicle system cannot directly conclude that the user has permission to open, close, and adjust the vehicle's various functions; it must make the determination in combination with the age level information output by the target user classification model. It can be understood that when the model output indicates that the user at the main driving position is a minor, the determined role authority is not the authority corresponding to a driver but the authority corresponding to a minor user that does not affect the safety of the user and the vehicle; on this basis, the in-vehicle system refuses to execute part of the voice information issued by the minor user. Illustratively, when the voice information issued by the minor user is determined to be starting the vehicle, the in-vehicle system refuses to execute it and feeds back a prompt message of the refusal through the central control screen; when the voice information issued by the minor user is determined to be starting the automatic driving function, the in-vehicle system directly executes an engine-off operation, and the like.
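A minimal sketch of combining seat position and age level into a role authority follows. The labels, the 18-year adult threshold, and the (low, high) age-bracket encoding are assumptions made for illustration:

```python
def determine_role_authority(age_bracket, seat):
    """Combine the classifier's age level with the seat position.
    `age_bracket` is an assumed (low, high) tuple such as (20, 30);
    a bracket whose lower bound is below 18 is treated as a minor."""
    if age_bracket[0] < 18:
        return "minor"           # restricted authority, per the text
    if seat == "main_driving":
        return "high_level"      # adult at the main driving position
    return "non_high_level"
```

Note that the seat alone never grants full authority: an adult bracket is required before the main driving position maps to the high-level role authority.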
And S250, determining a target execution mode corresponding to the voice information based on the role authority, and executing the operation corresponding to the voice information based on the target execution mode.
Optionally, if the role right is a high-level role right, the target execution mode is to execute an operation corresponding to the voice information.
Continuing with the above example, when it is determined that the target speaking user is the driver at the main driving position and the age level information output by the target user classification model also indicates that the driver is an adult, it may be determined that the role authority corresponding to the driver is the high-level role authority, that is, the highest authority to control the vehicle. For the high-level role authority, after receiving the voice information, the in-vehicle system can directly extract the control instruction in the voice information and execute the corresponding operation according to the extracted instruction, such as starting the car or enabling the automatic driving function.
Optionally, if the role right is a non-advanced role right, determining executable voice information corresponding to the role right according to a pre-established mapping relation table, and determining a target execution mode corresponding to the voice information based on the voice information and the executable voice information.
In addition to the high-level role authority, the system may also assign non-high-level role authority to a user. It can be understood that the user authority level of the high-level role authority is higher than that of the non-high-level role authority, and the control functions corresponding to the different authorities differ.
In this embodiment, the executable voice information (i.e., control instructions) corresponding to the non-high-level role authority can be determined through a pre-stored mapping table. For voice information that can serve as a control instruction, the system may execute it directly, or may generate corresponding inquiry information according to a preset rule and feed it back to the driver at the main driving position.
Continuing with the example, when the target speaking user is determined to be a child user at a rear-row position, then in the subsequent process, if voice information for starting the vehicle issued by the child user is received and the table lookup determines that it is not executable voice information, the system may refuse execution and prompt a message that the operation exceeds the user's authority; if voice information for adjusting a car window issued by the child user is received and the table lookup determines that it is executable voice information, the window-adjustment operation can be executed directly.
It should be particularly noted that when the scheme of this embodiment is applied to the automotive field, the in-vehicle system can be interconnected with the cloud. That is, after the target speaking user and the corresponding target position information are determined, the determined data, together with the target image information, can be uploaded to the cloud; the cloud determines the age level information of the speaking user and determines the role authority of the target speaking user based on the age level information and the target position information. Introducing the cloud to process the collected data can further improve data-processing efficiency.
Optionally, if the voice information belongs to executable voice information, the target execution mode is to execute an operation corresponding to the voice information.
For example, after it is determined that the role authority corresponding to an adult user in the vehicle cabin is non-high-level user authority, when the user issues voice information for turning on the in-car air conditioner, the voice information is determined to belong to the executable voice information by table lookup; on that basis, the in-vehicle system can send an instruction to the vehicle to turn on the air conditioner.
Optionally, if the voice information does not belong to executable voice information, the target execution mode is to play the voice information based on the playing device, and execute an operation corresponding to the confirmed voice information after receiving the confirmed voice information sent by the high-level role authority user.
In this embodiment, if it is determined that the voice information issued by the user does not belong to the executable voice information, this indicates that the control instruction corresponding to the voice information exceeds the user's control authority over the system. At this time, the voice information needs to be played using the playing device in the space, and an inquiry message is fed back to a user with high-level user authority to determine whether the operation corresponding to the voice information can be executed.
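The two branches just described, direct execution versus playback-and-confirm, can be sketched with a callback standing in for the playback and inquiry step; all names and instruction strings are illustrative:

```python
def handle_non_high_level(instruction, executable_set, confirm):
    """If the instruction is in the role's executable set, execute it;
    otherwise play it back and ask a high-level-authority user, executing
    only on confirmation. `confirm` stands in for the playback-device
    broadcast and inquiry interaction described in the text."""
    if instruction in executable_set:
        return "executed"
    return "executed" if confirm(instruction) else "refused"
```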
Illustratively, when the target speaking user is a child in the rear row of an automobile cabin, the corresponding role authority is non-advanced user authority. When the child user issues voice information for adjusting a window and the table lookup determines that it is not executable voice information, the voice information can be broadcast in the cabin, and a local arbitration engine deployed in the in-vehicle system intelligently selects a user with advanced user authority in the current cabin. Besides advanced user authority, the basis for selecting the inquiry user may include the users' position information in the cabin, their age brackets, their current gaze direction, their listening state, and the like; for example, the user at the main driving position, a user in the "30 to 40 years old" age bracket, or a user whose current gaze is directed at the central control screen may be chosen as the inquiry user.
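The arbitration engine's choice of inquiry user, scored on the cues just listed, might look like the following sketch. The occupant fields and the scoring weights are invented for illustration and are not specified by the embodiment:

```python
def select_inquiry_user(occupants):
    """Among occupants with high-level authority, prefer the one at the
    main driving position, then one gazing at the central control screen,
    then one in a listening state."""
    def score(occupant):
        return (3 * (occupant.get("seat") == "main_driving")
                + 2 * (occupant.get("gaze") == "central_control_screen")
                + 1 * bool(occupant.get("listening")))
    candidates = [o for o in occupants if o.get("authority") == "high_level"]
    return max(candidates, key=score, default=None)
```

Age bracket could be folded into the score the same way; it is omitted here to keep the sketch short.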
Furthermore, after the inquiry user in the current cabin is determined, specific prompt information can be displayed on the central control screen to guide the inquiry user to make a decision on the voice information. Taking figs. 4, 5, and 6 as examples: through the screen action in fig. 4, it can be determined that the users at the primary and secondary driving positions in the cabin can participate in the decision on the voice information; fig. 5 is the same as fig. 4 except for the manner of presentation, specifically, in fig. 5 the inquiry user is guided by the ambience light surrounding a particular microphone; fig. 6 shows a case where a physical driving-assistance robot is deployed in the cabin, and when it is determined that the driver at the main driving position has the authority to decide on the voice information, the physical robot turns to face the main driving position, thereby guiding the driver to decide on the control instruction for adjusting the window.
Guiding the inquiry user in a visual manner to decide on voice information issued by a user with non-advanced user authority avoids the sound-pickup confusion caused by deploying a multi-sound-zone microphone array in the space and further improves the efficiency and experience of voice interaction.
To clearly describe the technical solution of this embodiment, an automobile cabin may again be taken as the application scenario and described with reference to the flow in fig. 7; however, the present invention is not limited to this scenario and may be applied to various scenarios in which corresponding operations are executed based on collected voice information. Referring to fig. 7, when a user in the cabin wakes up the voice assistant using the target wake-up word, the position information and age level information of the speaking user can be determined using the microphone array and camera device pre-deployed in the vehicle, and these data can be used to lock onto the corresponding sound zone of the microphone array, thereby realizing directional pickup. The voice information of the target speaking user is then continuously collected and uploaded to a voice interaction service center in the cloud, where the corresponding operation authority can be determined from the speaking user's position information and age level information. When the speaking user is judged to have high-level user authority, the in-vehicle system can choose to directly execute the corresponding operation and issue voice feedback for voice information issued by the user that can serve as a control instruction. When the speaking user is judged to have non-advanced user authority, then for voice information that can serve as a control instruction, a prompt can be generated to guide other users to intervene, the authority requirements on those users are determined, and the prompt is broadcast in the cabin.
For voice information issued by a speaking user, whether further interaction is needed can be judged by the pre-deployed local arbitration engine. If it is judged that the control instruction corresponding to the voice information needs no interaction, the instruction can be broadcast directly and the corresponding operation executed; if it is judged that the control instruction needs further interaction, the inquiry user is determined according to the determined authority requirement and the position, age bracket, gaze direction, or listening state of the users in the current cabin, and the determined inquiry user is guided to participate in further interaction on the voice information through the screen action, the ambience light, and the physical robot.
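The end-to-end dispatch in this paragraph, acting directly or routing through an inquiry user, can be condensed as follows; both callbacks are placeholders for the arbitration engine and the selection logic described above:

```python
def arbitrate(instruction, needs_interaction, pick_inquiry_user):
    """If the local arbitration engine decides no further interaction is
    needed, broadcast and execute the instruction; otherwise select an
    inquiry user and hand the instruction over for further interaction."""
    if not needs_interaction(instruction):
        return ("execute", None)
    return ("interact", pick_inquiry_user())
```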
According to the technical scheme of this embodiment, the role authority of the target speaking user is determined based on the age level information and the target position information, and corresponding operations are executed for different role authorities in differentiated execution modes, further improving the security of operations executed based on voice information. For the case where a user with non-advanced user authority speaks, the inquiry user is guided in a visual manner to decide on the voice information, which avoids the sound-pickup confusion caused by deploying a multi-sound-zone microphone array in the space and further improves the efficiency and experience of voice interaction.
EXAMPLE III
Fig. 8 is a block diagram of a data processing apparatus according to a third embodiment of the present invention, which is capable of executing a data processing method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 8, the apparatus specifically includes: a voice information collection module 310, a target image information determination module 320, a role authority determination module 330, and a target execution manner determination module 340.
The voice information collecting module 310 is configured to collect voice information of a speaking user based on a preset microphone array, and determine target location information corresponding to the voice information.
And a target image information determining module 320, configured to determine target image information of a target speaking user corresponding to the target location information.
And a role authority determining module 330, configured to determine, according to the target image information, a role authority of the target speaking user.
And a target execution mode determining module 340, configured to determine a target execution mode corresponding to the voice information based on the role permission, so as to execute an operation corresponding to the voice information based on the target execution mode.
On the basis of the above technical solutions, the voice information collecting module 310 includes a voice information collecting unit and a target position information determining unit.
And the voice information acquisition unit is used for acquiring the voice information of the speaking user when the target awakening word is acquired based on the preset multi-sound-area microphone array.
A target location information determining unit for determining target location information of a target speaking user corresponding to the voice information based on the multi-sound-zone microphone array.
Optionally, the target image information determining module 320 is further configured to shoot an image to be processed of a target area based on at least one preset image capture device, and determine target image information from the image to be processed according to the target position information; or generating a shooting instruction based on the target position information, and sending the shooting instruction to a target camera device so as to enable the target camera device to shoot target image information corresponding to the target position information; and the target image information comprises a target speaking user.
On the basis of the above technical solutions, the role authority determining module 330 includes an age level information determining unit and a role authority determining unit.
And the age level information determining unit is used for inputting the target image information into a target user classification model obtained through pre-training to obtain the age level information of the target speaking user in the target image information.
And the role authority determining unit is used for determining the role authority of the target speaking user based on the age level information and the target position information.
On the basis of the above technical solutions, the target execution mode determining module 340 includes an advanced role authority execution mode determining unit and a non-advanced role authority execution mode determining unit.
And the advanced role authority execution mode determining unit is used for executing the operation corresponding to the voice information according to the target execution mode if the role authority is the advanced role authority.
A non-advanced role authority execution mode determining unit, configured to determine, if the role authority is a non-advanced role authority, executable voice information corresponding to the role authority according to a pre-established mapping relation table, and determine, based on the voice information and the executable voice information, a target execution mode corresponding to the voice information; and the user authority level of the high-level role authority is higher than the user authority level of the non-high-level role authority.
Optionally, the non-advanced role authority execution manner determining unit is further configured to, if the voice information belongs to the executable voice information, execute an operation corresponding to the voice information in the target execution manner; and if the voice information does not belong to the executable voice information, the target execution mode is to play the voice information based on a playing device and execute the operation corresponding to the confirmed voice information after receiving the confirmed voice information sent by the high-level role authority user.
On the basis of the above technical solutions, the data processing apparatus further includes a relationship creation module.
And the relation creating module is used for creating corresponding relations among different role authorities, available voice information and execution modes in advance so as to determine a target execution mode based on the corresponding relations when the role authorities and the voice information are determined.
According to the technical scheme of this embodiment, a microphone array is preset, voice information of a speaking user is collected based on the microphone array, and target position information corresponding to the voice information is determined, thereby providing a human-machine voice interaction channel for the occupant at each position. Directional pickup avoids interference from other sounds during voice interaction and improves the efficiency and experience of human-machine interaction. Further, by determining target image information of the target speaking user corresponding to the target position information, the role authority of the target speaking user can be determined, and the execution mode and operation corresponding to the voice information are determined based on that role authority, thereby ensuring operation safety.
The data processing device provided by the embodiment of the invention can execute the data processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
Example four
Fig. 9 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. FIG. 9 illustrates a block diagram of an exemplary electronic device 40 suitable for use in implementing embodiments of the present invention. The electronic device 40 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the electronic device 40 is in the form of a general purpose computing device. The components of electronic device 40 may include, but are not limited to: one or more processors or processing units 401, a system memory 402, and a bus 403 that couples the various system components (including the system memory 402 and the processing unit 401).
Bus 403 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 40 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 40 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 402 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 404 and/or cache memory 405. The electronic device 40 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 406 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 9, and commonly referred to as a "hard drive"). Although not shown in FIG. 9, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 403 by one or more data media interfaces. Memory 402 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 408 having a set (at least one) of program modules 407 may be stored, for example, in memory 402, such program modules 407 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 407 generally perform the functions and/or methods of the described embodiments of the invention.
The electronic device 40 may also communicate with one or more external devices 409 (e.g., keyboard, pointing device, display 410, etc.), with one or more devices that enable a user to interact with the electronic device 40, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 411. Also, the electronic device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 412. As shown, the network adapter 412 communicates with the other modules of the electronic device 40 over the bus 403. It should be appreciated that although not shown in FIG. 9, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 401 executes various functional applications and data processing, for example, implementing a data processing method provided by an embodiment of the present invention, by executing a program stored in the system memory 402.
EXAMPLE five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a data processing method.
The method comprises the following steps:
collecting voice information of a speaking user based on a preset microphone array, and determining target position information corresponding to the voice information;
determining target image information of a target speaking user corresponding to the target position information;
determining the role authority of the target speaking user according to the target image information;
and determining a target execution mode corresponding to the voice information based on the role authority so as to execute the operation corresponding to the voice information based on the target execution mode.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is merely illustrative of preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from its spirit, the scope of the invention being determined by the appended claims.

Claims (10)

1. A data processing method, comprising:
collecting voice information of a speaking user based on a preset microphone array, and determining target position information corresponding to the voice information;
determining target image information of a target speaking user corresponding to the target position information;
determining the role authority of the target speaking user according to the target image information;
and determining a target execution mode corresponding to the voice information based on the role authority so as to execute the operation corresponding to the voice information based on the target execution mode.
2. The method of claim 1, wherein the collecting voice information of a speaking user based on a preset microphone array and determining target position information corresponding to the voice information comprises:
when a target wake-up word is acquired based on a preset multi-sound-zone microphone array, collecting voice information of a speaking user;
and determining target position information of a target speaking user corresponding to the voice information based on the multi-sound-zone microphone array.
3. The method of claim 1, wherein the determining target image information of a target speaking user corresponding to the target location information comprises:
shooting, based on at least one preset camera device, an image to be processed of a target area, and determining target image information from the image to be processed according to the target position information; or
generating a shooting instruction based on the target position information, and sending the shooting instruction to a target camera device so as to enable the target camera device to shoot target image information corresponding to the target position information;
and the target image information comprises a target speaking user.
4. The method of claim 1, wherein the determining the role authority of the target speaking user according to the target image information comprises:
inputting the target image information into a target user classification model obtained through pre-training to obtain age level information of a target speaking user in the target image information;
and determining the role authority of the target speaking user based on the age level information and the target position information.
5. The method of claim 1, wherein the determining, based on the role authority, a target execution manner corresponding to the voice information to perform an operation corresponding to the voice information based on the target execution manner comprises:
if the role authority is a high-level role authority, the target execution mode is to execute the operation corresponding to the voice information;
if the role authority is not the advanced role authority, determining executable voice information corresponding to the role authority according to a pre-established mapping relation table, and determining a target execution mode corresponding to the voice information based on the voice information and the executable voice information;
and the user authority level of the high-level role authority is higher than the user authority level of the non-high-level role authority.
6. The method of claim 5, wherein determining the target execution mode corresponding to the speech information based on the speech information and the executable speech information comprises:
if the voice information belongs to the executable voice information, the target execution mode is to execute the operation corresponding to the voice information;
and if the voice information does not belong to the executable voice information, the target execution mode is to play the voice information through a playing device and, after receiving confirmation voice information sent by a user with the high-level role authority, to execute the operation corresponding to the confirmed voice information.
7. The method of claim 1, further comprising:
creating, in advance, correspondences among different role authorities, available voice information and execution modes, and determining a target execution mode based on the correspondences when the role authority and the voice information are determined.
8. A data processing apparatus, comprising:
the voice information acquisition module is used for acquiring voice information of a speaking user based on a preset microphone array and determining target position information corresponding to the voice information;
the target image information determining module is used for determining target image information of a target speaking user corresponding to the target position information;
the role authority determining module is used for determining the role authority of the target speaking user according to the target image information;
and the target execution mode determining module is used for determining a target execution mode corresponding to the voice information based on the role authority so as to execute the operation corresponding to the voice information based on the target execution mode.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method of any one of claims 1-7.
10. A storage medium containing computer-executable instructions for performing the data processing method of any one of claims 1-7 when executed by a computer processor.
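Read as software, claims 1 and 4-7 describe a permission-gated command dispatcher: locate the speaker, derive a role authority from the speaker's image and seat position, then either execute a voice command directly or route it through a confirmation step using a pre-established mapping table. The following Python sketch illustrates only that gating logic; the names (RoleAuthority, PERMISSION_TABLE), the zone numbering, and the age labels are hypothetical illustrations, not terms defined by the patent:

```python
from dataclasses import dataclass
from enum import Enum

class RoleAuthority(Enum):
    HIGH = "high"        # e.g. an adult in the driver seat: any command allowed
    LIMITED = "limited"  # e.g. a child passenger: restricted command set

# Claim 7: a pre-established mapping from role authority to executable commands.
# None means every command is permitted for that role.
PERMISSION_TABLE = {
    RoleAuthority.HIGH: None,
    RoleAuthority.LIMITED: {"play music", "adjust seat"},
}

@dataclass
class SpokenCommand:
    text: str
    zone: int  # sound-zone index reported by the microphone array (claim 2)

def classify_role(age_level: str, zone: int) -> RoleAuthority:
    """Claim 4: derive role authority from age-level info and seat position.
    Zone 0 is assumed here to be the driver seat."""
    if age_level == "adult" and zone == 0:
        return RoleAuthority.HIGH
    return RoleAuthority.LIMITED

def decide_execution(cmd: SpokenCommand, role: RoleAuthority) -> str:
    """Claims 5-6: execute directly, or defer to a high-authority user."""
    allowed = PERMISSION_TABLE[role]
    if allowed is None or cmd.text in allowed:
        return f"execute: {cmd.text}"
    # Claim 6: play back the request and wait for confirmation from a user
    # holding the high-level role authority before executing.
    return f"request confirmation for: {cmd.text}"

cmd = SpokenCommand(text="open window", zone=2)
role = classify_role(age_level="child", zone=cmd.zone)
print(decide_execution(cmd, role))  # prints "request confirmation for: open window"
```

The table-driven design mirrors claim 7: changing what a limited role may do is a data change to PERMISSION_TABLE, not a code change to the dispatcher.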
CN202110789225.XA 2021-07-13 2021-07-13 Data processing method and device, electronic equipment and storage medium Pending CN113407758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110789225.XA CN113407758A (en) 2021-07-13 2021-07-13 Data processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113407758A true CN113407758A (en) 2021-09-17

Family

ID=77686130

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110789225.XA Pending CN113407758A (en) 2021-07-13 2021-07-13 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113407758A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024031971A1 (en) * 2022-08-09 2024-02-15 中兴通讯股份有限公司 Permission management method, smart cabin, vehicle, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599866A (en) * 2016-12-22 2017-04-26 上海百芝龙网络科技有限公司 Multidimensional user identity identification method
CN110223690A (en) * 2019-06-10 2019-09-10 深圳永顺智信息科技有限公司 The man-machine interaction method and device merged based on image with voice
CN111251307A (en) * 2020-03-24 2020-06-09 北京海益同展信息科技有限公司 Voice acquisition method and device applied to robot and robot
CN111694433A (en) * 2020-06-11 2020-09-22 北京百度网讯科技有限公司 Voice interaction method and device, electronic equipment and storage medium
CN111768776A (en) * 2020-06-28 2020-10-13 戴姆勒股份公司 In-vehicle voice control method
CN112532911A (en) * 2020-11-12 2021-03-19 深圳市慧为智能科技股份有限公司 Image data processing method, device, equipment and storage medium



Similar Documents

Publication Publication Date Title
CN110070868B (en) Voice interaction method and device for vehicle-mounted system, automobile and machine readable medium
US20220139389A1 (en) Speech Interaction Method and Apparatus, Computer Readable Storage Medium and Electronic Device
CN109584871B (en) User identity recognition method and device of voice command in vehicle
CN111653277A (en) Vehicle voice control method, device, equipment, vehicle and storage medium
US9963096B2 (en) Vehicle infotainment and connectivity system
CN108657186B (en) Intelligent cockpit interaction method and device
JP2018027731A (en) On-vehicle device, control method of on-vehicle device, and content providing system
CN113407758A (en) Data processing method and device, electronic equipment and storage medium
CN110211579B (en) Voice instruction recognition method, device and system
CN114187637A (en) Vehicle control method, device, electronic device and storage medium
JP2007298592A (en) Speech processing apparatus
CN113593572A (en) Method and apparatus for performing sound zone localization in spatial region, device and medium
CN115831141A (en) Noise reduction method and device for vehicle-mounted voice, vehicle and storage medium
CN113771703B (en) Automobile co-driver seat adjusting method and system
US20220301434A1 (en) Information processing apparatus, method, and non-transitory computer readable medium
US11580958B2 (en) Method and device for recognizing speech in vehicle
CN112418162B (en) Method, device, storage medium and apparatus for controlling vehicle
JPS59180600A (en) Voice recognition controller to be carried on vehicle
CN114760417A (en) Image shooting method and device, electronic equipment and storage medium
CN114125655A (en) Loudspeaker control method and device, electronic equipment and storage medium
CN110550037B (en) Driving assistance system and driving assistance system method for vehicle
CN111724786A (en) Lip language identification system and method
CN112147780A (en) Vehicle-mounted head-up display device, control system, control method, and storage medium
KR20190074344A (en) Dialogue processing apparatus and dialogue processing method
CN114194122B (en) Safety prompt system and automobile

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917
