CN115762498A

CN115762498A - Voice playing control method and device and electronic equipment

Info

Publication number: CN115762498A
Application number: CN202211160978.5A
Authority: CN
Inventors: 王宁; 李良斌
Original assignee: Beijing SoundAI Technology Co Ltd
Current assignee: Beijing SoundAI Technology Co Ltd
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2023-03-07

Abstract

The invention provides a method and a device for controlling voice playing, and electronic equipment, and relates to the technical field of voice processing. The method comprises: obtaining an image and voice of a user in an elevator car; determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice; and, when the face orientation and the pronunciation direction both face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, adjusting the voice playing mode of the playing equipment, where the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing. Because the voice playing mode is adjusted once both the face orientation and the pronunciation direction are determined to face the key position of the elevator car, the interference of the playing equipment with subsequent recognition of voice control instructions can be effectively reduced, and the accuracy of the voice recognition result is improved.

Description

Voice playing control method and device and electronic equipment

Technical Field

The present invention relates to the field of voice processing technologies, and in particular, to a method and an apparatus for controlling voice playing, and an electronic device.

Background

With the continuous development of science and technology, intelligent elevators have come into public view. An intelligent elevator has a voice control function, through which people can control the elevator by voice; a playing device, such as an advertising screen, is provided in the elevator car for broadcasting audio and video, news and the like.

When a user performs voice control on the elevator, a microphone of the elevator collects the voice output by the user and sends the collected voice to the control system of the elevator; the control system then performs voice recognition on the collected voice and controls the elevator according to the voice recognition result.

However, when the microphone collects the voice output by the user, if the playing device broadcasts audio and video at the same time, the microphone collects the broadcasted audio and video at the same time, and the audio and video can interfere with subsequent voice recognition, so that the accuracy of the voice recognition result is low.

Disclosure of Invention

The invention provides a control method and device for voice playing and electronic equipment, which can reduce the interference of playing equipment in an elevator car on subsequent voice recognition, thereby improving the accuracy of a voice recognition result.

The invention provides a control method for voice playing, which comprises the following steps:

obtaining an image and voice of a user in an elevator car;

determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice; and

when the face orientation and the pronunciation direction both face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, adjusting the voice playing mode of the playing equipment; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing.

According to the control method for voice playing provided by the invention, adjusting the voice playing mode of the playing equipment, when the face orientation and the pronunciation direction both face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, comprises:

when the face orientation is determined to face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, adjusting the voice playing mode of the playing equipment for the first time;

performing wake-up word detection on the voice; and

when the voice is detected to comprise a preset wake-up word and the pronunciation direction faces the key position of the elevator car, adjusting the voice playing mode of the playing equipment again.

According to the control method for voice playing provided by the present invention, after the voice playing mode of the playing device is adjusted for the first time, the method further includes:

verifying the identity of the user according to the voice; the verification method comprises voiceprint verification.

Adjusting the voice playing mode of the playing equipment again, when the voice is detected to comprise the preset wake-up word and the pronunciation direction faces the key position of the elevator car, comprises:

when the user identity verification passes, the voice is detected to comprise the preset wake-up word, and the pronunciation direction faces the key position of the elevator car, adjusting the voice playing mode of the playing equipment again.

According to the control method for voice playing provided by the invention, the adjusting the voice playing mode of the playing device comprises the following steps:

verifying the identity of the user according to the image; the verification method comprises face verification and/or iris verification.

And adjusting the voice playing mode of the playing equipment under the condition that the user identity authentication is passed.

According to the control method for voice playing provided by the present invention, the adjusting the voice playing mode of the playing device comprises:

detecting a volume of the voice and an ambient volume of the elevator car.

And determining the adjusting mode according to the volume of the voice and the environment volume of the elevator car.

And adjusting the voice playing mode of the playing equipment according to the adjusting mode.

According to the control method for voice playing provided by the invention, the control method for voice playing further comprises the following steps:

detecting whether, within a first preset time period, there is a first user whose face orientation and pronunciation direction both face the key position of the elevator car; and, when no such first user exists, restoring the voice playing mode of the playing equipment to the voice playing mode before adjustment;

and/or,

detecting whether a preset wake-up word is present within a second preset time period; and, when the preset wake-up word is not present, restoring the voice playing mode of the playing equipment to the voice playing mode before adjustment.
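The restore-after-timeout behaviour described above can be sketched as follows. This is only an illustrative sketch: the class name `PlaybackRestorer`, the `restore_fn` callback and the injectable `clock` are hypothetical names that do not come from the patent, and one instance would be used per preset time period (one for the "no first user" window, one for the "no wake-up word" window).

```python
import time


class PlaybackRestorer:
    """Restores the pre-adjustment playing mode after a quiet period.

    Hypothetical helper: `restore_fn` is whatever callback puts the
    playing equipment back into its previous voice playing mode.
    """

    def __init__(self, restore_fn, timeout_s, clock=time.monotonic):
        self._restore_fn = restore_fn
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_event = None  # None means there is nothing to restore

    def on_adjusted(self):
        """Call when the voice playing mode has just been adjusted."""
        self._last_event = self._clock()

    def on_event(self):
        """Call when a qualifying user (or wake-up word) is detected again."""
        if self._last_event is not None:
            self._last_event = self._clock()

    def poll(self):
        """Restore and return True once the preset period has elapsed."""
        if self._last_event is None:
            return False
        if self._clock() - self._last_event >= self._timeout_s:
            self._restore_fn()
            self._last_event = None
            return True
        return False
```

Injecting the clock keeps the sketch testable without real waiting; a deployment would simply call `poll()` periodically with the default monotonic clock.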

According to the control method of voice playing provided by the invention, the determining the face orientation of the user according to the image comprises the following steps:

and inputting the image into a face orientation detection model to obtain the face orientation corresponding to the user.

The face orientation detection model is obtained by training an initial face orientation detection model based on a plurality of image samples and labeled face orientations corresponding to the image samples.

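According to the control method for voice playing provided by the invention, the control method for voice playing further comprises the following steps:

acquiring the voice control instruction of the user after the voice playing mode is adjusted; and

analyzing the voice control instruction to obtain a keyword included in the voice control instruction, and controlling the elevator based on the keyword.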

The present invention also provides a control device for voice playing, which may include:

the first acquisition unit is used for acquiring images and voice of users in the elevator car.

And the processing unit is used for determining the face orientation of the user according to the image and determining the pronunciation direction of the user according to the voice.

The adjusting unit is used for adjusting the voice playing mode of the playing equipment when the face orientation and the pronunciation direction are both determined to face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing.

According to the control device for voice playing provided by the invention, the adjusting unit is specifically used for adjusting the voice playing mode of the playing equipment for the first time when the face orientation is determined to face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice; performing wake-up word detection on the voice; and, when the voice is detected to comprise a preset wake-up word and the pronunciation direction faces the key position of the elevator car, adjusting the voice playing mode of the playing equipment again.

According to the control device for voice playing provided by the invention, the adjusting unit is specifically configured to verify the identity of the user according to the voice after the voice playing mode of the playing device is adjusted for the first time, where the verification method comprises voiceprint verification; and, when the user identity verification passes, the voice is detected to comprise the preset wake-up word, and the pronunciation direction faces the key position of the elevator car, to adjust the voice playing mode of the playing equipment again.

According to the control device for voice playing provided by the invention, the adjusting unit is specifically configured to verify the identity of the user according to the image, where the verification method comprises face verification and/or iris verification; and to adjust the voice playing mode of the playing equipment when the user identity verification passes.

According to the control device for voice playing provided by the invention, the adjusting unit is specifically used for detecting the volume of the voice and the environmental volume of the elevator car; determining the adjusting mode according to the volume of the voice and the environment volume of the elevator car; and adjusting the voice playing mode of the playing equipment according to the adjusting mode.

According to the control device for voice playing provided by the invention, the control device for voice playing further comprises a first detection unit, a first recovery unit, a second detection unit and a second recovery unit.

The first detection unit is used for detecting whether, within a first preset time period, there is a first user whose face orientation and pronunciation direction both face the key position of the elevator car; the first restoring unit is configured to restore the voice playing mode of the playing device to the voice playing mode before adjustment if it is determined that the first user does not exist;

and/or,

the second detection unit is used for detecting whether a preset wake-up word is present within a second preset time period; and the second restoring unit is used for restoring the voice playing mode of the playing equipment to the voice playing mode before adjustment when it is determined that the preset wake-up word is not present.

According to the control device for voice playing provided by the present invention, the processing unit is specifically configured to input the image into a face orientation detection model, so as to obtain the face orientation corresponding to the user; the face orientation detection model is obtained by training an initial face orientation detection model based on a plurality of image samples and labeled face orientations corresponding to the image samples.

According to the control device for voice playing provided by the invention, the control device for voice playing further comprises a second acquisition unit and a control unit.

The second obtaining unit is used for obtaining the voice control instruction of the user after the voice playing mode is adjusted;

and the control unit is used for analyzing the voice control command to obtain a keyword included in the voice control command and controlling the elevator based on the keyword.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the processor realizes the control method of the voice playing.

The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of controlling voice playback as described in any of the above.

The present invention also provides a computer program product comprising a computer program, which when executed by a processor implements the method for controlling voice playback as described in any of the above.

According to the voice playing control method, the voice playing control device and the electronic equipment provided by the invention, an image and voice of a user in the elevator car are obtained; the face orientation of the user is determined according to the image, and the pronunciation direction of the user is determined according to the voice; when the face orientation and the pronunciation direction both face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, the voice playing mode of the playing equipment is adjusted; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing. Because the voice playing mode is adjusted once both the face orientation and the pronunciation direction are determined to face the key position of the elevator car, the interference of the playing equipment with subsequent recognition of voice control instructions can be effectively reduced, and the accuracy of the voice recognition result is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a method for controlling voice playing according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a control apparatus for voice playing according to an embodiment of the present invention;

Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may each be singular or plural. In the description of the present invention, the character "/" generally indicates an "or" relationship between the associated objects before and after it.

The technical solution provided by the embodiments of the invention can be applied to intelligent elevator control scenarios, in particular to voice control of an intelligent elevator. The intelligent elevator has a voice control function, through which people can control the elevator by voice; a playing device, such as an advertising screen, is provided in the elevator car for broadcasting audio and video, news and the like. Voice control is useful in several elevator scenarios, for example when the car is crowded and it is inconvenient to press the buttons, or when, during an epidemic, passengers want to avoid touching the elevator buttons and spreading viruses and bacteria.

When a user performs voice control on the elevator, a microphone of the elevator collects the voice output by the user and sends the collected voice to the control system of the elevator; the control system performs voice recognition on the collected voice and controls the elevator according to the voice recognition result. However, if the playing device is broadcasting audio and video while the microphone collects the voice output by the user, the microphone also picks up the broadcast audio, which interferes with the subsequent voice recognition and lowers the accuracy of the voice recognition result.

To reduce the interference of the playing equipment with subsequent recognition of voice control instructions, the embodiments rely on the observation that, under normal conditions, a user whose face orientation and pronunciation direction both face the key position of the elevator car is very likely performing voice control on the elevator. The embodiment of the invention therefore provides a voice playing control method in which an image and voice of the user in the elevator car are obtained; the face orientation of the user is determined according to the image, and the pronunciation direction of the user is determined according to the voice; when the face orientation and the pronunciation direction both face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, the voice playing mode of the playing equipment is adjusted; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing. Because the voice playing mode is adjusted once both the face orientation and the pronunciation direction are determined to face the key position of the elevator car, the interference of the playing equipment with subsequent recognition of voice control instructions can be effectively reduced, and the accuracy of the voice recognition result is improved.

Hereinafter, the control method of voice playing provided by the present invention will be described in detail through several specific embodiments described below. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 1 is a flowchart illustrating a method for controlling voice playback according to an embodiment of the present invention, where the method for controlling voice playback can be executed by software and/or a hardware device. For example, referring to fig. 1, the method for controlling voice playing may include:

S101, obtaining an image and voice of a user in the elevator car.

The image may include, in addition to the user, a key position of the elevator car, and the like, and may be specifically set according to actual needs.

Illustratively, to obtain the image of the user in the elevator car, a camera can be arranged in the elevator car in advance, and the image of the user is taken from the frames collected by the camera.

For example, to obtain the voice output by the user, a voice acquisition device, such as a microphone, may be arranged in the elevator car in advance, and the voice of the user in the elevator car is taken from the audio collected by the voice acquisition device.

After the image and the voice of the user in the elevator car are acquired respectively, the following S102 may be performed:

S102, determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice.

For example, when the face orientation of the user is determined according to the image, the image may be input into a pre-trained face orientation detection model to obtain the face orientation corresponding to the user; alternatively, an image recognition algorithm may be used to recognize the image and determine the face orientation of the user. The approach can be chosen according to actual needs. The face orientation detection model is obtained by training an initial face orientation detection model based on a plurality of image samples and the labeled face orientations corresponding to the image samples.

For example, when the initial face orientation detection model is trained based on a plurality of image samples and the labeled face orientations corresponding to the image samples, the image samples may be input into the initial face orientation detection model to obtain the predicted face orientation corresponding to each image sample; for each image sample, a loss function corresponding to the image sample is constructed according to the labeled face orientation and the predicted face orientation corresponding to that sample; and the model parameters of the initial face orientation detection model are updated according to the loss functions corresponding to the image samples until the updated face orientation detection model meets a preset condition, for example, the number of updates reaches a preset threshold or the updated face orientation detection model converges. The face orientation detection model meeting the preset condition is taken as the final face orientation detection model.
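The training procedure just described (predict, build a per-sample loss, update the parameters, stop on an update-count threshold or on convergence) can be sketched as below. This is a minimal sketch under strong assumptions: the patent does not specify the model, so a linear softmax classifier with a cross-entropy loss stands in for the face orientation detection model, and each "image sample" is assumed to have already been reduced to a small feature vector. All function and parameter names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_orientation_model(samples, labels, num_classes, lr=0.5,
                            max_updates=500, tol=1e-4, seed=0):
    """Train a linear softmax classifier by batch gradient descent.

    Stops when the number of updates reaches `max_updates` or when the
    mean loss changes by less than `tol` between updates (treated here
    as convergence). Returns (weights, updates_done, last_mean_loss).
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # One weight row per orientation class; the last entry is the bias.
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim + 1)]
               for _ in range(num_classes)]
    prev_loss = float("inf")
    n = len(samples)
    for update in range(1, max_updates + 1):
        grads = [[0.0] * (dim + 1) for _ in range(num_classes)]
        loss = 0.0
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, xb))
                             for row in weights])
            # Cross-entropy between labeled and predicted orientation.
            loss -= math.log(probs[y] + 1e-12)
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j, v in enumerate(xb):
                    grads[c][j] += err * v
        loss /= n
        for c in range(num_classes):
            for j in range(dim + 1):
                weights[c][j] -= lr * grads[c][j] / n
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return weights, update, loss


def predict_orientation(weights, x):
    """Return the index of the highest-scoring orientation class."""
    xb = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(row, xb)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

A real system would more plausibly train a convolutional network on the raw images; the stopping logic would stay the same.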

For example, when the pronunciation direction of the user is determined according to the voice, the voice may be processed by sound source localization means, such as a microphone array or direction-of-arrival (DOA) estimation, to determine the pronunciation direction of the user. For how a microphone array or DOA estimation determines the pronunciation direction, reference may be made to the related description in the prior art; the details are not repeated in the embodiments of the present invention.
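The patent leaves the localization technique to the prior art. As one common possibility, the sketch below estimates a direction of arrival from the time difference between two microphones using a brute-force cross-correlation. The two-microphone geometry, the function names and the angle convention (degrees from broadside, positive toward microphone A) are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    A positive result means the sound reached microphone A first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_doa_degrees(sig_a, sig_b, sample_rate, mic_distance):
    """Estimate the arrival angle from the inter-microphone delay."""
    # Largest physically possible delay between the two microphones.
    max_lag = int(math.ceil(mic_distance / SPEED_OF_SOUND * sample_rate))
    lag = estimate_delay_samples(sig_a, sig_b, max_lag)
    tau = lag / sample_rate
    sin_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(sin_theta))


def is_toward(angle_deg, target_deg, tolerance_deg):
    """True when the estimated angle is within tolerance of the target."""
    return abs(angle_deg - target_deg) <= tolerance_deg
```

Here `is_toward` would be called with the known bearing of the key position relative to the microphone pair; production systems typically use more microphones and more robust estimators.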

After the face orientation and the pronunciation direction of the user are determined, they can be combined to determine whether the user needs to perform voice control on the elevator. If the face orientation and/or the pronunciation direction of the user does not face the key position of the elevator car, the user does not need to perform voice control on the elevator, and the playing equipment of the elevator can keep its current voice playing mode. Conversely, if the face orientation and the pronunciation direction of the user both face the key position of the elevator car, the user needs to perform voice control on the elevator; in this case, to prevent the audio and video broadcast by the playing equipment from interfering with the voice control instruction subsequently output by the user, the voice playing mode of the playing equipment can be adjusted, that is, the following S103 is executed:

S103, when the face orientation and the pronunciation direction are both determined to face the key position of the elevator car and the playing device in the elevator car is detected to be playing voice, adjusting the voice playing mode of the playing device; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing.
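The condition checked in S103 can be sketched as a small predicate. The representation of both directions as angles, the `Observation` type and the tolerance value are assumptions for illustration; the patent only requires that both directions face the key position while the playing device is playing.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    face_angle_deg: float    # face orientation estimated from the image
    voice_angle_deg: float   # pronunciation direction estimated from the voice
    device_is_playing: bool  # whether the playing device is playing voice


def faces_key_position(angle_deg, key_angle_deg, tolerance_deg=20.0):
    """True when an angle points at the key position within a tolerance."""
    # Wrap the difference into [-180, 180) so that 350 and 0 are 10 apart.
    diff = (angle_deg - key_angle_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance_deg


def should_adjust(obs, key_angle_deg, tolerance_deg=20.0):
    """S103 condition: both directions face the keys and playback is on."""
    return (
        obs.device_is_playing
        and faces_key_position(obs.face_angle_deg, key_angle_deg, tolerance_deg)
        and faces_key_position(obs.voice_angle_deg, key_angle_deg, tolerance_deg)
    )
```

If either direction misses the key position, or nothing is playing, the predicate is false and the current voice playing mode is kept.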

For example, in the embodiment of the present invention, when the voice playing mode of the playing device is adjusted, the volume of the voice and the ambient volume of the elevator car may be detected first; determining an adjusting mode jointly according to the volume of the voice and the environmental volume of the elevator car; and adjusting the voice playing mode of the playing equipment according to the adjusting mode.

For example, when the volume of the voice output by the user is high and the ambient volume of the elevator car is low, the presence of some other sound does not affect the collection and recognition of the voice control instruction subsequently output by the user; in this case, the voice playing volume can simply be reduced so that the output of the playing equipment is not interrupted. Of course, if the effect on the playing equipment is not a concern, voice playing can be directly paused, closed or muted. As another example, when the volume of the voice output by the user is low and the ambient volume of the elevator car is high, voice playing can be directly paused, closed or muted so that the subsequently output voice control instruction can be collected and recognized accurately.
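One way to turn the two examples above into a rule is sketched below. The decibel units, the `margin_db` threshold and the `prefer_keep_playing` flag are assumptions; the patent only states that the adjusting mode is determined jointly from the voice volume and the ambient volume, and pausing is chosen here arbitrarily among pause, close and mute.

```python
from enum import Enum


class AdjustMode(Enum):
    REDUCE_VOLUME = "reduce_volume"
    PAUSE = "pause"
    CLOSE = "close"
    MUTE = "mute"


def choose_adjust_mode(voice_db, ambient_db, margin_db=10.0,
                       prefer_keep_playing=True):
    """Pick an adjusting mode from the voice and ambient volume.

    A voice that is clearly louder than the ambient level only needs
    the playback volume reduced; otherwise playback is paused.
    """
    voice_is_clear = (voice_db - ambient_db) >= margin_db
    if voice_is_clear and prefer_keep_playing:
        return AdjustMode.REDUCE_VOLUME
    # Pausing stands in for "pause, close or mute" in this sketch.
    return AdjustMode.PAUSE
```

For instance, `choose_adjust_mode(70.0, 50.0)` selects volume reduction, while a quiet voice in a loud car, such as `choose_adjust_mode(55.0, 65.0)`, selects pausing.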

It can be seen that, in the embodiment of the invention, an image and voice of the user in the elevator car are obtained; the face orientation of the user is determined according to the image, and the pronunciation direction of the user is determined according to the voice; when the face orientation and the pronunciation direction are both determined to face the key position of the elevator car and the playing equipment in the elevator car is detected to be playing voice, the voice playing mode of the playing equipment is adjusted; the adjusting mode comprises reducing the voice playing volume, pausing voice playing, closing voice playing or muting voice playing. Because the voice playing mode is adjusted once both the face orientation and the pronunciation direction are determined to face the key position of the elevator car, the interference of the playing equipment with subsequent recognition of voice control instructions can be effectively reduced, and the accuracy of the voice recognition result is improved.

Based on the embodiment shown in fig. 1, after the voice playing mode of the playing device has been adjusted (because the face orientation and the pronunciation direction both face the key position of the elevator car and the playing device in the elevator car is detected to be playing voice), the voice control instruction that the user outputs after the adjustment can further be acquired; the voice control instruction is analyzed to obtain the keywords it includes, and the elevator is controlled based on the keywords obtained by the analysis.

For example, the keywords may be words indicating floors, such as "15th floor" and "8th floor", or words for opening or closing the doors, and may be set according to actual needs.
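Keyword extraction of this kind can be sketched with a few regular expressions. The sketch assumes the voice control instruction has already been transcribed to English text by a speech recognizer; the function name, the command tuples and the patterns are hypothetical and would differ for other languages.

```python
import re


def parse_command(text):
    """Extract an elevator command from a recognized instruction.

    Returns ("go_to_floor", n), ("open_door", None),
    ("close_door", None), or None when no keyword is found.
    """
    lowered = text.lower()
    # Matches "15th floor", "8 th floor" and "3 floor".
    match = re.search(r"\b(\d{1,3})\s*(?:st|nd|rd|th)?\s*floor\b", lowered)
    if match:
        return ("go_to_floor", int(match.group(1)))
    # Matches "floor 3".
    match = re.search(r"\bfloor\s*(\d{1,3})\b", lowered)
    if match:
        return ("go_to_floor", int(match.group(1)))
    if re.search(r"\bopen\b.*\bdoor", lowered):
        return ("open_door", None)
    if re.search(r"\bclose\b.*\bdoor", lowered):
        return ("close_door", None)
    return None
```

The returned tuple would then be handed to the elevator control system; how that system is driven is outside the scope of this sketch.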

Therefore, in the embodiment of the invention, when the elevator is controlled, the voice playing mode of the playing device is adjusted, so that the playing device after the voice playing mode is adjusted can not interfere with the acquisition and voice recognition of the voice control instruction subsequently output by a user, the voice control instruction can be better analyzed, the accuracy of the analysis result is improved, and the accuracy of the elevator control can be effectively improved when the elevator is subsequently controlled according to the analyzed keyword.

Based on the embodiment shown in fig. 1, to further improve the accuracy of the adjustment, the voice playing mode may be adjusted in two stages when it is determined that the face orientation and the pronunciation direction both face the key position of the elevator car. First, it is determined whether the face orientation of the user faces the key position of the elevator car; when it does and the playing device in the elevator car is detected to be playing voice, the voice playing mode of the playing device is adjusted for the first time. Wake-up word detection is then performed on the voice; when the voice is detected to comprise the preset wake-up word and the pronunciation direction faces the key position of the elevator car, the voice playing mode of the playing device is adjusted again.

For example, the preset wake-up word may be "small ladder", "elevator", or the like, and may be set according to actual needs; the embodiment of the present invention does not further limit the specific setting of the preset wake-up word.

For example, when wake-up word detection is performed to determine whether the voice comprises a preset wake-up word, the voice may be input into a pre-trained wake-up word detection model to obtain the corresponding wake-up word; alternatively, a wake-up word extraction algorithm may be used to detect the voice and determine the corresponding wake-up word. The approach can be chosen according to actual needs. The wake-up word detection model is obtained by training an initial wake-up word detection model based on a plurality of voice samples and the labeled wake-up words corresponding to the voice samples.

Illustratively, when the initial wake-up word detection model is trained based on a plurality of voice samples and the labeled wake-up words corresponding to the voice samples, the voice samples may be input into the initial wake-up word detection model to obtain the predicted wake-up word corresponding to each voice sample; for each voice sample, a loss function corresponding to the voice sample is constructed according to the labeled wake-up word and the predicted wake-up word corresponding to that sample; and the model parameters of the initial wake-up word detection model are updated according to the loss functions corresponding to the voice samples until the updated wake-up word detection model meets a preset condition, for example, the number of updates reaches a preset threshold or the updated wake-up word detection model converges. The wake-up word detection model meeting the preset condition is taken as the final wake-up word detection model.

For example, in the above scheme, after the voice playing mode of the playing device is adjusted for the first time, the identity of the user may also be verified according to the voice, where the verification method includes voiceprint verification. The voice playing mode of the playing device is adjusted again when the user identity verification passes, the voice is detected to include the preset wake-up word, and the pronunciation direction faces the key position of the elevator car.

It can be seen that, in the embodiment of the present invention, the voice playing mode of the playing device is first adjusted for the first time, and wake-up word detection is then performed on the voice. The voice playing mode is adjusted again only when the voice is detected to include the preset wake-up word and the pronunciation direction faces the key position of the elevator car. This prevents the voice playing mode of the playing device from being adjusted by mistake and improves the accuracy of the adjustment.
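The two-stage adjustment described above can be sketched as a small state transition. The embodiment does not state which adjustment is applied at each stage; the sketch assumes, for illustration only, that the first adjustment lowers the volume and the second pauses playback. The names `PlaybackMode`, `Observation`, and `two_stage_adjust` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class PlaybackMode(Enum):
    NORMAL = "normal"
    LOWERED = "lowered"  # assumed result of the first adjustment
    PAUSED = "paused"    # assumed result of the second adjustment


@dataclass
class Observation:
    face_toward_keys: bool    # face orientation faces the key position
    voice_toward_keys: bool   # pronunciation direction faces the key position
    wake_word_detected: bool  # voice includes the preset wake-up word
    device_playing: bool      # playing device is currently playing voice


def two_stage_adjust(obs: Observation, mode: PlaybackMode) -> PlaybackMode:
    # Stage 1: face orientation alone triggers the first adjustment
    if mode is PlaybackMode.NORMAL:
        if not (obs.face_toward_keys and obs.device_playing):
            return mode
        mode = PlaybackMode.LOWERED
    # Stage 2: wake-up word plus pronunciation direction triggers the second
    if (
        mode is PlaybackMode.LOWERED
        and obs.wake_word_detected
        and obs.voice_toward_keys
    ):
        mode = PlaybackMode.PAUSED
    return mode
```

Splitting the decision this way means a passenger who merely glances at the buttons causes only the milder first adjustment, while the stronger one requires the additional wake-up word and direction evidence.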

Based on the embodiment shown in Fig. 1, when adjusting the voice playing mode of the playing device upon determining that the face orientation and the pronunciation direction both face the key position of the elevator car, the identity of the user may, for example, be verified before the adjustment in order to further improve its security. The voice playing mode of the playing device is adjusted only when the user identity verification passes, which prevents an unauthorized user from adjusting the voice playing mode and improves the security of the adjustment.

For example, the identity of the user may be verified according to the image, where the verification method includes face verification and/or iris verification. Image-based verification is described here only as an example, and the embodiments of the present invention are not limited thereto.

Based on the embodiment shown in Fig. 1, after the voice playing mode of the playing device has been adjusted upon determining that the face orientation and the pronunciation direction both face the key position of the elevator car, the playing device does not remain in the adjusted voice playing mode indefinitely. Instead, when it is not detected that the user wants to perform voice control on the elevator, the voice playing mode of the playing device is restored to the voice playing mode before the adjustment and playback continues in that mode. This avoids the impact on the played content that would result from the playing device remaining in the adjusted voice playing mode, for example with voice playing paused, turned off, or muted.

For example, restoring the voice playing mode of the playing device to the voice playing mode before the adjustment may include:

detecting whether there is, within a first preset time period, a first user whose face orientation and pronunciation direction both face the key position of the elevator car; when it is determined that the first user does not exist, this indicates that no user wanting to perform voice control on the elevator has been detected, in which case the voice playing mode of the playing device may be restored to the voice playing mode before the adjustment. The value of the first preset time period may be set according to actual needs and is not specifically limited in the embodiment of the present invention.

and/or,

detecting whether the preset wake-up word exists within a second preset time period; when the preset wake-up word does not exist, this indicates that no user wanting to perform voice control on the elevator has been detected, in which case the voice playing mode of the playing device may be restored to the voice playing mode before the adjustment. The value of the second preset time period may be set according to actual needs and is not specifically limited in the embodiment of the present invention.

It can be seen that, in the embodiment of the present invention, whether the user wants to perform voice control on the elevator is determined by detecting whether there is, within the first preset time period, a first user whose face orientation and pronunciation direction both face the key position of the elevator car, and/or by detecting whether the preset wake-up word exists within the second preset time period. When the user does not want to perform voice control on the elevator, the voice playing mode of the playing device is restored to the voice playing mode before the adjustment, which avoids the impact on the played content that would result from the playing device remaining in the adjusted voice playing mode, for example with voice playing paused, turned off, or muted.
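The restore conditions above can be sketched as a pair of timeouts. The class name `RestoreTimer`, the explicit `now` timestamps, and the `require_both` flag (which selects between the "and" and the "or" reading of "and/or") are assumptions made for illustration; the embodiment leaves the values of the two preset time periods open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestoreTimer:
    first_period_s: float    # first preset time period (user facing the keys)
    second_period_s: float   # second preset time period (wake-up word)
    adjusted_at: float       # time at which the playing mode was adjusted
    last_facing_at: Optional[float] = None
    last_wake_word_at: Optional[float] = None

    def note_facing(self, now: float) -> None:
        """Record that a first user faced and spoke toward the key position."""
        self.last_facing_at = now

    def note_wake_word(self, now: float) -> None:
        """Record that the preset wake-up word was detected."""
        self.last_wake_word_at = now

    def _expired(self, last_seen: Optional[float], period: float, now: float) -> bool:
        # Count from the adjustment time or the latest detection, whichever is later
        start = self.adjusted_at if last_seen is None else max(last_seen, self.adjusted_at)
        return now - start >= period

    def should_restore(self, now: float, require_both: bool = False) -> bool:
        no_user = self._expired(self.last_facing_at, self.first_period_s, now)
        no_wake = self._expired(self.last_wake_word_at, self.second_period_s, now)
        return (no_user and no_wake) if require_both else (no_user or no_wake)
```

For instance, with a first period of 5 seconds and a second period of 8 seconds, a timer created at time 0 reports that playback should be restored at time 5 if no user has faced the keys in the meantime.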

The control device for voice playing provided by the present invention is described below; the control device described below and the control method described above may be referred to in correspondence with each other.

Fig. 2 is a schematic structural diagram of a control device for voice playing according to an embodiment of the present invention. Referring to Fig. 2, the control device 20 for voice playing may include:

a first acquisition unit 201 for acquiring images and voices of users in the elevator car;

a processing unit 202 for determining the face orientation of the user according to the image and determining the pronunciation direction of the user according to the voice; and

an adjusting unit 203 for adjusting the voice playing mode of the playing device when it is determined that the face orientation and the pronunciation direction both face the key position of the elevator car and the playing device in the elevator car is detected to be playing voice; the adjusting mode includes reducing the voice playing volume, pausing voice playing, turning off voice playing, or muting voice playing.

Optionally, the adjusting unit 203 is specifically configured to adjust the voice playing mode of the playing device for the first time when it is determined that the face faces the key position of the elevator car and the playing device in the elevator car is detected to be playing voice; perform wake-up word detection on the voice; and adjust the voice playing mode of the playing device again when the voice is detected to include the preset wake-up word and the pronunciation direction faces the key position of the elevator car.

Optionally, the adjusting unit 203 is specifically configured to verify the identity of the user according to the voice after the voice playing mode of the playing device is adjusted for the first time, where the verification method includes voiceprint verification; and to adjust the voice playing mode of the playing device again when the user identity verification passes, the voice is detected to include the preset wake-up word, and the pronunciation direction faces the key position of the elevator car.

Optionally, the adjusting unit 203 is specifically configured to verify the identity of the user according to the image, where the verification method includes face verification and/or iris verification; and to adjust the voice playing mode of the playing device when the user identity verification passes.

Optionally, the adjusting unit 203 is specifically configured to detect the volume of the voice and the ambient volume of the elevator car; determine the adjusting mode according to the volume of the voice and the ambient volume of the elevator car; and adjust the voice playing mode of the playing device according to the adjusting mode.

Optionally, the control device 20 for playing back voice may further include a first detecting unit, a first recovering unit, a second detecting unit, and a second recovering unit.

The first detection unit is used for detecting whether there is, within a first preset time period, a first user whose face orientation and pronunciation direction both face the key position of the elevator car; the first restoring unit is used for restoring the voice playing mode of the playing device to the voice playing mode before the adjustment when it is determined that the first user does not exist;

and/or,

Optionally, the processing unit 202 is specifically configured to input the image into the face orientation detection model, and obtain a face orientation corresponding to the user; the face orientation detection model is obtained by training an initial face orientation detection model based on a plurality of image samples and labeled face orientations corresponding to the image samples.

Optionally, the control device 20 for voice playing may further include a second acquiring unit and a control unit.

The second acquisition unit is used for acquiring the voice control instruction of the user after the voice playing mode is adjusted.

The control apparatus 20 for voice playing provided in the embodiment of the present invention may implement the technical solution of the control method for voice playing in any embodiment, and the implementation principle and the beneficial effects thereof are similar to those of the control method for voice playing, and reference may be made to the implementation principle and the beneficial effects of the control method for voice playing, which are not described herein again.
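How the units of the control device 20 might cooperate can be sketched as follows. The sketch is not the structure of Fig. 2 itself: the class `VoicePlaybackControlDevice`, its callable fields, and in particular the rule in `choose_adjustment` (a 10 dB margin between voice volume and ambient volume) are hypothetical, since the embodiment states only that the adjusting mode is determined from the two volumes without giving a rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Adjustment(Enum):
    REDUCE_VOLUME = "reduce_volume"
    PAUSE = "pause"
    CLOSE = "close"
    MUTE = "mute"


def choose_adjustment(voice_db: float, ambient_db: float, margin_db: float = 10.0) -> Adjustment:
    # Hypothetical rule: a voice well above the ambient level only needs the
    # playback volume reduced; otherwise playback is paused.
    if voice_db - ambient_db >= margin_db:
        return Adjustment.REDUCE_VOLUME
    return Adjustment.PAUSE


@dataclass
class VoicePlaybackControlDevice:
    acquire: Callable[[], Tuple[object, object]]          # first acquisition unit: (image, voice)
    face_toward_keys: Callable[[object], bool]            # processing unit, image branch
    voice_toward_keys: Callable[[object], bool]           # processing unit, voice branch
    is_playing: Callable[[], bool]                        # playing device state
    measure_db: Callable[[object], Tuple[float, float]]   # (voice volume, ambient volume)

    def step(self) -> Optional[Adjustment]:
        """Run one acquisition/processing/adjusting cycle; None means no adjustment."""
        image, voice = self.acquire()
        if not (self.face_toward_keys(image) and self.voice_toward_keys(voice)):
            return None
        if not self.is_playing():
            return None
        voice_db, ambient_db = self.measure_db(voice)
        return choose_adjustment(voice_db, ambient_db)
```

Passing each unit in as a callable keeps the sketch independent of any particular camera, microphone array, or detection model.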

Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform a control method for voice playing, the method comprising: acquiring images and voices of users in an elevator car; determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice; and adjusting the voice playing mode of the playing device when it is determined that the face orientation and the pronunciation direction both face the key position of the elevator car and the playing device in the elevator car is detected to be playing voice; the adjusting mode includes reducing the voice playing volume, pausing voice playing, turning off voice playing, or muting voice playing.

In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In another aspect, the present invention also provides a computer program product, the computer program product includes a computer program, the computer program can be stored on a non-transitory computer readable storage medium, when the computer program is executed by a processor, the computer can execute the method for controlling voice playback provided by the above methods, the method includes: acquiring images and voices of users in an elevator car; determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice; under the conditions that the face direction and the pronunciation direction are both towards the key positions of the elevator car and the playing equipment in the elevator car is detected to be in voice playing, the voice playing mode of the playing equipment is adjusted; the adjusting mode comprises reducing voice playing volume, pausing voice playing, closing voice playing or muting voice playing.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements a method for controlling voice playback provided by the above methods, the method comprising: acquiring images and voices of users in an elevator car; determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice; under the conditions that the face direction and the pronunciation direction are both towards the key positions of the elevator car and the playing equipment in the elevator car is detected to be in voice playing, the voice playing mode of the playing equipment is adjusted; the adjusting mode comprises reducing voice playing volume, pausing voice playing, closing voice playing or muting voice playing.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A control method for voice playing is characterized by comprising the following steps:

acquiring images and voices of users in an elevator car;

determining the face orientation of the user according to the image, and determining the pronunciation direction of the user according to the voice;

2. The method for controlling voice playing according to claim 1, wherein the adjusting the voice playing mode of the playing device when it is determined that the face orientation and the pronunciation direction both face the key position of the elevator car and it is detected that the playing device in the elevator car is playing voice comprises:

when the face faces the key position of the elevator car and the playing device in the elevator car is detected to be playing voice, adjusting the voice playing mode of the playing device for the first time;

performing wake-up word detection on the voice;

when the voice is detected to include the preset wake-up word and the pronunciation direction faces the key position of the elevator car, adjusting the voice playing mode of the playing device again.

3. The method for controlling audio playback according to claim 2, wherein after the audio playback manner of the playback device is adjusted for the first time, the method further comprises:

verifying the identity of the user according to the voice; the verification method comprises voiceprint verification;

the adjusting the voice playing mode of the playing device again when the voice is detected to include the preset wake-up word and the pronunciation direction faces the key position of the elevator car comprises:

4. The method for controlling audio playback according to any one of claims 1 to 3, wherein the adjusting the audio playback mode of the playback device includes:

verifying the identity of the user according to the image; the verification method comprises face verification and/or iris verification;

and under the condition that the user identity authentication is passed, adjusting the voice playing mode of the playing equipment.

5. The method for controlling audio playback according to any one of claims 1 to 3, wherein the adjusting the audio playback mode of the playback device includes:

detecting a volume of the voice and an ambient volume of the elevator car;

determining the adjusting mode according to the volume of the voice and the ambient volume of the elevator car;

6. The method for controlling audio playback according to any one of claims 1 to 3, wherein the method further comprises:

detecting whether there is, within a first preset time period, a first user whose face orientation and pronunciation direction both face the key position of the elevator car; when the first user does not exist, restoring the voice playing mode of the playing device to the voice playing mode before the adjustment;

and/or,

7. The method for controlling voice playing according to any one of claims 1 to 3, wherein the determining the face orientation of the user according to the image comprises:

inputting the image into a face orientation detection model to obtain the face orientation corresponding to the user;

8. The method for controlling voice playback according to any one of claims 1 to 3, wherein the method further comprises:

acquiring a voice control instruction of the user after the voice playing mode is adjusted;

9. A control apparatus for playing back a voice, comprising:

the first acquisition unit is used for acquiring images and voices of users in the elevator car;

the processing unit is used for determining the face orientation of the user according to the image and determining the pronunciation direction of the user according to the voice;

the adjusting unit is used for adjusting the voice playing mode of the playing device when it is determined that the face orientation and the pronunciation direction both face the key position of the elevator car and the playing device in the elevator car is detected to be playing voice; the adjusting mode comprises reducing voice playing volume, pausing voice playing, closing voice playing or muting voice playing.

10. An electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, characterized in that said processor implements the method of controlling the playing of speech according to any of claims 1 to 8 when executing said program.

11. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for controlling playback of a voice according to any one of claims 1 to 8.