CN108073381A - Object control method, apparatus and terminal device

Object control method, apparatus and terminal device

Info

Publication number
CN108073381A
CN108073381A
Authority
CN
China
Prior art keywords
sound
microphones
sound source
microphone
determining
Prior art date
Legal status
Pending
Application number
CN201611006003.1A
Other languages
Chinese (zh)
Inventor
梁俊斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201611006003.1A
Publication of CN108073381A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

This application discloses an object control method, apparatus and terminal device. The method is applied to a device provided with at least two microphones whose openings face different directions, where each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality. Using the multiple microphones arranged on the terminal, the application identifies the direction of the sound source that issues a voice instruction, determines the target entity object that the voice instruction is intended to control, and then controls the operation of that target entity object. A user can thus control different entity objects through voice instructions, which is simple and convenient to operate.

Description

Object control method, apparatus and terminal device
Technical Field
The present application relates to the field of object control technologies, and in particular, to an object control method, an object control apparatus, and a terminal device.
Background
With the development of intelligent terminals, terminal devices are generally equipped with various entity objects, and different entity objects provide different functions. For example, the camera module of a mobile phone provides photographing and video recording, while the light module provides lighting and illumination.
Taking cameras as an example, an existing mobile phone generally has a front camera and a rear camera to meet photographing requirements in different directions. Before photographing, the user must operate the phone to select the desired front or rear camera, and sometimes must switch between them frequently, which is troublesome. In particular, when the phone is mounted on a selfie stick, it has to be taken down to be operated, which is very inconvenient.
Disclosure of Invention
In view of this, the present application provides an object control method, an object control apparatus and a terminal device, to solve the problem that switching control among the multiple entity objects of existing devices is inconvenient.
In order to achieve the above object, the following solutions are proposed:
an object control method is applied to a device provided with at least two microphones, where the openings of the at least two microphones face different directions and each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality; the method comprises the following steps:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
An object control device is applied to a device provided with at least two microphones, where the openings of the at least two microphones face different directions and each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality; the object control device comprises:
the sound signal acquisition unit is used for acquiring sound signals collected by the microphones;
the language instruction identification unit is used for identifying whether the sound signal contains a set voice instruction or not;
the sound source relative position determining unit is used for determining the relative position relationship between a sound source which sends a set voice instruction and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone if each sound signal contains the set voice instruction;
the distance determining unit is used for determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and the entity object control unit is used for controlling the operation of the target entity object.
A terminal device comprises at least two microphones, where the openings of the at least two microphones face different directions and each side of the terminal device surface that faces the same direction as a microphone opening is provided with an entity object having laterality; the terminal device further comprises a processor configured to:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
The object control method provided by the embodiment of the application is applied to equipment provided with at least two microphones, wherein the openings of the at least two microphones are in different directions, and the side surfaces of the surface of the equipment, which are in the same direction as the openings of the microphones, are respectively provided with entity objects with lateral properties, so that sound signals collected by the microphones are obtained; identifying whether the sound signal contains a set voice instruction; if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone; determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment; and controlling the target entity object to run. Therefore, the direction of a sound source sending a voice command is identified by using the multiple microphones arranged on the terminal, the target entity object required to be controlled by the voice command is determined, the target entity object is controlled to operate, a user can control different entity objects through the voice command, and the operation is simple and convenient.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an object control method disclosed in an embodiment of the present application;
FIG. 2 is a schematic diagram of the gain distribution of a cardioid directional microphone;
FIG. 3 is a schematic diagram illustrating the relative position relationship between a sound source and an omni-directional microphone according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flow chart of a method for determining a direction angle of a sound source according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating the relative position relationship between a sound source and a unidirectional microphone according to an exemplary embodiment of the present disclosure;
FIG. 6 is a flow chart of another method for determining the direction angle of a sound source disclosed in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an object control apparatus according to an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of a terminal device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With continuous innovation in acoustic sensor technology and ever smaller sensor sizes, multiple microphones have become standard on terminal devices, and these microphones are distributed over different sides of the terminal. Taking a mobile phone as an example, microphones are typically arranged at the top of the front face, beside the rear camera, at the bottom of the phone, and so on. The deployment locations differ from microphone to microphone, and the directivity of each microphone may also differ.
In this application, the direction of the sound source that issues a set voice instruction is identified based on the multiple microphones, the target entity object that the set voice instruction is intended to control is determined from that sound source direction, and the target entity object is then operated according to the set voice instruction. This simplifies the user's operation and brings great convenience.
Several scenarios that are more common in the scheme of the present application are described below:
scene 1:
the mobile phone is provided with a front camera and a rear camera, and a front panel and a rear panel of the mobile phone are respectively provided with a microphone.
The user may want to take a picture with either the front camera or the rear camera. In the prior art, the user must select the camera to be used on the phone's photographing settings page and then shoot with the selected camera.
With the implementation of this application, the person being photographed only needs to speak a set voice instruction. Each microphone of the phone collects a sound signal; from these signals the relative position relationship between the sound source issuing the instruction and the front and rear panels of the phone is determined, the target camera that the instruction is meant to control is identified, and that camera is controlled to start shooting.
Obviously, according to the method, a user does not need to manually select the camera to be used, and the operation is simpler and more convenient.
Scene 2:
the terminal equipment comprises a front panel and a rear panel, and a searchlight and a microphone are respectively arranged on each panel. The terminal equipment can be used for a concert scene to play light for singers.
In the prior art, a backstage operator must manually choose, according to where the singer is standing, which of the searchlights on the front and rear panels to turn on to illuminate the singer.
With this application, the song the singer is about to perform, or the singer's voiceprint, can serve as the voice instruction. Each microphone of the terminal device collects a sound signal; from these signals the relative position relationship between the singer and the front and rear panels of the terminal device is determined, the target searchlight that the instruction is meant to control is identified, and that searchlight is turned on to provide light for the singer.
Of course, besides the two scenarios illustrated above, the solution of the present application may also be applied in other scenarios, such as video conferencing, etc.
Next, the object control method of the present application will be described in detail. The object control method of the present application may be applied to an apparatus provided with at least two microphones having openings oriented differently. It will be appreciated that the apparatus may have a plurality of different sides, that microphones may be provided on different sides of the apparatus, and that the number of microphones provided on each side may be varied to ensure that at least two sides are provided with microphones.
Furthermore, each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality, such as a camera, a loudspeaker or a light-emitting element. Such entity objects can be arranged on different sides of the terminal, for example dual cameras, dual flash lamps or dual loudspeakers on the front and rear panels of a mobile phone.
Referring to fig. 1, the method may include:
step S100, acquiring sound signals collected by each microphone;
specifically, each microphone on the equipment is in an open state during operation, and can collect sound signals in the environment.
Step S110, identifying whether the sound signal contains a set voice command;
It should be noted that, since the microphones on the device are not far from one another, a sound signal containing the set voice instruction emitted by the sound source is normally picked up by every microphone. In this step, the sound signal collected by each microphone is therefore recognized; under normal circumstances, if the sound source emits a sound signal containing the set voice instruction, every microphone collects it, and the set voice instruction can be recognized from every sound signal.
The voice command can be a voice command preset by a user, and the voice command can be in a character form or a voice signal. The process of identifying whether the voice signal contains the set voice command can be realized through the following two ways:
one is as follows: converting the sound signal into text information; and identifying whether the text information contains set text characters, if so, determining that the text information contains a set voice command, and if not, determining that the text information does not contain the set voice command.
The second step is as follows: and performing signal feature matching on the sound signal and the template sound signal, if the matching is successful, determining that the set voice command is contained, otherwise, determining that the set voice command is not contained.
The template sound signal may contain the voice instruction. Alternatively, the template sound signal may be a voiceprint of the sound source collected in advance; during signal feature matching, the voiceprint of the collected sound signal is matched against the template voiceprint to determine whether they are consistent.
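A minimal sketch of the two recognition paths described above, for illustration only: the transcript is assumed to come from any external speech-to-text engine, the signal-feature match is reduced to a normalized cross-correlation peak, and the function names and the threshold value are assumptions rather than anything specified in the original text.

```python
import numpy as np

def contains_command_text(transcript: str, command_text: str) -> bool:
    """Way 1: check whether the ASR transcript contains the set command text."""
    return command_text in transcript

def matches_template(signal: np.ndarray, template: np.ndarray, threshold: float = 0.6) -> bool:
    """Way 2: crude signal-feature match via the peak of a normalized cross-correlation."""
    signal = (np.asarray(signal, float) - np.mean(signal)) / (np.std(signal) + 1e-12)
    template = (np.asarray(template, float) - np.mean(template)) / (np.std(template) + 1e-12)
    corr = np.correlate(signal, template, mode="valid") / len(template)
    return float(corr.max()) >= threshold
```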
Step S120, if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
specifically, if the setting voice instruction is included in the sound signal, it indicates that the sound source has issued the setting voice instruction. In this step, the relative position relationship between the sound source that sends the set voice command and each side of the device may be determined according to the attribute parameters of the sound signals collected by each microphone.
The attribute parameters of the sound signal collected by a microphone include, but are not limited to, the phase, energy and intensity of the sound signal. Because the microphones are arranged on different sides of the device, the sound signal from the sound source reaches microphones on different sides with a time delay and with different energy attenuation. This application uses these attribute parameters of the signals collected by the microphones to determine the relative position relationship between the sound source and each side of the device.
Optionally, depending on the directivity of the microphones, different attribute parameters of the sound signal may be chosen in this step to determine the relative position relationship between the sound source and each side of the device. For example, for unidirectional microphones, the gain toward the opening side is much larger than the gain in other directions, so the signal energies collected by microphones on different sides differ considerably; the relative position relationship between the sound source and each side of the device can therefore be determined from the energy parameters of the signals collected by the microphones. For omnidirectional microphones, by contrast, the gain varies little with direction, so position identification based on the energy parameter may be inaccurate; however, the phase of the sound signal differs noticeably depending on which side of the device it arrives at, so the signals collected by microphones on different sides show a larger time delay, and the phase (time delay) parameter can be used instead.
Of course, the above only illustrates two alternative ways, and besides, the present application may also determine the relative position relationship of the sound source and the sides of the device according to other attribute parameters of the sound signal.
Step S130, determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
after the relative position relationship between the sound source and each side surface of the equipment is determined, the target entity object required to be controlled by the set voice command can be further determined.
Taking a self-photographing scene of a user as an example, the user sends a photographing instruction as a sound source, and after the sound source direction is determined by the mobile phone, the camera required to be controlled by the photographing instruction can be determined to be a front-facing camera according to the sound source direction, so that the camera arranged on the front panel can be controlled to be opened.
And step S140, controlling the operation of the target entity object.
Specifically, the set voice command may include a control policy for the target entity object, and further, the target entity object may be controlled to operate based on the set voice command.
In an exemplary scenario, a user may set a voice instruction as "take a picture" or "record a video", and after determining a camera to be controlled, the mobile phone may control the camera to operate according to a control policy indicated by the voice instruction, for example, control the camera to take a picture or record a video.
Further optionally, the control strategies corresponding to the voice instructions may be stored in advance, and after the target entity object to be controlled is determined, the control strategies corresponding to the voice instructions are acquired, and the target entity object is controlled to operate based on the acquired control strategies.
Furthermore, a corresponding control strategy may also be preset for each entity object; after the target entity object to be controlled is determined, the control strategy corresponding to that target entity object is acquired, and the target entity object is controlled to operate based on the acquired control strategy.
The control strategy can be different according to different entity objects. As for a camera, the control strategy may include: opening, taking a picture after opening, recording a video after opening, closing and the like. For a light emitting device, the control strategy may include: turn on, turn off, adjust luminous intensity, etc.
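As an illustration of how such pre-stored mappings might be organized, the sketch below pairs a sound-source side with an entity object and a set voice command with a control strategy; the object names, command strings and action names are hypothetical and chosen only for this example.

```python
# Hypothetical registry: which entity object sits on which side of the device,
# and which control strategy each set voice command maps to.
OBJECTS_BY_SIDE = {
    "front": "front_camera",
    "rear": "rear_camera",
}

CONTROL_STRATEGIES = {
    "take a picture": "capture_photo",
    "record a video": "start_recording",
}

def resolve_control(command: str, source_side: str):
    """Return (target object, action) for a recognized command and sound-source side."""
    target = OBJECTS_BY_SIDE.get(source_side)
    action = CONTROL_STRATEGIES.get(command)
    if target is None or action is None:
        return None
    return target, action

# Usage sketch: resolve_control("take a picture", "front") -> ("front_camera", "capture_photo")
```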
The object control method provided by the embodiment of the application is applied to equipment provided with at least two microphones, wherein the openings of the at least two microphones are in different directions, and the side surfaces of the surface of the equipment, which are in the same direction as the openings of the microphones, are respectively provided with entity objects with lateral directions, so that sound signals collected by the microphones are obtained; identifying whether the sound signal contains a set voice instruction; if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone; determining a target entity object required to be controlled by setting a voice command according to the relative position relation between the sound source and each side surface of the equipment; and controlling the target entity object to run. Therefore, the direction of a sound source sending a voice command is identified by using the multiple microphones arranged on the terminal, the target entity object required to be controlled by the voice command is determined, the target entity object is controlled to operate, a user can control different entity objects through the voice command, and the operation is simple and convenient.
In the following embodiments, the directivity of the microphone will be described first.
Microphone directivity describes how sensitive a microphone is to sound arriving from different angles; microphones are generally either omnidirectional or unidirectional. An omnidirectional microphone has substantially the same sensitivity to sound coming from all angles. A unidirectional microphone is sensitive to sound from some angles and insensitive to sound from others; the most typical example is the cardioid pattern. Fig. 2 illustrates the gain distribution of a cardioid directional microphone: it is sensitive to sound coming from the front (large gain) and insensitive to sound coming from the rear (low gain).
According to the different directivities of the microphones on the equipment, the present application can set different ways to determine the relative position relationship between the sound source and the sides of the equipment, and the following description is made respectively:
the first mode is as follows:
all the microphones on the equipment are all directional microphones.
The step of determining the relative position relationship between the sound source which sends the set voice command and each side of the device according to the attribute parameters of the sound signals collected by each microphone in the above step may include:
s1, processing the sound signals collected by each microphone according to a generalized cross-correlation algorithm to obtain the direction angle of the sound source sending the set voice command;
the generalized cross-correlation algorithm can determine the time delay of the sound signal collected by each microphone according to the phase of the sound signal, and further determine the direction angle of the sound source according to the time delay.
And S2, determining the relative position relation between the sound source and each side surface of the equipment according to the direction angle of the sound source.
Referring to the case illustrated in fig. 3, the process of processing the sound signals collected by the microphones according to the generalized cross-correlation algorithm to obtain the direction angle of the sound source that emits the set voice command in S1 will be described in detail.
A schematic diagram of the relative position relationship of a sound source and an omni-directional microphone is illustrated in fig. 3.
In this embodiment there are two microphones, arranged on the front and rear panels of the mobile phone respectively. The midpoint of the line connecting the two microphones mic1 and mic2 is taken as the origin O, the direction from mic1 to mic2 is the positive X-axis, and the direction perpendicular to this line through O is the positive Y-axis.
As can be seen from fig. 3, the sound signal emitted by the sound source inevitably reaches the microphones on the front and rear panels of the phone with a time delay. The direction angle of the sound source can be determined from the time delay between the signals collected by the two microphones, and from it the relative position relationship between the sound source and the front and rear panels of the phone.
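For illustration, assuming a far-field source (an assumption consistent with fig. 3 but not a formula given in the original text), the time delay τ between the two microphones relates to the direction angle θ measured from the X-axis by τ = d·cos θ / c, where d is the spacing between mic1 and mic2 and c is the speed of sound; once τ is estimated, the angle follows as θ = arccos(c·τ / d).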
To determine the sound source direction angle, this application may use a generalized cross-correlation (GCC) algorithm. Taking the GCC algorithm as an example, the implementation of determining the direction angle may follow the processing flow illustrated in fig. 4, which includes:
step S400, respectively carrying out Fourier transform on a first sound signal and a second sound signal which are respectively collected by two microphones, and carrying out conjugate multiplication on the first sound signal and the second sound signal which are transformed to obtain a multiplication result;
specifically, the first sound signal and the second sound signal are defined as x respectively1And x2The two are determined as X after Fourier transformation1And X2. The result of conjugate multiplication is
Step S410, solving a cross-power spectral density function of the multiplication result;
In particular, the cross-power spectral density function of the multiplication result may be expressed as G12(ω) = X1(ω)·X2*(ω).
step S420, performing inverse Fourier transform on the cross-power spectral density function, and performing modulus calculation on a transform result to obtain a modulus calculation result;
and step S430, searching an angle corresponding to the peak value according to the modulus result, and determining the angle as a sound source direction angle.
According to the processing mode, the direction angle of the sound source can be determined according to the sound signals collected by the two microphones.
Of course, in the above processing flow, in order to improve the signal-to-noise ratio and speed up the convergence of the cross-power spectral density function, the following processing step may be added between step S410 and step S420:
weighting the cross-power spectral density function to obtain a weighted cross-power spectral density function.
Denoting the weighting function by ψ(ω), the weighted cross-power spectral density function can be expressed as Gw(ω) = ψ(ω)·G12(ω); a commonly used choice of ψ(ω) is the PHAT weighting ψ(ω) = 1/|G12(ω)|.
Of course, the GCC algorithm is only an example; other algorithms, such as the MUSIC (Multiple Signal Classification) algorithm or the ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) algorithm, may also be used in this application to determine the direction angle of the sound source.
The second mode is as follows:
the microphones of the device are not all omni-directional microphones.
In this case, two sub-cases can be distinguished: all of the microphones are unidirectional, or some are unidirectional and some are omnidirectional.
Because of the existence of the unidirectional microphone, energy attenuation exists when a sound signal emitted by a sound source reaches each microphone, and the sound source direction can be determined based on the difference of the energy of the sound signal collected by each microphone.
Taking the case where all microphones are unidirectional as an example, fig. 5 illustrates the relative position relationship between the sound source and the unidirectional microphones.
The unidirectional microphones illustrated in fig. 5 are cardioid directional microphones; there are two of them, arranged on the front and rear panels of the mobile phone respectively.
As can be seen from fig. 5, the sound signal emitted by the sound source inevitably arrives at the microphones on the front and rear panels of the phone with different energy attenuation. The sound source direction can be determined from the energy values of the signals collected by the two microphones, and from it the relative position relationship between the sound source and the front and rear panels of the phone.
Referring to the process flow illustrated in fig. 6, the process of determining the sound source direction angle may include:
step S600, measuring sound energy of sound signals collected by each microphone;
step S610, comparing the sound energy of the sound signals collected by the microphones, and determining a target microphone corresponding to the sound signal with the maximum sound energy;
step S620, determining the opening direction of the target microphone as the sound source direction.
It is understood that if the operator speaks a voice instruction toward the front of the handset, the energy of the instruction signal collected by the front microphone should be greater than that collected by the rear microphone; conversely, if the operator speaks toward the rear of the handset, the energy collected by the rear microphone should be greater than that collected by the front microphone. On this basis, the opening orientation of the target microphone corresponding to the sound signal with the greatest sound energy can be taken as the sound source direction.
Further optionally, owing to differences in materials and manufacturing processes, the gains of different microphones differ somewhat. To ensure measurement accuracy, the following processing may be added before measuring the sound energy of the signal collected by each microphone:
and correcting the sound signals collected by the microphones according to the set gain correction coefficients of the microphones to obtain corrected sound signals.
Based on this, the process of measuring the sound energy of the sound signal collected by each microphone specifically includes: the sound energy of each sound signal after correction is measured.
The gain correction coefficient of each microphone may be calibrated and recorded at the factory, or obtained by later measurement.
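A minimal sketch of steps S600 to S620 with the gain correction applied first. The multiplicative form of the correction, the sum-of-squares energy measure and all names below are illustrative assumptions, not details taken from the original text.

```python
import numpy as np

def pick_target_microphone(signals, gain_corrections):
    """Apply each microphone's gain-correction coefficient, then return the index of the
    microphone whose corrected signal carries the most energy (steps S600-S620)."""
    energies = []
    for sig, k in zip(signals, gain_corrections):
        corrected = np.asarray(sig, dtype=float) * k      # assumed multiplicative gain correction
        energies.append(float(np.sum(corrected ** 2)))    # energy as sum of squared samples
    return int(np.argmax(energies))

# Usage sketch: index 0 = front microphone, index 1 = rear microphone
# target = pick_target_microphone([front_sig, rear_sig], [1.0, 1.05])
# source_side = "front" if target == 0 else "rear"
```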
It is understood that the entity object to be controlled in this application may be a camera or a light-emitting device. The terminal device may be a terminal with front and rear cameras, or a terminal with front and rear light-emitting devices, and its front and rear panels are each provided with at least one microphone. On this basis, after the target camera or target light-emitting device that the voice instruction is meant to control has been determined, the working state of that target camera, or of that target light-emitting device, is controlled.
The following describes the object control device provided in the embodiments of the present application, and the object control device described below and the object control method described above may be referred to in correspondence with each other.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an object control apparatus disclosed in the embodiment of the present application. The object control device is applied to equipment provided with at least two microphones, wherein the openings of the at least two microphones are in different directions, and solid objects with lateral properties are respectively arranged on the surfaces of the equipment and the sides of the openings of the microphones, which are in the same directions. As shown in fig. 7, the apparatus includes:
a sound signal acquiring unit 11, configured to acquire sound signals acquired by the microphones;
a language instruction identification unit 12 for identifying whether the sound signal contains a set voice instruction;
a sound source relative position determining unit 13, configured to determine, according to attribute parameters of sound signals collected by microphones, a relative position relationship between a sound source that has sent a set voice instruction and each side of the device if each sound signal includes the set voice instruction;
a distance determining unit 14, configured to determine, according to a relative positional relationship between the sound source and each side of the device, a target entity object to be controlled by the set voice command;
and an entity object control unit 15, configured to control the operation of the target entity object.
The object control device acquires sound signals collected by the microphones; identifying whether the sound signal contains a set voice instruction; if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone; determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment; and controlling the target entity object to run. Therefore, the direction of a sound source sending the voice command is identified by using the multiple microphones arranged on the terminal, the target entity object required to be controlled by the voice command is determined, the target entity object is controlled to operate, a user can control different entity objects through the voice command, and the operation is simple and convenient.
Optionally, according to the difference in microphone directivity, the present application discloses different constituent structures of the sound source relative position determining unit, as follows:
if all the microphones are omnidirectional microphones; the sound source relative position determination unit may include:
the first sound source relative position determining subunit is used for processing the sound signals collected by the microphones according to a generalized cross-correlation algorithm to obtain the direction angle of the sound source sending the set voice instruction;
and the second sound source relative position determining subunit is used for determining the relative position relationship between the sound source and each side surface of the equipment according to the direction angle of the sound source.
The at least two microphones are not all omni-directional microphones; the sound source relative position determination unit may include:
a third sound source relative position determining subunit, configured to measure sound energy of sound signals collected by each of the microphones;
the fourth sound source relative position determining subunit is used for comparing the sound energy of the sound signals collected by the microphones and determining a target microphone corresponding to the sound signal with the maximum sound energy;
a fifth sound source relative position determining subunit operable to determine an opening orientation of the target microphone as the sound source direction.
Optionally, if the number of microphones is two, the process by which the first sound source relative position determining subunit processes the sound signals collected by the microphones according to a generalized cross-correlation algorithm to obtain the direction angle of the sound source issuing the set voice instruction may specifically include:
respectively carrying out Fourier transform on a first sound signal and a second sound signal which are respectively collected by two microphones, and carrying out conjugate multiplication on the first sound signal and the second sound signal after the Fourier transform to obtain a multiplication result;
obtaining a cross-power spectrum density function of the multiplication result;
performing inverse Fourier transform on the cross-power spectral density function, and performing modulus calculation on a transform result to obtain a modulus calculation result;
and searching an angle corresponding to the peak value according to the modulus result, and determining the angle as a sound source direction angle.
Optionally, before performing the inverse fourier transform on the cross-power spectral density function, the method may further include:
and weighting the cross-power spectral density function to obtain the weighted cross-power spectral density function.
Optionally, before measuring the sound energy of the sound signal collected by each microphone, the sound source relative position determining unit may further perform the following processing operations:
and correcting the sound signals collected by the microphones according to the set gain correction coefficients of the microphones to obtain corrected sound signals. Based on this, the process of measuring the sound energy of the sound signal collected by each microphone by the sound source relative position determination unit may include: the sound energy of each sound signal after correction is measured.
The embodiment of the application further discloses a terminal device, which comprises at least two microphones, wherein the opening orientations of at least two microphones are different, the side surfaces of the surface of the terminal device, which are the same as the opening orientations of the microphones, are respectively provided with a lateral entity object, and the terminal device further comprises a processor, wherein the processor is used for:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
In the following embodiments, a hardware structure of the terminal device is introduced, referring to fig. 8, and fig. 8 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present disclosure.
As shown in fig. 8, the terminal device may include:
the system comprises a processor 1, a communication interface 2, a memory 3, a communication bus 4, a display screen 5, a microphone 6 and an entity object 7;
the processor 1, the communication interface 2, the memory 3, the microphone 6, the entity object 7 and the display screen 5 are communicated with each other through the communication bus 4;
optionally, the communication interface 2 may be an interface of a communication module, such as an interface of a GSM module;
a processor 1 for executing a program;
a memory 3 for storing a program;
the program may include program code including operating instructions of the processor.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The memory 3 may comprise a high-speed RAM memory and may further comprise a non-volatile memory, such as at least one disk memory.
The microphone 6 is used for collecting sound signals in the environment and transmitting the sound signals to the processor for recognizing the set voice command.
The entity object 7 is for running under control of the processor. The physical object 7 may be a bilateral camera, a bilateral lighting device, etc.
Wherein the program is specifically for:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. An object control method, applied to a device provided with at least two microphones, wherein the openings of the at least two microphones face different directions and each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality, the method comprising the following steps:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
2. The method of claim 1, wherein each of the microphones is an omni-directional microphone; the determining the relative position relationship between the sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone comprises the following steps:
processing the sound signals collected by each microphone according to a generalized cross-correlation algorithm to obtain the direction angle of the sound source sending the set voice command;
and determining the relative position relation between the sound source and each side surface of the equipment according to the direction angle of the sound source.
3. The method according to claim 2, wherein the number of the microphones is two, and the processing the sound signals collected by each microphone according to the generalized cross-correlation algorithm to obtain the direction angle of the sound source emitting the set voice command comprises:
respectively carrying out Fourier transform on a first sound signal and a second sound signal which are respectively collected by two microphones, and carrying out conjugate multiplication on the first sound signal and the second sound signal after the Fourier transform to obtain a multiplication result;
obtaining a cross-power spectrum density function of the multiplication result;
performing inverse Fourier transform on the cross-power spectral density function, and performing modulus calculation on a transform result to obtain a modulus calculation result;
and searching an angle corresponding to the peak value according to the modulus result, and determining the angle as a sound source direction angle.
4. The method of claim 3, wherein prior to said inverse Fourier transforming said cross-power spectral density function, the method further comprises:
and weighting the cross-power spectral density function to obtain the weighted cross-power spectral density function.
5. The method of claim 1, wherein the at least two microphones are not all omni-directional microphones; the determining the relative position relationship between the sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone comprises the following steps:
measuring the sound energy of the sound signals collected by the microphones;
comparing the sound energy of the sound signals collected by the microphones, and determining a target microphone corresponding to the sound signal with the maximum sound energy;
determining an opening orientation of the target microphone as the sound source direction.
6. The method of claim 5, wherein prior to said measuring the sound energy of the sound signal picked up by each of said microphones, the method further comprises:
correcting the sound signals collected by the microphones according to the set gain correction coefficients of the microphones to obtain corrected sound signals;
the measuring the sound energy of the sound signal collected by each microphone comprises:
the sound energy of each sound signal after correction is measured.
7. The method of claim 1, wherein the recognizing whether the voice signal contains a set voice command comprises:
converting the sound signal into text information;
identifying whether the text information contains set text characters, if so, determining that a set voice command is contained, and if not, determining that the set voice command is not contained;
or,
and performing signal feature matching on the sound signal and the template sound signal, if the matching is successful, determining that the set voice command is contained, otherwise, determining that the set voice command is not contained.
8. The method according to any one of claims 1 to 7, wherein the physical object is a camera, the device is a terminal carrying a front camera and a rear camera, and a front panel and a rear panel of the terminal are respectively provided with at least one microphone;
the controlling the operation of the target entity object comprises the following steps:
and controlling the working state of the target camera which needs to be controlled by the set voice command in the front camera and the rear camera.
9. The method according to any one of claims 1 to 7, wherein the physical object is a light emitting device, the device is a terminal carrying a front and a rear light emitting device, and a front panel and a rear panel of the terminal are respectively provided with at least one microphone;
the controlling the operation of the target entity object comprises the following steps:
and controlling the working state of the target light-emitting device required to be controlled by the set voice command in the front light-emitting device and the rear light-emitting device.
10. An object control device, applied to a device provided with at least two microphones, wherein the openings of the at least two microphones face different directions and each side of the device surface that faces the same direction as a microphone opening is provided with an entity object having laterality, the object control device comprising:
the sound signal acquisition unit is used for acquiring sound signals collected by the microphones;
the language instruction identification unit is used for identifying whether the sound signal contains a set voice instruction or not;
the sound source relative position determining unit is used for determining the relative position relationship between a sound source which sends a set voice instruction and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone if each sound signal contains the set voice instruction;
the distance determining unit is used for determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and the entity object control unit is used for controlling the operation of the target entity object.
11. The apparatus of claim 10, wherein each of the microphones is an omni-directional microphone; the sound source relative position determination unit includes:
the first sound source relative position determining subunit is used for processing the sound signals collected by the microphones according to a generalized cross-correlation algorithm to obtain the direction angle of the sound source sending the set voice instruction;
and the second sound source relative position determining subunit is used for determining the relative position relationship between the sound source and each side surface of the equipment according to the direction angle of the sound source.
12. The apparatus of claim 10, wherein the at least two microphones are not all omni-directional microphones; the sound source relative position determination unit includes:
a third sound source relative position determining subunit, configured to measure sound energy of sound signals collected by each of the microphones;
the fourth sound source relative position determining subunit is used for comparing the sound energy of the sound signals collected by the microphones and determining a target microphone corresponding to the sound signal with the maximum sound energy;
a fifth sound source relative position determining subunit operable to determine an opening orientation of the target microphone as the sound source direction.
13. A terminal device, comprising at least two microphones, wherein the openings of the at least two microphones face different directions and each side of the terminal device surface that faces the same direction as a microphone opening is provided with an entity object having laterality, the terminal device further comprising a processor configured to:
acquiring sound signals collected by the microphones;
identifying whether the sound signal contains a set voice instruction;
if each sound signal contains a set voice command, determining the relative position relationship between a sound source sending the set voice command and each side of the equipment according to the attribute parameters of the sound signals collected by each microphone;
determining a target entity object required to be controlled by the set voice command according to the relative position relation between the sound source and each side surface of the equipment;
and controlling the target entity object to run.
CN201611006003.1A, filed 2016-11-15 (priority date 2016-11-15), published as CN108073381A (pending): Object control method, apparatus and terminal device

Priority Applications (1)

Application Number: CN201611006003.1A
Priority Date / Filing Date: 2016-11-15
Title: Object control method, apparatus and terminal device (CN108073381A)

Publications (1)

Publication Number Publication Date
CN108073381A true CN108073381A (en) 2018-05-25

Family

ID=62162638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611006003.1A Pending CN108073381A (en) 2016-11-15 2016-11-15 A kind of object control method, apparatus and terminal device

Country Status (1)

Country Link
CN (1) CN108073381A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100029255A1 (en) * 2008-08-04 2010-02-04 Lg Electronics Inc. Mobile terminal capable of providing web browsing function and method of controlling the mobile terminal
US20110164105A1 (en) * 2010-01-06 2011-07-07 Apple Inc. Automatic video stream selection
CN102707261A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Microphone array sound source localization system
CN103237178A (en) * 2013-03-26 2013-08-07 北京小米科技有限责任公司 Video frame switching method, video frame switching device and video frame switching equipment
CN104699445A (en) * 2013-12-06 2015-06-10 华为技术有限公司 Audio information processing method and device
CN104580992A (en) * 2014-12-31 2015-04-29 广东欧珀移动通信有限公司 Control method and mobile terminal
CN105959554A (en) * 2016-06-01 2016-09-21 努比亚技术有限公司 Video shooting apparatus and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
顾添翼 (Gu Tianyi): "Research on multi-sound-source direction finding methods based on a microphone array", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020051836A1 (en) * 2018-09-13 2020-03-19 Alibaba Group Holding Limited Methods and devices for processing audio input using unidirectional audio input devices
WO2020051841A1 (en) * 2018-09-13 2020-03-19 Alibaba Group Holding Limited Human-machine speech interaction apparatus and method of operating the same
CN112654960A (en) * 2018-09-13 2021-04-13 阿里巴巴集团控股有限公司 Man-machine voice interaction device and operation method thereof
CN110010126A (en) * 2019-03-11 2019-07-12 百度国际科技(深圳)有限公司 Audio recognition method, device, equipment and storage medium
CN110223684A (en) * 2019-05-16 2019-09-10 华为技术有限公司 A kind of voice awakening method and equipment
CN111060867A (en) * 2019-12-17 2020-04-24 南京愔宜智能科技有限公司 Directional microphone microarray direction of arrival estimation method
WO2021218600A1 (en) * 2020-04-28 2021-11-04 华为技术有限公司 Voice wake-up method and device
US12032421B2 (en) 2020-04-28 2024-07-09 Huawei Technologies Co., Ltd. Voice wakeup method and device
CN111883167A (en) * 2020-08-12 2020-11-03 上海明略人工智能(集团)有限公司 Sound separation method and device, recording equipment and readable storage medium
CN112887552A (en) * 2021-01-22 2021-06-01 维沃移动通信有限公司 Focus tracking method and device and electronic equipment
CN112887552B (en) * 2021-01-22 2022-11-11 维沃移动通信有限公司 Focus tracking method and device and electronic equipment
EP4231622A4 (en) * 2021-12-27 2024-04-03 Beijing Honor Device Co., Ltd. Video processing method and electronic device

Similar Documents

Publication Publication Date Title
CN108073381A (en) A kind of object control method, apparatus and terminal device
CN111050269B (en) Audio processing method and electronic equipment
US10972835B2 (en) Conference system with a microphone array system and a method of speech acquisition in a conference system
CN111034222B (en) Sound pickup apparatus, sound pickup method, and computer program product
US11706577B2 (en) Systems and methods for equalizing audio for playback on an electronic device
CN102045618B (en) Automatically adjusted microphone array, method for automatically adjusting microphone array, and device carrying microphone array
CN102843540B (en) Automatic camera for video conference is selected
JP4847022B2 (en) Utterance content recognition device
US20100123785A1 (en) Graphic Control for Directional Audio Input
CN108370471A (en) Distributed audio captures and mixing
CN110089131A (en) Distributed audio capture and mixing control
US20060195316A1 (en) Voice detecting apparatus, automatic image pickup apparatus, and voice detecting method
US10904658B2 (en) Electronic device directional audio-video capture
JP7133789B2 (en) Sound collection device, sound collection system, sound collection method, program, and calibration method
US20070172083A1 (en) Method and apparatus for controlling a gain of a voice signal
CN111863020B (en) Voice signal processing method, device, equipment and storage medium
CN104424073A (en) Information processing method and electronic equipment
CN113744750B (en) Audio processing method and electronic equipment
CN113496708A (en) Sound pickup method and device and electronic equipment
CN112233689A (en) Audio noise reduction method, device, equipment and medium
KR101976937B1 (en) Apparatus for automatic conference notetaking using mems microphone array
CN116405774A (en) Video processing method and electronic equipment
EP4178220A1 (en) Voice-input device
JP2019537071A (en) Processing sound from distributed microphones
JP7111202B2 (en) SOUND COLLECTION CONTROL SYSTEM AND CONTROL METHOD OF SOUND COLLECTION CONTROL SYSTEM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180525