CN117250584A - Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium - Google Patents

Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium

Info

Publication number
CN117250584A
Authority
CN
China
Prior art keywords
sound
sound signals
acquisition
target
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311085539.7A
Other languages
Chinese (zh)
Inventor
陈宏格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202311085539.7A
Publication of CN117250584A
Legal status: Pending

Classifications

    • G01S 5/18 Position-fixing by co-ordinating two or more direction or position line determinations; position-fixing by co-ordinating two or more distance determinations, using ultrasonic, sonic, or infrasonic waves
    • G01S 3/8022 Systems for determining direction or deviation from predetermined direction using the Doppler shift introduced by the relative motion between source and receiver
    • G01S 3/8032 Systems for determining direction or deviation from predetermined direction using amplitude comparison of signals derived from receiving transducers or transducer systems having differently-oriented directivity characteristics, wherein the signals are derived sequentially
    • G01S 3/809 Rotating or oscillating beam systems using continuous analysis of received signal for determining direction in the plane of rotation or oscillation, or for determining deviation from a predetermined direction in such a plane
    • G01S 5/186 Determination of attitude
    • G01S 5/20 Position of source determined by a plurality of spaced direction-finders
    • G01S 5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G10L 19/02 Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure provides a sound source localization method and relates to the technical field of sound signal processing, and in particular to a sound source localization device, an electronic device, and a computer-readable storage medium. The specific implementation scheme is as follows: acquire a plurality of target sound signals and determine a plurality of pieces of delay information from them; determine the sounding position of the target sound source corresponding to the target sound signals according to the delay information; determine a sound source area according to the sounding position; and enhance sound signal collection for the sound source area. The plurality of target sound signals are collected by a plurality of sound collection units, and each piece of delay information is the delay between the target sound signals collected by any two of the plurality of sound collection units. In this way sound source localization is achieved, and sound signal collection is spatially enhanced for the sound source area, so that sound signals are collected directionally and the required sound signals are collected accurately.

Description

Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of sound signal processing technologies, and in particular, to a sound source localization method, a sound source localization device, an electronic device, and a computer readable storage medium.
Background
Currently, in many speech acquisition scenarios, such as vehicle-mounted voice control, the position of the target source issuing a voice command is not fixed within the scene space. Either a large number of sound collection units must be arranged throughout the space so that voice commands can be collected accurately, or the user must move close to a sound collection unit, in which case collection accuracy is poor.
Disclosure of Invention
The present disclosure provides a sound source localization method, apparatus, electronic device, and computer-readable storage medium for solving at least one of the above-mentioned technical problems.
According to an aspect of the present disclosure, there is provided a sound source localization method including:
acquiring a plurality of target sound signals, and determining a plurality of delay information according to the plurality of target sound signals;
determining sounding positions of target sound sources corresponding to the target sound signals according to the delay information;
determining a sound source area according to the sound production position;
enhancing sound signal acquisition for the sound source region;
wherein the plurality of target sound signals are collected by a plurality of sound collecting units; one of the delay information is delay information between target sound signals acquired by any two of the plurality of sound acquisition units.
According to another aspect of the present disclosure, there is provided a sound source localization apparatus including:
the delay information acquisition module is used for acquiring a plurality of target sound signals and determining a plurality of delay information according to the plurality of target sound signals;
the sounding position determining module is used for determining sounding positions of target sound sources corresponding to the target sound signals according to the delay information;
the sound source area determining module is used for determining a sound source area according to the sounding position;
the region acquisition enhancement module is used for enhancing the sound signal acquisition aiming at the sound source region;
wherein the plurality of target sound signals are collected by a plurality of sound collecting units; one of the delay information is delay information between target sound signals acquired by any two of the plurality of sound acquisition units.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to the above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of a sound source localization method according to a first embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a sound source localization method according to a second embodiment of the present disclosure;
FIG. 3A is a schematic diagram of an exemplary sound collection unit distribution;
FIG. 3B is a schematic illustration of the position of an exemplary sound collection unit;
FIG. 4 is a schematic waveform diagram of an exemplary two sound signals;
fig. 5 is a schematic structural view of a sound source positioning device according to a third embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing the methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Sound source localization according to the present disclosure may be performed by an electronic device such as a terminal device or a server. The terminal device may be a vehicle-mounted device, user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a wearable device, or the like, and the method may be implemented by a processor invoking computer-readable program instructions stored in a memory. Alternatively, the sound source localization provided by the present disclosure may be performed by a server.
Referring to fig. 1, fig. 1 shows a schematic flow chart of the sound source localization method provided in the first embodiment of the present disclosure. The method comprises the following steps:
s101, acquiring a plurality of target sound signals and determining a plurality of delay information according to the plurality of target sound signals.
S102, determining sounding positions of target sound sources corresponding to the target sound signals according to the delay information.
S103, determining a sound source area according to the sounding position.
S104, enhancing sound signal acquisition aiming at the sound source area.
Wherein, a plurality of target sound signals are collected by a plurality of sound collecting units; one delay information is delay information between target sound signals acquired by any two sound acquisition units of the plurality of sound acquisition units.
It should be noted that the method provided by the present disclosure may be applied in various scenarios, for example vehicle-mounted voice interaction, smart home, and video conferencing; each scenario has its own scene space. In the vehicle-mounted voice interaction scenario the scene space is the in-vehicle space; in the smart home scenario it is an indoor space. A plurality of sound collection units are arranged in the scene space to collect sound signals. A given sound source in the scene space can be collected by several of these units, and the collected sound signals are processed to obtain the target sound signals to be processed. The target sound signals may be determined in various ways, for example by selecting, from the plurality of collected sound signals, those whose mutual similarity is greater than a preset threshold. That is, the same (or similar) sound signals among the signals collected by the plurality of sound collection units are screened out as target sound signals; this preliminarily filters and denoises the input, eliminating noise and invalid sound signals.
In some examples, the sound collection unit may be various types of devices capable of collecting sound, such as: a general microphone, an omni-directional microphone, etc., are not limited herein.
With the method provided by the present disclosure, sound source localization is performed on the plurality of target sound signals to obtain the sounding position of the target sound. Once the sound source is localized, a sound source area is determined around the sounding position, and sound signal collection is enhanced for that area, so that sound signals are collected directionally and in a targeted manner, effectively improving collection accuracy. In this way a large number of sound collection units is not required: accurate sound signals are obtained through sound source localization and area collection enhancement, so the number of sound collection units needed can be reduced.
Referring to fig. 2, fig. 2 shows a schematic flow chart of the sound source localization method provided in the second embodiment of the present disclosure. The method comprises the following steps:
s201, acquiring and storing position information of a plurality of sound acquisition units.
Referring to fig. 3A, fig. 3A shows an exemplary distribution diagram of sound collection units, where M_1 to M_n denote the 1st to nth sound collection units arranged in the scene space and K denotes the target sound source. Assume n sound collection units are included; the position information of each sound collection unit is expressed as (X_i, Y_i), with the subscript i indexing the 1st to nth units. Before sound signal collection starts, once the sound collection units have been set up, the position information (X_i, Y_i) and its correspondence with each sound collection unit are stored in the storage unit.
S202, acquiring and storing unit acquisition areas of each of a plurality of sound acquisition units.
In some examples, a unit acquisition area, i.e., the area for which one sound collection unit is chiefly responsible for collecting sound signals, is preset for each sound collection unit; the sizes of the unit acquisition areas are distinguished by radii r_i, with the subscript i indexing the 1st to nth units. Before sound signal collection starts, once the sound collection units have been set up, the unit acquisition area of each sound collection unit and the correspondence between unit and area can be stored in the storage unit.
The execution order of S201 and S202 may be changed, and is not limited herein.
S203, acquiring a plurality of sound signals acquired by the sound acquisition units, and determining sound signals with similarity larger than a preset threshold value in the sound signals as a plurality of target sound signals.
A plurality of sound collection units are arranged in the scene space (for example, the in-vehicle space) to collect sound signals. A sound source in the scene space is collected by the plurality of units, the collected signals are processed, and those whose similarity is greater than a preset threshold are determined as the plurality of target sound signals. That is, the same (or similar) sound signals among the signals collected by the plurality of sound collection units are screened out as target sound signals; this preliminarily filters and denoises the input, eliminating noise and invalid sound signals.
Specifically, the similarity includes at least one of: spectrum similarity, pitch similarity, fundamental-frequency similarity, and semantic content similarity; the preset similarity threshold can be set as needed and is not limited here.
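As a concrete illustration of the screening in S203, the following Python sketch keeps only the signals whose spectral similarity to a reference exceeds a preset threshold. The similarity measure (cosine similarity of magnitude spectra) and the threshold value are illustrative assumptions; the text does not fix a particular metric.

```python
import numpy as np

def spectral_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between the magnitude spectra of two signals."""
    fa, fb = np.abs(np.fft.rfft(a)), np.abs(np.fft.rfft(b))
    denom = np.linalg.norm(fa) * np.linalg.norm(fb)
    return float(fa @ fb / denom) if denom else 0.0

def select_target_signals(signals, threshold=0.9):
    """Keep the signals whose spectral similarity to the first signal
    exceeds the threshold (a stand-in for the 'similarity greater than a
    preset threshold' screening)."""
    ref = signals[0]
    return [s for s in signals if spectral_similarity(ref, s) > threshold]

# Example: two near-identical tones and one unrelated noise burst.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
signals = [tone, 0.8 * tone, rng.normal(size=t.size)]
targets = select_target_signals(signals, threshold=0.9)
```

The scaled copy of the tone survives the screening while the noise burst is rejected, mirroring how same-source signals at different units differ mainly in amplitude.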
S204, for any piece of delay information, determine it from the shortest distance between the two sound collection units corresponding to the two target sound signals and the acquisition time difference of those two target sound signals.
Specifically, any one of the delay information a is determined according to the following formula:
A=Δd+Cτ (1)
where, referring to fig. 3B, which shows a schematic position diagram of two exemplary sound collection units (the first sound collection unit M_1 and the second sound collection unit M_2, taken as an illustration): Δd is the shortest distance between the two sound collection units whose delay information corresponds to the two target sound signals; C is the propagation speed of sound in air, C = 340 m/s; and τ is the acquisition time difference between the two sound collection units respectively collecting the two target sound signals. The acquisition time difference is determined from the times at which the two units collect the target sound signals: the target sound source emits at a single moment, but the sound reaches units located at different distances from the source at different times, giving the acquisition time difference. Delay information is calculated in this way for any two of the sound collection units corresponding to the target sound signals, yielding a plurality of pieces of delay information.
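In practice the acquisition time difference τ in formula (1) can be estimated from the peak of the cross-correlation between the two collected signals. The sketch below assumes a 16 kHz sample rate and uses C = 340 m/s as in the text; the cross-correlation estimator itself is a common choice, not one the text prescribes.

```python
import numpy as np

FS = 16_000   # sample rate in Hz (an assumption for this sketch)
C = 340.0     # propagation speed of sound in air (m/s), as in the text

def acquisition_time_difference(x: np.ndarray, y: np.ndarray) -> float:
    """Estimate tau, the collection time difference between two target
    sound signals, from the peak of their cross-correlation."""
    corr = np.correlate(x, y, mode="full")
    lag = int(np.argmax(corr)) - (len(y) - 1)   # samples x lags behind y
    return lag / FS

def delay_information(delta_d: float, tau: float) -> float:
    """A = delta_d + C * tau, formula (1) of the text."""
    return delta_d + C * tau

# Example: y is x delayed by 32 samples (2 ms at 16 kHz).
rng = np.random.default_rng(1)
x = rng.normal(size=1024)
y = np.concatenate([np.zeros(32), x[:-32]])
tau = acquisition_time_difference(y, x)
A = delay_information(0.5, tau)   # assuming delta_d = 0.5 m
```

With the synthetic delay of 32 samples, the estimator recovers τ = 32/16000 s exactly, since the peak of the cross-correlation of a signal with its shifted copy sits at the shift.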
S203-S204 are one implementation of S101, and S101 has other implementations, which are not limited herein.
S205, determining the sounding position of the target sound source according to the position information of the sound collecting units corresponding to the target sound signals, the delay information, the collecting time of the sound collecting units aiming at the target sound signals and the preset sounding position error parameters.
Specifically, the sound emission position S of the target sound source is determined according to formula (2), in which (X_i, Y_i) represents the position information of the sound collection units (see S201), with the subscript i indexing the 1st to nth units; A_i represents the delay information of the sound collection units (see formula (1)); t_i represents the collection time of a sound collection unit for the target sound signal; and σ represents a preset sounding-position error parameter, set to eliminate the collection error of the collected sound signal. Its specific numerical value can be chosen as needed, and σ is set as a representation parameter of a spherical range or of a linear range according to whether the sound propagates spherically or linearly.
By adjusting the position information of the sound collection units with the delay information and with each unit's collection time for the target sound signal, and by integrating the position information of the plurality of sound collection units, the true sounding position of the target sound can be obtained.
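The closed-form expression of S205 is not detailed here, but the idea of fitting a sounding position to per-unit timing information can be illustrated with a generic TDOA grid search: pick the candidate position whose predicted pairwise arrival-time differences best match the measured ones. Everything below (grid extent, step size, solver) is an illustrative assumption rather than the text's own formula.

```python
import numpy as np

C = 340.0  # speed of sound in air (m/s)

def locate_source(mic_xy: np.ndarray, t_arrival: np.ndarray,
                  extent: float = 5.0, steps: int = 101) -> np.ndarray:
    """Brute-force search for the sounding position: choose the grid
    point whose predicted arrival-time differences (relative to unit 0)
    best match the measured ones."""
    xs = np.linspace(0.0, extent, steps)
    ys = np.linspace(0.0, extent, steps)
    meas = t_arrival - t_arrival[0]          # measured TDOAs
    best, best_err = None, np.inf
    for x in xs:
        for y in ys:
            d = np.hypot(mic_xy[:, 0] - x, mic_xy[:, 1] - y)
            pred = (d - d[0]) / C            # predicted TDOAs
            err = np.sum((pred - meas) ** 2)
            if err < best_err:
                best, best_err = np.array([x, y]), err
    return best

# Synthetic check: four units at the corners, source at (2.0, 3.0).
mics = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
src = np.array([2.0, 3.0])
t_arr = np.hypot(*(mics - src).T) / C
est = locate_source(mics, t_arr)
```

On noise-free synthetic timings the search recovers the source to within one grid step; with real measurements the error parameter σ of the text plays the role of the residual tolerated by the fit.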
Wherein S205 is an implementation of S102, which is not limited herein.
S206, determine the area weight of each sound collection unit according to the sounding position and the position information of each sound collection unit.
Specifically, from the sounding position of the target sound determined in S205 and the position information of each sound collection unit, the distance from the sounding position to each sound collection unit can be obtained; area weights are then assigned in ascending order of distance, so that the area weight of a sound collection unit close to the sounding position is greater than that of a unit far from it.
S207, determine the sound source area according to each sound collection unit, its unit acquisition area, and its area weight.
By superimposing the unit acquisition areas of the sound collection units according to their area weights, the sound source area of the target sound can be determined, and the collection of sound signals in that area can subsequently be enhanced. Specifically, the collection sensitivity of the sound collection units in the sound source area can be increased; or the target sound signals collected by those units can be amplified; or the collection sensitivity of the units involved in the sound source area can be increased to different degrees according to their area weights. The enhancement mode may be set as needed and is not limited here.
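A minimal sketch of S206 and S207, assuming a simple rank-based weighting: units are ordered by distance to the sounding position, nearer units receive larger area weights, and each unit's acquisition gain is boosted in proportion to its weight. The specific weighting and gain formulas are illustrative assumptions, not taken from the text.

```python
import numpy as np

def area_weights(sounding_pos: np.ndarray, mic_xy: np.ndarray) -> np.ndarray:
    """Rank the sound collection units by distance to the sounding
    position; closer units get larger normalized weights."""
    d = np.linalg.norm(mic_xy - sounding_pos, axis=1)
    order = np.argsort(d)                # nearest unit first
    n = len(mic_xy)
    w = np.empty(n)
    w[order] = np.arange(n, 0, -1)       # ranks n, n-1, ..., 1
    return w / w.sum()

def enhanced_gains(weights: np.ndarray, base_gain: float = 1.0,
                   boost: float = 2.0) -> np.ndarray:
    """Scale each unit's acquisition gain with its area weight, so units
    inside the sound source region are enhanced more."""
    return base_gain + boost * weights

mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
w = area_weights(np.array([0.5, 0.5]), mics)   # unit 0 is nearest
g = enhanced_gains(w)
```

The per-unit gains could then multiply the sampled signals, realizing the "different degrees of enhancement according to the area weight" described above.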
S206-S207 are one implementation of S103, which is not limited herein.
S208, acquire a plurality of sound signals currently collected in the sound source area, determine a target sound instruction from the sound signals, and determine an operation instruction from the target sound instruction.
Specifically, S208 includes the steps of:
step one: and acquiring a plurality of sound signals currently acquired by a plurality of sound source acquisition units associated with the sound source region, and determining acquisition time of the plurality of sound signals currently acquired.
Step two: and adjusting a plurality of sound signals collected currently according to the collection time to obtain a target sound instruction.
Step three: and sending the operation instruction to the execution object according to the target sound instruction and the operation instruction, wherein the execution object is configured to execute corresponding operation according to the operation instruction.
Following the steps above, after collection enhancement for the sound source area, the sound collection units collect a plurality of sound signals, which are processed according to their collection times (for example, noise reduction and denoising) to obtain an accurate target sound instruction; the target sound instruction is the superposition of the plurality of processed sound signals. The target sound instruction is then parsed to identify the execution object it indicates (for example, a car window or an air conditioner), and the corresponding operation instruction (for example, an open or close instruction) is sent to the execution object, which has a control-execution function, driving it to operate according to the operation instruction and realizing intelligent voice interaction control.
In some examples, step two specifically includes the sub-steps of:
Sub-step one: according to the acquisition time, performing acquisition time synchronization processing on the plurality of currently collected sound signals.
Referring to fig. 4, which shows the waveforms of two sound signals: taking the case where the plurality of currently collected sound signals comprises two sound signals (a first sound signal A and a second sound signal B), the acquisition time of the first sound signal A is t_i, the acquisition time of the second sound signal B is t_(i+1), and t_x represents the acquisition time difference between the two sound signals. The start times t_i and t_(i+1) are then moved to the same starting point, that is, the points t_i and t_(i+1) are made to coincide on the time axis t, so that acquisition time synchronization is performed on each currently collected sound signal.
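A minimal sketch of this synchronization step, assuming all signals share one sample rate and acquisition times are expressed in sample units (neither assumption is stated in the text):

```python
def synchronize(signals, acquisition_times):
    """Move every signal onto a common time axis: trim each
    earlier-started signal by its acquisition time difference to the
    latest start, then cut all signals to the common overlapping
    length so the sample points coincide on one time axis."""
    t_ref = max(acquisition_times)  # latest start becomes the origin
    shifted = [s[t_ref - t:] for s, t in zip(signals, acquisition_times)]
    n = min(len(s) for s in shifted)
    return [s[:n] for s in shifted]
```

With a signal starting at t = 0 and another at t = 2 samples (t_x = 2), the first is trimmed by two samples so both start together.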
Sub-step two: separating identical sound signals from the currently collected sound signals after the acquisition time synchronization processing, and superposing the identical sound signals to integrate them into one target sound instruction.
Continuing with the example of fig. 4, the first sound signal A and the second sound signal B are superposed and fused into one target sound instruction P, which is performed according to the following formula:

P = a(t_i) * b(t_(i+1) - t_i)   (3)
In the same way, a plurality of sound signals are superposed according to formula (3) to obtain the target sound instruction. In this way, an accurate target sound instruction can be obtained, so that semantic analysis can be performed on it and an accurate semantic analysis result obtained.
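One reading of formula (3) in code, taking a(·) and b(·) as sample arrays, t_(i+1) − t_i as the acquisition time difference in samples, and the product as sample-wise. This interpretation is an assumption, since the patent does not define the operands further:

```python
def superpose(a, b, dt):
    """Fuse two aligned copies of the same sound per formula (3),
    P = a(t_i) * b(t_(i+1) - t_i): signal b, which started dt samples
    later than a, is shifted back by dt and combined sample-wise with
    a over their overlapping range."""
    n = min(len(a), len(b) + dt)
    return [a[k] * b[k - dt] for k in range(dt, n)]
```

With dt = 0 the two signals are simply combined sample by sample; with dt > 0 only the overlap after alignment contributes to P.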
Based on the same principle as fig. 1, fig. 5 shows a sound source localization device 50 provided by a third embodiment of the present disclosure, the device comprising:
a delay information acquisition module 501 for acquiring a plurality of target sound signals and determining a plurality of delay information according to the plurality of target sound signals;
the sounding position determining module 502 is configured to determine sounding positions of target sound sources corresponding to the plurality of target sound signals according to the plurality of delay information;
a sound source region determining module 503 for determining a sound source region according to the sound producing position;
a region acquisition enhancement module 504, configured to enhance sound signal acquisition for a sound source region;
wherein, a plurality of target sound signals are collected by a plurality of sound collecting units; one delay information is delay information between target sound signals acquired by any two sound acquisition units of the plurality of sound acquisition units.
In some examples, the delay information acquisition module is specifically configured to:
and acquiring a plurality of sound signals acquired by the plurality of sound acquisition units, and determining sound signals with similarity larger than a preset threshold value in the plurality of sound signals as a plurality of target sound signals.
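The similarity test of this module might look like the following sketch, which uses normalized correlation against the first signal as reference. Both the similarity measure and the choice of reference are assumptions, as the text fixes neither:

```python
import math

def similarity(x, y):
    # Normalized correlation of two signals over their common length.
    n = min(len(x), len(y))
    num = sum(x[i] * y[i] for i in range(n))
    den = math.sqrt(sum(v * v for v in x[:n]) * sum(v * v for v in y[:n]))
    return num / den if den else 0.0

def select_target_signals(signals, threshold=0.9):
    # Keep the signals whose similarity to the reference exceeds the
    # preset threshold; the first signal serves as the reference.
    reference = signals[0]
    return [s for s in signals if similarity(reference, s) > threshold]
```

Signals that carry the same sound at different gains score near 1.0 and are kept as target sound signals; unrelated noise scores low and is discarded.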
In some examples, the delay information acquisition module is specifically configured to:
for any delay information, the delay information is determined according to the shortest distance between the two sound acquisition units corresponding to the two target sound signals and the acquisition time difference between the two target sound signals respectively acquired by the two sound acquisition units.
In some examples, the apparatus further comprises:
and the position information acquisition module is used for acquiring and storing the position information of the plurality of sound acquisition units.
In some examples, the sound source region determination module is specifically configured to:
and determining the sounding position of the target sound source according to the position information of the plurality of sound acquisition units corresponding to the plurality of target sound signals, the plurality of delay information, the acquisition times of the plurality of sound acquisition units for the plurality of target sound signals, and a preset sounding position error parameter.
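A brute-force sketch of how the sounding position could be determined from the unit positions and delay information: search a 2-D grid for the point whose predicted time differences of arrival best match the observed delays. The grid extent, step, and speed of sound here stand in for the preset sounding position error parameter; the actual solver is not disclosed.

```python
import math

def locate_source(mics, tdoas, speed=343.0, extent=5.0, step=0.1):
    """mics: list of (x, y) unit positions; tdoas[i]: observed delay of
    unit i relative to unit 0, in seconds. Returns the grid point whose
    predicted relative delays have the smallest squared mismatch."""
    def mismatch(p):
        d0 = math.dist(p, mics[0])
        return sum(((math.dist(p, m) - d0) / speed - tau) ** 2
                   for m, tau in zip(mics[1:], tdoas[1:]))

    steps = int(2 * extent / step) + 1
    candidates = ((-extent + ix * step, -extent + iy * step)
                  for ix in range(steps) for iy in range(steps))
    return min(candidates, key=mismatch)
```

For a source at (2, 2) heard by units at the corners of a unit square, the recovered position lands on the nearest grid point to the true sounding position.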
In some examples, the apparatus further comprises:
the unit acquisition area acquisition module is used for acquiring and storing the unit acquisition areas of the sound acquisition units.
In some examples, the sound source region determination module is specifically configured to:
determining the area weight of each sound source acquisition unit according to the sounding position and the position information of each sound acquisition unit;
and determining the sound source area according to the unit acquisition area and the area weight of each sound source acquisition unit.
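These two determinations can be sketched as follows; the inverse-distance weighting and the top-weight selection are illustrative assumptions, since the patent does not give the weighting formula:

```python
import math

def area_weights(sounding_position, unit_positions):
    # Weight each sound acquisition unit by its closeness to the
    # sounding position, normalized so the weights sum to one.
    inv = [1.0 / (math.dist(sounding_position, p) + 1e-9)
           for p in unit_positions]
    total = sum(inv)
    return [w / total for w in inv]

def sound_source_region(weights, unit_acquisition_areas, top_k=1):
    # Take the unit acquisition areas of the most heavily weighted
    # units as the sound source region.
    order = sorted(range(len(weights)), key=lambda i: weights[i],
                   reverse=True)
    return [unit_acquisition_areas[i] for i in order[:top_k]]
```

Units closer to the sounding position receive larger area weights, and their unit acquisition areas form the sound source region on which acquisition is then enhanced.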
In some examples, the apparatus further comprises:
the acquisition time determining module is used for acquiring a plurality of sound signals currently acquired by a plurality of sound source acquisition units associated with a sound source area and determining the acquisition time of the plurality of sound signals currently acquired;
the sound instruction determining module is used for adjusting a plurality of currently acquired sound signals according to the acquisition time to obtain a target sound instruction;
the operation instruction determining module is used for determining an execution object and an operation instruction according to the target sound instruction, and sending the operation instruction to the execution object, wherein the execution object is configured to execute a corresponding operation according to the operation instruction;
in some examples, the sound instruction determination module is specifically to:
according to the acquisition time, performing acquisition time synchronization processing on the plurality of currently collected sound signals;
and separating the same sound signals in the current collected sound signals after the synchronous collection time processing, and carrying out superposition processing on the same sound signals to integrate the same sound signals into a target sound instruction.
In the technical scheme of the present disclosure, the acquisition, storage, and application of the user personal information involved all conform to the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as the sound source localization method. For example, in some embodiments, the sound source localization method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the sound source localization method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the sound source localization method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A method of sound source localization, the method comprising:
acquiring a plurality of target sound signals, and determining a plurality of delay information according to the plurality of target sound signals;
determining sounding positions of target sound sources corresponding to the target sound signals according to the delay information;
determining a sound source area according to the sound production position;
enhancing sound signal acquisition for the sound source region;
wherein the plurality of target sound signals are collected by a plurality of sound collecting units; one of the delay information is delay information between target sound signals acquired by any two of the plurality of sound acquisition units.
2. The method of claim 1, wherein the acquiring a plurality of target sound signals comprises:
and acquiring a plurality of sound signals acquired by a plurality of sound acquisition units, and determining sound signals with similarity larger than a preset threshold value in the plurality of sound signals as the plurality of target sound signals.
3. The method of claim 1 or 2, wherein the determining a plurality of delay information from the plurality of target sound signals comprises:
and for any delay information, determining the delay information according to the shortest distance between the two sound acquisition units corresponding to two target sound signals and the acquisition time difference between the two target sound signals respectively acquired by the two sound acquisition units.
4. A method according to any one of claims 1-3, wherein before the acquiring a plurality of target sound signals and determining a plurality of delay information from the plurality of target sound signals, the method further comprises:
and acquiring and storing the position information of the sound acquisition units.
5. The method of claim 4, wherein the determining, according to the plurality of delay information, the sound emission positions of the target sound sources corresponding to the plurality of target sound signals includes:
and determining the sounding position of the target sound source according to the position information of the plurality of sound collecting units corresponding to the plurality of target sound signals, the plurality of delay information, the collecting times of the plurality of sound collecting units for the plurality of target sound signals, and a preset sounding position error parameter.
6. The method of any of claims 1-5, wherein prior to the acquiring the plurality of target sound signals and determining the plurality of delay information from the plurality of target sound signals, the method further comprises:
and acquiring and storing the unit acquisition areas of the sound acquisition units.
7. The method of claim 6, wherein the determining a sound source region from the sound-producing location comprises:
determining the area weight of each sound source acquisition unit according to the sounding position and the position information of each sound acquisition unit;
and determining the sound source area according to the unit acquisition area and the area weight of each sound source acquisition unit.
8. The method of any of claims 1-7, wherein after the enhancing the sound signal acquisition for the sound source region, the method further comprises:
acquiring a plurality of sound signals currently acquired by a plurality of sound source acquisition units associated with the sound source region, and determining acquisition time of the plurality of sound signals currently acquired;
adjusting the plurality of currently acquired sound signals according to the acquisition time to obtain a target sound instruction;
and determining an execution object and an operation instruction according to the target sound instruction, and sending the operation instruction to the execution object, wherein the execution object is configured to execute a corresponding operation according to the operation instruction.
9. The method of claim 8, wherein said adjusting the plurality of currently acquired sound signals according to the acquisition time to obtain a target sound instruction comprises:
according to the acquisition time, carrying out acquisition time synchronization processing on the plurality of currently acquired sound signals;
and separating the same sound signals in the current collected sound signals after the synchronous processing of the collection time, and carrying out superposition processing on the same sound signals to integrate the same sound signals into one target sound instruction.
10. A sound source localization device, the device comprising:
the delay information acquisition module is used for acquiring a plurality of target sound signals and determining a plurality of delay information according to the plurality of target sound signals;
the sounding position determining module is used for determining sounding positions of target sound sources corresponding to the target sound signals according to the delay information;
the sound source area determining module is used for determining a sound source area according to the sounding position;
the region acquisition enhancement module is used for enhancing the sound signal acquisition aiming at the sound source region;
wherein the plurality of target sound signals are collected by a plurality of sound collecting units; one of the delay information is delay information between target sound signals acquired by any two of the plurality of sound acquisition units.
11. The apparatus of claim 10, wherein the delay information acquisition module is specifically configured to:
and acquiring a plurality of sound signals acquired by a plurality of sound acquisition units, and determining sound signals with similarity larger than a preset threshold value in the plurality of sound signals as the plurality of target sound signals.
12. The apparatus of claim 10 or 11, wherein the delay information acquisition module is specifically configured to:
and for any delay information, determining the delay information according to the shortest distance between the two sound acquisition units corresponding to two target sound signals and the acquisition time difference between the two target sound signals respectively acquired by the two sound acquisition units.
13. The apparatus according to any one of claims 10-12, wherein the apparatus further comprises:
and the position information acquisition module is used for acquiring and storing the position information of the plurality of sound acquisition units.
14. The apparatus of claim 13, wherein the sound source region determination module is specifically configured to:
and determining the sounding position of the target sound source according to the position information of the plurality of sound collecting units corresponding to the plurality of target sound signals, the plurality of delay information, the collecting times of the plurality of sound collecting units for the plurality of target sound signals, and a preset sounding position error parameter.
15. The apparatus of any of claims 10-14, wherein the apparatus further comprises:
the unit acquisition area acquisition module is used for acquiring and storing the unit acquisition areas of the sound acquisition units.
16. The apparatus of claim 15, wherein the sound source region determination module is specifically configured to:
determining the area weight of each sound source acquisition unit according to the sounding position and the position information of each sound acquisition unit;
and determining the sound source area according to the unit acquisition area and the area weight of each sound source acquisition unit.
17. The apparatus of any of claims 10-16, wherein the apparatus further comprises:
the acquisition time determining module is used for acquiring a plurality of sound signals currently acquired by the sound source acquisition units associated with the sound source area and determining the acquisition time of the plurality of sound signals currently acquired;
the sound instruction determining module is used for adjusting the plurality of currently acquired sound signals according to the acquisition time to obtain a target sound instruction;
an operation instruction determining module, configured to determine an execution object and an operation instruction according to the target sound instruction, and send the operation instruction to the execution object, where the execution object is configured to execute a corresponding operation according to the operation instruction;
the sound instruction determining module is specifically configured to:
according to the acquisition time, carrying out acquisition time synchronization processing on the plurality of currently acquired sound signals;
and separating the same sound signals in the current collected sound signals after the synchronous processing of the collection time, and carrying out superposition processing on the same sound signals to integrate the same sound signals into one target sound instruction.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202311085539.7A 2023-08-25 2023-08-25 Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium Pending CN117250584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311085539.7A CN117250584A (en) 2023-08-25 2023-08-25 Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117250584A true CN117250584A (en) 2023-12-19

Family

ID=89132175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311085539.7A Pending CN117250584A (en) 2023-08-25 2023-08-25 Sound source positioning method, sound source positioning device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117250584A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination