WO2021228059A1 - Fixed sound source recognition method and apparatus - Google Patents

Info

Publication number
WO2021228059A1
WO2021228059A1 PCT/CN2021/092948 CN2021092948W WO2021228059A1 WO 2021228059 A1 WO2021228059 A1 WO 2021228059A1 CN 2021092948 W CN2021092948 W CN 2021092948W WO 2021228059 A1 WO2021228059 A1 WO 2021228059A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
sound signal
electronic device
fixed
attribute information
Prior art date
Application number
PCT/CN2021/092948
Other languages
French (fr)
Chinese (zh)
Inventor
李晓建
胡伟湘
王保辉
李伟
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202011399173.7A external-priority patent/CN113674759A/en
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2021228059A1 publication Critical patent/WO2021228059A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • This application relates to the field of artificial intelligence, and more specifically, to a method and device for identifying a fixed sound source.
  • intelligent voice recognition functions are widely used in electronic devices.
  • electronic devices such as smart phones, smart speakers, smart TVs, and smart robots are all equipped with smart voice recognition functions.
  • the user needs to issue a voice command in a quiet environment, so that the electronic device can perform corresponding operations according to the voice command issued by the user.
  • the electronic device will receive the noise from the noise source while receiving the voice command input by the user, so that the voice command input by the user is disturbed by the noise emitted by the noise source. It is difficult for the electronic device to correctly recognize the true intention corresponding to the voice command input by the user, which leads to a decrease in the accuracy of the electronic device in recognizing the voice.
  • the embodiments of the present application provide a method and device for identifying a fixed sound source, so as to identify a fixed sound source in an environment around an electronic device.
  • an embodiment of the present application provides a method for identifying a fixed sound source.
  • the method is applied to an electronic device.
  • the method includes: the electronic device acquires a first audio stream in a first time period, the first audio stream including at least a first sound signal; the electronic device separates the first sound signal from the first audio stream; the electronic device determines first attribute information of the first sound signal; the electronic device determines whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library; and, when it matches, the electronic device determines that the first sound signal is a sound signal emitted by a fixed sound source.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources.
  • a fixed sound source is a sound source that is located in the same position and emits a known sound type.
  • the electronic device can match the first attribute information of the first sound signal in the first audio stream against the fixed sound source library generated in advance. If the first attribute information matches the attribute information of a fixed sound source in the library, the first sound signal is a sound signal from a fixed sound source, so the electronic device can accurately identify the fixed sound sources in the environment.
  • the electronic device includes a microphone array
  • the electronic device acquiring the first audio stream in the first time period includes: the electronic device uses the microphone array to collect sounds in the environment where the electronic device is located during the first time period, to generate the first audio stream.
  • the first attribute information includes a sounding position, a sound type, and a sounding time of the first sound signal.
  • the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal.
  • the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal.
  • the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the electronic device determining the sound type of the first sound signal according to the sound feature of the first sound signal includes: the electronic device determines whether a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, the sound event library including one or more sound types.
  • when such a sound type exists in the sound event library, the electronic device determines it as the sound type of the first sound signal.
  • when no such sound type exists in the sound event library, the electronic device sends a first network request to an external server and receives a first response request sent by the external server; the first network request includes the sound feature of the first sound signal, and the first response request includes the sound type corresponding to the sound feature of the first sound signal.
  • or, when no such sound type exists in the sound event library, the electronic device determines whether the number of times the sound feature of the first sound signal appears at a first position is greater than a first threshold, the first position being the sounding position of the first sound signal; if the number of occurrences at the first position is greater than the first threshold, the electronic device determines that the sound type of the first sound signal is a known sound type.
  • the electronic device can obtain the sound type corresponding to the sound feature of the first sound signal in a sound event library or an external server.
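  • the occurrence-count alternative described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: the `OccurrenceTracker` class, the string feature identifiers, the 10-degree position bucketing, and the threshold value are all hypothetical.

```python
from collections import Counter


class OccurrenceTracker:
    """Counts how often a sound feature is observed at a sounding position.

    Hypothetical helper: feature_id stands in for a real sound feature, and
    positions are grouped into buckets of `bucket_deg` degrees (an assumption).
    """

    def __init__(self, first_threshold, bucket_deg=10):
        self.first_threshold = first_threshold
        self.bucket_deg = bucket_deg
        self.counts = Counter()

    def _key(self, feature_id, position_deg):
        # Snap the bearing to the nearest bucket so small jitter still counts
        # as the same "first position".
        bucket = round(position_deg / self.bucket_deg) * self.bucket_deg % 360
        return feature_id, bucket

    def observe(self, feature_id, position_deg):
        """Record one occurrence; return True once the count exceeds the threshold."""
        key = self._key(feature_id, position_deg)
        self.counts[key] += 1
        return self.counts[key] > self.first_threshold
```

  • with `first_threshold=2`, the third observation of the same feature near 140 degrees is the first one treated as a known sound type; observations at a different position are counted separately.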
  • the method further includes: the electronic device acquires a second audio stream in a second time period, the second audio stream including at least a second sound signal; the electronic device determines second attribute information of the second sound signal; the electronic device determines whether the second attribute information exists in the fixed sound source library; and, when the second attribute information does not exist in the fixed sound source library, the electronic device stores the second attribute information in the fixed sound source library.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located in the same position and emits a known sound type.
  • the electronic device can establish a fixed sound source library, and can also continuously update the content in the fixed sound source library.
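  • a minimal sketch of this store-if-absent update is shown below. The `AttributeInfo` fields mirror the sounding position, sound type, and sounding time named in the text; the class name, the field formats, and the use of exact equality to decide whether an entry "exists" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeInfo:
    position_deg: int   # sounding position relative to the device
    sound_type: str     # e.g. "range hood"
    sounding_time: str  # e.g. "18:30"


def update_library(library, second_info):
    """Store second_info if it is not already in the library.

    Returns True when the entry was stored, False when it already existed.
    Exact equality is used as the existence check (an assumption).
    """
    if second_info in library:
        return False
    library.append(second_info)
    return True
```

  • calling `update_library` twice with equal attribute information leaves a single entry in the library, which is how the library can be both built up and kept current.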
  • an embodiment of the present application provides an electronic device, including a memory and a processor connected to the memory; the memory is used for storing instructions, and the processor is used for executing the instructions, so that the electronic device performs the following operations: acquire a first audio stream in a first time period, the first audio stream including at least a first sound signal; separate the first sound signal from the first audio stream; determine first attribute information of the first sound signal; determine whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library; and, when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by a fixed sound source.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located in the same position and emits a known sound type.
  • the electronic device includes a microphone array; the processor is specifically configured to use the microphone array to collect sounds in an environment in which the electronic device is located within the first time period to generate a first audio stream.
  • the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal.
  • the processor is specifically configured to determine the sounding position of the first sound signal by using the microphone array; determine the sound type of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  • the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal.
  • the processor is specifically configured to determine the sounding position of the first sound signal by using the microphone array; determine the sound content of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  • the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal.
  • the processor is specifically configured to determine the sounding position of the first sound signal by using the microphone array; determine the sound type of the first sound signal according to the sound feature of the first sound signal; determine the sound content of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  • the processor is specifically configured to determine whether a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, the sound event library including one or more sound types.
  • when such a sound type exists in the sound event library, the processor determines it as the sound type of the first sound signal.
  • when no such sound type exists in the sound event library, the processor sends a first network request to the external server and receives a first response request sent by the external server; the first network request includes the sound feature of the first sound signal, and the first response request includes the sound type corresponding to the sound feature of the first sound signal.
  • or, when no such sound type exists in the sound event library, the processor determines whether the number of times the sound feature of the first sound signal appears at a first position is greater than a first threshold, the first position being the sounding position of the first sound signal; if the number of occurrences at the first position is greater than the first threshold, the processor determines that the sound type of the first sound signal is a known sound type.
  • the processor is further configured to acquire a second audio stream in a second time period, the second audio stream including at least a second sound signal; determine second attribute information of the second sound signal; determine whether the second attribute information exists in the fixed sound source library; and, when the second attribute information does not exist in the fixed sound source library, store the second attribute information in the fixed sound source library.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type.
  • an embodiment of the present application provides an electronic device, including: an acquisition module, configured to acquire a first audio stream in a first time period, the first audio stream including at least a first sound signal; and a processing module, configured to separate the first sound signal from the first audio stream, determine first attribute information of the first sound signal, determine whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library, and, when it matches, determine that the first sound signal is a sound signal emitted by a fixed sound source.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type.
  • FIG. 1 shows a schematic diagram of a scenario provided by an embodiment of this application
  • Figure 2 shows a schematic diagram of the azimuths of the three sound sources in Figure 1;
  • FIG. 3 shows a flowchart of a method for identifying a fixed sound source according to an embodiment of this application
  • FIG. 4 shows a schematic diagram of another scenario provided by an embodiment of this application.
  • Figure 5 shows a schematic diagram of the azimuths of the two sound sources in Figure 4.
  • FIG. 6 shows a flowchart of another method for identifying a fixed sound source provided by an embodiment of this application
  • FIG. 7 shows a schematic diagram of another scenario provided by an embodiment of this application.
  • FIG. 8 shows a flowchart of another method for identifying a fixed sound source according to an embodiment of this application.
  • FIG. 9 shows a schematic diagram of another scenario provided by an embodiment of this application.
  • FIG. 10 shows a schematic diagram of the azimuth of one sound source in FIG. 9;
  • FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of this application.
  • FIG. 12 shows a schematic diagram of yet another electronic device provided by an embodiment of this application.
  • FIG. 1 shows a schematic diagram of a scene provided by an embodiment of the present application
  • FIG. 2 shows a schematic diagram of the orientation of the three sound sources in FIG. 1.
  • the scene diagram shown in FIG. 1 shows a smart speaker 100, a range hood 200, an air conditioner 300, and a user 400.
  • the smart speaker 100 shown in FIG. 1 can execute the fixed sound source identification method provided by the embodiment of the present application.
  • assuming that the range hood 200 is in a working state while the user 400 sends the sound signal C to the smart speaker 100, the range hood 200 emits noise toward the smart speaker 100; this noise is the sound signal A in FIGS. 1 and 2.
  • assuming that the air conditioner 300 is also in a working state while the user 400 sends the sound signal C to the smart speaker 100, the air conditioner 300 emits noise toward the smart speaker 100; this noise is the sound signal B in FIGS. 1 and 2.
  • the smart speaker 100 can use the fixed sound source identification method provided in the embodiments of the present application to determine which of the received sound signals come from fixed sound sources. After the smart speaker 100 determines that the sound signal A and the sound signal B come from fixed sound sources, it can shield the received sound signal A and sound signal B and recognize only the voice command corresponding to the received sound signal C, thereby playing the song "ABC" for the user.
  • the smart speaker 100 can thus use the fixed sound source identification method provided by the embodiments of the present application to accurately identify the fixed sound sources in the environment and correctly recognize the real intention behind the voice command input by the user 400, which improves the accuracy of the smart speaker 100 in recognizing voice commands.
  • FIG. 3 shows a flowchart of a method for identifying a fixed sound source according to an embodiment of this application.
  • the fixed sound source recognition method shown in FIG. 3 can be applied to electronic devices, and the electronic devices can be devices with smart voice recognition functions such as smart phones, smart speakers, smart TVs, and smart robots.
  • the method shown in FIG. 3 includes the following steps S101 to S104.
  • S101: The electronic device acquires a first audio stream in a first time period, where the first audio stream includes at least a first sound signal.
  • the first time period refers to the time period during which the user inputs a voice instruction to the electronic device.
  • the electronic device may use the microphone array to collect sounds in the environment where the electronic device is located in the first time period to generate the first audio stream.
  • the first audio stream includes a sound signal A, a sound signal B, and a sound signal C, where the first sound signal is the sound signal A.
  • after S101 and before S102, the electronic device also needs to separate the first sound signal from the first audio stream.
  • the specific separation process includes step A1 and step A2:
  • Step A1: The electronic device performs a preprocessing operation on the first audio stream to obtain the corrected first audio stream.
  • the preprocessing operation includes variable centralization processing, whitening processing, principal component analysis dimensionality reduction processing and time filtering processing.
  • the purpose of the preprocessing operation is to reduce the noise in the first audio stream.
  • Step A2: The electronic device performs independent component analysis (ICA) processing on the corrected first audio stream to obtain the first sound signal.
  • the independent component analysis processing is used to separate the first sound signal from the first audio stream, so that the first sound signal can be processed correspondingly in subsequent steps.
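  • the centralization and whitening parts of step A1 can be illustrated with the sketch below for the two-channel case. It is not the application's implementation: the function names are hypothetical, the closed-form 2x2 whitening matrix is a choice made here for brevity, and principal component analysis, time filtering, and the ICA step itself are omitted.

```python
import math


def center(channel):
    """Subtract the channel mean (the centralization step)."""
    mean = sum(channel) / len(channel)
    return [x - mean for x in channel]


def covariance(a, b):
    """Covariance of two centered, equal-length sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def whiten_two_channels(ch0, ch1):
    """Center and whiten two channels so that their covariance is the identity.

    For a 2x2 symmetric positive-definite covariance C, the square root is
    sqrt(C) = (C + s*I) / t with s = sqrt(det C) and t = sqrt(trace C + 2*s).
    The whitening matrix is the inverse of sqrt(C). The channels must not be
    perfectly correlated, otherwise det C is zero.
    """
    x0, x1 = center(ch0), center(ch1)
    a, b, d = covariance(x0, x0), covariance(x0, x1), covariance(x1, x1)
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2 * s)
    # Entries of sqrt(C) = [[p, q], [q, r]].
    p, q, r = (a + s) / t, b / t, (d + s) / t
    det = p * r - q * q
    # Entries of the inverse, which is the (symmetric) whitening matrix.
    w00, w01, w11 = r / det, -q / det, p / det
    y0 = [w00 * u + w01 * v for u, v in zip(x0, x1)]
    y1 = [w01 * u + w11 * v for u, v in zip(x0, x1)]
    return y0, y1
```

  • after whitening, each channel has unit variance and the two channels are uncorrelated, which is the usual starting point for an ICA routine that then searches for the rotation separating the sources.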
  • S102: The electronic device determines first attribute information of the first sound signal.
  • the electronic device determining the first attribute information of the first sound signal may include the following steps: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the sounding position of the first sound signal refers to the sounding position of the sound source corresponding to the first sound signal relative to the electronic device.
  • the sounding position of sound signal A is the position of sound source 1, which corresponds to sound signal A, relative to the smart speaker 100. Taking the position of the smart speaker 100 as the center point, the sounding position of sound source 1 relative to the smart speaker 100 is 140 degrees.
  • the sounding position of the sound source 2 corresponding to the sound signal B relative to the smart speaker 100 is 45 degrees
  • the sounding position of the sound source 3 corresponding to the sound signal C relative to the smart speaker 100 is 270 degrees.
  • the sound feature of the first sound signal includes, but is not limited to, Mel-frequency cepstral coefficients (MFCC).
  • the sound type of the first sound signal refers to the kind of sound emitted by the sound source corresponding to the first sound signal. For example, referring to FIG. 1 and FIG. 2, if the first sound signal is the sound signal A, the sound type of the sound signal A is the sound of the range hood 200.
  • the sounding time of the first sound signal is the time when the electronic device receives the first sound signal.
  • for example, assuming that the first sound signal is sound signal A, the sounding time of sound signal A is 18:30 on April 10, 2020.
  • the electronic device determining the first attribute information of the first sound signal may include the following steps: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the sound content of the first sound signal is voice content.
  • for example, assume that the first sound signal is sound signal C. When the user 400 speaks, sound signal C is generated and propagates through the air; the sound content of sound signal C is "play song ABC", as spoken by the user 400.
  • the electronic device determining the first attribute information of the first sound signal may include the following steps: the electronic device uses a microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sounding time of the first sound signal.
  • the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal.
  • the application scenario of the first implementation manner is: the electronic device can identify a sound source that only emits one type of sound.
  • the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal.
  • the application scenario of the second implementation manner is: the electronic device can identify the sound source emitting the voice content.
  • the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal.
  • the application scenario of the third implementation manner is: the electronic device can identify both a sound source that emits only one type of sound and a sound source that emits voice content.
  • the electronic device determines whether the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library.
  • the fixed sound source library includes attribute information of one or more fixed sound sources, and the fixed sound sources are sound sources that are located at the same location and emit a known sound type.
  • Table 1 shows the attribute information of multiple fixed sound sources in the fixed sound source library; this attribute information is data that the smart speaker 100 learned and generated in advance based on historical information. The generation process of Table 1 is described in detail in the following embodiments.
  • the smart speaker 100 determines that the attribute information of sound signal A includes the sounding position, sound type, and sounding time of sound signal A.
  • the sounding position of the sound signal A is 140 degrees
  • the sound type of the sound signal A is the sound of the range hood 200
  • the sounding time of the sound signal A is 18:30 on April 10, 2020.
  • after the smart speaker 100 obtains the attribute information of the sound signal A, the smart speaker 100 determines whether the attribute information of the sound signal A matches the attribute information of a fixed sound source in the fixed sound source library of Table 1. Table 1 shows that the attribute information of the sound signal A matches the attribute information of the fixed sound source numbered 1, indicating that the sound source 1 corresponding to the sound signal A is a fixed sound source.
  • the electronic device can match the first attribute information of the first sound signal in the first audio stream against the pre-generated fixed sound source library. If the first attribute information matches the attribute information of a fixed sound source in the library, the first sound signal is a sound signal emitted by a fixed sound source, so the electronic device can accurately identify the fixed sound sources in the environment.
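  • the matching step can be sketched as below. The text does not state how closely the attributes must agree, so the angular tolerance, the time-of-day window, and the `FixedSource` structure are assumptions made for illustration; any library entries passed in are hypothetical and do not reproduce Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedSource:
    position_deg: float  # sounding position relative to the device
    sound_type: str      # known sound type
    start_minute: int    # usual sounding window, minutes after midnight
    end_minute: int


def angle_diff(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def match_fixed_source(position_deg, sound_type, minute_of_day, library, tol_deg=10.0):
    """Return the first library entry that matches the signal's attributes, else None.

    A match requires the same sound type, a bearing within tol_deg, and a
    sounding time inside the entry's window (all three criteria are assumptions).
    """
    for src in library:
        if (angle_diff(position_deg, src.position_deg) <= tol_deg
                and sound_type == src.sound_type
                and src.start_minute <= minute_of_day <= src.end_minute):
            return src
    return None
```

  • a signal whose attributes match an entry would be treated as coming from a fixed sound source and could be shielded; a signal with no match, such as the user's voice command, would be passed on for recognition.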
  • FIG. 4 shows another schematic diagram of a scene provided by an embodiment of the present application
  • FIG. 5 shows a schematic diagram of the orientation of the two sound sources in FIG. 4.
  • the scene diagram shown in FIG. 4 shows the smart speaker 100, the user 400, and the smart TV 600.
  • the smart speaker 100 shown in FIG. 4 can execute the fixed sound source identification method provided by the embodiment of the present application.
  • referring to FIG. 4 and FIG. 5, in a possible scenario, assume that the user 400 has devices such as the smart speaker 100 and the smart TV 600 at home, and that the relative positions of the smart speaker 100 and the smart TV 600 are as shown in FIGS. 4 and 5. Assuming that the user 400 wants the smart speaker 100 to play the song "ABC", the user 400 sends the smart speaker 100 a voice instruction to play the song "ABC"; this voice instruction is the sound signal F in FIGS. 4 and 5.
  • assuming that the smart TV 600 is in a working state, the smart TV 600 emits noise toward the smart speaker 100; this noise is the sound signal E in FIGS. 4 and 5.
  • the smart speaker 100 can use the fixed sound source identification method shown in FIG. 3 to determine which of the received sound signals come from a fixed sound source. After the smart speaker 100 determines that the smart TV 600 is a fixed sound source, the smart speaker 100 can shield the received sound signal E and recognize only the voice command corresponding to the received sound signal F, so as to play the song "ABC" for the user.
  • the first audio stream includes a sound signal E and a sound signal F, where the first sound signal is the sound signal E and the second sound signal is the sound signal F; the sounding positions of the sound signals E and F are 230° and 320°, respectively.
  • the smart speaker 100 determines that the attribute information of the sound signal E includes the sounding position, sound content, and sounding time of the sound signal E. Assuming that the sounding position of the sound signal E is 230 degrees, the sound content of the sound signal E is "Welcome to the DEF program", and the sounding time of the sound signal E is 18:30.
  • Table 2 shows the attribute information of multiple fixed sound sources in the fixed sound source library; this attribute information is pre-learned and generated data.
  • after the smart speaker 100 obtains the attribute information of the sound signal E, the smart speaker 100 determines whether the attribute information of the sound signal E matches the attribute information of a fixed sound source in the fixed sound source library shown in Table 2. Table 2 shows that the attribute information of the sound signal E matches the attribute information of the fixed sound source numbered 1, indicating that the sound source 5 corresponding to the sound signal E is a fixed sound source.
  • FIG. 6 shows a flowchart of another method for identifying a fixed sound source according to an embodiment of the present application.
  • the method shown in FIG. 6 refines S102 of FIG. 3, specifically the step in which "the electronic device determines the sound type of the first sound signal according to the sound characteristics of the first sound signal".
  • the method shown in FIG. 6 includes the following steps S201 to S203.
  • S201: The electronic device determines whether there is a sound type corresponding to the sound feature of the first sound signal in the sound event library. If there is, step S202 is executed; if there is not, step S203 is executed.
  • the sound event library includes one or more sound types, and the sound types in the sound event library are all preset.
  • Table 3 shows the correspondence between sound features and sound types in the sound event library.
  • Table 3:
      Sound feature   | Sound type
      Sound feature X | The sound of range hood 200
      Sound feature Y | The sound of air conditioner 300
      ...             | ...
  • the smart speaker 100 determines the sound feature X of the sound signal A. Then, the smart speaker 100 determines whether there is a sound type corresponding to the sound feature X of the sound signal A in the sound event library shown in Table 3. It can be known from Table 3 that the sound type corresponding to the sound feature X is the sound of the range hood 200, so the smart speaker 100 determines that the sound type of the sound signal A is the sound of the range hood 200.
  • S202 Determine the sound type corresponding to the sound feature of the first sound signal as the sound type of the first sound signal.
  • S203 The electronic device sends a first network request to the external server, and the electronic device receives the first response request sent by the external server.
  • the first network request includes the sound feature of the first sound signal, and the first response request includes the sound type corresponding to the sound feature of the first sound signal.
  • FIG. 7 shows a schematic diagram of another scenario provided in an embodiment of the present application.
  • the smart speaker 100 can connect to an external server 1000 through the Internet. Assuming that there is no sound type corresponding to the sound feature X of the sound signal A in the sound event library of the smart speaker 100, the smart speaker 100 sends a first network request to the server 1000, and the first network request includes the sound feature X of the sound signal A. After the server 1000 receives the first network request, the server 1000 finds in the cloud storage that the sound type corresponding to the sound feature X of the sound signal A is the sound of the range hood 200, and then the server 1000 sends a first response request to the smart speaker 100.
  • the first response request indicates that the sound type corresponding to the sound feature X of the sound signal A is the sound of the range hood 200.
  • From the first response request, the smart speaker 100 can learn that the sound type corresponding to the sound feature X of the sound signal A is the sound of the range hood 200.
  • the electronic device can obtain the sound type corresponding to the sound feature of the first sound signal in a sound event library or an external server.
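  • Steps S201 to S203 can be sketched as a local lookup with a server fallback. The sketch below is illustrative only: `determine_sound_type` and `fake_server` are hypothetical names, sound features are reduced to plain string keys (a real device would compare feature vectors), and the server is simulated by a local function rather than an actual network call.

```python
from typing import Callable, Dict, Optional


def determine_sound_type(
    feature: str,
    event_library: Dict[str, str],
    query_server: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the sound type for a sound feature, or None if it is unknown."""
    # S201: check whether the sound event library has a matching sound type.
    if feature in event_library:
        # S202: use the sound type found in the sound event library.
        return event_library[feature]
    # S203: otherwise send the feature to the external server and use
    # the sound type carried in its response.
    return query_server(feature)


# Sound event library corresponding to Table 3.
sound_event_library = {
    "sound feature X": "the sound of the range hood 200",
    "sound feature Y": "the sound of the air conditioner 300",
}

# Hypothetical stand-in for the external server 1000 and its cloud storage.
cloud_storage = {"sound feature Z": "hypothetical cloud-only sound type"}
server_calls = []


def fake_server(feature: str) -> Optional[str]:
    server_calls.append(feature)  # record each simulated network request
    return cloud_storage.get(feature)


local_result = determine_sound_type("sound feature X", sound_event_library, fake_server)
remote_result = determine_sound_type("sound feature Z", sound_event_library, fake_server)
```

  • Passing the server query in as a function keeps the lookup logic independent of any particular transport, which is a design choice of this sketch rather than something the description specifies.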
  • step S204 may be further included (S204 is not shown in FIG. 6), and S204 may replace S203 to form another implementation manner.
  • S204 may include the following steps: the electronic device determines whether the number of times the sound feature of the first sound signal appears at the first position is greater than a first threshold, where the first position is the sounding position of the first sound signal; if the number of times the sound feature of the first sound signal appears at the first position is greater than the first threshold, the electronic device determines that the sound type of the first sound signal is a known sound type. In addition, after determining that the sound type of the first sound signal is a known sound type, the electronic device may also store the sound feature of the first sound signal and the known sound type in the sound event library.
  • the first position is the sounding position of the sound source corresponding to the first sound signal relative to the electronic device, that is, the first position is the sounding position of the first sound signal.
  • the first threshold is a preset number of times. For example, the first threshold may be set to 3 times in advance.
  • the known sound type refers to a sound type whose specific type cannot be determined but which belongs to a fixed sound source.
  • the electronic device can determine the sound type of the first sound signal as a known sound type.
  • the electronic device may determine the sound type of the first sound signal as the known sound type A. Although the electronic device is not sure which specific sound type the known sound type A belongs to, the electronic device can determine that the known sound type A is a type of sound that is received frequently.
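  • The counting logic of S204 can be sketched as follows, assuming the example first threshold of 3. The class name `KnownTypeTracker`, the use of a (feature, position) pair as the counting key, and the numbered label format are hypothetical choices made for this illustration.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


class KnownTypeTracker:
    """Counts how often a sound feature appears at a sounding position."""

    def __init__(self, first_threshold: int = 3) -> None:
        self.first_threshold = first_threshold
        self.counts: Counter = Counter()
        # Sound event library: sound feature -> known sound type label.
        self.event_library: Dict[str, str] = {}

    def observe(self, feature: str, position_deg: int) -> Optional[str]:
        """Record one occurrence; return a known sound type once the count
        is strictly greater than the first threshold, otherwise None."""
        key: Tuple[str, int] = (feature, position_deg)
        self.counts[key] += 1
        if self.counts[key] <= self.first_threshold:
            return None
        if feature not in self.event_library:
            # The specific type is unknown, so assign a generic label
            # and store it in the sound event library.
            label = f"known sound type {len(self.event_library) + 1}"
            self.event_library[feature] = label
        return self.event_library[feature]


tracker = KnownTypeTracker(first_threshold=3)
results = [tracker.observe("sound feature Q", 90) for _ in range(4)]
```

  • With a threshold of 3, the first three observations return `None` and the fourth returns the generic label, because the condition is "greater than" rather than "greater than or equal to".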
  • FIG. 8 shows a flowchart of another method for identifying a fixed sound source according to an embodiment of the present application.
  • the method shown in FIG. 8 includes the following steps S301 to S304.
  • S301 The electronic device acquires a second audio stream in a second time period, where the second audio stream includes at least a second sound signal.
  • the second time period refers to a time period during which the user does not input a voice instruction to the electronic device.
  • When the user does not input a voice command to the electronic device, the electronic device obtains, in real time, the sound signals emitted by fixed sound sources in the surrounding environment.
  • the second audio stream is an audio stream generated by the electronic device using the microphone array to collect sounds in the environment where the electronic device is located in the second time period when the user does not input a voice command to the electronic device.
  • FIG. 9 shows a schematic diagram of another scene provided by an embodiment of this application.
  • FIG. 10 shows a schematic diagram of the position of one sound source in FIG. 9.
  • the scene schematic diagram shown in FIG. 9 shows the smart speaker 100 and the water dispenser 500.
  • the smart speaker 100 obtains the second audio stream within a period of time, the second audio stream only includes the sound signal D, and the sound signal D is the sound signal emitted by the water dispenser 500.
  • S302 The electronic device determines second attribute information of the second sound signal.
  • The execution process of S302 in FIG. 8 is the same as that of S102 in FIG. 3.
  • For details of S302 in FIG. 8, please refer to the detailed description of S102 in FIG. 3.
  • S303 The electronic device judges whether the second attribute information exists in the fixed sound source library.
  • the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and the fixed sound sources are sound sources that are located at the same location and emit a known sound type.
  • If the second attribute information exists in the fixed sound source library, it means that the attribute information of the sound source corresponding to the second sound signal has been stored in the fixed sound source library. If the second attribute information does not exist in the fixed sound source library, it means that the attribute information of the sound source corresponding to the second sound signal is not stored in the fixed sound source library, that is, the sound source corresponding to the second sound signal is a new fixed sound source for the electronic device.
  • the smart speaker 100 determines that the attribute information of the sound signal D includes the sounding position, the sound type, and the sounding time of the sound signal D. Assume that the sounding position of the sound signal D is 180 degrees, the sound type of the sound signal D is the sound of the water dispenser 500, and the sounding time of the sound signal D is from 18:10 to 18:13 on April 10, 2020.
  • After the smart speaker 100 obtains the attribute information of the sound signal D, the smart speaker 100 determines whether the attribute information of the sound signal D exists in the fixed sound source library. It can be known from Table 1 that the attribute information of the sound signal D does not exist in the fixed sound source library, so the smart speaker 100 stores the attribute information of the sound signal D in the fixed sound source library.
  • Table 4 shows the fixed sound source library of Table 1 after the attribute information of the sound signal D has been stored in it.
  • the electronic device can establish a fixed sound source library, and can also continuously update the content in the fixed sound source library.
  • the fixed sound source library shown in Table 1 or Table 4 can be established.
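  • The library update described for FIG. 8 (check whether the second attribute information exists, and store it if it does not) can be sketched as below. The names `SourceAttributes` and `update_library` are hypothetical, the existing library entry is invented for illustration because Table 1 is not reproduced here, and treating two entries as the same source when their position and sound type agree is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceAttributes:
    """Attribute information: sounding position, sound type, sounding time."""
    position_deg: int
    sound_type: str
    sounding_time: str


def update_library(library: List[SourceAttributes], attrs: SourceAttributes) -> bool:
    """Store attrs if no entry with the same position and sound type exists.

    Returns True when a new fixed sound source was stored, False otherwise.
    """
    exists = any(
        entry.position_deg == attrs.position_deg
        and entry.sound_type == attrs.sound_type
        for entry in library
    )
    if exists:
        return False
    library.append(attrs)
    return True


# Hypothetical starting library; the actual rows of Table 1 are not shown here.
fixed_library = [
    SourceAttributes(30, "hypothetical existing sound type", "12:00 to 12:30"),
]
# Attribute information of the sound signal D, as given in the text.
signal_d = SourceAttributes(
    180, "the sound of the water dispenser 500", "18:10 to 18:13 on April 10, 2020"
)
stored_first_time = update_library(fixed_library, signal_d)
stored_second_time = update_library(fixed_library, signal_d)
```

  • The second call returns `False` because the water dispenser entry is already present, which mirrors the case in which the second attribute information already exists in the fixed sound source library.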
  • FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device shown in Figure 11 includes the following modules:
  • the acquiring module 11 is configured to acquire a first audio stream in a first time period, where the first audio stream includes at least a first sound signal.
  • the processing module 12 is configured to separate the first sound signal from the first audio stream; determine the first attribute information of the first sound signal; determine whether the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
  • the device embodiment described in FIG. 11 is only schematic.
  • the division of modules is only a logical function division.
  • multiple modules or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the functional modules in the various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
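  • The logical division into an acquiring module and a processing module can be sketched as two small classes. This is only a structural illustration: the class names, the injected callables (`record`, `separate`, `describe`, `is_fixed`), and the stub data are hypothetical, and real signal separation and attribute extraction are replaced by trivial stand-ins.

```python
from typing import Any, Callable, Iterable, List, Tuple


class AcquiringModule:
    """Obtains the first audio stream in the first time period."""

    def __init__(self, record: Callable[[], Any]) -> None:
        self._record = record

    def acquire(self) -> Any:
        return self._record()


class ProcessingModule:
    """Separates signals, determines attributes, and matches the library."""

    def __init__(
        self,
        separate: Callable[[Any], Iterable[Any]],
        describe: Callable[[Any], Any],
        is_fixed: Callable[[Any], bool],
    ) -> None:
        self._separate = separate
        self._describe = describe
        self._is_fixed = is_fixed

    def process(self, audio_stream: Any) -> List[Tuple[Any, bool]]:
        results = []
        for signal in self._separate(audio_stream):
            attributes = self._describe(signal)
            # True means the signal is attributed to a fixed sound source.
            results.append((signal, self._is_fixed(attributes)))
        return results


# Stub wiring: the "audio stream" is just a list of labelled signals.
acquiring = AcquiringModule(lambda: ["range hood hum", "user speech"])
processing = ProcessingModule(
    separate=list,
    describe=lambda signal: {"sound_type": signal},
    is_fixed=lambda attrs: attrs["sound_type"] == "range hood hum",
)
results = processing.process(acquiring.acquire())
```

  • Because each function is injected, the two classes could equally be merged into one module or split further, in line with the statement that the division is only a logical function division.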
  • FIG. 12 shows a schematic diagram of another electronic device provided by an embodiment of the present application.
  • the electronic device shown in FIG. 12 includes a processor 21 and a memory 22.
  • the processor 21 is configured to execute instructions stored in the memory 22, so that the electronic device performs the following operations: obtain a first audio stream in a first time period, where the first audio stream includes at least a first sound signal; separate the first sound signal from the first audio stream; determine the first attribute information of the first sound signal; determine whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
  • the processor 21 is one or more CPUs.
  • the CPU is a single-core CPU or a multi-core CPU.
  • the memory 22 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, or optical memory.
  • the code of the operating system is stored in the memory 22.
  • the electronic device further includes a bus 23, and the above-mentioned processor 21 and the memory 22 are connected to each other through the bus 23, and may also be connected to each other in other ways.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A fixed sound source recognition method and device. The method comprises: an electronic device obtains a first audio stream within a first time period, the first audio stream comprising at least a first sound signal (S101); the electronic device separates the first sound signal from the first audio stream; the electronic device determines first attribute information of the first sound signal (S102); the electronic device determines whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library (S103); if so, the electronic device determines that the first sound signal is a sound signal emitted by the fixed sound source (S104). The electronic device matches the first attribute information of the first sound signal against the fixed sound source library, and if the first attribute information matches attribute information of a fixed sound source in the fixed sound source library, the first sound signal is a sound signal emitted by the fixed sound source, so that the electronic device can accurately recognize a fixed sound source in an environment.

Description

Method and device for identifying fixed sound source
Technical field
This application relates to the field of artificial intelligence, and more specifically, to a method and device for identifying a fixed sound source.
Background
With the advancement of technology, intelligent voice recognition functions are widely used in electronic devices. For example, electronic devices such as smart phones, smart speakers, smart TVs, and smart robots are all equipped with intelligent voice recognition functions. At present, when using this type of electronic device, the user needs to issue a voice command in a quiet environment, so that the electronic device can perform the corresponding operation according to the voice command issued by the user.
If there is a noise source in the environment where the user is located, the electronic device receives the noise emitted by the noise source while receiving the voice command input by the user, so that the voice command is disturbed by the noise. This makes it difficult for the electronic device to correctly recognize the true intention behind the voice command input by the user, which reduces the accuracy of the electronic device in recognizing speech.
Therefore, how to identify the noise sources in the environment around the electronic device, so as to prevent the electronic device from being disturbed by environmental noise, has become a technical problem that urgently needs to be solved.
Summary of the invention
The embodiments of the present application provide a method and device for identifying a fixed sound source, so as to identify a fixed sound source in the environment around an electronic device.
In a first aspect, an embodiment of the present application provides a method for identifying a fixed sound source. The method is applied to an electronic device and includes: the electronic device acquires a first audio stream in a first time period, where the first audio stream includes at least a first sound signal; the electronic device separates the first sound signal from the first audio stream; the electronic device determines first attribute information of the first sound signal; the electronic device determines whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, the electronic device determines that the first sound signal is a sound signal emitted by the fixed sound source.
In the first aspect, the electronic device can match the first attribute information of the first sound signal in the first audio stream against the pre-generated fixed sound source library. If the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, the first sound signal is a sound signal emitted by a fixed sound source, so the electronic device can accurately identify the fixed sound sources present in the environment.
In a possible implementation of the first aspect, the electronic device includes a microphone array, and the electronic device acquiring the first audio stream in the first time period includes: the electronic device uses the microphone array to collect, in the first time period, the sound in the environment where the electronic device is located to generate the first audio stream.
In a possible implementation of the first aspect, the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal.
In a possible implementation of the first aspect, the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses the microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
In a possible implementation of the first aspect, the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal.
In a possible implementation of the first aspect, the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses the microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
In a possible implementation of the first aspect, the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal.
In a possible implementation of the first aspect, the electronic device determining the first attribute information of the first sound signal includes: the electronic device uses the microphone array to determine the sounding position of the first sound signal; the electronic device determines the sound type of the first sound signal according to the sound feature of the first sound signal; the electronic device determines the sound content of the first sound signal according to the sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
In a possible implementation of the first aspect, the electronic device determining the sound type of the first sound signal according to the sound feature of the first sound signal includes: the electronic device determines whether a sound type corresponding to the sound feature of the first sound signal exists in a sound event library, where the sound event library includes one or more sound types; when a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, the electronic device determines that sound type as the sound type of the first sound signal; when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, the electronic device sends a first network request to an external server and receives a first response request sent by the external server, where the first network request includes the sound feature of the first sound signal and the first response request includes the sound type corresponding to the sound feature of the first sound signal; or, when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, the electronic device determines whether the number of times the sound feature of the first sound signal appears at a first position is greater than a first threshold, where the first position is the sounding position of the first sound signal, and if the number of times the sound feature of the first sound signal appears at the first position is greater than the first threshold, the electronic device determines that the sound type of the first sound signal is a known sound type.
In a possible implementation of the first aspect, the electronic device can obtain the sound type corresponding to the sound feature of the first sound signal from the sound event library or from the external server.
In a possible implementation of the first aspect, the method further includes: the electronic device acquires a second audio stream in a second time period, where the second audio stream includes at least a second sound signal; the electronic device determines second attribute information of the second sound signal; the electronic device determines whether the second attribute information exists in the fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the second attribute information does not exist in the fixed sound source library, the electronic device stores the second attribute information in the fixed sound source library.
In this way, the electronic device can establish a fixed sound source library and can also continuously update the content of the fixed sound source library.
In a second aspect, an embodiment of the present application provides an electronic device, including a memory and a processor connected to the memory, where the memory is configured to store instructions, and the processor is configured to execute the instructions, so that the computer device performs the following operations: acquire a first audio stream in a first time period, where the first audio stream includes at least a first sound signal; separate the first sound signal from the first audio stream; determine first attribute information of the first sound signal; determine whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
In a possible implementation of the second aspect, the electronic device includes a microphone array; the processor is specifically configured to use the microphone array to collect, in the first time period, the sound in the environment where the electronic device is located to generate the first audio stream.
In a possible implementation of the second aspect, the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal.
In a possible implementation of the second aspect, the processor is specifically configured to: use the microphone array to determine the sounding position of the first sound signal; determine the sound type of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
In a possible implementation of the second aspect, the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal.
In a possible implementation of the second aspect, the processor is specifically configured to: use the microphone array to determine the sounding position of the first sound signal; determine the sound content of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
In a possible implementation of the second aspect, the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal.
In a possible implementation of the second aspect, the processor is specifically configured to: use the microphone array to determine the sounding position of the first sound signal; determine the sound type of the first sound signal according to the sound feature of the first sound signal; determine the sound content of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
In a possible implementation of the second aspect, the processor is specifically configured to: determine whether a sound type corresponding to the sound feature of the first sound signal exists in a sound event library, where the sound event library includes one or more sound types; when a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determine that sound type as the sound type of the first sound signal; when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, send a first network request to an external server and receive a first response request sent by the external server, where the first network request includes the sound feature of the first sound signal and the first response request includes the sound type corresponding to the sound feature of the first sound signal; or, when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determine whether the number of times the sound feature of the first sound signal appears at a first position is greater than a first threshold, where the first position is the sounding position of the first sound signal, and if the number of times the sound feature of the first sound signal appears at the first position is greater than the first threshold, determine that the sound type of the first sound signal is a known sound type.
In a possible implementation of the second aspect, the processor is further configured to: acquire a second audio stream in a second time period, where the second audio stream includes at least a second sound signal; determine second attribute information of the second sound signal; determine whether the second attribute information exists in the fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the second attribute information does not exist in the fixed sound source library, store the second attribute information in the fixed sound source library.
In a third aspect, an embodiment of the present application provides an electronic device, including: an acquiring module, configured to acquire a first audio stream in a first time period, where the first audio stream includes at least a first sound signal; and a processing module, configured to separate the first sound signal from the first audio stream; determine first attribute information of the first sound signal; determine whether the first attribute information matches the attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that is located at the same position and emits a known sound type; and when the first attribute information matches the attribute information of a fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
Description of the Drawings
FIG. 1 is a schematic diagram of a scenario according to an embodiment of this application;
FIG. 2 is a schematic diagram of the orientations of the three sound sources in FIG. 1;
FIG. 3 is a flowchart of a fixed sound source recognition method according to an embodiment of this application;
FIG. 4 is a schematic diagram of another scenario according to an embodiment of this application;
FIG. 5 is a schematic diagram of the orientations of the two sound sources in FIG. 4;
FIG. 6 is a flowchart of another fixed sound source recognition method according to an embodiment of this application;
FIG. 7 is a schematic diagram of still another scenario according to an embodiment of this application;
FIG. 8 is a flowchart of still another fixed sound source recognition method according to an embodiment of this application;
FIG. 9 is a schematic diagram of yet another scenario according to an embodiment of this application;
FIG. 10 is a schematic diagram of the orientation of the one sound source in FIG. 9;
FIG. 11 is a schematic diagram of an electronic device according to an embodiment of this application;
FIG. 12 is a schematic diagram of another electronic device according to an embodiment of this application.
Detailed Description
Refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic diagram of a scenario according to an embodiment of this application, and FIG. 2 is a schematic diagram of the orientations of the three sound sources in FIG. 1. The scenario shown in FIG. 1 includes a smart speaker 100, a range hood 200, an air conditioner 300, and a user 400. The smart speaker 100 shown in FIG. 1 can perform the fixed sound source recognition method provided in the embodiments of this application.
With reference to FIG. 1 and FIG. 2, in a possible scenario, assume that the home of the user 400 contains devices such as the smart speaker 100, the range hood 200, and the air conditioner 300, and that their relative positions are as shown in FIG. 1 and FIG. 2. Assume that the user 400 wants the smart speaker 100 to play the song "ABC". The user 400 issues a voice instruction to the smart speaker 100 to play the song "ABC", and this voice instruction is sound signal C in FIG. 1 and FIG. 2.
Assume that the range hood 200 is operating while the user 400 issues sound signal C to the smart speaker 100. The noise emitted by the range hood 200 reaches the smart speaker 100, and this noise is sound signal A in FIG. 1 and FIG. 2.
Assume that the air conditioner 300 is also operating while the user 400 issues sound signal C to the smart speaker 100. The noise emitted by the air conditioner 300 reaches the smart speaker 100, and this noise is sound signal B in FIG. 1 and FIG. 2.
In this case, the smart speaker 100 can use the fixed sound source recognition method provided in the embodiments of this application to determine which of the received sound signals come from fixed sound sources. After determining that sound signal A and sound signal B come from fixed sound sources, the smart speaker 100 can shield the received sound signal A and sound signal B and recognize only the voice instruction corresponding to the received sound signal C, and then play the song "ABC" for the user.
In the example shown in FIG. 1 and FIG. 2, the smart speaker 100 can use the fixed sound source recognition method provided in the embodiments of this application to accurately identify the fixed sound sources in the environment and correctly recognize the real intention behind the voice instruction input by the user 400, which improves the accuracy with which the smart speaker 100 recognizes voice instructions.
Refer to FIG. 3. FIG. 3 is a flowchart of a fixed sound source recognition method according to an embodiment of this application. The method shown in FIG. 3 can be applied to an electronic device, which may be a device with an intelligent voice recognition function, such as a smartphone, a smart speaker, a smart TV, or a smart robot. The method shown in FIG. 3 includes the following steps S101 to S104.
S101. The electronic device obtains a first audio stream in a first time period, where the first audio stream includes at least a first sound signal.
The first time period is the time period during which the user inputs a voice instruction to the electronic device.
When the electronic device includes a microphone array, the electronic device may use the microphone array to collect, during the first time period, the sound in the environment in which the electronic device is located, to generate the first audio stream.
For example, with reference to FIG. 1 and FIG. 2, assume that the first audio stream includes sound signal A, sound signal B, and sound signal C, and that the first sound signal is sound signal A.
After S101 and before S102, the electronic device further needs to separate the first sound signal from the first audio stream. The separation process includes step A1 and step A2:
Step A1. The electronic device preprocesses the first audio stream to obtain a corrected first audio stream.
The preprocessing includes centering of the variables, whitening, dimensionality reduction by principal component analysis, and temporal filtering. The purpose of the preprocessing is to reduce the noise in the first audio stream.
Step A2. Independent component analysis (ICA) is performed on the corrected first audio stream to obtain the first sound signal.
The independent component analysis separates the first sound signal from the first audio stream so that subsequent steps can process the first sound signal.
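The centering and whitening parts of step A1 can be illustrated with a small sketch. It is not the method of this application, only a minimal example under stated assumptions: exactly two microphone channels, a toy mixture of a sine wave and a square wave, and hypothetical function names (`center`, `covariance`, `whiten_two_channels`). The principal component analysis, temporal filtering, and the ICA of step A2 are not shown.

```python
import math


def center(channel):
    """Centering: subtract the mean so the channel has zero mean."""
    mean = sum(channel) / len(channel)
    return [x - mean for x in channel]


def covariance(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def whiten_two_channels(x1, x2):
    """Center two channels, then rotate and scale them so that they are
    uncorrelated and each has unit variance (whitening)."""
    x1, x2 = center(x1), center(x2)
    a, b, d = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)
    # Rotation angle that diagonalizes the 2x2 covariance matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    # Variances along the two rotated axes (eigenvalues of the covariance matrix).
    lam1 = a * c * c + 2 * b * s * c + d * s * s
    lam2 = a * s * s - 2 * b * s * c + d * c * c
    z1 = [(c * p + s * q) / math.sqrt(lam1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(lam2) for p, q in zip(x1, x2)]
    return z1, z2


# Toy data (assumption): two sources mixed into two microphone channels.
n = 1000
src1 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
src2 = [1.0 if (t // 50) % 2 == 0 else -1.0 for t in range(n)]
mic1 = [p + 0.5 * q for p, q in zip(src1, src2)]
mic2 = [0.3 * p + q for p, q in zip(src1, src2)]

z1, z2 = whiten_two_channels(mic1, mic2)
```

After whitening, the two channels are uncorrelated with unit variance, which is the usual starting point for an ICA algorithm to search for the remaining rotation that separates the sources.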
S102. The electronic device determines first attribute information of the first sound signal.
The electronic device can determine the first attribute information of the first sound signal in multiple ways. Several specific implementations are described below.
In a first implementation, if the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal, determining the first attribute information may include the following steps: the electronic device determines the sounding position of the first sound signal by using the microphone array; the electronic device determines the sound type of the first sound signal based on a sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
In the first implementation, the sounding position of the first sound signal is the position, relative to the electronic device, of the sound source corresponding to the first sound signal. For example, with reference to FIG. 1 and FIG. 2, if the first sound signal is sound signal A, the sounding position of sound signal A is the position of sound source 1, which corresponds to sound signal A, relative to the smart speaker 100. If the position of the smart speaker 100 is taken as the center point, the sounding position of sound source 1 relative to the smart speaker 100 is 140 degrees. Similarly, the sounding position of sound source 2, which corresponds to sound signal B, is 45 degrees, and the sounding position of sound source 3, which corresponds to sound signal C, is 270 degrees.
In the first implementation, the sound feature of the first sound signal includes but is not limited to Mel-frequency cepstral coefficients (MFCC). The sound type of the first sound signal identifies the sound emitted by the sound source corresponding to the first sound signal. For example, with reference to FIG. 1 and FIG. 2, if the first sound signal is sound signal A, the sound type of sound signal A is the sound of the range hood 200.
In the first implementation, the sounding time of the first sound signal is the time at which the electronic device receives the first sound signal. For example, with reference to FIG. 1 and FIG. 2, if the first sound signal is sound signal A, the sounding time of sound signal A is 18:30 on April 10, 2020.
In a second implementation, if the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal, determining the first attribute information may include the following steps: the electronic device determines the sounding position of the first sound signal by using the microphone array; the electronic device determines the sound content of the first sound signal based on the sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
In the second implementation, the sound content of the first sound signal is speech content. For example, with reference to FIG. 1 and FIG. 2, assume that the first sound signal is sound signal C. When the user 400 says "play song ABC" to the smart speaker 100, sound signal C is generated and propagates through the air, and the sound content of sound signal C is the phrase "play song ABC" spoken by the user 400.
In a third implementation, if the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal, determining the first attribute information may include the following steps: the electronic device determines the sounding position of the first sound signal by using the microphone array; the electronic device determines the sound type of the first sound signal based on the sound feature of the first sound signal; the electronic device determines the sound content of the first sound signal based on the sound feature of the first sound signal; and the electronic device determines the sounding time of the first sound signal.
The first attribute information is not limited to the three implementations above; other types of information may also be added to it.
Depending on what the first attribute information contains, different types of scenarios can be recognized.
In the first implementation, the first attribute information includes the sounding position, sound type, and sounding time of the first sound signal. Its application scenario is that the electronic device can identify a sound source that emits only one sound type.
In the second implementation, the first attribute information includes the sounding position, sound content, and sounding time of the first sound signal. Its application scenario is that the electronic device can identify a sound source that emits speech content.
In the third implementation, the first attribute information includes the sounding position, sound type, sound content, and sounding time of the first sound signal. Its application scenario is that the electronic device can identify both a sound source that emits only one sound type and a sound source that emits speech content.
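The three implementations differ only in which fields the attribute information carries, which can be sketched as one record with optional fields. The class name `AttributeInfo`, its field names, and the `implementation` helper are hypothetical; the sounding time used for sound signal C is also an assumption, because the text gives a time only for sound signal A.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AttributeInfo:
    """Attribute information of one separated sound signal (hypothetical layout)."""
    position_deg: float                  # sounding position relative to the device
    sounding_time: datetime              # time at which the device received the signal
    sound_type: Optional[str] = None     # present in implementations 1 and 3
    sound_content: Optional[str] = None  # present in implementations 2 and 3

    def implementation(self) -> int:
        """Return which of the three implementations this record corresponds to."""
        has_type = self.sound_type is not None
        has_content = self.sound_content is not None
        if has_type and has_content:
            return 3
        if has_content:
            return 2
        if has_type:
            return 1
        raise ValueError("attribute information needs a sound type or sound content")


# Sound signal A from FIG. 1 and FIG. 2 (first implementation).
signal_a = AttributeInfo(140.0, datetime(2020, 4, 10, 18, 30),
                         sound_type="sound of range hood 200")
# Sound signal C (second implementation); the time is assumed for illustration.
signal_c = AttributeInfo(270.0, datetime(2020, 4, 10, 18, 30),
                         sound_content="play song ABC")
```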
S103. The electronic device determines whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library.
The fixed sound source library includes attribute information of one or more fixed sound sources. A fixed sound source is a sound source that stays at the same position and emits one known sound type.
For example, with reference to Table 1, FIG. 1, and FIG. 2, Table 1 shows attribute information of multiple fixed sound sources in the fixed sound source library. The attribute information in Table 1 is data that the smart speaker 100 has learned and generated in advance from historical information. The process of generating Table 1 is described in detail in later embodiments.
Figure PCTCN2021092948-appb-000001
Table 1
In the example of Table 1, FIG. 1, and FIG. 2, assume that the first sound signal is sound signal A. The smart speaker 100 determines that the attribute information of sound signal A includes its sounding position, sound type, and sounding time: the sounding position is 140 degrees, the sound type is the sound of the range hood 200, and the sounding time is 18:30 on April 10, 2020.
After obtaining the attribute information of sound signal A, the smart speaker 100 determines whether it matches attribute information of a fixed sound source in the fixed sound source library of Table 1. Table 1 shows that the attribute information of sound signal A matches the attribute information of the fixed sound source numbered 1, which indicates that sound source 1, corresponding to sound signal A, is a fixed sound source and that sound signal A is a sound signal emitted by a fixed sound source.
S104. When the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, the electronic device determines that the first sound signal is a sound signal emitted by the fixed sound source.
In the embodiment shown in FIG. 3, the electronic device matches the first attribute information of the first sound signal in the first audio stream against the pre-generated fixed sound source library. If the first attribute information matches attribute information of a fixed sound source in the library, the first sound signal is a sound signal emitted by a fixed sound source. The electronic device can therefore accurately identify the fixed sound sources in the environment.
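The matching of S103 and S104 can be sketched as a lookup over library entries. The text does not state how closely two attribute records must agree, so the sketch assumes an exact sound type match plus an angular tolerance (`tolerance_deg`, a hypothetical parameter) and ignores the sounding time. The library contents are illustrative; only the entry numbered 1 (140 degrees, range hood) is confirmed by the description of Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FixedSource:
    number: int          # row number in the fixed sound source library
    position_deg: float  # sounding position relative to the device
    sound_type: str


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def match_fixed_source(position_deg: float, sound_type: str,
                       library: List[FixedSource],
                       tolerance_deg: float = 5.0) -> Optional[FixedSource]:
    """S103: return the matching library entry, or None if nothing matches."""
    for entry in library:
        same_type = entry.sound_type == sound_type
        same_place = angular_difference(entry.position_deg, position_deg) <= tolerance_deg
        if same_type and same_place:
            return entry  # S104: the signal comes from this fixed sound source
    return None


# Illustrative library; the second entry is an assumption.
library = [
    FixedSource(1, 140.0, "sound of range hood 200"),
    FixedSource(2, 45.0, "sound of air conditioner 300"),
]

match_a = match_fixed_source(140.0, "sound of range hood 200", library)
match_c = match_fixed_source(270.0, "speech", library)
```

Here `match_a` is the entry numbered 1, so sound signal A would be treated as coming from a fixed sound source, while `match_c` is `None`, so sound signal C would be kept for voice instruction recognition.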
Refer to FIG. 4 and FIG. 5. FIG. 4 is a schematic diagram of another scenario according to an embodiment of this application, and FIG. 5 is a schematic diagram of the orientations of the two sound sources in FIG. 4. The scenario shown in FIG. 4 includes the smart speaker 100, the user 400, and a smart TV 600. The smart speaker 100 shown in FIG. 4 can perform the fixed sound source recognition method provided in the embodiments of this application.
With reference to FIG. 4 and FIG. 5, in a possible scenario, assume that the home of the user 400 contains devices such as the smart speaker 100 and the smart TV 600, and that their relative positions are as shown in FIG. 4 and FIG. 5. Assume that the user 400 wants the smart speaker 100 to play the song "ABC". The user 400 issues a voice instruction to the smart speaker 100 to play the song "ABC", and this voice instruction is sound signal F in FIG. 4 and FIG. 5.
Assume that the smart TV 600 is operating while the user 400 issues sound signal F to the smart speaker 100. The noise emitted by the smart TV 600 reaches the smart speaker 100, and this noise is sound signal E in FIG. 4 and FIG. 5.
In this case, the smart speaker 100 can use the fixed sound source recognition method shown in FIG. 3 to determine which of the received sound signals come from fixed sound sources. After determining that the smart TV 600 is a fixed sound source, the smart speaker 100 can shield the received sound signal E and recognize only the voice instruction corresponding to the received sound signal F, and then play the song "ABC" for the user.
Specifically, assume that the first audio stream includes sound signal E and sound signal F, where the first sound signal is sound signal E, the second sound signal is sound signal F, and the sounding positions of sound signals E and F are 230 degrees and 320 degrees respectively. Taking the first sound signal as an example, the smart speaker 100 determines that the attribute information of sound signal E includes its sounding position, sound content, and sounding time. Assume that the sounding position of sound signal E is 230 degrees, its sound content is "Welcome to the DEF program", and its sounding time is 18:30.
With reference to Table 2, FIG. 4, and FIG. 5, Table 2 shows attribute information of multiple fixed sound sources in the fixed sound source library. The attribute information in Table 2 is data that the smart speaker 100 has learned and generated in advance from historical information.
Figure PCTCN2021092948-appb-000002
Table 2
In the example of Table 2, FIG. 4, and FIG. 5, after obtaining the attribute information of sound signal E, the smart speaker 100 determines whether it matches attribute information of a fixed sound source in the fixed sound source library shown in Table 2. Table 2 shows that the attribute information of sound signal E matches the attribute information of the fixed sound source numbered 1, which indicates that sound source 5, corresponding to sound signal E, is a fixed sound source and that sound signal E is a sound signal emitted by a fixed sound source.
Refer to FIG. 6. FIG. 6 is a flowchart of another fixed sound source recognition method according to an embodiment of this application. The method shown in FIG. 6 refines S102 in FIG. 3, specifically the step in which "the electronic device determines the sound type of the first sound signal based on a sound feature of the first sound signal". The method shown in FIG. 6 includes the following steps S201 to S203.
S201. The electronic device determines whether a sound event library contains a sound type corresponding to the sound feature of the first sound signal. If it does, step S202 is performed; if it does not, step S203 is performed.
The sound event library includes one or more sound types, all of which are preset.
For example, with reference to Table 3, FIG. 1, and FIG. 2, Table 3 shows the correspondence between sound features and sound types in the sound event library.
Sound feature      Sound type
Sound feature X    Sound of the range hood 200
Sound feature Y    Sound of the air conditioner 300
Table 3
In the example of Table 3, FIG. 1, and FIG. 2, assume that the first sound signal is sound signal A and that the smart speaker 100 has determined sound feature X of sound signal A. The smart speaker 100 then determines whether the sound event library shown in Table 3 contains a sound type corresponding to sound feature X. Table 3 shows that the sound type corresponding to sound feature X is the sound of the range hood 200. The smart speaker 100 therefore determines that the sound type corresponding to sound feature X of sound signal A is the sound of the range hood 200.
S202. The electronic device determines the sound type corresponding to the sound feature of the first sound signal as the sound type of the first sound signal.
S203. The electronic device sends a first network request to an external server and receives a first response request sent by the external server.
The first network request includes the sound feature of the first sound signal, and the first response request includes the sound type corresponding to the sound feature of the first sound signal.
For example, refer to FIG. 1 and FIG. 7. FIG. 7 is a schematic diagram of still another scenario according to an embodiment of this application, in which the smart speaker 100 can connect to an external server 1000 over the Internet. Assume that the sound event library of the smart speaker 100 contains no sound type corresponding to sound feature X of sound signal A. The smart speaker 100 then sends a first network request to the server 1000, and the first network request includes sound feature X of sound signal A. After receiving the first network request, the server 1000 finds in cloud storage that the sound type corresponding to sound feature X is the sound of the range hood 200, and sends a first response request to the smart speaker 100. The first response request indicates that the sound type corresponding to sound feature X of sound signal A is the sound of the range hood 200. After receiving the first response request from the server 1000, the smart speaker 100 knows that the sound type corresponding to sound feature X of sound signal A is the sound of the range hood 200.
In the embodiment shown in FIG. 6, the electronic device can obtain the sound type corresponding to the sound feature of the first sound signal from either the sound event library or the external server.
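Steps S201 to S203 amount to a local lookup with a remote fallback, which can be sketched as follows. The sketch simplifies a sound feature to a string key (a real MFCC feature would need a similarity comparison rather than an exact lookup), and `query_server` is a hypothetical stand-in for the first network request and first response request; no real network API is implied.

```python
from typing import Callable, Dict, List, Optional


def determine_sound_type(feature: str,
                         sound_event_library: Dict[str, str],
                         query_server: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the sound type for a sound feature, preferring the local library."""
    # S201: check whether the sound event library has a matching sound type.
    local_type = sound_event_library.get(feature)
    if local_type is not None:
        # S202: use the locally stored sound type.
        return local_type
    # S203: ask the external server (first network request / first response request).
    return query_server(feature)


# Local library lacking sound feature X, mirroring the FIG. 7 example.
sound_event_library = {"sound feature Y": "sound of air conditioner 300"}

server_calls: List[str] = []


def fake_server(feature: str) -> Optional[str]:
    """Stand-in for server 1000; records each request it receives."""
    server_calls.append(feature)
    cloud_storage = {"sound feature X": "sound of range hood 200"}
    return cloud_storage.get(feature)


type_y = determine_sound_type("sound feature Y", sound_event_library, fake_server)
type_x = determine_sound_type("sound feature X", sound_event_library, fake_server)
```

Only the lookup for sound feature X reaches the stand-in server, because sound feature Y is resolved locally in S202.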
The embodiment shown in FIG. 6 may further include step S204 (not shown in FIG. 6), which can replace S203 to form another implementation. S204 may include the following steps: the electronic device determines whether the number of times the sound feature of the first sound signal has occurred at a first position is greater than a first threshold, where the first position is the sounding position of the first sound signal; and if the number of times is greater than the first threshold, the electronic device determines that the sound type of the first sound signal is a known sound type. In addition, after determining that the sound type of the first sound signal is a known sound type, the electronic device may store the sound feature of the first sound signal and the known sound type in the sound event library.
In S204, the first position is the position, relative to the electronic device, of the sound source corresponding to the first sound signal, that is, the sounding position of the first sound signal. The first threshold is a preset count; for example, the first threshold may be preset to 3. A known sound type is a sound type whose specific category has not been determined but which belongs to a fixed sound source.
If the number of times the sound feature of the first sound signal has occurred at the first position is greater than the first threshold, the sound source corresponding to the first sound signal is a fixed sound source. Because the sound type corresponding to that sound feature is not stored in the sound event library, the electronic device can determine the sound type of the first sound signal as a known sound type.
For example, the electronic device may determine the sound type of the first sound signal as known sound type A. Although the electronic device does not know which specific sound type known sound type A is, it can determine that known sound type A is a sound type that it frequently receives.
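The counting rule of S204 can be sketched as a small tracker. The class and method names are hypothetical, the sound feature is again simplified to a string key, and positions are compared for exact equality, which a real device would likely relax. The labeling scheme ("known sound type A", "B", and so on) follows the example in the text but is otherwise an assumption.

```python
from collections import Counter
import string


class KnownTypeLearner:
    """Counts how often a sound feature occurs at a position (S204 sketch)."""

    def __init__(self, first_threshold: int = 3):
        self.first_threshold = first_threshold
        self.occurrences = Counter()   # (feature, position) -> number of occurrences
        self.sound_event_library = {}  # feature -> sound type label

    def observe(self, feature: str, position_deg: int):
        """Record one occurrence and return the sound type, or None if still unknown."""
        if feature in self.sound_event_library:
            return self.sound_event_library[feature]
        key = (feature, position_deg)
        self.occurrences[key] += 1
        # The text requires the count to be greater than the first threshold.
        if self.occurrences[key] > self.first_threshold:
            letter = string.ascii_uppercase[len(self.sound_event_library) % 26]
            label = f"known sound type {letter}"
            # Store the feature together with the known sound type.
            self.sound_event_library[feature] = label
            return label
        return None


learner = KnownTypeLearner(first_threshold=3)
results = [learner.observe("sound feature Q", 180) for _ in range(5)]
```

With a first threshold of 3, the first three observations return `None`, and the fourth is the first to be labeled as a known sound type.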
Refer to FIG. 8. FIG. 8 is a flowchart of still another fixed sound source recognition method according to an embodiment of this application. The method shown in FIG. 8 includes the following steps S301 to S304.
S301. The electronic device obtains a second audio stream in a second time period, where the second audio stream includes at least a second sound signal.
The second time period is a time period during which the user does not input a voice instruction to the electronic device. When the user is not inputting a voice instruction, the electronic device obtains, in real time, the sound signals emitted by fixed sound sources in the surrounding environment. The second audio stream is the audio stream that the electronic device generates by using the microphone array to collect the sound in its environment during the second time period.
For example, refer to FIG. 9 and FIG. 10. FIG. 9 is a schematic diagram of yet another scenario according to an embodiment of this application, and FIG. 10 is a schematic diagram of the orientation of the one sound source in FIG. 9. The scenario shown in FIG. 9 includes the smart speaker 100 and a water dispenser 500. The smart speaker 100 obtains the second audio stream within a period of time. The second audio stream includes only sound signal D, which is the sound signal emitted by the water dispenser 500.
S302. The electronic device determines second attribute information of the second sound signal.
S302 in FIG. 8 is performed in the same way as S102 in FIG. 3. For details of S302, refer to the detailed description of S102 in FIG. 3.
S303. The electronic device determines whether the second attribute information exists in the fixed sound source library.
The fixed sound source library includes attribute information corresponding to one or more fixed sound sources, where a fixed sound source is a sound source that stays at the same position and emits one known sound type.
If the second attribute information exists in the fixed sound source library, the attribute information of the sound source corresponding to the second sound signal has already been stored in the library. If the second attribute information does not exist in the fixed sound source library, the attribute information of that sound source has not yet been stored; that is, the sound source corresponding to the second sound signal is a new fixed sound source for the electronic device.
S304. When the second attribute information does not exist in the fixed sound source library, the electronic device stores the second attribute information in the fixed sound source library.
For example, with reference to Table 1, FIG. 9, and FIG. 10, assume that the second sound signal is sound signal D. The smart speaker 100 determines that the attribute information of sound signal D includes its sounding position, sound type, and sounding time. The sounding position of sound signal D is 180 degrees, its sound type is the sound of the water dispenser 500, and its sounding time is 18:10 to 18:13 on April 10, 2020.
After the smart speaker 100 obtains the attribute information of sound signal D, it determines whether that attribute information exists in the fixed sound source library. Table 1 shows that the attribute information of sound signal D is not in the fixed sound source library, so the smart speaker 100 stores it in the library.
Referring to Table 4, Table 4 shows the fixed sound source library of Table 1 after the attribute information of sound signal D has been stored.
Figure PCTCN2021092948-appb-000003
Table 4
In the embodiment shown in FIG. 8, the electronic device can build the fixed sound source library and can also keep updating its contents. The method shown in FIG. 8 can be used to build the fixed sound source library shown in Table 1 or Table 4.
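Steps S303 and S304 can be sketched as a small in-memory library. This is only an illustration under stated assumptions: the names `AttributeInfo` and `FixedSoundSourceLibrary`, the field types, and the use of exact equality to decide whether attribute information "exists" in the library are all hypothetical; the application does not specify the storage format or the comparison rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical record of one sound signal's attribute information."""
    position_deg: int     # sounding position, e.g. 180
    sound_type: str       # e.g. "water dispenser"
    sounding_time: str    # e.g. "2020-04-10 18:10-18:13"


class FixedSoundSourceLibrary:
    """Hypothetical in-memory fixed sound source library."""

    def __init__(self) -> None:
        self._entries: list[AttributeInfo] = []

    def __len__(self) -> int:
        return len(self._entries)

    def contains(self, info: AttributeInfo) -> bool:
        # S303: check whether the attribute information is already stored.
        # Exact equality is an assumption made for this sketch.
        return info in self._entries

    def update(self, info: AttributeInfo) -> bool:
        # S304: store the attribute information only when it is not present.
        # Returns True when a new fixed sound source was added.
        if self.contains(info):
            return False
        self._entries.append(info)
        return True
```

Using the example of sound signal D, `update(AttributeInfo(180, "water dispenser", "2020-04-10 18:10-18:13"))` adds an entry on the first call and leaves the library unchanged on a repeated call.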
Referring to FIG. 11, FIG. 11 is a schematic diagram of an electronic device provided by an embodiment of this application. The electronic device shown in FIG. 11 includes the following modules:
An acquisition module 11, configured to acquire a first audio stream within a first time period, where the first audio stream includes at least a first sound signal.
A processing module 12, configured to: separate the first sound signal from the first audio stream; determine first attribute information of the first sound signal; determine whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that stays at the same position and emits one known sound type; and, when the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
For additional functions that the acquisition module 11 and the processing module 12 can implement, and for more details on the functions above, refer to the descriptions in the foregoing method embodiments. They are not repeated here.
The apparatus embodiment described in FIG. 11 is merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. The functional modules in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module.
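The logical split between the acquisition module and the processing module can be sketched as two small classes. This is a hedged illustration only: the class names, the `Attr` record, the injected callables `capture`, `separate`, and `describe`, and the rule that a match means equal position and sound type are all assumptions introduced for the example. The application leaves source separation, attribute extraction, and the matching rule to the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Attr:
    """Hypothetical attribute information used for matching."""
    position_deg: int
    sound_type: str


class AcquisitionModule:
    """Acquires an audio stream; `capture` stands in for the microphone array."""

    def __init__(self, capture: Callable[[], list]) -> None:
        self._capture = capture

    def acquire(self) -> list:
        return self._capture()


class ProcessingModule:
    """Separates signals, determines attributes, and matches them to the library."""

    def __init__(
        self,
        separate: Callable[[list], list],   # audio stream -> list of sound signals
        describe: Callable[[list], Attr],   # sound signal -> attribute information
        library: set,                       # attribute information of fixed sources
    ) -> None:
        self._separate = separate
        self._describe = describe
        self._library = library

    def recognize(self, stream: list) -> list:
        # For each separated signal, report (attributes, is_fixed_source).
        results = []
        for signal in self._separate(stream):
            attr = self._describe(signal)
            results.append((attr, attr in self._library))
        return results
```

Passing the separation and attribute steps in as callables keeps the sketch independent of any particular signal-processing library, which mirrors the remark that the module division is logical rather than physical.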
Referring to FIG. 12, FIG. 12 is a schematic diagram of yet another electronic device provided by an embodiment of this application. The electronic device shown in FIG. 12 includes a processor 21 and a memory 22.
In the embodiment shown in FIG. 12, the processor 21 is configured to execute instructions stored in the memory 22, so that the electronic device performs the following operations: acquiring a first audio stream within a first time period, where the first audio stream includes at least a first sound signal; separating the first sound signal from the first audio stream; determining first attribute information of the first sound signal; determining whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library, where the fixed sound source library includes attribute information corresponding to one or more fixed sound sources, and a fixed sound source is a sound source that stays at the same position and emits one known sound type; and, when the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, determining that the first sound signal is a sound signal emitted by the fixed sound source.
The processor 21 is one or more CPUs. Optionally, each CPU is a single-core CPU or a multi-core CPU.
The memory 22 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM, or flash memory), or optical memory. The memory 22 stores the code of an operating system.
Optionally, the electronic device further includes a bus 23. The processor 21 and the memory 22 are connected to each other through the bus 23, or may be connected in other ways.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be read with reference to one another, and each embodiment focuses on how it differs from the others. In particular, because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for the related parts, refer to the corresponding description of the method embodiments.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its scope. Thus, if these modifications and variations of this application fall within the scope of the claims of the present invention, the present invention is also intended to cover them.

Claims (21)

  1. A fixed sound source recognition method, wherein the method is applied to an electronic device and comprises:
    acquiring, by the electronic device, a first audio stream within a first time period, wherein the first audio stream comprises at least a first sound signal;
    separating, by the electronic device, the first sound signal from the first audio stream;
    determining, by the electronic device, first attribute information of the first sound signal;
    determining, by the electronic device, whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library, wherein the fixed sound source library comprises attribute information corresponding to one or more fixed sound sources, and the fixed sound source is a sound source that is located at the same position and emits one known sound type; and
    when the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, determining that the first sound signal is a sound signal emitted by the fixed sound source.
  2. The fixed sound source recognition method according to claim 1, wherein the electronic device comprises a microphone array, and the acquiring, by the electronic device, of the first audio stream within the first time period comprises:
    collecting, by the electronic device by using the microphone array, sound in the environment where the electronic device is located within the first time period to generate the first audio stream.
  3. The fixed sound source recognition method according to claim 2, wherein the first attribute information comprises a sounding position, a sound type, and a sounding time of the first sound signal.
  4. The fixed sound source recognition method according to claim 3, wherein the determining, by the electronic device, of the first attribute information of the first sound signal comprises:
    determining, by the electronic device by using the microphone array, the sounding position of the first sound signal;
    determining, by the electronic device, the sound type of the first sound signal according to a sound feature of the first sound signal; and
    determining, by the electronic device, the sounding time of the first sound signal.
  5. The fixed sound source recognition method according to claim 2, wherein the first attribute information comprises a sounding position, sound content, and a sounding time of the first sound signal.
  6. The fixed sound source recognition method according to claim 5, wherein the determining, by the electronic device, of the first attribute information of the first sound signal comprises:
    determining, by the electronic device by using the microphone array, the sounding position of the first sound signal;
    determining, by the electronic device, the sound content of the first sound signal according to a sound feature of the first sound signal; and
    determining, by the electronic device, the sounding time of the first sound signal.
  7. The fixed sound source recognition method according to claim 2, wherein the first attribute information comprises a sounding position, a sound type, sound content, and a sounding time of the first sound signal.
  8. The fixed sound source recognition method according to claim 7, wherein the determining, by the electronic device, of the first attribute information of the first sound signal comprises:
    determining, by the electronic device by using the microphone array, the sounding position of the first sound signal;
    determining, by the electronic device, the sound type of the first sound signal according to a sound feature of the first sound signal;
    determining, by the electronic device, the sound content of the first sound signal according to the sound feature of the first sound signal; and
    determining, by the electronic device, the sounding time of the first sound signal.
  9. The fixed sound source recognition method according to claim 4 or 8, wherein the determining, by the electronic device, of the sound type of the first sound signal according to the sound feature of the first sound signal comprises:
    determining, by the electronic device, whether a sound type corresponding to the sound feature of the first sound signal exists in a sound event library, wherein the sound event library comprises one or more sound types;
    when a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determining the sound type corresponding to the sound feature of the first sound signal as the sound type of the first sound signal;
    when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, sending, by the electronic device, a first network request to an external server, and receiving, by the electronic device, a first response request sent by the external server, wherein the first network request comprises the sound feature of the first sound signal, and the first response request comprises the sound type corresponding to the sound feature of the first sound signal; or
    when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determining, by the electronic device, whether the number of times the sound feature of the first sound signal has appeared at a first position is greater than a first threshold, wherein the first position is the sounding position of the first sound signal, and, if the number of times the sound feature of the first sound signal has appeared at the first position is greater than the first threshold, determining that the sound type of the first sound signal is a known sound type.
  10. The fixed sound source recognition method according to any one of claims 1 to 8, wherein the method further comprises:
    acquiring, by the electronic device, a second audio stream within a second time period, wherein the second audio stream comprises at least a second sound signal;
    determining, by the electronic device, second attribute information of the second sound signal;
    determining, by the electronic device, whether the second attribute information exists in the fixed sound source library, wherein the fixed sound source library comprises attribute information corresponding to one or more fixed sound sources, and the fixed sound source is a sound source that is located at the same position and emits one known sound type; and
    when the second attribute information does not exist in the fixed sound source library, storing the second attribute information in the fixed sound source library.
  11. An electronic device, comprising a memory and a processor connected to the memory, wherein the memory is configured to store instructions;
    the processor is configured to execute the instructions, so that the computer device performs the following operations:
    acquiring a first audio stream within a first time period, wherein the first audio stream comprises at least a first sound signal; separating the first sound signal from the first audio stream; determining first attribute information of the first sound signal; determining whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library, wherein the fixed sound source library comprises attribute information corresponding to one or more fixed sound sources, and the fixed sound source is a sound source that is located at the same position and emits one known sound type; and, when the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, determining that the first sound signal is a sound signal emitted by the fixed sound source.
  12. The electronic device according to claim 11, wherein the electronic device comprises a microphone array;
    the processor is specifically configured to use the microphone array to collect sound in the environment where the electronic device is located within the first time period to generate the first audio stream.
  13. The electronic device according to claim 12, wherein the first attribute information comprises a sounding position, a sound type, and a sounding time of the first sound signal.
  14. The electronic device according to claim 13, wherein:
    the processor is specifically configured to: determine the sounding position of the first sound signal by using the microphone array; determine the sound type of the first sound signal according to a sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  15. The electronic device according to claim 12, wherein the first attribute information comprises a sounding position, sound content, and a sounding time of the first sound signal.
  16. The electronic device according to claim 15, wherein:
    the processor is specifically configured to: determine the sounding position of the first sound signal by using the microphone array; determine the sound content of the first sound signal according to a sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  17. The electronic device according to claim 12, wherein the first attribute information comprises a sounding position, a sound type, sound content, and a sounding time of the first sound signal.
  18. The electronic device according to claim 17, wherein:
    the processor is specifically configured to: determine the sounding position of the first sound signal by using the microphone array; determine the sound type of the first sound signal according to a sound feature of the first sound signal; determine the sound content of the first sound signal according to the sound feature of the first sound signal; and determine the sounding time of the first sound signal.
  19. The electronic device according to claim 14 or 18, wherein:
    the processor is specifically configured to: determine whether a sound type corresponding to the sound feature of the first sound signal exists in a sound event library, wherein the sound event library comprises one or more sound types; when a sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determine the sound type corresponding to the sound feature of the first sound signal as the sound type of the first sound signal; when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, send a first network request to an external server and receive a first response request sent by the external server, wherein the first network request comprises the sound feature of the first sound signal, and the first response request comprises the sound type corresponding to the sound feature of the first sound signal; or, when no sound type corresponding to the sound feature of the first sound signal exists in the sound event library, determine whether the number of times the sound feature of the first sound signal has appeared at a first position is greater than a first threshold, wherein the first position is the sounding position of the first sound signal, and, if the number of times the sound feature of the first sound signal has appeared at the first position is greater than the first threshold, determine that the sound type of the first sound signal is a known sound type.
  20. The electronic device according to any one of claims 11 to 18, wherein:
    the processor is further configured to: acquire a second audio stream within a second time period, wherein the second audio stream comprises at least a second sound signal; determine second attribute information of the second sound signal; determine whether the second attribute information exists in the fixed sound source library, wherein the fixed sound source library comprises attribute information corresponding to one or more fixed sound sources, and the fixed sound source is a sound source that is located at the same position and emits one known sound type; and, when the second attribute information does not exist in the fixed sound source library, store the second attribute information in the fixed sound source library.
  21. An electronic device, comprising:
    an acquisition module, configured to acquire a first audio stream within a first time period, wherein the first audio stream comprises at least a first sound signal; and
    a processing module, configured to: separate the first sound signal from the first audio stream; determine first attribute information of the first sound signal; determine whether the first attribute information matches attribute information of a fixed sound source in a fixed sound source library, wherein the fixed sound source library comprises attribute information corresponding to one or more fixed sound sources, and the fixed sound source is a sound source that is located at the same position and emits one known sound type; and, when the first attribute information matches the attribute information of the fixed sound source in the fixed sound source library, determine that the first sound signal is a sound signal emitted by the fixed sound source.
PCT/CN2021/092948 2020-05-14 2021-05-11 Fixed sound source recognition method and apparatus WO2021228059A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010404799.6 2020-05-14
CN202010404799 2020-05-14
CN202011399173.7 2020-12-04
CN202011399173.7A CN113674759A (en) 2020-05-14 2020-12-04 Fixed sound source identification method and device

Publications (1)

Publication Number Publication Date
WO2021228059A1 true WO2021228059A1 (en) 2021-11-18

Family

ID=78525311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092948 WO2021228059A1 (en) 2020-05-14 2021-05-11 Fixed sound source recognition method and apparatus

Country Status (1)

Country Link
WO (1) WO2021228059A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21804094

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21804094

Country of ref document: EP

Kind code of ref document: A1