CN110136722A

CN110136722A - Audio signal processing method, device, equipment and system

Info

Publication number: CN110136722A
Application number: CN201910281493.3A
Authority: CN
Inventors: 李赛; 娄晓磊; 王重乐
Original assignee: Beijing Xiaoniao Tingting Technology Co Ltd
Current assignee: Beijing Xiaoniao Tingting Technology Co Ltd
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2019-08-16

Abstract

The present invention relates to a speech signal processing method, a speech signal processing apparatus, an electronic device and a speech signal processing system. In the method, each device in a device group made up of multiple devices: receives a current speech signal; decides whether it needs to respond to the speech signal, obtaining a decision result; and, when the decision result is that the speech signal needs to be responded to, responds to the speech signal.

Description

Audio signal processing method, device, equipment and system

Technical field

The present invention relates to the field of speech recognition, and more particularly to a speech signal processing method, a speech signal processing apparatus, an electronic device and a speech signal processing system.

Background art

With the development of speech recognition technology, more and more electronic devices support voice interaction, making products more intelligent and convenient. In smart speaker products, for example, the user can wake up the device, control music playback, check the weather and so on by voice.

Because of the inherent nature of speech signals, the same speech signal can be picked up by multiple electronic devices and responded to by each of them, which easily confuses the user. For example, when several devices share the same wake-up word and the user wakes them by voice, the devices may respond at different times and even with different content, so the responses the user receives are chaotic and the user experience suffers.

Summary of the invention

One object of embodiments of the present invention is to provide a new technical solution for speech signal processing.

According to a first aspect of the present invention, a speech signal processing method is provided, characterized in that, for each device in a device group made up of multiple devices, the method comprises:

receiving a current speech signal;

deciding whether the speech signal needs to be responded to, to obtain a decision result; and

responding to the speech signal when the decision result is that the speech signal needs to be responded to.

Optionally, deciding whether the speech signal needs to be responded to comprises:

obtaining a setting index of the speech signal as received by the device itself;

obtaining the setting indices of the speech signal as received by the other devices in the device group; and

deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the speech signal needs to be responded to.

Optionally, obtaining the setting indices of the speech signal as received by the other devices in the device group comprises:

receiving the setting indices sent by the other devices within a preset time period.

Optionally, the setting index includes at least one of: the moment at which the speech signal is received, and the intensity with which the speech signal is received.

Optionally, the setting index is the moment at which the speech signal is received, and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the speech signal needs to be responded to comprises:

determining that the decision result is that the speech signal needs to be responded to when the device itself received the speech signal earliest.

Optionally, the setting index is the intensity with which the speech signal is received, and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the speech signal needs to be responded to comprises:

determining that the decision result is that the speech signal needs to be responded to when the intensity of the speech signal received by the device itself is the greatest.

Optionally, the setting index includes both the moment at which the speech signal is received and the intensity with which it is received, and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the speech signal needs to be responded to comprises:

determining a composite index of the speech signal according to the moment at which, and the intensity with which, each device received the speech signal; and

determining that the decision result is that the speech signal needs to be responded to when the composite index of the speech signal received by the device itself is optimal.

Optionally, the method further comprises:

when the decision result is that the speech signal does not need to be responded to, not responding to the speech signal, and setting the device itself to no longer receive or respond to subsequent speech signals.

Optionally, deciding whether the speech signal needs to be responded to, to obtain the decision result, comprises:

obtaining the setting indices of the received current speech signal as a current index;

obtaining the setting indices of a received previous speech signal as a reference index;

comparing the current index with the reference index to obtain a comparison result; and

when the comparison result meets a setting condition, using the decision result for the previous speech signal as the decision result for the current speech signal.

Optionally, the setting condition is determined as follows:

obtaining the setting indices of multiple speech signals as historical data; and

determining the setting condition according to the historical data.

Optionally, deciding whether the speech signal needs to be responded to comprises:

determining whether the device itself is the master device in the device group; and

when the device itself is determined to be the master device, determining that the decision result is that the speech signal needs to be responded to;

wherein the master device is the device in the device group that pushes audio data to the other devices.

According to a second aspect of the present invention, a speech signal processing apparatus is further provided. The speech signal processing apparatus is provided in each device of a device group made up of multiple devices and comprises:

a receiving module, configured to receive a current speech signal;

a decision module, configured to decide whether the speech signal needs to be responded to, to obtain a decision result; and

a response module, configured to respond to the speech signal when the decision result is that the speech signal needs to be responded to.

According to a third aspect of the present invention, an electronic device is further provided, comprising the speech signal processing apparatus according to the second aspect of the present invention; alternatively, the electronic device comprises:

a memory, configured to store executable commands; and

a processor, configured to perform, under control of the executable commands, any method according to the first aspect of the present invention.

According to a fourth aspect of the present invention, a speech signal processing system is further provided, comprising a plurality of electronic devices according to the third aspect of the present invention, wherein, for the same speech signal, each electronic device performs any method according to the first aspect of the present invention.

Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

One beneficial effect of the present invention is that, in the speech signal processing method provided by this embodiment, any device in the device group decides, after receiving a speech signal, whether it itself needs to respond, and responds to the speech signal according to the decision result. This avoids chaotic responses from multiple devices to the same speech signal, improves the user experience, and makes speech devices more intelligent and convenient.

In addition, in the speech signal processing method provided by this embodiment, each speech device makes its response decision for a speech signal by itself, and the decision processes of the devices in the group are relatively independent. A decision failure in some of the devices therefore does not greatly affect the response performance of the device group, so the interaction of the method is stable and highly reliable.

In addition, in this embodiment the setting indices of the speech signal can be exchanged within a local network without communicating with a server, which avoids interaction lag caused by network latency.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.

Fig. 1 is a schematic diagram of a speech device that can be used to implement embodiments of the present invention.

Fig. 2 is a schematic diagram of an application scenario of the speech signal processing method provided by an embodiment of the present invention.

Fig. 3 is a flowchart of the speech signal processing method provided by Embodiment 1 of the present invention.

Fig. 4 is a schematic diagram of the speech signal processing apparatus provided by Embodiment 5 of the present invention.

Fig. 5 is a schematic diagram of the electronic device provided by Embodiment 6 of the present invention.

Detailed description of embodiments

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, such techniques, methods and devices should be considered part of the specification.

In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be discussed further in subsequent drawings.

Fig. 1 is a schematic diagram of a speech device that can be used to implement embodiments of the present invention. The speech device can, for example, recognize speech signals and respond to them.

As shown in Fig. 1, the speech device 1000 includes a processor 1010, a memory 1020, a communication device 1030, a display device 1040, a microphone 1050 and a loudspeaker 1060.

The processor 1010 is, for example, a central processing unit (CPU) or a microcontroller unit (MCU). The memory 1020 includes, for example, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The communication device 1030 can, for example, perform wired or wireless communication. The display device 1040 can, for example, be used to display information about the music being played, and is, for example, a liquid crystal display. The microphone 1050 can, for example, be used to receive speech signals, and is, for example, a dynamic microphone, a condenser microphone or a piezoelectric microphone. The loudspeaker 1060 can, for example, be used to play sound, and is, for example, a dynamic loudspeaker, an electromagnetic loudspeaker, an electrostatic loudspeaker or a piezoelectric loudspeaker.

The speech device 1000 shown in Fig. 1 is merely illustrative and is in no way intended to limit the present invention or its application or uses.

The speech devices in Fig. 2 include speech device 210, speech device 220 and speech device 230. The configuration of these speech devices is, for example, the same as that of speech device 1000 in Fig. 1.

The multiple speech devices in Fig. 2 can form a device group. The device group is, for example, a set of speech devices that can all respond to the same speech signal and can communicate with each other; as another example, it is an audio group formed by multiple speakers, within which the speakers can play an audio signal, such as streamed music, synchronously. When the speech devices communicate with each other, communication between any two devices may be direct or may go through another device such as a router.

As shown in Fig. 2, the user produces a speech signal by speaking. Because sound propagates outward as a sound wave, the speech signal can be received by several speech devices, which easily leads to chaotic responses from multiple devices. To address this, this embodiment provides a speech signal processing method that can be applied to the scenario shown in Fig. 2.

The speech signal processing method provided by this embodiment is implemented by each speech device in the device group, for example simultaneously by each of speech device 210, speech device 220 and speech device 230 in Fig. 2. As shown in Fig. 3, the method includes the following steps S3100-S3300:

Step S3100: receive a current speech signal.

For example, speech device 210 in Fig. 2 receives the current speech signal. The current speech signal is, for example, a wake-up speech signal used to wake the speech device: a short utterance made up of a few syllables, characters or words that wakes the device so that it can go on to receive voice commands from the user. The current speech signal may also be a command issued by the user such as "turn up the volume", "check the weather" or "set an alarm". Speech device 210 can, for example, receive the speech signal through its microphone, which converts the speech signal from a sound wave into an electrical signal.

After receiving the speech signal, the speech device performs the following step S3200:

Step S3200: decide whether to respond to the speech signal, to obtain a decision result.

For example, after speech device 210 in Fig. 2 receives the current speech signal, it decides whether it itself needs to respond to the speech signal and obtains a decision result.

In one example, step S3200 includes the following steps S3210-S3230:

Step S3210: the speech device obtains the setting index of the speech signal as received by itself.

For example, speech device 210 in Fig. 2 determines, from the speech signal it received, the setting index of that received speech signal. The setting index is, for example, the moment at which the speech signal is received; as another example, it is the intensity with which the speech signal is received; it may also include both the moment and the intensity.

To determine the moment at which a speech device receives the speech signal, the clocks of the speech devices can be synchronized in advance, and each speech device then records the moment at which it received the speech signal, ensuring that the times recorded by different devices are comparable.
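The source does not specify how the clocks are synchronized. The sketch below assumes an NTP-style request/reply exchange with one reference device in the group, using the standard four-timestamp offset estimate; clock_offset and to_group_time are hypothetical names, not part of the described method.

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate how far the reference clock is ahead of the local clock.

    t0: local time when the sync request was sent
    t1: reference time when the request arrived
    t2: reference time when the reply was sent
    t3: local time when the reply arrived
    Assumption: NTP-style exchange; network delay is roughly symmetric.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def to_group_time(local_timestamp: float, offset: float) -> float:
    """Map a local receive timestamp onto the shared (reference) clock."""
    return local_timestamp + offset


# Hypothetical exchange: the local clock runs 5 s behind, one-way delay 0.1 s.
offset = clock_offset(t0=100.0, t1=105.1, t2=105.2, t3=100.3)
print(round(offset, 3))                        # 5.0
print(round(to_group_time(200.0, offset), 3))  # 205.0
```

Once each device has converted its receive moment with its own offset, the moments reported by different devices can be compared directly in step S3230.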

To determine the intensity with which a speech device receives the speech signal, the speech device can measure indicators such as the sound volume or loudness of the speech signal it received to characterize the intensity of the speech signal.
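As one possible intensity measure, the sketch below computes the RMS level of a frame of microphone samples in dB relative to full scale. It assumes the samples are floats normalized to the range -1.0 to 1.0; the source only names volume or loudness, so both the measure and the function name rms_level_dbfs are assumptions.

```python
import math
from typing import Sequence


def rms_level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples (-1.0..1.0) in dB relative to full scale.

    Assumption: this stands in for the "volume or loudness" named in the text.
    """
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10*log10(mean square) equals 20*log10(RMS)
    return 10.0 * math.log10(mean_square)
```

For the comparison in step S3230 the exact measure matters less than consistency: every device in the group should compute the same measure, and the values are only comparable if the devices' microphone gains are comparable.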

Step S3220: obtain the setting indices of the speech signal as received by the other devices in the device group.

Taking the speech devices in Fig. 2 as an example, while speech device 210 obtains the setting index of the speech signal it received, speech devices 220 and 230 each obtain their own setting index in the same way. Speech devices 220 and 230 then send their setting indices to speech device 210, so that speech device 210 obtains the setting indices of the speech signal as received by speech devices 220 and 230.

In one example, speech device 210 receives the setting indices sent by the other devices within a set time period. The set time period starts, for example, at the moment speech device 210 receives the speech signal or, as another example, at the earliest moment at which speech device 210 receives a setting index sent by another device. Speech device 210 no longer accepts setting indices from other devices that arrive after the set time period. Choosing a suitable length for the set time period prevents the process of receiving the other devices' setting indices from taking too long.
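The reporting window can be sketched as below. It assumes each report carries the sender's device id, its receive moment on the synchronized clock and its intensity, and that this device notes the local arrival time of each report; the window starts when this device itself heard the signal, which is the first option described above. IndexReport and collect_reports are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class IndexReport:
    device_id: str
    receive_time: float  # when the device heard the speech signal, shared clock (s)
    intensity: float     # e.g. RMS level of the received speech signal


def collect_reports(own: IndexReport,
                    incoming: Iterable[Tuple[float, IndexReport]],
                    window_s: float) -> Dict[str, IndexReport]:
    """Gather this device's report plus peer reports that arrive in time.

    `incoming` yields (arrival_time, report) pairs. Assumption: the window
    starts at own.receive_time; later reports are dropped so that a slow or
    failed peer cannot hold up the decision.
    """
    deadline = own.receive_time + window_s
    reports = {own.device_id: own}
    for arrival_time, report in incoming:
        if arrival_time <= deadline:
            reports[report.device_id] = report
    return reports
```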

Speech devices 220 and 230 likewise obtain, in a similar way, the setting indices of the speech signal as received by the other devices in the device group.

Step S3230: decide, according to the device's own setting index and the setting indices of the other devices, whether the speech signal needs to be responded to.

Once any device in the device group has obtained its own setting index and those of the other devices, it can decide accordingly whether it itself needs to respond to the speech signal. Embodiments of step S3230 are, for example, as follows:

(1) In step S3230, the speech device compares the intensity of the speech signal it received with the intensities received by the other devices, and decides that it needs to respond to the speech signal when its own received intensity is the greatest. This approach helps ensure that the device in the group closest to the user responds.

(2) In step S3230, the speech device compares the moment at which it received the speech signal with the moments at which the other devices received it, and decides that it needs to respond to the speech signal when its own moment is the earliest. This approach helps the device group respond to the speech signal more quickly.

(3) In step S3230, the speech device determines a composite index of the speech signal according to the order of the moments at which, and the magnitudes of the intensities with which, each device received the speech signal, and decides that it needs to respond to the speech signal when the composite index of the speech signal it received is optimal. This approach considers both the time index and the intensity index, which helps optimize the response strategy. In addition, compared with a single index, the composite index reflects the distance between a device and the user more accurately. In one example, the device nearest to the user can be selected to respond according to the composite index. By basic acoustics, let the sound intensity at a point be $I$, the distance from that point to the sound source be $d$, and the propagation time of the sound be $t$. On the one hand, the sound intensity is inversely proportional to the square of the distance, $I \propto 1/d^2$, so $d \propto 1/\sqrt{I}$; on the other hand, the propagation time is proportional to the distance, $t \propto d$. The distance $d$ between a device and the user is therefore reflected by both the sound intensity $I$ and the propagation time $t$. To measure how far away a device is, a composite index $S$ can be computed from $1/\sqrt{I}$ and $t$ with corresponding weights $w_1$ and $w_2$:

$S = w_1 \cdot \frac{1}{\sqrt{I}} + w_2 \cdot t$

Taking the speech devices in Fig. 2 as an example, suppose the intensity values $I$ with which speech devices 210, 220 and 230 receive the speech signal are 1, 2 and 3 respectively; the corresponding values of $1/\sqrt{I}$ are then 1, 0.71 and 0.57. Suppose the moments at which speech devices 210, 220 and 230 receive the speech signal are 1, 2 and 3 respectively. The propagation time is approximated by taking the propagation time of the device that received the signal earliest as 0 s, so the propagation times $t$ of speech devices 210, 220 and 230 are 0, 1 and 2 respectively. With weights of, for example, $w_1 = 0.8$ for $1/\sqrt{I}$ and $w_2 = 0.2$ for $t$, the composite indices $S$ of speech devices 210, 220 and 230 are 0.8, 0.768 and 0.856 respectively. A smaller composite index means the device is closer to the user, i.e. the smaller the value, the better the index. In this example the composite index of speech device 220 is therefore optimal, and speech device 220 accordingly decides that it needs to respond to the speech signal. In this way the device nearest to the user can be selected more accurately.
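The composite index of embodiment (3) can be checked with a few lines of code. This sketch assumes, as in the example, that the propagation time is approximated by each device's receive moment minus the earliest receive moment in the group, and it uses the example weights 0.8 and 0.2; composite_scores is a hypothetical name.

```python
import math
from typing import Dict, Mapping


def composite_scores(intensities: Mapping[str, float],
                     receive_times: Mapping[str, float],
                     w_intensity: float = 0.8,
                     w_time: float = 0.2) -> Dict[str, float]:
    """S = w_intensity / sqrt(I) + w_time * t for every device (smaller is better).

    1/sqrt(I) is proportional to distance because I is proportional to 1/d^2.
    Assumption: t = device receive moment - earliest receive moment in the group.
    """
    t_earliest = min(receive_times.values())
    return {
        dev: w_intensity / math.sqrt(intensities[dev])
        + w_time * (receive_times[dev] - t_earliest)
        for dev in intensities
    }


scores = composite_scores({"210": 1, "220": 2, "230": 3},
                          {"210": 1, "220": 2, "230": 3})
best = min(scores, key=scores.get)  # "220"
```

Without intermediate rounding the scores are about 0.800, 0.766 and 0.862; the 0.768 and 0.856 in the text come from rounding $1/\sqrt{I}$ to 0.71 and 0.57 first. Either way, speech device 220 has the smallest score.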

The different embodiments of step S3230 suit different response strategies for the device group. For example, if the strategy is to have the device in the group closest to the user respond, embodiment (1) or (3) can be chosen, using the intensity with which a device receives the speech signal, or the weighted combination of intensity and moment, as the measure of the distance between the device and the user. As another example, if the strategy is to have the device with the fastest reaction in the group respond, embodiment (2) can be chosen, using the moment at which a device receives the speech signal as the measure of its reaction speed.
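A minimal sketch of embodiments (1) and (2), with the strategy chosen by a hypothetical string flag ("nearest" or "fastest"). The tie-break on device id is an assumption that the source does not address; it is added so that two devices with equal indices do not both decide to respond.

```python
from typing import Mapping


def _winner(scores: Mapping[str, float]) -> str:
    """Smallest score wins; ties go to the smallest device id (assumption)."""
    return min(scores, key=lambda dev: (scores[dev], dev))


def should_respond(own_id: str, policy: str,
                   intensities: Mapping[str, float],
                   receive_times: Mapping[str, float]) -> bool:
    """Decide, from this device's point of view, whether it should respond."""
    if policy == "nearest":    # embodiment (1): largest intensity wins
        scores = {dev: -value for dev, value in intensities.items()}
    elif policy == "fastest":  # embodiment (2): earliest receive moment wins
        scores = dict(receive_times)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return _winner(scores) == own_id
```

Because every device applies the same deterministic rule, the independent decisions agree on a single responder, provided all devices collected the same set of reports; if a report is lost in transit, two devices could both conclude that they should respond.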

After obtaining the decision result, the speech device performs the following step S3300:

Step S3300: respond to the speech signal when the decision result is that the speech signal needs to be responded to.

Through step S3200 the speech device has obtained the decision result of whether it itself needs to respond to the speech signal. When the decision result is that it needs to respond, the speech device can use its own hardware to respond to the speech signal.

The device responds to the speech signal, for example, by playing a response voice through the loudspeaker, by displaying a response graphic or text on the display device, or by giving a response prompt through changes and movements of an indicator light.

When the speech device determines through step S3200 that it does not need to respond to the speech signal, it does not respond to the current speech signal.

In the speech signal processing method provided by this embodiment, any device in the device group decides, after receiving a speech signal, whether it itself needs to respond, and responds to the speech signal according to the decision result. This avoids chaotic responses from multiple devices to the same speech signal, improves the user experience, and makes speech devices more intelligent and convenient.

A specific example of the speech signal processing method of this embodiment is as follows:

As shown in Fig. 2, speech devices 210, 220 and 230 are speakers; the three speakers form a device group and are playing the same song. The user wants to know tomorrow's weather and says "What's the weather like tomorrow?" (in this case there is no need for all three speakers to respond to the speech signal). For this speech signal, each of the three speakers performs steps S3100-S3300 described above, with the setting index being the moment at which the device receives the speech signal. Through the decision, device 210 determines that it received the speech signal earliest, so it responds to the speech signal and starts announcing tomorrow's weather. Devices 220 and 230 determine that they did not receive the speech signal earliest, so they do not respond to it and continue playing the song. The speech signal processing method of this embodiment thus makes speech signal processing in the device group more orderly and intelligent.

This embodiment provides a speech signal processing method that, building on the speech signal processing method of Embodiment 1, selects a particular device from the device group to receive and respond to subsequent speech signals.

The speech signal processing method of this embodiment is implemented by any speech device in the device group, for example by any of speech devices 210, 220 and 230 in Fig. 2. The method includes the following steps S4100-S4400:

Step S4100: receive a current speech signal.

Step S4200: decide whether to respond to the speech signal, to obtain a decision result.

Step S4300: respond to the speech signal when the decision result is that the speech signal needs to be responded to.

For specific implementations of steps S4100-S4300, refer to the description of steps S3100-S3300 in Embodiment 1; they are not repeated here.

Step S4400: when the decision result is that the speech signal does not need to be responded to, do not respond to the speech signal, and set the device itself to no longer receive or respond to subsequent speech signals.

In step S4400, a speech device that does not need to respond to the current speech signal not only refrains from responding to it but also sets itself to no longer receive or respond to subsequent speech signals. For example, the speech device mutes its own microphone so that it no longer receives subsequent speech signals. As another example, the speech device keeps its microphone on but sets the decision result for subsequent speech signals to "no response needed".

The period during which the speech device does not receive or respond to subsequent speech signals can be configured, for example as one hour, one day, or until the device is shut down.

In step S4400, a speech device that needs to respond to the current speech signal not only responds to it but also receives and responds to subsequent speech signals. For example, after receiving a subsequent signal, the device sets the decision result to "response needed". As another example, after receiving a subsequent speech signal, the device responds directly without making a decision.
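Step S4400 can be sketched as a small per-device state object. This is a hypothetical illustration: the class name, the returned labels and the injectable clock are assumptions. "ignore" models the variant that keeps the microphone on but treats follow-up signals as needing no response, a duration of None stands for "until the device shuts down", and the one-hour default mirrors the example period in the text.

```python
import time
from typing import Callable, Optional


class FollowUpPolicy:
    """Hypothetical per-device state for step S4400."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.designated = False  # True if this device won the last decision
        self._muted = False
        self._mute_until: Optional[float] = None

    def record_decision(self, respond: bool,
                        mute_for_s: Optional[float] = 3600.0) -> None:
        """Store the group decision for the current signal.

        Assumption: mute_for_s=None means "stay muted until shutdown".
        """
        self.designated = respond
        self._muted = not respond
        if self._muted and mute_for_s is not None:
            self._mute_until = self._clock() + mute_for_s
        else:
            self._mute_until = None

    def on_follow_up(self) -> str:
        """Return 'ignore', 'respond' (skip the decision) or 'decide' (rerun it)."""
        if self._muted:
            if self._mute_until is None or self._clock() < self._mute_until:
                return "ignore"
            self._muted = False  # the mute period has expired
        return "respond" if self.designated else "decide"
```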

With the speech signal processing method of this embodiment, a particular device can be selected from the device group to receive and respond to the speech signals that follow the current one, while the other devices no longer receive or respond to subsequent speech signals. Besides avoiding chaotic responses, this simplifies the handling of subsequent speech signals and helps the device group respond to them faster.

This embodiment provides a speech signal processing method that, building on the speech signal processing method of Embodiment 1, gives priority to having the same speech device respond to two consecutive speech signals.

The speech signal processing method of this embodiment is implemented by any speech device in the device group, for example by any of speech devices 210, 220 and 230 in Fig. 2. The method includes the following steps S5100-S5300:

Step S5100: receive a current speech signal.

Step S5200: decide whether to respond to the speech signal, to obtain a decision result.

Step S5300: respond to the speech signal when the decision result is that the speech signal needs to be responded to.

For specific implementations of steps S5100-S5300, refer to the description of steps S3100-S3300 in Embodiment 1; they are not repeated here.

In this embodiment, step S5200 further includes the following steps S5210-S5240:

Step S5210: obtain the setting indices of the received current speech signal as the current index.

In this step, the setting indices of the received current speech signal include both the speech device's own setting index and the setting indices of the other devices.

Step S5220: obtain the setting indices of the received previous speech signal as the reference index.

In this step, the setting indices of the received previous speech signal include both the setting index of the previous speech signal as received by the speech device itself and the setting indices of the previous speech signal as received by the other devices.

The setting indices of the received previous speech signal can be obtained, for example, as follows: during the decision process for the previous speech signal, the speech device records the setting indices of the received speech signal, and it retrieves that record during the decision process for the current speech signal.

Step S5230: compare the current index with the reference index to obtain a comparison result.

Step S5240: when the comparison result meets the setting condition, use the decision result for the previous speech signal as the decision result for the current speech signal.

In steps S5230 and S5240, the setting indices of the previous speech signal serve as a reference for the decision process for the current speech signal. When the comparison of the current speech signal's setting indices against those of the previous speech signal meets the setting condition, the decision result for the previous speech signal is used as the decision result for the current speech signal, regardless of whether the current signal's own decision result would have matched it.

For example, for voice devices 210, 220 and 230 in Fig. 2, the volume of the voice signal received by each voice device is used as the setting index. The setting index values with which voice devices 210, 220 and 230 received the previous voice signal are, for example, 10, 8 and 5, respectively; these serve as the reference index. In the previous decision result, therefore, voice device 210 needed to respond and the remaining devices did not. The setting index values with which voice devices 210, 220 and 230 receive the current voice signal are, for example, 20, 21 and 2, respectively; these serve as the current index. Taken on its own, the current voice signal would therefore be responded to by voice device 220. Comparing the reference index with the current index: the device with the largest setting index was voice device 210 for the previous signal and is voice device 220 for the current signal, but the current values of voice devices 210 and 220 (20 and 21) are close. The ratio of the current maximum setting index (21, from voice device 220) to the current setting index of the device that responded to the previous voice signal (20, from voice device 210) is 21 / 20 = 1.05. Assuming the set condition is that this ratio does not exceed 1.5, the comparison result meets the set condition. The voice devices therefore adopt the decision result of the previous voice signal as that of the current voice signal, and the current voice signal is still responded to by voice device 210.
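The worked example above can be reproduced with a short sketch. It assumes, as the example does, that the setting index is the received volume (larger is better) and that the set condition is "the ratio of the current maximum to the previous responder's current value does not exceed 1.5"; `decide_responder` and its parameters are hypothetical names, not taken from the patent.

```python
from typing import Dict, Hashable, Optional


def decide_responder(
    current: Dict[Hashable, float],
    previous_responder: Optional[Hashable],
    max_ratio: float = 1.5,
) -> Hashable:
    """Return the device that should respond to the current voice signal.

    current: setting index (here, received volume) of the current signal per device.
    previous_responder: device that responded to the previous signal, if any.
    max_ratio: assumed set condition; keep the previous responder while
        max(current) / current[previous_responder] does not exceed it.
    """
    # Original decision for this signal alone: the device with the largest index.
    best = max(current, key=current.get)
    if previous_responder is None or previous_responder not in current:
        return best
    prev_value = current[previous_responder]
    if prev_value <= 0:
        return best
    # Comparison result: how far the new best device is ahead of the old responder.
    ratio = current[best] / prev_value
    # Set condition met: reuse the previous decision result.
    return previous_responder if ratio <= max_ratio else best


# Previous signal: indices 10, 8, 5, so device 210 responds.
first = decide_responder({210: 10, 220: 8, 230: 5}, previous_responder=None)
# Current signal: 220 would win alone, but 21 / 20 = 1.05 <= 1.5 keeps 210.
second = decide_responder({210: 20, 220: 21, 230: 2}, previous_responder=first)
print(first, second)  # 210 210
```

This ratio form only suits indices where larger is better; for a timestamp-style index (earlier is better) the comparison would have to be inverted, which is likewise an assumption rather than something the text specifies.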

While avoiding response confusion, the voice signal processing method provided by this embodiment of the present invention compares the current index with the reference index so that, when the setting index changes relatively little, two consecutive voice signals are responded to by the same voice device. This helps keep the responses of the device group consistent and prevents the responding device from changing frequently, thereby improving the user experience.

In one specific implementation of this embodiment, the set condition in step S5240 may be determined as follows:

Obtain the setting indices of multiple voice signals as historical data; then determine the set condition according to the historical data.

For example, the setting indices with which the device group received voice signals on many occasions are recorded as historical data. Statistics such as the average setting index in the historical data and the frequency with which each value occurs are determined, and a suitable set condition is chosen according to the characteristics of the user's voice signals reflected by the historical data.
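The text names the statistics (average, frequency of each value) but not the rule that turns them into a set condition. The sketch below computes those statistics and then, as a purely hypothetical rule, derives a ratio threshold of the kind used in the example above (ratio not exceeding 1.5) from the median lead of the top device over the runner-up in past signals. `summarize_history`, `derive_max_ratio` and `floor` are illustrative names, not the patent's method.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, Hashable, List, Optional

Record = Dict[Hashable, float]  # setting index per device for one voice signal


def summarize_history(history: List[Record]) -> dict:
    """Average setting index and how often each value occurs, as mentioned in the text."""
    values = [v for record in history for v in record.values()]
    return {"mean": mean(values), "frequency": Counter(values)}


def derive_max_ratio(history: List[Record], floor: float = 1.0) -> Optional[float]:
    """Hypothetical rule: the typical (median) lead of the top device over the runner-up."""
    ratios = []
    for record in history:
        ranked = sorted(record.values(), reverse=True)
        # A ratio needs at least two devices and a positive runner-up value.
        if len(ranked) >= 2 and ranked[1] > 0:
            ratios.append(ranked[0] / ranked[1])
    if not ratios:
        return None
    # Leads smaller than the typical lead count as "close", so the previous
    # responder would be kept; the threshold never drops below `floor`.
    return max(floor, median(ratios))


history = [
    {210: 10, 220: 8, 230: 5},
    {210: 20, 220: 21, 230: 2},
    {210: 30, 220: 10, 230: 1},
]
print(derive_max_ratio(history))  # 1.25
```

A median is used here only because it is robust to occasional outlier readings; a percentile or a mean-based rule would be equally plausible choices.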

Determining the set condition in this way allows voice signal processing to be tailored to the individual characteristics of the user, further improving the user experience.

This embodiment provides a voice signal processing method that, building on the voice signal processing method of Embodiment 1, determines the decision result based on whether the device is the master device of the device group.

The voice signal processing method of this embodiment is implemented by any voice device in the device group, for example by any one of voice devices 210, 220 and 230 in Fig. 2. The method includes the following steps S6100 to S6300:

Step S6100: receive the current voice signal.

Step S6200: decide whether the voice signal needs to be responded to, and obtain a decision result.

Step S6300: when the decision result is that the voice signal needs to be responded to, respond to the voice signal.

For the specific implementation of steps S6100 to S6300, refer to the description and explanation of steps S3100 to S3300 in Embodiment 1; it is not repeated here.

In this embodiment, step S6200 further includes the following steps:

Step S6210: determine whether the device itself is the master device in the device group;

Step S6220: when the device itself is determined to be the master device, determine that the decision result is that the voice signal needs to be responded to.

Here, the master device is the device in the device group that pushes audio data to the other devices.

In this embodiment, the multiple devices in the device group form an audio group, in which the master device pushes audio data to the slave devices. The devices in the audio group are, for example, speakers.

In the audio group, the master device is determined, for example, as follows:

(1) If, when the group is built, both devices are playing or neither device is playing, the device that first initiated the group-building request becomes the master device;

(2) If, when the group is built, one device is playing and the other is not, the playing device becomes the master device.

When making the response decision, each device in the audio group determines whether it is itself the master device: if it is, it responds to the voice signal; if it is not, it does not respond to the voice signal.
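The two election rules and the master-only response decision can be sketched together for a two-device audio group. This is a minimal illustration under assumptions: "first initiated the group-building request" is modeled as a comparable `request_time` value, and `GroupMember`, `elect_master` and `needs_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GroupMember:
    device_id: str
    playing: bool        # playing audio at the moment the group is built
    request_time: float  # hypothetical: when this device sent its group-building request


def elect_master(a: GroupMember, b: GroupMember) -> GroupMember:
    """Pick the master of a two-device audio group using rules (1) and (2)."""
    if a.playing != b.playing:
        # Rule (2): exactly one device is playing, so it becomes the master.
        return a if a.playing else b
    # Rule (1): both or neither are playing, so the earlier requester becomes the master.
    return a if a.request_time <= b.request_time else b


def needs_response(self_id: str, master: GroupMember) -> bool:
    """Steps S6210/S6220: only the master device responds to the voice signal."""
    return self_id == master.device_id


idle = GroupMember("speaker-A", playing=False, request_time=1.0)
busy = GroupMember("speaker-B", playing=True, request_time=2.0)
master = elect_master(idle, busy)
print(master.device_id, needs_response("speaker-A", master))  # speaker-B False
```

Because the decision needs only the device's own role, no setting indices have to be exchanged before responding.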

While avoiding response confusion, the voice signal processing method provided in this embodiment increases the decision speed, and thus the response speed of the device group, because each device decides from its own role without needing the setting indices of the other devices.

This embodiment provides a voice signal processing apparatus. As shown in Fig. 4, the voice signal processing apparatus 400 includes:

Receiving module 410, configured to receive the current voice signal;

Decision module 420, configured to decide whether the voice signal needs to be responded to, and obtain a decision result; and

Response module 430, configured to respond to the voice signal when the decision result is that the voice signal needs to be responded to.
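The module split of apparatus 400 can be sketched as a small class. This is an illustration under assumptions: the voice signal is represented as raw `bytes`, and the decision and response logic are passed in as callables; `VoiceSignalApparatus` and its methods are hypothetical names, not part of the patent.

```python
from typing import Callable, List, Optional


class VoiceSignalApparatus:
    """Hypothetical sketch of apparatus 400 wiring modules 410, 420 and 430."""

    def __init__(self, decide: Callable[[bytes], bool], respond: Callable[[bytes], None]) -> None:
        self._decide = decide    # stands in for decision module 420
        self._respond = respond  # stands in for response module 430
        self.last_signal: Optional[bytes] = None

    def receive(self, signal: bytes) -> bool:
        """Receiving module 410: store the signal, decide, and respond if needed."""
        self.last_signal = signal
        decision = self._decide(signal)
        if decision:
            self._respond(signal)
        return decision


# Usage: a device that always decides to respond and records what it responded to.
responses: List[bytes] = []
device = VoiceSignalApparatus(decide=lambda s: True, respond=responses.append)
device.receive(b"what's the weather")
```

Passing the decision logic in as a callable lets the same skeleton host any of the decision rules described in this section.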

For the functions of each module in this embodiment, refer to the description of the voice signal processing method in Embodiment 1; they are not repeated here.

In one specific implementation of this embodiment, decision module 420 is further configured to:

obtain the setting index of the voice signal received by the device itself;

obtain the setting indices of the voice signal received by the other devices in the device group;

decide, according to the setting index of the device itself and the setting indices of the other devices, whether the voice signal needs to be responded to.

In one specific implementation of this embodiment, decision module 420 is further configured to receive, within a preset time period, the setting indices sent by the other devices. The setting index includes at least one of the moment at which the voice signal is received and the intensity of the received voice signal.
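The text does not describe how the other devices' setting indices are transported. A minimal sketch, assuming some network layer (not described in the patent) places each peer's `(device_id, setting_index)` message on a local `queue.Queue`, collects whatever arrives within the preset time period; `collect_peer_indices` and `window_s` are hypothetical names.

```python
import queue
import time
from typing import Dict, Hashable, Tuple


def collect_peer_indices(
    inbox: "queue.Queue[Tuple[Hashable, float]]",
    window_s: float,
) -> Dict[Hashable, float]:
    """Gather (device_id, setting_index) messages that arrive within window_s seconds."""
    deadline = time.monotonic() + window_s
    received: Dict[Hashable, float] = {}
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # preset time period elapsed
        try:
            device_id, index = inbox.get(timeout=remaining)
        except queue.Empty:
            break  # no further messages before the deadline
        received[device_id] = index  # the latest value from each peer wins
    return received


inbox: "queue.Queue[Tuple[Hashable, float]]" = queue.Queue()
inbox.put((220, 8.0))
inbox.put((230, 5.0))
peers = collect_peer_indices(inbox, window_s=0.05)
```

Peers whose messages arrive after the window are simply left out of the decision, which bounds how long a device waits before deciding.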

In one specific implementation of this embodiment, decision module 420 is further configured to determine, when the device itself received the voice signal earliest, that the decision result is that the voice signal needs to be responded to.

In one specific implementation of this embodiment, decision module 420 is further configured to determine, when the intensity of the voice signal received by the device itself is the greatest, that the decision result is that the voice signal needs to be responded to.

In one specific implementation of this embodiment, decision module 420 is further configured to determine an overall index of the voice signal according to the order in which the devices received the voice signal and the intensity with which each received it, and, when the overall index of the voice signal received by the device itself is the best, to determine that the decision result is that the voice signal needs to be responded to.
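The three decision rules above can be sketched as follows. The first two follow the text directly. The patent does not give a formula for the overall index, so `composite_wins` uses a hypothetical equal-weight score of normalized earliness and loudness. The sketch also assumes synchronized device clocks for comparing reception moments. All function names and weights are illustrative.

```python
from typing import Dict, Hashable, Tuple

# Setting indices per device: (moment the signal was received in s, received intensity).
Indices = Dict[Hashable, Tuple[float, float]]


def earliest_wins(self_id: Hashable, indices: Indices) -> bool:
    """Respond if this device received the voice signal earliest."""
    return min(indices, key=lambda d: indices[d][0]) == self_id


def loudest_wins(self_id: Hashable, indices: Indices) -> bool:
    """Respond if this device received the voice signal with the greatest intensity."""
    return max(indices, key=lambda d: indices[d][1]) == self_id


def composite_wins(self_id: Hashable, indices: Indices,
                   w_time: float = 0.5, w_intensity: float = 0.5) -> bool:
    """Respond if this device has the best overall index (hypothetical weighted score)."""
    times = [t for t, _ in indices.values()]
    levels = [i for _, i in indices.values()]
    t_min, t_span = min(times), (max(times) - min(times)) or 1.0
    i_min, i_span = min(levels), (max(levels) - min(levels)) or 1.0

    def score(device: Hashable) -> float:
        t, i = indices[device]
        earliness = 1.0 - (t - t_min) / t_span  # 1.0 for the earliest device
        loudness = (i - i_min) / i_span          # 1.0 for the loudest device
        return w_time * earliness + w_intensity * loudness

    return max(indices, key=score) == self_id


indices = {210: (0.000, 10.0), 220: (0.004, 21.0), 230: (0.010, 2.0)}
print(earliest_wins(210, indices), loudest_wins(220, indices), composite_wins(220, indices))
# True True True
```

Every device must run the same rule on the same indices for exactly one device to respond. Ties are broken by dictionary order here, which is a further assumption.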

In one specific implementation of this embodiment, voice signal processing apparatus 400 further includes a subsequent response module (not shown in the figure), configured to: when the decision result is that the voice signal does not need to be responded to, not respond to the voice signal and set the device itself to no longer receive or respond to subsequent voice signals;

and, when the decision result is that the voice signal needs to be responded to, set the device itself to receive and respond to subsequent voice signals.
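The subsequent response module amounts to a per-device flag. A minimal sketch follows. The text does not say when a device that stopped listening resumes, so the `reset()` hook (for example, triggered by a new wake word) is an assumption, as are the class and method names.

```python
from typing import Callable, List


class ResponseGate:
    """Hypothetical state kept by the subsequent response module."""

    def __init__(self) -> None:
        self.accept_subsequent = True

    def apply_decision(self, needs_response: bool) -> None:
        # Respond: keep receiving and responding to follow-up voice signals.
        # Do not respond: stop receiving or responding to them.
        self.accept_subsequent = needs_response

    def handle_subsequent(self, signal: str, respond: Callable[[str], None]) -> bool:
        if not self.accept_subsequent:
            return False  # this device stays silent for the rest of the interaction
        respond(signal)
        return True

    def reset(self) -> None:
        # Assumption: an external event (e.g. a new wake word) re-enables the device.
        self.accept_subsequent = True


handled: List[str] = []
gate = ResponseGate()
gate.apply_decision(False)
gate.handle_subsequent("next song", handled.append)  # ignored, returns False
```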

In one specific implementation of this embodiment, voice signal processing apparatus 400 further includes a comparison module (not shown in the figure), configured to:

obtain the setting index of the received current voice signal as the current index;

obtain the setting index of the received previous voice signal as the reference index;

compare the current index with the reference index to obtain a comparison result;

when the comparison result meets the set condition, take the decision result corresponding to the previous voice signal as the decision result corresponding to the current voice signal.

In one specific implementation of this embodiment, the comparison module is further configured to:

obtain the setting indices of multiple voice signals as historical data;

determine the set condition according to the historical data.

In one specific implementation of this embodiment, decision module 420 is further configured to:

determine whether the device itself is the master device in the device group;

when the device itself is determined to be the master device, determine that the decision result is that the voice signal needs to be responded to.

This embodiment provides an electronic device that includes the voice signal processing apparatus described in Embodiment 5; for details, see the description of the voice signal processing apparatus in Embodiment 5.

Alternatively, the electronic device is, for example, electronic device 500 in Fig. 5, which includes:

Memory 510, configured to store executable commands;

Processor 520, configured to execute, under the control of the executable commands, the method described in any one of Embodiments 1 to 3. For details, see the description of the voice signal processing method in Embodiments 1 to 3.

This embodiment provides a voice signal processing system that includes multiple electronic devices as described in Embodiment 6; for the same voice signal, each electronic device performs the method described in any one of Embodiments 1 to 3.

The voice signal processing system is, for example, the device group formed by voice devices 210, 220 and 230 in Fig. 2. For this device group, refer to the descriptions in Embodiments 1 to 3; they are not repeated here.

The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions to implement aspects of the present invention.

Aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and may direct a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.

The embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims

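1. A voice signal processing method, characterized in that each device in a device group consisting of multiple devices simultaneously performs:

Receiving a current voice signal;

Deciding whether the voice signal needs to be responded to, and obtaining a decision result;

When the decision result is that the voice signal needs to be responded to, responding to the voice signal.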

2. The method according to claim 1, wherein deciding whether the voice signal needs to be responded to comprises:

Obtaining the setting index of the voice signal received by the device itself;

Obtaining the setting indices of the voice signal received by the other devices in the device group;

Deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the voice signal needs to be responded to.

3. The method according to claim 2, wherein obtaining the setting indices of the voice signal received by the other devices in the device group comprises:

Receiving, within a preset time period, the setting indices sent by the other devices.

4. The method according to claim 2, wherein the setting index includes at least one of: the moment at which the voice signal is received and the intensity of the received voice signal.

5. The method according to claim 2, wherein the setting index is the moment at which the voice signal is received; and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the voice signal needs to be responded to comprises:

When the device itself received the voice signal earliest, determining that the decision result is that the voice signal needs to be responded to.

6. The method according to claim 2, wherein the setting index is the intensity of the received voice signal; and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the voice signal needs to be responded to comprises:

When the intensity of the voice signal received by the device itself is the greatest, determining that the decision result is that the voice signal needs to be responded to.

7. The method according to claim 2, wherein the setting index includes both the moment at which the voice signal is received and the intensity of the received voice signal; and deciding, according to the setting index of the device itself and the setting indices of the other devices, whether the voice signal needs to be responded to comprises:

Determining an overall index of the voice signal according to the moment at which, and the intensity with which, each device received the voice signal;

When the overall index of the voice signal received by the device itself is the best, determining that the decision result is that the voice signal needs to be responded to.

8. The method according to claim 1, wherein the method further comprises:

When the decision result is that the voice signal does not need to be responded to, not responding to the voice signal, and setting the device itself to no longer receive or respond to subsequent voice signals.

9. The method according to claim 1, wherein deciding whether the voice signal needs to be responded to and obtaining a decision result comprises:

Obtaining the setting index of the received current voice signal as a current index;

Obtaining the setting index of the received previous voice signal as a reference index;

Comparing the current index with the reference index to obtain a comparison result;

When the comparison result meets a set condition, taking the decision result corresponding to the previous voice signal as the decision result corresponding to the current voice signal.

10. The method according to claim 9, wherein the set condition is determined as follows:

Obtaining the setting indices of multiple voice signals as historical data;

Determining the set condition according to the historical data.

11. The method according to claim 1, wherein deciding whether the voice signal needs to be responded to and obtaining a decision result comprises:

Determining whether the device itself is the master device in the device group;

When the device itself is determined to be the master device, determining that the decision result is that the voice signal needs to be responded to.

12. A voice signal processing apparatus, located in each device of a device group consisting of multiple devices, comprising:

A receiving module, configured to receive a current voice signal;

A decision module, configured to decide whether the voice signal needs to be responded to and obtain a decision result; and

A response module, configured to respond to the voice signal when the decision result is that the voice signal needs to be responded to.

13. An electronic device, comprising:

A memory, configured to store executable commands;

A processor, configured to execute, under the control of the executable commands, the method according to any one of claims 1-11.

14. A voice signal processing system, comprising multiple electronic devices according to claim 13, the multiple electronic devices forming a device group; wherein, for the same voice signal, each electronic device performs the method according to any one of claims 1-11.