CN112735391B - Distributed voice response method and related device - Google Patents

Info

Publication number
CN112735391B
Authority
CN
China
Prior art keywords
terminal
response
voice signal
response time
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011596480.4A
Other languages
Chinese (zh)
Other versions
CN112735391A (en
Inventor
黄真明
陆春亮
王毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202011596480.4A
Publication of CN112735391A
Application granted
Publication of CN112735391B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 Position of source determined by a plurality of spaced direction-finders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • General Physics & Mathematics (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the application disclose a distributed voice response method and a related device. The method comprises the following steps: receiving a voice signal from at least one pickup terminal, and recognizing the voice signal to determine whether it contains a wake-up keyword; if the wake-up keyword is present, acquiring the position information, the response time, and the energy information of the voice information of the pickup terminal; and selecting a response terminal for the voice signal from the at least one pickup terminal according to the position information, the response time, and the energy information. Because the response terminal is determined from several dimensions rather than one, the technical scheme improves the user experience.

Description

Distributed voice response method and related device
Technical Field
The application relates to the technical field of audio, in particular to a distributed voice response method and a related device.
Background
Smart homes bring convenience to people's lives and are gradually changing the way people control their devices. The development of voice technology adds a further control entry point: without the traditional remote control or mobile phone app, a user can issue control instructions by spoken command to bring the home environment to a comfortable state, and this is also a development direction for smart homes.
To make smart-home voice control more natural, the difficulties of voice interaction across multiple physical spaces and multiple voice devices must be solved; that is, the obstacle to cross-space scene linkage and the problem of one call drawing responses from many voice devices must be overcome. In the prior art, the voice receiving unit that detects the strongest voice signal is taken as the response unit for that voice signal. However, this scheme places certain limitations on the response device, cannot meet the user's needs for voice response, and degrades the user experience.
Disclosure of Invention
The embodiment of the application provides a distributed voice response method and a related device, aiming at improving the user experience.
In a first aspect, a distributed voice response method is provided, the method comprising the following steps:
receiving a voice signal from at least one pickup terminal, and recognizing the voice signal to determine whether the voice signal contains a wake-up keyword;
if the wake-up keyword is present, acquiring the position information, the response time, and the energy information of the voice information of the pickup terminal;
and selecting a response terminal for the voice signal from the at least one pickup terminal according to the position information, the response time, and the energy information.
In an alternative solution, selecting the response terminal for the voice signal from the pickup terminals according to the position information, the response time, and the energy information specifically includes:
constructing a comprehensive evaluation set, wherein the comprehensive evaluation set comprises a plurality of grades;
respectively acquiring a plurality of evaluation values of the position information, the response time, and the energy information in the plurality of grades, and forming an evaluation matrix from the plurality of evaluation values;
selecting the response terminal for the voice signal according to the evaluation matrix and a preset weight set;
wherein the weight set contains the weight values of the position information, the response time, and the energy information.
In an alternative solution, selecting the response terminal for the voice signal according to the evaluation matrix and the preset weight set specifically includes:
calculating the product of the evaluation matrix and the weight set to obtain a product result, and selecting the response terminal for the voice signal according to the magnitude of the product result.
In an alternative solution, after calculating the product of the evaluation matrix and the weight set to obtain the product result, the method further includes:
multiplying the product result by a preset coefficient set to obtain a decision score, and selecting the pickup terminal with the highest decision score as the response terminal for the voice signal;
wherein the coefficient set includes the coefficients of the position information, the response time, and the energy information.
In an alternative solution, acquiring the position information specifically includes:
acquiring the distance and the time delay between the receiving-end microphone and the pickup terminal microphone, and estimating the position information according to the distance, the time delay, and the sound velocity.
In an alternative solution, acquiring the response time specifically includes:
receiving a decision request packet sent by the pickup terminal, and returning a session identification packet to the pickup terminal;
and receiving the response time returned by the pickup terminal, wherein the response time is the difference between the receiving time of the session identification packet and the sending time of the decision request packet.
In a second aspect, a distributed voice response device is provided, the device comprising:
a communication unit, configured to receive a voice signal from at least one pickup terminal;
a processing unit, configured to perform voice recognition on the voice signal to determine whether the voice signal contains a wake-up keyword; if the wake-up keyword is present, acquire the position information, the response time, and the energy information of the voice information of the pickup terminal; and select a response terminal for the voice signal from the at least one pickup terminal according to the position information, the response time, and the energy information.
In an alternative solution, the processing unit is specifically configured to construct a comprehensive evaluation set, where the comprehensive evaluation set includes a plurality of grades; respectively acquire a plurality of evaluation values of the position information, the response time, and the energy information in the plurality of grades, and form an evaluation matrix from the plurality of evaluation values; and select the response terminal for the voice signal according to the evaluation matrix and a preset weight set;
wherein the weight set contains the weight values of the position information, the response time, and the energy information.
In an alternative solution, the processing unit is specifically configured to calculate the product of the evaluation matrix and the weight set to obtain a product result, and select the response terminal for the voice signal according to the magnitude of the product result.
In an alternative solution, the processing unit is further configured to multiply the product result by a preset coefficient set to obtain a decision score, and select the pickup terminal with the highest decision score as the response terminal for the voice signal;
wherein the coefficient set includes the coefficients of the position information, the response time, and the energy information.
In an alternative solution, the processing unit is specifically configured to acquire the distance and the time delay between the receiving-end microphone and the pickup terminal microphone, and estimate the position information according to the distance, the time delay, and the sound velocity.
In an alternative solution, the processing unit is specifically configured to receive the decision request packet sent by the pickup terminal and return a session identification packet to the pickup terminal; and receive the response time returned by the pickup terminal, wherein the response time is the difference between the receiving time of the session identification packet and the sending time of the decision request packet.
In a third aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform part or all of the steps described in the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that, according to the technical scheme provided by the embodiment of the application, a voice signal of at least one pickup terminal is received, the voice signal is identified to determine whether a wake-up keyword exists, and if the wake-up keyword exists, the position information, the response time and the energy information of the voice information of the pickup terminal are obtained; and selecting the response terminal of the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information, and further determining the response terminal through multiple dimensions, namely, through multiple aspects of the position information, the response time and the energy information, so as to meet the diversified requirements of users on voice response, and further improve the user experience.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a distributed voice response method according to an embodiment of the application.
Fig. 2 is a flow chart of a response method for distributed voice according to a second embodiment of the application.
Fig. 3 is a schematic response flow chart of distributed voice according to an embodiment of the present application.
Fig. 4 is a flow chart of a decision making system deciding a response terminal according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a distributed voice response device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the application may be implemented, for example, in smart-home scenarios. The smart-home network framework may include a decision system and a plurality of pickup terminals. A pickup terminal may be any terminal capable of audio acquisition and feedback, such as a smart speaker, a smart television, a smartphone, or a smart refrigerator. The decision system may be a cloud decision system installed on a device such as a computer or a server.
Example 1
Referring to Fig. 1, Fig. 1 provides a distributed voice response method, which may be executed by a decision system such as a smartphone, a tablet computer, a personal computer, a server, or a cloud platform. This embodiment is implemented, for example, in a smart-home scenario that includes a plurality of pickup terminals, each of which communicates with the decision system. The communication manner includes, but is not limited to, short-range communication such as Wi-Fi and Bluetooth; in practical applications it may also be a mobile communication network such as LTE or 5G. The user may select whether each pickup terminal participates in the technical scheme of the application. As shown in Fig. 1, the method includes the following steps:
Step S101, receiving a voice signal of at least one pickup terminal, and identifying the voice signal to determine whether the voice signal has a wake-up keyword;
The wake-up keyword in step S101 may be a specific call word, such as "egg" or "imidazole", or a word with a wake-up meaning, such as "please reply" or "play"; the specific wake-up keywords may be configured in advance.
The voice signal may be recognized by a speech recognition algorithm, for example an existing general-purpose speech recognition algorithm, although other recognition manners may also be used.
Step S102, if the wake-up keyword is included, acquiring the position information, response time and energy information of voice information of the pick-up terminal;
Step S103, selecting a response terminal for the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information.
The first embodiment of the application provides a technical scheme for receiving a voice signal of at least one pickup terminal, identifying the voice signal to determine whether the voice signal has a wake-up keyword, and if the voice signal has the wake-up keyword, acquiring position information, response time and energy information of the voice information of the pickup terminal; and selecting the response terminal of the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information, and further determining the response terminal through multiple dimensions, namely, through multiple aspects of the position information, the response time and the energy information, so as to meet the diversified requirements of users on voice response, and further improve the user experience.
In an alternative solution, selecting the response terminal for the voice signal from the pickup terminals according to the position information, the response time, and the energy information may specifically include:
constructing a comprehensive evaluation set, wherein the comprehensive evaluation set comprises a plurality of grades;
respectively acquiring a plurality of evaluation values of the position information, the response time, and the energy information in the plurality of grades, and forming an evaluation matrix from the plurality of evaluation values;
selecting the response terminal for the voice signal according to the evaluation matrix and a preset weight set;
wherein the weight set contains the weight values of the position information, the response time, and the energy information.
The plurality of grades may be n grades, where n may be chosen according to the actual situation, for example 2, 3, 4, or 5. For example, if n = 2, the grades may be good and normal; if n = 3, the grades may be good, medium, and bad. Of course, in practical applications n may take other values.
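The text does not say how the evaluation values are obtained. As a minimal sketch, assuming n = 3 grades (good, medium, bad) and simple piecewise-linear membership functions, an evaluation matrix for one pickup terminal could be built as follows. The function names, the normalization bounds `max_delay_s` and `max_energy`, and the choice of 90 degrees as the best sound-source angle are all hypothetical and are not taken from the patent.

```python
from typing import List


def clamp01(value: float) -> float:
    """Limit a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def grade_memberships(quality: float) -> List[float]:
    """Map a quality in [0, 1] (1 is best) to (good, medium, bad) values.

    Assumption: piecewise-linear memberships that always sum to 1.
    """
    q = clamp01(quality)
    good = max(0.0, 2.0 * q - 1.0)
    bad = max(0.0, 1.0 - 2.0 * q)
    medium = 1.0 - good - bad
    return [good, medium, bad]


def evaluation_matrix(delay_s: float, energy: float, angle_deg: float,
                      max_delay_s: float = 1.0,
                      max_energy: float = 1.0) -> List[List[float]]:
    """Rows: response time, energy, position. Columns: good, medium, bad."""
    delay_quality = 1.0 - clamp01(delay_s / max_delay_s)  # shorter is better
    energy_quality = clamp01(energy / max_energy)         # louder is better
    # Hypothetical rule: a source at 90 degrees (broadside) is rated best.
    position_quality = 1.0 - clamp01(abs(angle_deg - 90.0) / 90.0)
    return [
        grade_memberships(delay_quality),
        grade_memberships(energy_quality),
        grade_memberships(position_quality),
    ]
```

Each row sums to 1, so a later weighted combination of the rows stays on the same scale for every terminal.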
In an alternative solution, the method for acquiring the energy information may specifically include:
where $E_n$ is the short-time average energy of the voice signal at time $n$, $x(m)$ is the voice sample value at time $m$, and $N$ is the window length of the rectangular window, taken as a fixed value in the range 100 to 200.
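The energy formula itself is not reproduced in this text. As a minimal sketch, assuming the standard rectangular-window short-time energy $E_n = \sum_{m=n-N+1}^{n} x(m)^2$ (whether the patent additionally divides by $N$ is not known), the value can be computed as below. The function name and the default window of 160 samples are hypothetical; the default simply falls inside the stated 100 to 200 range.

```python
from typing import Sequence


def short_time_energy(x: Sequence[float], n: int, window: int = 160) -> float:
    """Sum of squared samples in the rectangular window ending at index n.

    Assumption: E_n = sum of x(m)^2 for m in [n - window + 1, n]. Samples
    before the start of the signal are simply left out of the sum.
    """
    if not 0 <= n < len(x):
        raise IndexError("n must index a sample of x")
    if window <= 0:
        raise ValueError("window must be positive")
    start = max(0, n - window + 1)
    return sum(sample * sample for sample in x[start:n + 1])
```

For example, `short_time_energy([1.0, 2.0, 3.0, 4.0], 3, window=2)` sums the squares of the last two samples.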
In an alternative scheme, the acquiring the position information specifically includes:
and acquiring the distance and time delay between the receiving end microphone and the pickup terminal microphone, and estimating the position information according to the distance, the time delay and the sound velocity. The calculation method specifically comprises the following steps:
where $P$ is the position information value, $d$ is the distance between the receiving-end microphone and the pickup terminal microphone, $c$ is the sound velocity, and $\tau$ is the time delay between the microphones.
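The position formula is not reproduced in this text. As a rough sketch, assuming the common far-field two-microphone model in which the arrival angle is $\arccos(c\tau/d)$, the position value could be estimated from the delay as below. This model, the function name, and the default sound velocity of 343 m/s are assumptions and may differ from the formula in the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius


def tdoa_angle_deg(delay_s: float, mic_distance_m: float,
                   c: float = SPEED_OF_SOUND_M_S) -> float:
    """Estimate the arrival angle in degrees from the inter-microphone delay.

    Assumption: far-field source, angle = arccos(c * delay / distance).
    """
    if mic_distance_m <= 0:
        raise ValueError("microphone distance must be positive")
    ratio = c * delay_s / mic_distance_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))
```

A zero delay places the source broadside to the microphone pair, and a delay of d / c places it on the line through the two microphones.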
In an alternative scheme, the acquiring the response time specifically includes:
receiving a decision request packet sent by a pickup terminal, and returning a session identification packet to the pickup terminal;
and receiving the response time returned by the pickup terminal, where response time = session identification packet receiving time - decision request packet sending time.
The specific calculation formula is as follows:
$D = T_{resp} - T_{req}$ (formula 3)
where $T_{req}$ is the sending time of the decision request packet and $T_{resp}$ is the receiving time of the session identification packet.
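The measurement in formula 3 can be sketched on the pickup-terminal side as follows. The two callables stand in for the real network send and receive operations, and their names, like the injectable `clock` parameter, are hypothetical and only make the sketch testable without a network.

```python
import time
from typing import Callable, Tuple


def measure_response_time(send_decision_request: Callable[[], None],
                          wait_for_session_packet: Callable[[], str],
                          clock: Callable[[], float] = time.monotonic
                          ) -> Tuple[str, float]:
    """Return (session_id, D), where D = T_resp - T_req in seconds."""
    t_req = clock()                         # sending time of the request
    send_decision_request()
    session_id = wait_for_session_packet()  # blocks until the reply arrives
    t_resp = clock()                        # receiving time of the reply
    return session_id, t_resp - t_req
```

A monotonic clock is used by default so that wall-clock adjustments on the terminal cannot produce a negative delay.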
In an alternative solution, selecting the response terminal for the voice signal according to the evaluation matrix and the preset weight set specifically includes:
calculating the product of the evaluation matrix and the weight set to obtain a product result, and selecting the response terminal for the voice signal according to the magnitude of the product result.
In an alternative solution, after calculating the product of the evaluation matrix and the weight set to obtain the product result, the method further includes:
multiplying the product result by a preset coefficient set to obtain a decision score, and selecting the pickup terminal with the highest decision score as the response terminal for the voice signal;
wherein the coefficient set includes the coefficients of the position information, the response time, and the energy information.
In the above alternative, the score may be adjusted through the preset coefficient set, which may be configured by the manufacturer or issued by the decision system through configuration information.
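The two products described above can be sketched as follows. The sketch assumes that the evaluation matrix has one row per factor and one column per grade, that the product with the weight set is an ordinary weighted sum over the rows, and that the coefficient set holds one score per grade, as in B = A × R and F = B × L of the second embodiment. This reading, and every name and number used here, is an assumption.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows: factors, columns: grades


def product_result(weights: Sequence[float], evaluation: Matrix) -> List[float]:
    """Weight set times evaluation matrix: one combined value per grade."""
    if len(weights) != len(evaluation):
        raise ValueError("one weight per factor row is required")
    grades = len(evaluation[0])
    return [sum(w * row[j] for w, row in zip(weights, evaluation))
            for j in range(grades)]


def decision_score(weights: Sequence[float], evaluation: Matrix,
                   coefficients: Sequence[float]) -> float:
    """Product result times coefficient set, reduced to a single score."""
    per_grade = product_result(weights, evaluation)
    if len(coefficients) != len(per_grade):
        raise ValueError("one coefficient per grade is required")
    return sum(p * c for p, c in zip(per_grade, coefficients))


def select_response_terminal(evaluations: Dict[str, Matrix],
                             weights: Sequence[float],
                             coefficients: Sequence[float]) -> str:
    """Return the id of the terminal with the highest decision score."""
    return max(evaluations,
               key=lambda tid: decision_score(weights, evaluations[tid],
                                              coefficients))
```

With hypothetical weights [0.5, 0.25, 0.25] and grade scores [100, 60, 20], a terminal rated good on two factors and medium on one scores higher than a terminal rated bad on the most heavily weighted factor.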
Example two
Referring to Fig. 2, Fig. 2 provides a distributed voice response method, which may be executed by a decision system such as a smartphone, a tablet computer, a personal computer, a server, or a cloud platform. This embodiment is implemented, for example, in a smart-home scenario that includes a plurality of pickup terminals, each of which communicates with the decision system. The communication manner includes, but is not limited to, short-range communication such as Wi-Fi and Bluetooth; in practical applications it may also be a mobile communication network such as LTE or 5G. The user may select whether each pickup terminal participates in the technical scheme of the application. As shown in Fig. 2, the method includes the following steps:
Step S201, the pickup module receives the voice signal and recognizes the wake-up keyword.
Step S202, the analysis module analyzes the voice signal containing the wake-up word to obtain the energy information E and the position information P of the sound source.
For the analysis of the sound energy information, refer to the description of formula 1 above, which is not repeated here. The analysis module calculates the angle information P of the sound source based on the TDOA (time difference of arrival) method; for the specific calculation, refer to the description of formula 2 above.
Step S203, the pickup terminal sends a decision request packet (client identifier C, sound energy E, sound source position P) to the decision system.
Step S204, the decision system checks whether there is a currently active session. If not, it allocates a new session identifier S, records the session allocation time T, and starts a session timeout timer. If there is an active session and the session has timed out, it sends a sleep instruction to the pickup terminal and the flow ends. If there is an active session that has not timed out, the subsequent steps continue.
Step S205, the decision system returns the session identification packet (S) to the pickup terminal.
Step S206, the pickup terminal calculates the network delay D from the difference between the sending time of the decision request packet and the arrival time of the session identification packet, and sends a delay report packet (C, S, D) to the decision system.
For the specific calculation of the delay D, refer to the description of formula 3 above.
Step S207, the decision system establishes a comprehensive evaluation factor set U: the (D, E, P) information of all pickup terminals is recorded in an array U, and any pickup terminal whose record lacks the delay information D is ignored.
Step S208, establishing a comprehensive evaluation set V: the comprehensive evaluation set of the pickup terminals is set to V = (v1, v2, v3), where v1, v2, and v3 denote good, medium, and bad, respectively.
Step S209, establishing single-factor fuzzy evaluation to obtain an evaluation matrix R, where X() is the membership function of D to v1, Y() is the membership function of E to v2, and Z() is the membership function of P to v3. R11 represents the membership degree of D to v1, and so on.
R1(N)=X(D,N);
R2(N)=Y(E,N);
R3(N)=Z(P,N)。
Step S210, determining a factor weight vector A as the weight set of the factors: A = [a1, a2, a3]; and establishing a comprehensive evaluation model to obtain the fuzzy vector B = A × R.
Step S211, calculating the total score F = B × L, where L = [l1, l2, l3] holds the scores of the corresponding grades in V. The decision system selects the pickup terminal with the highest total score as the response terminal and notifies the other, lower-scoring pickup terminals to enter a sleep state.
In the above scheme, when the session timeout timer expires, the delay of every pickup terminal that has not reported the delay information D is marked as the maximum value, and the optimal pickup terminal is selected again by the fuzzy comprehensive evaluation method as the final result.
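The two rules for missing delay information (ignore the terminal in step S207, or mark its delay as the maximum value once the session times out) can be sketched together as below. The record type, the `timed_out` flag, and the `max_delay_s` value are hypothetical; the text does not state what the maximum value is.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple


class TerminalReport(NamedTuple):
    client_id: str                 # client identifier C
    energy: float                  # sound energy E
    position: float                # sound source position P
    delay: Optional[float] = None  # network delay D, None until reported


def build_factor_set(reports: Sequence[TerminalReport],
                     timed_out: bool = False,
                     max_delay_s: float = 1.0
                     ) -> Dict[str, Tuple[float, float, float]]:
    """Collect (D, E, P) per terminal for the evaluation factor set U."""
    factors: Dict[str, Tuple[float, float, float]] = {}
    for report in reports:
        if report.delay is not None:
            factors[report.client_id] = (report.delay, report.energy,
                                         report.position)
        elif timed_out:
            # After the session timeout, a missing delay counts as the maximum.
            factors[report.client_id] = (max_delay_s, report.energy,
                                         report.position)
        # Before the timeout, terminals without a delay are ignored.
    return factors
```

Calling the function once before the timeout and once after it gives the two candidate sets that the provisional and the final selection would be run on.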
The second embodiment of the present application provides a technical solution that receives a voice signal of at least one pickup terminal, identifies the voice signal to determine whether the pickup terminal has a wake-up keyword, and if the pickup terminal has the wake-up keyword, obtains location information, response time, and energy information of the voice information of the pickup terminal; and selecting the response terminal of the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information, and further determining the response terminal through multiple dimensions, namely, through multiple aspects of the position information, the response time and the energy information, so as to meet the diversified requirements of users on voice response, and further improve the user experience.
Referring to fig. 3, fig. 3 is a schematic diagram of a response flow of distributed voice according to an embodiment of the present application, and as shown in fig. 3, the response flow may specifically include:
S31, the user speaks a voice wake-up word.
S32, the pickup terminal 1 to the pickup terminal N are awakened.
S33, the pickup terminal 1 to the pickup terminal N submit the awakened audio to an analysis system.
S34, after analyzing the audio, the analysis system returns the energy information and the sound source position information to the pickup terminal.
S35, the pickup terminal submits the energy information and the sound source position information to the decision system through the network, and records the arrival time of the decision system's reply message.
S36, the pickup terminal calculates the delay to the cloud system from the return time of the reply message, and reports the delay data to the decision system.
S37, the decision system scores all the terminals by the fuzzy comprehensive evaluation method, and the terminal with the highest comprehensive score is the response terminal; for example, the decision system selects pickup terminal 1 as the response terminal.
S38, the decision system sends a wake-up command to pickup terminal 1, instructing it to enter the wake-up state and respond to the user's subsequent voice instructions.
S39, the decision system sends a sleep command to the other pickup terminals, instructing them to enter a sleep state and stop responding to the user's subsequent voice instructions.
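Steps S38 and S39 amount to sending one wake-up command and a sleep command to every other terminal. A minimal sketch, with a hypothetical function name and hypothetical command strings:

```python
from typing import Dict, Iterable


def dispatch_commands(terminal_ids: Iterable[str],
                      response_id: str) -> Dict[str, str]:
    """Map each terminal id to the command it should receive."""
    ids = list(terminal_ids)
    if response_id not in ids:
        raise ValueError("response terminal must be one of the candidates")
    # The chosen terminal wakes up; every other terminal goes to sleep.
    return {tid: ("wake" if tid == response_id else "sleep") for tid in ids}
```

Returning a plain mapping keeps the decision separate from the transport, so the same result can be sent over Wi-Fi, Bluetooth, or a mobile network.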
The process by which the decision system decides the response terminal may be as shown in Fig. 4, and may specifically include:
S41, the pickup terminal sends a decision request to the decision system.
S42, the decision system allocates a session identifier and sends it to the pickup terminal; if the session has timed out, the pickup terminal goes to sleep.
S43, the pickup terminal measures the delay and reports it to the decision system (client identifier, session identifier, and delay time).
S44, the decision system makes the decision and returns the decision result to the pickup terminal.
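The session handling in S42 (and in step S204 of the second embodiment) can be sketched as a small state holder. The class name, the 0.5 second timeout, and the use of a random hex string as the session identifier are assumptions; the text does not say when a finished session is cleared, so this sketch leaves that out.

```python
import time
import uuid
from typing import Callable, Optional, Tuple


class SessionManager:
    """Tracks the single active decision session (illustrative only)."""

    def __init__(self, timeout_s: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._session_id: Optional[str] = None
        self._allocated_at = 0.0

    def handle_request(self) -> Tuple[str, Optional[str]]:
        """Return ("continue", session_id) or ("sleep", None)."""
        now = self._clock()
        if self._session_id is None:
            # No active session: allocate S and record allocation time T.
            self._session_id = uuid.uuid4().hex
            self._allocated_at = now
            return "continue", self._session_id
        if now - self._allocated_at > self._timeout_s:
            # Active session has timed out: the late terminal is told to sleep.
            return "sleep", None
        return "continue", self._session_id
```

Terminals whose requests arrive within the timeout share one session identifier, which is what lets the decision system compare them in a single evaluation.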
Referring to fig. 5, fig. 5 provides a distributed voice response apparatus, the apparatus comprising:
a communication unit 501 for receiving a voice signal of at least one sound pickup terminal;
A processing unit 502, configured to perform speech recognition on the speech signal to determine whether the speech signal has a wake-up keyword; if the wake-up keyword exists, acquiring the position information, response time and energy information of voice information of the pick-up terminal; and selecting a response terminal of the voice signal from at least one pickup terminal according to the position information, the response time and the energy information.
The distributed voice response device provided by the application receives a voice signal of at least one pickup terminal, identifies the voice signal to determine whether the voice signal has a wake-up keyword, and acquires the position information, response time and energy information of the voice information of the pickup terminal if the voice signal has the wake-up keyword; and selecting the response terminal of the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information, and further determining the response terminal through multiple dimensions, namely, through multiple aspects of the position information, the response time and the energy information, so as to meet the diversified requirements of users on voice response, and further improve the user experience.
In an alternative arrangement, the first and second modules,
The processing unit 502 is specifically configured to construct a comprehensive evaluation set, where the comprehensive evaluation set includes a plurality of levels; respectively acquiring a plurality of evaluation values of the position information, the response time and the energy information in the multi-level, and forming an evaluation matrix by the plurality of evaluation values; selecting a response terminal of the voice signal according to the evaluation matrix and a preset weight set;
the weight set is a weight value of the position information, the response time and the energy information.
The manner of obtaining the location information, the response time, and the energy information may be described in the embodiment shown in fig. 1, which is not described herein.
In an alternative arrangement, the first and second modules,
The processing unit 502 is specifically configured to calculate a product of the evaluation matrix and the weight set to obtain a product result, and select a response terminal of the voice signal according to the magnitude of the product result.
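By way of non-limiting illustration, the product of the evaluation matrix and the weight set might be computed as in the following Python sketch. The sketch assumes one reading of the passage above: the evaluation matrix has one row per factor (position information, response time, energy information) and one column per level, and the weight set holds one weight per factor. The function name, the three levels, and all numeric values are hypothetical.

```python
def weighted_product(weights, evaluation_matrix):
    """Multiply a weight vector by an evaluation matrix.

    Returns one aggregated evaluation value per level (matrix column).
    """
    if len(weights) != len(evaluation_matrix):
        raise ValueError("need exactly one weight per matrix row")
    n_levels = len(evaluation_matrix[0])
    return [
        sum(w * row[level] for w, row in zip(weights, evaluation_matrix))
        for level in range(n_levels)
    ]


# Rows: position, response time, energy. Columns: assumed levels good, medium, poor.
evaluation_matrix = [
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
]
weights = [0.5, 0.25, 0.25]  # assumed preset weight set
product_result = weighted_product(weights, evaluation_matrix)
```

How the product results of different pickup terminals are compared by magnitude is not fixed by this passage, so the sketch stops at the product itself.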
In an optional implementation,
The processing unit 502 is further configured to multiply the product result by a preset coefficient set to obtain a decision score, and to select the pickup terminal with the highest decision score as the response terminal of the voice signal;
the coefficient set includes the coefficients of the position information, the response time, and the energy information.
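By way of non-limiting illustration, the decision score and the selection of the response terminal might be sketched in Python as follows. The sketch follows the per-factor structure recited in claim 1, in which each factor has its own product result (fuzzy vector) and its own level scores. Summing the three per-factor products into one score is an assumption, as are the function names, the terminal identifiers, and all numeric values.

```python
FACTORS = ("position", "response_time", "energy")


def decision_score(product_results, coefficient_set):
    """Sum, over the three factors, of product result times level coefficients."""
    return sum(
        sum(b * c for b, c in zip(product_results[f], coefficient_set[f]))
        for f in FACTORS
    )


def select_response_terminal(candidates, coefficient_set):
    """Return the id of the pickup terminal with the highest decision score."""
    return max(
        candidates,
        key=lambda tid: decision_score(candidates[tid], coefficient_set),
    )


# Assumed coefficients for three levels (good, medium, poor), same for each factor.
coefficient_set = {f: [1.0, 0.5, 0.0] for f in FACTORS}

# Hypothetical per-factor product results for two pickup terminals.
candidates = {
    "speaker": {
        "position": [0.5, 0.5, 0.0],
        "response_time": [0.0, 1.0, 0.0],
        "energy": [1.0, 0.0, 0.0],
    },
    "tv": {
        "position": [0.0, 0.5, 0.5],
        "response_time": [1.0, 0.0, 0.0],
        "energy": [0.0, 0.5, 0.5],
    },
}
chosen = select_response_terminal(candidates, coefficient_set)
```

With these hypothetical values the terminal labeled "speaker" scores 2.25 against 1.5 for "tv" and is selected.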
In an optional implementation,
The processing unit 502 is specifically configured to obtain the position information by: acquiring the distance and the time delay between the receiving-end microphone and the pickup terminal microphone, and estimating the position information according to the distance, the time delay, and the sound velocity.
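By way of non-limiting illustration, one common way to turn a microphone spacing, a time delay, and the sound velocity into position information is a far-field arrival-angle estimate, sketched below in Python. The application does not state which estimation formula is used, so the far-field model, the function names, and the sound velocity of 343 m/s are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius


def path_difference_m(delay_s, speed=SPEED_OF_SOUND_M_S):
    """Extra distance the sound travels to reach the farther microphone."""
    return speed * delay_s


def arrival_angle_deg(mic_distance_m, delay_s, speed=SPEED_OF_SOUND_M_S):
    """Far-field angle between the microphone axis and the sound source."""
    if mic_distance_m <= 0:
        raise ValueError("microphone distance must be positive")
    ratio = path_difference_m(delay_s, speed) / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.acos(ratio))


# Hypothetical measurement: microphones 0.686 m apart, 1 ms delay.
angle = arrival_angle_deg(0.686, 0.001)
broadside = arrival_angle_deg(1.0, 0.0)
```

A zero delay corresponds to a source perpendicular to the line joining the two microphones, and a delay equal to the spacing divided by the sound velocity corresponds to a source on that line.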
In an optional implementation,
The processing unit 502 is specifically configured to receive a decision request packet sent by the pickup terminal and return a session identification packet to the pickup terminal; and to receive the response time returned by the pickup terminal, where the response time equals the receiving time of the session identification packet minus the sending time of the decision request packet.
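By way of non-limiting illustration, the pickup terminal might measure the response time as in the following Python sketch: it records the sending time of the decision request packet, waits for the session identification packet, and returns the difference. The function name, the callable that stands in for the network exchange, and the injectable clock are assumptions made for illustration.

```python
import time


def measure_response_time(exchange_packets, clock=time.monotonic):
    """Return (session_id, response_time) for one decision request.

    exchange_packets is a hypothetical callable that sends the decision
    request packet and blocks until the session identification packet arrives.
    """
    sent_at = clock()                # sending time of the decision request packet
    session_id = exchange_packets()
    received_at = clock()            # receiving time of the session identification packet
    return session_id, received_at - sent_at


# Deterministic demonstration with a fake clock instead of a real network.
fake_ticks = iter([10.0, 10.25])
session_id, response_time = measure_response_time(
    lambda: "session-1",
    clock=lambda: next(fake_ticks),
)
```

A monotonic clock is used by default in this sketch so that adjustments to the wall clock during the exchange cannot produce a negative response time.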
The embodiment of the application can divide the functional units of the electronic device according to the method example, for example, each functional unit can be divided corresponding to each function, and two or more functions can be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program makes a computer execute part or all of the steps of any one of the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above. The computer program product may be a software installation package, said computer comprising an electronic device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division of logical functions, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
Those of ordinary skill in the art will appreciate that all or some of the steps in the various methods of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable memory, which may include: a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments of the application have been described in detail above, and specific examples have been used to explain the principles and implementations of the application; the description of the above embodiments is provided solely to facilitate understanding of the method and core concepts of the application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (8)

1. A method of distributed speech response, the method comprising the steps of:
Receiving a voice signal of at least one pickup terminal, and identifying the voice signal to determine whether the voice signal has a wake-up keyword;
if the wake-up keyword exists, acquiring the position information, response time and energy information of voice information of the pick-up terminal;
The energy information is a short-time average energy value of the voice signal, and the response time is a difference value between the receiving time of the session identification packet and the sending time of the decision request packet;
selecting a response terminal of the voice signal from the at least one pickup terminal according to the position information, the response time and the energy information; the method specifically comprises the following steps:
Constructing a comprehensive evaluation set, wherein the comprehensive evaluation set comprises a plurality of grades;
Respectively establishing single-factor fuzzy evaluation of the position information, the response time and the energy information, and respectively acquiring evaluation matrixes of the position information, the response time and the energy information to obtain 3 evaluation matrixes;
Respectively determining factor weight vectors corresponding to the 3 evaluation matrixes, and respectively calculating fuzzy vectors of the position information, the response time and the energy information to obtain 3 fuzzy vectors, wherein the fuzzy vectors are products of the evaluation matrixes and the factor weight vectors;
and calculating the total score of the pickup terminal, wherein the total score is the product of the fuzzy vector and the corresponding factor grade score, and selecting the pickup terminal with the highest total score as the response terminal.
2. The method of claim 1, wherein selecting the response terminal for the speech signal according to the evaluation matrix and the predetermined set of weights comprises:
and calculating the product of the evaluation matrix and the weight set to obtain a product result, and selecting a response terminal of the voice signal according to the size of the product result.
3. The method of claim 2, further comprising, after calculating the product of the evaluation matrix and the set of weights to obtain a product result:
Multiplying the product result by a preset coefficient set to obtain a decision score, and selecting the pickup terminal with the highest decision score as the response terminal of the voice signal;
the coefficient set includes: location information, response time, and coefficients of energy information.
4. The method according to claim 1, wherein obtaining location information comprises:
And acquiring the distance and time delay between the receiving end microphone and the pickup terminal microphone, and estimating the position information according to the distance, the time delay and the sound velocity.
5. The method according to claim 1, wherein obtaining the response time comprises:
receiving a decision request packet sent by a pickup terminal, and returning a session identification packet to the pickup terminal;
and receiving response time returned by the pickup terminal, wherein the response time is the difference between the receiving time of the session identification packet and the sending time of the decision request packet.
6. A distributed voice response apparatus, the apparatus comprising:
a communication unit for receiving a voice signal of at least one sound pickup terminal;
The processing unit is used for carrying out voice recognition on the voice signal to determine whether the voice signal has a wake-up keyword or not; if the wake-up keyword exists, acquiring the position information, response time and energy information of voice information of the pick-up terminal; selecting a response terminal of the voice signal from at least one pickup terminal according to the position information, the response time and the energy information;
The energy information is a short-time average energy value of the voice signal, and the response time is a difference value between the receiving time of the session identification packet and the sending time of the decision request packet;
the processing unit is specifically configured to construct a comprehensive evaluation set, where the comprehensive evaluation set includes a plurality of levels;
Respectively establishing single-factor fuzzy evaluation of the position information, the response time and the energy information, and respectively acquiring evaluation matrixes of the position information, the response time and the energy information to obtain 3 evaluation matrixes;
Respectively determining factor weight vectors corresponding to the 3 evaluation matrixes, and respectively calculating fuzzy vectors of the position information, the response time and the energy information to obtain 3 fuzzy vectors, wherein the fuzzy vectors are products of the evaluation matrixes and the factor weight vectors;
and calculating the total score of the pickup terminal, wherein the total score is the product of the fuzzy vector and the corresponding factor grade score, and selecting the pickup terminal with the highest total score as the response terminal.
7. An intelligent terminal, characterized in that the intelligent terminal comprises: a processor and a memory coupled to each other;
Wherein the processor is configured to invoke a computer program stored in the memory to perform the method of any of claims 1 to 5.
8. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any one of claims 1-5.
CN202011596480.4A 2020-12-29 2020-12-29 Distributed voice response method and related device Active CN112735391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011596480.4A CN112735391B (en) 2020-12-29 2020-12-29 Distributed voice response method and related device

Publications (2)

Publication Number Publication Date
CN112735391A (en) 2021-04-30
CN112735391B (en) 2024-05-31

Family

ID=75610148

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant