CN108962235B - Voice interaction method and device - Google Patents


Info

Publication number
CN108962235B
Authority
CN
China
Prior art keywords
content acquisition
acquisition instruction
instruction
content
skill field
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711446766.2A
Other languages
Chinese (zh)
Other versions
CN108962235A (en)
Inventor
高慧湍
韩伟
李茂全
李宝祥
修铭徽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Application filed by Beijing Orion Star Technology Co Ltd
Priority to CN201711446766.2A
Publication of CN108962235A
Application granted
Publication of CN108962235B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice interaction method and device. The method comprises: receiving a first content acquisition instruction and acquiring content according to it; if a second content acquisition instruction is received within a preset time period, judging whether the second and first content acquisition instructions belong to the same skill field or to related skill fields; and if they do, acquiring content according to the second content acquisition instruction. This allows a user to express one intention through multiple content acquisition instructions. Because noise and other sounds in the surrounding environment are generally unrelated to the skill field of the user's content acquisition instruction, the method prevents the voice device from mistakenly executing voice instructions picked up from the surroundings, which improves voice interaction efficiency and the user's experience with the voice device.

Description

Voice interaction method and device
Technical Field
The invention relates to the technical field of voice equipment, and in particular to a voice interaction method and device.
Background
Current voice interaction methods fall mainly into two schemes. In the first, the voice device executes only one voice instruction after each wake-up. In the second, the device executes any voice instruction received within a certain time period after each wake-up. The first scheme forces the user to wake the voice device frequently; in particular, when the user cannot express an intention through a single voice instruction, effective interaction between the user and the device is hard to achieve. In the second scheme, because voice devices are generally used in open environments with considerable noise and background sound, the device easily "mistakenly" executes voice instructions from the surroundings. This also makes effective interaction difficult and reduces both voice interaction efficiency and the user's experience with the voice device.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present invention is to provide a voice interaction method that addresses the poor voice interaction efficiency and degraded user experience of voice devices in the prior art.
The second objective of the present invention is to provide a voice interaction device.
A third object of the invention is to propose an electronic device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
To achieve the above objectives, an embodiment of a first aspect of the present invention provides a voice interaction method, including:
receiving a first content acquisition instruction, and acquiring content according to the first content acquisition instruction;
if a second content acquisition instruction is received within a preset time period, judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field;
and if the second content acquisition instruction and the first content acquisition instruction are determined to belong to the same skill field or the related skill field, acquiring content according to the second content acquisition instruction.
Further, the method further comprises:
if it is determined that the second content acquisition instruction and the first content acquisition instruction belong neither to the same skill field nor to a related skill field, not responding to the second content acquisition instruction.
Further, if it is determined that the second content acquisition instruction and the first content acquisition instruction belong to the same skill field, acquiring content according to the second content acquisition instruction specifically includes:
acquiring content according to the analysis result of the second content acquisition instruction combined with the analysis result of the first content acquisition instruction;
if it is determined that the second content acquisition instruction and the first content acquisition instruction belong to a related skill field, acquiring content according to the second content acquisition instruction specifically includes:
acquiring content according to the analysis result of the second content acquisition instruction.
Further, the preset time period is determined according to the skill field to which the first content acquisition instruction belongs.
Further, judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field specifically includes:
determining a first skill field to which the first content acquisition instruction belongs and a second skill field to which the second content acquisition instruction belongs according to an instruction analysis result;
if the first skill field is the same as the second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the same skill field;
if the first skill field is different from the second skill field, inquiring a preset related field mapping rule, and determining a preset related skill field corresponding to the first skill field;
and if the preset related skill field comprises the second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the related skill field.
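The judgment in the steps above can be sketched as a small function. This is a minimal Python sketch, not the patented implementation, assuming skill fields are plain strings and that the preset related field mapping rule is a dictionary from a skill field to its set of related fields; the names `judge_relation` and `RELATED_FIELDS` and the example fields are hypothetical.

```python
from typing import Mapping, Optional, Set

# Hypothetical preset related-field mapping rule: skill field -> preset related skill fields.
RELATED_FIELDS: Mapping[str, Set[str]] = {
    "weather": {"ride_hailing"},
    "music": {"audiobook"},
}


def judge_relation(first_field: str, second_field: str,
                   related: Mapping[str, Set[str]] = RELATED_FIELDS) -> Optional[str]:
    """Return "same", "related", or None when the fields are neither."""
    # Same skill field: the second instruction continues the first.
    if first_field == second_field:
        return "same"
    # Otherwise query the mapping rule for the fields related to the FIRST field.
    if second_field in related.get(first_field, set()):
        return "related"
    return None


print(judge_relation("music", "music"))           # same
print(judge_relation("weather", "ride_hailing"))  # related
print(judge_relation("ride_hailing", "weather"))  # None
```

Because the lookup is keyed on the first skill field, the relation in this sketch is directional: "weather" followed by "ride_hailing" is related, but the reverse is not unless the mapping rule also lists that direction.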
Further, before judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field, the method further includes:
determining that the second content acquisition instruction is not a wake-up instruction.
Further, the method further comprises: if the second content acquisition instruction is a wake-up instruction, responding to the wake-up instruction.
The voice interaction method provided by this embodiment receives a first content acquisition instruction and acquires content according to it; if a second content acquisition instruction is received within a preset time period, it judges whether the two instructions belong to the same skill field or related skill fields, and if so, acquires content according to the second content acquisition instruction. A user can thus express one intention through multiple content acquisition instructions. Because noise and other sounds in the surrounding environment are generally unrelated to the skill field of the user's content acquisition instruction, this embodiment prevents the voice device from mistakenly executing voice instructions from the surroundings, improving voice interaction efficiency and the user's experience with the voice device.
To achieve the above objectives, an embodiment of a second aspect of the present invention provides a voice interaction apparatus, including:
the acquisition module is used for receiving a first content acquisition instruction and acquiring content according to the first content acquisition instruction;
the judging module is used for judging whether a second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field when the second content acquisition instruction is received within a preset time period;
the acquisition module is further configured to acquire content according to the second content acquisition instruction when it is determined that the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field.
Further, the device further comprises:
a processing module, configured not to respond to the second content acquisition instruction when it is determined that the second content acquisition instruction and the first content acquisition instruction belong neither to the same skill field nor to a related skill field.
Further, the acquisition module is specifically configured to:
when the second content acquisition instruction and the first content acquisition instruction belong to the same skill field, acquire content according to the analysis result of the second content acquisition instruction combined with the analysis result of the first content acquisition instruction; and
when it is determined that the second content acquisition instruction and the first content acquisition instruction belong to a related skill field, acquire content according to the analysis result of the second content acquisition instruction.
Further, the preset time period is determined according to the skill field to which the first content acquisition instruction belongs.
Further, the judging module is specifically configured to,
determining a first skill field to which the first content acquisition instruction belongs and a second skill field to which the second content acquisition instruction belongs according to an instruction analysis result;
if the first skill field is the same as the second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the same skill field;
if the first skill field is different from the second skill field, inquiring a preset related field mapping rule, and determining a preset related skill field corresponding to the first skill field;
and if the preset related skill field comprises the second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the related skill field.
Further, the judging module is further configured to determine that the second content acquisition instruction is not a wake-up instruction before judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field.
Further, the device further comprises:
a response module, configured to respond to the wake-up instruction when the second content acquisition instruction is a wake-up instruction.
The voice interaction apparatus provided by this embodiment receives a first content acquisition instruction and acquires content according to it; if a second content acquisition instruction is received within a preset time period, it judges whether the two instructions belong to the same skill field or related skill fields, and if so, acquires content according to the second content acquisition instruction. A user can thus express one intention through multiple content acquisition instructions. Because noise and other sounds in the surrounding environment are generally unrelated to the skill field of the user's content acquisition instruction, this embodiment prevents the voice device from mistakenly executing voice instructions from the surroundings, improving voice interaction efficiency and the user's experience with the voice device.
To achieve the above objectives, an embodiment of a third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the voice interaction method described above when executing the program.
To achieve the above objectives, an embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the voice interaction method described above.
To achieve the above objectives, an embodiment of a fifth aspect of the present invention provides a computer program product; when the instructions in the computer program product are executed by a processor, the voice interaction method described above is implemented.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Fig. 1 is a schematic flow chart of a voice interaction method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
The following describes a voice interaction method and apparatus according to an embodiment of the present invention with reference to the drawings.
Fig. 1 is a schematic flow chart of a voice interaction method according to an embodiment of the present invention. As shown in fig. 1, the voice interaction method includes the following steps:
S101, receiving a first content acquisition instruction, and acquiring content according to the first content acquisition instruction.
The voice interaction method provided by the invention is executed by a voice interaction apparatus, which may be the voice device itself or a background server corresponding to the voice device. The voice device may be, for example, a smart speaker, smart air conditioner, smart washing machine, or smart television that can interact with a user by voice and perform corresponding operations according to the user's instructions.
In this embodiment, when the voice interaction apparatus is a background server corresponding to the voice device, the first content acquisition instruction may be acquired as follows: while interacting with the user, the voice device monitors and captures the user's voice instruction and sends it directly to the background server. After acquiring the first content acquisition instruction, the background server may perform speech recognition on it to obtain its analysis result, and then acquire content according to that analysis result.
In this embodiment, when the voice interaction apparatus is the voice device itself, the first content acquisition instruction may be obtained by monitoring and capturing the user's voice instruction while the voice device interacts with the user. The voice interaction apparatus may then perform speech recognition on the first content acquisition instruction to obtain its analysis result, and acquire content according to that analysis result.
In this embodiment, the content may be the result of responding to the first content acquisition instruction. For example, when the first content acquisition instruction is "I want to listen to Water of Forgetfulness", the corresponding content may be the "Passerby A" version of Water of Forgetfulness; when it is "I want to listen to Logical Thinking", the corresponding content may be episode 12 of Logical Thinking; and when it is "Check the weather", the corresponding content may be "It is raining".
S102, if the second content acquisition instruction is received within the preset time period, judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field.
The preset time period is determined according to the skill field to which the first content acquisition instruction belongs. In this embodiment, before step S102 and after receiving the first content acquisition instruction, the voice interaction apparatus may determine, from the analysis result of the first content acquisition instruction, the first skill field to which it belongs, determine the preset time period corresponding to the first skill field, start timing, and check whether a second content acquisition instruction is received within the preset time period. If no second content acquisition instruction is received within the preset time period, the voice interaction ends.
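The per-field timing can be sketched with a lookup table of window lengths. This is a minimal sketch under stated assumptions: the window values, the fallback default, and the names `WINDOWS_S`, `preset_window`, and `arrives_in_window` are hypothetical, since the patent leaves the durations to be set according to actual needs.

```python
from typing import Mapping

# Hypothetical listening windows (seconds), keyed by the FIRST instruction's skill field.
WINDOWS_S: Mapping[str, float] = {"music": 20.0, "audiobook": 30.0, "weather": 10.0}
DEFAULT_WINDOW_S = 8.0  # assumed fallback for fields without an explicit entry


def preset_window(first_field: str) -> float:
    """Preset time period determined by the first instruction's skill field."""
    return WINDOWS_S.get(first_field, DEFAULT_WINDOW_S)


def arrives_in_window(t_first: float, t_second: float, first_field: str) -> bool:
    """True if the second instruction arrives within the window opened by the first."""
    elapsed = t_second - t_first
    return 0.0 <= elapsed <= preset_window(first_field)


# A follow-up 15 s later is inside a music window but outside a weather window.
print(arrives_in_window(0.0, 15.0, "music"))    # True
print(arrives_in_window(0.0, 15.0, "weather"))  # False
```

On a real device the timestamps would come from a monotonic clock such as Python's `time.monotonic()`, so that wall-clock adjustments cannot stretch or shrink the window.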
In this embodiment, when the voice interaction apparatus is a background server corresponding to the voice device, after the voice interaction ends the apparatus may send a stop-interaction instruction to the voice device. The voice device then stops receiving voice instructions until it receives the user's wake-up instruction; after performing the wake-up operation, it again receives voice instructions and sends them to the voice interaction apparatus. When the voice interaction apparatus is the voice device itself, after the voice interaction ends it stops receiving voice instructions until it receives the user's wake-up instruction, and after performing the wake-up operation it again receives the user's voice instructions.
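The listening behaviour described above amounts to a two-state gate on the voice device. A minimal sketch, assuming the device can already tell a wake-up instruction apart from other speech; the class `ListeningGate` and its method names are hypothetical.

```python
class ListeningGate:
    """Device-side gate: once a voice interaction ends, only a wake-up is acted on."""

    def __init__(self) -> None:
        self.listening = False  # asleep until the user wakes the device

    def stop_interaction(self) -> None:
        # Called when the session ends, e.g. on a stop-interaction instruction
        # from the background server or when the local timer expires.
        self.listening = False

    def feed(self, is_wake_up: bool) -> str:
        """Decide what to do with one captured utterance."""
        if is_wake_up:
            self.listening = True  # perform the wake-up operation
            return "wake_up"
        # Forward (to the server or the local recognizer) only while listening.
        return "forward" if self.listening else "drop"


gate = ListeningGate()
print(gate.feed(False))  # drop
print(gate.feed(True))   # wake_up
print(gate.feed(False))  # forward
gate.stop_interaction()
print(gate.feed(False))  # drop
```

The same gate fits both deployments: with a background server, `stop_interaction` would be triggered by the server's message; with a standalone device, by its own timer.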
In this embodiment, when the voice interaction apparatus is a background server corresponding to the voice device, a first way for the apparatus to determine the first skill field from the analysis result of the first content acquisition instruction is as follows: input the analysis result into a preset skill field model to obtain the probability that the analysis result belongs to each skill field, and determine the first skill field to which the first content acquisition instruction belongs from these probabilities. The preset skill field model may be obtained by training on a large number of sentences or words corresponding to each skill field.
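The model-based way can be sketched as taking the most probable field from whatever the model returns. This is a minimal sketch, assuming the model is any callable that maps a parse result to a probability per skill field and that the highest probability wins; the patent does not specify the model, and `stub_model` is a hypothetical stand-in with fixed outputs, not a trained classifier.

```python
from typing import Callable, Mapping

# Assumed interface of the preset skill field model: parse result -> probability per field.
SkillModel = Callable[[str], Mapping[str, float]]


def classify_by_model(parse_result: str, model: SkillModel) -> str:
    """Return the skill field with the highest probability for this parse result."""
    probs = model(parse_result)
    if not probs:
        raise ValueError("model returned no skill fields")
    return max(probs, key=lambda field: probs[field])


def stub_model(text: str) -> Mapping[str, float]:
    """Hypothetical stand-in for a trained model; returns fixed probabilities."""
    if "weather" in text.lower():
        return {"weather": 0.85, "music": 0.10, "ride_hailing": 0.05}
    return {"weather": 0.10, "music": 0.80, "ride_hailing": 0.10}


print(classify_by_model("Check the weather", stub_model))                          # weather
print(classify_by_model("I want to listen to Water of Forgetfulness", stub_model))  # music
```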
When the voice interaction apparatus is a background server corresponding to the voice device, another way to determine the first skill field is as follows: perform word segmentation on the analysis result of the first content acquisition instruction to obtain a word segmentation result; compare each word in the result with the words of each skill field to count how many words belong to each skill field; and determine the first skill field to which the first content acquisition instruction belongs according to these counts.
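The word-count way can be sketched with per-field vocabularies and a counter. This is a minimal sketch, assuming English text with a regex tokenizer standing in for word segmentation (Chinese input would need a real segmenter) and assuming the field with the most matching words wins; `FIELD_WORDS` and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Mapping, Optional, Set

# Hypothetical per-field vocabularies ("the words in each skill field").
FIELD_WORDS: Mapping[str, Set[str]] = {
    "music": {"listen", "song", "play", "album"},
    "weather": {"weather", "rain", "temperature", "forecast"},
    "ride_hailing": {"car", "taxi", "ride", "driver"},
}


def segment(text: str) -> List[str]:
    """Stand-in for word segmentation: lowercase and extract alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_by_word_count(parse_result: str,
                           field_words: Mapping[str, Set[str]] = FIELD_WORDS) -> Optional[str]:
    """Count matching words per skill field and return the field with the most hits."""
    counts: Counter = Counter()
    for word in segment(parse_result):
        for field, words in field_words.items():
            if word in words:
                counts[field] += 1
    if not counts:
        return None  # no word matched any skill field
    # most_common keeps first-encountered order for ties, so ties go to the earlier match.
    return counts.most_common(1)[0][0]


print(classify_by_word_count("Play a song, I want to listen"))  # music
print(classify_by_word_count("Call me a car"))                   # ride_hailing
```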
Of course, the first skill field to which the first content acquisition instruction belongs may also be determined in other ways, which are not enumerated here.
The second skill field to which the second content acquisition instruction belongs may be determined in the same way as the first skill field, which is not repeated here.
In this embodiment, the preset time period corresponding to each skill field may be set according to actual needs, and is not specifically limited herein.
If the second content acquisition instruction is received within the preset time period, a first way for the voice interaction apparatus to judge whether the second and first content acquisition instructions belong to the same skill field or a related skill field is as follows: determine, from the instruction analysis results, the first skill field to which the first content acquisition instruction belongs and the second skill field to which the second content acquisition instruction belongs; if the first skill field is the same as the second skill field, determine that the two instructions belong to the same skill field; if they differ, query a preset related-field mapping rule to determine the preset related skill fields corresponding to the first skill field; if these include the second skill field, determine that the two instructions belong to a related skill field; and if they do not, determine that the two instructions belong neither to the same skill field nor to a related skill field.
The preset related-field mapping rule stores, for each skill field, its corresponding preset related skill fields.
A second way for the voice interaction apparatus to make this judgment, when the second content acquisition instruction is received within the preset time period, is as follows: determine the first and second skill fields from the instruction analysis results; query the preset related-field mapping rule to determine the preset related skill fields corresponding to the first skill field; if these include the second skill field, determine that the two instructions belong to a related skill field; otherwise, judge whether the first skill field is the same as the second skill field, and if so, determine that the two instructions belong to the same skill field; if not, determine that the two instructions belong neither to the same skill field nor to a related skill field.
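The second way differs from the first only in the order of the two checks. A minimal sketch, assuming the mapping rule is a dictionary from a skill field to a set of related fields; the function name and example fields are hypothetical.

```python
from typing import Mapping, Optional, Set


def judge_related_first(first_field: str, second_field: str,
                        related: Mapping[str, Set[str]]) -> Optional[str]:
    """Second way: consult the related-field mapping rule before testing sameness."""
    if second_field in related.get(first_field, set()):
        return "related"
    if first_field == second_field:
        return "same"
    return None  # neither the same nor a related skill field


rule = {"weather": {"ride_hailing"}}
print(judge_related_first("weather", "ride_hailing", rule))  # related
print(judge_related_first("music", "music", rule))           # same
print(judge_related_first("music", "weather", rule))         # None
```

Under these assumptions, the two orders give the same decision except when the mapping rule lists a field as related to itself. With `{"music": {"music"}}`, this order returns "related" for two music instructions, so the second instruction would be handled on its own analysis result rather than combined with the first; keeping each field out of its own related set avoids the discrepancy.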
Further, on the basis of the above embodiment, before judging whether the second and first content acquisition instructions belong to the same skill field or a related skill field, the voice interaction apparatus may first determine whether the second content acquisition instruction is a wake-up instruction. If it is not, the apparatus proceeds to judge whether the two instructions belong to the same skill field or a related skill field; if it is, the apparatus responds to the wake-up instruction.
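The ordering of checks can be sketched as a small dispatcher. This is a minimal sketch, assuming a wake-up instruction can be recognized by comparing the recognized text against a set of wake phrases; the phrase "hello robot", the set `WAKE_PHRASES`, and the function `handle_second` are hypothetical, and real devices usually detect wake words acoustically.

```python
from typing import Mapping, Set

# Hypothetical wake-up phrase set.
WAKE_PHRASES: Set[str] = {"hello robot"}


def handle_second(text: str, first_field: str, second_field: str,
                  related: Mapping[str, Set[str]]) -> str:
    """Check for a wake-up instruction first, then judge the skill fields."""
    if text.strip().lower() in WAKE_PHRASES:
        return "respond_to_wake_up"
    if first_field == second_field or second_field in related.get(first_field, set()):
        return "acquire_content"
    return "ignore"  # unrelated speech, e.g. background noise


print(handle_second("Hello robot", "music", "unknown", {}))                         # respond_to_wake_up
print(handle_second("I want to listen to Liu Dehua", "music", "music", {}))        # acquire_content
print(handle_second("Call me a car", "weather", "ride_hailing",
                    {"weather": {"ride_hailing"}}))                                 # acquire_content
```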
S103, if the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or a related skill field, acquiring content according to the second content acquisition instruction.
In this embodiment, if the second and first content acquisition instructions are determined to belong to the same skill field, the voice interaction apparatus may acquire content according to the analysis result of the second content acquisition instruction combined with the analysis result of the first content acquisition instruction. If they are determined to belong to a related skill field, the apparatus may acquire content according to the analysis result of the second content acquisition instruction.
For example, when the first content acquisition instruction is "I want to listen to Water of Forgetfulness" and the second is "I want to listen to Liu Dehua", the two instructions belong to the same skill field, and the corresponding content may be Liu Dehua's Water of Forgetfulness. When the first content acquisition instruction is "I want to listen to Logical Thinking" and the second is "Episode 9", the two instructions belong to the same skill field, and the corresponding content may be episode 9 of Logical Thinking.
As another example, when the first content acquisition instruction is "Check the weather" and the second is "Call me a car, I also need to get to the company", the two instructions belong to a related skill field, and the corresponding content may be "start the ride-hailing function": for example, launching the ride-hailing software according to the current location, automatically entering the company address, and booking the trip.
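The two handling rules can be sketched by treating each analysis result as a dictionary of slots. This is a minimal sketch, assuming an analysis result is a flat slot dictionary and that, in the same skill field, newer slot values override older ones; the slot names and the function `build_request` are hypothetical, not the patent's data format.

```python
from typing import Any, Dict, Optional

Slots = Dict[str, Any]  # assumed shape of an analysis result: slot name -> value


def build_request(relation: Optional[str], first: Slots, second: Slots) -> Optional[Slots]:
    """Combine analysis results according to the judged relation."""
    if relation == "same":
        # Same skill field: the second instruction refines the first,
        # so merge the slots, letting newer values override older ones.
        return {**first, **second}
    if relation == "related":
        # Related skill field: the second instruction is acted on by itself.
        return dict(second)
    return None  # neither: no response


print(build_request("same",
                    {"field": "music", "song": "Water of Forgetfulness"},
                    {"field": "music", "artist": "Liu Dehua"}))
# {'field': 'music', 'song': 'Water of Forgetfulness', 'artist': 'Liu Dehua'}
print(build_request("same",
                    {"field": "audio", "program": "Logical Thinking"},
                    {"field": "audio", "episode": 9}))
# {'field': 'audio', 'program': 'Logical Thinking', 'episode': 9}
```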
In addition, the method further includes: if the second and first content acquisition instructions are determined to belong neither to the same skill field nor to a related skill field, the voice interaction apparatus does not respond to the second content acquisition instruction, continues timing, and checks whether a third content acquisition instruction is received before the preset time period elapses; if no third content acquisition instruction is received, the voice interaction ends.
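The whole session can be sketched as a loop over timestamped follow-up instructions. This is a minimal sketch under two loudly stated assumptions the patent does not settle: the timer starts at the first instruction and is never reset, and every later instruction is compared with the first instruction's skill field. The function `run_session` and its parameters are hypothetical, and wake-up handling is left out for brevity.

```python
from typing import List, Mapping, Sequence, Set, Tuple


def run_session(first_t: float, first_field: str,
                later: Sequence[Tuple[float, str]],
                window_s: float,
                related: Mapping[str, Set[str]]) -> List[str]:
    """Decide the action for each later (arrival_time, skill_field) instruction.

    Assumptions: the timer starts at the first instruction and is never reset,
    and every later instruction is compared with the first instruction's field.
    """
    deadline = first_t + window_s
    actions: List[str] = []
    for t, field in sorted(later):
        if t > deadline:
            actions.append("session_ended")  # a wake-up is needed before listening again
            break
        if field == first_field or field in related.get(first_field, set()):
            actions.append("acquire")
        else:
            actions.append("ignore")  # no response; timing continues
    return actions


print(run_session(0.0, "music",
                  [(3.0, "weather"), (6.0, "music"), (25.0, "music")],
                  window_s=20.0, related={}))
# ['ignore', 'acquire', 'session_ended']
```

An ignored instruction does not restart the timer, which matches "continues timing" above; whether an accepted instruction should extend the window is a design choice the text leaves open.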
The voice interaction method provided by this embodiment receives a first content acquisition instruction and acquires content according to it; if a second content acquisition instruction is received within a preset time period, it judges whether the two instructions belong to the same skill field or related skill fields, and if so, acquires content according to the second content acquisition instruction. A user can thus express one intention through multiple content acquisition instructions. Because noise and other sounds in the surrounding environment are generally unrelated to the skill field of the user's content acquisition instruction, this embodiment prevents the voice device from mistakenly executing voice instructions from the surroundings, improving voice interaction efficiency and the user's experience with the voice device.
Fig. 2 is a schematic structural diagram of a voice interaction apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes an acquisition module 21 and a judgment module 22.
The acquisition module 21 is configured to receive a first content acquisition instruction and acquire content according to the first content acquisition instruction;
the judgment module 22 is configured to, when a second content acquisition instruction is received within a preset time period, determine whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field or to related skill fields;
the acquisition module 21 is further configured to acquire content according to the second content acquisition instruction when it is determined that the two instructions belong to the same skill field or to related skill fields.
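Read as software, the two modules might be wired as in the sketch below. It is a hedged illustration rather than the patented implementation. Several details are assumptions made for illustration: the class names, the `fetch` callback, the single fixed window length, and the use of a dictionary for the related-field rule. Skill-field labels are assumed to come from an upstream parser.

```python
from typing import Callable, Dict, Optional, Set


class JudgmentModule:
    """Hypothetical stand-in for judgment module 22."""

    def __init__(self, related_fields: Dict[str, Set[str]]):
        self.related_fields = related_fields

    def accepts(self, first_skill: str, second_skill: str) -> bool:
        # Same skill field, or listed as related to the first skill field.
        return (second_skill == first_skill
                or second_skill in self.related_fields.get(first_skill, set()))


class AcquisitionModule:
    """Hypothetical stand-in for acquisition module 21."""

    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch  # e.g. a music or weather back end

    def acquire(self, query: str) -> str:
        return self.fetch(query)


class VoiceInteractionApparatus:
    def __init__(self, acquisition: AcquisitionModule,
                 judgment: JudgmentModule, window_s: float):
        self.acquisition = acquisition
        self.judgment = judgment
        self.window_s = window_s
        self.first_skill: Optional[str] = None
        self.deadline = 0.0

    def on_first(self, skill: str, query: str, now: float) -> str:
        # Remember the first instruction's field and start the preset period.
        self.first_skill = skill
        self.deadline = now + self.window_s
        return self.acquisition.acquire(query)

    def on_second(self, skill: str, query: str, now: float) -> Optional[str]:
        if self.first_skill is None or now > self.deadline:
            return None  # no open interaction window
        if self.judgment.accepts(self.first_skill, skill):
            return self.acquisition.acquire(query)
        return None      # unrelated skill field: no response
```

Keeping the judgment separate from content acquisition mirrors the split between modules 21 and 22. It also lets the related-field rule be swapped without touching the content back ends.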
The voice interaction apparatus provided by the present invention may specifically be a voice device or a background server corresponding to the voice device. The voice device may be, for example, a smart speaker, a smart air conditioner, a smart washing machine, or a smart television: any device that can interact with a user by voice and perform corresponding operations according to the user's instructions.
In this embodiment, when the voice interaction apparatus is a voice device, the first content acquisition instruction may be obtained by listening for and capturing the user's voice instructions while the voice device interacts with the user. After acquiring the first content acquisition instruction, the voice interaction apparatus may perform speech recognition on it to obtain its parsing result. It then acquires content according to that parsing result.
When the voice interaction apparatus is a background server corresponding to the voice device, the first content acquisition instruction may be obtained as follows: while interacting with the user, the voice device captures the user's voice instructions and sends them directly to the background server. After receiving the first content acquisition instruction, the background server may perform speech recognition on it to obtain its parsing result. It then acquires content according to that parsing result.
The preset time period is determined according to the skill field to which the first content acquisition instruction belongs. In this embodiment, this happens before step 102, once the first content acquisition instruction has been received. From the parsing result of that instruction, the voice interaction apparatus may determine the first skill field to which it belongs and the preset time period corresponding to that field. The apparatus then starts timing and determines whether a second content acquisition instruction is received within that period. If no second content acquisition instruction is received within the preset time period, the voice interaction ends.
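Once the first instruction has been parsed, selecting the preset time period by skill field amounts to a table lookup. The field names, the numbers, and the fallback value below are hypothetical. The description only says that the period depends on the first instruction's skill field.

```python
from typing import Dict

# Hypothetical per-skill preset time periods in seconds; the description
# does not give concrete values.
PRESET_WINDOWS_S: Dict[str, float] = {
    "music": 8.0,
    "audiobook": 8.0,
    "weather": 15.0,
}
DEFAULT_WINDOW_S = 10.0  # assumed fallback for skill fields not in the table


def preset_window(first_skill: str) -> float:
    """Preset time period selected by the first instruction's skill field."""
    return PRESET_WINDOWS_S.get(first_skill, DEFAULT_WINDOW_S)


def within_window(first_skill: str, t_first: float, t_second: float) -> bool:
    """True if the second instruction arrives inside the first one's window."""
    return 0.0 <= t_second - t_first <= preset_window(first_skill)
```

A table keeps the per-field periods tunable without code changes. A longer period for a field such as weather would, under this assumption, leave room for follow-ups such as booking a ride.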
When the voice interaction apparatus is a voice device, it stops accepting voice instructions after the voice interaction ends. It resumes accepting the user's voice instructions only after it receives the user's wake-up instruction and performs the wake-up operation. When the voice interaction apparatus is a background server corresponding to the voice device, it may send a stop-interaction instruction to the voice device after the voice interaction ends. The voice device then stops accepting voice instructions until it receives the user's wake-up instruction. After the wake-up operation, it again receives voice instructions and sends them to the voice interaction apparatus.
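On the voice device side, the end-of-interaction behavior reduces to a two-state machine. While idle, the device acts only on the wake-up instruction. While listening, it forwards instructions to the background server. This is a minimal sketch that assumes recognized speech arrives as plain strings. The wake phrase and the method names are hypothetical placeholders.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    LISTENING = auto()  # voice instructions are received and forwarded
    IDLE = auto()       # only the wake-up instruction is acted on


class VoiceDevice:
    WAKE_PHRASE = "hello robot"  # hypothetical wake-up instruction

    def __init__(self) -> None:
        self.state = State.IDLE
        self.forwarded: List[str] = []  # instructions sent to the server

    def on_stop_interaction(self) -> None:
        # Stop-interaction instruction from the background server.
        self.state = State.IDLE

    def on_speech(self, text: str) -> None:
        if self.state is State.IDLE:
            if text == self.WAKE_PHRASE:
                self.state = State.LISTENING  # perform the wake-up operation
            return  # anything else is not received while idle
        self.forwarded.append(text)
```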
Further, the judgment module 22 may be specifically configured to perform the following steps:
1. From the instruction parsing results, determine the first skill field, to which the first content acquisition instruction belongs, and the second skill field, to which the second content acquisition instruction belongs.
2. If the first skill field is the same as the second skill field, determine that the two instructions belong to the same skill field.
3. If the two fields differ, query a preset related-field mapping rule to determine the preset related skill fields corresponding to the first skill field.
4. If those preset related skill fields include the second skill field, determine that the two instructions belong to related skill fields.
5. If they do not include it, determine that the two instructions belong neither to the same skill field nor to related skill fields.
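The same-field-first judgment above can be sketched as follows. This sketch assumes the preset related-field mapping rule is a dictionary from a skill field to the set of its related skill fields. The field names and the `Relation` labels are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Relation(Enum):
    SAME = "same skill field"
    RELATED = "related skill field"
    UNRELATED = "neither"


def judge_same_first(first_skill: str, second_skill: str,
                     related_fields: Dict[str, Set[str]]) -> Relation:
    """Compare skill fields first, then fall back to the mapping rule."""
    if first_skill == second_skill:
        return Relation.SAME
    # Mapping rule: skill field -> its preset related skill fields.
    if second_skill in related_fields.get(first_skill, set()):
        return Relation.RELATED
    return Relation.UNRELATED


RULES = {"weather": {"ride_hailing"}}  # hypothetical mapping rule
print(judge_same_first("weather", "ride_hailing", RULES))  # Relation.RELATED
```

Because the rule is keyed by the first skill field, the relation is directional in this sketch. Weather followed by ride-hailing is accepted, but ride-hailing followed by weather is not, unless the rule lists that direction as well.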
Alternatively, the judgment module 22 may check the mapping rule first and perform the following steps:
1. From the instruction parsing results, determine the first skill field, to which the first content acquisition instruction belongs, and the second skill field, to which the second content acquisition instruction belongs.
2. Query the preset related-field mapping rule to determine the preset related skill fields corresponding to the first skill field.
3. If those preset related skill fields include the second skill field, determine that the two instructions belong to related skill fields.
4. Otherwise, if the first skill field is the same as the second skill field, determine that the two instructions belong to the same skill field.
5. If the two fields also differ, determine that the two instructions belong neither to the same skill field nor to related skill fields.
The preset related-field mapping rule stores, for each skill field, the preset related skill fields corresponding to it.
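The mapping-first order can be sketched the same way. This assumes the rule is a dictionary from a skill field to the set of its preset related skill fields; the names and entries are hypothetical. Under that assumption, the two orders disagree only when a field is listed as related to itself. That edge case matters for content acquisition. Same-field instructions are combined with the first instruction's parsing result, while related-field instructions are answered from the second instruction alone.

```python
from typing import Dict, Set

# Hypothetical preset related-field mapping rule.
RELATED_FIELDS: Dict[str, Set[str]] = {
    "weather": {"ride_hailing", "travel"},
    "music": {"audiobook"},
}


def judge_mapping_first(first_skill: str, second_skill: str,
                        rules: Dict[str, Set[str]] = RELATED_FIELDS) -> str:
    """Query the mapping rule first, then compare the two skill fields."""
    if second_skill in rules.get(first_skill, set()):
        return "related"
    if first_skill == second_skill:
        return "same"
    return "unrelated"
```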
Further, the acquisition module 21 is specifically configured to handle the two accepted cases differently. When the two instructions belong to the same skill field, it acquires content according to the parsing result of the second content acquisition instruction combined with the parsing result of the first. When they belong to related skill fields, it acquires content according to the parsing result of the second content acquisition instruction alone.
For example, suppose the first content acquisition instruction is "I want to listen to Water of Forgetfulness" and the second is "I want to listen to Liu Dehua (Andy Lau)". The two instructions belong to the same skill field, and the corresponding content may be Liu Dehua's "Water of Forgetfulness". Likewise, when the first content acquisition instruction is "I want to listen to Logical Thinking" and the second is "episode 9", the two instructions belong to the same skill field. The corresponding content may be episode 9 of "Logical Thinking". When the first content acquisition instruction is "inquire weather" and the second is "call me a car, I need to get to the company", the two instructions belong to related skill fields. In that case the corresponding content may be "start the ride-hailing function", for example launching the ride-hailing software based on the current location, automatically entering the company address, and booking the trip.
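One way to picture "combining" parsing results is as slot filling. For the same skill field, the second instruction's slots are merged into the first's; for related skill fields, only the second instruction's slots are used. This representation is an assumption, since the description does not say how parsing results are structured, and the slot names are hypothetical.

```python
from typing import Dict

Slots = Dict[str, str]  # hypothetical parsing result: slot name -> value


def build_request(first: Slots, second: Slots, relation: str) -> Slots:
    """Form the content request for the second instruction."""
    if relation == "same":
        merged = dict(first)
        merged.update(second)  # later slots refine or override earlier ones
        return merged
    if relation == "related":
        return dict(second)    # first instruction's slots are not reused
    raise ValueError("instructions in unrelated skill fields are not answered")


song = build_request(
    {"skill": "music", "song": "Water of Forgetfulness"},
    {"skill": "music", "artist": "Liu Dehua"},
    "same",
)
print(song)
# {'skill': 'music', 'song': 'Water of Forgetfulness', 'artist': 'Liu Dehua'}
```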
Further, on the basis of the above embodiment, the judgment module 22 first determines whether the second content acquisition instruction is a wake-up instruction. It does so before determining whether the second and first content acquisition instructions belong to the same skill field or to related skill fields. Only if the second instruction is not a wake-up instruction does the judgment module go on to the skill-field determination. In addition, the voice interaction apparatus further includes a response module configured to respond to the wake-up instruction when the second content acquisition instruction is a wake-up instruction.
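The routing order above can be sketched as a small dispatcher. A plausible reason for checking the wake-up instruction first is that a wake-up phrase would usually not share the first instruction's skill field, so a field check made first could discard it. The wake phrase, the return labels, and the normalization (trimming and lower-casing) are hypothetical choices for illustration.

```python
from typing import Dict, Set

WAKE_PHRASES = {"hello robot"}  # hypothetical wake-up instructions


def dispatch_second(text: str, first_skill: str, second_skill: str,
                    related_fields: Dict[str, Set[str]]) -> str:
    """Route a second instruction: wake-up check first, then field check."""
    if text.strip().lower() in WAKE_PHRASES:
        return "wake"      # handled by the response module
    if (second_skill == first_skill
            or second_skill in related_fields.get(first_skill, set())):
        return "acquire"   # handled by the acquisition module
    return "ignore"        # no response; timing continues
```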
Further, on the basis of the above embodiment, the apparatus may further include a processing module. This module applies when the second and first content acquisition instructions are determined to belong neither to the same skill field nor to related skill fields. In that case it does not respond to the second content acquisition instruction, continues timing, and determines whether a third content acquisition instruction is received before the preset time period elapses. If no third content acquisition instruction is received, it ends the voice interaction.
The voice interaction apparatus provided by this embodiment operates as follows. It receives a first content acquisition instruction and acquires content according to it. If a second content acquisition instruction is received within a preset time period, the apparatus determines whether the second and first content acquisition instructions belong to the same skill field or to related skill fields. If they do, it acquires content according to the second content acquisition instruction, so that a user can express one intention through multiple content acquisition instructions. Ambient noise and other background speech generally fall outside the skill field of the user's content acquisition instruction. For this reason, the embodiment can prevent the voice device from mistakenly executing voice instructions picked up from the surrounding environment, which improves voice interaction efficiency and the user's experience with the voice device.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device includes a memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.
The processor 1002, when executing the program, implements the voice interaction method provided in the above-described embodiments.
The electronic device further includes a communication interface 1003 for communication between the memory 1001 and the processor 1002.
The memory 1001 is configured to store the computer program executable on the processor 1002. The memory 1001 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
The processor 1002 is configured to implement the voice interaction method according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they may be connected to one another and communicate through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. It may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 3, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, they may communicate with one another through an internal interface.
The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the voice interaction method described above.
The present invention further provides a computer program product whose instructions, when executed by a processor, implement the voice interaction method described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may join and combine the different embodiments or examples described in this specification, and their features, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only. They are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (16)

1. A method of voice interaction, comprising:
receiving a first content acquisition instruction, and acquiring a response result of the first content acquisition instruction according to the first content acquisition instruction;
if a second content acquisition instruction is received within a preset time period, judging whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field;
if the second content acquisition instruction and the first content acquisition instruction belong to the same skill field, acquiring a response result of the second content acquisition instruction according to the second content acquisition instruction in combination with the first content acquisition instruction; otherwise, acquiring the response result of the second content acquisition instruction without combining it with the first content acquisition instruction.
2. The method of claim 1, further comprising:
if the second content acquisition instruction and the first content acquisition instruction are determined not to belong to the same skill field, judging whether the second content acquisition instruction and the first content acquisition instruction belong to the related skill field;
and if the second content acquisition instruction and the first content acquisition instruction are determined to belong to the related skill field, acquiring a response result of the second content acquisition instruction according to the second content acquisition instruction.
3. The method of claim 2, further comprising:
if it is determined that the second content acquisition instruction and the first content acquisition instruction do not belong to related skill fields, not responding to the second content acquisition instruction.
4. The method of claim 1, wherein the preset time period is determined according to the skill field to which the first content acquisition instruction belongs.
5. The method of claim 2,
judging whether the second content acquisition instruction and the first content acquisition instruction belong to the related skill field, specifically comprising:
querying a preset related field mapping rule, and determining a preset related skill field corresponding to a first skill field, wherein the first skill field is a skill field to which the first content acquisition instruction belongs;
and if the preset related skill field comprises a second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the related skill field, wherein the second skill field is the skill field to which the second content acquisition instruction belongs.
6. The method of claim 1, further comprising, before determining whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field:
determining that the second content acquisition instruction is not a wake-up instruction.
7. The method of claim 6, further comprising:
responding to the wake-up instruction if the second content acquisition instruction is a wake-up instruction.
8. A voice interaction apparatus, comprising:
an acquisition module, configured to receive a first content acquisition instruction and acquire a response result of the first content acquisition instruction according to the first content acquisition instruction;
a judging module, configured to determine, if a second content acquisition instruction is received within a preset time period, whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field;
the acquisition module being further configured to, if it is determined that the second content acquisition instruction and the first content acquisition instruction belong to the same skill field, acquire a response result of the second content acquisition instruction according to the second content acquisition instruction in combination with the first content acquisition instruction; and otherwise, acquire the response result of the second content acquisition instruction without combining it with the first content acquisition instruction.
9. The apparatus according to claim 8, wherein the judging module is further configured to determine whether the second content acquisition instruction and the first content acquisition instruction belong to related skill fields if it is determined that they do not belong to the same skill field;
the acquisition module is further configured to acquire a response result of the second content acquisition instruction according to the second content acquisition instruction if it is determined that the second content acquisition instruction and the first content acquisition instruction belong to related skill fields.
10. The apparatus of claim 8, further comprising:
a processing module, configured not to respond to the second content acquisition instruction if it is determined that the second content acquisition instruction and the first content acquisition instruction do not belong to related skill fields.
11. The apparatus of claim 8, wherein the preset time period is determined according to the skill field to which the first content acquisition instruction belongs.
12. The apparatus according to claim 9, wherein the judging module is specifically configured for:
querying a preset related field mapping rule, and determining a preset related skill field corresponding to a first skill field, wherein the first skill field is a skill field to which the first content acquisition instruction belongs;
and if the preset related skill field comprises a second skill field, determining that the second content acquisition instruction and the first content acquisition instruction belong to the related skill field, wherein the second skill field is the skill field to which the second content acquisition instruction belongs.
13. The apparatus of claim 8, wherein the judging module is further configured to determine that the second content acquisition instruction is not a wake-up instruction before determining whether the second content acquisition instruction and the first content acquisition instruction belong to the same skill field.
14. The apparatus of claim 13, further comprising:
a response module, configured to respond to the wake-up instruction if the second content acquisition instruction is a wake-up instruction.
15. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, which when executed by the processor implements the method of voice interaction according to any of claims 1-7.
16. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the voice interaction method of any one of claims 1-7.
CN201711446766.2A 2017-12-27 2017-12-27 Voice interaction method and device Active CN108962235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711446766.2A CN108962235B (en) 2017-12-27 2017-12-27 Voice interaction method and device

Publications (2)

Publication Number Publication Date
CN108962235A CN108962235A (en) 2018-12-07
CN108962235B true CN108962235B (en) 2021-09-17

Family

ID=64495731

Country Status (1)

Country Link
CN (1) CN108962235B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960754A (en) * 2019-03-21 2019-07-02 珠海格力电器股份有限公司 Voice device and voice interaction method, apparatus and storage medium thereof
CN110047481B (en) * 2019-04-23 2021-07-09 百度在线网络技术(北京)有限公司 Method and apparatus for speech recognition
CN110838292A (en) * 2019-09-29 2020-02-25 广东美的白色家电技术创新中心有限公司 Voice interaction method, electronic equipment and computer storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103594089A (en) * 2013-11-18 2014-02-19 联想(北京)有限公司 Voice recognition method and electronic device
US9098467B1 (en) * 2012-12-19 2015-08-04 Rawles Llc Accepting voice commands based on user identity
CN105448293A (en) * 2014-08-27 2016-03-30 北京羽扇智信息科技有限公司 Voice monitoring and processing method and voice monitoring and processing device
CN106648530A (en) * 2016-11-21 2017-05-10 海信集团有限公司 Voice control method and terminal

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US20040138885A1 (en) * 2003-01-09 2004-07-15 Xiaofan Lin Commercial automatic speech recognition engine combinations
CN105404161A (en) * 2015-11-02 2016-03-16 百度在线网络技术(北京)有限公司 Intelligent voice interaction method and device
CN105810194B (en) * 2016-05-11 2019-07-05 北京奇虎科技有限公司 Speech-controlled information acquisition methods and intelligent terminal under standby mode
CN107293293A (en) * 2017-05-22 2017-10-24 深圳市搜果科技发展有限公司 Voice instruction recognition method and system, and robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant