CN109492126A - Intelligent interaction method and device - Google Patents

Intelligent interaction method and device

Info

Publication number
CN109492126A
Authority
CN
China
Prior art keywords
audio
text
content
matched
user
Prior art date
Legal status
Granted
Application number
CN201811301902.3A
Other languages
Chinese (zh)
Other versions
CN109492126B (en)
Inventor
侯桂森
Current Assignee
Langfang Senmiaochun Domestic Fungus Co Ltd
Original Assignee
Langfang Senmiaochun Domestic Fungus Co Ltd
Priority date
Filing date
Publication date
Application filed by Langfang Senmiaochun Domestic Fungus Co Ltd
Priority to CN201811301902.3A
Publication of CN109492126A
Application granted
Publication of CN109492126B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Abstract

The invention discloses an intelligent interaction method and device. The method comprises: receiving text content; matching the text content against a set of search scenes to determine audio content that matches the text content, the search scenes including a question-template scene of an intent collection, a response-context scene, or a sentence-context-information scene; and returning the audio content to the user. Because the technical solution provided by the embodiments of the present invention provides multiple search scenes, the text content converted from the user's audio, together with its associated annotations and attributes, can be matched against the multiple search scenes. Complex queries over audio can thus be realized, engaging interactions can be formed with the user, and the user experience is improved.

Description

Intelligent interaction method and device
Technical field
The present invention relates to the field of artificial intelligence, and in particular to an intelligent interaction method and device.
Background art
With the development of artificial intelligence technology, major technology companies have each released smart speakers of their own development and promoted them vigorously in the market. Smart speakers are becoming the entry point of smart home control, and their natural-language interaction capability is closer to users' habits.
At present, smart speakers broadcast content on demand. On-demand playback is typically used by the user to play a complete piece of audio (music, a story, a poem, a joke, etc.): the user says the name of the content, and the smart speaker finds the corresponding audio and plays it.
During research into the prior art, the inventor found that users cannot perform more complex queries over on-demand premium content, nor can more interesting interactions be formed. For example, a user who wants to "play the harmony part Faye Wong sings in the Spring Festival Gala song 'Years'" is supported by almost no current smart-speaker skill.
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide an intelligent interaction method and device. The technical solution is as follows:
An intelligent interaction method, applied to a smart speaker, comprising:
receiving text content;
matching the text content against a set of search scenes to determine audio content that matches the text content, the search scenes including a question-template scene of an intent collection, a response-context scene, or a sentence-context-information scene; and
returning the audio content to the user.
Preferably, matching the text content against the search scenes to determine audio content that matches the text content comprises:
judging whether the question templates of the intent collection contain a template that matches the text content;
if there is a matching question template, obtaining the intent of the text content and the slot-value names of its keywords;
retrieving the audio set corresponding to the matched question template; and
retrieving, from the audio set, the sub-audio set annotated with the intent, retrieving from the sub-audio set the audio content containing the slot-value names, and returning the audio content to the user.
Preferably, when no question template matches the text content, matching the text content against the search scenes to determine audio content that matches the text content further comprises:
judging whether an audio context relevant to the text content can be matched in the audio database;
if a relevant audio context can be matched, returning the audio context to the user.
Preferably, if no audio context relevant to the text content can be matched, the method further comprises:
sending the text content to an intelligent response service, and receiving the intelligent response result returned by the intelligent response service;
judging whether an audio whose similarity to the intelligent response result meets a requirement can be matched in the audio database;
if such an audio can be matched, returning the audio to the user.
Preferably, when no audio context meets the similarity requirement, matching the text content against the search scenes to determine audio content that matches the text content further comprises:
extracting the keywords of the text content;
judging whether audio content corresponding to the keywords can be retrieved from the sentence context information;
if so, returning the retrieved audio content to the user.
Preferably, the method further comprises:
for each audio work, dividing the audio work into N audio files, N being a positive integer;
converting each segmented audio file into text, annotating the text with attributes, and determining the context information of the text, where annotating the text with attributes includes determining the intent of the text and the slot-value names of the keywords of the text; and
storing the text.
An intelligent interaction device, applied to a smart speaker, comprising:
a semantic understanding module and an audio search module;
the semantic understanding module is configured to receive text content;
the audio search module is configured to match the text content against a set of search scenes to determine audio content that matches the text content, the search scenes including a question-template scene of an intent collection, a response-context scene, or a sentence-context-information scene, and to return the audio content to the user.
Preferably, the semantic understanding module is specifically configured to:
extract the intent of the text content and the slot-value names of its keywords;
correspondingly, the audio search module is specifically configured to:
judge whether the question templates of the intent collection contain a template that matches the text content;
if there is a matching question template, obtain the intent of the text content and the slot-value names of its keywords;
retrieve the audio set corresponding to the matched question template; and
retrieve, from the audio set, the sub-audio set annotated with the intent, retrieve from the sub-audio set the audio content containing the slot-value names, and return the audio content to the user.
Preferably, when no question template matches the text content, the audio search module is further configured to:
judge whether an audio context relevant to the text content can be matched in the audio database;
if a relevant audio context can be matched, return the audio context to the user.
Preferably, if no audio context relevant to the text content can be matched, the audio search module is further configured to:
send the text content to an intelligent response service, and receive the intelligent response result returned by the intelligent response service;
judge whether an audio whose similarity to the intelligent response result meets a requirement can be matched in the audio database;
if such an audio can be matched, return the audio to the user.
Preferably, when no audio context meets the similarity requirement, the semantic understanding module is further configured to:
extract the keywords of the text content;
correspondingly, the audio search module is further configured to:
judge whether audio content corresponding to the keywords can be retrieved from the sentence context information;
if so, return the retrieved audio content to the user.
Preferably, the device further comprises:
an audio preprocessing module configured to: for each audio work, divide the audio work into N audio files, N being a positive integer; convert each segmented audio file into text; annotate the text with attributes and determine the context information of the text, where annotating the text with attributes includes determining the intent of the text and the slot-value names of the keywords of the text; and store the text.
In the technical solution provided by the embodiments of the present invention, multiple search scenes are provided, so the text content converted from the user's audio can be matched against the multiple search scenes. Complex queries over audio can thus be realized, engaging interactions can be formed with the user, and the user experience is improved.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an intelligent interaction method provided by an embodiment of the present invention;
Fig. 2 is another flow diagram of an intelligent interaction method provided by an embodiment of the present invention;
Fig. 3 is yet another flow diagram of an intelligent interaction method provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of an intelligent interaction device provided by an embodiment of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is an implementation flowchart of audio data segmentation and annotation in an intelligent interaction method provided by an embodiment of the present invention, comprising: Step S101, segmenting the audio file.
A complete audio content file, such as a song or the audio track of a film or television work, can be segmented according to certain rules (for example, sentence by sentence, or by the role producing the audio). Segmentation can be performed automatically, by setting a silence-duration threshold or by the distinct vocal characteristics of the speakers, or manually with editing software. After segmentation, some checking and repair are needed.
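The automatic, threshold-based segmentation rule described above can be sketched as follows. This is a minimal illustration over raw mono samples; a real system would work on an audio file through an audio library, and the amplitude threshold and minimum silence length below are invented values, not parameters from the patent.

```python
# Split a sample sequence wherever a sufficiently long quiet run occurs.
# amp_threshold and min_silence_len are illustrative assumptions.

def split_on_silence(samples, amp_threshold=0.02, min_silence_len=5):
    """Return the non-silent segments of `samples` as a list of lists."""
    segments, current, silent_run = [], [], 0
    for s in samples:
        if abs(s) < amp_threshold:
            silent_run += 1
            # A long enough quiet run closes the current segment.
            if silent_run >= min_silence_len and current:
                segments.append(current)
                current = []
        else:
            silent_run = 0
            current.append(s)
    if current:
        segments.append(current)
    return segments

if __name__ == "__main__":
    signal = [0.5, 0.6, 0.4] + [0.0] * 8 + [0.3, 0.7] + [0.01] * 6 + [0.9]
    print(len(split_on_silence(signal)))  # 3 segments, split at the quiet runs
```

Segmenting by speaking role, as the text also mentions, would replace the amplitude test with a speaker-change test but keep the same run-accumulation structure.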
Step S102, annotating the segmented files with multiple attributes.
For each segmented audio file, the corresponding text content must first be obtained. The text content can come from caption information associated with the audio, or the audio can be converted into text by a current speech-to-text service. The text is then annotated with attributes and its context information is determined.
Annotating the text with attributes includes determining the intent of the text and the slot-value names of the keywords of the text.
The intent of the text, which may also be called the label of the text, indicates what the text expresses. For example, the sentence "Bumblebee, well done" expresses an "evaluation" or "encouragement" intent, while the sentence "Quick, we can control the ocean, hahaha" expresses a "plan" or "hope" intent. For the converted text, the one or more intents corresponding to the text can be annotated manually; once a certain amount of manually annotated data exists, it can be used to train a machine-learning model, after which intents can also be annotated by machine.
Keywords of the text: for some intents, certain information describes the intent in more detail. For the "evaluation" intent, for example, "the evaluated party" is highly relevant information; "Bumblebee" in the example above belongs to the "evaluated party" information. Words or phrases in the text that are related to the intent are recorded as keywords, and different types of keywords are given different slot-value names. "Bumblebee" in the example above can thus be annotated as "evaluated party: Bumblebee"; for the sentence "Starscream, drive the Autobots away", the annotated intent is "command" and the keyword is "command target: Starscream". These annotations can likewise be made manually or by machine.
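The intent-plus-slot annotation scheme described above might be represented as a simple record like the following. The class and field names are illustrative assumptions for this sketch, not the patent's actual data format.

```python
# A hypothetical per-utterance annotation record: the text, one or more
# intents, and a mapping from slot-value name to keyword.
from dataclasses import dataclass, field

@dataclass
class UtteranceAnnotation:
    text: str
    intents: list                                # e.g. ["evaluation"]
    slots: dict = field(default_factory=dict)    # slot-value name -> keyword

ann = UtteranceAnnotation(
    text="Bumblebee, well done",
    intents=["evaluation", "encouragement"],
    slots={"evaluated party": "Bumblebee"},
)
print(ann.slots["evaluated party"])  # Bumblebee
```

Whether the record is produced by a human annotator or by a trained model, downstream retrieval only needs this uniform shape.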
The context information of the text includes: the title of the work the text belongs to; which episode the text appears in and the title of that episode; the speaker; the place where the text is uttered; the time when the text is uttered; the preceding utterance it responds to; and the mood and tone expressed by the text. Annotating these items manually is generally more accurate, but after annotation rules are learned through machine learning they can also be annotated by machine.
Step S103, storing the segmented audio files and the annotation results.
Each audio file and its attached annotation information can be stored in a database in a formatted data form. The database supports queries over multiple dimensions and attributes, thereby supporting flexible audio query modes.
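As a sketch of this storage step, the annotated segments could be kept in a relational table that supports queries over several dimensions and attributes at once. SQLite and the column layout below are illustrative assumptions, not the patent's schema.

```python
# Store annotated segments in a table, then query across two dimensions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE segment (
    audio_path TEXT, work_title TEXT, speaker TEXT,
    intent TEXT, text TEXT)""")
rows = [
    ("s1.wav", "Transformers", "Starscream", "command",
     "Drive the Autobots away"),
    ("s2.wav", "Transformers", "Optimus", "plan",
     "Autobots, roll out"),
]
conn.executemany("INSERT INTO segment VALUES (?,?,?,?,?)", rows)

# A multi-dimension query: restrict by work title AND intent.
hit = conn.execute(
    "SELECT audio_path FROM segment WHERE work_title=? AND intent=?",
    ("Transformers", "command")).fetchone()
print(hit[0])  # s1.wav
```

Adding indexes on the annotation columns would keep such multi-attribute lookups fast as the library grows.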
Referring to Fig. 2, Fig. 2 is an implementation flowchart of an intelligent interaction method provided by an embodiment of the present invention. The method comprises:
Step S201, receiving text content.
The audio input by the user is converted into text content by the smart speaker.
The smart speaker of this embodiment can be any of several smart speakers currently on the market, such as Tmall Genie, JD DingDong, or Xiaomi's Xiao Ai, or a custom-developed smart speaker. Using the ASR (speech-to-text), TTS (text-to-speech), and audio playback capabilities available on the smart speaker, together with the original sound of genuine text-content works, interesting exchanges and interactions can be formed with the user.
Step S202, matching the text content against the search scenes to determine audio content that matches the text content.
The search scenes include a question-template scene of an intent collection, a response-context scene, and a sentence-context-information scene. It should be understood that, in practical applications, the search scenes set according to users' retrieval needs are not limited to the above three.
This embodiment provides multiple search scenes. To determine the search scene that matches the text content, the text content can be matched against each search scene in turn. During matching, once the text content is matched in some search scene, step S202 ends; that is, the text content is no longer matched against the remaining unmatched search scenes.
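The first-match dispatch across search scenes described above can be sketched as follows. The three scene functions are placeholders standing in for the real matchers; only the control flow (try scenes in order, stop at the first hit) reflects the text.

```python
# Try each search scene in order; the first scene that yields audio wins
# and the remaining scenes are not consulted.

def match_template(text):
    return "template-audio" if text.startswith("play") else None

def match_response_context(text):
    return "reply-audio" if "hello" in text else None

def match_sentence_context(text):
    return "context-audio" if "school" in text else None

SCENES = [match_template, match_response_context, match_sentence_context]

def dispatch(text):
    for scene in SCENES:
        audio = scene(text)
        if audio is not None:
            return audio          # later scenes are skipped
    return None

print(dispatch("play the song"))      # template-audio
print(dispatch("sounds at school"))   # context-audio
```

Because the list is ordered, the more precise template scene shadows the looser context scenes, matching the priority the embodiment describes.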
Step S203, returning the audio content to the user.
In the technical solution provided by the embodiments of the present invention, multiple search scenes are provided, so the text content converted from the user's audio can be matched against the multiple search scenes. Complex queries over audio can thus be realized, engaging interactions can be formed with the user, and the user experience is improved.
Referring to Fig. 3, Fig. 3 is another implementation flowchart of an intelligent interaction method provided by an embodiment of the present invention. The method comprises:
Step S301, receiving text content.
Step S302, judging whether the question templates of the intent collection contain a template that matches the text content; if so, executing step S303, otherwise executing step S306.
Step S303, obtaining the intent of the text content and the slot-value names of its keywords.
Step S304, retrieving the audio set corresponding to the matched question template.
Step S305, retrieving from the audio set the sub-audio set annotated with the intent, retrieving from the sub-audio set the audio content containing the slot-value names, and returning the audio content to the user.
When multiple audios are matched, one audio is returned to the user at random.
In practical applications, to improve matching efficiency, the search range can be narrowed by determining, before step S302, the work title and the speaker information to which the text content belongs. The speaker information refers to the name of the role that utters the audio corresponding to the text content. Question-template matching and audio retrieval can then be performed within the works to which the text content belongs, among entries whose speaker information matches the text content.
Steps S302 to S305 implement matching the text content against the question-template scene of the intent collection.
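The template-matching stage of steps S302 to S305 could look roughly like this, assuming each question template is a regular expression paired with an intent, with named groups supplying the slot-value names. The templates themselves are invented for illustration; the patent does not specify the template format.

```python
# Match user text against question templates; a hit yields an intent plus
# a slot-value-name -> keyword mapping for the later audio retrieval.
import re

TEMPLATES = [
    (re.compile(r"play the sound of (?P<speaker>\w+) being "
                r"(?P<event>\w+)"), "scene_request"),
    (re.compile(r"play (?P<song>.+) by (?P<singer>\w+)"), "song_request"),
]

def match_question(text):
    for pattern, intent in TEMPLATES:
        m = pattern.search(text)
        if m:
            return intent, m.groupdict()   # intent + slot values
    return None, {}

intent, slots = match_question("play the sound of Nobita being bullied")
print(intent, slots)  # scene_request {'speaker': 'Nobita', 'event': 'bullied'}
```

The returned intent selects the annotated sub-audio set, and the slot values narrow it to concrete audio contents, mirroring steps S303 to S305.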
Step S306, judging whether an audio context relevant to the text content can be matched in the audio database; if so, executing step S307, otherwise executing step S308.
Step S307, returning the audio context to the user.
Step S308, sending the text content to an intelligent response service, and receiving the intelligent response result returned by the intelligent response service.
Step S309, judging whether an audio whose similarity to the intelligent response result meets a requirement can be matched in the audio database; if so, executing step S310, otherwise executing step S311.
When no audio context matching the text content can be matched directly in the audio database, the text content is sent to an intelligent response service that supports chat. According to the intelligent response result returned by the service, the most similar content is then found in the audio information base and returned to the user.
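The similarity check in step S309 might be sketched as follows. The patent does not specify a similarity measure, so word-set Jaccard similarity and the 0.5 threshold below are illustrative choices only.

```python
# Find, among the stored audio texts, the one most similar to the
# intelligent response result; reject it if similarity is too low.

def jaccard(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def most_similar_audio(reply, audio_texts, threshold=0.5):
    best = max(audio_texts, key=lambda t: jaccard(reply, t))
    return best if jaccard(reply, best) >= threshold else None

library = ["we can control the ocean", "drive the autobots away"]
print(most_similar_audio("we will control the ocean", library))
# we can control the ocean
```

A production system would more plausibly use embedding-based similarity, but the best-match-above-threshold structure would stay the same.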
Step S310, returning the audio to the user.
Steps S306 to S310 implement matching the text content against the response-context scene.
Step S311, extracting the keywords of the text content.
Step S312, judging whether audio content corresponding to the keywords can be retrieved from the sentence context information; if so, executing step S313.
Step S313, returning the retrieved audio content to the user.
Steps S311 to S313 implement matching the text content against the sentence-context-information scene.
When the text content matches neither the question-template scene of the intent collection nor the response-context scene, it is judged whether the text content contains elements of the sentence context information, for example time or place information mentioned in the user's input. The keywords of the text content are then extracted and used to search the sentence context information, obtaining audio relevant to the user's input. For example, for the input "play the sound of Nobita being bullied at school in the Doraemon cartoon", after the work title and speaker are matched but no intent or preceding utterance is hit, information such as "school" and "being bullied" is parsed semantically from the user's input, and the corresponding matching content is found in the sentence context information and played.
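The keyword-driven fallback search over sentence context information described above can be sketched as follows. The stop-word list and the context records are illustrative assumptions standing in for real keyword extraction and the annotated context fields of step S102.

```python
# Extract keywords from the user text and match them against each
# segment's context fields (work, place, event, ...).
STOPWORDS = {"play", "the", "of", "in", "sound", "cartoon"}

CONTEXT_INDEX = [
    {"work": "Doraemon", "place": "school", "event": "bullied",
     "audio": "doraemon_012.wav"},
    {"work": "Doraemon", "place": "home", "event": "sleeping",
     "audio": "doraemon_040.wav"},
]

def extract_keywords(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def search_context(text):
    keys = set(extract_keywords(text))
    for rec in CONTEXT_INDEX:
        fields = {str(v).lower() for v in rec.values()}
        if keys & fields:           # any keyword hits a context field
            return rec["audio"]
    return None

print(search_context("play sound of bullied at school"))  # doraemon_012.wav
```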
In the technical solution provided by the embodiments of the present invention, multiple search scenes are provided, so the text content converted from the user's audio can be matched against the multiple search scenes. Complex queries over audio can thus be realized, engaging interactions can be formed with the user, and the user experience is improved.
Referring to Fig. 4, Fig. 4 is a structural schematic diagram of an intelligent interaction device provided by an embodiment of the present invention. For the working process of each module in the schematic diagram, refer to the method execution process in the embodiment corresponding to Fig. 1. The device includes:
a semantic understanding module 410 and an audio search module 420;
the semantic understanding module 410 is configured to receive text content;
the audio search module 420 is configured to match the text content against the search scenes to determine audio content that matches the text content, the search scenes including a question-template scene of an intent collection, a response-context scene, or a sentence-context-information scene, and to return the audio content to the user.
To match text content in the different search scenes, other embodiments of the present invention may further include an audio preprocessing module, configured to: for each audio work, divide the audio work into N audio files in units of sentences; convert each segmented audio file into text; annotate the text with attributes and determine the context information of the text; and finally store the text.
In the technical solution provided by the embodiments of the present invention, multiple search scenes are provided, so the text content converted from the user's audio can be matched against the multiple search scenes. Complex queries over audio can thus be realized, engaging interactions can be formed with the user, and the user experience is improved.
In other embodiments of the present invention, the semantic understanding module 410 is specifically configured to:
extract the slot-value names of the keywords from the text content;
the audio search module 420 is specifically configured to:
judge whether the question templates of the intent collection contain a template that matches the text content;
if there is a matching question template, obtain the slot-value names of the keywords in the text content;
retrieve the audio set corresponding to the matched question template; and
retrieve from the audio set the audio content containing the slot-value names, and return the audio content to the user.
In other embodiments of the present invention, when no question template matches the text content, the audio search module 420 is further configured to:
judge whether an audio context relevant to the text content can be matched in the audio database;
if a relevant audio context can be matched, return the audio context to the user.
In other embodiments of the present invention, if no audio context relevant to the text content can be matched, the audio search module 420 is further configured to:
send the text content to an intelligent response service, and receive the intelligent response result returned by the intelligent response service;
judge whether the audio most similar to the intelligent response result can be matched in the audio database;
if the audio most similar to the intelligent response result can be matched, return the audio to the user.
In other embodiments of the present invention, when there is no most-similar audio context, the semantic understanding module 410 is further configured to:
extract the keywords of the text content;
correspondingly, the audio search module 420 is further configured to:
judge whether audio content corresponding to the keywords can be retrieved from the sentence context information;
if so, return the retrieved audio content to the user.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, claims, and above drawings are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can, for example, be implemented in orders other than those illustrated or described herein. Moreover, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
As for the device or system embodiments, since they substantially correspond to the method embodiments, relevant parts may refer to the description of the method embodiments. The device or system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement them without creative effort.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device, and method can be implemented in other ways without exceeding the spirit and scope of the present invention. The current embodiments are exemplary and should not be taken as limiting, and the specific content given should in no way limit the purpose of the present invention. For example, the division of units or subunits is only a logical functional division; in actual implementation there may be other division manners, such as combining multiple units or multiple subunits. In addition, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
Furthermore, the schematic diagrams of the described systems, devices, and methods, and the different embodiments, may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present invention. Moreover, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The above are only specific embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (12)

1. An intelligent interaction method, characterized in that the method is applied to a smart speaker and comprises:
receiving text content;
matching the text content against a set of search scenes to determine audio content that matches the text content, the search scenes including a question-template scene of an intent collection, a response-context scene, or a sentence-context-information scene; and
returning the audio content to the user.
2. The method according to claim 1, characterized in that matching the text content against the search scenes to determine audio content that matches the text content comprises:
judging whether the question templates of the intent collection contain a template that matches the text content;
if there is a matching question template, obtaining the intent of the text content and the slot-value names of its keywords;
retrieving the audio set corresponding to the matched question template; and
retrieving, from the audio set, the sub-audio set annotated with the intent, retrieving from the sub-audio set the audio content containing the slot-value names, and returning the audio content to the user.
3. according to the method described in claim 2, it is characterized in that, when not with the matched enquirement template of the content of text When, content of text is matched with each search scene, with the determining and matched audio content of the content of text, is also wrapped It includes:
Judge whether that audio context relevant to the content of text can be matched from audio database;
If relevant audio context can be matched, the audio context is returned into user.
4. The method according to claim 3, wherein, if no audio context relevant to the text content can be matched, the method further comprises:
sending the text content to an intelligent response service, and receiving an intelligent response result returned by the intelligent response service;
judging whether audio whose similarity to the intelligent response result meets a requirement can be matched from the audio database;
if audio meeting the similarity requirement can be matched, returning the audio to the user.
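The fallback of claims 3-4 ends by matching audio against the intelligent response result by similarity. The claims do not specify a similarity measure; a token-overlap (Jaccard) sketch, with an assumed transcript field and threshold, might look like this:

```python
def token_similarity(a, b):
    """Jaccard overlap of word tokens -- a simple stand-in for whatever
    similarity measure a concrete implementation would use."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_by_response(response, audio_db, threshold=0.5):
    """Return the first audio whose transcript meets the similarity
    requirement with the intelligent response result, or None if no
    audio in the database satisfies the threshold."""
    for audio in audio_db:
        if token_similarity(response, audio["transcript"]) >= threshold:
            return audio
    return None
```

Returning `None` here is what triggers the next fallback, claim 5's keyword search.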
5. The method according to claim 4, wherein, when no audio meets the similarity requirement, matching the text content with each search scene to determine the audio content matching the text content further comprises:
extracting keywords of the text content;
judging whether audio content corresponding to the keywords can be retrieved from the sentence context information;
if so, returning the retrieved audio content to the user.
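Claim 5's last resort, keyword extraction plus lookup in the stored sentence context information, behaves like an inverted-index search. A sketch, with a hypothetical stopword list and index layout:

```python
# Hypothetical stopword list; a real system would use a proper lexicon.
STOPWORDS = {"the", "a", "an", "is", "what", "who"}

def extract_keywords(text):
    """Crude keyword extraction: lowercase tokens minus stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def search_context(keywords, context_index):
    """context_index maps a keyword to ids of audio content whose stored
    sentence context information mentions it; return the first hit or None."""
    for kw in keywords:
        if kw in context_index:
            return context_index[kw]
    return None
```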
6. The method according to any one of claims 1-5, further comprising:
for each audio production, dividing the audio production into N audio files, N being a positive integer;
converting each segmented audio file into text, performing attribute annotation on the text, and determining context information of the text, wherein performing attribute annotation on the text comprises determining the intent of the text and the slot value names of its keywords;
storing the text.
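Claim 6's offline preprocessing (segment each audio production, transcribe, annotate intent and slot values, record context information) can be sketched over already-transcribed segments. The record layout and the injected classifier/extractor callables are assumptions; speech-to-text itself is out of scope here:

```python
from dataclasses import dataclass

@dataclass
class AnnotatedText:
    text: str          # transcript of one segmented audio file
    intent: str        # attribute annotation: intent of the text
    slot_values: list  # attribute annotation: slot value names of keywords
    context: str       # context information: neighbouring segments' text

def preprocess(segments, classify_intent, extract_slots):
    """Turn the N transcribed segments of one audio production into
    annotated, storable records, attaching each segment's neighbours
    as its context information."""
    records = []
    for i, text in enumerate(segments):
        prev_text = segments[i - 1] if i > 0 else ""
        next_text = segments[i + 1] if i < len(segments) - 1 else ""
        records.append(AnnotatedText(
            text=text,
            intent=classify_intent(text),
            slot_values=extract_slots(text),
            context=f"{prev_text} | {next_text}".strip(" |"),
        ))
    return records
```

Storing these records is what makes the intent, slot-value, and context lookups of claims 2-5 possible at query time.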
7. An intelligent interaction apparatus, applied to a smart speaker, comprising:
a semantic understanding module and an audio search module, wherein:
the semantic understanding module is configured to receive text content;
the audio search module is configured to match the text content with each search scene to determine audio content matching the text content, the search scenes including a question template scene of an intent set, a response context scene, or a sentence context information scene, and to return the audio content to the user.
8. The apparatus according to claim 7, wherein the semantic understanding module is specifically configured to:
extract the intent of the text content and the slot value name of its keyword;
correspondingly, the audio search module is specifically configured to:
judge whether a question template matching the text content exists among the question templates of the intent set;
if a matching question template exists, obtain the intent of the text content and the slot value name of its keyword;
retrieve an audio set corresponding to the matched question template;
retrieve, from the audio set, a sub-audio set annotated with the intent, retrieve, from the sub-audio set, audio content containing the slot value name, and return the audio content to the user.
9. The apparatus according to claim 8, wherein, when no question template matches the text content, the audio search module is further configured to:
judge whether an audio context relevant to the text content can be matched from an audio database;
if a relevant audio context can be matched, return the audio context to the user.
10. The apparatus according to claim 9, wherein, if no audio context relevant to the text content can be matched, the audio search module is further configured to:
send the text content to an intelligent response service, and receive an intelligent response result returned by the intelligent response service;
judge whether audio whose similarity to the intelligent response result meets a requirement can be matched from the audio database;
if audio meeting the similarity requirement can be matched, return the audio to the user.
11. The apparatus according to claim 10, wherein, when no audio meets the similarity requirement, the semantic understanding module is further configured to:
extract keywords of the text content;
correspondingly, the audio search module is further configured to:
judge whether audio content corresponding to the keywords can be retrieved from the sentence context information;
if so, return the retrieved audio content to the user.
12. The apparatus according to any one of claims 7-11, further comprising:
an audio preprocessing module, configured to: for each audio production, divide the audio production into N audio files, N being a positive integer; convert each segmented audio file into text; perform attribute annotation on the text and determine context information of the text, wherein performing attribute annotation on the text comprises determining the intent of the text and the slot value names of its keywords; and store the text.
CN201811301902.3A 2018-11-02 2018-11-02 Intelligent interaction method and device Active CN109492126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811301902.3A CN109492126B (en) 2018-11-02 2018-11-02 Intelligent interaction method and device


Publications (2)

Publication Number Publication Date
CN109492126A true CN109492126A (en) 2019-03-19
CN109492126B CN109492126B (en) 2022-03-01

Family

ID=65692605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811301902.3A Active CN109492126B (en) 2018-11-02 2018-11-02 Intelligent interaction method and device

Country Status (1)

Country Link
CN (1) CN109492126B (en)


Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794233A (en) * 2005-12-28 2006-06-28 刘文印 Network user interactive asking answering method and its system
CN101246492A (en) * 2008-02-26 2008-08-20 华中科技大学 Full text retrieval system based on natural language
CN102859584A (en) * 2009-12-17 2013-01-02 弗劳恩霍弗实用研究促进协会 An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal
CN103187059A (en) * 2011-12-28 2013-07-03 上海博泰悦臻电子设备制造有限公司 Speech processing system based on vehicle-mounted application
EP2797011A1 (en) * 2013-04-25 2014-10-29 Thomson Licensing Method of and apparatus for providing access to content in a database based on environment-derived contextual information
CN104867367A (en) * 2015-06-17 2015-08-26 中国人民解放军海军工程大学 Task-driven ship damage control conversational training simulation system
CN106095834A (en) * 2016-06-01 2016-11-09 竹间智能科技(上海)有限公司 Intelligent dialogue method and system based on topic
CN106357942A (en) * 2016-10-26 2017-01-25 广州佰聆数据股份有限公司 Intelligent response method and system based on context dialogue semantic recognition
US20170201793A1 (en) * 2008-06-18 2017-07-13 Gracenote, Inc. TV Content Segmentation, Categorization and Identification and Time-Aligned Applications
CN107247769A (en) * 2017-06-05 2017-10-13 北京智能管家科技有限公司 Method for ordering song by voice, device, terminal and storage medium
CN107329996A (en) * 2017-06-08 2017-11-07 三峡大学 A kind of chat robots system and chat method based on fuzzy neural network
CN107948729A (en) * 2017-12-13 2018-04-20 广东欧珀移动通信有限公司 Rich Media's processing method, device, storage medium and electronic equipment
CN108492819A (en) * 2018-03-30 2018-09-04 浙江吉利控股集团有限公司 Language exercise method, apparatus, intelligent vehicle mounted terminal and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Yao et al.: "Research on Content-Based Video Retrieval Technology", 《产业与科技论坛》 (Industry and Science Tribune) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795532A (en) * 2019-10-18 2020-02-14 珠海格力电器股份有限公司 Voice information processing method and device, intelligent terminal and storage medium
CN111193940A (en) * 2019-12-09 2020-05-22 腾讯科技(深圳)有限公司 Audio playing method and device, computer equipment and computer readable storage medium
CN111193940B (en) * 2019-12-09 2021-07-06 腾讯科技(深圳)有限公司 Audio playing method and device, computer equipment and computer readable storage medium
CN111128140A (en) * 2019-12-30 2020-05-08 云知声智能科技股份有限公司 Interruption method and device for voice broadcast
CN111128140B (en) * 2019-12-30 2022-08-26 云知声智能科技股份有限公司 Interruption method and device for voice broadcast

Also Published As

Publication number Publication date
CN109492126B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
KR101255405B1 (en) Indexing and searching speech with text meta-data
CN108288468B (en) Audio recognition method and device
CN101030368B (en) Method and system for communicating across channels simultaneously with emotion preservation
CN106294582B (en) Man-machine interaction method and system based on natural language
CN107785018A (en) More wheel interaction semantics understanding methods and device
CN111090727B (en) Language conversion processing method and device and dialect voice interaction system
CN111400506B (en) Ancient poetry proposition method and system
CN105096932A (en) Voice synthesis method and apparatus of talking book
CN107799116A (en) More wheel interacting parallel semantic understanding method and apparatus
WO2007043679A1 (en) Information processing device, and program
WO2020253064A1 (en) Speech recognition method and apparatus, and computer device and storage medium
CN109036372B (en) Voice broadcasting method, device and system
CN109492126A (en) A kind of intelligent interactive method and device
CN106856091A (en) The automatic broadcasting method and system of a kind of multi-language text
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN109710799B (en) Voice interaction method, medium, device and computing equipment
CN109933774A (en) Method for recognizing semantics, device storage medium and electronic device
CN108614851A (en) Notes content display methods in tutoring system and device
CN110148403A (en) Decoding network generation method, audio recognition method, device, equipment and medium
CN113360598A (en) Matching method and device based on artificial intelligence, electronic equipment and storage medium
CN108710653B (en) On-demand method, device and system for reading book
KR20220158644A (en) System of making contents for learning korean language
KR20180021444A (en) Method and apparatus for processing language based on machine learning
CN108363765A (en) The recognition methods of audio paragraph and device
CN108364655A (en) Method of speech processing, medium, device and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant