CN106027485A - Rich media display method and system based on voice interaction - Google Patents

Rich media display method and system based on voice interaction

Info

Publication number
CN106027485A
Authority
CN
China
Prior art keywords
information
rich media
user
speech data
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610279818.0A
Other languages
Chinese (zh)
Inventor
吴建国
张珩
沈韡
刘超华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Technology (beijing) Co Ltd
LeTV Holding Beijing Co Ltd
Original Assignee
Intelligent Technology (beijing) Co Ltd
LeTV Holding Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Technology (beijing) Co Ltd, LeTV Holding Beijing Co Ltd filed Critical Intelligent Technology (beijing) Co Ltd
Priority to CN201610279818.0A priority Critical patent/CN106027485A/en
Publication of CN106027485A publication Critical patent/CN106027485A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rich media display method based on voice interaction. The method comprises the steps of acquiring voice data input by a user; performing frequency domain conversion processing on the voice data, thus acquiring spectrum characteristics of the voice data, and looking up a preset user attribute list according to the spectrum characteristics to acquire the attribute of the user; performing semantic recognition on the voice data to acquire semantic information of the voice data, and finding and acquiring rich media information related to the semantic information according to the semantic information; and displaying the rich media information according to a preset display rule corresponding to the user attribute. The invention also discloses a rich media display system based on voice interaction. According to the rich media display method and system based on voice interaction, the voice data is subjected to frequency domain conversion processing, and the user attribute is acquired, so that differential processing of voice interaction can be achieved according to the user attribute, and the display of the rich media is more targeted. By acquiring the rich media information related to the voice, the voice interaction efficiency is improved.

Description

Rich media display method and system based on voice interaction
Technical field
The present invention relates to the technical field of speech processing and rich media display, and in particular to a rich media display method and system based on voice interaction.
Background
With the development of information technology, user interaction techniques have come into wide use. Voice interaction, the new generation of user interaction following keyboard interaction, mouse interaction, and touch-screen interaction, is convenient and efficient; it is gradually gaining acceptance among users and has the potential for large-scale adoption. Among the related applications, intelligent voice services and their associated functions are the most attractive. For example, more and more voice-related applications are appearing on smart mobile terminals, and smart TV manufacturers are introducing voice interaction technology to replace the traditional handheld remote control. In the prior art, voice interaction is based on speech recognition technology: after receiving a segment of speech, the voice interaction system first performs content recognition on the speech data, obtains a content recognition result, and infers the user's intent from that result. The voice interaction system then performs the operation corresponding to the speech according to the user's intent, or returns information corresponding to the speech to the end user.
However, existing voice interaction systems can, on the one hand, only distinguish differences in the semantics contained in the speech data and cannot handle different users differently. On the other hand, the interactive display of existing voice interaction systems is rather limited, consisting of voice only or text only, which is far from ideal in function and effect for users who want to obtain richer information resources. In the training and education of children in particular, existing interaction systems cannot meet the needs of child users.
Summary of the invention
In view of this, an object of the present invention is to provide a rich media display method and system based on voice interaction that make the display of rich media more targeted and improve the effect of voice interaction display.
To this end, the present invention provides a rich media display method based on voice interaction, comprising:
Acquiring speech data input by a user;
Performing frequency domain transform processing on the speech data to obtain spectral features of the speech data, and looking up a preset user attribute list according to the spectral features to obtain an attribute of the user;
Performing semantic recognition on the speech data to obtain semantic information of the speech data, and searching for and obtaining rich media information related to the semantic information according to the semantic information;
Displaying the rich media information according to a preset display rule corresponding to the user attribute.
Preferably, after the step of obtaining the semantic information of the speech data, the method further comprises:
Converting the speech data into text information and displaying the text information in the interface for a preset time.
Preferably, before the step of performing frequency domain transform processing on the speech data to obtain the spectral features of the speech data, the method further comprises:
Determining, according to the speech data, whether the current user is a system user;
If the current user is a system user, obtaining preset user information corresponding to the current user; performing semantic recognition on the speech data to obtain the semantic information of the speech data, and searching for and obtaining rich media information related to the semantic information according to the semantic information; and displaying the rich media information according to a preset display rule corresponding to the user information;
If the current user is not a system user, performing the step of performing frequency domain transform processing on the speech data to obtain the spectral features of the speech data.
Preferably, the step of displaying the rich media information according to the preset display rule corresponding to the user attribute comprises:
Displaying rich media files according to the display rule contained in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display.
Preferably, after the step of displaying the rich media information according to the preset display rule corresponding to the user attribute, the method further comprises:
Playing preset voice guidance information according to the rich media information;
Acquiring new speech data input by the user;
Displaying, according to the new speech data, the rich media information corresponding to the new speech data.
Preferably, the step of searching for and obtaining the rich media information related to the semantic information according to the semantic information further comprises:
Obtaining attribute features of the rich media information found by the search;
Looking up and determining, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the preset masked information under that user attribute;
If so, filtering out the rich media information.
The present invention also provides a rich media display system based on voice interaction, comprising:
A data acquisition module, configured to acquire speech data input by a user and to send the acquired speech data to an attribute lookup module and an information acquisition module;
The attribute lookup module, configured to receive the speech data sent by the data acquisition module, perform frequency domain transform processing on the speech data to obtain spectral features of the speech data, look up a preset user attribute list according to the spectral features to obtain an attribute of the user, and send the obtained user attribute information to an information display module;
The information acquisition module, configured to receive the speech data sent by the data acquisition module, perform semantic recognition on the speech data to obtain semantic information of the speech data, search for and obtain rich media information related to the semantic information according to the semantic information, and send the obtained rich media information to the information display module;
The information display module, configured to receive the user attribute information sent by the attribute lookup module and the rich media information sent by the information acquisition module, and to display the rich media information according to a preset display rule corresponding to the user attribute.
Preferably, the system further comprises a text display module;
The data acquisition module is further configured to send the acquired speech data to the text display module;
The text display module is configured to receive the speech data sent by the data acquisition module, convert the speech data into text information, and display the text information in the interface for a preset time.
Preferably, the system further comprises a user determination module;
The data acquisition module is further configured to send the acquired speech data to the user determination module;
The user determination module is configured to receive the speech data sent by the data acquisition module and determine, according to the speech data, whether the current user is a system user; if the current user is a system user, to obtain preset user information corresponding to the current user and send the preset user information to the information display module; and if the current user is not a system user, to send the speech data to the attribute lookup module;
The information display module is further configured to receive the preset user information sent by the user determination module and to display the rich media information according to a preset display rule corresponding to the preset user information.
Preferably, the information display module is further configured to:
Display rich media files according to the display rule contained in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display.
Preferably, the system further comprises a guidance module;
The information display module is further configured to send a guidance instruction to the guidance module according to the rich media information;
The guidance module is configured to receive the guidance instruction sent by the information display module and play preset voice guidance information;
The data acquisition module is further configured to acquire new speech data input by the user and send the new speech data to the information display module;
The information display module is further configured to receive the new speech data sent by the data acquisition module and, according to the new speech data, display the rich media information corresponding to the new speech data.
Preferably, the information acquisition module is further configured to:
Obtain attribute features of the rich media information found by the search;
Look up and determine, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the preset masked information under that user attribute;
If so, filter out the rich media information.
As can be seen from the above, the rich media display method and system based on voice interaction provided by the present invention perform frequency domain transform processing on the speech data and use the resulting spectral features to obtain the attribute of the user. Voice interaction can therefore be handled differently according to the user's attribute, which makes the display of rich media more targeted: rich media is displayed under different display rules for users with different attributes. In addition, through semantic recognition, the method and system can search for and obtain rich media information related to the speech data, which increases the amount and variety of information shown during voice interaction and greatly improves the efficiency and effect of voice interaction display.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the rich media display method based on voice interaction provided by the present invention;
Fig. 2 is a flowchart of another embodiment of the rich media display method based on voice interaction provided by the present invention;
Fig. 3 is a flowchart of one embodiment of the rich media display system based on voice interaction provided by the present invention;
Fig. 4 is a flowchart of another embodiment of the rich media display system based on voice interaction provided by the present invention.
Detailed description of the invention
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two different entities or two different parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the subsequent embodiments.
Fig. 1 is a flowchart of one embodiment of the rich media display method based on voice interaction provided by the present invention. The rich media display method based on voice interaction comprises:
Step 101: acquire the speech data input by the user;
Here, the speech data is the speech data produced from the user's voice input. Voice input means that the user speaks the voice command to be acted on at the voice receiving part of the terminal or a related device. For example, a user who wants to search for apples needs to say the word "apple" aloud. The user may also use previously recorded speech data as the voice input, for example by using another playback device to play back speech that the user recorded in advance. The terminal or interaction system then carries out voice interaction with the user according to the speech data.
Step 102: perform frequency domain transform processing on the speech data to obtain the spectral features of the speech data, and look up the preset user attribute list according to the spectral features to obtain the attribute of the user;
Here, frequency domain transform processing means performing frequency domain analysis on the acquired speech data to obtain the frequency domain information of the speech and, from it, the spectral characteristics of the speech data. The user attribute list is a preconfigured list that maps spectral features to the corresponding user attributes; looking up the obtained spectral features in the user attribute list determines the user attribute that corresponds to the speech data. The user attribute includes attributes such as the user's age, sex, and ethnicity. Other distinctions that differentiate user attributes may of course also be derived from the spectral features.
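As an illustration of step 102, the following minimal sketch derives one spectral feature (the dominant frequency) with a naive discrete Fourier transform and looks it up in a range table. The choice of dominant frequency as the feature, the frequency ranges, the attribute labels, and all function names are assumptions made for illustration; the patent does not specify which spectral features are used or how the user attribute list is organized.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT; returns the magnitude of bins 0..N/2 (inclusive)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin."""
    magnitudes = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical preset user attribute list: (low Hz, high Hz, attribute).
# The ranges are illustrative assumptions, not values from the patent.
USER_ATTRIBUTE_LIST = [
    (60.0, 165.0, "adult_male"),
    (165.0, 255.0, "adult_female"),
    (255.0, 600.0, "child"),
]


def lookup_user_attribute(freq_hz, table=USER_ATTRIBUTE_LIST):
    """Return the attribute whose range contains freq_hz, or None."""
    for low, high, attribute in table:
        if low <= freq_hz < high:
            return attribute
    return None


# Synthetic 300 Hz tone standing in for recorded speech data.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 300 * t / sample_rate) for t in range(400)]
freq = dominant_frequency(tone, sample_rate)
attribute = lookup_user_attribute(freq)
```

A practical system would likely use an FFT and a richer feature set than a single peak frequency; the range table stands in for whatever correspondence the preset list actually holds.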
Step 103: perform semantic recognition on the speech data to obtain the semantic information of the speech data, and search for and obtain rich media information related to the semantic information according to the semantic information;
Here, performing semantic recognition on the speech data generally means first performing speech recognition on the speech data and then performing semantic recognition on the speech recognition result. Speech recognition determines the textual content of the speech data; semantic recognition identifies the intent and meaning expressed in the language. Rich media includes all kinds of multimedia information such as text, pictures, video, and audio.
Step 104: display the rich media information according to the preset display rule corresponding to the user attribute.
Here, the display rule corresponding to the user attribute means that each class of user has its own display rule. The display rule includes the layout of the display interface, the order in which the rich media is displayed, the form in which the rich media is displayed, and so on.
As the above embodiment shows, the rich media display method based on voice interaction performs frequency domain transform processing on the speech data input by the user and uses the resulting spectral features to obtain the attribute of the user. Voice interaction can therefore be handled differently according to the user's attribute, which makes the display of rich media more targeted: the interaction system or terminal can interact differently with different types of users and display rich media under different display rules according to the user's attribute. In addition, through semantic recognition, the method can search for and obtain rich media information related to the speech data, which increases the amount and variety of information shown during voice interaction and thus greatly improves the efficiency and effect of voice interaction display. For voice interaction aimed at children or at education in particular, the rich media display method based on voice interaction of the present invention not only lets the user obtain richer information resources through voice interaction, but also, through the interactive display of rich media, greatly improves the user experience, holds the user's attention while deepening the user's impression, and improves the effect of training and education.
In some preferred embodiments of the present invention, after the step of obtaining the semantic information of the speech data, the method further comprises: converting the speech data into text information according to the result of the semantic recognition and displaying the text information in the interface for a preset time. Converting the speech data into text information lets the user confirm whether the speech recognized by the interaction system or terminal is correct, and it also makes the presentation of the interaction result more intuitive for the user. Generally, the text information needs to be hidden after it has been shown in the interface for a certain time, so that it does not interfere with the display of the rich media. The text information may be shown at the top of the interface or in one of its corners, such as the lower right corner; the display position can be chosen as needed. This not only improves the accuracy of the voice information in the voice interaction but also lets users with poor eyesight further confirm whether their own voice input was accurate, and it makes it easier for onlookers to recognize the voice information input by the user who is interacting.
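The timed text display described above can be sketched as a small overlay object that reports whether it should still be visible. The class name, the `position` values, and the three-second duration are hypothetical; the patent only states that the text is shown for a preset time and then hidden.

```python
import time


class TextOverlay:
    """Recognized text that is shown for a preset time and then hidden."""

    def __init__(self, text, duration_s, position="bottom_right", shown_at=None):
        self.text = text
        self.duration_s = duration_s
        self.position = position  # e.g. "top" or "bottom_right" (assumed names)
        # A monotonic clock avoids jumps when the wall clock is adjusted.
        self.shown_at = time.monotonic() if shown_at is None else shown_at

    def is_visible(self, now=None):
        """True while the preset display time has not yet elapsed."""
        now = time.monotonic() if now is None else now
        return now - self.shown_at < self.duration_s


# Explicit timestamps keep the example deterministic.
overlay = TextOverlay("apple", duration_s=3.0, shown_at=100.0)
visible_early = overlay.is_visible(now=101.0)
visible_late = overlay.is_visible(now=103.5)
```

A rendering loop would call `is_visible()` on each frame and stop drawing the overlay once it returns False, so the text never obstructs the rich media for longer than the preset time.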
In another preferred embodiment of the present invention, before the step of performing frequency domain transform processing on the speech data to obtain the spectral features of the speech data, the method further comprises:
Determining, according to the speech data, whether the current user is a system user;
If the current user is a system user, obtaining the preset user information corresponding to the current user; performing semantic recognition on the speech data to obtain the semantic information of the speech data, and searching for and obtaining rich media information related to the semantic information according to the semantic information; and displaying the rich media information according to the preset display rule corresponding to the user information;
If the current user is not a system user, performing the step of performing frequency domain transform processing on the speech data to obtain the spectral features of the speech data.
Here, a system user is a user who has an account stored in the system in advance, and may also be called a member user. For example, on a given mobile phone, the spectral information of the phone's owner can be stored on the phone in advance together with more detailed information about that user; in this case the phone's owner is the system user. Determining whether the current user is a system user thus further distinguishes system users from non-system users. Moreover, the preset user information provides more detailed information about the system user, which allows a more accurate rich media display for system users.
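The branch between system users and other users might look like the following sketch, which compares an incoming spectral feature vector with the stored one of each system user. Cosine similarity, the 0.95 threshold, the three-element feature vectors, and the structure of `SYSTEM_USERS` are all assumptions chosen for illustration; the patent only says that the owner's spectral information is stored in advance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical store: account -> (stored spectral features, preset user info).
SYSTEM_USERS = {
    "owner": ([0.9, 0.1, 0.3], {"display_rule": "owner_favorites"}),
}


def identify_system_user(features, users=SYSTEM_USERS, threshold=0.95):
    """Return the best-matching account at or above the threshold, else None."""
    best_account, best_score = None, threshold
    for account, (stored, _info) in users.items():
        score = cosine_similarity(features, stored)
        if score >= best_score:
            best_account, best_score = account, score
    return best_account


def choose_flow(features, users=SYSTEM_USERS):
    """Pick the processing branch described in the embodiment."""
    account = identify_system_user(features, users)
    if account is not None:
        # System user: use the preset user information directly.
        return "preset_user_info", users[account][1]
    # Not a system user: fall back to the user attribute list lookup.
    return "attribute_lookup", None


known = choose_flow([0.9, 0.1, 0.3])
unknown = choose_flow([0.1, 0.9, 0.0])
```

In both branches the semantic recognition and rich media search proceed the same way; only the source of the display rule differs.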
In some preferred embodiments, the step of displaying the rich media information according to the preset display rule corresponding to the user attribute comprises: displaying rich media files according to the display rule contained in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display. Rich media content includes video, audio, text, animation, and even executable applications. The content of rich media is of course not limited to the items listed here and can be extended as needed, which is not described further. The display rule or display mode includes, but is not limited to, the following: playing video, playing audio, playing audio and video in sequence, displaying pictures, displaying pictures with background audio, animation, other rules or modes similar to interactive rich media advertising, and the display mode of applications. For example, multiple pictures may be shown one after another by sliding, or shown as in a PPT slideshow. The rich media may or may not include a display rule; when it does not, a default rule can be preset in the system and used as the display rule. In this way, rich media can be displayed under different rules, which increases the diversity of rich media display. Moreover, adding the display rule to the rich media information makes it possible to set a corresponding display rule in the system for each search type and thus to realize richer display modes.
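The fallback to a default rule can be sketched as follows. The dictionary layout, the key `display_rule`, and the contents of `DEFAULT_RULE` are hypothetical; the patent does not define a data format for rich media information.

```python
# Hypothetical system-wide default rule, used when the rich media
# information does not carry a display rule of its own.
DEFAULT_RULE = {"order": ["video", "picture", "text"], "mode": "slide"}


def resolve_display_rule(rich_media_info, default_rule=DEFAULT_RULE):
    """Prefer the rule embedded in the rich media information, else the default."""
    return rich_media_info.get("display_rule") or default_rule


with_rule = {
    "files": ["a.jpg", "b.jpg"],
    "display_rule": {"order": ["picture"], "mode": "slideshow"},
}
without_rule = {"files": ["c.mp4"]}

rule_a = resolve_display_rule(with_rule)
rule_b = resolve_display_rule(without_rule)
```

Keeping the rule next to the files it governs is what lets each search type carry its own presentation, while the default keeps rule-less content displayable.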
Further, after the step of displaying the rich media information according to the preset display rule corresponding to the user attribute, the method further comprises:
Playing preset voice guidance information according to the rich media information;
Acquiring new speech data input by the user;
Displaying, according to the new speech data, the rich media information corresponding to the new speech data.
In this way, the guidance information prompts the user to select further rich media, and the rich media file that the user then selects is displayed. This makes the interactive display process more flexible and improves the user experience.
In another preferred embodiment of the present invention, step 104 of displaying the rich media information according to the preset display rule corresponding to the user attribute comprises: looking up, according to the user attribute, a preset correspondence list between user attributes and display rules to obtain the display rule for the current user, the display rule containing the ordering of the rich media information; and automatically playing and displaying the rich media information in the order given by the display rule. Automatically playing and displaying the rich media information in that order means that each piece of rich media is played or shown in turn according to the order in the display rule. For audio and video information, playing means playing the audio or video file directly; for text or picture information, playing means presenting it as a slideshow, although other playback forms may of course also be used. In this way, all the rich media that the user obtains from the speech data is shown to the user automatically, in an order based on the display rule that corresponds to the user's own attribute. This improves the user's interaction experience, strengthens the effect of the rich media display, and thus improves the efficiency of voice interaction.
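A minimal sketch of this embodiment: look up the display rule for the user attribute, order the rich media accordingly, and choose a playback action per media type. The correspondence list, the attribute labels, the item dictionaries, and the action names `play_file` and `slideshow` are assumptions for illustration only.

```python
# Hypothetical correspondence list: user attribute -> display rule.
RULES_BY_ATTRIBUTE = {
    "child": {"order": ["video", "picture", "audio", "text"]},
    "adult": {"order": ["text", "picture", "video", "audio"]},
}


def playback_plan(items, user_attribute, rules=RULES_BY_ATTRIBUTE):
    """Order items by the user's display rule and pick an action for each."""
    order = rules[user_attribute]["order"]
    rank = {media_type: i for i, media_type in enumerate(order)}
    # sorted() is stable, so items of the same type keep their relative
    # order; media types missing from the rule are placed last.
    ordered = sorted(items, key=lambda item: rank.get(item["type"], len(rank)))
    plan = []
    for item in ordered:
        # Audio and video are played directly; text and pictures are
        # presented as a slideshow.
        action = "play_file" if item["type"] in ("audio", "video") else "slideshow"
        plan.append((action, item["name"]))
    return plan


items = [
    {"type": "text", "name": "apple_article"},
    {"type": "video", "name": "apple_video"},
    {"type": "picture", "name": "apple_photo"},
]
child_plan = playback_plan(items, "child")
adult_plan = playback_plan(items, "adult")
```

The same search result thus yields a video-first sequence for one class of user and a text-first sequence for another, which is the differentiated display the embodiment describes.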
In a further embodiment of the present invention, the step of displaying the rich media information according to the preset display rule corresponding to the user attribute further comprises: determining whether the rich media information currently being displayed is audio or video information; if it is audio or video information, performing no additional operation; if it is not audio or video information, obtaining the voice information bound to the rich media information and presenting the voice information together with the rich media information. Information files that are not audio or video need to have a bound piece of voice information preset, so that the effect of a spoken presentation is also achieved when information such as pictures or text is displayed. For example, if pictures of tablet computers are being displayed, each picture has a piece of voice information that gives a brief introduction, such as "** brand tablet". The voice information may of course also be obtained by the interaction system or terminal by parsing the rich media information. For example, if the rich media is text information, the interaction system can convert the text into voice information in the background and bind it to the text. This not only increases the diversity of voice interaction but also further improves the user experience.
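The binding logic can be sketched as below. The item fields (`type`, `bound_voice`, `content`) and the `synthesize` callable are hypothetical stand-ins; in particular, `synthesize` is a placeholder for whatever text-to-speech facility the system uses, not a real API.

```python
def with_bound_voice(item, synthesize=lambda text: "tts:" + text):
    """Attach voice information to rich media that is not audio or video.

    `synthesize` is a placeholder for a text-to-speech step.
    """
    if item["type"] in ("audio", "video"):
        return item  # already carries sound; no additional operation
    voice = item.get("bound_voice")
    if voice is None and item["type"] == "text":
        # No preset voice: derive one from the text content itself.
        voice = synthesize(item["content"])
    return {**item, "bound_voice": voice}


video = {"type": "video", "name": "apple_video"}
picture = {"type": "picture", "name": "tablet.jpg", "bound_voice": "tablet_intro.wav"}
text = {"type": "text", "content": "An apple is a fruit."}

shown_video = with_bound_voice(video)
shown_picture = with_bound_voice(picture)
shown_text = with_bound_voice(text)
```

The display step would then play `bound_voice` alongside the picture or text whenever the field is present.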
In some optional embodiments, step 103 of searching for and obtaining, according to the semantic information, the rich media information related to the semantic information further includes: obtaining the attribute features of the rich media information found by the search; looking up and judging, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the mask information preset for that user attribute; and if so, filtering out the rich media information. The attribute feature generally refers to the type of the rich media information. For example, video files may be classified into types such as art, pornography, violence, homicide, adventure, and science fiction, and pictures may be classified into types such as bloody, fresh, and disgusting. A separate list of mask information is set for each user attribute. For example, for children the mask information covers harmful content such as games, violence, homicide, and pornography, whereas for adults it covers content such as certain specific religions and crime. Different mask information lists may also be set for users of different genders as required. This helps further improve the efficiency of voice interaction and avoids displaying erroneous or unsuitable information. In education, for instance, the mask information can screen out much of the information that would distract a child, making the child's voice interaction healthier and more effective.
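The filtering step can be sketched in a few lines of Python. The categories are taken from the examples in the text, but the `MASK_LISTS` table, the attribute names, and the `category` field are hypothetical; the source does not define a data format.

```python
# Hypothetical mask information lists, one per user attribute.
MASK_LISTS = {
    "child": {"game", "violence", "homicide", "pornography"},
    "adult": {"crime"},
}


def filter_masked(items, user_attribute):
    """Drop every item whose category is masked for the given user attribute."""
    masked = MASK_LISTS.get(user_attribute, set())
    return [item for item in items if item["category"] not in masked]
```

An attribute with no mask list is treated here as having nothing masked, which is a design choice of the sketch rather than something the source states.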
Fig. 2 is a flowchart of another embodiment of the rich media display method based on voice interaction provided by the present invention. The rich media display method based on voice interaction includes:
Step 201: obtain the speech data input by the user;
Step 202: judge whether the current user is a system user; if so, perform step 204; otherwise, perform step 203;
Step 203: perform frequency domain transform processing on the speech data to obtain the spectrum features of the speech data, and look up a preset user attribute list according to the spectrum features to obtain the attribute of the user;
Step 204: obtain the preset user information corresponding to the current user;
Step 205: perform semantic recognition on the speech data to obtain the semantic information of the speech data, and search for and obtain the rich media information related to the semantic information according to the semantic information;
Step 206: convert the speech data into text information and, within a preset time, display the text information at the top of the interface;
Step 207: obtain the attribute features of the rich media information found by the search;
Step 208: judge whether the rich media belongs to the mask information preset for this user attribute; if so, perform step 210; otherwise, perform step 209;
Step 209: following step 208, the rich media does not belong to the mask information preset for this user attribute, so retain the rich media information;
Step 210: following step 208, the rich media belongs to the mask information preset for this user attribute, so filter out the rich media information, that is, remove rich media information of this kind;
Step 211: automatically play and display the rich media information in the order given by the display rule.
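Steps 201 to 211 can be strung together as in the following Python sketch. Every callable passed in is a hypothetical placeholder for a component the source only names (speaker check, spectrum analysis, semantic recognition, search, transcription, masking, and ordering), and treating the preset user information of step 204 as interchangeable with a user attribute is a simplification of this sketch.

```python
def run_interaction(speech, *, find_system_user, attribute_from_spectrum,
                    recognize, search, transcribe, is_masked, order_by_rule):
    """Walk through steps 201-211 for one piece of speech data."""
    # Steps 202-204: use the stored profile of a system user if there is one,
    # otherwise derive the attribute from the spectrum of the speech data.
    profile = find_system_user(speech)
    attribute = profile if profile is not None else attribute_from_spectrum(speech)
    # Step 205: semantic recognition, then lookup of related rich media.
    candidates = search(recognize(speech))
    # Step 206: text to show at the top of the interface for a preset time.
    caption = transcribe(speech)
    # Steps 207-210: drop items that are masked for this user attribute.
    kept = [item for item in candidates if not is_masked(item, attribute)]
    # Step 211: order the remaining items by the display rule.
    return caption, order_by_rule(kept, attribute)
```

Passing the components in as arguments keeps the sketch independent of any particular speech or search engine and makes each step easy to replace or test in isolation.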
As can be seen from the above embodiment, the rich media display method based on voice interaction converts the voice information into text information and displays it, which improves the accuracy of voice interaction. By obtaining the feature information of the rich media information, it can exclude the mask information preset for the corresponding user attribute, which makes the display of all rich media more stable and reliable and improves the display effect of voice interaction. The rich media display method based on voice interaction of the present invention therefore improves both the accuracy and the effect of voice interaction and gives the user a better interactive experience.
Fig. 3 is a schematic diagram of one embodiment of the rich media display system based on voice interaction provided by the present invention. The rich media display system based on voice interaction includes:
A data acquisition module 301, configured to obtain the speech data input by the user and to send the obtained speech data to the attribute search module 302 and the data obtaining module 303;
An attribute search module 302, configured to receive the speech data sent by the data acquisition module 301, perform frequency domain transform processing on the speech data to obtain the spectrum features of the speech data, look up a preset user attribute list according to the spectrum features to obtain the attribute of the user, and send the obtained user attribute information to the information display module 304;
A data obtaining module 303, configured to receive the speech data sent by the data acquisition module 301, perform semantic recognition on the speech data to obtain the semantic information of the speech data, search for and obtain the rich media information related to the semantic information according to the semantic information, and send the obtained rich media information to the information display module 304;
An information display module 304, configured to receive the user attribute information sent by the attribute search module 302 and the rich media information sent by the data obtaining module 303, and to display the rich media information according to the preset display rule corresponding to the user attribute.
As can be seen from the above embodiment, in the rich media display system based on voice interaction, the data acquisition module 301 obtains the speech data input by the user, the attribute search module 302 determines the attribute of the user, the data obtaining module 303 obtains the rich media information corresponding to the speech data, and the information display module 304 finally displays the rich media information according to the preset display rule. The interactive system can thus handle interactions differently according to the different attributes of users, which greatly improves the efficiency and effect of the user's voice interaction.
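The work of the attribute search module 302 can be illustrated with a short Python sketch. The source does not say which spectrum features are used or what the user attribute list contains, so the choice of the dominant frequency as the feature, the band boundaries, and the attribute names below are all assumptions chosen for illustration, and a naive discrete Fourier transform stands in for the unspecified frequency domain transform.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_magnitude = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        if abs(acc) > best_magnitude:
            best_bin, best_magnitude = k, abs(acc)
    return best_bin * sample_rate / n


# Hypothetical user attribute list: (upper frequency bound in Hz, attribute).
# The boundaries are placeholders, not values from the source.
ATTRIBUTE_TABLE = [
    (165.0, "adult male"),
    (255.0, "adult female"),
    (float("inf"), "child"),
]


def lookup_attribute(samples, sample_rate):
    """Map the dominant frequency of the speech samples to a user attribute."""
    frequency = dominant_frequency(samples, sample_rate)
    for upper_bound, attribute in ATTRIBUTE_TABLE:
        if frequency < upper_bound:
            return attribute
```

A practical system would more likely use a fast Fourier transform and richer features than a single peak; the sketch only shows the shape of the transform-then-lookup step.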
In some preferred embodiments of the present invention, referring to Fig. 4, the rich media display system based on voice interaction further includes a text display module 305. The data acquisition module 301 is further configured to send the obtained speech data to the text display module 305. The text display module 305 is configured to receive the speech data sent by the data acquisition module 301, convert the speech data into text information, and display the text information in the interface within a preset time.
In other preferred embodiments of the present invention, the system further includes a user judgment module 306.
The data acquisition module 301 is further configured to send the obtained speech data to the user judgment module 306.
The user judgment module 306 is configured to receive the speech data sent by the data acquisition module 301 and judge, according to the speech data, whether the current user is a system user; if the current user is a system user, to obtain the preset user information corresponding to the current user and send the preset user information to the information display module 304; and if the current user is not a system user, to send the speech data to the attribute search module 302.
The information display module 304 is further configured to receive the preset user information sent by the user judgment module 306 and to display the rich media information according to the preset display rule corresponding to the preset user information.
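The routing performed by the user judgment module 306 can be sketched as follows in Python. How the system recognizes a system user from speech data is not described in the source, so the `identify_speaker` callable and the `system_users` dictionary are hypothetical.

```python
def judge_user(speech, identify_speaker, system_users):
    """Return ("preset", info) for a system user, else ("lookup", speech)."""
    speaker = identify_speaker(speech)
    if speaker in system_users:
        # Known user: hand the stored information to the display module.
        return "preset", system_users[speaker]
    # Unknown user: forward the speech data to the attribute search module.
    return "lookup", speech
```

The first element of the result tells the caller which module receives the second element, which mirrors the two outgoing paths of module 306.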
In a further embodiment of the present invention, the information display module 304 is further configured to display rich media files according to the display rule in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display.
In an optional embodiment of the present invention, the system further includes a guide module 307.
The information display module 304 is further configured to send a guide instruction to the guide module 307 according to the rich media information.
The guide module 307 is configured to receive the guide instruction sent by the information display module 304 and to play preset voice guidance information.
The data acquisition module 301 is further configured to obtain new speech data input by the user and to send the new speech data to the information display module 304.
The information display module 304 is further configured to receive the new speech data sent by the data acquisition module 301 and to display, according to the new speech data, the rich media information corresponding to the new speech data.
In a preferred embodiment of the present invention, the data obtaining module 303 is further configured to obtain the attribute features of the rich media information found by the search; to look up and judge, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the mask information preset for this user attribute; and if so, to filter out the rich media information.
Fig. 4 is a schematic diagram of another embodiment of the rich media display system based on voice interaction provided by the present invention. The rich media display system based on voice interaction includes: a data acquisition module 301, an attribute search module 302, a data obtaining module 303, an information display module 304, a text display module 305, a user judgment module 306, and a guide module 307.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the disclosure (including the claims) is limited to these examples. Within the idea of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and many other variations of the different aspects of the invention as described above exist, which are not provided in detail for the sake of brevity.
In addition, to simplify the description and discussion, and so as not to obscure the invention, well-known power and ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that the details of implementing such block diagram arrangements are highly dependent on the platform on which the invention is to be implemented (that is, these details should be well within the understanding of those skilled in the art). Where specific details (for example, circuits) are set forth to describe exemplary embodiments of the invention, it will be apparent to those skilled in the art that the invention can be practiced without these specific details or with variations of them. These descriptions should therefore be regarded as illustrative rather than restrictive.
Although the invention has been described in conjunction with specific embodiments, many replacements, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (for example, dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the invention are intended to cover all such replacements, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (12)

1. A rich media display method based on voice interaction, characterized by comprising:
obtaining speech data input by a user;
performing frequency domain transform processing on the speech data to obtain spectrum features of the speech data, and looking up a preset user attribute list according to the spectrum features to obtain an attribute of the user;
performing semantic recognition on the speech data to obtain semantic information of the speech data, and searching for and obtaining rich media information related to the semantic information according to the semantic information;
displaying the rich media information according to a preset display rule corresponding to the user attribute.
2. The method according to claim 1, characterized in that, after the step of obtaining the semantic information of the speech data, the method further comprises:
converting the speech data into text information and, within a preset time, displaying the text information in the interface.
3. The method according to claim 1, characterized in that, before the step of performing frequency domain transform processing on the speech data to obtain the spectrum features of the speech data, the method further comprises:
judging, according to the speech data, whether the current user is a system user;
if the current user is a system user, obtaining preset user information corresponding to the current user; performing semantic recognition on the speech data to obtain the semantic information of the speech data, and searching for and obtaining the rich media information related to the semantic information according to the semantic information; and displaying the rich media information according to a preset display rule corresponding to the user information;
if the current user is not a system user, performing the step of performing frequency domain transform processing on the speech data to obtain the spectrum features of the speech data.
4. The method according to claim 1, characterized in that the step of displaying the rich media information according to the preset display rule corresponding to the user attribute comprises:
displaying rich media files according to a display rule in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display.
5. The method according to claim 1, characterized in that, after the step of displaying the rich media information according to the preset display rule corresponding to the user attribute, the method further comprises:
playing preset voice guidance information according to the rich media information;
obtaining new speech data input by the user;
displaying, according to the new speech data, the rich media information corresponding to the new speech data.
6. The method according to claim 1, characterized in that the step of searching for and obtaining the rich media information related to the semantic information according to the semantic information further comprises:
obtaining attribute features of the rich media information found by the search;
looking up and judging, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the mask information preset for this user attribute;
if so, filtering out the rich media information.
7. A rich media display system based on voice interaction, characterized by comprising:
a data acquisition module, configured to obtain speech data input by a user and to send the obtained speech data to an attribute search module and a data obtaining module;
the attribute search module, configured to receive the speech data sent by the data acquisition module, perform frequency domain transform processing on the speech data to obtain spectrum features of the speech data, look up a preset user attribute list according to the spectrum features to obtain an attribute of the user, and send the obtained user attribute information to an information display module;
the data obtaining module, configured to receive the speech data sent by the data acquisition module, perform semantic recognition on the speech data to obtain semantic information of the speech data, search for and obtain rich media information related to the semantic information according to the semantic information, and send the obtained rich media information to the information display module;
the information display module, configured to receive the user attribute information sent by the attribute search module and the rich media information sent by the data obtaining module, and to display the rich media information according to a preset display rule corresponding to the user attribute.
8. The system according to claim 7, characterized in that the system further comprises a text display module;
the data acquisition module is further configured to send the obtained speech data to the text display module;
the text display module is configured to receive the speech data sent by the data acquisition module, convert the speech data into text information, and display the text information in the interface within a preset time.
9. The system according to claim 7, characterized by further comprising a user judgment module;
the data acquisition module is further configured to send the obtained speech data to the user judgment module;
the user judgment module is configured to receive the speech data sent by the data acquisition module and judge, according to the speech data, whether the current user is a system user; if the current user is a system user, to obtain preset user information corresponding to the current user and send the preset user information to the information display module; and if the current user is not a system user, to send the speech data to the attribute search module;
the information display module is further configured to receive the preset user information sent by the user judgment module and to display the rich media information according to a preset display rule corresponding to the preset user information.
10. The system according to claim 7, characterized in that the information display module is further configured to:
display rich media files according to a display rule in the rich media information, wherein the rich media information contains rich media file information and corresponding display rule information, and the display rule includes the order in which the rich media files are displayed and the manner of display.
11. The system according to claim 7, characterized by further comprising a guide module;
the information display module is further configured to send a guide instruction to the guide module according to the rich media information;
the guide module is configured to receive the guide instruction sent by the information display module and to play preset voice guidance information;
the data acquisition module is further configured to obtain new speech data input by the user and to send the new speech data to the information display module;
the information display module is further configured to receive the new speech data sent by the data acquisition module and to display, according to the new speech data, the rich media information corresponding to the new speech data.
12. The system according to claim 7, characterized in that the data obtaining module is further configured to:
obtain attribute features of the rich media information found by the search;
look up and judge, according to the attribute features of the rich media information and the attribute of the user, whether the rich media belongs to the mask information preset for this user attribute;
if so, filter out the rich media information.
CN201610279818.0A 2016-04-28 2016-04-28 Rich media display method and system based on voice interaction Pending CN106027485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610279818.0A CN106027485A (en) 2016-04-28 2016-04-28 Rich media display method and system based on voice interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610279818.0A CN106027485A (en) 2016-04-28 2016-04-28 Rich media display method and system based on voice interaction

Publications (1)

Publication Number Publication Date
CN106027485A true CN106027485A (en) 2016-10-12

Family

ID=57081725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610279818.0A Pending CN106027485A (en) 2016-04-28 2016-04-28 Rich media display method and system based on voice interaction

Country Status (1)

Country Link
CN (1) CN106027485A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132805A (en) * 2017-12-20 2018-06-08 深圳Tcl新技术有限公司 Voice interactive method, device and computer readable storage medium
CN109147800A (en) * 2018-08-30 2019-01-04 百度在线网络技术(北京)有限公司 Answer method and device
CN109165336A (en) * 2018-08-23 2019-01-08 广东小天才科技有限公司 A kind of information output controlling method and private tutor's equipment
CN111081248A (en) * 2019-12-27 2020-04-28 安徽仁昊智能科技有限公司 Artificial intelligence speech recognition device
CN111638789A (en) * 2020-05-29 2020-09-08 广东小天才科技有限公司 Data output method and terminal equipment
CN112458703A (en) * 2019-08-19 2021-03-09 青岛海尔洗衣机有限公司 Information display processing method and device, washing machine and storage medium
WO2022041192A1 (en) * 2020-08-29 2022-03-03 深圳市永兴元科技股份有限公司 Voice message processing method and device, and instant messaging client

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201733370U (en) * 2010-05-05 2011-02-02 康佳集团股份有限公司 Set-top box with children mode
CN102708174A (en) * 2012-05-04 2012-10-03 奇智软件(北京)有限公司 Method and device for displaying rich media information in browser
CN103677516A (en) * 2013-11-27 2014-03-26 青岛海信电器股份有限公司 Interface generating method and device of terminal
CN104795067A (en) * 2014-01-20 2015-07-22 华为技术有限公司 Voice interaction method and device
CN105095406A (en) * 2015-07-09 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus for voice search based on user feature

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132805A (en) * 2017-12-20 2018-06-08 深圳Tcl新技术有限公司 Voice interactive method, device and computer readable storage medium
CN109165336A (en) * 2018-08-23 2019-01-08 广东小天才科技有限公司 A kind of information output controlling method and private tutor's equipment
CN109147800A (en) * 2018-08-30 2019-01-04 百度在线网络技术(北京)有限公司 Answer method and device
US11475897B2 (en) 2018-08-30 2022-10-18 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for response using voice matching user category
CN112458703A (en) * 2019-08-19 2021-03-09 青岛海尔洗衣机有限公司 Information display processing method and device, washing machine and storage medium
CN112458703B (en) * 2019-08-19 2024-03-12 青岛海尔洗衣机有限公司 Information display processing method and device, washing machine and storage medium
CN111081248A (en) * 2019-12-27 2020-04-28 安徽仁昊智能科技有限公司 Artificial intelligence speech recognition device
CN111638789A (en) * 2020-05-29 2020-09-08 广东小天才科技有限公司 Data output method and terminal equipment
WO2022041192A1 (en) * 2020-08-29 2022-03-03 深圳市永兴元科技股份有限公司 Voice message processing method and device, and instant messaging client

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161012