CN109739354B - Voice-based multimedia interaction method and device - Google Patents


Info

Publication number: CN109739354B (application CN201811632295.9A; other version: CN109739354A)
Authority: CN (China)
Prior art keywords: sound, voice, user, target, multimedia
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 刘永耀, 赵祯
Current assignee: Guangzhou Leafun Culture Science and Technology Co Ltd
Original assignee: Guangzhou Leafun Culture Science and Technology Co Ltd
Application filed by Guangzhou Leafun Culture Science and Technology Co Ltd; priority to CN201811632295.9A

Abstract

The invention relates to the technical field of exhibition and display, and discloses a sound-based multimedia interaction method and device. The method comprises the following steps: recognizing the user sound collected by a sound collection module to obtain the sound features of the user sound; acquiring, from a multimedia material library, a target multimedia material matched with the sound features; and outputting the target multimedia material through an output module. By implementing the embodiments of the invention, the sound input by a user can be collected through the sound collection module, and multimedia material matched with the current user's sound is output according to the distinct features of different users' sounds. The output multimedia material is thus determined by the sound the user inputs, which improves the user's experience of the outdoor display device.

Description

Voice-based multimedia interaction method and device
Technical Field
The invention relates to the technical field of exhibition and display, and in particular to a sound-based multimedia interaction method and device.
Background
At present, more and more outdoor display devices for displaying products, running advertisements, or presenting popular-science content are appearing on the market, and enterprises can publicize and promote their products by placing advertisements on these devices. In practice, however, existing outdoor display devices can only play pre-edited pictures and/or videos and cannot interact with the audience, so the audience's experience of such devices is poor.
Disclosure of Invention
The embodiments of the invention disclose a sound-based multimedia interaction method and device, which can improve the user's experience of an outdoor display device.
The first aspect of the embodiment of the invention discloses a multimedia interaction method based on sound, which comprises the following steps:
recognizing the user sound collected by a sound collection module to obtain the sound features of the user sound;
acquiring a target multimedia material matched with the sound characteristics from a multimedia material library;
and outputting the target multimedia material through an output module.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, before the recognizing the user sound collected by the sound collection module to obtain the sound features of the user sound, the method further includes:
collecting environmental sound through the sound collection module;
identifying whether the environmental sound contains a human voice;
if yes, determining the environmental sound containing the human voice to be the user sound collected by the sound collection module.
As an alternative implementation manner, in the first aspect of the embodiment of the present invention, the obtaining, from a multimedia material library, target multimedia materials matching with the sound features includes:
acquiring the pitch, frequency, and timbre of the user sound contained in the sound features;
analyzing the pitch and the frequency to obtain a target age matched with the user sound;
analyzing the timbre to obtain a target gender matched with the user sound;
and acquiring target multimedia materials matched with the target age and the target gender from a multimedia material library.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the outputting, by the output module, the target multimedia material includes:
acquiring the volume of the user voice contained in the voice feature;
analyzing to obtain a volume level corresponding to the volume;
determining an image reduction scale matched with the volume level and a target decibel matched with the volume level;
outputting and displaying the image material in the target multimedia material in the image reduction scale through a display screen of an output module;
and outputting the sound material in the target multimedia material at the target decibel level through a loudspeaker of the output module.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the obtaining the volume of the user sound included in the sound feature, the method further includes:
acquiring pre-stored historical volumes and acquiring the total number of the historical volumes;
counting, by comparing the volume of the user sound with the historical volumes, the number of historical volumes smaller than the volume of the user sound;
calculating the ratio of that number to the total number, and determining a score corresponding to the ratio;
the step of outputting and displaying the image material in the target multimedia material in the image reduced scale through a display screen of an output module comprises the following steps:
and outputting and displaying the image material in the target multimedia material and outputting and displaying the score according to the image reduction scale through a display screen of an output module.
The second aspect of the embodiments of the present invention discloses a multimedia interaction device based on sound, which includes:
the first identification unit is used for identifying the user voice collected by the voice collection module to obtain the voice characteristics of the user voice;
the first acquisition unit is used for acquiring a target multimedia material matched with the sound characteristics from a multimedia material library;
and the output unit is used for outputting the target multimedia material through an output module.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the apparatus further includes:
the acquisition unit is used for acquiring environmental sounds through the sound acquisition module before the first identification unit identifies the user sounds acquired through the sound acquisition module to obtain the sound characteristics of the user sounds;
a second identification unit, configured to identify whether the environmental sound includes a human voice;
and the determining unit is used for determining the environmental sound containing the human voice as the user voice collected by the voice collecting module when the second identifying unit identifies that the result is positive.
As an optional implementation manner, in a second aspect of the embodiment of the present invention, the first obtaining unit includes:
a first acquisition subunit configured to acquire a pitch, a frequency, and a timbre of the user sound contained in the sound feature;
the first analysis subunit is used for analyzing the tone and the frequency to obtain a target age matched with the user voice;
the second analysis subunit is used for analyzing the tone to obtain the target gender matched with the user voice;
and the second acquisition subunit is used for acquiring the target multimedia material matched with the target age and the target gender from a multimedia material library.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the output unit includes:
a third acquiring subunit, configured to acquire a volume of the user sound included in the sound feature;
the third analysis subunit is used for analyzing and obtaining the volume level corresponding to the volume;
a determining subunit, configured to determine an image reduction scale matching the volume level and a target decibel matching the volume level;
the first output subunit is used for outputting and displaying the image material in the target multimedia material in the image reduction scale through a display screen of the output module;
and the second output subunit is used for outputting the sound material in the target multimedia material at the target decibel level through the loudspeaker of the output module.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the apparatus further includes:
a second obtaining unit, configured to obtain a pre-stored historical volume after the third obtaining subunit obtains the volume of the user sound included in the sound feature, and obtain the total number of the historical volumes;
the statistics unit is used for counting, by comparing the volume of the user sound with the historical volumes, the number of historical volumes smaller than the volume of the user sound;
the calculating unit is used for calculating the ratio of that number to the total number and determining a score corresponding to the ratio;
the mode that the first output subunit outputs and displays the image material in the target multimedia material in the image reduction scale through the display screen of the output module may be:
and outputting and displaying the image material in the target multimedia material and outputting and displaying the score according to the image reduction scale through a display screen of an output module.
A third aspect of an embodiment of the present invention discloses a service device, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform part or all of the steps of any one of the methods of the first aspect.
A fourth aspect of the present embodiments discloses a computer-readable storage medium storing a program code, where the program code includes instructions for performing part or all of the steps of any one of the methods of the first aspect.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the present embodiment discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product is configured to, when running on a computer, cause the computer to perform part or all of the steps of any one of the methods in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the user sound collected by the sound collection module is recognized to obtain the sound features of the user sound; a target multimedia material matched with the sound features is acquired from a multimedia material library; and the target multimedia material is output through an output module. By implementing the embodiments of the invention, the sound input by a user can therefore be collected through the sound collection module, and multimedia material matched with the current user's sound is output according to the distinct features of different users' sounds, so that the output multimedia material is determined by the sound the user inputs, improving the user's experience of the outdoor display device.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a sound-based multimedia interaction method disclosed in an embodiment of the present invention;
FIG. 2 is a flowchart of another sound-based multimedia interaction method disclosed in an embodiment of the present invention;
FIG. 3 is a flowchart of another sound-based multimedia interaction method disclosed in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a sound-based multimedia interaction device disclosed in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another sound-based multimedia interaction device disclosed in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another sound-based multimedia interaction device disclosed in an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a service device disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but optionally further includes steps or elements not listed, or other steps or elements inherent to such a process, method, article, or apparatus.
The embodiments of the invention disclose a sound-based multimedia interaction method and device, which enable the output multimedia material to be determined according to the sound input by the user, thereby improving the user's experience of an outdoor display device. Details are described below.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating a voice-based multimedia interaction method according to an embodiment of the present invention. As shown in fig. 1, the sound-based multimedia interaction method may include the steps of:
101. the multimedia server identifies the user voice collected by the voice collection module to obtain the voice characteristics of the user voice.
In this embodiment of the present invention, the multimedia server may include a sound collection module, a server module, an output module, and the like, and the output module may include a display module, a speaker module, and the like, which is not limited in this embodiment of the present invention. The sound collection module can collect the sound of the user in the environment where the multimedia server is located through a microphone or a floor sound receiving device and other equipment. The server module in the multimedia server can process the collected user voice, such as noise reduction, human voice recognition, feature recognition and other operations. The display module in the output module can output and display the image content to be displayed through a display (such as a liquid crystal display screen, a curved screen and the like); the speaker module in the output module may output sound content matching the image content displayed by the display through the speaker.
In the embodiment of the invention, the user sound collected by the sound collection module may be sound produced by one user or by a plurality of users. The sound features may include the frequency, pitch, and timbre of the sound, which is not limited in the embodiment of the present invention. If the user sound collected by the sound collection module contains the voices of a plurality of users, the sound features can be classified according to the different users.
As an optional implementation manner, before the multimedia server performs step 101, the following steps may also be performed:
the multimedia server collects the current sound through the sound collection module;
the multimedia server identifies the human voice in the current sound and detects the current decibel level of the human voice;
the multimedia server judges whether the current decibel level is greater than a preset minimum decibel;
if yes, the multimedia server determines the current sound to be user sound;
if not, the multimedia server determines the current sound to be environmental sound and deletes it.
By implementing this implementation manner, the decibel level of the human voice contained in the collected current sound can be detected. Multiple people are usually present in the environment where the multimedia server is located, and they may be conversing with one another, so the sound collection module may pick up irrelevant human voices and cause the multimedia server to output wrong content. By detecting the current decibel level of the human voice in the current sound, any current sound whose decibel level is below the preset minimum can be deleted, ensuring that the user sound determined by the sound collection module is sound intended for interaction with the multimedia server, and improving the accuracy with which the sound collection module collects the user sound.
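The decibel check in the steps above can be sketched as follows. The patent only specifies "a preset minimum decibel", so the dBFS convention and the threshold value here are assumptions for illustration.

```python
import math

MIN_DB = -30.0  # hypothetical preset minimum level, in dB relative to full scale (dBFS)

def level_dbfs(samples):
    """RMS level of normalized samples (range [-1, 1]) in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

def classify_current_sound(samples, min_db=MIN_DB):
    """Keep the sound as user sound only when its level exceeds the preset
    minimum; otherwise treat it as environmental sound to be deleted."""
    return "user" if level_dbfs(samples) > min_db else "environment"
```

A voice close to the microphone clears the threshold while faint background chatter falls below it and is discarded.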
102. And the multimedia server acquires the target multimedia material matched with the sound characteristics from the multimedia material library.
In the embodiment of the invention, multimedia contents such as pictures, videos, audios and the like can be stored in the multimedia material library, and different multimedia contents can be matched with different sound characteristics, so that one or more target multimedia materials matched with the sound characteristics can be obtained from the multimedia material library according to the sound characteristics, and an output module of the multimedia server can output the target multimedia materials.
As an alternative embodiment, the way that the multimedia server obtains the target multimedia material matched with the sound characteristics from the multimedia material library may comprise the following steps:
the multimedia server acquires the number of sound timbres from the sound features;
the multimedia server judges whether there is exactly one timbre;
if yes, the multimedia server acquires, from the multimedia material library, a target multimedia material matched with the sound timbre in the sound features;
if not, the multimedia server acquires the sound frequency and pitch corresponding to each timbre from the sound features, and acquires from the multimedia material library the target multimedia material matched with each timbre according to the frequency and pitch corresponding to that timbre, where each timbre is matched with one target multimedia material.
By implementing this implementation manner, the sound collection module can collect user sounds input by multiple users simultaneously. The different users contained in the user sound can be distinguished by recognizing its timbres, and a target multimedia material matched with each timbre is determined from the timbre of each user's sound, so that every user is matched with a target multimedia material suited to the sound they input, improving the intelligence of the multimedia server.
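One way to realize the per-timbre matching described above is sketched below. The feature representation and the material names are hypothetical; the patent does not specify how timbres index the library, and a real system would use learned similarity rather than exact-key lookup.

```python
def match_materials(sound_features, material_library, default="default"):
    """sound_features: one entry per detected timbre, each a dict holding
    'timbre', 'frequency', and 'pitch'. material_library: maps a timbre
    (single-speaker case) or a (timbre, frequency, pitch) key to a material."""
    if len(sound_features) == 1:
        # exactly one timbre: match on the timbre alone
        feat = sound_features[0]
        return {feat["timbre"]: material_library.get(feat["timbre"], default)}
    # several timbres: refine each lookup with that timbre's frequency and pitch
    matches = {}
    for feat in sound_features:
        key = (feat["timbre"], feat["frequency"], feat["pitch"])
        matches[feat["timbre"]] = material_library.get(key, default)
    return matches
```

Each timbre maps to exactly one target material, matching the one-to-one rule stated in the steps above.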
103. The multimedia server outputs the target multimedia material through the output module.
In the embodiment of the invention, if the target multimedia material is detected to contain multimedia material in image form, the image material can be output and displayed through the display of the output module. If the target multimedia material is detected to contain multimedia material in audio form, the audio material can be output through a loudspeaker of the output module. If the target multimedia material contains both, the image material can be displayed through the display while the audio material is output through the loudspeaker, with the two kept matched during output so that the displayed image material corresponds to the audio material.
As an alternative implementation, the way that the multimedia server outputs the target multimedia material through the output module may include the following steps:
the multimedia server detects the number of target multimedia materials;
the multimedia server judges whether there is exactly one material;
if yes, the multimedia server outputs the target multimedia material through the output module;
if not, the multimedia server determines the volume of the user sound corresponding to each target multimedia material, determines the material whose corresponding volume is largest, outputs all the target multimedia materials, and outputs the highest-volume material in champion mode.
By implementing this implementation manner, multiple target multimedia materials can be output simultaneously, and the target multimedia material with the highest volume can be determined and output in champion mode, so that multiple users can play games through the multimedia server, enhancing its entertainment value.
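The champion-mode selection above reduces to a max-by-volume pick. A minimal sketch, assuming each material carries the volume of the user sound that matched it (the data shape is an assumption, not from the patent):

```python
def output_with_champion(materials):
    """materials: list of dicts, each holding the material 'name' and the
    'volume' of the user sound it matched. Returns every material to output
    plus the one to present in champion mode (None when there is only one)."""
    names = [m["name"] for m in materials]
    if len(materials) == 1:
        return names, None  # single material: output it directly, no champion mode
    champion = max(materials, key=lambda m: m["volume"])
    return names, champion["name"]
```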
In the method described in fig. 1, the output multimedia material can be determined according to the sound input by the user, thereby improving the user's experience of the outdoor display device. In addition, implementing the method described in fig. 1 improves the accuracy with which the sound collection module collects the user sound, improves the intelligence of the multimedia server, and enhances its entertainment value.
Example two
Referring to fig. 2, fig. 2 is a flowchart illustrating another audio-based multimedia interaction method according to an embodiment of the present invention. As shown in fig. 2, the sound-based multimedia interaction method may include the steps of:
201. the multimedia server collects the environmental sound through the sound collection module.
In the embodiment of the invention, the multimedia server usually needs to collect the environmental sound first, since it cannot directly determine from the obtained sound whether a user intends to provide input. The multimedia server therefore obtains the environmental sound and performs human-voice recognition on it, and only when the environmental sound is recognized to contain a human voice is the obtained environmental sound determined to be user sound input by the user through the sound collection module.
202. The multimedia server identifies whether the environmental sound contains a human voice; if so, steps 203 to 209 are executed; if not, the flow ends.
In the embodiment of the invention, the multimedia server can detect the environmental sound through a human-voice recognition technique from deep learning to recognize whether it contains a human voice. If the multimedia server recognizes, through this technique, a voiceprint corresponding to a human voice, the environmental sound can be considered to contain a human voice, and steps 203 to 209 are executed. If no such voiceprint is recognized, the environmental sound can be considered to contain no human voice, and the multimedia server continues to collect environmental sound through the sound collection module and recognizes the newly collected sound again.
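The patent relies on a deep-learning voiceprint model for this check. As a crude stand-in for illustration only, the sketch below flags audio whose energy and zero-crossing rate fall in ranges typical of voiced speech; all thresholds are illustrative assumptions.

```python
def contains_human_voice(samples, sample_rate=16000):
    """Toy voiced-speech detector: voiced sound carries noticeable energy
    and a zero-crossing rate far below that of broadband noise."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    zcr = crossings * sample_rate / len(samples)  # crossings per second
    return energy > 1e-4 and 50 <= zcr <= 1000
```

A real deployment would replace this heuristic with the trained voiceprint recognizer the patent describes.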
203. The multimedia server determines the environmental sound containing the human voice as the user voice collected by the voice collection module.
In the embodiment of the present invention, by implementing the above steps 201 to 203, the collected environmental sound can be detected, and when it is found to contain a human voice, it can be determined that a user is speaking to the sound collection module. The environmental sound collected by the sound collection module can therefore be determined to be user sound containing a human voice, improving the accuracy of obtaining the user sound.
204. The multimedia server identifies the user voice collected by the voice collection module to obtain the voice characteristics of the user voice.
205. The multimedia server acquires the pitch, frequency and timbre of the user's voice contained in the voice feature.
In the embodiment of the invention, the multimedia server can store the recognized pitch, frequency, timbre, and other characteristics of the user sound in the sound features, so that the pitch, frequency, and timbre corresponding to the user sound can be used at any time.
206. The multimedia server analyzes the pitch and frequency to obtain a target age matched with the user's voice.
In the embodiment of the invention, the same user produces different sounds at different ages, so the multimedia server can estimate the target age of the user according to the pitch and frequency of the user sound.
207. And the multimedia server analyzes the tone to obtain the target gender matched with the voice of the user.
In the embodiment of the invention, because male and female voices differ, the multimedia server can identify the target gender of the user according to the timbre of the user sound, and then output different multimedia content according to the user's gender, making the output content better suited to the user.
208. The multimedia server obtains target multimedia materials matched with the target age and the target gender from the multimedia material library.
In the embodiment of the invention, the multimedia server can determine, according to the target age, multimedia content understandable at that age, determine the output form of the content according to the target gender, and then obtain the target multimedia material from the determined content and output form.
In the embodiment of the present invention, by implementing the above steps 205 to 208, the obtained sound can be analyzed to obtain its pitch, frequency, and timbre. Since the pitch, frequency, and timbre of sounds produced by people of different ages differ, the current user's age can be estimated from the pitch and frequency of their voice, and their gender from its timbre, improving the intelligence of the multimedia server.
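A toy version of the age and gender estimation in steps 205 to 208 is sketched below. Real systems train classifiers on labeled speech; every threshold here is an assumption for illustration, not a value from the patent, and the spectral centroid stands in for the patent's timbre feature.

```python
def estimate_age_and_gender(pitch_hz, timbre_centroid_hz):
    """pitch_hz: fundamental frequency of the user sound.
    timbre_centroid_hz: a single-number timbre proxy (spectral centroid).
    Children speak with a high fundamental; among adults, female voices
    are typically higher-pitched and brighter than male voices."""
    target_age = "child" if pitch_hz >= 280.0 else "adult"
    target_gender = "female" if timbre_centroid_hz >= 2000.0 else "male"
    return target_age, target_gender
```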
209. The multimedia server outputs the target multimedia material through the output module.
In the method described in fig. 2, the output multimedia material can be determined according to the sound input by the user, thereby improving the user's experience of the outdoor display device. In addition, implementing the method described in fig. 2 improves the accuracy of collecting the user sound and the intelligence of the multimedia server.
EXAMPLE III
Referring to fig. 3, fig. 3 is a flowchart illustrating another audio-based multimedia interaction method according to an embodiment of the present invention. As shown in fig. 3, the sound-based multimedia interaction method may include the steps of:
301. the multimedia server identifies the user voice collected by the voice collection module to obtain the voice characteristics of the user voice.
302. And the multimedia server acquires the target multimedia material matched with the sound characteristics from the multimedia material library.
303. The multimedia server acquires the volume of the user's voice contained in the voice characteristics.
As an alternative implementation, after the multimedia server performs step 303, the following steps may also be performed:
the multimedia server acquires pre-stored historical volumes and the total number of the historical volumes;
the multimedia server counts, by comparing the volume of the user sound with the historical volumes, the number of historical volumes smaller than the volume of the user sound;
the multimedia server calculates the ratio of that number to the total number and determines a score corresponding to the ratio.
By implementing this implementation manner, the number of historical volumes that the user's voice exceeds can be counted against the pre-stored historical volumes collected in the past, the proportion of exceeded volumes to the total number of historical volumes can be calculated, and a score for the volume of the user sound can be determined from that proportion. Because the score is positively correlated with the volume, user participation is encouraged.
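The score computation above amounts to a percentile rank over the stored volumes. A minimal sketch, assuming a 0-100 scale (the patent only says the score corresponds to the ratio):

```python
def volume_score(current_volume, historical_volumes):
    """Count how many stored historical volumes the current user sound
    exceeds, then turn the ratio into a 0-100 score."""
    total = len(historical_volumes)
    if total == 0:
        return 0
    beaten = sum(1 for v in historical_volumes if v < current_volume)
    return round(100 * beaten / total)
```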
304. And the multimedia server analyzes and obtains the volume level corresponding to the volume.
In the embodiment of the invention, the volume levels can be stored in the multimedia server in advance and can be defined by dividing the volume range into a plurality of intervals; the number of levels is not limited. The larger the volume, the higher the corresponding volume level; the smaller the volume, the lower the corresponding volume level.
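Since the patent leaves the number of levels and the interval boundaries unspecified, the level mapping in step 304 can be sketched with assumed values:

```python
# Assumed interval boundaries, in dB; the real server would store its own.
LEVEL_BOUNDS = [40, 55, 70, 85]  # upper bound of levels 1..4

def volume_level(volume_db):
    """Map a volume to a discrete level: higher volume -> higher level."""
    for level, bound in enumerate(LEVEL_BOUNDS, start=1):
        if volume_db <= bound:
            return level
    return len(LEVEL_BOUNDS) + 1  # above every bound: the highest level

print(volume_level(30), volume_level(90))  # quiet -> 1, loud -> 5
```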
305. The multimedia server determines an image reduction scale matching the volume level and a target decibel matching the volume level.
In the embodiment of the invention, the multimedia server can preset the volume level to be inversely proportional to the image reduction scale: the higher the volume level, the smaller the corresponding image reduction scale, and the lower the volume level, the larger the corresponding image reduction scale. The multimedia server can also preset the volume level to be directly proportional to the target decibel: the higher the volume level, the larger the corresponding target decibel, and the lower the volume level, the smaller the corresponding target decibel. In addition, if the volume level reaches the highest level, the multimedia server may set the image reduction scale to 0 and set the target decibel to a pre-stored maximum decibel, so as to ensure the safety of the output module of the multimedia server.
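Step 305 can be sketched as below, assuming simple linear proportions; the constants and function name are illustrative, since the patent only fixes the directions of the relationships and the top-level clamping.

```python
MAX_LEVEL = 5   # assumed highest volume level
MAX_DB = 90     # assumed pre-stored maximum decibel

def scale_and_decibel(level):
    """Map a volume level to (image reduction scale, target decibel)."""
    if level >= MAX_LEVEL:
        # highest level: scale 0 and clamped decibel, protecting the output module
        return 0.0, MAX_DB
    scale = 1.0 - level / MAX_LEVEL          # inversely proportional to level
    target_db = MAX_DB * level / MAX_LEVEL   # directly proportional to level
    return scale, target_db

print(scale_and_decibel(2))  # mid level: moderate scale and decibel
```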
306. The multimedia server outputs and displays the image material in the target multimedia material at the image reduction scale through the display screen of the output module.
In the embodiment of the invention, the output module of the multimedia server can first reduce the image material by the determined scale and then output the reduced image material.
As an alternative embodiment, the manner in which the multimedia server outputs the image material in the display target multimedia material at the image reduced scale through the display screen of the output module may include the following steps:
the multimedia server outputs and displays the image material in the target multimedia material, and outputs and displays the score, at the image reduction scale through the display screen of the output module.
By implementing this implementation, the image material in the target multimedia material can be displayed and the score of the volume of the user sound can also be displayed, so that the user can know the volume of the sound currently emitted, which increases the richness of the output content of the output module.
307. The multimedia server outputs the sound material in the target multimedia material at the target decibel through the loudspeaker of the output module.
In the embodiment of the present invention, by implementing steps 303 to 307, the reduction scale of the output image material and the decibel of the output sound material can both be determined from the volume of the user sound, so that the output image and the output sound change with the volume of the user sound, which improves the interactivity between the user and the multimedia server.
With the method described in fig. 3, the multimedia material to be output can be determined from the sound input by the user, improving the user's experience of the outdoor display device. Implementing the method described in fig. 3 also improves the user's participation, enriches the output content of the output module, and improves the interactivity between the user and the multimedia server.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of a voice-based multimedia interaction device according to an embodiment of the present invention. As shown in fig. 4, the sound-based multimedia interactive apparatus may include:
The first identifying unit 401 is configured to identify the user sound collected by the sound collection module and obtain the sound features of the user sound.
As an optional implementation manner, the first identifying unit 401 may further be configured to:
collecting current sound through a sound collection module;
identifying a voice in the current sound and detecting a current decibel of the voice;
judging whether the current decibel is larger than a preset minimum decibel or not;
if yes, determining the current sound as the user sound;
if not, the current sound is determined to be the environmental sound, and the environmental sound is deleted.
By implementing this implementation, the decibel of the voice contained in the collected current sound can be detected. The environment where the multimedia server is located usually contains several people who may be talking to each other, so the sound collection module may pick up irrelevant voices and cause the multimedia server to output wrong content. By detecting the current decibel of the voice in the current sound and deleting current sounds whose decibel is smaller than the preset minimum decibel, the user sound determined by the sound collection module is guaranteed to be sound intended for interaction with the multimedia server, which improves the accuracy with which the sound collection module collects the user sound.
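The filtering steps above can be sketched as follows; the threshold value and the helper name are assumptions, since the patent only requires comparison against a preset minimum decibel.

```python
MIN_DB = 60  # assumed preset minimum decibel

def classify_sound(voice_db):
    """Treat sounds loud enough to be directed at the device as user sound."""
    return "user" if voice_db > MIN_DB else "ambient"

# Ambient conversation below the threshold is deleted; only user sound is kept.
samples = [72, 45, 63]
kept = [db for db in samples if classify_sound(db) == "user"]
print(kept)
```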
A first obtaining unit 402, configured to obtain target multimedia material from the multimedia material library, where the target multimedia material matches the sound characteristics obtained by the first identifying unit 401.
As an alternative implementation, the manner of acquiring, by the first acquiring unit 402, the target multimedia material matching the sound feature from the multimedia material library may specifically be:
acquiring the number of sound timbres contained in the sound features;
judging whether the number of timbres is one;
if yes, acquiring from the multimedia material library a target multimedia material matching the sound timbre in the sound features;
if not, acquiring from the sound features the sound frequency and sound tone corresponding to each sound timbre, and acquiring from the multimedia material library the target multimedia material matching each sound timbre according to its corresponding sound frequency and sound tone, one sound timbre being matched with one target multimedia material.
By implementing this implementation, the sound collection module can collect user sounds input by several users at the same time. Different users contained in the user sound can therefore be distinguished by identifying the sound timbres of the user sound, and the target multimedia material matching each sound timbre can be determined from the timbres of the different user sounds, so that every user is matched with the target multimedia material corresponding to the sound that user input, which improves the intelligence of the multimedia server.
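The single-timbre and multi-timbre branches above can be sketched as below, assuming timbres arrive as labels and the material library is a simple lookup table (both assumptions for illustration).

```python
# Assumed material library: one material per known timbre.
LIBRARY = {"timbre_a": "material_1", "timbre_b": "material_2"}

def match_materials(timbres):
    """Return one target material per distinct timbre in the user sound."""
    if len(set(timbres)) == 1:
        # single speaker: match directly on the one timbre
        return [LIBRARY[timbres[0]]]
    # several speakers: one target material matched to each distinct timbre
    return [LIBRARY[t] for t in sorted(set(timbres))]

print(match_materials(["timbre_a", "timbre_b"]))
```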
An output unit 403, configured to output the target multimedia material acquired by the first acquiring unit 402 through an output module.
As an optional implementation manner, the manner in which the output unit 403 outputs the target multimedia material through the output module may specifically be:
detecting the material quantity of the target multimedia materials;
judging whether the material quantity is one;
if yes, outputting the target multimedia material through the output module;
if not, determining the volume of the user sound corresponding to each target multimedia material, determining the multimedia material with the maximum volume among the target multimedia materials, outputting all the target multimedia materials, and outputting the multimedia material with the maximum volume in a champion mode.
By implementing this implementation, a plurality of target multimedia materials can be output simultaneously, the target multimedia material with the highest volume can be determined, and that material can be output in a champion mode, so that several users can play games through the multimedia server, which increases the interest of the multimedia server.
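The champion-mode branch can be sketched as follows; the function name and the (material, volume) pairing are assumptions used purely for illustration.

```python
def plan_output(materials_with_volume):
    """materials_with_volume: list of (material, user_volume) pairs.

    Returns (material, is_champion) pairs: with one material nothing is
    flagged; with several, the loudest speaker's material is the champion.
    """
    if len(materials_with_volume) == 1:
        return [(materials_with_volume[0][0], False)]
    champion, _ = max(materials_with_volume, key=lambda mv: mv[1])
    return [(m, m == champion) for m, _ in materials_with_volume]

print(plan_output([("m1", 62), ("m2", 80)]))  # m2 is the champion
```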
It can be seen that with the apparatus described in fig. 4, the multimedia material to be output can be determined from the sound input by the user, improving the user's experience of the outdoor display device. The apparatus described in fig. 4 also improves the accuracy with which the sound collection module collects the user sound, improves the intelligence of the multimedia server, and increases the interest of the multimedia server.
Example five
Referring to fig. 5, fig. 5 is a schematic structural diagram of another sound-based multimedia interaction apparatus according to an embodiment of the present invention. The sound-based multimedia interaction apparatus shown in fig. 5 is obtained by optimizing the sound-based multimedia interaction apparatus shown in fig. 4. Compared with the apparatus shown in fig. 4, the sound-based multimedia interaction apparatus shown in fig. 5 may further include:
the collecting unit 404 is configured to collect the environmental sound through the sound collecting module before the first identifying unit 401 identifies the user sound collected through the sound collecting module to obtain the sound feature of the user sound.
The second identifying unit 405 is configured to identify whether the environmental sound acquired by the acquiring unit 404 includes a human voice.
A determining unit 406, configured to determine, when the result of the identification by the second identifying unit 405 is yes, the environmental sound containing the human voice as the user voice collected by the voice collecting module.
In the embodiment of the invention, the collected environmental sound can be detected. When the environmental sound is detected to contain a human voice, the user can be considered to have output sound to the sound collection module, so the environmental sound containing the human voice can be confirmed as the user sound collected by the sound collection module, which improves the accuracy of obtaining the user sound.
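As a toy illustration of the voice check above: real systems use voice activity detection, but here a sample is treated as containing a human voice when its dominant frequency lies in a typical speech fundamental-frequency range. The band limits and function name are assumptions, not the patent's method.

```python
# Assumed fundamental-frequency range of human speech, in Hz.
SPEECH_BAND = (85.0, 255.0)

def contains_human_voice(dominant_freq_hz):
    """Crude heuristic: dominant frequency inside the speech band counts as voice."""
    lo, hi = SPEECH_BAND
    return lo <= dominant_freq_hz <= hi

print(contains_human_voice(120.0))   # speech-range tone
print(contains_human_voice(2000.0))  # e.g. machinery noise
```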
As an alternative implementation, the first obtaining unit 402 of the voice-based multimedia interaction apparatus shown in fig. 5 may include:
a first obtaining subunit 4021, configured to obtain a tone, a frequency, and a timbre of the user sound included in the sound feature obtained by the first identifying unit 401;
a first analyzing subunit 4022, configured to analyze the pitch and frequency acquired by the first acquiring subunit 4021 to obtain a target age matched with the user's voice;
the second analysis subunit 4023 is configured to analyze the timbre acquired by the first acquisition subunit 4021 to obtain a target gender matched with the user's voice;
a second obtaining subunit 4024, configured to obtain, from the multimedia material library, a target multimedia material that matches the target age obtained by the first analyzing subunit 4022 and the target gender obtained by the second analyzing subunit 4023.
By implementing this implementation, the collected sound can be analyzed to obtain its pitch, frequency, and timbre. Because the pitch, frequency, and timbre of the sounds emitted by people of different ages differ, the age of the current user can be estimated from the pitch and frequency of the user's sound, and the gender of the current user can be estimated from its timbre, which improves the intelligence of the multimedia server.
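The age and gender estimation above can be sketched as follows. Every threshold here is an assumption for illustration only; the patent does not specify how pitch, frequency, and timbre map to age and gender.

```python
def estimate_age(pitch_hz, frequency_hz):
    """Rough age band from the average of pitch and frequency (assumed cutoffs)."""
    avg = (pitch_hz + frequency_hz) / 2
    if avg > 300:
        return "child"
    return "adult" if avg > 150 else "senior"

def estimate_gender(timbre_brightness):
    """Assumed 0..1 brightness scale; brighter timbre taken as female-typical."""
    return "female" if timbre_brightness > 0.5 else "male"

print(estimate_age(320, 340), estimate_gender(0.7))  # child female
```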
It can be seen that with the apparatus described in fig. 5, the multimedia material to be output can be determined from the sound input by the user, improving the user's experience of the outdoor display device. The apparatus described in fig. 5 also improves the accuracy with which the user sound is acquired and improves the intelligence of the multimedia server.
Example six
Referring to fig. 6, fig. 6 is a schematic structural diagram of another sound-based multimedia interaction apparatus according to an embodiment of the present invention. The sound-based multimedia interaction apparatus shown in fig. 6 is obtained by optimizing the sound-based multimedia interaction apparatus shown in fig. 5. Compared with the apparatus shown in fig. 5, the output unit 403 of the sound-based multimedia interaction apparatus shown in fig. 6 may further include:
a third acquiring subunit 4031, configured to acquire a volume of the user sound included in the sound feature.
A third analyzing subunit 4032, configured to analyze and obtain a volume level corresponding to the volume acquired by the third acquiring subunit 4031.
A determining subunit 4033, configured to determine an image reduction ratio matched with the volume level obtained by the third analyzing subunit 4032 and a target decibel matched with the volume level.
A first output subunit 4034, configured to output and display, through the display screen of the output module, the image material in the target multimedia material at the image reduction scale determined by the determining subunit 4033.
A second output subunit 4035, configured to output, through the speaker of the output module, the sound material in the target multimedia material at the target decibel determined by the determining subunit 4033.
In the embodiment of the invention, the reduction proportion of the output image material can be determined according to the volume of the user sound, and the decibel of the output sound material can be determined according to the volume of the user sound, so that the output image and the output sound are changed along with the difference of the volume of the user sound, and the interactivity between the user and the multimedia server is improved.
As an alternative embodiment, the sound-based multimedia interaction apparatus shown in fig. 6 may further include:
a second obtaining unit 407, configured to obtain the pre-stored historical volumes after the third obtaining subunit 4031 obtains the volume of the user sound contained in the sound features, and to obtain the total number of the historical volumes;
a counting unit 408, configured to count the sub-quantity of historical volumes smaller than the volume of the user sound by comparing the volume of the user sound acquired by the third obtaining subunit 4031 with the historical volumes acquired by the second obtaining unit 407;
a calculating unit 409, configured to calculate the ratio of the sub-quantity counted by the counting unit 408 to the total number acquired by the second obtaining unit 407, and to determine a score corresponding to the ratio.
By implementing this implementation, the number of pre-stored historical volumes that the volume of the user sound exceeds can be counted, the proportion of that number to the total number of historical volumes can be calculated, and a score for the volume of the user sound can be determined from that proportion. Since the score is positively correlated with the volume, the participation of the user is improved.
As an optional implementation manner, the manner in which the first output subunit 4034 outputs and displays the image material in the target multimedia material at the image reduction scale through the display screen of the output module may specifically be:
outputting and displaying the image material in the target multimedia material, and outputting and displaying the score, at the image reduction scale through the display screen of the output module.
By implementing this implementation, the image material in the target multimedia material can be displayed and the score of the volume of the user sound can also be displayed, so that the user can know the volume of the sound currently emitted, which increases the richness of the output content of the output module.
It can be seen that with the apparatus described in fig. 6, the multimedia material to be output can be determined from the sound input by the user, improving the user's experience of the outdoor display device. The apparatus described in fig. 6 also improves the user's participation, enriches the output content of the output module, and improves the interactivity between the user and the multimedia server.
Example seven
Referring to fig. 7, fig. 7 is a schematic structural diagram of a service device according to an embodiment of the present invention. As shown in fig. 7, the service apparatus may include:
a memory 701 in which executable program code is stored;
a processor 702 coupled to the memory 701;
wherein, the processor 702 calls the executable program code stored in the memory 701 to execute part or all of the steps of the method in the above method embodiments.
The embodiment of the invention also discloses a computer readable storage medium, wherein the computer readable storage medium stores program codes, wherein the program codes comprise instructions for executing part or all of the steps of the method in the above method embodiments.
Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
The embodiment of the present invention also discloses an application publishing platform, wherein the application publishing platform is used for publishing a computer program product, and when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of the method in the above method embodiments.
It should be appreciated that reference throughout this specification to "an embodiment of the present invention" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase "in embodiments of the invention" appearing in various places throughout the specification are not necessarily all referring to the same embodiments. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are exemplary and alternative embodiments, and that the acts and modules illustrated are not required in order to practice the invention.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. It should be understood that the term "and/or" herein is merely one type of association relationship describing an associated object, meaning that three relationships may exist, for example, a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
In the embodiments provided herein, it should be understood that "B corresponding to a" means that B is associated with a from which B can be determined. It should also be understood, however, that determining B from a does not mean determining B from a alone, but may also be determined from a and/or other information.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium. The storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM), or other memory such as a magnetic disk or tape memory, or any other computer-readable medium that can be used to carry or store data.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention, in essence the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the method of each embodiment of the present invention.
The foregoing describes in detail a method and apparatus for multimedia interaction based on sound disclosed in the embodiments of the present invention, and the present invention is described in its principles and embodiments by applying specific examples, and the description of the foregoing embodiments is only used to help understanding the method and its core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (8)

1. A method for voice-based multimedia interaction, the method comprising:
recognizing user voice collected by a voice collecting module to obtain voice characteristics of the user voice; acquiring a target multimedia material matched with the sound characteristics from a multimedia material library;
acquiring the volume of the user voice contained in the voice feature; analyzing to obtain a volume level corresponding to the volume; determining an image reduction scale matching the volume level and a target decibel matching the volume level, the volume level being inversely proportional to the image reduction scale and directly proportional to the target decibel;
detecting the material quantity of the target multimedia material, and judging whether the material quantity is one;
if yes, outputting and displaying an image material in the target multimedia material in the image reduction scale through a display screen of an output module, and outputting a sound material in the target multimedia material in the target decibel through a loudspeaker of the output module;
and if not, determining the sound volume of the user sound corresponding to each target multimedia material, determining the multimedia material with the maximum sound volume from the target multimedia materials, outputting all the target multimedia materials, and outputting the multimedia material with the maximum sound volume in a champion mode.
2. The method of claim 1, wherein before the identifying the user voice collected by the voice collection module and obtaining the voice characteristics of the user voice, the method further comprises:
collecting environmental sounds through a sound collection module;
identifying whether the environmental sound contains human voice;
if yes, determining the environment sound containing the human voice as the user voice collected by the voice collection module.
3. The method of claim 2, wherein obtaining the target multimedia material from the library of multimedia materials that matches the sound feature comprises:
acquiring the tone, frequency and timbre of the user voice contained in the voice feature;
analyzing the tone and the frequency to obtain a target age matched with the user voice;
analyzing the tone to obtain the target gender matched with the user voice;
and acquiring target multimedia materials matched with the target age and the target gender from a multimedia material library.
4. The method of claim 1, wherein after obtaining the volume of the user's voice contained in the voice feature, the method further comprises:
acquiring prestored historical volumes and acquiring the total number of the historical volumes;
counting the sub-quantity of historical volumes smaller than the volume of the user sound by comparing the volume of the user sound with the historical volumes;
calculating the ratio of the sub-quantity to the total quantity, and determining a score corresponding to the ratio;
the step of outputting and displaying the image material in the target multimedia material in the image reduced scale through a display screen of an output module comprises the following steps:
and outputting and displaying the image material in the target multimedia material and outputting and displaying the score according to the image reduction scale through a display screen of an output module.
5. A voice-based multimedia interaction apparatus, comprising:
the first identification unit is used for identifying the user voice collected by the voice collection module to obtain the voice characteristics of the user voice;
the first acquisition unit is used for acquiring a target multimedia material matched with the sound characteristics from a multimedia material library;
the output unit is used for outputting the target multimedia material through an output module;
the output unit includes:
a third acquiring subunit, configured to acquire a volume of the user sound included in the sound feature;
the third analysis subunit is used for analyzing and obtaining the volume level corresponding to the volume;
a determining subunit, configured to determine an image reduction ratio matching the volume level and a target decibel matching the volume level, where the volume level is inversely proportional to the image reduction ratio and directly proportional to the target decibel;
the subunit is used for detecting the material quantity of the target multimedia material and judging whether the material quantity is one;
the first output subunit is used for outputting and displaying the image material in the target multimedia material in the image reduction scale through a display screen of an output module when the material quantity is one;
the second output subunit is used for outputting the sound material in the target multimedia material in the target decibel through the loudspeaker of the output module when the material quantity is one;
and the subunit is used for determining the sound volume of the user sound corresponding to each target multimedia material when the number of the materials is not one, determining the multimedia material with the maximum sound volume from each target multimedia material, outputting all the target multimedia materials and outputting the multimedia material with the maximum sound volume in a champion mode.
6. The voice-based multimedia interaction apparatus of claim 5, wherein the apparatus further comprises:
the acquisition unit is used for acquiring environmental sounds through the sound acquisition module before the first identification unit identifies the user sounds acquired through the sound acquisition module to obtain the sound characteristics of the user sounds;
a second identification unit, configured to identify whether the environmental sound includes a human voice;
and the determining unit is used for determining the environmental sound containing the human voice as the user voice collected by the voice collecting module when the second identifying unit identifies that the result is positive.
7. The audio-based multimedia interaction device of claim 6, wherein the first obtaining unit comprises:
a first acquisition subunit configured to acquire a pitch, a frequency, and a timbre of the user sound contained in the sound feature;
the first analysis subunit is used for analyzing the tone and the frequency to obtain a target age matched with the user voice;
the second analysis subunit is used for analyzing the tone to obtain the target gender matched with the user voice;
and the second acquisition subunit is used for acquiring the target multimedia material matched with the target age and the target gender from a multimedia material library.
8. The voice-based multimedia interaction apparatus of claim 5, wherein the apparatus further comprises:
a second obtaining unit, configured to obtain pre-stored historical volumes after the third obtaining subunit obtains the volumes of the user sounds included in the sound features, and obtain the total number of the historical volumes;
the statistical unit is used for counting the sub-quantity of historical volumes smaller than the volume of the user sound by comparing the volume of the user sound with the historical volumes;
the calculating unit is used for calculating the ratio of the sub-quantity to the total quantity and determining a score corresponding to the ratio;
the mode that the first output subunit outputs and displays the image material in the target multimedia material in the image reduction scale through the display screen of the output module may be:
and outputting and displaying the image material in the target multimedia material and outputting and displaying the score according to the image reduction scale through a display screen of an output module.
CN201811632295.9A 2018-12-28 2018-12-28 Voice-based multimedia interaction method and device Active CN109739354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811632295.9A CN109739354B (en) 2018-12-28 2018-12-28 Voice-based multimedia interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811632295.9A CN109739354B (en) 2018-12-28 2018-12-28 Voice-based multimedia interaction method and device

Publications (2)

Publication Number Publication Date
CN109739354A CN109739354A (en) 2019-05-10
CN109739354B true CN109739354B (en) 2022-08-05

Family

ID=66362187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811632295.9A Active CN109739354B (en) 2018-12-28 2018-12-28 Voice-based multimedia interaction method and device

Country Status (1)

Country Link
CN (1) CN109739354B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532404B (en) * 2019-09-03 2023-08-04 北京百度网讯科技有限公司 Source multimedia determining method, device, equipment and storage medium
CN110570841A (en) * 2019-09-12 2019-12-13 腾讯科技(深圳)有限公司 Multimedia playing interface processing method, device, client and medium
CN110753297B (en) * 2019-09-27 2021-06-11 广州励丰文化科技股份有限公司 Mixing processing method and processing device for audio signals
JP7248564B2 (en) * 2019-12-05 2023-03-29 Tvs Regza株式会社 Information processing device and program
CN111402889A (en) * 2020-03-16 2020-07-10 南京奥拓电子科技有限公司 Volume threshold determination method and device, voice recognition system and queuing machine

Citations (2)

Publication number Priority date Publication date Assignee Title
JP2010177775A (en) * 2009-01-27 2010-08-12 Kyocera Corp Mobile electronic device, and voice adjustment method
JP2013038568A (en) * 2011-08-08 2013-02-21 Alpine Electronics Inc Image reproduction device and reproduction method thereof

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN102572369B (en) * 2010-12-17 2014-11-05 华为终端有限公司 Voice volume prompting method and terminal as well as video communication system
CN105872838A (en) * 2016-04-28 2016-08-17 徐文波 Sending method and device of special media effects of real-time videos
CN106302930A (en) * 2016-08-31 2017-01-04 上海与德通讯技术有限公司 The control method of a kind of volume and adjusting means
CN107423351A (en) * 2017-05-24 2017-12-01 维沃移动通信有限公司 A kind of information processing method and electronic equipment
CN107656977A (en) * 2017-09-05 2018-02-02 捷开通讯(深圳)有限公司 The acquisition of multimedia file and player method and device
CN108737872A (en) * 2018-06-08 2018-11-02 百度在线网络技术(北京)有限公司 Method and apparatus for output information

Non-Patent Citations (2)

Title
An Adaptive Receiver Buffer Adjust Algorithm for Voice & Video on IP Applications; Jing Liu et al.; Asia-Pacific Conference on Communications; 2005-12-31; pp. 669-673 *
Research on Content-Based Multimedia Information Retrieval Technology; Wang Sufang; Cards World (《金卡工程》); 2013-09-10; pp. 13-16 *

Similar Documents

Publication Publication Date Title
CN109739354B (en) Voice-based multimedia interaction method and device
EP3611895B1 (en) Method and device for user registration, and electronic device
CN110446115B (en) Live broadcast interaction method and device, electronic equipment and storage medium
CN112995696B (en) Live broadcast room violation detection method and device
CN109618181B (en) Live broadcast interaction method and device, electronic equipment and storage medium
JP6755304B2 (en) Information processing device
US20180366105A1 (en) Providing an indication of the suitability of speech recognition
CN110505491B (en) Live broadcast processing method and device, electronic equipment and storage medium
CN1848106B (en) Method and device for providing information
CN109218390B (en) User screening method and device
CN109640112B (en) Video processing method, device, equipment and storage medium
US20130239033A1 (en) Lifestyle collecting apparatus, user interface device, and lifestyle collecting method
CN108696765B (en) Auxiliary input method and device in video playing
CN112653902B (en) Speaker recognition method and device and electronic equipment
CN104038473A (en) Method of audio ad insertion, device, equipment and system
CN109086455B (en) Method for constructing voice recognition library and learning equipment
CN111161746B (en) Voiceprint registration method and system
CN110648672A (en) Character image generation method, interaction method, device and terminal equipment
CN112614489A (en) User pronunciation accuracy evaluation method and device and electronic equipment
CN111639218A (en) Interactive method for spoken language training and terminal equipment
CN108877773B (en) Voice recognition method and electronic equipment
CN111077997A (en) Point reading control method in point reading mode and electronic equipment
CN112416116B (en) Vibration control method and system for computer equipment
CN109710735B (en) Reading content recommendation method based on multiple social channels and electronic equipment
CN110099332B (en) Audio environment display method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant