CN110691261A - Multimedia data interaction method, communication device and computer readable storage medium


Info

Publication number
CN110691261A
CN110691261A
Authority
CN
China
Prior art keywords
data
target
multimedia data
information
target object
Prior art date
Legal status
Pending
Application number
CN201910938977.0A
Other languages
Chinese (zh)
Inventor
李立锋
刘昕
颜忠伟
叶军
吴嘉旭
颜伟婷
王斌
Current Assignee
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
MIGU Video Technology Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by MIGU Video Technology Co Ltd, MIGU Culture Technology Co Ltd
Priority to CN201910938977.0A
Publication of CN110691261A


Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
                            • H04N 21/239 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
                                • H04N 21/2393 Interfacing the upstream path of the transmission network involving handling client requests
                    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
                        • H04N 21/47 End-user applications
                            • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
                            • H04N 21/485 End-user interface for client configuration
                    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
                        • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
                            • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
                                • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
                        • H04N 21/85 Assembly of content; Generation of multimedia applications
                            • H04N 21/854 Content authoring
                                • H04N 21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The invention provides a multimedia data interaction method, a communication device and a computer-readable storage medium. The multimedia data interaction method includes: during playback of received multimedia data, if a trigger operation is detected, updating the multimedia data received within a preset time period according to the interactive material data received within that time period; and playing the updated multimedia data, where the preset time period is a time period after the moment at which the trigger operation is detected. The method and device can solve the prior-art problem that interactive material displayed during video playback has a poor display effect.

Description

Multimedia data interaction method, communication device and computer readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a multimedia data interaction method, a communication device, and a computer-readable storage medium.
Background
In the prior art, an advertisement displayed during video playback is mainly handled as follows: a click event on the advertisement position is monitored, and the currently playing video content is changed or a page jump is performed.
However, the above scheme has the following drawback:
Lack of immersion in advertisement presentation: once an existing delivered advertisement is selected (interacted with by the user), it can only be presented on its own, for example by jumping out of the current page into an advertisement page or by showing the advertisement in a pop-up window, and cannot be combined with the currently playing video picture.
That is, in the prior art, the selected interactive material, such as an advertisement, cannot be combined with the currently playing multimedia data, such as a video, during playback, which results in a poor display effect of the interactive material.
Disclosure of Invention
The invention aims to provide a multimedia data interaction method, a communication device and a computer-readable storage medium, so as to solve the prior-art problem that interactive material presented during video playback has a poor display effect.
In order to solve the above technical problem, an embodiment of the present invention provides a multimedia data interaction method, which is applied to a terminal, and includes:
in the process of playing the received multimedia data, if the triggering operation is detected, updating the multimedia data received in the preset time period according to the interactive material data received in the preset time period;
playing the updated multimedia data;
and the preset time period is a time period after the moment of detecting the trigger operation.
Optionally, the interactive material data includes: object information to be replaced and target object information;
the updating process of the multimedia data received in the preset time period according to the interactive material data received in the preset time period comprises the following steps:
and updating the image data of the object to be replaced in the multimedia data into the image data of the target object to obtain target video data.
Optionally, the interactive material data further includes: text information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
matching the target video data with the text information;
the playing the updated multimedia data includes:
and displaying the text information in a preset form in a playing picture of the target video data.
Optionally, the interactive material data further includes: audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
and replacing the audio data corresponding to the picture with the object to be replaced in the multimedia data with the audio data of the voice information.
Optionally, before playing the received multimedia data, the method further includes:
receiving first audio data and corresponding timestamp information sent by a server; the first audio data includes audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
acquiring a playing time point when the trigger operation is detected;
acquiring target audio data corresponding to the target video data from the first audio data according to the playing time point and the timestamp information;
and replacing the audio data in the multimedia data with the target audio data.
Optionally, before updating the multimedia data received in the preset time period according to the interactive material data received in the preset time period, the method further includes:
acquiring a playing time point when the trigger operation is detected;
sending an interactive request to a server according to the playing time point;
receiving target audio data fed back by the server according to the interaction request; the target audio data includes audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
and replacing the audio data in the multimedia data with the target audio data.
The embodiment of the invention also provides a multimedia data interaction method, which is applied to a server and comprises the following steps:
and pushing the multimedia data and the corresponding interactive material data to the terminal in real time.
Optionally, the interactive material data includes: object information to be replaced and target object information.
Optionally, the interactive material data further includes: text information corresponding to the target object; and/or audio data of speech information corresponding to the target object.
Optionally, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time, the method further includes:
sending first audio data and corresponding timestamp information to the terminal; the first audio data includes audio data of speech information corresponding to the target object.
Optionally, after the multimedia data and the corresponding interactive material data are pushed to the terminal in real time, the method further includes:
receiving an interactive request sent by the terminal;
acquiring target audio data according to the playing time point in the interactive request; the target audio data includes audio data of voice information corresponding to the target object;
and feeding back the target audio data to the terminal according to the interactive request.
Optionally, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time, the method further includes:
acquiring characteristic parameters of the target object;
according to the characteristic parameters, obtaining the matching degree between the target object and each object in a preset picture; the preset pictures comprise pictures formed by first video data corresponding to the multimedia data;
acquiring at least one group of target pictures meeting preset conditions from each picture according to the matching degree;
and obtaining at least one video clip of the target object according to the target picture.
Optionally, in a case that the interactive material data includes audio data of the voice information corresponding to the target object, before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time, the method further includes:
acquiring action information of a target person in the video clip;
acquiring target action information adapted to the target object from the action information;
matching text information corresponding to the target action information aiming at the target action information;
and configuring voice information aiming at the target object according to the text information corresponding to the target action information.
Optionally, the matching, for the target action information, text information corresponding to the target action information includes:
obtaining a content category corresponding to the target action information according to the action type information in the target action information and the quantity information corresponding to each action type;
and matching text information corresponding to the target action information according to the content category corresponding to the target action information.
Optionally, before configuring the voice information for the target object according to the text information corresponding to the target action information, the method further includes:
acquiring voice data of the target person from second audio data;
obtaining voice characteristic information of the target person according to the voice data;
the configuring, according to the text information corresponding to the target action information, the voice information for the target object includes:
configuring voice information aiming at the target object according to the voice characteristic information and text information corresponding to the target action information;
wherein the second audio data comprises audio data corresponding to the first video data.
The embodiment of the invention also provides a multimedia data interaction device, which is applied to a terminal and comprises the following components:
the first processing module is used for updating the multimedia data received in the preset time period according to the interactive material data received in the preset time period if the triggering operation is detected in the process of playing the received multimedia data;
the first playing module is used for playing the updated multimedia data;
and the preset time period is a time period after the moment of detecting the trigger operation.
Optionally, the interactive material data includes: object information to be replaced and target object information;
the first processing module comprises:
and the first processing submodule is used for updating the image data of the object to be replaced in the multimedia data into the image data of the target object to obtain target video data.
Optionally, the interactive material data further includes: text information corresponding to the target object;
the first processing module further comprises:
the first matching submodule is used for matching the target video data with the text information after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data;
the first playing module comprises:
and the first display sub-module is used for displaying the text information in a preset form in a playing picture of the target video data.
Optionally, the interactive material data further includes: audio data of voice information corresponding to the target object;
the first processing module further comprises:
and the second processing submodule is used for replacing the audio data corresponding to the picture with the object to be replaced in the multimedia data with the audio data of the voice information after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data.
Optionally, the method further includes:
the first receiving module is used for receiving first audio data and corresponding timestamp information sent by a server before the received multimedia data are played; the first audio data includes audio data of voice information corresponding to the target object;
the first processing module further comprises:
the first obtaining submodule is used for obtaining a playing time point when the trigger operation is detected after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain target video data;
the second obtaining submodule is used for obtaining target audio data corresponding to the target video data from the first audio data according to the playing time point and the timestamp information;
and the third processing submodule is used for replacing the audio data in the multimedia data with the target audio data.
Optionally, the method further includes:
the first acquisition module is used for acquiring a playing time point when the trigger operation is detected before multimedia data received in a preset time period is updated according to the interactive material data received in the preset time period;
the first sending module is used for sending an interaction request to a server according to the playing time point;
the second receiving module is used for receiving target audio data fed back by the server according to the interaction request; the target audio data includes audio data of voice information corresponding to the target object;
the first processing module further comprises:
and the fourth processing submodule is used for replacing the audio data in the multimedia data with the target audio data after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data.
The embodiment of the invention also provides a multimedia data interaction device, which is applied to a server and comprises the following components:
the first pushing module is used for pushing the multimedia data and the corresponding interactive material data to the terminal in real time.
Optionally, the interactive material data includes: object information to be replaced and target object information.
Optionally, the interactive material data further includes: text information corresponding to the target object; and/or audio data of speech information corresponding to the target object.
Optionally, the method further includes:
the second sending module is used for sending the first audio data and the corresponding timestamp information to the terminal before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time; the first audio data includes audio data of speech information corresponding to the target object.
Optionally, the method further includes:
the third receiving module is used for receiving the interaction request sent by the terminal after the multimedia data and the corresponding interactive material data are pushed to the terminal in real time;
the second acquisition module is used for acquiring target audio data according to the playing time point in the interactive request; the target audio data includes audio data of voice information corresponding to the target object;
and the first feedback module is used for feeding back the target audio data to the terminal according to the interactive request.
Optionally, the method further includes:
the third acquisition module is used for acquiring the characteristic parameters of the target object before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time;
the fourth obtaining module is used for obtaining the matching degree between the target object and each object in a preset picture according to the characteristic parameters; the preset pictures comprise pictures formed by first video data corresponding to the multimedia data;
a fifth obtaining module, configured to obtain, according to the matching degree, at least one group of target pictures that meet a preset condition from the pictures;
and the second processing module is used for obtaining at least one video clip of the target object according to the target picture.
Optionally, in a case that the interactive material data includes audio data of voice information corresponding to the target object, the method further includes:
the sixth acquisition module is used for acquiring action information of a target person in the video clip before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time;
a seventh obtaining module, configured to obtain target action information adapted to the target object from the action information;
the first matching module is used for matching text information corresponding to the target action information aiming at the target action information;
and the first configuration module is used for configuring the voice information aiming at the target object according to the text information corresponding to the target action information.
Optionally, the first matching module includes:
the fifth processing submodule is used for obtaining a content category corresponding to the target action information according to the action type information in the target action information and the quantity information corresponding to each action type;
and the second matching sub-module is used for matching the text information corresponding to the target action information according to the content category corresponding to the target action information.
Optionally, the method further includes:
an eighth obtaining module, configured to obtain voice data of the target person from second audio data before configuring voice information for the target object according to the text information corresponding to the target action information;
the third processing module is used for obtaining the voice characteristic information of the target person according to the voice data;
the first configuration module, comprising:
the first configuration submodule is used for configuring the voice information aiming at the target object according to the voice characteristic information and the text information corresponding to the target action information;
wherein the second audio data comprises audio data corresponding to the first video data.
The embodiment of the invention also provides communication equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor; and when the processor executes the program, the multimedia data interaction method of the terminal side or the server side is realized.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the multimedia data interaction method on the terminal side or the server side.
The technical scheme of the invention has the following beneficial effects:
In the above scheme, the multimedia data interaction method updates the multimedia data received within a preset time period according to the interactive material data received within that time period if a trigger operation is detected while the received multimedia data is being played, and then plays the updated multimedia data, the preset time period being a time period after the moment at which the trigger operation is detected. The interactive material data can thus be combined with the multimedia data being played before being presented to the user, which improves the sense of immersion of the interaction and the expressive force of the interactive material data during multimedia playback, and therefore improves the display effect of the interactive material. This effectively solves the prior-art problem that interactive material displayed during video playback has a poor display effect.
Drawings
FIG. 1 is a first flowchart illustrating a multimedia data interaction method according to an embodiment of the present invention;
FIG. 2 is a second flowchart illustrating a multimedia data interaction method according to an embodiment of the present invention;
FIG. 3 is a first schematic view of video playing according to an embodiment of the present invention;
FIG. 4 is a second schematic view of video playing according to an embodiment of the present invention;
FIG. 5 is a third schematic view of video playing according to an embodiment of the present invention;
FIG. 6 is a fourth schematic view of video playing according to an embodiment of the present invention;
FIG. 7 is a first schematic structural diagram of a multimedia data interaction device according to an embodiment of the present invention;
FIG. 8 is a second schematic structural diagram of a multimedia data interaction device according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
Aiming at the prior-art problem that interactive material presented during video playback has a poor display effect, the invention provides a multimedia data interaction method, applied to a terminal, which includes the following steps:
step 11: in the process of playing the received multimedia data, if the triggering operation is detected, updating the multimedia data received in the preset time period according to the interactive material data received in the preset time period;
step 12: playing the updated multimedia data;
the preset time period is a time period after the moment of detecting the trigger operation (which can also be understood as the preset time period is a time period after the play time point of detecting the trigger operation).
Specifically, the preset time period may be a time period formed by the playing time corresponding to the received multimedia data; alternatively, the preset time period may include: a first time period formed by the playing time corresponding to the received multimedia data and a second time period formed by the playing time corresponding to the multimedia data to be received; alternatively, the preset time period may include: a time period formed by playing time corresponding to the multimedia data to be received; and is not limited herein.
The trigger operation may include selection of a preset position, selection of preset information, and the like, which are not limited herein.
According to the multimedia data interaction method provided by the embodiment of the invention, during playback of the received multimedia data, if a trigger operation is detected, the multimedia data received within the preset time period is updated according to the interactive material data received within that time period, and the updated multimedia data is played, the preset time period being a time period after the moment at which the trigger operation is detected. The interactive material data can thus be combined with the multimedia data being played before being presented to the user, which improves the sense of immersion of the interaction and the expressive force of the interactive material data during multimedia playback, and therefore improves the display effect of the interactive material; the prior-art problem that interactive material displayed during video playback has a poor display effect is thereby well solved.
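As an illustration of steps 11 and 12, the following is a minimal sketch of the terminal-side flow; it is not the patented implementation, and all names (play_multimedia, replace_object, render) and the 10-second period are hypothetical placeholders.

```python
# Minimal sketch of the terminal-side flow (steps 11 and 12). All names here
# (replace_object, render, ...) are illustrative stubs, not APIs from the patent.

def replace_object(image, material):
    # Stub: would draw the target-object image over the object-to-be-replaced
    # region described by the interactive material data.
    return image

def render(pts, image, audio):
    # Stub: would hand the (possibly updated) frame to the player.
    print(f"playing frame at t={pts:.2f}s")

def play_multimedia(frames, material, trigger_time=None, period=10.0):
    """frames: iterable of (pts, image, audio) tuples in play order.
    If a trigger was detected at trigger_time, frames whose presentation time
    falls inside the preset period after that moment are updated with the
    interactive material before being played."""
    for pts, image, audio in frames:
        if trigger_time is not None and trigger_time < pts <= trigger_time + period:
            image = replace_object(image, material)
        render(pts, image, audio)
```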
In this embodiment of the present invention, before playing the received multimedia data, the method further includes: sending a multimedia data acquisition request to a server; and receiving the multimedia data and the interactive material data which are pushed in real time by the server according to the multimedia data acquisition request. The method further comprises the following steps: and caching the received multimedia data and the interactive material data.
Wherein the interactive material data comprises: object information to be replaced and target object information; the updating process of the multimedia data received in the preset time period according to the interactive material data received in the preset time period comprises the following steps: and updating the image data of the object to be replaced in the multimedia data into the image data of the target object to obtain target video data.
Specifically, for example, the information of the object to be replaced is milk tea cup information, and the information of the target object is coffee cup information; and replacing the milk tea cup image with the coffee cup image in the multimedia data.
In order to further improve the expressive power of the target object, an embodiment of the present invention further provides an advertising slogan for the target object. Regarding the concrete form of this slogan, the embodiment of the present invention provides two examples: example one uses a text form, and example two uses a speech form.
For example one, the interactive material data further includes: text information corresponding to the target object; after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data, the method further includes (this can also be understood as part of the updating of the multimedia data received within the preset time period according to the interactive material data received within that time period): matching the target video data with the text information. The playing the updated multimedia data then includes: displaying the text information in a preset form in a playing picture of the target video data.
Specific examples thereof are: the text information comprises advertising words of the coffee cup, and the advertising words of the coffee cup are displayed in a playing picture in the process of playing the target video for replacing the milk tea cup image with the coffee cup image.
The preset form is not limited to a player floating advertisement, a banner advertisement, etc.
For example two, the embodiment of the present invention provides the following three specific implementation manners:
In a first implementation manner, the interactive material data further includes: audio data of the voice information corresponding to the target object; after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data, the method further includes (this can also be understood as part of the updating of the multimedia data received within the preset time period according to the interactive material data received within that time period): replacing the audio data corresponding to the pictures containing the object to be replaced in the multimedia data with the audio data of the voice information.
A specific example: for the video pictures in which the milk tea cup image is replaced with the coffee cup image, the audio data corresponding to those pictures is replaced with the audio data corresponding to the coffee cup, which can also be understood as the voice advertising slogan of the coffee cup; in this implementation, the voice slogan can be pushed to the terminal in real time along with the multimedia data.
In a second implementation manner, before playing the received multimedia data, the method further includes: receiving first audio data and corresponding timestamp information sent by a server, the first audio data including audio data of the voice information corresponding to the target object; after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data, the method further includes (this can also be understood as part of the updating of the multimedia data received within the preset time period according to the interactive material data received within that time period): acquiring the playing time point at which the trigger operation is detected; acquiring, from the first audio data, target audio data corresponding to the target video data according to the playing time point and the timestamp information; and replacing the audio data in the multimedia data with the target audio data (a sketch of this timestamp lookup is given after the third implementation below).
A specific example: for the video pictures in which the milk tea cup image is replaced with the coffee cup image, the audio data corresponding to those pictures is replaced with the audio data corresponding to the coffee cup, which can also be understood as the voice advertising slogan of the coffee cup; in this implementation, the voice slogan can be downloaded to the terminal in advance.
In a third implementation manner, before updating the multimedia data received within the preset time period according to the interactive material data received within that time period, the method further includes: acquiring the playing time point at which the trigger operation is detected; sending an interaction request to a server according to the playing time point; and receiving target audio data fed back by the server according to the interaction request, the target audio data including audio data of the voice information corresponding to the target object. After the image data of the object to be replaced in the multimedia data is updated to the image data of the target object to obtain the target video data, the method further includes (this can also be understood as part of the updating of the multimedia data received within the preset time period according to the interactive material data received within that time period): replacing the audio data in the multimedia data with the target audio data.
A specific example: for the video pictures in which the milk tea cup image is replaced with the coffee cup image, the audio data corresponding to those pictures is replaced with the audio data corresponding to the coffee cup, which can also be understood as the voice advertising slogan of the coffee cup; in this implementation, the terminal requests the voice slogan from the server in real time.
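The following is a minimal sketch of the timestamp lookup in the second implementation above; the per-segment layout of the first audio data ("start", "end", "audio" fields) is an assumption, not a format defined by the patent.

```python
# Minimal sketch, assuming the first audio data is delivered as segments, each
# tagged with (start, end) timestamps relative to the video timeline.

def select_target_audio(first_audio_segments, play_time):
    """first_audio_segments: list of dicts like
       {"start": 12.0, "end": 17.5, "audio": b"..."} (assumed layout).
    Returns the segments whose interval lies at or after the playing time point
    at which the trigger operation was detected."""
    return [seg for seg in first_audio_segments if seg["end"] > play_time]

# Usage: the selected segments replace the original audio of the corresponding
# video pictures in the updated (target) video data.
segments = [{"start": 0.0, "end": 5.0, "audio": b"a"},
            {"start": 30.0, "end": 35.0, "audio": b"b"}]
print(select_target_audio(segments, play_time=12.0))  # -> only the 30-35 s segment
```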
An embodiment of the present invention further provides a multimedia data interaction method, applied to a server, as shown in fig. 2, including:
step 21: and pushing the multimedia data and the corresponding interactive material data to the terminal in real time.
The multimedia data interaction method provided by the embodiment of the invention pushes multimedia data and the corresponding interactive material data to a terminal in real time, which enables the terminal to combine the interactive material data with the multimedia data being played before presenting it to the user. This improves the sense of immersion of the interaction and the expressive force of the interactive material data during multimedia playback, and therefore improves the display effect of the interactive material. Where the interactive material data involves advertisement content, the sense of immersion of the advertisement interaction during multimedia playback is likewise improved, which well solves the prior-art problem that interactive material presented during video playback cannot be combined with the currently playing multimedia data and therefore has a poor display effect.
Specifically, the pushing of the multimedia data and the corresponding interactive material data to the terminal in real time may include: receiving a multimedia data acquisition request sent by a terminal; and pushing corresponding multimedia data and interactive material data to the terminal in real time according to the multimedia data acquisition request.
Wherein the interactive material data comprises: object information to be replaced and target object information.
In this way, the terminal can replace the object to be replaced in the relevant video pictures with the target object. For example, the information of the object to be replaced is milk tea cup information and the information of the target object is coffee cup information; the milk tea cup image in the multimedia data is then replaced with the coffee cup image.
In order to further improve the expressive power of the target object, an embodiment of the present invention further provides an advertising slogan for the target object. Regarding the concrete form of this slogan, the embodiment of the present invention provides two examples: example one uses a text form, and example two uses a speech form.
correspondingly, the interactive material data further comprises: text information corresponding to the target object; and/or audio data of speech information corresponding to the target object.
Specifically, for the first example, the interactive material data further includes: text information corresponding to the target object.
In this example, the server may also send display-form information (the above-mentioned preset form) of the text information to the terminal, and may send display-position information of the text information to the terminal; this information may be sent to the terminal together with the text information or sent separately, which is not limited herein. Of course, the display form, display position and the like of the text information may instead be determined by the terminal itself, which is likewise not limited herein.
For example two, the embodiment of the present invention provides the following three specific implementation manners:
in a first implementation manner, the interactive material data further includes: audio data of the speech information corresponding to the target object.
The audio data in this implementation may be pushed to the terminal in real-time with the multimedia data.
The second implementation manner, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time, further includes: sending first audio data and corresponding timestamp information to the terminal; the first audio data includes audio data of speech information corresponding to the target object.
The first audio data in this implementation may be understood as being sent to the terminal in advance (before the terminal processes the multimedia data).
In a third implementation manner, after the multimedia data and the corresponding interactive material data are pushed to the terminal in real time, the method further includes (also may include, after the multimedia data acquisition request sent by the terminal is received, the method further includes): receiving an interactive request sent by the terminal; acquiring target audio data according to the playing time point in the interactive request; the target audio data includes audio data of voice information corresponding to the target object; and feeding back the target audio data to the terminal according to the interactive request.
The target audio data in the implementation mode can be pushed by a terminal real-time request server.
The obtaining of the target audio data by the server may specifically include: and selecting audio data of a corresponding time interval (which can contain the playing time point) from the stored voice data as target audio data.
Further, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time, the method further includes (also may include before receiving the multimedia data acquisition request sent by the terminal): acquiring characteristic parameters of the target object; according to the characteristic parameters, obtaining the matching degree between the target object and each object in a preset picture; the preset pictures comprise pictures formed by first video data corresponding to the multimedia data (which can also be understood as corresponding to the multimedia data acquisition request); acquiring at least one group of target pictures meeting preset conditions from each picture according to the matching degree; and obtaining at least one video clip of the target object according to the target picture.
According to the embodiment of the invention, the advertisement of the target object can be put according to the obtained video clip.
Regarding "obtaining at least one video clip of the target object according to the target picture", the following may be specifically mentioned: acquiring various durations of continuous appearance of various objects to be replaced in the video according with requirements; and taking the video segment corresponding to the time length greater than the threshold value as the video segment of the target object. The threshold may be 5s, but is not limited thereto. And under the condition that the time difference between the playing moments respectively corresponding to the two video pictures with the to-be-replaced objects is less than or equal to a preset value, confirming that the two video pictures belong to the situation that the to-be-replaced objects continuously appear. The preset value may be 2s, but is not limited thereto.
In the embodiment of the present invention, after the target pictures are obtained, the position information and the shape information of the object to be replaced corresponding to the target object in each target picture can be obtained, and the like, which are required when the replacement operation is subsequently performed (i.e., the image of the object to be replaced in the multimedia data is updated to the image of the target object).
Furthermore, in order to obtain audio data of the voice information corresponding to the target object (specifically, it may be understood as configuring a voice advertisement word), in the case that the interactive material data includes audio data of the voice information corresponding to the target object, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time, the method further includes (also may include, before receiving a multimedia data acquisition request sent by the terminal): acquiring action information of a target person in the video clip; acquiring target action information adapted to the target object from the action information; matching text information corresponding to the target action information aiming at the target action information; and configuring voice information aiming at the target object according to the text information corresponding to the target action information.
And the text information corresponding to the target action information can be understood as scenario information. The target person may include a person holding the target object, a person facing the target object, and a person with a body part pointing to the target object, which is not limited herein.
Specifically, the matching, for the target action information, the text information corresponding to the target action information includes: obtaining a content category corresponding to the target action information according to the action type information in the target action information and the quantity information corresponding to each action type; and matching text information corresponding to the target action information according to the content category corresponding to the target action information.
The content category may be understood as a scenario category.
Further, before configuring the voice information for the target object according to the text information corresponding to the target action information, the method further includes: acquiring voice data of the target person from second audio data; obtaining voice characteristic information of the target person according to the voice data; the configuring, according to the text information corresponding to the target action information, the voice information for the target object includes: configuring voice information aiming at the target object according to the voice characteristic information and text information corresponding to the target action information; wherein the second audio data comprises audio data corresponding to the first video data.
Namely, the voice features of the target person are extracted in advance, and then dubbing is performed on the target object by using the extracted voice features.
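A minimal sketch of this dubbing step; extract_voice_features and synthesize are hypothetical stand-ins for a speaker-feature extractor and a text-to-speech engine, and are not APIs described in the patent.

```python
def extract_voice_features(speech_audio):
    # Hypothetical: would return a speaker timbre profile computed from the
    # target person's speech in the second audio data.
    return {"speaker": "target_person", "embedding": [0.0] * 8}

def synthesize(text, voice_features):
    # Hypothetical: would return audio of `text` spoken with the given timbre.
    return {"text": text, "voice": voice_features["speaker"]}

def configure_voice_for_target(text_for_action, target_person_speech):
    """Configure the voice information (voice advertising slogan) for the target
    object: text matched to the target action, voiced like the target person."""
    features = extract_voice_features(target_person_speech)
    return synthesize(text_for_action, features)
```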
In the embodiment of the present invention, regarding the operation of replacing the object to be replaced with the target object by the terminal, the operation may further include an operation of adding a special effect, for example, adding a target special effect to the target object, and the like, which is not limited herein.
The multimedia data interaction method provided by the embodiment of the invention is further described below with reference to a terminal, a server and the like, taking as an example the case where the trigger operation is clicking advertisement information on a video playing interface, the target object is an advertisement, and the terminal is a mobile phone.
In view of the above technical problems, an embodiment of the present invention provides a multimedia data interaction method, which may specifically include the following steps:
part 1, advertisement feature tagging
a. The advertisement features are tagged in the following ways:
1) A label library is established, and the advertisement is labeled according to the labels in the library; the label library is mainly used to keep the labels applied to the same advertising object consistent;
2) Object recognition capability can be used to identify the advertisement material (advertisement information) and apply the recognized labels uniformly.
b. The tag may include at least one of the following characteristic information of the advertisement:
name information, shape information, aspect ratio information, category information, and brand information.
Part 2, the server carries out advertisement putting according to the characteristic information of the advertisement
a. The video clips needing to be advertised can be analyzed by using AI (artificial intelligence) object recognition and face recognition capabilities, and objects similar to or identical to the advertised object are recognized (the objects can be recognized according to the comprehensive matching degree of the characteristic information of the advertised object).
b. Advertisements are delivered according to a set rule or the matching degree. The matching degree may be calculated as follows (a code sketch is given at the end of this part):

matching degree = comprehensive matching degree / highest matching degree

where the highest matching degree is an upper limit defined according to the influence of each dimension on the result, and can be adjusted according to actual service requirements; and

comprehensive matching degree = name matching degree + category matching degree + shape matching degree + aspect ratio matching degree + brand matching degree.
c. The advertisement delivery method includes the following steps:
1) selecting the advertisement to be placed in the video pictures in descending order of the matching degree between the delivered advertisement and the video content (each object in the video picture);
2) filtering out (excluding), according to the appearance duration, video content whose continuous appearance time is less than 5 s (video pictures of the same video content separated by less than 2 s are regarded as one continuous appearance, i.e. within one appearance duration);
3) delivering, for the specified appearance duration of the advertising objects, preferentially in the order given by step 1).
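A minimal sketch of the matching-degree calculation in part 2.b, assuming each per-dimension matching degree is already a value in [0, 1] and that the highest matching degree equals the number of dimensions (equal weights); the patent leaves the actual upper limit and weights adjustable.

```python
DIMENSIONS = ("name", "category", "shape", "aspect_ratio", "brand")

def matching_degree(per_dimension, highest=len(DIMENSIONS)):
    """per_dimension: dict of per-dimension matching degrees in [0, 1],
    e.g. {"name": 0.2, "category": 1.0, ...}. The comprehensive matching degree
    is their sum; the result is normalised by the highest matching degree."""
    comprehensive = sum(per_dimension.get(d, 0.0) for d in DIMENSIONS)
    return comprehensive / highest

scores = {"name": 0.1, "category": 1.0, "shape": 0.8, "aspect_ratio": 0.9, "brand": 0.0}
print(round(matching_degree(scores), 3))  # 2.8 / 5 = 0.56
```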
Part 3, the server can analyze and process the video clip by using AI capability
Video understanding:
a. After the video segment for advertisement delivery is obtained (an existing video can be segmented and analyzed), the video segment is analyzed using the AI video-understanding capability, and the video content is output in a specific text structure, such as:
person + time (optional) + location (optional) + what they are doing (action + target object).
b. The output content is classified according to action type and action quantity;
for example: person A drinks milk tea; person B drinks coffee; person C picks up the milk tea, puts down the milk tea and picks up the tea; person D pours milk;
in the above: "person A drinks milk tea" and "person B drinks coffee" belong to the same category;
"person C picks up the milk tea, puts down the milk tea and picks up the tea" is one category;
"person D pours milk" is one category.
c. an advertisement scenario is set for each category in b.
For example, the actions "xxx picks up beverage A, xxx puts down beverage A, xxx picks up beverage B" correspond to the following scenario:
beverage B is replaced with the advertisement;
after putting back beverage A, xxx says the line "beverage B still suits my taste better";
in the above, xxx represents a person; beverage A and beverage B belong to the same category but are not the same object.
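A minimal sketch of parts 3.a to 3.c, assuming the video-understanding output is available as structured records and reading "action type and action quantity" as the multiset of actions per person; the scenario mapping at the end is illustrative only.

```python
from collections import Counter, defaultdict

# Structured output of video understanding (part 3.a):
# person + time (optional) + location (optional) + action + target object.
records = [
    {"person": "A", "action": "drink",    "object": "milk tea"},
    {"person": "B", "action": "drink",    "object": "coffee"},
    {"person": "C", "action": "pick up",  "object": "milk tea"},
    {"person": "C", "action": "put down", "object": "milk tea"},
    {"person": "C", "action": "pick up",  "object": "tea"},
    {"person": "D", "action": "pour",     "object": "milk"},
]

# Part 3.b: classify by action type and action quantity per person.
categories = defaultdict(list)
for person in sorted({r["person"] for r in records}):
    actions = Counter(r["action"] for r in records if r["person"] == person)
    categories[tuple(sorted(actions.items()))].append(person)

# Part 3.c: a scenario is configured for each category (illustrative mapping).
scenarios = {cat: f"scenario #{i}" for i, cat in enumerate(categories, start=1)}
for cat, people in categories.items():
    print(people, "->", scenarios[cat])
# Persons A and B (a single "drink" action each) fall into the same category;
# persons C and D each form their own category.
```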
Further, the embodiment of the present invention further includes obtaining the voice characteristics:
a. After the video clip for advertisement delivery is obtained, person recognition is used to detect whether a person exists in the picture;
b. When a person exists in the picture, the person's speech is detected in all segments of the video, and audio information is extracted from that speech;
this facilitates subsequently configuring, for the advertisement, a voice advertising slogan based on the person's voice.
Part 4, advertisement interaction
Before performing an advertisement interaction, the solution provided by the embodiment of the present invention may include the following:
(1) the server can analyze the video content and the advertising objects, and respectively identify people, objects, actions, logo and the like related to the advertising objects in the video content, and brands, categories and the like of the advertising objects (see part 2);
(2) The server can store the video content and the advertisement, respectively, in specific formats according to the recognition results. The format can be a JSON or XML file, or a database such as MongoDB.
The video content portion contains at least the following information: the recognized object's name, action, position, time period of occurrence in the video, category, size, shape, aspect ratio, brand (optional), and so on. This information may be stored on a server or in a media asset repository (illustrative example records are given after item (3) below).
For the advertisement portion, at least the following information is contained: name, category, brand, shape, aspect ratio. This information may be stored at a server, or at an advertising platform.
(3) The server (which may be an AI server or another server) acquires, from the advertisement platform, the advertising objects to be delivered into the video and matches them against the video content to be delivered. The matching results are returned to the advertisement platform in descending order of matching degree according to the matching rule. The advertisement platform can deliver according to default business rules (the advertisement preferentially appears in the video pictures with a higher matching degree), or the delivery can be adjusted manually. The advertisement platform may send the delivery results to the server.
Of course, the matching operation can be completed by the server.
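Illustrative examples of the two records described in item (2) above, written as Python dictionaries that would be serialised to JSON (or stored as XML, or in a database such as MongoDB); the field names and values are assumptions consistent with the listed information, not a format specified by the patent.

```python
import json

# Assumed record for one recognized object in the video content.
video_content_record = {
    "object_name": "milk tea cup",
    "action": "drink",
    "position": {"x": 412, "y": 307},                # position in the picture (assumed: pixels)
    "time_period": {"start": 125.0, "end": 133.5},   # appearance interval in the video (seconds)
    "category": "beverage container",
    "size": {"width": 80, "height": 120},
    "shape": "cup",
    "aspect_ratio": 0.67,
    "brand": None,                                   # optional
}

# Assumed record for one advertising object on the advertisement platform.
advertisement_record = {
    "name": "X coffee",
    "category": "beverage container",
    "brand": "X",
    "shape": "cup",
    "aspect_ratio": 0.7,
}

print(json.dumps(video_content_record, ensure_ascii=False, indent=2))
```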
Advertisement interaction example scenario 1: advertising information interacting with advertising objects
a. Click events of advertisement information can be monitored by using the APP;
b. when the user watches the video with the mobile phone, the mobile phone obtains information of the object to be replaced from the server (in the operation (2), information of the shape, the position, and the like of the object to be replaced can be obtained), and obtains interactive materials (for buffering and temporary storage) such as related information of the advertisement object (the target object) from the server or the advertisement platform.
c. After the user clicks the advertisement information, the mobile phone can locally deform the image of the advertising object according to information such as the shape of the object to be replaced in the video clip, so that it matches the shape of the object to be replaced in the video picture, and then perform the replacement; alternatively, the size of the advertisement image can first be changed so that the two shapes are close, the picture of the object to be replaced in the video picture is then used as a mask, and the redundant part of the advertisement image is cropped away before the replacement; video matting can also be used to matte out the image of the object to be replaced from the video picture and replace it with the advertisement image (a sketch of the mask-based approach is given at the end of this example).
The video clips in this operation may include: video clips whose pictures contain, within a preset time period, the advertising object corresponding to the clicked advertisement information. The preset time period may be a time period satisfying a first condition; it may be the first such time period, several such time periods, or even all time periods satisfying the first condition within the remaining unplayed part of the video content. The first condition may specifically be: the video pictures continuously contain the advertising object corresponding to the advertisement information from the moment the advertisement information is clicked.
d. After the image of the object to be replaced in the video clip is replaced with the advertisement image, the clicked advertisement information changes into another form of advertisement information for that advertisement. As shown in fig. 3 and fig. 4, the original advertisement information, a coupon for X coffee, changes into the advertising message "the coffee xxx loves to drink, won't you get a cup?".
Specifically, in fig. 3, a coupon for X coffee is shown while person A is drinking milk tea;
when the user clicks the coupon, AI replaces the milk tea with X coffee, and at the same time the original advertising message (the clicked advertisement information) changes into: "the coffee person A loves, won't you get a cup?".
The display position of the changed advertisement information (the advertisement information in fig. 4) is not limited to the player floating advertisement, and may be in the form of banner advertisement or the like; the changed advertisement information and the video picture of the advertisement object can be in the same page.
The terminal may also acquire the advertisement information in fig. 4 in operation b of this example.
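The mask-based variant of operation c can be sketched as follows; this assumes the server has already delivered a binary mask and a bounding box for the object to be replaced (the function name and parameters are ours, not the patent's), and it covers only the cropping case, not the deformation or matting variants:

```python
import numpy as np
import cv2  # assumed available on the terminal side for resizing

def replace_with_mask(frame: np.ndarray,
                      ad_image: np.ndarray,
                      mask: np.ndarray,
                      bbox: tuple) -> np.ndarray:
    """Resize the advertisement image to the bounding box of the object to be replaced,
    keep only the pixels covered by the object's mask (cropping the excess part of the
    advertisement image), and paste the result back into the frame."""
    x, y, w, h = bbox                                # object position from operation (2)
    ad_resized = cv2.resize(ad_image, (w, h))        # keep the two shapes close in size
    region = frame[y:y + h, x:x + w].copy()
    obj_mask = mask[y:y + h, x:x + w].astype(bool)   # True where the object to be replaced is
    region[obj_mask] = ad_resized[obj_mask]          # excess advertisement pixels are discarded
    frame[y:y + h, x:x + w] = region
    return frame
```

In practice this would run frame by frame over the video clip selected under the first condition described above.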
Advertisement interaction example scenario 2: advertising information transformation based on video content
Before this example is performed, the scheme provided by the embodiment of the present invention may further include the following:
(4) The server acquires each playing time point of the preset video content, and may invoke AI (artificial intelligence) capabilities to analyze the video clip starting from each playing time point and obtain the corresponding action classification (see part 3);
(5) After the server obtains the action classification of each video clip, it may assign a script to each video clip according to the script matching degree (the script matching degree may cover matching of the object's item, matching of the action and the object, and so on); alternatively:
the server uploads the action classifications to the advertisement platform, the advertisement platform assigns a script to each video clip according to the script matching degree and feeds the assignment result back to the server; this is not limited herein.
(6) After the server acquires the scripts, it processes the audio corresponding to each video clip according to the script and stores the result, for example by changing the lines of the target character, adding dubbing, and so on (this processing can be carried out offline in advance, so that the change appears to happen in real time). A rough sketch of this offline preparation is given below.
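Steps (4)–(6) amount to an offline preparation pass that leaves, for each playing time point, a piece of processed audio ready to be fetched later. A simplified sketch under assumed interfaces (classify_actions, pick_script and render_audio are placeholders for the AI classification, script matching and dubbing steps):

```python
from typing import Callable, Dict

def prepare_clip_audio(play_points: list,
                       classify_actions: Callable[[float], str],
                       pick_script: Callable[[str], str],
                       render_audio: Callable[[str], bytes]) -> Dict[float, bytes]:
    """For each playing time point: classify the clip's action, assign the best-matching
    script, render the processed audio for that script, and store it keyed by the time
    point so it can be fetched when a user clicks at that point."""
    prepared = {}
    for t in play_points:
        action = classify_actions(t)        # step (4): AI action classification
        script = pick_script(action)        # step (5): script assigned by matching degree
        prepared[t] = render_audio(script)  # step (6): processed audio stored for reuse
    return prepared
```

A real server would persist the results rather than keep them in memory, but the keying by playing time point is the part the later steps rely on.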
The present exemplary scheme may specifically include (taking the third implementation manner as an example):
a. The APP monitors click events on the advertisement information;
b. While the user watches the video on the mobile phone, the phone obtains information about the object to be replaced from the server (the shape, position and similar information of the object to be replaced can be obtained in operation (2)), and obtains interactive materials, such as related information of the advertisement object (the target object), from the server or the advertisement platform for buffering and temporary storage.
c. After the user clicks the advertisement information, the mobile phone records the playing time point at which the advertisement information was clicked (so that the audio corresponding to the advertisement can be selected; the terminal side mutes the original audio of the current video content) and sends this playing time point to the server; and
the mobile phone may locally deform the advertisement image according to the shape and other information of the object to be replaced in the video clip, so that it matches the shape of the object to be replaced in the video picture, and then perform the replacement; alternatively, the size of the advertisement image may first be adjusted so that the two shapes are close, the picture of the object to be replaced in the video frame is used as a mask, and the excess part of the advertisement image is cropped before the replacement; video matting may also be used to cut the image of the object to be replaced out of the video picture and fill the region with the advertisement image.
The video clips in this operation are as explained in the previous example and are not described again here.
d. The server acquires the corresponding audio according to the playing time point sent by the mobile phone and sends that audio back to the mobile phone.
e. After the mobile phone receives the audio sent by the server, it aligns the audio with the video clip processed in operation c according to the timestamp information of both, and plays them together (see the sketch after this list).
f. After the audio sent by the server in operation d and the video clip processed in operation c have been played, playback continues according to the original video progress (the subsequent original video and audio are played);
that is, the video clip processed in operation c replaces the corresponding clip of the original video, and the audio sent by the server in operation d replaces the original audio of the corresponding time period.
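Operations c–f can be tied together in a rough terminal-side sketch like the one below; the player object and its methods (mute_original_audio, play_video, play_audio, seek, unmute_original_audio) are stand-ins for the terminal's real player API, and only the timestamp bookkeeping is the point:

```python
from dataclasses import dataclass

@dataclass
class TimedMedia:
    start_ts: float      # timestamp of the first frame/sample, in seconds
    duration: float
    payload: object      # decoded frames or audio samples (placeholder)

def play_replaced_segment(player, processed_clip: TimedMedia, server_audio: TimedMedia):
    """Align the server audio with the processed video clip via their timestamps,
    play them together, then resume the original video and audio (operation f)."""
    player.mute_original_audio()                        # terminal mutes original audio (operation c)
    offset = server_audio.start_ts - processed_clip.start_ts
    player.play_video(processed_clip.payload)
    player.play_audio(server_audio.payload, delay=max(0.0, offset))
    resume_at = processed_clip.start_ts + processed_clip.duration
    player.seek(resume_at)                              # continue with the original progress
    player.unmute_original_audio()
```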
This example is illustrated in fig. 5 and fig. 6. In fig. 5, character A is drinking milk tea, gazing into the distance and asking who the person over there is. When the user clicks the X coffee advertisement, the video changes so that character A drinks X coffee, and the audio changes to a line about having a cup of X coffee to calm the nerves, spoken in character A's voice.
As can be seen from the above, the scheme provided by the embodiment of the invention:
(1) allows the advertisement object to interact with the video content, giving it more forms of expression and greater expressiveness, while the advertisement object can adapt to the video content to achieve the best effect;
(2) makes the advertisement more immersive and increases the user's awareness of the advertisement.
An embodiment of the present invention further provides a multimedia data interaction apparatus, applied to a terminal, as shown in fig. 7, including:
the first processing module 71 is configured to, in the process of playing the received multimedia data, update the multimedia data received within a preset time period according to the interactive material data received within the preset time period if a trigger operation is detected;
a first playing module 72, configured to play the updated multimedia data;
wherein the preset time period is a time period after the moment at which the trigger operation is detected.
With the multimedia data interaction apparatus provided by the embodiment of the invention, if a trigger operation is detected while the received multimedia data is being played, the multimedia data received within a preset time period is updated according to the interactive material data received within that time period, and the updated multimedia data is played; the preset time period is a time period after the moment at which the trigger operation is detected. The interactive material data can therefore be combined with the multimedia data being played before being presented to the user, which improves the sense of immersion of the interaction with the interactive material data and its expressiveness during multimedia playback, and thereby improves the display effect of the interactive material; this effectively solves the problem in the prior art that interactive material displayed during video playback has a poor display effect.
The interactive material data includes: information about the object to be replaced and information about the target object. The first processing module includes: a first processing submodule configured to update the image data of the object to be replaced in the multimedia data to the image data of the target object, so as to obtain target video data.
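As a purely structural sketch of how these terminal-side modules could be organised in code (class names mirror the module names above, but the method signatures and dictionary keys are our own illustration, not the patent's):

```python
class FirstProcessingSubmodule:
    """Updates the image data of the object to be replaced to that of the target object."""
    def process(self, multimedia_data, object_to_replace_info, target_object_info):
        # placeholder for the replacement described in the method embodiments
        return multimedia_data                            # i.e. the target video data

class FirstProcessingModule:
    """Module 71: updates the multimedia data received within the preset time period."""
    def __init__(self):
        self.first_processing_submodule = FirstProcessingSubmodule()

    def update(self, multimedia_data, interactive_material):
        return self.first_processing_submodule.process(
            multimedia_data,
            interactive_material["object_to_replace"],    # object information to be replaced
            interactive_material["target_object"])        # target object information

class FirstPlayingModule:
    """Module 72: plays the updated multimedia data."""
    def play(self, target_video_data):
        print("playing updated multimedia data")          # stand-in for the real player
```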
To further improve the expressiveness of the target object, the embodiment of the invention also provides promotional copy for the target object. As to the concrete form this promotional copy takes, two examples are provided: example one uses a text form and example two uses a speech form.
For example one, the interactive material data further includes: text information corresponding to the target object. The first processing module further includes: a first matching submodule configured to match the target video data with the text information after the image data of the object to be replaced in the multimedia data has been updated to the image data of the target object to obtain the target video data. The first playing module includes: a first display submodule configured to display the text information in a preset form within the playing picture of the target video data.
For example two, the embodiment of the present invention provides the following three specific implementation manners:
In a first implementation manner, the interactive material data further includes: audio data of the voice information corresponding to the target object. The first processing module further includes: a second processing submodule configured to replace the audio data corresponding to the pictures containing the object to be replaced in the multimedia data with the audio data of the voice information, after the image data of the object to be replaced in the multimedia data has been updated to the image data of the target object to obtain the target video data.
In a second implementation manner, the multimedia data interaction apparatus further includes: a first receiving module configured to receive first audio data and corresponding timestamp information sent by a server before the received multimedia data is played, the first audio data including audio data of the voice information corresponding to the target object. The first processing module further includes: a first obtaining submodule configured to obtain the playing time point at which the trigger operation is detected, after the image data of the object to be replaced in the multimedia data has been updated to the image data of the target object to obtain the target video data; a second obtaining submodule configured to obtain, from the first audio data, the target audio data corresponding to the target video data according to the playing time point and the timestamp information; and a third processing submodule configured to replace the audio data in the multimedia data with the target audio data.
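The timestamp lookup in this second implementation can be pictured with a small sketch; the shape of the first audio data (a list of segments, each carrying its start timestamp and duration) is our assumption for illustration:

```python
from bisect import bisect_right

def select_target_audio(first_audio_segments: list, play_time: float):
    """first_audio_segments: list of dicts like
       {"start_ts": 12.0, "duration": 4.5, "audio": b"..."}, sorted by start_ts.
       Returns the segment whose timestamp range covers the playing time point at
       which the trigger operation was detected, or None if there is no such segment."""
    starts = [seg["start_ts"] for seg in first_audio_segments]
    i = bisect_right(starts, play_time) - 1      # last segment starting at or before play_time
    if i < 0:
        return None
    seg = first_audio_segments[i]
    if seg["start_ts"] <= play_time < seg["start_ts"] + seg["duration"]:
        return seg
    return None
```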
In a third implementation manner, the multimedia data interaction apparatus further includes: a first acquisition module configured to acquire the playing time point at which the trigger operation is detected, before the multimedia data received within the preset time period is updated according to the interactive material data received within the preset time period; a first sending module configured to send an interaction request to a server according to the playing time point; and a second receiving module configured to receive target audio data fed back by the server according to the interaction request, the target audio data including audio data of the voice information corresponding to the target object. The first processing module further includes: a fourth processing submodule configured to replace the audio data in the multimedia data with the target audio data, after the image data of the object to be replaced in the multimedia data has been updated to the image data of the target object to obtain the target video data.
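For the third implementation, the request/response exchange might be sketched as below, with the transport layer abstracted away; the JSON field names and the nearest-time lookup are invented for illustration and reuse the offline-prepared audio dictionary from the earlier server-side sketch:

```python
import json

def build_interaction_request(play_time: float) -> bytes:
    """Terminal side: package the playing time point recorded when the trigger
    operation was detected into an interaction request."""
    return json.dumps({"type": "interaction", "play_time": play_time}).encode()

def handle_interaction_request(raw: bytes, prepared_audio: dict) -> bytes:
    """Server side: look up the target audio prepared offline, keyed by playing
    time point, and feed it back to the terminal."""
    request = json.loads(raw.decode())
    play_time = request["play_time"]
    if not prepared_audio:
        return b""
    # pick the prepared audio whose time point is closest to the requested play time
    nearest = min(prepared_audio, key=lambda t: abs(t - play_time))
    return prepared_audio[nearest]
```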
The implementation embodiments of the terminal-side multimedia data interaction method are all applicable to the embodiment of the multimedia data interaction device, and the same technical effects can be achieved.
An embodiment of the present invention further provides a multimedia data interaction apparatus, applied to a server, as shown in fig. 8, including:
the first pushing module 81 is used for pushing the multimedia data and the corresponding interactive material data to the terminal in real time.
The multimedia data interaction apparatus provided by the embodiment of the invention pushes multimedia data and the corresponding interactive material data to the terminal in real time. It thereby enables the terminal to combine the interactive material data with the multimedia data being played before presenting it to the user, which improves the sense of immersion of the interaction with the interactive material data and its expressiveness during multimedia playback, and improves the display effect of the interactive material. Where the interactive material data involves advertisement content, the sense of immersion of the advertisement interaction during multimedia playback is also improved, which effectively solves the problem in the prior art that interactive material presented during video playback cannot be combined with the multimedia data currently being played, resulting in a poor display effect.
Wherein the interactive material data comprises: object information to be replaced and target object information.
To further improve the expressiveness of the target object, the embodiment of the invention also provides promotional copy for the target object. As to the concrete form this promotional copy takes, two examples are provided: example one uses a text form and example two uses a speech form.
Correspondingly, the interactive material data further includes: text information corresponding to the target object; and/or audio data of the voice information corresponding to the target object.
Specifically, for the first example, the interactive material data further includes: text information corresponding to the target object.
For example two, the embodiment of the present invention provides the following three specific implementation manners:
in a first implementation manner, the interactive material data further includes: audio data of the speech information corresponding to the target object.
In a second implementation manner, the multimedia data interaction apparatus further includes: a second sending module configured to send first audio data and the corresponding timestamp information to the terminal before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time, the first audio data including audio data of the voice information corresponding to the target object.
In a third implementation manner, the multimedia data interaction apparatus further includes: a third receiving module configured to receive the interaction request sent by the terminal after the multimedia data and the corresponding interactive material data have been pushed to the terminal in real time; a second acquisition module configured to acquire target audio data according to the playing time point carried in the interaction request, the target audio data including audio data of the voice information corresponding to the target object; and a first feedback module configured to feed the target audio data back to the terminal according to the interaction request.
Further, the multimedia data interaction apparatus further includes: a third acquisition module configured to acquire the characteristic parameters of the target object before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time; a fourth obtaining module configured to obtain, according to the characteristic parameters, the matching degree between the target object and each object in preset pictures, where the preset pictures include the pictures formed by the first video data corresponding to the multimedia data; a fifth obtaining module configured to obtain, from those pictures and according to the matching degree, at least one group of target pictures satisfying a preset condition; and a second processing module configured to obtain at least one video clip of the target object according to the target pictures.
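A rough illustration of this picture-selection pipeline follows; the characteristic parameters are reduced to hypothetical tag sets and the preset condition to a simple threshold, which is only a stand-in for whatever matching rule the server actually applies:

```python
def frame_match_degrees(target_tags: set, frames: list) -> list:
    """frames: list of (frame_time, {object_name: tag_set}) built from the first video data.
    Returns (frame_time, best_degree) pairs, where the degree is the best overlap between
    the target object's tags and any object detected in that frame."""
    degrees = []
    for t, objects in frames:
        best = 0.0
        for tags in objects.values():
            if tags or target_tags:
                best = max(best, len(tags & target_tags) / len(tags | target_tags))
        degrees.append((t, best))
    return degrees

def select_target_pictures(degrees: list, threshold: float = 0.5) -> list:
    """Keep the frames whose matching degree satisfies the preset condition (here, a
    threshold), grouped into runs of consecutive frames that can later be turned into
    video clips of the target object."""
    groups, current = [], []
    for t, degree in degrees:
        if degree >= threshold:
            current.append(t)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```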
Further, in the case where the interactive material data includes audio data of the voice information corresponding to the target object, the multimedia data interaction apparatus further includes: a sixth acquisition module configured to acquire the action information of the target person in the video clip before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time; a seventh obtaining module configured to obtain, from the action information, the target action information adapted to the target object; a first matching module configured to match, for the target action information, the text information corresponding to the target action information; and a first configuration module configured to configure the voice information for the target object according to the text information corresponding to the target action information.
Specifically, the first matching module includes: a fifth processing submodule configured to obtain the content category corresponding to the target action information according to the action type information in the target action information and the quantity information corresponding to each action type; and a second matching submodule configured to match the text information corresponding to the target action information according to that content category.
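The rule in the fifth processing submodule — deriving a content category from the action types and how many times each occurs — could look like the toy version below; the category names, the majority-count rule and the copy bank are assumptions made for illustration:

```python
from collections import Counter

def content_category(target_action_info: list) -> str:
    """target_action_info: list of action type labels observed in the clip,
    e.g. ["drink", "drink", "sit"]. The most frequent action type decides the
    content category, which is then used to match the text information."""
    counts = Counter(target_action_info)
    if not counts:
        return "general scene"
    action_type, _ = counts.most_common(1)[0]
    category_map = {            # hypothetical mapping, not from the patent
        "drink": "beverage scene",
        "eat": "food scene",
        "drive": "travel scene",
    }
    return category_map.get(action_type, "general scene")

def match_text(category: str, copy_bank: dict) -> str:
    """Pick the promotional copy registered for the content category."""
    return copy_bank.get(category, copy_bank.get("general scene", ""))
```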
Further, the multimedia data interaction apparatus further includes: an eighth obtaining module configured to obtain the voice data of the target person from second audio data before the voice information is configured for the target object according to the text information corresponding to the target action information; and a third processing module configured to obtain the voice characteristic information of the target person from that voice data. The first configuration module includes: a first configuration submodule configured to configure the voice information for the target object according to the voice characteristic information and the text information corresponding to the target action information; the second audio data includes the audio data corresponding to the first video data.
The implementation embodiments of the server-side multimedia data interaction method are all applicable to the embodiment of the multimedia data interaction device, and the same technical effect can be achieved.
An embodiment of the invention also provides a communication device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the terminal-side or server-side multimedia data interaction method is implemented.
The implementation embodiments of the multimedia data interaction method on the terminal side or the server side are all applicable to the embodiment of the communication device, and the same technical effects can be achieved.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the multimedia data interaction method on the terminal side or the server side.
The implementation embodiments of the multimedia data interaction method on the terminal side or the server side are all applicable to the embodiment of the computer-readable storage medium, and the same technical effects can be achieved.
It should be noted that many of the functional components described in this specification are referred to as modules/sub-modules in order to more particularly emphasize their implementation independence.
In embodiments of the invention, the modules/sub-modules may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, constitute the module and achieve the stated purpose of the module.
Indeed, a module of executable code may be a single instruction, or many instructions, and may even be distributed over several different code segments, among different programs, and across several memory devices. Likewise, operational data may be identified within the modules and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, merely as electronic signals on a system or network.
Where a module can be implemented in software, and taking into account the level of existing hardware technology, such a module could also, cost aside, be built as a corresponding hardware circuit to implement the corresponding function; the hardware circuit may include conventional very-large-scale integration (VLSI) circuits or gate arrays and existing semiconductor components such as logic chips and transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field-programmable gate arrays, programmable array logic, or programmable logic devices.
While the preferred embodiments of the present invention have been described, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (17)

1. A multimedia data interaction method is applied to a terminal, and is characterized by comprising the following steps:
in the process of playing the received multimedia data, if the triggering operation is detected, updating the multimedia data received in the preset time period according to the interactive material data received in the preset time period;
playing the updated multimedia data;
wherein the preset time period is a time period after the moment at which the trigger operation is detected.
2. The multimedia data interaction method of claim 1, wherein the interactive material data comprises: object information to be replaced and target object information;
the updating process of the multimedia data received in the preset time period according to the interactive material data received in the preset time period comprises the following steps:
updating the image data of the object to be replaced in the multimedia data to the image data of the target object to obtain target video data.
3. The multimedia data interaction method of claim 2, wherein the interactive material data further comprises: text information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
matching the target video data with the text information;
the playing the updated multimedia data includes:
displaying the text information in a preset form in a playing picture of the target video data.
4. The multimedia data interaction method of claim 2, wherein the interactive material data further comprises: audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
replacing the audio data corresponding to the picture containing the object to be replaced in the multimedia data with the audio data of the voice information.
5. The multimedia data interaction method of claim 2, further comprising, before playing the received multimedia data:
receiving first audio data and corresponding timestamp information sent by a server; the first audio data includes audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
acquiring a playing time point when the trigger operation is detected;
acquiring target audio data corresponding to the target video data from the first audio data according to the playing time point and the timestamp information;
and replacing the audio data in the multimedia data with the target audio data.
6. The multimedia data interaction method according to claim 2, wherein before updating the multimedia data received within the preset time period according to the interactive material data received within the preset time period, the method further comprises:
acquiring a playing time point when the trigger operation is detected;
sending an interactive request to a server according to the playing time point;
receiving target audio data fed back by the server according to the interaction request; the target audio data includes audio data of voice information corresponding to the target object;
after the image data of the object to be replaced in the multimedia data is updated to the image data of the target object, and the target video data is obtained, the method further comprises the following steps:
replacing the audio data in the multimedia data with the target audio data.
7. A multimedia data interaction method is applied to a server and is characterized by comprising the following steps:
pushing the multimedia data and the corresponding interactive material data to the terminal in real time.
8. The multimedia data interaction method of claim 7, wherein the interactive material data comprises: object information to be replaced and target object information.
9. The multimedia data interaction method of claim 8, wherein the interactive material data further comprises: text information corresponding to the target object; and/or audio data of speech information corresponding to the target object.
10. The multimedia data interaction method of claim 8, further comprising, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time:
sending first audio data and corresponding timestamp information to the terminal; the first audio data includes audio data of speech information corresponding to the target object.
11. The multimedia data interaction method of claim 8, further comprising, after the multimedia data and the corresponding interactive material data are pushed to the terminal in real time:
receiving an interactive request sent by the terminal;
acquiring target audio data according to the playing time point in the interactive request; the target audio data includes audio data of voice information corresponding to the target object;
and feeding back the target audio data to the terminal according to the interactive request.
12. The multimedia data interaction method of claim 9, further comprising, before pushing the multimedia data and the corresponding interactive material data to the terminal in real time:
acquiring characteristic parameters of the target object;
according to the characteristic parameters, obtaining the matching degree between the target object and each object in a preset picture; the preset pictures comprise pictures formed by first video data corresponding to the multimedia data;
acquiring at least one group of target pictures meeting preset conditions from each picture according to the matching degree;
and obtaining at least one video clip of the target object according to the target picture.
13. The multimedia data interaction method according to claim 12, wherein in a case that the interactive material data includes audio data of voice information corresponding to the target object, before the multimedia data and the corresponding interactive material data are pushed to the terminal in real time, the method further comprises:
acquiring action information of a target person in the video clip;
acquiring target action information adapted to the target object from the action information;
matching text information corresponding to the target action information aiming at the target action information;
and configuring voice information aiming at the target object according to the text information corresponding to the target action information.
14. The multimedia data interaction method according to claim 13, wherein the matching, for the target action information, text information corresponding to the target action information includes:
obtaining a content category corresponding to the target action information according to the action type information in the target action information and the quantity information corresponding to each action type;
and matching text information corresponding to the target action information according to the content category corresponding to the target action information.
15. The multimedia data interaction method according to claim 13, before configuring the voice information for the target object according to the text information corresponding to the target action information, further comprising:
acquiring voice data of the target person from second audio data;
obtaining voice characteristic information of the target person according to the voice data;
the configuring, according to the text information corresponding to the target action information, the voice information for the target object includes:
configuring voice information aiming at the target object according to the voice characteristic information and text information corresponding to the target action information;
wherein the second audio data comprises audio data corresponding to the first video data.
16. A communication device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; characterized in that the processor implements the multimedia data interaction method according to any one of claims 1 to 15 when executing the program.
17. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the multimedia data interaction method according to any one of claims 1 to 15.
CN201910938977.0A 2019-09-30 2019-09-30 Multimedia data interaction method, communication device and computer readable storage medium Pending CN110691261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910938977.0A CN110691261A (en) 2019-09-30 2019-09-30 Multimedia data interaction method, communication device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110691261A true CN110691261A (en) 2020-01-14

Family

ID=69111217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910938977.0A Pending CN110691261A (en) 2019-09-30 2019-09-30 Multimedia data interaction method, communication device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110691261A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796743A (en) * 2015-04-03 2015-07-22 腾讯科技(北京)有限公司 Content item display system, method and device
CN105163188A (en) * 2015-08-31 2015-12-16 小米科技有限责任公司 Video content processing method, device and apparatus
CN105187866A (en) * 2015-09-15 2015-12-23 百度在线网络技术(北京)有限公司 Advertisement putting method and apparatus
CN105282573A (en) * 2014-07-24 2016-01-27 腾讯科技(北京)有限公司 Embedded information processing method, client side and server
US20170186241A1 (en) * 2011-12-05 2017-06-29 At&T Intellectual Property I, L.P. System and method to digitally replace objects in images or video
CN108133385A (en) * 2017-12-14 2018-06-08 北京智驿信息技术有限责任公司 A kind of advertisement placement method and device
CN108288249A (en) * 2018-01-25 2018-07-17 北京览科技有限公司 A kind of method and apparatus for replacing the object in video
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN110035314A (en) * 2019-03-08 2019-07-19 腾讯科技(深圳)有限公司 Methods of exhibiting and device, storage medium, the electronic device of information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200114)