CN115665496B - Online video voice interaction method, system and storage medium - Google Patents

Online video voice interaction method, system and storage medium

Info

Publication number
CN115665496B
CN115665496B (Application CN202211549550.XA)
Authority
CN
China
Prior art keywords
voice, friend, historical, user, time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211549550.XA
Other languages
Chinese (zh)
Other versions
CN115665496A (en)
Inventor
陆天钦
蔡树伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen SDMC Technology Co Ltd
Original Assignee
Shenzhen SDMC Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen SDMC Technology Co Ltd filed Critical Shenzhen SDMC Technology Co Ltd
Priority to CN202211549550.XA priority Critical patent/CN115665496B/en
Publication of CN115665496A publication Critical patent/CN115665496A/en
Application granted granted Critical
Publication of CN115665496B publication Critical patent/CN115665496B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention provides an online video voice interaction method, system and storage medium, wherein the method comprises the following steps: selecting and clicking a target film through a user client, and entering the viewing page; acquiring friend voice clip information left in the target film; according to updates of the playing progress, buffering to the local the voice clip information that will be played after a time threshold T from the current time node; when the playing progress reaches the recording time point of a friend voice clip, starting to play that friend voice clip; selecting a friend voice clip for interaction, taking the end time of that clip as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording; and receiving, at the friend client, the interactive voice pushed by the voice server, and instantly joining to watch the target film together. By combining voice interaction with watching the target film, the invention realizes more practical small-scene viewing interaction and improves the user's viewing experience.

Description

Online video voice interaction method, system and storage medium
Technical Field
The invention relates to the technical field of audio communication, and in particular to an online video voice interaction method, system and storage medium.
Background
With improvements in network speed and audio/video coding capability, online video has become users' first choice for watching films. However, users are no longer satisfied merely with improvements in playback quality such as image quality and sound effects; they attach more and more importance to interaction.
In a traditional cinema, every exciting scene prompts viewers to whisper quietly to the relatives and friends accompanying them. The need to interact while viewing therefore objectively exists, and is suppressed only because the environment does not allow it.
To improve product competitiveness, online video platforms have implemented several interaction schemes, each with its own advantages and disadvantages.
Scheme 1: video comments, presented as text and sticker packs; users can argue differing opinions in the comment section, and endorsed opinions gain comment exposure.
Disadvantage 1: the video and the comments cannot be viewed simultaneously, and attention leaves the video while browsing the comment section, so this feature is generally only usable for publishing opinions after viewing the film.
Disadvantage 2: the comment section is open to all users; the number of comments is too large, with many repeated and meaningless comments, which reduces users' enthusiasm for reading them.
Scheme 2: video bullet comments (danmaku), displayed on screen and scrolling along with video playback. This solves the problem that ordinary video comments cannot support interaction with other users during viewing, and also the problem of repeated comments dampening enthusiasm; indeed the defect turns into an advantage, as a burst of cheering comments in an exciting segment renders the atmosphere so well that viewers cannot resist posting one themselves.
Disadvantage: this suits scenarios where all users watch at the same time; in small scenes of family-and-friends interaction, text bullet comments feel flat.
Scheme 3: watching together, combining synchronized video playback with instant chat, which delivers an excellent viewing-and-interaction experience in small scenes of family-and-friends interaction.
Disadvantage: in reality it is difficult to gather that many friends who are all free to watch the video together at the same time, so there are very few opportunities to use this feature.
Disclosure of Invention
To solve at least one of the above technical problems, the invention provides an online video voice interaction method, system and storage medium. A user's voice is recorded during viewing and uploaded to a server, and friends' voices are obtained from the server for synchronized playback and interaction during viewing. This solves the problem that friends rarely have time to watch a target film together in reality; moreover, the voice comment content produced by users watching films with friends is stored, which improves product competitiveness and user stickiness.
The first aspect of the invention provides an online video voice interaction method, comprising the following steps:
selecting and clicking a target film through a user client, and entering the viewing page;
acquiring friend voice clip information left in the target film;
according to updates of the playing progress, buffering to the local the voice clip information that will be played after a time threshold T from the current time node;
when the playing progress reaches the recording time point of a friend voice clip, starting to play that friend voice clip;
selecting a friend voice clip for interaction, taking the end time of that clip as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and receiving, at the friend client, the interactive voice pushed by the voice server, and instantly joining to watch the target film together.
In this scheme, joining to watch the target film together specifically includes:
when a user clicks a target film through the user client and enters the viewing page, a new IM room is joined by default, and real-time voice is enabled;
as the earliest joiner among the IM room's online members, the user automatically becomes the room owner and handles playback synchronization requests from other members;
a friend selects, through the friend client, a user currently watching the target film, joins the IM room that user is in, sends a synchronized-playback request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
In this scheme, enabling the user and friends to exchange views while watching the target film in the current IM room specifically includes:
the user and several friends each record and send a first voice clip to the IM server at the same time within a preset period;
the IM server acquires the parameter information of each first voice clip, the parameter information including send time and identity attribute;
the parameter information of the first voice clips is processed according to a first algorithm to obtain the publishing order of the first voice clips in the IM room;
and the first voice clips are published in the IM room sequentially according to that publishing order.
In this embodiment, the method further includes:
the user logs in to the voice server through the user client and, by film id, queries the list of voice clips the user has published and the list of friend replies;
clicking a voice clip in the voice clip list jumps to the corresponding film position for playback, where the user can re-record or delete the clip;
and clicking a friend voice clip in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to it.
In this scheme, buffering to the local the voice clip information that will be played after a time threshold T from the current time node specifically includes:
acquiring the current network state of the user client and the size of the next voice clip to be played;
based on the current network state and the size of the next voice clip to be played, predicting the time threshold T for that clip through a deep learning model;
and when the interval between the time node of the next voice clip to be played and the current time node reaches the time threshold T, beginning to buffer that clip to the local.
In this scheme, after the time threshold T for the next voice clip to be played is predicted by the deep learning model, the method further includes:
acquiring a plurality of historical voice clip playback records, each comprising at least the historical network state, the historical voice clip size, and the actual buffering lead time of the historical voice clip;
performing feature calculation on the current network state to obtain the current network features;
performing feature calculation on the historical network state of each playback record to obtain its historical network features;
comparing the similarity between the current network features and each set of historical network features one by one, screening out the playback records whose similarity exceeds a first preset threshold, and adding them to a history database;
for each playback record in the history database, feeding its historical network state and historical voice clip size into the deep learning model to obtain a historical predicted time threshold;
subtracting the historical predicted time threshold from the actual buffering lead time of each record to obtain a buffering time difference;
summing the buffering time differences of all records in the history database and dividing the sum by the number of records to obtain a buffering time correction value;
and adding the buffering time correction value to the predicted time threshold T of the next voice clip to obtain the corrected time threshold T.
The second aspect of the invention further provides an online video voice interaction system, comprising a memory and a processor, the memory containing an online video voice interaction method program which, when executed by the processor, implements the following steps:
selecting and clicking a target film through a user client, and entering the viewing page;
acquiring friend voice clip information left in the target film;
according to updates of the playing progress, buffering to the local the voice clip information that will be played after a time threshold T from the current time node;
when the playing progress reaches the recording time point of a friend voice clip, starting to play that friend voice clip;
selecting a friend voice clip for interaction, taking the end time of that clip as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and receiving, at the friend client, the interactive voice pushed by the voice server, and instantly joining to watch the target film together.
In this scheme, joining to watch the target film together specifically includes:
when a user clicks a target film through the user client and enters the viewing page, a new IM room is joined by default, and real-time voice is enabled;
as the earliest joiner among the IM room's online members, the user automatically becomes the room owner and handles playback synchronization requests from other members;
a friend selects, through the friend client, a user currently watching the target film, joins the IM room that user is in, sends a synchronized-playback request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
In this scheme, when the online video voice interaction method program is executed by the processor, the method further includes:
the user logs in to the voice server through the user client and, by film id, queries the list of voice clips the user has published and the list of friend replies;
clicking a voice clip in the voice clip list jumps to the corresponding film position for playback, where the user can re-record or delete the clip;
and clicking a friend voice clip in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to it.
The third aspect of the invention further provides a computer-readable storage medium containing an online video voice interaction method program which, when executed by a processor, implements the steps of the online video voice interaction method described above.
By combining voice interaction with watching the target film together, the invention lets users listen to the voice comments friends have left in a film anytime and anywhere, post comments and replies for interaction, and watch the film synchronously with friends, solving the problem that watch-together features are impractical because users' leisure time rarely coincides. In addition, the invention stores the voice comment content produced while watching with friends, improving product competitiveness and user stickiness.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart illustrating an online video-voice interaction method of the present invention;
FIG. 2 illustrates a target film voice interaction flow diagram of a particular embodiment of the present invention;
FIG. 3 is a block diagram of an online video-voice interaction system according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a flow chart of an online video voice interaction method according to the present invention.
As shown in fig. 1, a first aspect of the present invention provides an online video-voice interaction method, where the method includes:
S102, selecting and clicking a target film through a user client, and entering the viewing page;
S104, acquiring friend voice clip information left in the target film;
S106, according to updates of the playing progress, buffering to the local the voice clip information that will be played after a time threshold T from the current time node;
S108, when the playing progress reaches the recording time point of a friend voice clip, starting to play that friend voice clip;
S110, selecting a friend voice clip for interaction, taking the end time of that clip as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and S112, receiving, at the friend client, the interactive voice pushed by the voice server, and instantly joining to watch the target film together.
The voice comment and interaction process is as follows. First, the user clicks a target film and enters the viewing page, and the voice clip information friends have left in the film is acquired. As the playing progress updates, the voice data that will be played within a future time threshold (for example, 30 seconds) is buffered to the local in advance on multiple threads. When the playing progress reaches a clip's recording time point, that clip begins to play. The user can choose which friends' voices to mute, realizing voice filtering. Clicking a friend's voice clip starts an interaction: the clip's end time is taken as the start time of the interactive voice, which is recorded and uploaded to the server; the server then pushes the voice reply to the friend by message, so the friend can instantly join in and watch the target film together.
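The buffer-ahead-then-play flow above can be sketched as a small clip scheduler. This is a minimal illustration, not the patent's implementation: the `VoiceClip` fields, class names, and the 30-second default look-ahead are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class VoiceClip:
    record_time: float                      # film timestamp (s) where the clip was left
    clip_id: str = field(compare=False)     # illustrative identifiers
    friend_id: str = field(compare=False)
    buffered: bool = field(default=False, compare=False)

class ClipScheduler:
    """Track which friend voice clips to pre-buffer and which are due to play."""
    def __init__(self, clips, buffer_ahead_t=30.0):
        self.clips = sorted(clips)          # ordered by record_time
        self.t = buffer_ahead_t             # look-ahead threshold T (seconds)

    def to_buffer(self, progress):
        """Unbuffered clips whose record point lies within T seconds of `progress`."""
        return [c for c in self.clips
                if not c.buffered and progress <= c.record_time <= progress + self.t]

    def due(self, progress):
        """Buffered clips whose record point playback has reached."""
        return [c for c in self.clips if c.buffered and c.record_time <= progress]
```

In use, the player would call `to_buffer` on each progress update to start downloads, mark the downloaded clips as buffered, and call `due` to find clips to start playing.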
By combining voice interaction with watching a target film, the invention realizes more practical small-scene viewing interaction: the user's voice is recorded during viewing and uploaded to the voice server, and friends' voices are obtained from the voice server for synchronized playback and interaction during viewing. If a friend is currently watching, the user can also join the IM room that friend is in and, while watching the target film, listen to the voice comments other friends have published in the past.
On this basis, the invention solves the problem that friends rarely have time to watch a target film together in reality; moreover, the content users produce (voice comments) gains commemorative value over time, which improves user stickiness.
According to an embodiment of the invention, joining to watch the target film together includes:
when a user clicks a target film through the user client and enters the viewing page, a new IM room is joined by default, and real-time voice is enabled;
as the earliest joiner among the IM room's online members, the user automatically becomes the room owner and handles playback synchronization requests from other members;
a friend selects, through the friend client, a user currently watching the target film, joins the IM room that user is in, sends a synchronized-playback request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
It can be understood that if a friend is currently watching, the user can also join the IM room that friend is in and, while watching the target film, listen to the voice comments other friends have published in the past.
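The room-joining behavior described above (the earliest joiner becomes the room owner and answers synchronization requests) can be modeled minimally as follows. All names are hypothetical, and a real IM server would be a networked service rather than an in-process object:

```python
class IMRoom:
    """Minimal IM room model: the earliest member is the owner and answers sync requests."""
    def __init__(self):
        self.members = []        # join order is preserved
        self.progress = 0.0      # owner's playback progress (seconds)

    def join(self, user_id):
        self.members.append(user_id)
        return self.owner()      # newcomer learns who the owner is

    def owner(self):
        return self.members[0] if self.members else None

    def request_sync(self):
        """A newly joined friend asks the room for the current playback progress."""
        return self.progress

    def leave(self, user_id):
        # ownership implicitly passes to the next-earliest remaining member
        self.members.remove(user_id)
```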
According to an embodiment of the invention, the method further comprises:
the user logs in to the voice server through the user client and, by film id, queries the list of voice clips the user has published and the list of friend replies;
clicking a voice clip in the voice clip list jumps to the corresponding film position for playback, where the user can re-record or delete the clip;
and clicking a friend voice clip in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to it.
The method thus also manages a film's voice clips to meet different users' viewing needs at different times, and friends' voice clips can serve to stimulate viewing.
According to a specific embodiment of the invention, the method further comprises:
the user selects a muted-friend list through a personalized settings interface of the user client;
and when the user plays the film through the user client, voice clips from friends on the muted list are filtered out.
It can be understood that the voice clips of muted friends will not be played synchronously with the video's progress. The user can choose which friends' voices to mute, realizing voice filtering, meeting the need for personalized settings, and improving the user experience.
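The muted-friend filtering step amounts to a simple set-membership filter. This sketch assumes clips carry a `friend_id` field; the field name is illustrative:

```python
def filter_muted(clips, muted_friends):
    """Drop voice clips from friends the user has muted in personal settings."""
    muted = set(muted_friends)          # set lookup keeps filtering O(1) per clip
    return [c for c in clips if c["friend_id"] not in muted]
```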
According to a specific embodiment of the invention, the method further comprises:
choosing, through the personalized settings interface of the user client, whether to enable automatic recording and uploading of voice clips. If enabled, voice clips are automatically recorded and uploaded to the voice server during viewing; otherwise the client switches to manual mode, in which the user clicks a record button during viewing to produce a voice clip comment.
Fig. 2 shows a flow diagram of a target movie voice interaction in accordance with an embodiment of the present invention.
As shown in fig. 2, the user first requests from the voice server the friend voice clip information for the current film, and the voice server returns it. The user then queries the voice server for friends currently watching the target film, and the voice server returns that information. Next, the user requests the IM server to join a new room, the IM server returns the join result, and the user synchronizes the room information to the voice server.
Then the user's client loads voice data for playback according to the target film's playing progress, while also playing the real-time voice of the current IM room. If the user wants to make a voice comment, a voice clip is recorded and sent to the user's IM room so that the user's voice comment is shared with the other friends in the room; the clip is simultaneously stored on the voice server.
The user can apply to the IM server to switch to the room of a friend who is currently watching. If joining succeeds, the IM server synchronizes the user's film playing progress to that of the friend's room, and the user updates the IM room information to the voice server.
It can be appreciated that the IM server manages IM rooms so that users can share voice comments with friends in the same IM room.
According to an embodiment of the invention, enabling the user and friends to exchange views while watching the target film in the current IM room includes:
the user and several friends each record and send a first voice clip to the IM server at the same time within a preset period;
the IM server acquires the parameter information of each first voice clip, the parameter information including send time and identity attribute;
the parameter information of the first voice clips is processed according to a first algorithm to obtain the publishing order of the first voice clips in the IM room;
and the first voice clips are published in the IM room sequentially according to that publishing order.
It should be noted that when many users and friends are present, several of them may publish voice comments at the same moment. To arrange the publishing order of those comments more reasonably, the invention performs pairwise comparison according to the first algorithm, obtaining a suitable publishing order in which the comments are then published in the IM room, thereby reasonably handling conflicts where multiple friends or users publish voice comments simultaneously.
According to a specific embodiment of the invention, processing the parameter information of the first voice clips according to the first algorithm to obtain their publishing order in the IM room specifically includes:
comparing each first voice clip's send time with every other first voice clip's send time one by one; if the former precedes the latter, the former clip's send-time score is incremented by 1, otherwise nothing is done;
comparing each first voice clip's identity attribute with every other first voice clip's identity attribute one by one; if the former's identity level is higher than the latter's, the former clip's identity-attribute score is incremented by 1, otherwise nothing is done;
after all pairwise comparisons, counting each first voice clip's cumulative send-time score and cumulative identity-attribute score;
presetting different weight factors for the influence of send time and identity attribute on the publishing order;
for each first voice clip, multiplying the cumulative send-time score by its weight factor to obtain a first value, multiplying the cumulative identity-attribute score by its weight factor to obtain a second value, and adding the two to obtain the clip's publishing-order value;
and ranking the first voice clips by publishing-order value to obtain their publishing order in the IM room.
It can be understood that the publishing order of the first voice clips is evaluated comprehensively by combining send time and identity attribute. For example, some clients have a higher identity level, such as the room owner, whose clip should be published preferentially; but the ordering cannot depend entirely on identity level and must also take each client's send time into account, so that a more reasonable publishing order is obtained.
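Under the reading that "earlier send time" and "higher identity level" each earn one point per pairwise win, the first algorithm can be sketched as below. The dictionary keys and default weights are illustrative assumptions, and ties are broken here by send time, which the patent does not specify:

```python
def publish_order(clips, w_time=0.5, w_identity=0.5):
    """
    Rank simultaneously submitted voice clips for publication.
    Each clip is a dict with 'send_time' (earlier wins a point against each
    later clip) and 'identity_level' (higher, e.g. room owner, wins a point
    against each lower clip). Scores are combined by preset weight factors.
    """
    scores = []
    for c in clips:
        time_pts = sum(1 for o in clips
                       if o is not c and c["send_time"] < o["send_time"])
        id_pts = sum(1 for o in clips
                     if o is not c and c["identity_level"] > o["identity_level"])
        scores.append((w_time * time_pts + w_identity * id_pts, c))
    # higher combined score publishes first; ties broken by earlier send time
    scores.sort(key=lambda s: (-s[0], s[1]["send_time"]))
    return [c for _, c in scores]
```

With equal weights, a room owner who sent slightly later can tie or outrank an earlier ordinary member, matching the "comprehensive evaluation" described above.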
According to an embodiment of the invention, buffering to the local the voice clip information that will be played after a time threshold T from the current time node includes:
acquiring the current network state of the user client and the size of the next voice clip to be played;
based on the current network state and the size of the next voice clip to be played, predicting the time threshold T for that clip through a deep learning model;
and when the interval between the time node of the next voice clip to be played and the current time node reaches the time threshold T, beginning to buffer that clip to the local.
It can be understood that, based on the current network state and the size of the next voice clip, the invention predicts the time threshold T through the deep learning model. This makes it easy to finish caching the clip locally before it must play, avoiding the delays or stutters in voice playback, caused by late caching, that would harm the overall viewing experience.
The deep learning model of the invention is preferably a CNN (Convolutional Neural Network). Its weight-sharing network structure makes it more similar to a biological neural network, reducing the complexity of the network model and the number of weights. The CNN structure comprises convolutional layers, downsampling (pooling) layers, and fully connected layers. Each layer has a plurality of feature maps; each feature map extracts one feature of the input through a convolution filter, and each feature map contains a plurality of neurons.
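The patent's predictor is a trained CNN; as a stand-in to convey the inputs and output of that prediction step, a crude linear estimate suffices. The bandwidth/size units and the safety margin are assumptions, not part of the patent:

```python
def predict_buffer_threshold(bandwidth_bps, clip_size_bytes, safety_margin_s=2.0):
    """
    Placeholder for the deep-learning predictor: estimate how many seconds
    before its play point a clip must start buffering (the threshold T).
    A CNN trained on (network state, clip size) pairs would replace this.
    """
    # naive estimate: expected download time at current bandwidth plus a margin
    download_time = clip_size_bytes * 8 / max(bandwidth_bps, 1)
    return download_time + safety_margin_s
```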
According to the embodiment of the present invention, after the time threshold T of the next speech segment information to be played is obtained through deep learning model prediction, the method further includes:
acquiring a plurality of historical voice segment playing data, wherein each historical voice segment playing data at least comprises a historical network state, a historical voice segment size and actual buffering advance time of a historical voice segment;
performing feature calculation based on the current network state to obtain current network features;
performing characteristic calculation based on the historical network state of each historical voice segment playing data to obtain the historical network characteristics of each historical voice segment playing data;
comparing the similarity of the current network characteristics with a plurality of historical network characteristics one by one, screening out historical voice segment playing data with the similarity larger than a first preset threshold value, and adding the historical voice segment playing data into a historical database;
calculating the historical network state and the size of each historical voice segment of the playing data of each historical voice segment in the historical database through a deep learning model to obtain a historical prediction time threshold of the playing data of each historical voice segment;
subtracting the corresponding historical predicted time threshold from the actual buffering advance time of each historical voice segment playing data in the historical database to obtain a buffering time difference value;
adding the buffering time difference values of all the historical voice segment playing data in the historical database to obtain a difference sum, and dividing the difference sum by the total number of historical voice segment playing data in the historical database to obtain a buffering time correction value;
and adding the buffering time correction value to the predicted time threshold T of the next voice segment information to be played to obtain the corrected time threshold T.
It can be understood that the invention selects a plurality of historical voice segment playing data that were played successfully (without stalling or delay) and uses them as a reference to correct the time threshold T predicted by the model for advance buffering, thereby further improving the accuracy of the time threshold T, ensuring the fluency of voice segment playback, and improving the viewing experience.
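The correction steps above can be sketched concretely in Python. Cosine similarity is assumed here for the network-feature comparison, and the record field names (`feat`, `actual_lead`, `pred`) are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def cosine_sim(a, b):
    """Similarity between two network-feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def corrected_threshold(t_pred, current_feat, history, predict, sim_threshold=0.9):
    """Correct the predicted buffering threshold using similar historical plays.

    history: records holding a network-feature vector and the actual buffering
    lead time of a play that succeeded without stalling; predict(record)
    re-runs the model on that record's inputs."""
    # Keep only records whose network features resemble the current state.
    selected = [h for h in history
                if cosine_sim(current_feat, h["feat"]) > sim_threshold]
    if not selected:
        return t_pred
    # Mean of (actual lead time - model prediction) is the correction value.
    diffs = [h["actual_lead"] - predict(h) for h in selected]
    return t_pred + sum(diffs) / len(selected)

history = [
    {"feat": [1.0, 0.0], "actual_lead": 5.0, "pred": 4.0},   # similar network
    {"feat": [0.0, 1.0], "actual_lead": 10.0, "pred": 2.0},  # dissimilar, filtered out
]
t_corrected = corrected_threshold(3.0, [1.0, 0.05], history,
                                  predict=lambda h: h["pred"])
```

Only the first record survives the similarity filter, so the correction is 5.0 - 4.0 = 1.0 and the corrected threshold becomes 4.0.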
FIG. 3 is a block diagram of an online video-voice interaction system according to the present invention.
As shown in fig. 3, the second aspect of the present invention further provides an online video voice interaction system 3, which includes a memory 31 and a processor 32, where the memory includes an online video voice interaction method program which, when executed by the processor, implements the following steps:
clicking to select a target film through a user client and entering a viewing page;
acquiring friend voice segment information left in the target film;
according to updates of the playing progress, buffering to the local the voice segment information to be played a time threshold T after the current time node;
when the playing progress reaches the recording time point of the friend voice segment information, starting to play the friend voice segment information;
selecting friend voice segment information for interaction, taking the end time of the friend voice segment information as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and the friend client receives the interactive voice pushed by the voice server and instantly joins to watch the target film together.
According to an embodiment of the present invention, joining to watch the target film together includes:
when a user clicks a target film through the user client and enters the viewing page, the user joins a newly created IM room by default, and real-time voice is enabled;
as the earliest member to join among the online members of the IM room, the user automatically becomes the room owner and processes the play synchronization requests of the other members;
a friend selects, through a friend client, a user currently watching the target film, joins the IM room in which that user is located, sends a synchronous playing request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
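The room logic above could be sketched as follows; the `IMRoom` class, its method names, and the in-memory member list are illustrative assumptions rather than the patented implementation:

```python
class IMRoom:
    """Minimal sketch of an IM room: the earliest member to join acts as
    room owner and answers play-synchronization requests from the others."""

    def __init__(self, film_id):
        self.film_id = film_id
        self.members = []      # join order is preserved
        self.progress = 0.0    # current playback position in seconds

    def join(self, user_id):
        """Add a member and report who currently owns the room."""
        self.members.append(user_id)
        return self.owner()

    def owner(self):
        """The earliest joined online member is the room owner."""
        return self.members[0] if self.members else None

    def request_sync(self, requester):
        """Owner processes synchronization requests from other members."""
        if requester != self.owner():
            return {"film_id": self.film_id, "progress": self.progress}
        return None  # the owner is already the reference progress
```

For example, a friend joining after the user would receive the room owner's current film id and playing progress from `request_sync`, then seek the local player to that position.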
According to an embodiment of the invention, the online video voice interaction method program, when executed by the processor, further implements the following steps:
a user logs in to the voice server through the user client and queries, by film id, the list of voice segments published by the user and the list of friend replies;
clicking a corresponding voice segment in the voice segment list jumps to the corresponding film position for playback, where the user can re-record or delete the voice segment;
and clicking a friend voice segment in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to the friend voice segment.
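The two query lists above can be illustrated with a small in-memory stand-in for the voice server; the class, its methods, and the clip record fields are hypothetical names introduced only for illustration:

```python
class VoiceServer:
    """Illustrative in-memory sketch of the voice server's query API."""

    def __init__(self):
        # Each clip: {"id", "user", "film_id", "time", "reply_to"}
        self.clips = []

    def publish(self, clip):
        self.clips.append(clip)

    def clips_by_user(self, user, film_id):
        """Voice-segment list the user published in the given film."""
        return [c for c in self.clips
                if c["user"] == user and c["film_id"] == film_id
                and c["reply_to"] is None]

    def replies_to_user(self, user, film_id):
        """Friend-reply list: clips replying to this user's clips."""
        own_ids = {c["id"] for c in self.clips_by_user(user, film_id)}
        return [c for c in self.clips
                if c["film_id"] == film_id and c["reply_to"] in own_ids]

    def delete(self, clip_id):
        """Remove a clip, e.g. before the user re-records it."""
        self.clips = [c for c in self.clips if c["id"] != clip_id]
```

A client would render the two lists and, on click, seek the player to the clip's `time` within the film before offering the re-record, delete, or reply action.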
The third aspect of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes an online video voice interaction method program which, when executed by a processor, implements the steps of the online video voice interaction method.
The method combines voice interaction with watching the target film together, allowing a user to listen to voice comments published by friends in a film anytime and anywhere, to publish comments and replies for interaction, and to watch the film synchronously with friends, thereby solving the problem that the watch-together function is impractical when users' leisure time does not coincide. In addition, the invention stores the voice comment content produced while watching with friends, improving product competitiveness and user stickiness.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present invention, and shall cover the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. An online video voice interaction method, the method comprising:
clicking to select a target film through a user client and entering a viewing page;
acquiring friend voice segment information left in the target film;
according to updates of the playing progress, buffering to the local the voice segment information to be played a time threshold T after the current time node;
when the playing progress reaches the recording time point of the friend voice segment information, starting to play the friend voice segment information;
selecting friend voice segment information for interaction, taking the end time of the friend voice segment information as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and the friend client receives the interactive voice pushed by the voice server and instantly joins to watch the target film together.
2. The method as claimed in claim 1, wherein joining to watch the target film together comprises:
when a user clicks a target film through the user client and enters the viewing page, the user joins a newly created IM room by default, and real-time voice is enabled;
as the earliest member to join among the online members of the IM room, the user automatically becomes the room owner and processes the play synchronization requests of the other members;
a friend selects, through a friend client, a user currently watching the target film, joins the IM room in which that user is located, sends a synchronous playing request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
3. The method of claim 2, wherein enabling the user and the friend to exchange views on the target film in the current IM room comprises:
the method comprises the steps that a user and a plurality of friends record and send a first voice fragment to an IM server at the same time within a preset period;
respectively acquiring parameter information of a plurality of first voice fragments by an IM server, wherein the parameter information comprises sending time and identity attributes;
calculating the parameter information of the first voice fragments according to a first algorithm to obtain the issuing sequence of the first voice fragments in the IM room;
and sequentially publishing the plurality of first voice fragments in the IM room according to the publishing sequence.
4. The method of claim 1, further comprising:
a user logs in to the voice server through the user client and queries, by film id, the list of voice segments published by the user and the list of friend replies;
clicking a corresponding voice segment in the voice segment list jumps to the corresponding film position for playback, where the user can re-record or delete the voice segment;
and clicking a friend voice segment in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to the friend voice segment.
5. The method of claim 1, wherein buffering to the local the voice segment information to be played after a time threshold T has elapsed from the current time node specifically comprises:
acquiring the current network state of a user client and the size of the next voice segment to be played;
based on the current network state and the size of the next voice segment to be played, predicting a time threshold T of the next voice segment to be played through a deep learning model;
and when the time interval between the time node of the next voice clip information to be played and the current time node reaches a time threshold T, beginning to buffer the next voice clip information to be played to the local.
6. The method of claim 5, wherein after the time threshold T of the next voice segment information to be played is predicted by the deep learning model, the method further comprises:
acquiring a plurality of historical voice segment playing data, wherein each historical voice segment playing data comprises a historical network state, a historical voice segment size and actual buffering advance time of a historical voice segment;
performing feature calculation based on the current network state to obtain current network features;
performing feature calculation based on the historical network state of each historical voice segment playing data to obtain the historical network features of each historical voice segment playing data;
comparing the similarity of the current network characteristics with a plurality of historical network characteristics one by one, screening out historical voice segment playing data with the similarity larger than a first preset threshold value, and adding the historical voice segment playing data into a historical database;
calculating, through the deep learning model, the historical network state and the historical voice segment size of each historical voice segment playing data in the historical database to obtain a historical prediction time threshold for each historical voice segment playing data;
subtracting the corresponding historical prediction time threshold from the actual buffering advance time of each historical voice segment playing data in the historical database to obtain a buffering time difference value;
adding the buffering time difference values of all the historical voice segment playing data in the historical database to obtain a difference sum, and dividing the difference sum by the total number of historical voice segment playing data in the historical database to obtain a buffering time correction value;
and adding the buffering time correction value to the predicted time threshold T of the next voice segment information to be played to obtain the corrected time threshold T.
7. An online video voice interaction system, comprising a memory and a processor, wherein the memory includes an online video voice interaction method program which, when executed by the processor, implements the following steps:
clicking to select a target film through a user client and entering a viewing page;
acquiring friend voice segment information left in the target film;
according to updates of the playing progress, buffering to the local the voice segment information to be played a time threshold T after the current time node;
when the playing progress reaches the recording time point of the friend voice segment information, starting to play the friend voice segment information;
selecting friend voice segment information for interaction, taking the end time of the friend voice segment information as the start time of the interactive voice, and uploading the interactive voice to a voice server after recording;
and the friend client receives the interactive voice pushed by the voice server and instantly joins to watch the target film together.
8. The system of claim 7, wherein joining to watch the target film together comprises:
when a user clicks a target film through the user client and enters the viewing page, the user joins a newly created IM room by default, and real-time voice is enabled;
as the earliest member to join among the online members of the IM room, the user automatically becomes the room owner and processes the play synchronization requests of the other members;
a friend selects, through a friend client, a user currently watching the target film, joins the IM room in which that user is located, sends a synchronous playing request message, and synchronizes to the current playing progress;
so that the user and the friend can exchange views while watching the target film in the current IM room.
9. The system of claim 7, wherein the online video voice interaction method program, when executed by the processor, further implements the following steps:
a user logs in to the voice server through the user client and queries, by film id, the list of voice segments published by the user and the list of friend replies;
clicking a corresponding voice segment in the voice segment list jumps to the corresponding film position for playback, where the user can re-record or delete the voice segment;
and clicking a friend voice segment in the friend reply list jumps to the corresponding film position for playback, where the user can record a voice reply to the friend voice segment.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises an online video voice interaction method program which, when executed by a processor, implements the steps of the online video voice interaction method according to any one of claims 1 to 6.
CN202211549550.XA 2022-12-05 2022-12-05 Online video voice interaction method, system and storage medium Active CN115665496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211549550.XA CN115665496B (en) 2022-12-05 2022-12-05 Online video voice interaction method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211549550.XA CN115665496B (en) 2022-12-05 2022-12-05 Online video voice interaction method, system and storage medium

Publications (2)

Publication Number Publication Date
CN115665496A CN115665496A (en) 2023-01-31
CN115665496B (en) 2023-03-10

Family

ID=85019582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211549550.XA Active CN115665496B (en) 2022-12-05 2022-12-05 Online video voice interaction method, system and storage medium

Country Status (1)

Country Link
CN (1) CN115665496B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8893022B2 (en) * 2010-04-01 2014-11-18 Microsoft Corporation Interactive and shared viewing experience
CN107333146A (en) * 2017-07-12 2017-11-07 易视腾科技股份有限公司 video group viewing control method and system
CN107613400B (en) * 2017-09-21 2021-03-26 北京奇艺世纪科技有限公司 Method and device for realizing voice barrage
US11259075B2 (en) * 2017-12-22 2022-02-22 Hillel Felman Systems and methods for annotating video media with shared, time-synchronized, personal comments
CN108667798A (en) * 2018-03-27 2018-10-16 上海临奇智能科技有限公司 A kind of method and system of virtual viewing
CN109005422B (en) * 2018-08-01 2022-07-12 腾讯科技(深圳)有限公司 Video comment processing method and device
CN109996124B (en) * 2019-03-19 2021-08-20 北京奇艺世纪科技有限公司 Bullet screen processing method, system and storage medium
CN111212311A (en) * 2020-01-13 2020-05-29 四川长虹电器股份有限公司 Voice interaction video generation and playing method

Also Published As

Publication number Publication date
CN115665496A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
US11743514B2 (en) Apparatus, systems and methods for a content commentary community
US20160103572A1 (en) Collaborative media sharing
US9712579B2 (en) Systems and methods for creating and publishing customizable images from within online events
US11451885B1 (en) Methods and systems for providing dynamic summaries of missed content from a group watching experience
US11570415B2 (en) Methods, systems, and media for generating a summarized video using frame rate modification
CN112672179B (en) Method, device and equipment for live game
CN108600850A (en) Video sharing method, client, server and storage medium
US20210046385A1 (en) Methods, systems, and media for coordinating multiplayer game sessions
CN113411652A (en) Media resource playing method and device, storage medium and electronic equipment
US20180242027A1 (en) System and method for perspective switching during video access
CN112449707A (en) Computer-implemented method for creating content including composite images
CN113556611B (en) Video watching method and device
US11093120B1 (en) Systems and methods for generating and broadcasting digital trails of recorded media
CN115665496B (en) Online video voice interaction method, system and storage medium
JP5043711B2 (en) Video evaluation apparatus and method
US20220261927A1 (en) Speed Dating Platform with Dating Cycles and Artificial Intelligence
KR102492022B1 (en) Method, Apparatus and System of managing contents in Multi-channel Network
JP2003163911A (en) Image reproducing method based on favorable public image information, image reproduction control system, server apparatus, client apparatus, image reproduction control program and its recording medium
EP2629512A1 (en) Method and arrangement for generating and updating A composed video conversation
CN117319340A (en) Voice message playing method, device, terminal and storage medium
US20220417619A1 (en) Processing and playing control over interactive video
CN114466208B (en) Live broadcast record processing method and device, storage medium and computer equipment
CN116939285A (en) Video dubbing method and related products
US11381628B1 (en) Browser-based video production
US20220284648A1 (en) Context real avatar audience creation during live video sharing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant