CN109688475B - Video playing skipping method and system and computer readable storage medium - Google Patents


Info

Publication number
CN109688475B
CN109688475B (application CN201811654558.6A)
Authority
CN
China
Prior art keywords
voice information
video
audio data
scene
video playing
Prior art date
Legal status
Active
Application number
CN201811654558.6A
Other languages
Chinese (zh)
Other versions
CN109688475A (en)
Inventor
李其浪
Current Assignee
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN201811654558.6A priority Critical patent/CN109688475B/en
Publication of CN109688475A publication Critical patent/CN109688475A/en
Priority to PCT/CN2019/126022 priority patent/WO2020135161A1/en
Application granted granted Critical
Publication of CN109688475B publication Critical patent/CN109688475B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Abstract

The invention discloses a video playing skip method, a video playing skip system, and a computer readable storage medium. The method comprises: receiving user voice information collected by a video playing terminal; recognizing the user voice information and extracting voice information features; matching the voice information features with different scene tags in preset audio data to obtain a scene tag matching the voice information features; and sending the matched scene tag to the video playing terminal to control the video played on the terminal to jump to the corresponding position. Through the server's speech recognition and semantic recognition, the user can trigger a video jump with a voice command, which improves the user experience.

Description

Video playing skipping method and system and computer readable storage medium
Technical Field
The present invention relates to the field of video playing technologies, and in particular, to a video playing skip method, a video playing skip system, and a computer readable storage medium.
Background
With the development of internet technology, people no longer rely on broadcast television signals to watch video; through the internet they can watch any video available on the network, including live streams. Users can therefore choose video content according to their preferences, freely adjust the playback progress while watching, and jump directly to the scene they want to watch.
When adjusting the playback progress, a user relies on a button on the television remote control or a virtual button in the video playing software: a single press jumps the playback forward or backward by a fixed interval; holding the button down keeps jumping the playback forward or backward; alternatively, the user enters a target time, and the television or playing software loads that position and then resumes playback. In every case the user must operate buttons manually to move the video to the desired scene, and the jump rarely succeeds in one attempt, so the user experience is poor.
Disclosure of Invention
The main object of the present invention is to provide a video playing skip method, a video playing skip system, and a computer readable storage medium, aiming to solve the technical problem that a user must manually operate buttons multiple times to jump a video to the desired scene, resulting in a poor user experience.
In order to achieve the above object, the present invention provides a video playing and skipping method, which comprises the following steps:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
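The four server-side steps above can be sketched as a single handler. The following Python sketch reduces the speech/semantic recognition stage to a pre-extracted feature set; all names and the dict-based tag shape are illustrative assumptions, not from the patent:

```python
# Minimal sketch of the claimed flow: take extracted voice features, match
# them against scene tags in preset audio data, return the best tag (which
# the server would then send to the playback terminal).
def handle_voice_request(voice_features, scene_tags):
    """Return the scene tag whose description overlaps most with the features."""
    best, best_score = None, 0
    for tag in scene_tags:
        # crude matching degree: how many features appear in the description
        score = sum(1 for f in voice_features if f in tag["description"])
        if score > best_score:
            best, best_score = tag, score
    return best  # None if nothing matched

tags = [{"description": "Director Zhao is arrested", "position": 1800},
        {"description": "opening credits", "position": 0}]
match = handle_voice_request({"Zhao", "arrested"}, tags)
print(match["position"])  # → 1800
```

A real implementation would sit behind an ASR and semantic-parsing front end; only the tag-matching step is shown here.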
Preferably, before the step of matching the voice information features with different scene tags in preset audio data and obtaining a scene tag matched with the voice information features, the method includes:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
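The name-fallback logic of this claim can be stated in a few lines. A hedged sketch, assuming a hypothetical feature dictionary produced by the recognition stage:

```python
# If the recognized speech names a target video, use it; otherwise fall back
# to the video the terminal is currently playing. The "video_name" key is an
# assumption, not a field defined by the patent.
def resolve_target_video(voice_features, currently_playing):
    return voice_features.get("video_name") or currently_playing

print(resolve_target_video({"scene": "arrest"}, "Drama A"))    # → Drama A
print(resolve_target_video({"video_name": "Drama B"}, "Drama A"))  # → Drama B
```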
Preferably, after the step of determining whether the voice information feature includes a skip video name, the method includes:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
Preferably, the step of matching the voice information features with different scene tags in preset audio data and obtaining the scene tags matched with the voice information features includes:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
Preferably, after the step of matching the voice information feature with different scene tags in preset audio data and obtaining a scene tag matched with the voice information feature, the method further includes:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
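The timeout-and-prompt behavior above can be sketched as a wrapper around the matching step. The 3-second deadline and message field names below are assumptions; the patent only speaks of a "preset time" and a matching-failure prompt:

```python
import time

MATCH_DEADLINE_SECONDS = 3.0  # illustrative stand-in for the "preset time"

def match_or_fail(match_fn):
    start = time.monotonic()
    tag = match_fn()
    elapsed = time.monotonic() - start
    if tag is None or elapsed > MATCH_DEADLINE_SECONDS:
        # generate a matching-failure prompt for the terminal to display
        return {"type": "match_failed", "message": "No matching scene was found"}
    return {"type": "scene_tag", "tag": tag}

print(match_or_fail(lambda: None)["type"])  # → match_failed
```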
In addition, to achieve the above object, the present invention further provides a video playing skip method, applied to a video playing terminal, comprising the following steps:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in the audio data to obtain scene tags matched with the voice information features;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
Preferably, after the step of sending the user voice information and the name information of the currently playing video to the server, the method further includes:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
Preferably, after the step of sending the user voice information to a server, the method further includes:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
In addition, to achieve the above object, the present invention further provides a video playing skip system, where the video playing skip system includes: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user and name information of a currently played video to a server;
the server receives user voice information collected by the video playing terminal, identifies the voice information, extracts the characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires a scene label matched with the voice information characteristics, and sends the scene label matched with the voice information characteristics to the video playing terminal;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
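The terminal–server exchange described by this system claim can be sketched end to end. In the hedged Python illustration below, the message shapes, the in-memory audio database, and the crude split-based stand-ins for speech recognition and tag matching are all assumptions:

```python
# Terminal side: package the collected voice and the current video's name.
def terminal_send(voice, current_video):
    return {"voice": voice, "current_video": current_video}

# Server side: "recognize" the voice, match it against the scene tags for
# the named video, and reply with a jump position or an error.
def server_handle(msg, audio_db):
    features = msg["voice"].lower().split()            # stand-in for ASR + NLU
    for tag in audio_db.get(msg["current_video"], []):  # stand-in for matching
        if all(f in tag["description"].lower() for f in features):
            return {"jump_to": tag["position"]}
    return {"error": "no match"}

# Terminal side again: apply the server's reply by seeking the player.
def terminal_apply(reply, player):
    if "jump_to" in reply:
        player["position"] = reply["jump_to"]

db = {"Drama A": [{"description": "director zhao arrested", "position": 1800}]}
player = {"position": 0}
terminal_apply(server_handle(terminal_send("zhao arrested", "Drama A"), db), player)
print(player["position"])  # → 1800
```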
In addition, to achieve the above object, the present invention also provides a computer readable storage medium storing a video playing skip program which, when executed by a video playing terminal and a server, implements the video playing skip method described above.
The invention is applied to an interactive system consisting of a video playing terminal and a server. First, the server receives user voice information collected by the terminal through a voice collection module such as a microphone. The server then recognizes that voice information through its speech recognition and semantic recognition functions to obtain the features of the user voice information, mainly including the name of the video and the scene the user intends to jump to. Next, the server matches the voice information features with different scene tags in the audio data to obtain the scene tag matching the features. Finally, the server sends the matched scene tag to the video playing terminal so that the video jumps to the corresponding position. The user can thus trigger a video jump with a voice command and land precisely on the desired scene, improving the user experience.
Drawings
FIG. 1 is a schematic diagram of a system architecture to which embodiments of the present invention relate;
FIG. 2 is a flowchart illustrating a video playing skip method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video playing skip method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a video playing and skipping method according to a third embodiment of the present invention;
FIG. 5 is a flowchart illustrating a video playing and skipping method according to a fourth embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video playing and skipping system according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: receiving user voice information collected by a video playing terminal; recognizing the user voice information and extracting the characteristics of the voice information; matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features; and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
The problem the present invention addresses is that the prior art cannot jump video playback to the corresponding scene position based on the scene features contained in the user's voice.
The invention provides a solution, which enables a user to realize video skipping through a voice command, and can skip to a scene desired by the user accurately, thereby improving the user experience.
Fig. 1 is a schematic diagram of a system architecture of an embodiment of a video playing and skipping method according to the present application.
Referring to fig. 1, the system architecture 100 may include video playback terminals 101, 102, 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the video playback terminals 101, 102, 103 and the server 105, and may include various wired and wireless communication links, such as fiber optic cables, mobile networks, WiFi, Bluetooth, or hotspots.
A user may use the video playback terminals 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. The video playing terminals 101, 102, 103 may have various communication client applications installed thereon, such as a video playing application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The video playback terminals 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting video playback, including but not limited to smart phones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers, desktop computers, and the like. When they are software, they can be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server providing various services, for example, reading videos played on the video playback terminals 101, 102, and 103, or analyzing and processing received various voice information, instruction information, and video/audio data, and feeding back processing results, such as video clips, scene tags, instruction information, and the like, to the video playback terminals, so that the video playback terminals complete corresponding actions according to the processing results.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the video playing skip method provided in the embodiment of the present application may be executed by the video playing terminals 101, 102, and 103, or may be executed by the server 105. Accordingly, the means for pushing information may be provided in the video playback terminals 101, 102, 103, or in the server 105. And is not particularly limited herein.
It should be understood that the number of video playback terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of video playback terminals, networks, and servers, as desired for implementation.
Referring to fig. 2, a first embodiment of the present invention provides a video playing skip method, including the following steps:
and step S10, receiving the user voice information collected by the video playing terminal.
The invention can be applied to an interactive system consisting of the video playing terminal and the server, and the video playing terminal is connected with the server through the network to realize interaction. In this embodiment, the video playing terminal takes a television as an example, acquires voice information of a user in real time through a voice acquisition module of the television, and sends the acquired voice information to the server through a wireless network. And the server receives the user voice information sent by the television at the other end of the network in real time.
Step S20, recognizing the user voice information, and extracting the features of the voice information.
The server performs speech recognition and semantic recognition on the received user voice information. Speech recognition converts the voice information into text that a computer can process, using an acoustic model and a language model; semantic recognition builds on speech recognition by intelligently analyzing user characteristics such as gender, hobbies, and usual on-demand tendencies, so as to better understand the user's intention. If the user's voice input is the full name of a specific movie or television drama, the server can find the target through speech recognition alone; if the input is a fuzzy phrase such as "a love film", "a popular action film", "a movie by a Hong Kong director", or "a Hollywood blockbuster", the server additionally needs semantic recognition to jump accurately.
Based on these speech recognition and semantic recognition functions, the server can extract the features of the user voice information. For example, if the user's voice input is "find the scene where Director Zhao is arrested in In the Name of the People", the server recognizes the speech and extracts the features: the drama name In the Name of the People and the scene "Director Zhao is arrested".
Step S30, matching the voice information features with different scene tags in the audio data, and obtaining a scene tag matched with the voice information features.
The server of the invention is preset with massive audio data and performs speech-recognition tagging on all of it to generate corresponding scene tags; the server can generate different scene tags for different scenes in the audio data. A scene tag contains related information such as the video type, name, scene description, characters, time, and episode number. A scene tag can be placed at the beginning, end, or climax of the corresponding scene's audio; in this case it is preferably placed at the beginning.
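One possible record shape for such a scene tag is sketched below. Field names are assumptions; the patent only lists the kinds of information a tag holds and states that the tag is preferably placed at the start of the scene's audio:

```python
from dataclasses import dataclass, field

@dataclass
class SceneTag:
    video_type: str                 # e.g. "tv_series" or "movie"
    video_name: str
    description: str                # short text describing the scene
    characters: list = field(default_factory=list)
    episode: int = 0                # episode number, where applicable
    position_seconds: float = 0.0   # tag position; preferably the scene start

tag = SceneTag("tv_series", "Drama A", "the arrest scene", ["Director Zhao"], 13, 1800.0)
print(tag.position_seconds)  # → 1800.0
```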
It should be noted that, besides the above embodiment, the server can also obtain video clips or subtitle information corresponding to the massive audio data from television or the network, and then intelligently analyze those clips or subtitles to generate scene tags at the corresponding positions in the audio data.
In this embodiment, the user intends to make the video playback terminal jump to the time period of the video corresponding to the user's intention. For example, while the terminal is currently playing the drama In the Name of the People, the user enters the voice command "find the scene where Director Zhao is arrested in In the Name of the People"; the server first determines the target drama from the user voice information and extracts from the audio database all audio information related to In the Name of the People.
Then, according to the scene information the user wants to jump to, contained in the user's voice information features, the server matches that scene information against each scene tag in the audio data and finds the scene tag with the highest matching degree. For example, given the voice command "find the scene where Director Zhao is arrested in In the Name of the People", the server looks up all scene tags in the corresponding audio data, such as "Director Zhao is arrested", "Chen Yanshi resists the excavator", "the showdown between Hou Liangping and Qi Tongwei", and "Ouyang is arrested", and picks out the scene tag matching "Director Zhao is arrested".
And step S40, sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
And after acquiring the scene label matched with the voice information characteristic, the server sends the scene label to the video playing terminal so that the video playing terminal jumps to a corresponding position according to the scene label.
It should be noted that, in addition to the foregoing embodiments, the server may generate a jump instruction according to the scene tag matched with the voice information feature, where the jump instruction includes scene tag position information, so that the video playing terminal can jump to a corresponding position according to the jump instruction.
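The jump-instruction variant above can be illustrated in a few lines. The JSON message shape and field names are assumptions; the patent only says the instruction carries the scene tag's position information:

```python
import json

# Build a hypothetical jump instruction carrying only the target position,
# which the playback terminal would use to seek directly.
def make_jump_instruction(tag_position_seconds):
    return json.dumps({"cmd": "jump", "position": tag_position_seconds})

print(make_jump_instruction(1800.0))  # → {"cmd": "jump", "position": 1800.0}
```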
In this embodiment, the server receives the user voice information and the name information of the currently played video collected by the video playing terminal, performs speech recognition and semantic recognition on the user voice information, and extracts the voice information features. It confirms, according to the name information of the currently played video, that the audio database contains the audio data corresponding to that video, matches the voice information features with the different scene tags in the audio data, obtains the scene tag matching the voice information features, and sends that tag to the video playing terminal to control the video played on the terminal to jump to the corresponding position. By recognizing the features of the user's voice through the server's recognition functions and matching a scene tag consistent with the user's voice command, the video playing terminal can perform the jump and land precisely on the scene the user wants, improving the user experience.
Further, referring to fig. 3, a second embodiment of the present invention provides a video playing skip method, based on the embodiment shown in fig. 2, before the step of matching the voice information feature with different scene tags in the preset audio data in step S30, and acquiring a scene tag matched with the voice information feature, the method includes:
step S50, determining whether the voice information feature includes a skip video name.
To improve the accuracy of the query result, in this embodiment the server further determines, before tag matching, whether the voice information features include the name of a video to jump to. If they do not, step S60 is executed to obtain the name of the currently played video.
In this embodiment, when the voice command entered by the user does not contain the name of a video to jump to, those skilled in the art will understand that the object the user wants to jump within is the video currently being played by the terminal, so the server obtains the name of the currently played video from the playing terminal. Step S30 is then replaced with step S31: matching the voice information features and the name of the currently played video with different scene tags in the audio data to obtain the scene tag matched with the voice information features.
After obtaining the video name, the server matches the user's voice features together with the name of the currently played video against the different scene tags in the audio data to obtain the matching scene tag. For example, suppose the terminal is currently playing In the Name of the People and the server obtains this name from the terminal; given the voice command "find the scene where Director Zhao is arrested", the server first restricts the search to all audio information related to In the Name of the People in the audio database and then matches the voice information features within it. Matching constrained by the video name is faster and its result more accurate. In addition, if the user says "jump to the finale", the feature "finale" is extracted and the currently played video jumps to the beginning of the last episode.
Of course, when the voice information features do not include the name of a video to jump to, the server may also skip obtaining the name of the currently played video and directly match the voice information features against the preset audio data; however, this requires querying more audio data and is therefore slower.
If the voice information feature includes the skip video name, step S30 is executed to match the voice information feature with different scene tags in preset audio data, and obtain a scene tag matched with the voice information feature.
The server's execution process here is the same as step S31, except that the video name comes from the user voice information in one case and is obtained by the server from the video playback terminal in the other.
In addition, if the voice recorded by the user is "a romance film", "a trending action film", "a film by a Hong Kong director", "a Hollywood blockbuster", or the like, containing no specific TV-series or movie name, the server needs to perform the matching in the audio database on its own; it can also perform intelligent analysis based on user characteristics such as gender, hobbies, and usual on-demand tendencies, and select a video suited to the user so that the video playing terminal jumps to it. The user can also issue other instructions; for example, if the recorded voice is "advance 30 minutes", the features "advance" and "30 minutes" are extracted, and the currently played video jumps to the position 30 minutes ahead.
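Relative-seek commands such as "advance 30 minutes" can be handled without any database lookup. A minimal sketch, assuming the recognized voice text is already available as an English string (the phrase patterns, unit names, and function name are illustrative assumptions, not taken from the patent):

```python
import re

# Hypothetical sketch: map relative-seek phrases ("advance 30 minutes",
# "rewind 5 minutes") in the recognized voice text to a new playback
# position in seconds.
_UNIT_SECONDS = {"second": 1, "seconds": 1, "minute": 60,
                 "minutes": 60, "hour": 3600, "hours": 3600}

def relative_seek(voice_text, current_position_s):
    """Return the new playback position in seconds, or None if the
    command is not a recognized relative seek."""
    m = re.search(r"(advance|forward|rewind|back)\s+(\d+)\s+(\w+)",
                  voice_text.lower())
    if not m:
        return None
    direction, amount, unit = m.groups()
    if unit not in _UNIT_SECONDS:
        return None
    offset = int(amount) * _UNIT_SECONDS[unit]
    if direction in ("rewind", "back"):
        offset = -offset
    # clamp so a large rewind cannot seek before the start of the video
    return max(0, current_position_s + offset)
```

Seeking is clamped at zero so that, say, "rewind 20 minutes" issued ten minutes in simply returns to the beginning.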
By having the server judge whether the user's voice information features contain the name of a video to jump to, the invention realizes jumping within the currently played video, switching playback to another named video, or switching playback to a corresponding scene of another named video, which better meets general user needs.
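The matching described for steps S30/S31 can be sketched as a lookup over tagged audio records, where a known video name narrows the candidate set first. The record and tag field names below are assumptions for illustration; the patent does not specify a data model:

```python
# Hypothetical sketch of server-side scene-tag matching (steps S30/S31).
# The audio database is modeled as a list of records, one per video,
# each carrying scene tags with keywords and a playback position.
def match_scene_tag(voice_features, audio_db, video_name=None):
    """Return the first scene tag whose keywords are all present in the
    extracted voice features. If video_name is known, only that video's
    audio data is searched, which narrows the query and speeds matching."""
    candidates = [rec for rec in audio_db
                  if video_name is None or rec["video_name"] == video_name]
    for record in candidates:
        for tag in record["scene_tags"]:
            # a tag matches when every one of its keywords appears in
            # the features extracted from the user's speech
            if set(tag["keywords"]) <= set(voice_features):
                return tag
    return None
```

With `video_name=None` the whole database is scanned, mirroring the slower no-name path described above.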
Further, the step S30 matches the voice information feature with different scene tags in preset audio data, and obtains a scene tag matched with the voice information feature, including:
step S32, judging whether the preset audio data includes the audio data corresponding to the currently played video;
if the preset audio data does not include the audio data corresponding to the currently played video, step S33 and step S34 are executed.
Step S33, a request instruction is sent to the video playback terminal.
And step S34, receiving the audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data in an audio database.
If the audio database does not contain the audio data corresponding to the video currently played by the video playing terminal, the server sends a request instruction to the video playing terminal, requiring it to send the audio data corresponding to the currently played video; after receiving the audio data sent by the video playing terminal, the server stores it in the audio database. In this way the audio data in the audio database becomes richer and more complete, and when the video the user wants to jump within is the currently played video, the scene tag the user needs can be matched promptly.
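The check-request-store flow of steps S32-S34 can be sketched as follows. The `AudioDatabase` class and the terminal's `request_audio` method are assumptions standing in for the server's storage layer and the network round trip to the terminal:

```python
# Hypothetical sketch of steps S32-S34: if the audio database lacks the
# currently played video's audio data, request it from the terminal and
# store it before attempting scene-tag matching.
class AudioDatabase:
    def __init__(self):
        self._store = {}

    def has(self, video_name):
        return video_name in self._store

    def put(self, video_name, audio_data):
        self._store[video_name] = audio_data

def ensure_audio_data(db, terminal, video_name):
    """Guarantee that audio data for video_name is in the database,
    fetching it from the playing terminal when missing."""
    if not db.has(video_name):                      # S32: check the database
        audio = terminal.request_audio(video_name)  # S33: request instruction
        db.put(video_name, audio)                   # S34: receive and store
    return db
```

The fetch happens at most once per video, after which subsequent jump requests for the same video hit the database directly.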
Further, referring to fig. 4, a third embodiment of the present invention provides a video playing skipping method, based on the embodiment shown in fig. 2, after matching the voice information feature with different scene tags in the audio data in step S30, and acquiring a scene tag matched with the voice information feature, the method further includes:
step S70, if the scene label according with the voice information characteristic is not matched in the preset time, generating a matching failure prompt;
and step S80, sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
The voice information features are matched against the different scene tags in the audio database. If the audio database contains no video object the user wants to jump to, the matching ends directly; if it does, the audio information corresponding to the video name the user wants to jump to is identified in the audio database, each scene tag corresponding to that audio information is obtained, and matching is performed against each scene tag. If no scene tag conforming to the voice information features is matched within the preset time, the matching ends. After the matching ends, a matching-failure prompt is generated and sent to the video playing terminal. On receiving the matching-failure prompt, the video playing terminal may display the prompt information directly on the video playing interface, or present it through UI controls on the terminal such as Toast or Snackbar. Of course, besides giving a matching-failure prompt, other video information in the audio database closer to the user's intention can be recommended according to the voice information features. For example, if the voice recorded by the user is "a romance film", "a trending action film", "a film by a Hong Kong director", "a Hollywood blockbuster", or the like, the server performs the matching in the audio database on its own; it can perform intelligent analysis based on user characteristics such as gender, hobbies, and usual on-demand tendencies, and select a video suited to the user so that the video playing terminal jumps to it.
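The bounded matching of steps S70/S80 can be sketched as a loop with a deadline that falls back to a failure prompt. The `match_fn` callback and the result-dict shape are illustrative assumptions:

```python
import time

# Hypothetical sketch of steps S70/S80: try each candidate scene tag
# until one matches or the preset time elapses; on failure return a
# prompt payload for the terminal to display.
def match_with_timeout(match_fn, candidates, timeout_s=2.0):
    """Run match_fn over candidates under a deadline. Returns
    {"ok": True, "tag": ...} on success, or {"ok": False, "prompt": ...}
    when nothing matches within timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    for candidate in candidates:
        if time.monotonic() > deadline:
            break                                   # preset time exceeded
        tag = match_fn(candidate)
        if tag is not None:
            return {"ok": True, "tag": tag}
    # S70: no matching scene tag within the preset time
    return {"ok": False, "prompt": "No matching scene found"}  # S80 payload
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline immune to system clock adjustments.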
Referring to fig. 5, a fourth embodiment of the present invention provides a video playing skip method, including the following steps:
step S110, collecting voice information input by a user.
In this embodiment, the video playing terminal may include both a video playing module and a voice collecting module, or include only a video playing module with an external voice collecting module, such as a microphone. Here a mobile phone is used as the video playing terminal: the user's voice information is collected through the phone's microphone, and a video playing application installed on the phone plays the video the user wants to watch.
Step S120, the user voice information is sent to a server, so that the server matches the voice information characteristics with different scene labels in the audio data, and the scene label matched with the voice information characteristics is obtained.
The mobile phone sends the user's voice information to the server. The voice information may contain a scene keyword (such as "X-zhu Xiantai"), or both a drama-name keyword and a scene keyword (such as "drama name A, plot B"), so that the server can directly parse from the voice information the video object and scene the user intends to jump to. Meanwhile, the server judges, according to the name information of the currently played video sent by the mobile phone, whether the audio database contains audio data corresponding to the currently played video; if not, the following steps are executed:
step S121, receiving an audio data request instruction sent by the server.
Step S122, sending the audio data corresponding to the currently played video to the server.
After receiving an audio data request instruction sent by the server, the mobile phone calls audio data corresponding to the currently played video from the background, packages the audio data and uploads the audio data to the server, so that the audio database of the server has the audio data corresponding to the currently played video.
And step S130, receiving the scene label matched with the voice information characteristic, and skipping the video played on the video playing terminal to a corresponding position.
The mobile phone receives the matching result sent by the server in real time. If the result is a scene tag matching the voice information features, the mobile phone makes the video playing application jump according to the position information contained in the scene tag. If the server matches no scene tag conforming to the voice information features and the mobile phone receives a matching-failure prompt, text information is displayed on the mobile phone screen to prompt the user.
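The terminal side of steps S110-S130 reduces to: send the voice command, then either seek the player to the matched position or show the failure prompt. The `player` and `server` objects below are assumptions standing in for the phone's playback application and its network layer:

```python
# Hypothetical sketch of the terminal side (steps S110-S130): dispatch
# the recognized voice text to the server, then act on the result.
def handle_voice_command(player, server, voice_text, current_video):
    """Jump the player to the matched scene, or surface the server's
    matching-failure prompt to the user."""
    result = server.match(voice_text, current_video)   # S120: send to server
    if result.get("ok"):
        player.seek(result["tag"]["position_s"])       # S130: jump to scene
        return "jumped"
    # matching failed within the preset time: show the prompt (e.g. Toast)
    player.show_toast(result.get("prompt", "Match failed"))
    return "prompted"
```

Keeping the terminal logic this thin matches the division of labor in the embodiment: all recognition, feature extraction, and tag matching stay on the server.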
In this embodiment, the video playing terminal collects voice information input by a user through a microphone, acquires name information of a currently playing video in a background, and sends the voice information of the user and the name information of the currently playing video to the server, so that the server matches the voice information features with different scene tags in the audio data, acquires a scene tag matched with the voice information features, receives the scene tag matched with the voice information features, and skips the video played on the video playing terminal to a corresponding position. The invention enables the user to directly send the voice command to realize video skipping and skip to the video scene to be watched, thereby improving the user experience.
Referring to fig. 6, the present invention is a schematic diagram of a first embodiment of a video playing skip system, where the video playing skip system includes: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user to a server;
the server receives user voice information collected by the video playing terminal, identifies the voice information, extracts the characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires a scene label matched with the voice information characteristics, and sends the scene label matched with the voice information characteristics to the video playing terminal;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a video playing skipping program is stored on the computer-readable storage medium, and when executed by the video playing terminal and the server, the video playing skipping program implements the following operations:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
Further, before the step of matching the voice information features with different scene tags in preset audio data and obtaining a scene tag matched with the voice information features, the method includes:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
Further, after the step of determining whether the voice information feature includes a skip video name, the method includes:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
Further, the step of matching the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features includes:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
Further, after the step of matching the voice information features with different scene tags in preset audio data and obtaining the scene tags matched with the voice information features, the method further includes:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
The computer readable storage medium stores a video playing skip program, and when the video playing skip program is executed by the video playing terminal and the server, the following operations are further implemented:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in the audio data to obtain scene tags matched with the voice information features;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
Further, after the step of sending the user voice information and the name information of the currently played video to a server, the method further includes:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
Further, after the step of sending the user voice information and the name information of the currently played video to a server, the method further includes:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the video skipping method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a video playing terminal (e.g., a mobile phone, a computer, a television, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video playing and skipping method is characterized in that the video playing and skipping method is applied to an interactive system comprising a video playing terminal and a server, and the video playing and skipping method comprises the following steps:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features, wherein the preset audio data are a plurality of audio data with the scene tags, and each scene tag is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
2. The video play skipping method of claim 1, wherein before the step of matching the voice information feature with a different scene tag in the preset audio data and obtaining a scene tag matching the voice information feature, the method comprises:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
3. The video playback skipping method of claim 2, wherein after said step of determining whether a skipped video name is included in said voice information feature, comprising:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
4. The video playing skip method according to claim 1, wherein the step of matching the voice information feature with different scene tags in the preset audio data to obtain the scene tag matched with the voice information feature comprises:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
5. The video play skipping method of claim 1, wherein after the step of matching the voice information feature with a different scene tag in the preset audio data and obtaining a scene tag matching the voice information feature, further comprising:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
6. A video playing and skipping method is characterized in that the video playing and skipping method is applied to an interactive system comprising a video playing terminal and a server, and the video playing and skipping method comprises the following steps:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features, wherein the preset audio data are a plurality of audio data with the scene tags, and each scene tag is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
7. The video playback skip method of claim 6, wherein after said step of sending said user voice information to a server, further comprising:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
8. The video playback skip method of claim 6, wherein after said step of sending said user voice information to a server, further comprising:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
9. A video playback skip system, said video playback skip system comprising: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user to a server;
the method comprises the steps that a server receives user voice information collected by a video playing terminal, identifies the voice information, extracts characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires the scene labels matched with the voice information characteristics, and sends the scene labels matched with the voice information characteristics to the video playing terminal, wherein the preset audio data are a plurality of audio data with scene labels, and each scene label is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a video playback terminal and a server, implements the video playback skip method according to any one of claims 1 to 8.
CN201811654558.6A 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium Active CN109688475B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811654558.6A CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium
PCT/CN2019/126022 WO2020135161A1 (en) 2018-12-29 2019-12-17 Video playback jump method and system, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811654558.6A CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109688475A CN109688475A (en) 2019-04-26
CN109688475B true CN109688475B (en) 2020-10-02

Family

ID=66191672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811654558.6A Active CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN109688475B (en)
WO (1) WO2020135161A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109688475B (en) * 2018-12-29 2020-10-02 深圳Tcl新技术有限公司 Video playing skipping method and system and computer readable storage medium
CN110166845B (en) * 2019-05-13 2021-10-26 Oppo广东移动通信有限公司 Video playing method and device
CN112261436B (en) * 2019-07-04 2024-04-02 青岛海尔多媒体有限公司 Video playing method, device and system
CN111209437B (en) * 2020-01-13 2023-11-28 腾讯科技(深圳)有限公司 Label processing method and device, storage medium and electronic equipment
CN111601163B (en) * 2020-04-26 2023-03-03 百度在线网络技术(北京)有限公司 Play control method and device, electronic equipment and storage medium
CN111818172B (en) * 2020-07-21 2022-08-19 海信视像科技股份有限公司 Method and device for controlling intelligent equipment by management server of Internet of things
CN112632329A (en) * 2020-12-18 2021-04-09 咪咕互动娱乐有限公司 Video extraction method and device, electronic equipment and storage medium
CN112954426B (en) * 2021-02-07 2022-11-15 咪咕文化科技有限公司 Video playing method, electronic equipment and storage medium
CN113689856B (en) * 2021-08-20 2023-11-03 Vidaa(荷兰)国际控股有限公司 Voice control method for video playing progress of browser page and display equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869623A (en) * 2015-12-07 2016-08-17 乐视网信息技术(北京)股份有限公司 Video playing method and device based on speech recognition
CN106162357A (en) * 2016-05-31 2016-11-23 腾讯科技(深圳)有限公司 Obtain the method and device of video content
CN107135418A (en) * 2017-06-14 2017-09-05 北京易世纪教育科技有限公司 A kind of control method and device of video playback
CN107155138A (en) * 2017-06-06 2017-09-12 深圳Tcl数字技术有限公司 Video playback jump method, equipment and computer-readable recording medium
CN107506385A (en) * 2017-07-25 2017-12-22 努比亚技术有限公司 A kind of video file retrieval method, equipment and computer-readable recording medium
CN107871500A (en) * 2017-11-16 2018-04-03 百度在线网络技术(北京)有限公司 One kind plays multimedia method and apparatus
CN107948729A (en) * 2017-12-13 2018-04-20 广东欧珀移动通信有限公司 Rich Media's processing method, device, storage medium and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336093B2 (en) * 1998-01-16 2002-01-01 Avid Technology, Inc. Apparatus and method using speech recognition and scripts to capture author and playback synchronized audio and video
US9451195B2 (en) * 2006-08-04 2016-09-20 Gula Consulting Limited Liability Company Moving video tags outside of a video area to create a menu system
CN101329867A (en) * 2007-06-21 2008-12-24 西门子(中国)有限公司 Method and device for playing speech on demand
US9113128B1 (en) * 2012-08-31 2015-08-18 Amazon Technologies, Inc. Timeline interface for video content
CN105677735B (en) * 2015-12-30 2020-04-21 腾讯科技(深圳)有限公司 Video searching method and device
CN107071542B (en) * 2017-04-18 2020-07-28 百度在线网络技术(北京)有限公司 Video clip playing method and device
CN107704525A (en) * 2017-09-04 2018-02-16 优酷网络技术(北京)有限公司 Video searching method and device
CN109688475B (en) * 2018-12-29 2020-10-02 深圳Tcl新技术有限公司 Video playing skipping method and system and computer readable storage medium


Also Published As

Publication number Publication date
WO2020135161A1 (en) 2020-07-02
CN109688475A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
RU2614137C2 (en) Method and apparatus for obtaining information
EP3044725B1 (en) Generating alerts based upon detector outputs
US10979775B2 (en) Seamless switching from a linear to a personalized video stream
CN110913241B (en) Video retrieval method and device, electronic equipment and storage medium
JP2020504475A (en) Providing related objects during video data playback
US20150012840A1 (en) Identification and Sharing of Selections within Streaming Content
CN109474843B (en) Method for voice control of terminal, client and server
WO2016029561A1 (en) Display terminal-based data processing method
JP2020030814A (en) Method and apparatus for processing information
CN109829064B (en) Media resource sharing and playing method and device, storage medium and electronic device
WO2017015114A1 (en) Media production system with social media feature
CN105122242A (en) Methods, systems, and media for presenting mobile content corresponding to media content
CN112104915B (en) Video data processing method and device and storage medium
CN104065979A (en) Method for dynamically displaying information related with video content and system thereof
CN110691281B (en) Video playing processing method, terminal device, server and storage medium
CN105210376A (en) Using an audio stream to identify metadata associated with a currently playing television program
US20170272793A1 (en) Media content recommendation method and device
CN110309324B (en) Searching method and related device
CN109600646B (en) Voice positioning method and device, smart television and storage medium
CN104881407A (en) Information recommending system and information recommending method based on feature recognition
CN112579935B (en) Page display method, device and equipment
JP7058795B2 (en) Video processing methods, devices, terminals and storage media
CN110750719A (en) IPTV-based information accurate pushing system and method
WO2017008498A1 (en) Method and device for searching program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant