CN109688475B - Video playing skipping method and system and computer readable storage medium - Google Patents


Info

Publication number
CN109688475B
CN109688475B (application CN201811654558.6A)
Authority
CN
China
Prior art keywords
voice information
video
audio data
scene
video playing
Prior art date
Legal status
Active
Application number
CN201811654558.6A
Other languages
Chinese (zh)
Other versions
CN109688475A (en)
Inventor
李其浪
Current Assignee
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN201811654558.6A priority Critical patent/CN109688475B/en
Publication of CN109688475A publication Critical patent/CN109688475A/en
Priority to PCT/CN2019/126022 priority patent/WO2020135161A1/en
Application granted granted Critical
Publication of CN109688475B publication Critical patent/CN109688475B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Abstract

The invention discloses a video playing skip method, a video playing skip system, and a computer readable storage medium. The method comprises: receiving user voice information collected by a video playing terminal; recognizing the user voice information and extracting voice information features; matching the voice information features with different scene tags in preset audio data to obtain a scene tag matching the voice information features; and sending the matched scene tag to the video playing terminal to control the video played on the terminal to jump to the corresponding position. Through the server's speech recognition and semantic recognition, the user can trigger a video jump with a voice command, which improves the user experience.

Description

Video playing skipping method and system and computer readable storage medium
Technical Field
The present invention relates to the field of video playing technologies, and in particular, to a video playing skip method, a video playing skip system, and a computer readable storage medium.
Background
With the development of internet technology, people no longer rely on broadcast television signals to watch video; through the internet they can watch any video available on the network, including live streams. Users can therefore choose video content according to their preferences, freely adjust the playback progress while watching, and jump directly to the scene they want to watch.
When adjusting the playback progress, a user relies on a button on the television remote control or a virtual button in the video playing software: a single press jumps the playback forward or backward by a fixed interval; holding the button down keeps jumping the playback forward or backward; alternatively, the user enters a target time, and the television or playing software loads that position and then resumes playback. In every case the user must operate buttons manually to move the video to the desired scene, and the jump rarely succeeds in one attempt, so the user experience is poor.
Disclosure of Invention
The main object of the present invention is to provide a video playing skip method, a video playing skip system, and a computer readable storage medium, aiming to solve the technical problem that a user must manually operate buttons multiple times to jump a video to the desired scene, resulting in a poor user experience.
In order to achieve the above object, the present invention provides a video playing and skipping method, which comprises the following steps:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
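The four server-side steps above can be sketched as a single handler. The following Python sketch reduces the speech/semantic recognition stage to a pre-extracted feature set; all names and the dict-based tag shape are illustrative assumptions, not from the patent:

```python
# Minimal sketch of the claimed flow: take extracted voice features, match
# them against scene tags in preset audio data, return the best tag (which
# the server would then send to the playback terminal).
def handle_voice_request(voice_features, scene_tags):
    """Return the scene tag whose description overlaps most with the features."""
    best, best_score = None, 0
    for tag in scene_tags:
        # crude matching degree: how many features appear in the description
        score = sum(1 for f in voice_features if f in tag["description"])
        if score > best_score:
            best, best_score = tag, score
    return best  # None if nothing matched

tags = [{"description": "Director Zhao is arrested", "position": 1800},
        {"description": "opening credits", "position": 0}]
match = handle_voice_request({"Zhao", "arrested"}, tags)
print(match["position"])  # → 1800
```

A real implementation would sit behind an ASR and semantic-parsing front end; only the tag-matching step is shown here.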
Preferably, before the step of matching the voice information features with different scene tags in preset audio data and obtaining a scene tag matched with the voice information features, the method includes:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
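The name-fallback logic of this claim can be stated in a few lines. A hedged sketch, assuming a hypothetical feature dictionary produced by the recognition stage:

```python
# If the recognized speech names a target video, use it; otherwise fall back
# to the video the terminal is currently playing. The "video_name" key is an
# assumption, not a field defined by the patent.
def resolve_target_video(voice_features, currently_playing):
    return voice_features.get("video_name") or currently_playing

print(resolve_target_video({"scene": "arrest"}, "Drama A"))    # → Drama A
print(resolve_target_video({"video_name": "Drama B"}, "Drama A"))  # → Drama B
```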
Preferably, after the step of determining whether the voice information feature includes a skip video name, the method includes:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
Preferably, the step of matching the voice information features with different scene tags in preset audio data and obtaining the scene tags matched with the voice information features includes:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
Preferably, after the step of matching the voice information feature with different scene tags in preset audio data and obtaining a scene tag matched with the voice information feature, the method further includes:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
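The timeout-and-prompt behavior above can be sketched as a wrapper around the matching step. The 3-second deadline and message field names below are assumptions; the patent only speaks of a "preset time" and a matching-failure prompt:

```python
import time

MATCH_DEADLINE_SECONDS = 3.0  # illustrative stand-in for the "preset time"

def match_or_fail(match_fn):
    start = time.monotonic()
    tag = match_fn()
    elapsed = time.monotonic() - start
    if tag is None or elapsed > MATCH_DEADLINE_SECONDS:
        # generate a matching-failure prompt for the terminal to display
        return {"type": "match_failed", "message": "No matching scene was found"}
    return {"type": "scene_tag", "tag": tag}

print(match_or_fail(lambda: None)["type"])  # → match_failed
```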
In addition, to achieve the above object, the present invention further provides a video playing skip method, applied to a video playing terminal, comprising the following steps:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in the audio data to obtain scene tags matched with the voice information features;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
Preferably, after the step of sending the user voice information and the name information of the currently playing video to the server, the method further includes:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
Preferably, after the step of sending the user voice information to a server, the method further includes:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
In addition, to achieve the above object, the present invention further provides a video playing skip system, where the video playing skip system includes: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user and name information of a currently played video to a server;
the server receives user voice information collected by the video playing terminal, identifies the voice information, extracts the characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires a scene label matched with the voice information characteristics, and sends the scene label matched with the voice information characteristics to the video playing terminal;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
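The terminal–server exchange described by this system claim can be sketched end to end. In the hedged Python illustration below, the message shapes, the in-memory audio database, and the crude split-based stand-ins for speech recognition and tag matching are all assumptions:

```python
# Terminal side: package the collected voice and the current video's name.
def terminal_send(voice, current_video):
    return {"voice": voice, "current_video": current_video}

# Server side: "recognize" the voice, match it against the scene tags for
# the named video, and reply with a jump position or an error.
def server_handle(msg, audio_db):
    features = msg["voice"].lower().split()            # stand-in for ASR + NLU
    for tag in audio_db.get(msg["current_video"], []):  # stand-in for matching
        if all(f in tag["description"].lower() for f in features):
            return {"jump_to": tag["position"]}
    return {"error": "no match"}

# Terminal side again: apply the server's reply by seeking the player.
def terminal_apply(reply, player):
    if "jump_to" in reply:
        player["position"] = reply["jump_to"]

db = {"Drama A": [{"description": "director zhao arrested", "position": 1800}]}
player = {"position": 0}
terminal_apply(server_handle(terminal_send("zhao arrested", "Drama A"), db), player)
print(player["position"])  # → 1800
```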
In addition, to achieve the above object, the present invention also provides a computer readable storage medium storing a video playing skip program which, when executed by a video playing terminal and a server, implements the video playing skip method described above.
The invention is applied to an interactive system consisting of a video playing terminal and a server. First, the server receives user voice information collected by the terminal through a voice collection module such as a microphone. The server then recognizes that voice information through its speech recognition and semantic recognition functions to obtain the features of the user voice information, mainly including the name of the video and the scene the user intends to jump to. Next, the server matches the voice information features with different scene tags in the audio data to obtain the scene tag matching the features. Finally, the server sends the matched scene tag to the video playing terminal so that the video jumps to the corresponding position. The user can thus trigger a video jump with a voice command and land precisely on the desired scene, improving the user experience.
Drawings
FIG. 1 is a schematic diagram of a system architecture to which embodiments of the present invention relate;
FIG. 2 is a flowchart illustrating a video playing skip method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a video playing skip method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a video playing and skipping method according to a third embodiment of the present invention;
FIG. 5 is a flowchart illustrating a video playing and skipping method according to a fourth embodiment of the present invention;
fig. 6 is a schematic structural diagram of a video playing and skipping system according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: receiving user voice information collected by a video playing terminal; recognizing the user voice information and extracting the characteristics of the voice information; matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features; and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
The problem the present invention addresses is that the prior art cannot jump video playback to the corresponding scene position based on the scene features contained in the user's voice.
The invention provides a solution, which enables a user to realize video skipping through a voice command, and can skip to a scene desired by the user accurately, thereby improving the user experience.
Fig. 1 is a schematic diagram of a system architecture of an embodiment of a video playing and skipping method according to the present application.
Referring to fig. 1, the system architecture 100 may include video playback terminals 101, 102, 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the video playback terminals 101, 102, 103 and the server 105, and may include various wired and wireless communication links, such as fiber optic cables, mobile networks, WiFi, Bluetooth, or hotspots.
A user may use the video playback terminals 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. The video playing terminals 101, 102, 103 may have various communication client applications installed thereon, such as a video playing application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The video playback terminals 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting video playback, including but not limited to smart phones, tablet computers, e-book readers, MP3 players, MP4 players, laptop computers, desktop computers, and the like. When they are software, they can be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server providing various services, for example, reading videos played on the video playback terminals 101, 102, and 103, or analyzing and processing received various voice information, instruction information, and video/audio data, and feeding back processing results, such as video clips, scene tags, instruction information, and the like, to the video playback terminals, so that the video playback terminals complete corresponding actions according to the processing results.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the video playing skip method provided in the embodiment of the present application may be executed by the video playing terminals 101, 102, and 103, or may be executed by the server 105. Accordingly, the means for pushing information may be provided in the video playback terminals 101, 102, 103, or in the server 105. And is not particularly limited herein.
It should be understood that the number of video playback terminals, networks, and servers in fig. 1 is merely illustrative. There may be any number of video playback terminals, networks, and servers, as desired for implementation.
Referring to fig. 2, a first embodiment of the present invention provides a video playing skip method, including the following steps:
and step S10, receiving the user voice information collected by the video playing terminal.
The invention can be applied to an interactive system consisting of the video playing terminal and the server, and the video playing terminal is connected with the server through the network to realize interaction. In this embodiment, the video playing terminal takes a television as an example, acquires voice information of a user in real time through a voice acquisition module of the television, and sends the acquired voice information to the server through a wireless network. And the server receives the user voice information sent by the television at the other end of the network in real time.
Step S20, recognizing the user voice information, and extracting the features of the voice information.
The server performs speech recognition and semantic recognition on the received user voice information. Speech recognition converts the voice information into text that a computer can process, using an acoustic model and a language model; semantic recognition builds on speech recognition by intelligently analyzing user characteristics such as gender, hobbies, and usual on-demand tendencies, so as to better understand the user's intention. If the user's voice input is the full name of a specific movie or television drama, the server can find the target through speech recognition alone; if the input is a fuzzy phrase such as "a love film", "a popular action film", "a movie by a Hong Kong director", or "a Hollywood blockbuster", the server additionally needs semantic recognition to jump accurately.
Based on these speech recognition and semantic recognition functions, the server can extract the features of the user voice information. For example, if the user's voice input is "find the scene where Director Zhao is arrested in In the Name of the People", the server recognizes the speech and extracts the features: the drama name In the Name of the People and the scene "Director Zhao is arrested".
Step S30, matching the voice information features with different scene tags in the audio data, and obtaining a scene tag matched with the voice information features.
The server of the invention is preset with massive audio data and performs speech-recognition tagging on all of it to generate corresponding scene tags; the server can generate different scene tags for different scenes in the audio data. A scene tag contains related information such as the video type, name, scene description, characters, time, and episode number. A scene tag can be placed at the beginning, end, or climax of the corresponding scene's audio; in this case it is preferably placed at the beginning.
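One possible record shape for such a scene tag is sketched below. Field names are assumptions; the patent only lists the kinds of information a tag holds and states that the tag is preferably placed at the start of the scene's audio:

```python
from dataclasses import dataclass, field

@dataclass
class SceneTag:
    video_type: str                 # e.g. "tv_series" or "movie"
    video_name: str
    description: str                # short text describing the scene
    characters: list = field(default_factory=list)
    episode: int = 0                # episode number, where applicable
    position_seconds: float = 0.0   # tag position; preferably the scene start

tag = SceneTag("tv_series", "Drama A", "the arrest scene", ["Director Zhao"], 13, 1800.0)
print(tag.position_seconds)  # → 1800.0
```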
It should be noted that, besides the above embodiment, the server can also obtain video clips or subtitle information corresponding to the massive audio data from television or the network, and then intelligently analyze those clips or subtitles to generate scene tags at the corresponding positions in the audio data.
In this embodiment, the user intends to make the video playback terminal jump to the time period of the video corresponding to the user's intention. For example, while the terminal is currently playing the drama In the Name of the People, the user enters the voice command "find the scene where Director Zhao is arrested in In the Name of the People"; the server first determines the target drama from the user voice information and extracts from the audio database all audio information related to In the Name of the People.
Then, according to the scene information the user wants to jump to, contained in the user's voice information features, the server matches that scene information against each scene tag in the audio data and finds the scene tag with the highest matching degree. For example, given the voice command "find the scene where Director Zhao is arrested in In the Name of the People", the server looks up all scene tags in the corresponding audio data, such as "Director Zhao is arrested", "Chen Yanshi resists the excavator", "the showdown between Hou Liangping and Qi Tongwei", and "Ouyang is arrested", and picks out the scene tag matching "Director Zhao is arrested".
And step S40, sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
And after acquiring the scene label matched with the voice information characteristic, the server sends the scene label to the video playing terminal so that the video playing terminal jumps to a corresponding position according to the scene label.
It should be noted that, in addition to the foregoing embodiments, the server may generate a jump instruction according to the scene tag matched with the voice information feature, where the jump instruction includes scene tag position information, so that the video playing terminal can jump to a corresponding position according to the jump instruction.
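The jump-instruction variant above can be illustrated in a few lines. The JSON message shape and field names are assumptions; the patent only says the instruction carries the scene tag's position information:

```python
import json

# Build a hypothetical jump instruction carrying only the target position,
# which the playback terminal would use to seek directly.
def make_jump_instruction(tag_position_seconds):
    return json.dumps({"cmd": "jump", "position": tag_position_seconds})

print(make_jump_instruction(1800.0))  # → {"cmd": "jump", "position": 1800.0}
```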
In this embodiment, the server receives the user voice information and the name information of the currently played video collected by the video playing terminal, performs speech recognition and semantic recognition on the user voice information, and extracts the voice information features. It confirms, according to the name information of the currently played video, that the audio database contains the audio data corresponding to that video, matches the voice information features with the different scene tags in the audio data, obtains the scene tag matching the voice information features, and sends that tag to the video playing terminal to control the video played on the terminal to jump to the corresponding position. By recognizing the features of the user's voice through the server's recognition functions and matching a scene tag consistent with the user's voice command, the video playing terminal can perform the jump and land precisely on the scene the user wants, improving the user experience.
Further, referring to fig. 3, a second embodiment of the present invention provides a video playing skip method, based on the embodiment shown in fig. 2, before the step of matching the voice information feature with different scene tags in the preset audio data in step S30, and acquiring a scene tag matched with the voice information feature, the method includes:
step S50, determining whether the voice information feature includes a skip video name.
To improve the accuracy of the query result, in this embodiment the server further determines, before tag matching, whether the voice information features include the name of a video to jump to. If they do not, step S60 is executed to obtain the name of the currently played video.
In this embodiment, when the voice command entered by the user does not contain the name of a video to jump to, those skilled in the art will understand that the object the user wants to jump within is the video currently being played by the terminal, so the server obtains the name of the currently played video from the playing terminal. Step S30 is then replaced with step S31: matching the voice information features and the name of the currently played video with different scene tags in the audio data to obtain the scene tag matched with the voice information features.
After obtaining the video name, the server matches the user's voice features together with the name of the currently played video against the different scene tags in the audio data to obtain the matching scene tag. For example, suppose the terminal is currently playing In the Name of the People and the server obtains this name from the terminal; given the voice command "find the scene where Director Zhao is arrested", the server first restricts the search to all audio information related to In the Name of the People in the audio database and then matches the voice information features within it. Matching constrained by the video name is faster and its result more accurate. In addition, if the user says "jump to the finale", the feature "finale" is extracted and the currently played video jumps to the beginning of the last episode.
Of course, when the voice information features do not include the name of a video to jump to, the server may also skip obtaining the name of the currently played video and directly match the voice information features against the preset audio data; however, this requires querying more audio data and is therefore slower.
If the voice information feature includes the skip video name, step S30 is executed to match the voice information feature with different scene tags in preset audio data, and obtain a scene tag matched with the voice information feature.
The server's execution process here is the same as step S31, except that the video name comes from the user voice information in one case and is obtained by the server from the video playback terminal in the other.
In addition, if the voice recorded by the user is "a romance film", "a trending action film", "a film by a Hong Kong director", "a Hollywood blockbuster", or the like, containing no specific TV-series or movie name, the server needs to perform the matching in the audio database on its own; it can also perform intelligent analysis based on user characteristics such as gender, hobbies, and usual on-demand tendencies, and select a video suited to the user so that the video playing terminal jumps to it. The user can also issue other instructions; for example, if the recorded voice is "advance 30 minutes", the features "advance" and "30 minutes" are extracted, and the currently played video jumps to the position 30 minutes ahead.
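Relative-seek commands such as "advance 30 minutes" can be handled without any database lookup. A minimal sketch, assuming the recognized voice text is already available as an English string (the phrase patterns, unit names, and function name are illustrative assumptions, not taken from the patent):

```python
import re

# Hypothetical sketch: map relative-seek phrases ("advance 30 minutes",
# "rewind 5 minutes") in the recognized voice text to a new playback
# position in seconds.
_UNIT_SECONDS = {"second": 1, "seconds": 1, "minute": 60,
                 "minutes": 60, "hour": 3600, "hours": 3600}

def relative_seek(voice_text, current_position_s):
    """Return the new playback position in seconds, or None if the
    command is not a recognized relative seek."""
    m = re.search(r"(advance|forward|rewind|back)\s+(\d+)\s+(\w+)",
                  voice_text.lower())
    if not m:
        return None
    direction, amount, unit = m.groups()
    if unit not in _UNIT_SECONDS:
        return None
    offset = int(amount) * _UNIT_SECONDS[unit]
    if direction in ("rewind", "back"):
        offset = -offset
    # clamp so a large rewind cannot seek before the start of the video
    return max(0, current_position_s + offset)
```

Seeking is clamped at zero so that, say, "rewind 20 minutes" issued ten minutes in simply returns to the beginning.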
By having the server judge whether the user's voice information features contain the name of a video to jump to, the invention realizes jumping within the currently played video, switching playback to another named video, or switching playback to a corresponding scene of another named video, which better meets general user needs.
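The matching described for steps S30/S31 can be sketched as a lookup over tagged audio records, where a known video name narrows the candidate set first. The record and tag field names below are assumptions for illustration; the patent does not specify a data model:

```python
# Hypothetical sketch of server-side scene-tag matching (steps S30/S31).
# The audio database is modeled as a list of records, one per video,
# each carrying scene tags with keywords and a playback position.
def match_scene_tag(voice_features, audio_db, video_name=None):
    """Return the first scene tag whose keywords are all present in the
    extracted voice features. If video_name is known, only that video's
    audio data is searched, which narrows the query and speeds matching."""
    candidates = [rec for rec in audio_db
                  if video_name is None or rec["video_name"] == video_name]
    for record in candidates:
        for tag in record["scene_tags"]:
            # a tag matches when every one of its keywords appears in
            # the features extracted from the user's speech
            if set(tag["keywords"]) <= set(voice_features):
                return tag
    return None
```

With `video_name=None` the whole database is scanned, mirroring the slower no-name path described above.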
Further, the step S30 matches the voice information feature with different scene tags in preset audio data, and obtains a scene tag matched with the voice information feature, including:
step S32, judging whether the preset audio data includes the audio data corresponding to the currently played video;
if the preset audio data does not include the audio data corresponding to the currently played video, step S33 and step S34 are executed.
Step S33, a request instruction is sent to the video playback terminal.
And step S34, receiving the audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data in an audio database.
If the audio database does not contain the audio data corresponding to the video currently played by the video playing terminal, the server sends a request instruction to the video playing terminal, requiring it to send the audio data corresponding to the currently played video; after receiving the audio data sent by the video playing terminal, the server stores it in the audio database. In this way the audio data in the audio database becomes richer and more complete, and when the video the user wants to jump within is the currently played video, the scene tag the user needs can be matched promptly.
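The check-request-store flow of steps S32-S34 can be sketched as follows. The `AudioDatabase` class and the terminal's `request_audio` method are assumptions standing in for the server's storage layer and the network round trip to the terminal:

```python
# Hypothetical sketch of steps S32-S34: if the audio database lacks the
# currently played video's audio data, request it from the terminal and
# store it before attempting scene-tag matching.
class AudioDatabase:
    def __init__(self):
        self._store = {}

    def has(self, video_name):
        return video_name in self._store

    def put(self, video_name, audio_data):
        self._store[video_name] = audio_data

def ensure_audio_data(db, terminal, video_name):
    """Guarantee that audio data for video_name is in the database,
    fetching it from the playing terminal when missing."""
    if not db.has(video_name):                      # S32: check the database
        audio = terminal.request_audio(video_name)  # S33: request instruction
        db.put(video_name, audio)                   # S34: receive and store
    return db
```

The fetch happens at most once per video, after which subsequent jump requests for the same video hit the database directly.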
Further, referring to fig. 4, a third embodiment of the present invention provides a video playing skipping method, based on the embodiment shown in fig. 2, after matching the voice information feature with different scene tags in the audio data in step S30, and acquiring a scene tag matched with the voice information feature, the method further includes:
step S70, if the scene label according with the voice information characteristic is not matched in the preset time, generating a matching failure prompt;
and step S80, sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
The voice information features are matched against the different scene tags in the audio database. If the audio database contains no video object the user wants to jump to, the matching ends directly; if it does, the audio information corresponding to the video name the user wants to jump to is identified in the audio database, each scene tag corresponding to that audio information is obtained, and matching is performed against each scene tag. If no scene tag conforming to the voice information features is matched within the preset time, the matching ends. After the matching ends, a matching-failure prompt is generated and sent to the video playing terminal. On receiving the matching-failure prompt, the video playing terminal may display the prompt information directly on the video playing interface, or present it through UI controls on the terminal such as Toast or Snackbar. Of course, besides giving a matching-failure prompt, other video information in the audio database closer to the user's intention can be recommended according to the voice information features. For example, if the voice recorded by the user is "a romance film", "a trending action film", "a film by a Hong Kong director", "a Hollywood blockbuster", or the like, the server performs the matching in the audio database on its own; it can perform intelligent analysis based on user characteristics such as gender, hobbies, and usual on-demand tendencies, and select a video suited to the user so that the video playing terminal jumps to it.
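The bounded matching of steps S70/S80 can be sketched as a loop with a deadline that falls back to a failure prompt. The `match_fn` callback and the result-dict shape are illustrative assumptions:

```python
import time

# Hypothetical sketch of steps S70/S80: try each candidate scene tag
# until one matches or the preset time elapses; on failure return a
# prompt payload for the terminal to display.
def match_with_timeout(match_fn, candidates, timeout_s=2.0):
    """Run match_fn over candidates under a deadline. Returns
    {"ok": True, "tag": ...} on success, or {"ok": False, "prompt": ...}
    when nothing matches within timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    for candidate in candidates:
        if time.monotonic() > deadline:
            break                                   # preset time exceeded
        tag = match_fn(candidate)
        if tag is not None:
            return {"ok": True, "tag": tag}
    # S70: no matching scene tag within the preset time
    return {"ok": False, "prompt": "No matching scene found"}  # S80 payload
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline immune to system clock adjustments.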
Referring to fig. 5, a fourth embodiment of the present invention provides a video playing skip method, including the following steps:
step S110, collecting voice information input by a user.
In this embodiment, the video playing terminal may include both a video playing module and a voice collecting module, or include only a video playing module with an external voice collecting module, such as a microphone. Here a mobile phone is used as the video playing terminal: the user's voice information is collected through the phone's microphone, and a video playing application installed on the phone plays the video the user wants to watch.
Step S120, the user voice information is sent to a server, so that the server matches the voice information characteristics with different scene labels in the audio data, and the scene label matched with the voice information characteristics is obtained.
The mobile phone sends the user's voice information to the server. The voice information may contain a scene keyword (such as "X-zhu Xiantai"), or both a drama-name keyword and a scene keyword (such as "drama name A, plot B"), so that the server can directly parse from the voice information the video object and scene the user intends to jump to. Meanwhile, the server judges, according to the name information of the currently played video sent by the mobile phone, whether the audio database contains audio data corresponding to the currently played video; if not, the following steps are executed:
step S121, receiving an audio data request instruction sent by the server.
Step S122, sending the audio data corresponding to the currently played video to the server.
After receiving an audio data request instruction sent by the server, the mobile phone calls audio data corresponding to the currently played video from the background, packages the audio data and uploads the audio data to the server, so that the audio database of the server has the audio data corresponding to the currently played video.
And step S130, receiving the scene label matched with the voice information characteristic, and skipping the video played on the video playing terminal to a corresponding position.
The mobile phone receives the matching result sent by the server in real time. If the result is a scene tag matching the voice information features, the mobile phone makes the video playing application jump according to the position information contained in the scene tag. If the server matches no scene tag conforming to the voice information features and the mobile phone receives a matching-failure prompt, text information is displayed on the mobile phone screen to prompt the user.
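The terminal side of steps S110-S130 reduces to: send the voice command, then either seek the player to the matched position or show the failure prompt. The `player` and `server` objects below are assumptions standing in for the phone's playback application and its network layer:

```python
# Hypothetical sketch of the terminal side (steps S110-S130): dispatch
# the recognized voice text to the server, then act on the result.
def handle_voice_command(player, server, voice_text, current_video):
    """Jump the player to the matched scene, or surface the server's
    matching-failure prompt to the user."""
    result = server.match(voice_text, current_video)   # S120: send to server
    if result.get("ok"):
        player.seek(result["tag"]["position_s"])       # S130: jump to scene
        return "jumped"
    # matching failed within the preset time: show the prompt (e.g. Toast)
    player.show_toast(result.get("prompt", "Match failed"))
    return "prompted"
```

Keeping the terminal logic this thin matches the division of labor in the embodiment: all recognition, feature extraction, and tag matching stay on the server.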
In this embodiment, the video playing terminal collects voice information input by a user through a microphone, acquires name information of a currently playing video in a background, and sends the voice information of the user and the name information of the currently playing video to the server, so that the server matches the voice information features with different scene tags in the audio data, acquires a scene tag matched with the voice information features, receives the scene tag matched with the voice information features, and skips the video played on the video playing terminal to a corresponding position. The invention enables the user to directly send the voice command to realize video skipping and skip to the video scene to be watched, thereby improving the user experience.
Referring to fig. 6, the present invention is a schematic diagram of a first embodiment of a video playing skip system, where the video playing skip system includes: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user to a server;
the server receives user voice information collected by the video playing terminal, identifies the voice information, extracts the characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires a scene label matched with the voice information characteristics, and sends the scene label matched with the voice information characteristics to the video playing terminal;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a video playing skipping program is stored on the computer-readable storage medium, and when executed by the video playing terminal and the server, the video playing skipping program implements the following operations:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene labels in preset audio data to obtain scene labels matched with the voice information features;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
Further, before the step of matching the voice information features with different scene tags in preset audio data and obtaining a scene tag matched with the voice information features, the method includes:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
Further, after the step of determining whether the voice information feature includes a skip video name, the method includes:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
Further, the step of matching the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features includes:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
Further, after the step of matching the voice information features with different scene tags in preset audio data and obtaining the scene tags matched with the voice information features, the method further includes:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
The computer readable storage medium stores a video playing skip program, and when the video playing skip program is executed by the video playing terminal and the server, the following operations are further implemented:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in the audio data to obtain scene tags matched with the voice information features;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
Further, after the step of sending the user voice information and the name information of the currently played video to a server, the method further includes:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
Further, after the step of sending the user voice information and the name information of the currently played video to a server, the method further includes:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the video skipping method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a video playing terminal (e.g., a mobile phone, a computer, a television, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video playing and skipping method is characterized in that the video playing and skipping method is applied to an interactive system comprising a video playing terminal and a server, and the video playing and skipping method comprises the following steps:
receiving user voice information collected by a video playing terminal;
recognizing the user voice information and extracting the characteristics of the voice information;
matching the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features, wherein the preset audio data are a plurality of audio data with the scene tags, and each scene tag is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and sending the scene label matched with the voice information characteristic to a video playing terminal so as to control the video played on the video playing terminal to jump to a corresponding position.
2. The video play skipping method of claim 1, wherein before the step of matching the voice information feature with a different scene tag in the preset audio data and obtaining a scene tag matching the voice information feature, the method comprises:
judging whether the voice information characteristics comprise a skip video name or not;
if the voice information characteristics do not include the name of the skip video, acquiring the name of the currently played video;
the step of matching the voice information features with different scene tags in preset audio data and acquiring the scene tags matched with the voice information features comprises the following steps:
and matching the voice information characteristics and the name of the currently played video with different scene labels in the audio data to obtain the scene label matched with the voice information characteristics.
3. The video playback skipping method of claim 2, wherein after said step of determining whether a skipped video name is included in said voice information feature, comprising:
if the voice information feature comprises a skip video name, executing the following steps: and matching the voice information characteristics with different scene labels in preset audio data to obtain the scene label matched with the voice information characteristics.
4. The video playing skip method according to claim 1, wherein the step of matching the voice information feature with different scene tags in the preset audio data to obtain the scene tag matched with the voice information feature comprises:
judging whether preset audio data comprises audio data corresponding to a currently played video;
if the preset audio data does not contain the audio data corresponding to the currently played video, sending a request instruction to a video playing terminal;
and receiving audio data corresponding to the currently played video sent by the video playing terminal, and storing the audio data to preset audio data.
5. The video play skipping method of claim 1, wherein after the step of matching the voice information feature with a different scene tag in the preset audio data and obtaining a scene tag matching the voice information feature, further comprising:
if the scene label which accords with the voice information characteristic is not matched within the preset time, generating a matching failure prompt;
and sending the matching failure prompt to the video playing terminal so that the video playing terminal displays the prompt information.
6. A video playing and skipping method is characterized in that the video playing and skipping method is applied to an interactive system comprising a video playing terminal and a server, and the video playing and skipping method comprises the following steps:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice information features with different scene tags in preset audio data to obtain the scene tags matched with the voice information features, wherein the preset audio data are a plurality of audio data with the scene tags, and each scene tag is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and receiving the scene label matched with the voice information characteristics, and skipping the video played on the video playing terminal to a corresponding position.
7. The video playback skip method of claim 6, wherein after said step of sending said user voice information to a server, further comprising:
receiving an audio data request instruction sent by a server;
and sending the audio data corresponding to the currently played video to the server.
8. The video playback skip method of claim 6, wherein after said step of sending said user voice information to a server, further comprising:
and if the server does not match the scene label which accords with the voice information characteristic within the preset time, receiving a matching failure prompt and displaying the prompt in a video terminal interface so as to prompt the user.
9. A video playback skip system, said video playback skip system comprising: a video playing terminal and a server,
the video playing terminal collects voice information input by a user and sends the voice information of the user to a server;
the method comprises the steps that a server receives user voice information collected by a video playing terminal, identifies the voice information, extracts characteristics of the voice information, matches the voice information characteristics with different scene labels in preset audio data, acquires the scene labels matched with the voice information characteristics, and sends the scene labels matched with the voice information characteristics to the video playing terminal, wherein the preset audio data are a plurality of audio data with scene labels, and each scene label is arranged at the beginning, the end or the climax position of the corresponding scene audio data;
and the video playing terminal receives the scene label matched with the voice information characteristic and skips the video played on the video playing terminal to a corresponding position.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a video playback terminal and a server, implements the video playback skip method according to any one of claims 1 to 8.
CN201811654558.6A 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium Active CN109688475B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811654558.6A CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium
PCT/CN2019/126022 WO2020135161A1 (en) 2018-12-29 2019-12-17 Video playback jump method and system, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811654558.6A CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109688475A CN109688475A (en) 2019-04-26
CN109688475B true CN109688475B (en) 2020-10-02

Family

ID=66191672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811654558.6A Active CN109688475B (en) 2018-12-29 2018-12-29 Video playing skipping method and system and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN109688475B (en)
WO (1) WO2020135161A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109688475B (en) * 2018-12-29 2020-10-02 深圳Tcl新技术有限公司 Video playing skipping method and system and computer readable storage medium
CN110166845B (en) * 2019-05-13 2021-10-26 Oppo广东移动通信有限公司 Video playing method and device
CN112261436B (en) * 2019-07-04 2024-04-02 青岛海尔多媒体有限公司 Video playing method, device and system
CN111209437B (en) * 2020-01-13 2023-11-28 腾讯科技(深圳)有限公司 Label processing method and device, storage medium and electronic equipment
CN111601163B (en) * 2020-04-26 2023-03-03 百度在线网络技术(北京)有限公司 Play control method and device, electronic equipment and storage medium
CN111818172B (en) * 2020-07-21 2022-08-19 海信视像科技股份有限公司 Method and device for controlling intelligent equipment by management server of Internet of things
CN112632329A (en) * 2020-12-18 2021-04-09 咪咕互动娱乐有限公司 Video extraction method and device, electronic equipment and storage medium
CN112954426B (en) * 2021-02-07 2022-11-15 咪咕文化科技有限公司 Video playing method, electronic equipment and storage medium
CN113689856B (en) * 2021-08-20 2023-11-03 Vidaa(荷兰)国际控股有限公司 Voice control method for video playing progress of browser page and display equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869623A (en) * 2015-12-07 2016-08-17 乐视网信息技术(北京)股份有限公司 Video playing method and device based on speech recognition
CN106162357A (en) * 2016-05-31 2016-11-23 腾讯科技(深圳)有限公司 Obtain the method and device of video content
CN107135418A (en) * 2017-06-14 2017-09-05 北京易世纪教育科技有限公司 A kind of control method and device of video playback
CN107155138A (en) * 2017-06-06 2017-09-12 深圳Tcl数字技术有限公司 Video playback jump method, equipment and computer-readable recording medium
CN107506385A (en) * 2017-07-25 2017-12-22 努比亚技术有限公司 A kind of video file retrieval method, equipment and computer-readable recording medium
CN107871500A (en) * 2017-11-16 2018-04-03 百度在线网络技术(北京)有限公司 One kind plays multimedia method and apparatus
CN107948729A (en) * 2017-12-13 2018-04-20 广东欧珀移动通信有限公司 Rich Media's processing method, device, storage medium and electronic equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336093B2 (en) * 1998-01-16 2002-01-01 Avid Technology, Inc. Apparatus and method using speech recognition and scripts to capture author and playback synchronized audio and video
US9451195B2 (en) * 2006-08-04 2016-09-20 Gula Consulting Limited Liability Company Moving video tags outside of a video area to create a menu system
CN101329867A (en) * 2007-06-21 2008-12-24 西门子(中国)有限公司 Method and device for playing speech on demand
US9113128B1 (en) * 2012-08-31 2015-08-18 Amazon Technologies, Inc. Timeline interface for video content
CN105677735B (en) * 2015-12-30 2020-04-21 腾讯科技(深圳)有限公司 Video searching method and device
CN107071542B (en) * 2017-04-18 2020-07-28 百度在线网络技术(北京)有限公司 Video clip playing method and device
CN107704525A (en) * 2017-09-04 2018-02-16 优酷网络技术(北京)有限公司 Video searching method and device
CN109688475B (en) * 2018-12-29 2020-10-02 深圳Tcl新技术有限公司 Video playing skipping method and system and computer readable storage medium


Also Published As

Publication number Publication date
WO2020135161A1 (en) 2020-07-02
CN109688475A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
RU2614137C2 (en) Method and apparatus for obtaining information
EP3044725B1 (en) Generating alerts based upon detector outputs
US10979775B2 (en) Seamless switching from a linear to a personalized video stream
CN110913241B (en) Video retrieval method and device, electronic equipment and storage medium
JP2020504475A (en) Providing related objects during video data playback
US20150012840A1 (en) Identification and Sharing of Selections within Streaming Content
CN109474843B (en) Method for voice control of terminal, client and server
WO2016029561A1 (en) Display terminal-based data processing method
JP2020030814A (en) Method and apparatus for processing information
CN109829064B (en) Media resource sharing and playing method and device, storage medium and electronic device
WO2017015114A1 (en) Media production system with social media feature
CN105122242A (en) Methods, systems, and media for presenting mobile content corresponding to media content
CN112104915B (en) Video data processing method and device and storage medium
CN104065979A (en) Method for dynamically displaying information related with video content and system thereof
CN110691281B (en) Video playing processing method, terminal device, server and storage medium
CN105210376A (en) Using an audio stream to identify metadata associated with a currently playing television program
US20170272793A1 (en) Media content recommendation method and device
CN110309324B (en) Searching method and related device
CN109600646B (en) Voice positioning method and device, smart television and storage medium
CN104881407A (en) Information recommending system and information recommending method based on feature recognition
CN112579935B (en) Page display method, device and equipment
JP7058795B2 (en) Video processing methods, devices, terminals and storage media
CN110750719A (en) IPTV-based information accurate pushing system and method
WO2017008498A1 (en) Method and device for searching program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant