WO2020135161A1 - Video playback jump method, system, and computer-readable storage medium - Google Patents

Video playback jump method, system, and computer-readable storage medium

Info

Publication number
WO2020135161A1
WO2020135161A1 · PCT/CN2019/126022 · CN2019126022W
Authority
WO
WIPO (PCT)
Prior art keywords
voice information
video
audio data
video playback
server
Prior art date
Application number
PCT/CN2019/126022
Other languages
English (en)
French (fr)
Inventor
李其浪
Original Assignee
Shenzhen TCL New Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co., Ltd.
Publication of WO2020135161A1 publication Critical patent/WO2020135161A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Definitions

  • the present application relates to the technical field of video playback, and in particular to a video playback jump method, system, and computer-readable storage medium.
  • To jump within a video, the user can use the buttons on the TV remote control or the virtual buttons in the video playback software. For example, if the user presses a button once, the video playback progress jumps forward or backward by a fixed interval; if the user holds the button down, the progress keeps jumping forward or backward; if the user sets a jump time, the TV or the video playback software loads that time and then plays the video. In each case, the user must manually operate the buttons to move the video to the scene to be watched, it is difficult to complete the jump in a single operation, and the user experience is poor.
  • The main purpose of this application is to provide a video playback jump method, system, and computer-readable storage medium, which aims to solve the technical problem that users must manually operate buttons multiple times to jump a video to the scene they want to watch, resulting in a poor user experience.
  • this application provides a video playback jump method, including the following steps:
  • The step of matching the voice information feature with different scene tags in the preset audio data to obtain scene tags matching the voice information feature includes:
  • If the feature of the voice information does not include the name of the video to jump to, the name of the currently playing video is obtained;
  • the step of matching the voice information feature with different scene tags in the preset audio data, and obtaining the scene tags matching the voice information feature includes:
  • The step of determining whether the feature of the voice information includes the name of the video to jump to includes:
  • If the voice information feature includes the name of the video to jump to, then perform the step of: matching the voice information feature with different scene tags in the preset audio data to obtain a scene tag that matches the voice information feature.
  • the step of matching the voice information feature with different scene tags in the preset audio data, and obtaining the scene tags matching the voice information feature includes:
  • the method further includes:
  • a matching failure prompt is generated
  • the matching failure prompt is sent to the video playback terminal, so that the video playback terminal displays the prompt information.
  • the present application also provides a video playback jump method, including the following steps:
  • the method further includes:
  • the method further includes:
  • If the server does not match a scene tag consistent with the characteristics of the voice information within a preset time, the video playback terminal receives a matching failure prompt and displays it on the terminal interface to prompt the user.
  • the present application also provides a video playback jump system
  • the video playback jump system includes: a video playback terminal and a server
  • the video playback terminal collects the voice information input by the user, and sends the user voice information and the name information of the currently playing video to the server;
  • The server receives the user voice information collected by the video playback terminal, recognizes the voice information, extracts the characteristics of the voice information, matches the characteristics of the voice information with different scene tags in preset audio data to obtain scene tags matching the characteristics of the voice information, and sends the scene tags matching the characteristics of the voice information to the video playback terminal;
  • the video playback terminal receives the scene tag matching the characteristics of the voice information, and jumps the video played on the video playback terminal to a corresponding position.
  • Besides, the present application also provides a computer-readable storage medium storing a computer program that, when executed by the video playback terminal and the server, implements the video playback jump method described above.
  • This application is applied to an interactive system composed of a video playback terminal and a server. The server first receives user voice information collected by the video playback terminal through a voice collection module such as a microphone, then recognizes the user voice information through voice recognition and semantic recognition functions to obtain the characteristics of the user's voice information, which mainly include the video name, scene, and other information the user intends to jump to. The server matches the voice information characteristics with different scene tags in the audio data to obtain scene tags matching the features of the voice information, and finally sends the scene tags matching the features of the voice information to the video playback terminal, so that the video jumps to the corresponding position.
  • the user can realize the video jump through the voice command, and can jump to the scene that the user wants accurately, thereby improving the user's experience.
  • FIG. 1 is a schematic diagram of a system architecture involved in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a video playback jumping method of the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of a video playback jump method of the present application.
  • FIG. 4 is a schematic flowchart of a third embodiment of a video playback and jump method according to this application.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of a video playback jump method according to this application.
  • FIG. 6 is a schematic structural diagram of a first embodiment of a video playback jump system of the present application.
  • The main solutions of the embodiments of the present application are: receiving user voice information collected by a video playback terminal; recognizing the user voice information and extracting characteristics of the voice information; matching the voice information characteristics with different scene tags in preset audio data to obtain scene tags that match the features of the voice information; and sending the scene tags that match the features of the voice information to the video playback terminal to control the video played on the video playback terminal to jump to the corresponding position.
  • the present application provides a solution that enables users to jump to a video through voice commands, and can accurately jump to the scene the user wants, which improves the user's experience.
  • FIG. 1 is a schematic diagram of a system architecture of an embodiment of a video playback jumping method of the present application.
  • the system architecture 100 may include video playback terminals 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is used as a medium for providing a communication link between the video playback terminals 101, 102, 103 and the server 105.
  • the network 104 may include various wired and wireless communication links, such as fiber optic cables, mobile networks, WiFi, Bluetooth, or hotspots.
  • the user can use the video playback terminals 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages, and so on.
  • Various communication client applications can be installed on the video playback terminals 101, 102, and 103, such as video playback applications, web browser applications, shopping applications, search applications, instant communication tools, email clients, social platform software, etc.
  • the video playback terminals 101, 102, and 103 may be hardware or software.
  • The video playback terminals 101, 102, and 103 may be various electronic devices with a display screen that support video playback, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, etc.
  • If the video playback terminals 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple software or software modules (for example, to provide distributed services) or as a single software or software module. There is no specific limitation here.
  • The server 105 may be a server that provides various services, such as reading the videos played on the video playback terminals 101, 102, and 103, analyzing and processing received voice information, instruction information, and video/audio data, and feeding the processing results, such as video clips, scene tags, and instruction information, back to the video playback terminal, so that the video playback terminal performs corresponding actions according to the processing results.
  • the server can be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • If the server is software, it may be implemented as multiple software or software modules (for example, multiple software or software modules for providing distributed services), or as a single software or software module. There is no specific limitation here.
  • the video playback jumping method provided in the embodiments of the present application may be executed by the video playback terminals 101, 102, and 103, or may be executed by the server 105.
  • the device for pushing information may be installed in the video playback terminals 101, 102, 103, or may be installed in the server 105. There is no specific limitation here.
  • FIG. 1 the numbers of video playback terminals, networks, and servers in FIG. 1 are only schematic. According to the implementation needs, there can be any number of video playback terminals, networks and servers.
  • the first embodiment of the present application provides a video playback jump method, including the following steps:
  • Step S10 Receive user voice information collected by the video playback terminal.
  • This application can be applied to an interactive system composed of a video playback terminal and a server.
  • the video playback terminal and the server are connected through a network to realize interaction.
  • Taking a TV as an example of the video playback terminal, the TV collects the user's voice information in real time through its voice collection module and sends the collected voice information to the server through the wireless network.
  • the server receives user voice information sent by the TV set at the other end of the network in real time.
  • Step S20 Recognize the user's voice information and extract features of the voice information.
  • The server performs speech recognition and semantic recognition on the received user voice information, where speech recognition converts the speech information into computer-recognizable text information through an acoustic model and a language model.
  • Semantic recognition builds on speech recognition and performs intelligent analysis based on characteristics such as the user's gender, hobbies, and usual on-demand tendencies to better understand the user's intentions. If the user's input voice is the full name of a specific movie or TV series, the server only needs to perform speech recognition to find the movie or TV series the user wants to watch. If the user's input voice is a vague sentence such as "a love movie", "a hot action movie", "a Hong Kong director's movie", or "a Hollywood blockbuster", the server also needs to perform semantic recognition in order to make an accurate jump.
  • After recognition, the server can extract the characteristics of the user's voice information. For example, if the user's input voice is "Director Zhao is investigated in the TV series 'The Name of **'", the server can recognize the voice and extract the features "TV series", "The Name of **", and "Director Zhao is investigated".
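  • The feature-extraction step described above can be sketched as a naive keyword matcher. This is an illustrative assumption only: the `extract_features` helper, the category list, and the placeholder title "The Name of X" are not from the patent, and merely stand in for the server's actual speech and semantic recognition.

```python
# Hypothetical sketch of the feature-extraction step: split a recognized
# utterance into video type, title, and scene description by keyword lookup.
VIDEO_TYPES = ["TV series", "movie"]

def extract_features(recognized_text, known_titles):
    """Return the video type, title, and scene description found in the text."""
    features = {}
    for vtype in VIDEO_TYPES:
        if vtype in recognized_text:
            features["type"] = vtype
            recognized_text = recognized_text.replace(vtype, "")
    for title in known_titles:
        if title in recognized_text:
            features["title"] = title
            recognized_text = recognized_text.replace(title, "")
    features["scene"] = recognized_text.strip(" ,.")  # remainder = scene description
    return features

feats = extract_features(
    "TV series The Name of X, Director Zhao is investigated",
    known_titles=["The Name of X"],
)
```

A real implementation would use acoustic/language models and semantic analysis rather than substring lookup; the sketch only shows the shape of the extracted features.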
  • Step S30 Match the voice information features with different scene tags in the audio data to obtain scene tags that match the voice information features.
  • The server presets a large amount of audio data, and all of the audio data is voice-recognized to generate corresponding scene tags.
  • the server can generate different scene tags for different scenes in the audio data.
  • The scene tags include the video type, name, scene description, person, time, episode number, and other related information.
  • The scene tag can be placed at the beginning, end, or climax of the corresponding scene's audio information, and in this embodiment it is preferably placed at the beginning of the corresponding scene's audio information.
  • The server can obtain the corresponding video clips or subtitle information from the TV or the network according to the massive audio data in the audio database, then intelligently analyze the video clips or subtitle information and generate scene tags at the corresponding positions of the audio data.
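  • As a minimal sketch, a scene tag carrying the information listed above could be modeled as a small record anchored at the start of its scene. The field names and example values here are assumptions for illustration, not the patent's actual data format.

```python
from dataclasses import dataclass

@dataclass
class SceneTag:
    """Illustrative scene-tag record; fields mirror the information listed above."""
    video_type: str    # e.g. "TV series"
    video_name: str
    description: str   # scene description
    person: str
    episode: int
    position_s: float  # offset in seconds; the tag sits at the start of the scene

# A hypothetical tag that subtitle analysis might produce.
tag = SceneTag("TV series", "The Name of X", "Director Zhao is arrested",
               "Director Zhao", episode=2, position_s=1325.0)
```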
  • The user's intention is to make the video playback terminal jump to the time period of the corresponding video. For example, if the video playback terminal is currently playing the TV series "The Name of **" and the user's voice command at that moment is "Director Zhao in the TV series 'The Name of **' was investigated", the server will first analyze the user's voice information and extract all audio information related to the TV series "The Name of **" in the audio database.
  • Then, according to the scene information to be jumped to that is included in the characteristics of the user's voice information, the server matches that scene information with each scene tag in the audio data to find the scene tag with the highest matching degree. For example, for the voice command "Director Zhao was investigated in the TV series 'The Name of **'", the server finds all scene tags in the corresponding audio data in the audio database, such as "Director Zhao was arrested", "Chen Yanyan against the excavator", "Hou Liangping and Qi Tongwei singing 'Wisdom Fight'", and "Ouyang Jing was arrested", and then finds the scene tag that matches, namely "Director Zhao was arrested".
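  • The "highest matching degree" selection above can be sketched as a simple word-overlap score over the candidate tags. The scoring function is an assumption; the patent does not specify how the matching degree is computed.

```python
# Hedged sketch of scene-tag matching: score each candidate tag by word
# overlap with the requested scene and keep the best-scoring one.
def match_scene(scene_feature, tags):
    want = set(scene_feature.lower().split())
    def score(tag):
        return len(want & set(tag["description"].lower().split()))
    best = max(tags, key=score)
    return best if score(best) > 0 else None  # None = matching failure

tags = [
    {"description": "Director Zhao was arrested", "position_s": 1325.0},
    {"description": "Chen Yanyan against the excavator", "position_s": 210.0},
    {"description": "Ouyang Jing was arrested", "position_s": 2840.0},
]
best = match_scene("Director Zhao was investigated", tags)
```

Even though the command says "investigated" and the tag says "arrested", the overlap on "Director Zhao" still selects the intended scene, which is the behavior the example in the text describes.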
  • Step S40 Send the scene tag matching the characteristics of the voice information to the video playback terminal to control the video played on the video playback terminal to jump to the corresponding position.
  • After the server obtains the scene tag matching the characteristics of the voice information, it sends the tag to the video playback terminal, so that the video playback terminal jumps to the corresponding position according to the scene tag.
  • the server may generate a jump instruction according to the scene tag matching the voice information feature, and the jump instruction includes scene tag position information, so that the video playback terminal can The turn command jumps to the corresponding position.
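  • The jump instruction described above could, as one illustrative assumption, be serialized as a small JSON message carrying the scene tag's position information so the terminal can seek to it; the field names are hypothetical.

```python
import json

# Sketch of a jump instruction: it carries the matched tag's position so the
# video playback terminal can seek there. Field names are illustrative only.
def build_jump_instruction(tag):
    return json.dumps({
        "action": "jump",
        "video_name": tag["video_name"],
        "position_s": tag["position_s"],
    })

instruction = json.loads(build_jump_instruction(
    {"video_name": "The Name of X", "position_s": 1325.0}))
```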
  • In summary, the server receives the user's voice information collected by the video playback terminal together with the name information of the currently playing video, performs voice recognition and semantic recognition on the user's voice information to extract the characteristics of the voice information, then, according to the name information of the currently playing video, confirms that the audio database contains the audio data corresponding to the currently playing video, matches the voice information features with different scene tags in the audio data to obtain scene tags matching the voice information features, and finally sends the scene tags matching the characteristics of the voice information to the video playback terminal to control the video played on the video playback terminal to jump to the corresponding position.
  • This application recognizes the user's voice information features through the server's voice recognition function and matches scene tags against the user's voice commands according to those features, so that the video playback terminal can perform a video jump and can accurately jump to the scene the user desires, which improves the user's experience.
  • a second embodiment of the present application provides a video playback jump method.
  • Before the step of matching the voice information feature with different scene tags in the preset audio data to obtain scene tags that match the features of the voice information, the method includes:
  • Step S50 Determine whether the feature of the voice information includes the name of the video to jump to.
  • Step S30 is replaced by: Step S31: matching the voice information feature and the name of the currently playing video with different scene tags in the audio data to obtain a scene tag matching the voice information feature.
  • After obtaining the name of the currently playing video, the server matches the different scene tags in the audio data according to the user's voice and that name, and obtains scene tags matching the characteristics of the voice information. For example, if the video playback terminal is currently playing the TV series "The Name of **", the server collects the name of the currently playing video from the video playback terminal; then, given the user's voice command "Director Zhao is investigated in the TV series 'The Name of **'", the server first analyzes the user's voice information, extracts all audio information in the audio database related to the TV series "The Name of **", and then matches within that audio information according to the characteristics of the voice information. Matching scoped by the video name is faster and the result is more accurate. In addition, if the user's input voice is "jump to the grand finale", the feature "grand finale" will be extracted, and the currently playing video will be skipped to the beginning of the last episode.
  • If the voice information feature does not include the name of the video to jump to, or the name of the currently playing video cannot be obtained, the voice information feature is directly used to perform tag matching on the preset audio data. This method requires more audio data to be queried, so the query speed will be slower.
  • step S30 is executed to match the voice information feature with different scene tags in the preset audio data to obtain a scene tag matching the voice information feature.
  • The execution process of the server at this time is the same as in step S31, except that in one case the video name comes from the user's voice information, while in the other it is obtained by the server from the video playback terminal.
  • When matching within the audio database itself, the server can perform intelligent analysis based on characteristics such as the user's gender, hobbies, and usual on-demand tendencies, and select a video suitable for the user so that the video playback terminal jumps to that video.
  • The user can also issue other instructions. For example, if the voice input by the user is "forward 30 minutes", the features "forward" and "30 minutes" will be extracted, and the currently playing video will be jumped forward by 30 minutes.
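  • Parsing such a relative-jump command can be sketched with a small pattern match; the regular expression and unit table below are assumptions for illustration, not the patent's method.

```python
import re

# Illustrative parser for relative commands such as "forward 30 minutes".
UNITS = {"second": 1, "seconds": 1, "minute": 60, "minutes": 60}

def parse_relative_jump(text):
    """Return the signed offset in seconds, or None if no relative command."""
    m = re.search(r"(forward|back)\s+(\d+)\s+(seconds?|minutes?)", text.lower())
    if not m:
        return None
    sign = 1 if m.group(1) == "forward" else -1
    return sign * int(m.group(2)) * UNITS[m.group(3)]

offset = parse_relative_jump("forward 30 minutes")  # 1800 seconds
```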
  • This application uses the server to determine whether the user's voice information feature contains the name of a video to jump to, so as to jump within the currently playing video, switch playback to another video, or switch playback to the scene corresponding to another video name, and can thus better meet the needs of the public.
  • Step S30, matching the voice information feature with different scene tags in the preset audio data to obtain scene tags matching the voice information feature, includes:
  • Step S32 Determine whether the preset audio data includes the audio data corresponding to the currently playing video
  • If the preset audio data contains the audio data corresponding to the currently playing video, step S31 is executed; if there is no audio data corresponding to the currently playing video in the preset audio data, step S33 and step S34 are executed.
  • Step S33 Send a request instruction to the video playback terminal.
  • Step S34 Receive audio data corresponding to the currently playing video sent by the video playing terminal, and save the audio data to an audio database.
  • the server sends a request instruction to the video playback terminal.
  • the request instruction requires the video playback terminal to send the audio data corresponding to the currently playing video.
  • After the server receives the audio data sent by the video playback terminal, it saves the audio data to the audio database. In this way, the audio data in the audio database becomes richer and more complete, and when the video the user wants to jump within is the currently playing video, the scene tag required by the user can be matched in time.
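  • The check-then-request flow of steps S32 to S34 can be sketched as a small cache: if the audio database lacks the currently playing video, the server requests it from the terminal and saves the reply. The in-memory dictionary and the `request_from_terminal` callback are stand-ins for the real database and network call.

```python
# Hypothetical sketch of steps S32-S34 on the server side.
class AudioDatabase:
    def __init__(self, request_from_terminal):
        self._store = {}                      # video name -> audio data
        self._request = request_from_terminal  # network call stand-in

    def get_audio(self, video_name):
        if video_name not in self._store:          # step S32: not present
            audio = self._request(video_name)      # step S33: request it
            self._store[video_name] = audio        # step S34: save it
        return self._store[video_name]

calls = []
db = AudioDatabase(lambda name: calls.append(name) or b"\x00audio")
first = db.get_audio("The Name of X")   # triggers a request to the terminal
second = db.get_audio("The Name of X")  # served from the audio database
```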
  • a third embodiment of the present application provides a video playback jump method.
  • After the step of matching the characteristics of the voice information with different scene tags in the audio data, the method further includes:
  • Step S70 If a scene tag that matches the characteristics of the voice information is not matched within a preset time, a matching failure prompt is generated;
  • Step S80 Send a matching failure prompt to the video playback terminal, so that the video playback terminal displays the prompt information.
  • When matching the voice information feature with different scene tags in the audio database, if the video the user wants to jump to does not exist in the audio database, the matching is ended directly; if it does exist, the audio information corresponding to that video name is identified in the audio database, and each scene tag corresponding to the audio information is obtained and matched against the voice information feature. If no scene tag matching the characteristics of the voice information is matched within a preset time, the matching ends. After the matching ends, a matching failure prompt is generated and sent to the video playback terminal.
  • The video playback terminal receives the matching failure prompt information, which can be displayed directly on the video playback interface or shown through a user-prompt control on the terminal, such as a Toast or Snackbar.
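  • The preset-time guard around matching (steps S70 and S80) can be sketched as follows. Where exactly the deadline is checked (here, between candidate tags) is an assumption, as is the substring-based match.

```python
import time

# Sketch of matching with a preset timeout: if no tag matches before the
# deadline, a matching failure prompt is produced for the terminal.
def match_with_timeout(scene_feature, tags, timeout_s=2.0, clock=time.monotonic):
    deadline = clock() + timeout_s
    for tag in tags:
        if clock() > deadline:                 # preset time exceeded
            break
        if scene_feature.lower() in tag["description"].lower():
            return {"status": "ok", "tag": tag}
    return {"status": "fail", "prompt": "No matching scene found"}

result = match_with_timeout("director zhao",
                            [{"description": "Ouyang Jing was arrested"}])
```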
  • The matching result may also recommend to the user other video information in the audio database that is closer to the user's intention, according to the voice information feature. If the voice input by the user is "a love movie", "a popular action movie", "a Hong Kong director's movie", "a Hollywood blockbuster", or the like, the server matches within the audio database and can perform intelligent analysis based on characteristics such as the user's gender, hobbies, and usual on-demand tendencies to select a video suitable for the user, so that the video playback terminal jumps to that video.
  • a fourth embodiment of the present application provides a video playback jump method, including the following steps:
  • Step S110 Collect voice information input by the user.
  • the video playback terminal may include a video playback module and a voice collection module; it may also include only a video playback module, and then an external voice collection module, such as a microphone.
  • Mobile phones, televisions, computers, etc. can all be used as video playback terminals.
  • the mobile phone is used as the video playback terminal.
  • The user's voice information is collected through the mobile phone's microphone, and a video playback application installed on the mobile phone plays the video the user wants to watch.
  • Step S120 Send the user's voice information to the server, so that the server matches the voice information feature with different scene tags in the audio data to obtain a scene tag matching the voice information feature.
  • The user's voice information is sent to the server through the mobile phone. The voice information may include scene keywords (for example, "XJiaozhuxiantai"), and may also include title keywords together with scene keywords (for example, "play name A, plot B"), so that the server can directly parse out the video object and scene information the user intends to jump to from the voice information. At the same time, the server can determine, according to the name information of the currently playing video sent by the mobile phone, whether the audio database contains the audio data corresponding to the currently playing video. If there is no such audio data, perform the following steps:
  • Step S121 Receive an audio data request instruction sent by the server.
  • Step S122 Send audio data corresponding to the currently playing video to the server.
  • After receiving the audio data request instruction sent by the server, the mobile phone retrieves the audio data corresponding to the currently playing video from the background, packages it, and uploads it to the server, so that the server's audio database contains the audio data corresponding to the currently playing video.
  • Step S130 Receive the scene tag that matches the characteristics of the voice information, and jump the video played on the video playback terminal to a corresponding position.
  • the mobile phone receives the matching result sent by the server in real time. If the matching result is a scene tag matching the characteristics of the voice information, the video playback application is jumped according to the location information contained in the scene tag. If the server does not match a scene tag that matches the characteristics of the voice information, the mobile phone receives a matching failure prompt, and then displays text information on the mobile phone screen to prompt the user.
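  • The terminal-side handling of the matching result (step S130) can be sketched with a hypothetical `Player` class standing in for the video playback application: on success it seeks to the tag's position, on failure it surfaces the prompt text.

```python
# Client-side sketch: apply the server's matching result to the player.
class Player:
    def __init__(self):
        self.position_s = 0.0   # current playback position in seconds
        self.message = None     # last prompt shown to the user

    def handle_match_result(self, result):
        if result.get("status") == "ok":
            # Jump to the position carried by the matched scene tag.
            self.position_s = result["tag"]["position_s"]
        else:
            # Surface the failure prompt, e.g. via a Toast on the screen.
            self.message = result.get("prompt", "Match failed")

player = Player()
player.handle_match_result({"status": "ok", "tag": {"position_s": 1325.0}})
```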
  • In this embodiment, the video playback terminal collects the voice information input by the user through its microphone, obtains the name of the currently playing video in the background, and sends both to the server, so that the server matches the voice-information features against the different scene tags in the audio data and obtains the matching scene tag; the terminal then receives that scene tag and jumps the video playing on it to the corresponding position.
  • This application thus lets a user jump the video, directly to the scene they want to watch, simply by issuing a voice command, improving the user experience.
  • FIG. 6 is a schematic diagram of the first embodiment of a video playback jump system of this application.
  • The video playback jump system includes a video playback terminal and a server:
  • the video playback terminal collects voice information input by the user and sends the user voice information to the server;
  • the server receives the user voice information collected by the video playback terminal, recognizes it, extracts the features of the voice information, matches those features against the different scene tags in preset audio data, obtains the scene tag matching the features, and sends it to the video playback terminal;
  • the video playback terminal receives the scene tag matching the voice-information features and jumps the video playing on the terminal to the corresponding position.
  • Embodiments of this application also provide a computer-readable storage medium storing a video playback jump program which, when executed by a video playback terminal and a server, implements the operations of the method described above.
  • Before the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the operations include:
  • determining whether the voice-information features include the name of the video to jump to; if they do not, obtaining the name of the currently playing video;
  • the matching step then includes: matching the voice-information features together with the name of the currently playing video against the different scene tags in the audio data to obtain the matching scene tag.
  • After the step of determining whether the voice-information features include the name of the video to jump to, the operations include:
  • if the features do include the name of the video to jump to, performing the step of matching the voice-information features against different scene tags in the preset audio data to obtain the matching scene tag.
  • The matching step further includes: determining whether the preset audio data contains audio data corresponding to the currently playing video; if not, sending a request instruction to the video playback terminal and saving the audio data it returns into the preset audio data.
  • After the matching step, the method further includes:
  • if no scene tag matching the voice-information features is found within a preset time, generating a matching-failure prompt;
  • sending the matching-failure prompt to the video playback terminal so that the terminal displays the prompt information.
  • A video playback jump program is stored on the computer-readable storage medium; when executed by the video playback terminal and the server, the program also implements the following operations: collecting voice information input by the user; sending the user voice information to the server so that the server matches the voice-information features against the different scene tags in the audio data and obtains the matching scene tag; receiving that scene tag and jumping the video playing on the terminal to the corresponding position.
  • After the step of sending the user voice information and the name of the currently playing video to the server, the method further includes: receiving an audio-data request instruction from the server, and sending the audio data corresponding to the currently playing video to the server.
  • After the same sending step, the method further includes: if the server fails to match a scene tag conforming to the voice-information features within a preset time, receiving a matching-failure prompt and displaying it on the video terminal interface to notify the user.
  • The methods in the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on such an understanding, the technical solution of this application, in essence or in the part that contributes beyond the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a video playback terminal (which may be a mobile phone, computer, television, network device, etc.) to perform the methods described in the embodiments of this application.


Abstract

This application discloses a video playback jump method, system, and computer-readable storage medium. The method includes: receiving user voice information collected by a video playback terminal; recognizing the user voice information and extracting its features; matching the voice-information features against different scene tags in preset audio data to obtain the scene tag matching the features; and sending the matching scene tag to the video playback terminal to control the video playing on the terminal to jump to the corresponding position. The application also discloses a video playback jump system and a computer-readable storage medium. Through the server's speech recognition and semantic recognition, this application lets a user jump the video with a voice command alone, improving the user experience.

Description

Video playback jump method, system, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 201811654558.6, filed with the China National Intellectual Property Administration on December 29, 2018 and entitled "Video playback jump method, system and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of video playback, and in particular to a video playback jump method, system, and computer-readable storage medium.
Background
With the development of Internet technology, people no longer rely solely on receiving live television signals to watch live video; instead, they watch any video available on the network, including live streams, over the Internet. This not only lets users choose the type of video according to their own preferences, but also lets them adjust the playback progress at will while watching, jumping directly to the scene they want to see.
When adjusting playback progress, a user relies on the buttons of a TV remote control or the virtual buttons of video playback software: pressing a button jumps the playback position forward or backward by a fixed interval; holding a button down keeps jumping the position forward or backward; or the user sets a target time and the TV or playback software loads that position and resumes playback. The user therefore has to operate buttons manually to reach the desired scene, and it is hard to get there in one step, so the user experience is poor.
Technical Solution
The main purpose of this application is to provide a video playback jump method, system, and computer-readable storage medium, aiming to solve the technical problem that a user must manually operate buttons many times to jump the video to the desired scene, resulting in a poor user experience.
To achieve the above purpose, this application provides a video playback jump method including the following steps:
receiving user voice information collected by a video playback terminal;
recognizing the user voice information and extracting features of the voice information;
matching the voice-information features against different scene tags in preset audio data to obtain a scene tag matching the features;
sending the scene tag matching the voice-information features to the video playback terminal to control the video playing on the terminal to jump to the corresponding position.
Preferably, before the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the method includes:
determining whether the voice-information features include the name of the video to jump to;
if the voice-information features do not include the name of the video to jump to, obtaining the name of the currently playing video;
the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag then includes:
matching the voice-information features and the name of the currently playing video against the different scene tags in the audio data to obtain the scene tag matching the voice-information features.
Preferably, after the step of determining whether the voice-information features include the name of the video to jump to, the method includes:
if the voice-information features do include the name of the video to jump to, performing the step of matching the voice-information features against different scene tags in the preset audio data to obtain the matching scene tag.
Preferably, the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag includes:
determining whether the preset audio data includes audio data corresponding to the currently playing video;
if the preset audio data does not contain audio data corresponding to the currently playing video, sending a request instruction to the video playback terminal;
receiving the audio data corresponding to the currently playing video sent by the terminal, and saving it into the preset audio data.
Preferably, after the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the method further includes:
if no scene tag matching the voice-information features is found within a preset time, generating a matching-failure prompt;
sending the matching-failure prompt to the video playback terminal so that the terminal displays the prompt information.
In addition, to achieve the above purpose, this application also provides a video playback jump method including the following steps:
collecting voice information input by a user;
sending the user voice information to a server so that the server matches the voice-information features against the different scene tags in the audio data and obtains the matching scene tag;
receiving the scene tag matching the voice-information features and jumping the video playing on the video playback terminal to the corresponding position.
Preferably, after the step of sending the user voice information and the name of the currently playing video to the server, the method further includes:
receiving an audio-data request instruction sent by the server;
sending the audio data corresponding to the currently playing video to the server.
Preferably, after the step of sending the user voice information to the server, the method further includes:
if the server fails to match a scene tag conforming to the voice-information features within a preset time, receiving a matching-failure prompt and displaying it on the video terminal interface to notify the user.
In addition, to achieve the above purpose, this application also provides a video playback jump system including a video playback terminal and a server, wherein:
the video playback terminal collects voice information input by the user and sends the user voice information and the name of the currently playing video to the server;
the server receives the user voice information collected by the terminal, recognizes it, extracts the features of the voice information, matches those features against the different scene tags in preset audio data, obtains the matching scene tag, and sends it to the terminal;
the video playback terminal receives the scene tag matching the voice-information features and jumps the video playing on the terminal to the corresponding position.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium storing a computer program which, when executed by a video playback terminal and a server, implements the video playback jump method described above.
This application applies to an interactive system composed of a video playback terminal and a server. The server first receives the user voice information collected by the terminal through a voice collection module such as a microphone, recognizes it using speech recognition and semantic recognition, and obtains the features of the voice information, which mainly include the name of the video and the scene the user intends to jump to. The server then matches these features against the different scene tags in the audio data and obtains the matching scene tag, and finally sends it to the terminal so that the video jumps to the corresponding position. The user can thus jump the video with a voice command alone and accurately reach the desired scene, improving the user experience.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the system architecture involved in the embodiments of this application;
FIG. 2 is a flowchart of the first embodiment of the video playback jump method of this application;
FIG. 3 is a flowchart of the second embodiment of the video playback jump method of this application;
FIG. 4 is a flowchart of the third embodiment of the video playback jump method of this application;
FIG. 5 is a flowchart of the fourth embodiment of the video playback jump method of this application;
FIG. 6 is a structural diagram of the first embodiment of the video playback jump system of this application.
The realization of the purpose, functional features, and advantages of this application will be further explained with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The main solution of the embodiments of this application is: receive user voice information collected by a video playback terminal; recognize the user voice information and extract the features of the voice information; match the voice-information features against different scene tags in preset audio data to obtain a scene tag matching the features; send the matching scene tag to the video playback terminal to control the video playing on the terminal to jump to the corresponding position.
Because the prior art cannot jump video playback to the corresponding scene position based on the scene features in the user's voice, this application is needed to solve the problem.
This application provides a solution that lets a user jump the video with a voice command alone, accurately reaching the desired scene and improving the user experience.
FIG. 1 is a schematic diagram of the system architecture of the embodiments of the video playback jump method of this application.
Referring to FIG. 1, the system architecture 100 may include video playback terminals 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminals 101, 102, 103 and the server 105, and may include various wired and wireless links, such as fiber-optic cable, mobile networks, WiFi, Bluetooth, or hotspots.
Users may use the terminals 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages. Various communication client applications may be installed on the terminals, such as video playback applications, web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminals 101, 102, 103 may be hardware or software. As hardware, they may be any electronic device with a display screen that supports video playback, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, and desktop computers. As software, they may be installed in the electronic devices listed above and implemented as multiple software modules (for example, to provide distributed services) or as a single software module; no specific limitation is made here.
The server 105 may provide various services, for example reading the video playing on the terminals 101, 102, 103, analyzing received voice information, instruction information, and video/audio data, and feeding processing results, such as video clips, scene tags, and instructions, back to the terminal so that the terminal can complete the corresponding action.
Note that the server may be hardware or software. As hardware, it may be implemented as a distributed cluster of multiple servers or as a single server; as software, it may be implemented as multiple software modules (for example, to provide distributed services) or as a single software module. No specific limitation is made here.
Note that the video playback jump method provided by the embodiments of this application may be executed by the terminals 101, 102, 103 or by the server 105; accordingly, the apparatus for pushing information may be arranged in the terminals or in the server. No specific limitation is made here.
It should be understood that the numbers of video playback terminals, networks, and servers in FIG. 1 are merely illustrative; there may be any number of each as required.
Referring to FIG. 2, the first embodiment of this application provides a video playback jump method including the following steps:
Step S10: Receive user voice information collected by a video playback terminal.
This application may be applied to an interactive system composed of a video playback terminal and a server connected over a network. In this embodiment the terminal is a television: its voice collection module collects the user's voice information in real time and sends it to the server over a wireless network, and the server receives the user voice information from the television at the other end of the network in real time.
Step S20: Recognize the user voice information and extract the features of the voice information.
The server performs speech recognition and semantic recognition on the received user voice information. Speech recognition converts the voice into text a computer can process, using an acoustic model and a language model. Semantic recognition, built on top of speech recognition, performs intelligent analysis based on features such as the user's gender, interests, and usual viewing habits to better understand the user's intent. If the recorded voice is the full name of a specific film or TV series, speech recognition alone is enough to find what the user wants to watch; if the voice is a vague phrase such as "a romance film", "a popular action film", "a film by a Hong Kong director", or "a Hollywood blockbuster", the server also needs semantic recognition to jump accurately.
Based on these recognition functions, the server can extract the features of the user's voice information. For example, if the user says "the scene in the TV series In the Name of ** where Director Zhao is investigated", the server recognizes the voice and extracts the features "TV series", "In the Name of **", and "Director Zhao is investigated".
Step S30: Match the voice-information features against the different scene tags in the audio data to obtain a scene tag matching the features.
A massive amount of audio data is preset in the server, and all of it is marked by speech recognition to generate scene tags: different scenes in the audio data get different scene tags, each containing information such as video type, title, scene description, characters, time, and episode number. A scene tag may be placed at the beginning, end, or climax of the corresponding scene's audio; in this application it is preferably placed at the scene's starting position.
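As a purely illustrative sketch (the patent prescribes no data format), the tag fields listed above might be modelled as a simple record:

```python
from dataclasses import dataclass, field

@dataclass
class SceneTag:
    """One labelled scene in a video's audio track, mirroring the fields the
    description lists: type, title, scene description, characters, time,
    and episode number."""
    video_type: str                 # e.g. "TV series"
    title: str                      # video name
    description: str                # short scene description
    characters: list = field(default_factory=list)
    position: float = 0.0           # seconds; preferably the scene's start
    episode: int = 1
```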
Note that, besides the above implementation, the server can also obtain the corresponding video clips or subtitle information from the television or the network based on the massive audio data in the audio database, intelligently analyze those clips or subtitles, and generate scene tags at the corresponding positions in the audio data.
In this embodiment, the user intends the terminal to jump to the corresponding time range of the corresponding video. For example, the terminal is currently playing the TV series In the Name of **, and the user speaks the command "the scene in In the Name of ** where Director Zhao is investigated"; based on the user's voice information, the server first extracts from the audio database all audio information related to In the Name of **.
Based on the scene information contained in the user's voice features, the server matches the target scene against the scene tags in that audio data and finds the tag with the highest matching degree. For the command above, it finds all scene tags of the corresponding audio data, such as "Director Zhao is arrested", "Chen Yanshi confronts the excavator", "Hou Liangping and Qi Tongwei sing Outwitting", and "Ouyang Jing is arrested", and selects the one matching "Director Zhao is arrested".
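The patent does not specify how the "highest matching degree" is computed; a naive keyword-overlap score is one hypothetical way to sketch the idea of picking the best scene tag:

```python
def best_scene_tag(feature_words, tags):
    """Pick the scene tag whose description shares the most words with the
    features extracted from the voice command; None if nothing overlaps."""
    def score(tag):
        return len(set(feature_words) & set(tag["description"].split()))
    best = max(tags, key=score, default=None)
    return best if best is not None and score(best) > 0 else None
```

A production system would use semantic similarity rather than literal word overlap, but the control flow is the same: score every tag, return the top one, or nothing if no tag matches at all.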
Step S40: Send the scene tag matching the voice-information features to the video playback terminal to control the video playing on the terminal to jump to the corresponding position.
After obtaining the scene tag matching the voice-information features, the server sends it to the terminal so that the terminal jumps to the corresponding position according to the tag.
Note that, besides the above implementation, the server may generate a jump instruction containing the tag's position information according to the matching scene tag, so that the terminal can jump to the corresponding position according to that instruction.
In this embodiment, the server receives the user voice information collected by the terminal and the name of the currently playing video, performs speech and semantic recognition on the voice to extract its features, confirms from the video name that the audio database contains the corresponding audio data, matches the voice features against the different scene tags in that audio data, obtains the matching scene tag, and sends it to the terminal to control the playing video to jump to the corresponding position. By recognizing the user's voice features through the server and matching them to a scene tag conforming to the voice command, the terminal can jump the video accurately to the scene the user wants, improving the user experience.
Further, referring to FIG. 3, the second embodiment of this application provides a video playback jump method. Based on the embodiment shown in FIG. 2, before the step S30 of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the method includes:
Step S50: Determine whether the voice-information features include the name of the video to jump to.
To improve the accuracy of the query result, this embodiment also determines, before matching tags, whether the voice-information features include the name of the video to jump to; if they do not, step S60 is performed to obtain the name of the currently playing video.
In this case the user's voice command contains no target video name, which those skilled in the art will understand to mean that the jump target is the video the terminal is currently playing, so the server obtains the name of the currently playing video from the terminal. Step S30 is then replaced by step S31: match the voice-information features together with the name of the currently playing video against the different scene tags in the audio data to obtain the matching scene tag.
After the video name is obtained, the user's voice and the name of the currently playing video are matched against the different scene tags in the audio data to obtain the matching scene tag. For example, the terminal is currently playing In the Name of **; the server obtains the currently playing video from the terminal, and for the voice command "the scene in In the Name of ** where Director Zhao is investigated" it first extracts from the audio database all audio information related to that series and then matches the voice features within it. Matching by video name first in this way is faster and gives more accurate results. As another example, if the user says "jump to the finale", the feature "finale" is extracted and the currently playing video jumps to the start of the last episode.
Of course, if the voice-information features do not include a target video name, the name of the currently playing video need not be obtained: the voice features can be matched directly against the preset audio data, but this requires querying much more audio data, so the query is slower.
If the voice-information features do include the name of the video to jump to, step S30 is performed: match the voice-information features against the different scene tags in the preset audio data to obtain the matching scene tag.
The server's execution here is the same as in step S31; the difference is that in one case the video name comes from the user's voice information, while in the other the server obtains it from the video playback terminal.
In addition, if the user's voice is a phrase such as "a romance film", "a popular action film", "a film by a Hong Kong director", or "a Hollywood blockbuster" that contains no specific TV-series or film title, the server matches within the voice database by itself: it can intelligently analyze features such as the user's gender, interests, and usual viewing habits to choose a suitable video for the terminal to jump to. The user can also issue other commands; for example, for the voice "forward 30 minutes" the features "forward" and "30 minutes" are extracted and the currently playing video jumps forward 30 minutes.
By having the server determine whether the user's voice features contain a target video name, this application can jump within the currently playing video, switch playback to another video, or switch playback to the corresponding scene of another video, better meeting general needs.
Further, step S30 of matching the voice-information features against different scene tags in the preset audio data to obtain the matching scene tag includes:
Step S32: Determine whether the preset audio data includes audio data corresponding to the currently playing video.
If the preset audio data does not contain audio data corresponding to the currently playing video, perform step S33 and then step S34.
Step S33: Send a request instruction to the video playback terminal.
Step S34: Receive the audio data corresponding to the currently playing video sent by the terminal, and save it into the audio database.
If the audio database lacks the audio data of the video the terminal is currently playing, the server sends the terminal a request instruction asking it to send that audio data; after receiving the audio data from the terminal, the server saves it into the audio database. This makes the audio data in the database richer and more complete, and when the video object the user wants to jump to is the currently playing video, the required scene tag can be matched in time.
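This cache-or-request behavior can be sketched as follows (hypothetical function and message names; the patent defines only the behavior, not an interface):

```python
def check_audio_cached(audio_db, title):
    """Server-side check: if the current video's audio is absent from the
    database, answer with a request for the terminal to upload it."""
    if title in audio_db:
        return {"type": "proceed"}
    return {"type": "audio_data_request", "title": title}

def store_uploaded_audio(audio_db, upload):
    """Save an uploaded payload so later commands can be matched at once."""
    audio_db[upload["title"]] = upload["audio"]
```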
Further, referring to FIG. 4, the third embodiment of this application provides a video playback jump method. Based on the embodiment shown in FIG. 2, after the step S30 of matching the voice-information features against different scene tags in the audio data to obtain a matching scene tag, the method further includes:
Step S70: If no scene tag matching the voice-information features is found within a preset time, generate a matching-failure prompt.
Step S80: Send the matching-failure prompt to the video playback terminal so that the terminal displays the prompt information.
The voice-information features are matched against the different scene tags in the audio database. If the database contains no video object the user wants to jump to, matching ends directly; if it does, the audio information corresponding to the target video name is identified in the database, its scene tags are obtained, and matching is performed against each of them. If no scene tag matching the voice-information features is found within the preset time, matching ends; after that, a matching-failure prompt is generated and sent to the video playback terminal. On receiving the failure prompt, the terminal may display it directly on the playback interface or through user-prompt controls on the terminal such as Toast or Snackbar. Of course, besides a failure prompt, the matching result may also recommend to the user other videos in the audio database that are closer to the intent, based on the voice-information features. If the user's voice is "a romance film", "a popular action film", "a film by a Hong Kong director", or "a Hollywood blockbuster", the server matches within the voice database by itself, and can intelligently analyze features such as the user's gender, interests, and usual viewing habits to choose a suitable video for the terminal to jump to.
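One minimal way to sketch the time-bounded matching above, using an abstract cost budget instead of a wall clock so the behavior is deterministic (all names here are illustrative assumptions, not the patent's):

```python
def match_with_deadline(candidates, is_match, budget, cost_per_tag=1):
    """Scan candidate scene tags under an abstract time budget; if the budget
    is exhausted before any tag matches, report failure so the server can
    send a matching-failure prompt to the terminal."""
    spent = 0
    for tag in candidates:
        if spent >= budget:
            break  # preset time exceeded
        spent += cost_per_tag
        if is_match(tag):
            return {"status": "matched", "scene_tag": tag}
    return {"status": "failed", "message": "Matching failed within the preset time"}
```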
Referring to FIG. 5, the fourth embodiment of this application provides a video playback jump method including the following steps:
Step S110: Collect voice information input by the user.
In this embodiment the video playback terminal may contain both a video playback module and a voice collection module, or only a playback module with an external voice collection module such as a microphone. A mobile phone, television, or computer can all serve as the terminal; in this embodiment a mobile phone is used as the video playback terminal, its microphone collects the user's voice information, and a video playback application installed on the phone plays the video the user wants to watch.
Step S120: Send the user voice information to the server so that the server matches the voice-information features against the different scene tags in the audio data and obtains the matching scene tag.
The phone sends the user's voice information to the server. The voice information may include only a scene keyword (for example, "X jumps off the Zhuxian Platform"), or both a title keyword and a scene keyword (for example, "title A, plot B"), so that the server can parse directly from the voice information the video object and scene the user intends to jump to. Meanwhile, using the name of the currently playing video sent by the phone, the server determines whether the audio database contains the corresponding audio data; if not, the following steps are performed:
Step S121: Receive the audio-data request instruction sent by the server.
Step S122: Send the audio data corresponding to the currently playing video to the server.
After receiving the audio-data request instruction from the server, the phone retrieves the audio data corresponding to the currently playing video from the background, packages it, and uploads it to the server, so that the server's audio database then contains that audio data.
Step S130: Receive the scene tag matching the voice-information features and jump the video playing on the terminal to the corresponding position.
The phone receives the matching result from the server in real time. If the result is a scene tag matching the voice-information features, the playback application jumps according to the position information contained in that tag; if the server matched no suitable scene tag, the phone receives a matching-failure prompt and displays text on its screen to notify the user.
In this embodiment, the video playback terminal collects the user's voice information through the microphone, obtains the name of the currently playing video in the background, and sends both to the server, so that the server matches the voice-information features against the scene tags in the audio data and obtains the matching scene tag; the terminal then receives that tag and jumps the playing video to the corresponding position. The user can thus jump the video, directly to the desired scene, simply by issuing a voice command, improving the user experience.
Referring to FIG. 6, a schematic diagram of the first embodiment of a video playback jump system of this application, the system includes a video playback terminal and a server:
the video playback terminal collects voice information input by the user and sends the user voice information to the server;
the server receives the user voice information collected by the terminal, recognizes it, extracts the features of the voice information, matches those features against the different scene tags in preset audio data, obtains the matching scene tag, and sends it to the terminal;
the video playback terminal receives the scene tag matching the voice-information features and jumps the video playing on the terminal to the corresponding position.
In addition, the embodiments of this application also provide a computer-readable storage medium storing a video playback jump program which, when executed by a video playback terminal and a server, implements the following operations:
receiving user voice information collected by a video playback terminal;
recognizing the user voice information and extracting features of the voice information;
matching the voice-information features against different scene tags in preset audio data to obtain a scene tag matching the features;
sending the scene tag matching the voice-information features to the video playback terminal to control the video playing on the terminal to jump to the corresponding position.
Further, before the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the operations include:
determining whether the voice-information features include the name of the video to jump to;
if the voice-information features do not include the name of the video to jump to, obtaining the name of the currently playing video;
the matching step then includes:
matching the voice-information features and the name of the currently playing video against the different scene tags in the audio data to obtain the matching scene tag.
Further, after the step of determining whether the voice-information features include the name of the video to jump to, the operations include:
if the features do include the name of the video to jump to, performing the step of matching the voice-information features against different scene tags in the preset audio data to obtain the matching scene tag.
Further, the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag includes:
determining whether the preset audio data includes audio data corresponding to the currently playing video;
if the preset audio data does not contain audio data corresponding to the currently playing video, sending a request instruction to the video playback terminal;
receiving the audio data corresponding to the currently playing video sent by the terminal, and saving it into the preset audio data.
Further, after the matching step, the operations further include:
if no scene tag matching the voice-information features is found within a preset time, generating a matching-failure prompt;
sending the matching-failure prompt to the video playback terminal so that the terminal displays the prompt information.
The computer-readable storage medium stores a video playback jump program which, when executed by the video playback terminal and the server, also implements the following operations:
collecting voice information input by the user;
sending the user voice information to the server so that the server matches the voice-information features against the different scene tags in the audio data and obtains the matching scene tag;
receiving the scene tag matching the voice-information features and jumping the video playing on the terminal to the corresponding position.
Further, after the step of sending the user voice information and the name of the currently playing video to the server, the operations further include:
receiving an audio-data request instruction sent by the server;
sending the audio data corresponding to the currently playing video to the server.
Further, after the step of sending the user voice information and the name of the currently playing video to the server, the operations further include:
if the server fails to match a scene tag conforming to the voice-information features within a preset time, receiving a matching-failure prompt and displaying it on the video terminal interface to notify the user.
The specific embodiments of the computer-readable storage medium of this application are basically the same as the embodiments of the video playback jump method described above and are not repeated here.
Note that, as used herein, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes it.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes beyond the prior art, can be embodied as a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions that cause a video playback terminal (which may be a mobile phone, computer, television, network device, etc.) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of this specification and the accompanying drawings, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (16)

  1. A video playback jump method, comprising the following steps:
    receiving user voice information collected by a video playback terminal;
    recognizing the user voice information and extracting features of the voice information;
    matching the voice-information features against different scene tags in preset audio data to obtain a scene tag matching the voice-information features;
    sending the scene tag matching the voice-information features to the video playback terminal, so as to control the video playing on the video playback terminal to jump to a corresponding position.
  2. The video playback jump method of claim 1, wherein, before the step of matching the voice-information features against different scene tags in the preset audio data to obtain a scene tag matching the voice-information features, the method comprises:
    determining whether the voice-information features include the name of the video to jump to;
    if the voice-information features do not include the name of the video to jump to, obtaining the name of the currently playing video;
    the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag comprises:
    matching the voice-information features and the name of the currently playing video against the different scene tags in the audio data to obtain the scene tag matching the voice-information features.
  3. The video playback jump method of claim 2, wherein, after the step of determining whether the voice-information features include the name of the video to jump to, the method comprises:
    if the voice-information features include the name of the video to jump to, performing the step of: matching the voice-information features against different scene tags in the preset audio data to obtain the scene tag matching the voice-information features.
  4. The video playback jump method of claim 1, wherein the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag comprises:
    determining whether the preset audio data includes audio data corresponding to the currently playing video;
    if the preset audio data does not contain audio data corresponding to the currently playing video, sending a request instruction to the video playback terminal;
    receiving the audio data corresponding to the currently playing video sent by the video playback terminal, and saving the audio data into the preset audio data.
  5. The video playback jump method of claim 1, wherein, after the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag, the method further comprises:
    if no scene tag matching the voice-information features is found within a preset time, generating a matching-failure prompt;
    sending the matching-failure prompt to the video playback terminal so that the video playback terminal displays the prompt information.
  6. A video playback jump method, comprising the following steps:
    collecting voice information input by a user;
    sending the user voice information to a server so that the server matches the voice-information features against the different scene tags in the audio data and obtains a scene tag matching the voice-information features;
    receiving the scene tag matching the voice-information features, and jumping the video playing on the video playback terminal to a corresponding position.
  7. The video playback jump method of claim 6, wherein, after the step of sending the user voice information to the server, the method further comprises:
    receiving an audio-data request instruction sent by the server;
    sending the audio data corresponding to the currently playing video to the server.
  8. The video playback jump method of claim 6, wherein, after the step of sending the user voice information to the server, the method further comprises:
    if the server fails to match a scene tag conforming to the voice-information features within a preset time, receiving a matching-failure prompt and displaying it on the video terminal interface to notify the user.
  9. A video playback jump system, comprising a video playback terminal and a server, wherein:
    the video playback terminal collects voice information input by a user and sends the user voice information to the server;
    the server receives the user voice information collected by the video playback terminal, recognizes the voice information, extracts features of the voice information, matches the voice-information features against different scene tags in preset audio data, obtains the scene tag matching the voice-information features, and sends the matching scene tag to the video playback terminal;
    the video playback terminal receives the scene tag matching the voice-information features and jumps the video playing on the video playback terminal to a corresponding position.
  10. The video playback jump system of claim 9, wherein the server determines whether the voice-information features include the name of the video to jump to;
    if the voice-information features do not include the name of the video to jump to, the server obtains the name of the currently playing video;
    the step of matching the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag comprises:
    matching the voice-information features and the name of the currently playing video against the different scene tags in the audio data to obtain the scene tag matching the voice-information features.
  11. The video playback jump system of claim 9, wherein,
    if the voice-information features include the name of the video to jump to, the following step is performed: the server matches the voice-information features against different scene tags in the preset audio data to obtain the scene tag matching the voice-information features.
  12. The video playback jump system of claim 9, wherein the server determines whether the preset audio data includes audio data corresponding to the currently playing video;
    if the preset audio data does not contain audio data corresponding to the currently playing video, the server sends a request instruction to the video playback terminal;
    the server receives the audio data corresponding to the currently playing video sent by the video playback terminal, and saves the audio data into the preset audio data.
  13. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a video playback terminal and a server, the following steps are implemented:
    the video playback terminal collects voice information input by a user and sends the user voice information to the server;
    the server receives the user voice information collected by the video playback terminal, recognizes the voice information, extracts features of the voice information, matches the voice-information features against different scene tags in preset audio data, obtains the scene tag matching the voice-information features, and sends the matching scene tag to the video playback terminal;
    the video playback terminal receives the scene tag matching the voice-information features and jumps the video playing on the video playback terminal to a corresponding position.
  14. The computer-readable storage medium of claim 13, wherein, when the computer program is executed by the video playback terminal and the server, the following steps are also implemented:
    if the voice-information features do not include the name of the video to jump to, the server obtains the name of the currently playing video;
    the step in which the server matches the voice-information features against different scene tags in the preset audio data to obtain a matching scene tag comprises:
    the server matching the voice-information features and the name of the currently playing video against the different scene tags in the audio data to obtain the scene tag matching the voice-information features.
  15. The computer-readable storage medium of claim 13, wherein, when the computer program is executed by the video playback terminal and the server, the following step is also implemented:
    if the voice-information features include the name of the video to jump to, performing the step of: the server matching the voice-information features against different scene tags in the preset audio data to obtain the scene tag matching the voice-information features.
  16. The computer-readable storage medium of claim 13, wherein, when the computer program is executed by the video playback terminal and the server, the following steps are also implemented:
    the server determines whether the preset audio data includes audio data corresponding to the currently playing video;
    if the preset audio data does not contain audio data corresponding to the currently playing video, the server sends a request instruction to the video playback terminal;
    the server receives the audio data corresponding to the currently playing video sent by the video playback terminal, and saves the audio data into the preset audio data.
PCT/CN2019/126022 2018-12-29 2019-12-17 视频播放跳转方法、系统及计算机可读存储介质 WO2020135161A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811654558.6A CN109688475B (zh) 2018-12-29 2018-12-29 视频播放跳转方法、系统及计算机可读存储介质
CN201811654558.6 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020135161A1 true WO2020135161A1 (zh) 2020-07-02

Family

ID=66191672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126022 WO2020135161A1 (zh) 2018-12-29 2019-12-17 视频播放跳转方法、系统及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN109688475B (zh)
WO (1) WO2020135161A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689856A (zh) * 2021-08-20 2021-11-23 海信电子科技(深圳)有限公司 一种浏览器页面视频播放进度的语音控制方法及显示设备

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
CN109688475B (zh) * 2018-12-29 2020-10-02 深圳Tcl新技术有限公司 视频播放跳转方法、系统及计算机可读存储介质
CN110166845B (zh) * 2019-05-13 2021-10-26 Oppo广东移动通信有限公司 视频播放方法和装置
CN112261436B (zh) * 2019-07-04 2024-04-02 青岛海尔多媒体有限公司 视频播放的方法、装置及系统
CN111209437B (zh) * 2020-01-13 2023-11-28 腾讯科技(深圳)有限公司 一种标签处理方法、装置、存储介质和电子设备
CN111601163B (zh) * 2020-04-26 2023-03-03 百度在线网络技术(北京)有限公司 播放控制方法、装置、电子设备及存储介质
CN111818172B (zh) * 2020-07-21 2022-08-19 海信视像科技股份有限公司 一种物联网管理服务器控制智能设备的方法及装置
CN112632329A (zh) * 2020-12-18 2021-04-09 咪咕互动娱乐有限公司 视频提取方法、装置、电子设备及存储介质
CN112954426B (zh) * 2021-02-07 2022-11-15 咪咕文化科技有限公司 视频播放方法、电子设备及存储介质

Citations (7)

Publication number Priority date Publication date Assignee Title
US20020069073A1 (en) * 1998-01-16 2002-06-06 Peter Fasciano Apparatus and method using speech recognition and scripts to capture, author and playback synchronized audio and video
CN101329867A (zh) * 2007-06-21 2008-12-24 西门子(中国)有限公司 一种语音点播方法及装置
CN105677735A (zh) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 一种视频搜索方法及装置
CN107071542A (zh) * 2017-04-18 2017-08-18 百度在线网络技术(北京)有限公司 视频片段播放方法及装置
CN107506385A (zh) * 2017-07-25 2017-12-22 努比亚技术有限公司 一种视频文件检索方法、设备及计算机可读存储介质
CN107704525A (zh) * 2017-09-04 2018-02-16 优酷网络技术(北京)有限公司 视频搜索方法和装置
CN109688475A (zh) * 2018-12-29 2019-04-26 深圳Tcl新技术有限公司 视频播放跳转方法、系统及计算机可读存储介质

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US9451195B2 (en) * 2006-08-04 2016-09-20 Gula Consulting Limited Liability Company Moving video tags outside of a video area to create a menu system
US9113128B1 (en) * 2012-08-31 2015-08-18 Amazon Technologies, Inc. Timeline interface for video content
CN105869623A (zh) * 2015-12-07 2016-08-17 乐视网信息技术(北京)股份有限公司 基于语音识别的视频播放方法及装置
CN106162357B (zh) * 2016-05-31 2019-01-25 腾讯科技(深圳)有限公司 获取视频内容的方法及装置
CN107155138A (zh) * 2017-06-06 2017-09-12 深圳Tcl数字技术有限公司 视频播放跳转方法、设备及计算机可读存储介质
CN107135418A (zh) * 2017-06-14 2017-09-05 北京易世纪教育科技有限公司 一种视频播放的控制方法及装置
CN107871500B (zh) * 2017-11-16 2021-07-20 百度在线网络技术(北京)有限公司 一种播放多媒体的方法和装置
CN107948729B (zh) * 2017-12-13 2020-03-27 Oppo广东移动通信有限公司 富媒体处理方法、装置、存储介质和电子设备

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
US20020069073A1 (en) * 1998-01-16 2002-06-06 Peter Fasciano Apparatus and method using speech recognition and scripts to capture, author and playback synchronized audio and video
CN101329867A (zh) * 2007-06-21 2008-12-24 西门子(中国)有限公司 一种语音点播方法及装置
CN105677735A (zh) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 一种视频搜索方法及装置
CN107071542A (zh) * 2017-04-18 2017-08-18 百度在线网络技术(北京)有限公司 视频片段播放方法及装置
CN107506385A (zh) * 2017-07-25 2017-12-22 努比亚技术有限公司 一种视频文件检索方法、设备及计算机可读存储介质
CN107704525A (zh) * 2017-09-04 2018-02-16 优酷网络技术(北京)有限公司 视频搜索方法和装置
CN109688475A (zh) * 2018-12-29 2019-04-26 深圳Tcl新技术有限公司 视频播放跳转方法、系统及计算机可读存储介质

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113689856A (zh) * 2021-08-20 2021-11-23 海信电子科技(深圳)有限公司 一种浏览器页面视频播放进度的语音控制方法及显示设备
CN113689856B (zh) * 2021-08-20 2023-11-03 Vidaa(荷兰)国际控股有限公司 一种浏览器页面视频播放进度的语音控制方法及显示设备

Also Published As

Publication number Publication date
CN109688475A (zh) 2019-04-26
CN109688475B (zh) 2020-10-02

Similar Documents

Publication Publication Date Title
WO2020135161A1 (zh) 视频播放跳转方法、系统及计算机可读存储介质
US20230138030A1 (en) Methods and systems for correcting, based on speech, input generated using automatic speech recognition
US10142585B2 (en) Methods and systems for synching supplemental audio content to video content
US9215510B2 (en) Systems and methods for automatically tagging a media asset based on verbal input and playback adjustments
EP3680896B1 (en) Method for controlling terminal by voice, terminal, server and storage medium
JP2021002884A (ja) メディアアセットの部分を識別し記憶するためのシステムおよび方法
CN105957530A (zh) 一种语音控制方法、装置和终端设备
US11375287B2 (en) Systems and methods for gamification of real-time instructional commentating
US9544656B1 (en) Systems and methods for recognition of sign language for improved viewing experiences
CN109600646B (zh) 语音定位的方法及装置、智能电视、存储介质
US20210392240A1 (en) Method and device for automatically adjusting synchronization of sound and picture of tv, and storage medium
EP3076678A1 (en) Display apparatus for searching and control method thereof
CN111581434A (zh) 视频服务提供方法、装置、电子设备和存储介质
US11595729B2 (en) Customizing search results in a multi-content source environment
US20200302312A1 (en) Method and apparatus for outputting information
US10616649B2 (en) Providing recommendations based on passive microphone detections
KR102145370B1 (ko) 화면을 제어하는 미디어 재생 장치, 방법 및 화면을 분석하는 서버
WO2017008498A1 (zh) 搜索节目的方法及装置
CN109922376A (zh) 一种模式设置方法、装置、电子设备及存储介质
US9396192B2 (en) Systems and methods for associating tags with media assets based on verbal input
EP4161085A1 (en) Real-time audio/video recommendation method and apparatus, device, and computer storage medium
CN111274449B (zh) 视频播放方法、装置、电子设备和存储介质
US10691733B2 (en) Methods and systems for replying to queries based on indexed conversations and context
US10817553B2 (en) Methods and systems for playing back indexed conversations based on the presence of other people
CN112883144A (zh) 一种信息交互方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19902447

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19902447

Country of ref document: EP

Kind code of ref document: A1