WO2019128724A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2019128724A1
WO2019128724A1 (PCT/CN2018/120770, CN2018120770W)
Authority
WO
WIPO (PCT)
Prior art keywords
video
feature
audio
feature database
target video
Prior art date
Application number
PCT/CN2018/120770
Other languages
English (en)
French (fr)
Inventor
徐维昌
田智平
徐倩
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Priority to EP18897396.0A (EP3745727A4)
Publication of WO2019128724A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/4147PVR [Personal Video Recorder]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4335Housekeeping operations, e.g. prioritizing content for deletion because of storage space restrictions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44222Analytics of user selections, e.g. selection of programs or purchase activity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454Content or additional data filtering, e.g. blocking advertisements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • The present application relates to, but is not limited to, the field of data processing technology, and in particular to a data processing method and apparatus.
  • The recording function has gradually become a common feature of set-top boxes.
  • Videos recorded by a set-top box, and videos captured by Internet Protocol Television (IPTV) catch-up recording, are recorded directly from the live broadcast. As a result, advertisements in the middle of a TV show, opening and closing titles, and advertisements within the feature are all recorded; when the user plays back the recorded program, they must sit through or skip the titles and advertisements before continuing with the feature.
  • When a user plays an Internet video on a smart terminal device such as a smart OTT (Over The Top) set-top box, a mobile phone, or a tablet computer and does not want to watch the titles or advertisements, the only option is to fast-forward manually. This is not only cumbersome; fast-forwarding is also imprecise, often overshooting and forcing the user to rewind.
  • The embodiments of the present application provide a data processing method and device that can identify repeated videos, thereby improving the user experience.
  • An embodiment of the present application provides a data processing method, including: determining, according to audio features of a target video and by using a feature database, whether the target video includes a repeated video, where the feature database is obtained by learning audio features of at least one video; and, if it is determined that the target video includes a repeated video, filtering out the repeated video.
  • the embodiment of the present application provides a data processing apparatus, including: an identification module, a feature database, and a processing module;
  • the identification module is configured to determine, according to an audio feature of the target video, whether the target video includes a repeated video by using the feature database; wherein the feature database is obtained by learning an audio feature of the at least one video;
  • the processing module is configured to filter the repeated video after the identifying module determines that the target video includes a repeated video.
  • An embodiment of the present application provides a data processing apparatus, including a memory and a processor, where the memory is configured to store a data processing program which, when executed by the processor, implements the data processing method provided above.
  • an embodiment of the present application provides a computer readable medium storing a data processing program, where the data processing program is executed by a processor to implement the data processing method provided above.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 4 is a flowchart of a process for detecting and identifying a repeated video according to Embodiment 1 of the present application;
  • FIG. 5 is a flowchart of an aging process of a feature database according to Embodiment 1 of the present application.
  • FIG. 7 is a flowchart of Embodiment 3 of the present application.
  • FIG. 9 is a schematic structural diagram of a playback apparatus according to Embodiments 2 to 4 of the present application.
  • FIG. 10 is a flowchart of Embodiment 5 of the present application.
  • FIG. 11 is a schematic structural diagram of a recording apparatus according to Embodiment 5 of the present application.
  • FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present application. As shown in FIG. 1, the data processing method provided in this embodiment includes: step S101 and step S102.
  • In step S101, the feature database is used to determine, based on the audio features of the target video, whether the target video includes a duplicate video.
  • the feature database is formed by learning audio features of at least one video.
  • In step S102, if it is determined that the target video includes a duplicate video, the duplicate video is filtered out.
  • The data processing method provided by this embodiment can be applied to a terminal, for example a smart terminal device such as a set-top box, a smartphone, or a tablet computer.
  • this application is not limited thereto.
  • The data processing method provided by this embodiment may also be applied to a computing device such as a server.
  • The data processing method of this embodiment may further include learning the audio features of the at least one video to obtain the feature database as follows: for any video, if the feature database contains no video records, a video record corresponding to that video is added to the feature database; if the feature database already contains video records, the feature database is updated according to the result of matching the audio features of the video against the audio features of the recorded videos.
  • the above video may include at least one of the following: a video being played, a video to be recorded, and a video to be played.
  • The feature database may be created and continuously updated during video playback, or during video recording, or created from the audio features of multiple videos to be played and then continuously updated during playback.
  • this application is not limited thereto.
  • audio features of the video and video attribute information may be extracted.
  • the initial feature database may be empty.
  • One or more video records are saved in the feature database, and the information stored in each video record may include: the occurrence count, a duplicate-video flag, audio features, and duration information.
  • The information saved in a video record may further include at least one of the following: a file name, a path, a Uniform Resource Locator (URL), the total duration of the video, and the played duration.
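As a sketch only, a video record with the fields described above might be modeled as follows. All names and default values are illustrative; the application does not prescribe a concrete data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoRecord:
    """One entry in the feature database (field names are illustrative)."""
    occurrences: int = 1           # how many times this audio segment was seen
    is_duplicate: bool = False     # set once occurrences reach a threshold
    audio_features: List[float] = field(default_factory=list)
    duration_s: float = 0.0        # duration of the (possibly repeated) segment
    # Bookkeeping, kept while occurrences == 1:
    file_name: Optional[str] = None
    file_path: Optional[str] = None
    url: Optional[str] = None
    total_duration_s: float = 0.0  # total length of the source video
    played_duration_s: float = 0.0 # how much of it has been played

# The feature database is then simply a collection of such records.
feature_database: List[VideoRecord] = [
    VideoRecord(audio_features=[0.1, 0.3, 0.2], duration_s=30.0),
]
```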
  • Updating the feature database according to the result of matching the audio features of the video against the audio features of the recorded videos may include: if the feature database contains a video record matching the audio features of the video, updating the information stored in that record; if it contains no matching record, adding a video record corresponding to the video.
  • Updating the information stored in a matched video record may include: if the duration over which the audio features of the video continuously match the audio features of the record is greater than or equal to a duration threshold, retaining in the record only the audio features corresponding to the matched span; and incrementing the record's occurrence count by 1, marking the record as a duplicate video if the count then reaches the occurrence threshold.
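The update step above might be sketched as follows. The threshold values and the per-second feature representation are assumptions made for illustration; the application leaves them unspecified.

```python
DURATION_THRESHOLD_S = 5.0  # assumed value for the "duration threshold"
COUNT_THRESHOLD = 3         # assumed value for the "number of times threshold"

def update_matched_record(record, match_start_s, match_end_s):
    """Update a feature-database record whose audio features matched the
    current video on [match_start_s, match_end_s).

    `record` is a dict with 'features' (one value per second, for simplicity),
    'occurrences' and 'is_duplicate'. Returns True if the match counted."""
    if match_end_s - match_start_s < DURATION_THRESHOLD_S:
        return False  # too short a continuous match to count as a repeat
    # Retain only the audio features of the continuously matched span.
    record["features"] = record["features"][int(match_start_s):int(match_end_s)]
    record["occurrences"] += 1
    if record["occurrences"] >= COUNT_THRESHOLD:
        record["is_duplicate"] = True
    return True
```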
  • That is, once a segment has been seen the threshold number of times, its video record can be marked as a duplicate video.
  • this application is not limited thereto.
  • the target video may include at least one of the following: a video being played, a video to be recorded, and a video to be played.
  • Duplicate videos may be, for example, advertisements or the opening titles of TV series. However, this application is not limited thereto.
  • For example, the terminal may perform repeated-video identification using the feature database and update the feature database while playing a video; or the terminal may learn from multiple locally stored videos to update the feature database, and then use the updated feature database directly to identify repeated videos when playing those videos; or a server may perform repeated-video recognition and learning while recording a video, and perform the recording based on the updated feature database so as to skip the repeated video.
  • this application is not limited thereto.
  • The data processing method of this embodiment may further include extracting the audio features of the target video. Step S101 may then include: matching the audio features of the target video against the audio features of the video records in the feature database, and identifying whether the target video includes a repeated video according to the matching result.
  • Identifying whether the target video includes a repeated video according to the matching result may include: if the audio features of the target video match those of a video record that has already been marked as a repeated video, determining that the target video includes a repeated video; or, if the audio features of the target video match those of a video record whose occurrence count, after being incremented by 1, is greater than or equal to the occurrence threshold, determining that the target video includes a repeated video.
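The two identification conditions above can be written as a single predicate; the threshold value and record layout are assumptions for illustration:

```python
COUNT_THRESHOLD = 3  # assumed value for the "number of times threshold"

def target_includes_repeat(matched_record):
    """Decide whether the target video contains a repeated segment, given the
    feature-database record its audio features matched (None = no match)."""
    if matched_record is None:
        return False
    # Either the record is already marked as a repeated video, or its
    # occurrence count, after incrementing by 1, reaches the threshold.
    return (matched_record["is_duplicate"]
            or matched_record["occurrences"] + 1 >= COUNT_THRESHOLD)
```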
  • A match between the audio features of the target video and the audio features of a video record may require that the duration of continuous matching is greater than or equal to a duration threshold. For example, if the audio features match continuously for more than 5 seconds, the two may be considered to match. However, this application is not limited thereto.
  • The audio features may include at least one of: an audio amplitude waveform, an audio spectrum, and text information generated by speech recognition. Matching the audio features of the target video against those of a video record may then include one of the following: when the audio features include speech-recognition text, splitting the text of the target video into sentences and matching it, sentence by sentence, against the speech-recognition text stored in the record; when the audio features include an audio amplitude waveform and/or an audio spectrum, dividing the audio into silent intervals and voiced intervals and matching the target video against the record in units of voiced intervals.
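The sentence-by-sentence text matching could be sketched as below. This is an illustration only: a real system would have to tolerate speech-recognition errors, e.g. with edit-distance-based fuzzy matching rather than exact sentence equality.

```python
import re

def sentence_match_ratio(target_text, record_text):
    """Match speech-recognition transcripts sentence by sentence and return
    the fraction of target sentences found in the record's transcript."""
    def sentences(text):
        # Split on common Western and CJK sentence-ending punctuation.
        return [s.strip() for s in re.split(r"[.!?。！？]", text) if s.strip()]
    target = sentences(target_text)
    record = set(sentences(record_text))
    if not target:
        return 0.0
    return sum(1 for s in target if s in record) / len(target)
```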
  • Step S102 may include: if the target video is a video being played, skipping the repeated video according to its duration recorded in the feature database and continuing playback; if the target video is a video to be recorded, skipping the repeated video during recording according to its duration recorded in the feature database.
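The skip-and-continue behavior might be sketched as follows. Using a segment id as the lookup key stands in for the audio-feature matching described above and is purely illustrative:

```python
def plan_playback(segments, feature_db):
    """Return the segments to actually play, skipping recognized repeats.

    `segments`: list of (segment_id, duration_s) in play order;
    `feature_db`: maps segment_id -> record dict with an 'is_duplicate' flag."""
    played = []
    for seg_id, duration_s in segments:
        record = feature_db.get(seg_id)
        if record is not None and record["is_duplicate"]:
            continue  # locate the play point after the repeated span
        played.append((seg_id, duration_s))
    return played
```

The same plan can drive recording: the skipped segments are simply not written to the output file.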
  • For example, while playing a video the terminal may use the feature database to identify whether the video includes a repeated segment; when one is recognized, it is skipped and playback continues, and when none is recognized, the video is played sequentially.
  • Likewise, when recording a video the terminal may use the feature database to identify whether the video to be recorded includes a repeated segment, skipping it during recording when one is recognized and recording sequentially otherwise.
  • The data processing method of this embodiment may further include: if the feature database contains a video record matching the audio features of the target video, updating the information stored in the matched record; if it contains no matching record, adding a video record corresponding to the target video to the feature database.
  • Updating the information stored in a matched video record may include: when the duration over which the audio features of the target video continuously match those of the record is greater than or equal to the duration threshold, retaining in the record only the audio features corresponding to the matched span, and incrementing the record's occurrence count by 1; if the count then reaches the occurrence threshold, the record is marked as a duplicate video.
  • In this way, the feature database is continuously updated and optimized.
  • the user experience can be greatly improved.
  • the creation and updating of the feature database does not require user operations, and can be easily adapted to identify duplicate videos in different scenes or types of video.
  • The data processing method of this embodiment may further include: when it is detected that the total number of video records in the feature database is greater than or equal to a first aging threshold, or that the total storage space occupied by the video records is greater than or equal to a second aging threshold, deleting the video records that satisfy a set condition.
  • the video records in the feature database can be periodically aged to avoid an infinite increase in the feature database.
  • The set condition may include at least one of the following: the occurrence count of the video record is less than or equal to a first threshold; the time elapsed since the record last occurred is greater than or equal to a second threshold.
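An aging pass over the database could look like the sketch below. All threshold values are assumptions chosen for illustration; the application only names the thresholds, not their values.

```python
import time

MAX_RECORDS = 1000        # assumed "first aging threshold" (record count)
MIN_OCCURRENCES = 2       # assumed "first threshold" on the occurrence count
MAX_IDLE_S = 30 * 86400   # assumed "second threshold": ~30 days unseen

def age_feature_db(records, now=None):
    """Once the database is too large, delete records that satisfy the set
    condition (rarely seen, or not seen for a long time).
    `records`: list of dicts with 'occurrences' and 'last_seen' (epoch s)."""
    if len(records) < MAX_RECORDS:
        return records
    now = time.time() if now is None else now
    return [r for r in records
            if r["occurrences"] > MIN_OCCURRENCES
            and now - r["last_seen"] < MAX_IDLE_S]
```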
  • this application is not limited thereto.
  • In summary, based on the audio features of the target video, the feature database is used to identify whether the target video includes a repeated video, where the feature database is obtained by learning the audio features of one or more videos; if the target video is identified as including a repeated video, the repeated video is filtered out.
  • In other words, a feature database obtained by learning audio features is used to identify duplicate videos, thereby improving the user experience.
  • While playing a video, the feature database can be used to automatically detect and skip duplicate video, improving the viewing experience; while recording a video, it can be used to automatically detect repeated video and skip it during recording, making subsequent viewing more convenient.
  • the feature database is obtained through self-learning, and the creation and update of the feature database does not require user operations, and can be conveniently applied to recognize duplicate videos in different scenes or types of videos.
  • FIG. 2 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
  • the data processing apparatus provided in this embodiment includes: an identification module 201, a processing module 202, and a feature database 203.
  • the identification module 201 is configured to determine, according to the audio characteristics of the target video, whether the target video includes duplicate video by using the feature database 203; wherein the feature database 203 is obtained by learning audio features of the at least one video.
  • the processing module 202 is configured to filter the repeated video after the identification module 201 determines that the target video includes a repeated video.
  • the data processing apparatus of this embodiment may further include: an audio feature extraction module 200 configured to extract an audio feature of the target video.
  • The identification module 201 may be configured to identify whether the target video includes a duplicate video by matching the audio features of the target video against the audio features of the video records in the feature database 203, and determining whether the target video includes a duplicate video according to the matching result.
  • This embodiment describes how repeated video is automatically detected and skipped while the terminal plays a video.
  • the target video is the currently played video.
  • Figure 3 is a flow chart of the embodiment. As shown in FIG. 3, the embodiment includes the following steps S301 to S304.
  • In step S301, the terminal decodes the currently played video and outputs video images and audio.
  • In step S302, the terminal analyzes the decoded audio, extracts audio features, and uses the feature database to identify whether the currently played video contains a duplicate segment.
  • The feature database is created automatically and continuously updated as the terminal plays videos over time. If the currently played video is recognized as a duplicate, S303 is performed; otherwise, S304 is performed.
  • In step S303, when the currently played video is recognized as a duplicate, the terminal reads its information (for example, its duration) from the feature database and seeks directly to the play point after that time span, i.e., skips the repeated video and continues playback.
  • In step S304, when the currently played video is not recognized as a duplicate, the terminal continues sequential playback and adds a video record corresponding to the current video to the feature database, saving its audio features.
  • The user can configure whether the automatic-skip function is enabled. For example, when the playback software's interface is opened, the terminal prompts the user to choose whether to enable automatic skipping of repeated video; after the user makes a selection, the choice may take effect permanently or only for the current session.
  • When playing a video, the terminal may first check whether the automatic-skip function is enabled. If it is, the steps shown in FIG. 3 are performed; if not, normal sequential playback is performed.
  • FIG. 4 is a flowchart of the repeated-video detection and identification process of this embodiment. Steps S302 to S304 are described in detail below with reference to FIG. 4. As shown in FIG. 4, the process may include the following steps S401 to S414.
  • In step S401, the audio in the currently played video is analyzed to extract audio features, which may include, but are not limited to, at least one of: an audio amplitude waveform, an audio spectrum, an audio zero-crossing rate, and text information generated by speech recognition.
  • The feature database is an audio-feature database created and continuously updated as the terminal plays videos; it stores the audio features of the repeated portions, or of the entire duration, of previously played videos.
  • The information stored in a video record in the feature database may include: the occurrence count, a duplicate-video flag, audio features, and duration information (e.g., the total duration and the played duration); if a record's occurrence count is 1, the record may additionally store the video's file name, file path, and Uniform Resource Locator (URL).
  • The audio features of the currently played video extracted in step S401 are matched against the audio features of the records in the feature database to find a matching record.
  • The video records in the feature database are traversed in descending order of occurrence count to increase the probability of an early match.
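The traversal order is simply a sort; a minimal sketch (record layout assumed as above):

```python
def traversal_order(records):
    """Probe frequently seen records first so that matching tends to
    hit early (records are dicts with an 'occurrences' count)."""
    return sorted(records, key=lambda r: r["occurrences"], reverse=True)
```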
  • the following describes the matching process between the audio features of the currently played video and the audio features of the video recording.
  • In step S402, it is determined whether a record's occurrence count is 1 and, for such a record, whether the currently played video shares at least one of its file name, file path, and URL; if so, step S403 is performed; otherwise, step S404 is performed.
  • In step S403, the currently played video is not counted as a duplicate; matching against the feature database stops and the video is played sequentially.
  • In this case the currently played video is the very video the record corresponds to, so it is not counted as a duplicate and the record's occurrence count is not incremented. If the played duration in the record is less than the total duration, the video was not played to the end last time; if the current playback proceeds beyond the previously played duration, the additional audio features can be saved into the record and its played duration updated.
  • In step S404, it is determined whether the audio features of the currently played video match those of a video record; if they match, step S405 is performed; otherwise, step S406 is performed, i.e., the next record is selected for matching.
  • When the audio features include text information generated by speech recognition, the text may be split into sentences, and the currently played video is matched against the record sentence by sentence.
  • when the audio features include at least one of an audio amplitude waveform and an audio spectrum, the video can be divided into silent intervals and voiced intervals by Pulse Code Modulation (PCM) level values, and then matched in units of voiced intervals.
  • that is, each voiced interval of the currently played video is compared with each voiced interval of the video record; since there is generally a silent interval between a repeated video, such as a title sequence or an advertisement, and the feature film, matching in units of voiced intervals both ensures matching accuracy and avoids having to start the matching from every sampling point one by one, thereby greatly reducing the computational complexity.
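The interval splitting described above can be sketched as follows; the PCM silence threshold and the minimum silence length are illustrative assumptions, not values taken from the application:

```python
def split_voiced_intervals(pcm, silence_threshold=500, min_silence_len=4):
    """Split a PCM sample sequence into voiced intervals.

    Samples whose absolute level stays below `silence_threshold` for at
    least `min_silence_len` consecutive samples are treated as a silent
    interval; everything else belongs to a voiced interval.
    Returns a list of (start_index, end_index_exclusive) tuples.
    """
    intervals = []
    start = None   # start of the current voiced interval, if any
    quiet_run = 0  # length of the current run of quiet samples
    for i, sample in enumerate(pcm):
        if abs(sample) < silence_threshold:
            quiet_run += 1
            # a long enough quiet run closes the current voiced interval
            if quiet_run == min_silence_len and start is not None:
                intervals.append((start, i - min_silence_len + 1))
                start = None
        else:
            if start is None:
                start = i
            quiet_run = 0
    if start is not None:
        intervals.append((start, len(pcm)))
    return intervals
```

Matching would then compare a voiced interval of the currently played video against each voiced interval returned for a video record, rather than sliding over every sample.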
  • if no matching video record is found after traversing the feature database, a video record corresponding to the currently played video may be added to the feature database.
  • the information saved in the video record may include: the occurrence count (which can be recorded as 1), the audio features of the currently played video, and duration information; the file name, file path, or URL of the currently played video may also be recorded.
  • in step S405, if the audio features of the currently played video match the audio features of a certain video record and the matching point is not the starting point of the video record, the audio features before the matching point are deleted from the video record, only the audio features of the matched portion are retained, and matching continues with the subsequent audio features; the audio features before the matching point may be permanently deleted, or may be re-saved as a new video record for subsequent extraction of possible duplicate video.
  • in step S407, it is determined whether the continuous matching duration between the audio features of the currently played video and the audio features of the video record exceeds the duration threshold; if yes, step S409 is performed; otherwise, step S408 is performed, that is, the current video continues to be played without skipping any video, and the feature database continues to be used for duplicate-video recognition during playback.
  • in step S409, when the continuous matching duration exceeds the duration threshold (for example, 5 seconds), it is checked whether the matched video record in the feature database is marked as a duplicate video. If yes, step S410 is performed, indicating that the currently played video is a duplicate video; the duplicate video is skipped according to its duration, playback then continues, and the feature database continues to be used for duplicate-video recognition during playback. If the video record is not marked as a duplicate video, step S411 is performed, that is, the subsequent audio features continue to be matched until the end of the video record in the feature database.
  • in step S412, it is determined whether the audio features of the currently played video match all the way to the end of the video record; if so, step S413 is performed; otherwise, step S414 is performed.
  • in step S413, if the match continues to the end of the video record, the currently played video completely matches the video record over this duration; the occurrence count of the video record may be increased by 1, and if the count after the increment exceeds the count threshold (for example, 2), the matched video record can be marked as a duplicate video.
  • in step S414, if the match does not reach the end of the video record, the information of the video record is updated: the unmatched audio features after the matching end point are deleted from the video record, and the occurrence count is increased by 1; if the count after the increment exceeds the count threshold, the matched video record can be marked as a duplicate video.
  • the audio features after the matching end point can be re-saved as a new video record for subsequent extraction of possible duplicate video.
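The record update of step S414 might look like the following sketch; the record layout (a dict with `features`, `count`, and `is_duplicate` keys) and the count threshold of 2 are illustrative assumptions:

```python
COUNT_THRESHOLD = 2  # illustrative: occurrences needed to mark a duplicate

def update_record_on_partial_match(record, match_end, database):
    """Handle a match that ends before the end of `record` (step S414).

    The unmatched tail after `match_end` is cut from the record and
    re-saved as a new record, so a possible duplicate hidden in the tail
    can still be extracted later. The occurrence count is incremented,
    and the record is marked as a duplicate once the count exceeds the
    threshold.
    """
    tail = record["features"][match_end:]
    if tail:
        # keep the tail as a fresh record with a single occurrence
        database.append({"features": tail, "count": 1, "is_duplicate": False})
    record["features"] = record["features"][:match_end]
    record["count"] += 1
    if record["count"] > COUNT_THRESHOLD:
        record["is_duplicate"] = True
    return record
```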
  • if the currently played video is identified as a duplicate video, the duration of the duplicate video is read from the feature database and playback jumps directly to the time point after this duration; the duplicate-video recognition process continues during subsequent playback, and any further duplicate video recognized is likewise skipped.
  • FIG. 5 is a flowchart of an aging process of a feature database according to an embodiment of the present application.
  • the aging processing flow of the feature database may be executed periodically by a separately created thread, or executed when a trigger instruction is received. However, this application is not limited thereto.
  • the aging processing flow of the feature database includes the following steps S501 to S505.
  • step S501 the total number of video records of the feature database is acquired.
  • step S502 it is determined whether the total number of video records of the feature database exceeds the first aging threshold; if yes, step S503 is performed; otherwise, the process returns to step S501.
  • step S503 each video record in the feature database is traversed.
  • in step S504, it is determined whether the video record satisfies a set condition; for example, the set condition may include: the occurrence count of the video record is less than the first threshold, and the duration between the most recent occurrence of the video record and the current moment is greater than the second threshold. If the currently examined video record satisfies the set condition, step S505 is performed, that is, the video record is deleted; otherwise, the process returns to step S503 to traverse the next video record.
  • the first threshold and the second threshold may be preset.
  • if the occurrence count of a video record in the feature database is too small and a certain duration has already passed since its most recent occurrence, the video record is unlikely to be a duplicate video and can be deleted, so as to reduce the size of the feature database.
  • alternatively, in step S501, the total occupied space of the video records in the feature database may be acquired, and in step S502, whether to perform aging is determined by judging whether the total occupied space is greater than or equal to the second aging threshold.
  • the first aging threshold and the second aging threshold may be preset according to the total storage amount of the feature database.
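A minimal sketch of the aging flow of steps S501 to S505, assuming each record stores its occurrence count and a last-seen timestamp in epoch seconds; all threshold values are illustrative:

```python
def age_feature_database(records, now, max_records=1000,
                         min_count=2, max_idle_seconds=30 * 86400):
    """Age the feature database (steps S501 to S505).

    Once the record count exceeds `max_records` (the first aging
    threshold), every record that occurred fewer than `min_count` times
    and was last seen more than `max_idle_seconds` ago is deleted.
    Returns the surviving records.
    """
    if len(records) <= max_records:
        return records  # step S502: below the first aging threshold
    return [
        r for r in records
        if not (r["count"] < min_count
                and now - r["last_seen"] > max_idle_seconds)
    ]
```

The same shape works for the space-based variant: replace the `len(records)` check with a comparison of the total occupied space against the second aging threshold.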
  • for the process of creating and updating the feature database, refer to FIG. 4; that is, the feature database is created and updated during video playback.
  • alternatively, the terminal may first create and update the feature database from the audio features of multiple locally stored videos and then play videos directly using the updated feature database; or, during the creation and updating of the feature database from multiple locally stored videos, duplicate videos may be marked with an identifier in each video, and the duplicate videos are then skipped according to the identifier when these videos are played.
  • This embodiment illustrates the automatic detection and skipping of advertisements by voice recognition.
  • as shown in FIG. 9, the playback apparatus of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio and video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the identification module described above).
  • the playback apparatus of this embodiment may be a smart terminal device such as a smart phone or tablet computer, or any of various players or software installed on a smart terminal device. However, this application is not limited thereto.
  • FIG. 6 is a schematic flowchart of the embodiment. As shown in FIG. 6, the embodiment includes the following steps S601 to S606.
  • in step S601, the playback apparatus plays a locally recorded video or a network video-on-demand (VOD) video, and the data reading module acquires the video code stream to be played from a local storage device or the network.
  • step S602 the audio and video decoding module of the playback device decodes the acquired video code stream, and outputs the video image and the audio sound.
  • in step S603, the audio feature extraction module analyzes the audio sound in the played video, recognizes the speech into text by speech recognition, divides the text into sentences, and uses these sentences as the audio features to be matched; the feature matching module uses the feature database to identify whether the currently played video is a duplicate video. In this example, the duplicate video refers to an advertisement video.
  • the feature matching module matches the sentences recognized in step S603 one by one against the sentence features of the video records in the feature database, comparing in units of complete sentences; the video records in the feature database are traversed in descending order of occurrence count.
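The sentence-unit matching with descending-occurrence traversal could be sketched as follows, assuming each record stores its sentence features as a list of strings:

```python
def find_matching_record(sentences, records):
    """Return the first record whose sentence features contain the
    recognized sentences as a contiguous run.

    Records are traversed in descending order of occurrence count so
    that frequently seen (likely duplicate) records are tried first.
    Returns None when no record matches.
    """
    n = len(sentences)
    for record in sorted(records, key=lambda r: r["count"], reverse=True):
        feats = record["sentences"]
        for start in range(len(feats) - n + 1):
            if feats[start:start + n] == sentences:
                return record
    return None
```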
  • in step S604, if there is a matching video record in the feature database and the video record is recorded as a duplicate video (in this example, an advertisement video), the feature matching module reads the information of this segment of duplicate video, such as its duration, from the feature database and feeds it back to the data reading module of the playback apparatus.
  • in step S606, the data reading module seeks directly to the time point after this duration and continues playback, that is, it skips the advertisement video and continues playback from the end time point of the advertisement video.
  • in step S605, if there is no matching video record in the feature database, the data reading module continues to play the video in order, and saves the sentence feature information to the feature database to update it.
  • This embodiment illustrates the automatic detection and skipping of advertisements by audio amplitude waveforms.
  • the difference between this embodiment and the second embodiment is that in the present embodiment, the audio amplitude waveform of the video program is used as the audio feature for matching.
  • as shown in FIG. 9, the playback apparatus of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio and video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the identification module described above).
  • Fig. 7 is a flow chart showing the embodiment. As shown in FIG. 7, the embodiment includes the following steps S701 to S706.
  • in step S701, the playback apparatus plays a locally recorded video or a network VOD video, and the data reading module acquires the video code stream to be played from a local storage device or the network.
  • step S702 the audio and video decoding module of the playback device decodes the acquired video code stream, and outputs the video image and the audio sound.
  • in step S703, the audio feature extraction module analyzes the audio sound in the played video, sampling the audio amplitude at fixed time intervals; the amplitudes at multiple instants trace out an audio amplitude waveform. The feature matching module uses the feature database to identify whether the currently played video is a duplicate video; in this example, the duplicate video refers to an advertisement video.
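The periodic amplitude sampling of step S703 can be sketched as follows; the peak-per-window reduction and the default interval are illustrative assumptions:

```python
def amplitude_waveform(pcm, sample_rate=48000, interval_ms=100):
    """Describe an audio amplitude waveform by recording the peak
    absolute PCM level once every `interval_ms` milliseconds."""
    step = max(1, sample_rate * interval_ms // 1000)
    return [
        max(abs(s) for s in pcm[i:i + step])
        for i in range(0, len(pcm), step)
    ]
```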
  • the feature matching module matches the audio amplitude waveform extracted in step S703 against the audio amplitude waveforms of the video records in the feature database. Because there is usually a silent interval between a repeated video, such as a title sequence or an advertisement, and the feature film, the video file can be divided into alternating silent intervals and voiced intervals by Pulse Code Modulation (PCM) level values. Matching is performed in units of voiced intervals, starting the comparison from the starting point of each voiced interval; if some video record in the feature database contains a duplicate video consistent with the currently played video, the starting point of this duplicate video is guaranteed to be found, and the match will not be missed because of misaligned comparison time points. In addition, this avoids having to start the matching from every sampling point one by one, which both ensures matching accuracy and greatly reduces the computational complexity.
  • in step S704, if the feature database has a matching video record and the video record is recorded as a duplicate video (in this example, an advertisement video), the feature matching module reads the information of this segment of duplicate video, such as its duration, from the feature database and feeds it back to the data reading module of the playback apparatus.
  • in step S706, the data reading module seeks directly to the time point after this duration and plays, that is, it skips the advertisement video and continues playback from the end time point of the advertisement video.
  • step S705 if there is no matching video record in the feature database, the data reading module continues to play the video sequentially, and saves the audio feature information to the feature database to update the feature database.
  • This embodiment illustrates the automatic detection and skipping of a TV series title sequence by audio amplitude waveform.
  • Users sometimes binge-watch a series, downloading an entire TV series from the Internet to watch locally, or watching it on demand in the TV series module of a set-top box.
  • Some TV series have a long title sequence before each episode. Watching the repeated title sequence every time is tedious for the user, and skipping it requires manually operating the remote control to fast-forward past it, resulting in a poor user experience.
  • the embodiment provides a playback apparatus for automatically detecting and skipping a TV drama title.
  • as shown in FIG. 9, the playback apparatus of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio and video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the identification module described above).
  • FIG. 8 is a schematic flow chart of the embodiment. As shown in FIG. 8, the embodiment includes the following steps S801 to S805.
  • in step S801, the playback apparatus plays a locally recorded or network video-on-demand (VOD) TV drama video, and the data reading module acquires the video code stream to be played from a local storage device or the network.
  • step S802 the audio and video decoding module of the playback device decodes the acquired video code stream, and outputs the video image and the audio sound.
  • in step S803, the audio feature extraction module processes and analyzes the decoded audio sound and traces out an audio amplitude waveform; by playing the first N episodes (where N can be an integer greater than or equal to 2), the audio amplitude waveform of the title sequence is recognized, and a video record of this repeated video is created in the feature database. In this example, the repeated video refers to the title-sequence video of the TV series.
  • for the matching manner of the audio features, refer to the description of the third embodiment.
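One way to realize the learning of step S803 is to keep the longest waveform prefix common to the first N episodes as the title-sequence record. Treating waveforms as integer lists compared for exact equality is a simplification for illustration; real matching would tolerate small amplitude differences:

```python
def learn_title_waveform(episode_waveforms):
    """Return the longest waveform prefix shared by all episodes,
    assumed to be the repeated title sequence (empty if none)."""
    if not episode_waveforms:
        return []
    first = episode_waveforms[0]
    length = 0
    # extend the prefix while every episode agrees at this position
    while length < len(first) and all(
        length < len(w) and w[length] == first[length]
        for w in episode_waveforms[1:]
    ):
        length += 1
    return first[:length]
```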
  • in step S804, in each subsequently played episode, the audio feature extraction module extracts the audio amplitude waveform of the leading title portion, and the feature matching module matches it against the audio amplitude waveforms of the video records in the feature database to recognize the title-sequence video in the currently playing episode.
  • in step S805, the data reading module seeks to the time point at which the title sequence recorded in the feature database ends and continues playback. In this way, the user does not need to watch the repeated title video while binge-watching, which improves the user experience.
  • This embodiment describes automatic detection and skipping of repeated video while the terminal records video. Users often use a set-top box to record the programs they want to watch to a local storage device for later viewing.
  • This embodiment provides a recording apparatus that automatically detects and skips repeated video.
  • as shown in FIG. 11, the recording apparatus provided in this embodiment may include: a data reading module 1101, an audio and video decoding module 1102, an audio feature extraction module 1103, a feature database 1105, a feature matching module 1104 (corresponding to the identification module described above), a recording buffer 1106, and a recording module 1107 (corresponding to the processing module described above).
  • the recording device of this embodiment may be a smart terminal device such as a set top box, a smart phone, or a tablet computer. However, this application is not limited thereto.
  • Fig. 10 is a flow chart of this embodiment. As shown in FIG. 10, the embodiment includes the following steps S1001 to S1007.
  • step S1001 the recording device first saves the code stream to be recorded into the recording buffer area of the memory.
  • step S1002 the audio and video decoding module decodes the code stream while recording.
  • step S1003 the audio feature extraction module performs processing analysis on the decoded audio to extract audio features.
  • in step S1004, the feature matching module uses the feature database to identify whether the currently recorded video is a duplicate video; the feature database here is an audio feature database automatically created from historically played videos.
  • step S1005 if it is recognized that the current recording content is a duplicate video, the recording module does not save the portion of the content in the recording buffer to the storage device.
  • in step S1007, after skipping this portion of content, matching of the recorded content continues; if the audio features of the recorded content remain consistent with the audio features of the duplicate video in the feature database, the recording module continues to withhold that portion of the recording buffer from the storage device, until content that does not match anything in the feature database is encountered, which is then saved to the storage device.
  • in step S1006, if it is recognized that the currently recorded content is not a duplicate video, the recording module continues to record the video in order, that is, it continues saving the content in the recording buffer to the storage device.
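The recording decision of steps S1005 to S1007 reduces to a sketch like the following, where the chunk and feature representations are illustrative assumptions:

```python
def record_stream(chunks, duplicate_features, storage):
    """Save buffered chunks to storage, skipping any chunk whose audio
    feature matches a duplicate video learned in the feature database."""
    for chunk in chunks:
        if chunk["feature"] in duplicate_features:
            continue           # steps S1005/S1007: do not save duplicates
        storage.append(chunk)  # step S1006: record sequentially
    return storage
```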
  • FIG. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
  • the data processing apparatus 1200 provided in this embodiment, for example a terminal or a server, includes: a memory 1201 and a processor 1202.
  • the memory 1201 is configured to store a data processing program which, when executed by the processor 1202, implements the steps of the data processing method described above.
  • the processor 1202 may include, but is not limited to, a processing device such as a Micro Controller Unit (MCU) or a Field Programmable Gate Array (FPGA).
  • the memory 1201 may be configured to store software programs and modules of application software, such as program instructions or modules corresponding to the data processing method in this embodiment; the processor 1202 executes various functional applications and data processing, such as implementing the data processing method provided by this embodiment, by running the software programs and modules stored in the memory 1201.
  • Memory 1201 can include high speed random access memory and can also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • in some examples, the memory 1201 may include memory remotely located relative to the processor 1202, which may be connected to the data processing device 1200 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the embodiment of the present application further provides a computer storage medium storing a data processing program which, when executed by a processor, implements the foregoing data processing method.
  • computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information, such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • in addition, communication media typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Abstract

A data processing method and apparatus. The data processing method includes: determining, according to the audio features of a target video, whether the target video includes a duplicate video by using a feature database (S101); and if it is determined that the target video includes a duplicate video, filtering the duplicate video (S102).

Description

Data processing method and apparatus
This application claims priority to Chinese patent application No. 201711435400.5, filed with the China Patent Office on December 26, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to, but is not limited to, the technical field of data processing, and for example relates to a data processing method and apparatus.
Background
At present, recording has gradually become a common function of set-top boxes. Videos recorded by a set-top box, as well as videos recorded in Internet Protocol Television (IPTV) catch-up programs, are complete recordings of the live broadcast content; for example, the title sequence and closing credits of a TV drama and the advertisement videos in the middle of the feature are all recorded, and when the user plays a recorded program on demand, the title sequence or advertisements must be watched through before the feature can continue. In addition, when a user plays Internet videos on demand on smart terminal devices such as smart OTT (Over The Top) set-top boxes, mobile phones, and tablet computers, if the user does not want to watch the title sequence or an advertisement, the only option is to fast-forward over it manually, which is not only cumbersome; fast-forwarding also tends to jump inaccurately, leading to back-and-forth fast-forward and rewind operations.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the protection scope of the claims.
The embodiments of the present application provide a data processing method and apparatus capable of recognizing duplicate video, thereby improving the user experience.
An embodiment of the present application provides a data processing method, including: determining, according to the audio features of a target video, whether the target video includes a duplicate video by using a feature database, where the feature database is obtained by learning the audio features of at least one video; and if it is determined that the target video includes a duplicate video, filtering the duplicate video.
An embodiment of the present application provides a data processing apparatus, including: an identification module, a feature database, and a processing module.
The identification module is configured to determine, according to the audio features of a target video, whether the target video includes a duplicate video by using the feature database, where the feature database is obtained by learning the audio features of at least one video.
The processing module is configured to filter the duplicate video after the identification module determines that the target video includes a duplicate video.
An embodiment of the present application provides a data processing apparatus, including a memory and a processor, where the memory is configured to store a data processing program which, when executed by the processor, implements the data processing method provided above.
In addition, an embodiment of the present application provides a computer-readable medium storing a data processing program which, when executed by a processor, implements the data processing method provided above.
Brief Description of the Drawings
FIG. 1 is a flowchart of a data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 3 is a flowchart of Embodiment One of the present application;
FIG. 4 is a flowchart of the duplicate-video detection and recognition process of Embodiment One of the present application;
FIG. 5 is a flowchart of the aging processing of the feature database of Embodiment One of the present application;
FIG. 6 is a flowchart of Embodiment Two of the present application;
FIG. 7 is a flowchart of Embodiment Three of the present application;
FIG. 8 is a flowchart of Embodiment Four of the present application;
FIG. 9 is a schematic structural diagram of the playback apparatus of Embodiments Two to Four of the present application;
FIG. 10 is a flowchart of Embodiment Five of the present application;
FIG. 11 is a schematic structural diagram of the recording apparatus of Embodiment Five of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described in detail below with reference to the accompanying drawings. It should be understood that the embodiments described below are intended only to illustrate and explain the present application, not to limit it.
The steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the order here.
FIG. 1 is a flowchart of the data processing method provided by an embodiment of the present application. As shown in FIG. 1, the data processing method provided by this embodiment includes step S101 and step S102.
In step S101, whether a target video includes a duplicate video is determined by using a feature database according to the audio features of the target video.
The feature database is formed by learning the audio features of at least one video.
In step S102, if it is determined that the target video includes a duplicate video, the duplicate video is filtered.
The data processing method provided by this embodiment is applicable to terminals, for example, smart terminal devices such as set-top boxes, smart phones, and tablet computers. However, this application is not limited thereto. In other implementations, the data processing method provided by this embodiment is also applicable to server-side computing devices, such as servers.
In an exemplary embodiment, the data processing method of this embodiment may further include obtaining the feature database by learning the audio features of at least one video in the following manner: if no video record exists in the feature database, adding a video record corresponding to any video to the feature database; if video records exist in the feature database, updating the feature database according to the result of matching the audio features of the video against the audio features of the video records in the feature database.
The video may include at least one of the following: a video being played, a video to be recorded, and a video to be played. However, this application is not limited thereto. Exemplarily, the feature database may be created and continuously updated during video playback, or created and continuously updated during video recording, or created from the audio features of multiple videos to be played and continuously updated during video playback. However, this application is not limited thereto.
In this embodiment, for any video, the audio features of the video and video attribute information (for example, duration information and file name) can be extracted. The initial feature database may be empty; after the audio features of one or more videos have been learned, one or more video records are saved in the feature database. The information saved in each video record may include: the occurrence count, whether the record is a duplicate video, the audio features, and duration information. When the occurrence count of a video record is 1, the information saved in the video record may further include at least one of the following: the file name, path, or Uniform Resource Locator (URL) of the video, the total duration of the video, and the played duration.
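The record fields described above could be modeled as in the following sketch; the field names and types are illustrative, not the claimed storage format:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoRecord:
    """One entry of the feature database, as described above."""
    occurrence_count: int = 1
    is_duplicate: bool = False
    audio_features: List[float] = field(default_factory=list)
    duration_s: float = 0.0
    # only meaningful while occurrence_count == 1:
    file_name: Optional[str] = None
    file_path: Optional[str] = None
    url: Optional[str] = None
    total_duration_s: Optional[float] = None
    played_duration_s: Optional[float] = None
```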
In an exemplary embodiment, if video records exist in the feature database, updating the feature database according to the result of matching the audio features of the video against the audio features of the video records in the feature database may include: if a video record matching the audio features of the video exists in the feature database, updating the information saved in the matched video record in the feature database; if no video record matching the audio features of the video exists in the feature database, adding a video record corresponding to the video to the feature database.
In an exemplary embodiment, if a video record matching the audio features of the video exists in the feature database, updating the information saved in the matched video record may include: when the continuous matching duration between the audio features of the video and the audio features of a video record in the feature database is greater than or equal to a duration threshold, retaining the audio features corresponding to the continuous matching duration in the video record; and increasing the occurrence count of the video record in the feature database by 1, and if the occurrence count after the increment is greater than or equal to a count threshold, marking the video record as a duplicate video.
For example, when the occurrence count of a video record is greater than or equal to 2, the video record may be marked as a duplicate video. However, this application is not limited thereto.
In this embodiment, the target video may include at least one of the following: a video being played, a video to be recorded, and a video to be played. The duplicate video may be an advertisement video, the title sequence of a TV series, or the like. However, this application is not limited thereto.
In an embodiment, the terminal may, while playing a video, use the feature database to recognize duplicate video and update the feature database; or the terminal may learn from multiple locally stored videos, update the feature database, and then directly use the updated feature database to recognize duplicate video when playing the locally stored videos; or the server may, while recording video, recognize and learn duplicate video, and perform video recording based on the feature database updated through learning so as to skip the duplicate video. However, this application is not limited thereto.
In an exemplary embodiment, before step S101, the data processing method of this embodiment may further include: extracting the audio features of the target video. Step S101 may include: matching the audio features of the target video against the audio features of the video records in the feature database; and recognizing, according to the matching result, whether the target video includes a duplicate video.
In an embodiment, recognizing whether the target video includes a duplicate video according to the result of matching the audio features of the target video against the audio features of the video records in the feature database may include: if the audio features of the target video match the audio features of a video record in the feature database and the video record has been marked as a duplicate video, determining that the target video includes a duplicate video; or, if the audio features of the target video match the audio features of a video record in the feature database and the occurrence count of the video record plus 1 is greater than or equal to the count threshold, determining that the target video includes a duplicate video.
The audio features of the target video matching the audio features of a video record in the feature database may include: the continuous matching duration between the audio features of the target video and the audio features of the video record being greater than or equal to a duration threshold. For example, when the continuous matching duration between the audio features of a target video and the audio features of a video record is greater than 5 seconds, the two are considered to match. However, this application is not limited thereto.
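The continuous-matching-duration check can be sketched as follows, assuming both videos are reduced to amplitude samples taken at a fixed interval; the 0.1-second spacing is an illustrative assumption and the 5-second threshold follows the example above:

```python
SAMPLE_INTERVAL_S = 0.1  # assumed spacing between amplitude samples

def longest_match_run(a, b):
    """Longest run of consecutive equal samples at the same offset in
    the two feature sequences."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best

def is_match(a, b, threshold_s=5.0):
    """Two videos are considered matching when the continuous matching
    duration reaches the duration threshold (for example, 5 seconds)."""
    needed = round(threshold_s / SAMPLE_INTERVAL_S)
    return longest_match_run(a, b) >= needed
```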
In an embodiment, the audio features may include at least one of the following: an audio amplitude waveform, an audio spectrum, and text information generated by speech recognition. Matching the audio features of the target video against the audio features of the video records in the feature database may include one of the following: when the audio features of the target video include text information generated by speech recognition, dividing the text information of the target video into sentences and matching, in units of sentences, against the text information generated by speech recognition of the video records in the feature database; when the audio features of the target video include at least one of an audio amplitude waveform and an audio spectrum, dividing the video records in the feature database into silent intervals and voiced intervals, and matching the target video against the video records in the feature database in units of voiced intervals.
In this way, it is not necessary to start the matching from each sampling point one by one, which not only ensures the accuracy of matching but also reduces the computational complexity.
In an exemplary embodiment, step S102 may include: if the target video is a video being played, skipping the duplicate video according to the duration of the duplicate video in the feature database and continuing to play the target video; if the target video is a video to be recorded, skipping the duplicate video according to the duration of the duplicate video in the feature database.
In an embodiment, the terminal may, while playing a video, use the feature database to identify whether the video being played includes a duplicate video; when a duplicate video is identified, the duplicate video is skipped and playback continues, and when no duplicate video is identified, the video is played in order. Alternatively, the terminal may, while recording a video, use the feature database to identify whether the video to be recorded includes a duplicate video; when a duplicate video is identified, the duplicate video is skipped during recording, and when no duplicate video is identified, the video is recorded in order.
In an exemplary embodiment, the data processing method of this embodiment may further include: if a video record matching the audio features of the target video exists in the feature database, updating the information saved in the matched video record in the feature database; if no video record matching the audio features of the target video exists in the feature database, adding a video record corresponding to the target video to the feature database.
In an embodiment, if a video record matching the audio features of the target video exists in the feature database, updating the information saved in the matched video record may include: when the continuous matching duration between the audio features of the target video and the audio features of a video record in the feature database is greater than or equal to the duration threshold, retaining the audio features corresponding to the continuous matching duration in the video record; and increasing the occurrence count of the video record in the feature database by 1, and if the occurrence count after the increment is greater than or equal to the count threshold, marking the video record as a duplicate video.
During the recognition of duplicate video in the target video, the feature database can be updated, thereby continuously optimizing the feature database.
In this embodiment, recognizing duplicate video by using a feature database obtained through self-learning can greatly improve the user experience. Moreover, the creation and updating of the feature database require no user operation, and the method can conveniently be applied to recognizing duplicate video in videos of different scenes or types.
In an exemplary embodiment, the data processing method of this embodiment may further include: when it is detected that the total number of video records in the feature database is greater than or equal to a first aging threshold, or that the total occupied space of the video records in the feature database is greater than or equal to a second aging threshold, deleting the video records that satisfy a set condition.
In this way, by introducing an aging mechanism, the video records in the feature database can be aged periodically, preventing the feature database from growing without bound.
The set condition may include at least one of the following: the occurrence count of the video record is less than or equal to a first threshold, and the duration between the most recent occurrence of the video record and the current moment is greater than or equal to a second threshold. However, this application is not limited thereto.
In the embodiments of the present application, whether a target video includes a duplicate video is identified by using a feature database according to the audio features of the target video, where the feature database is obtained by learning the audio features of one or more videos; if it is identified that the target video includes a duplicate video, the duplicate video is filtered. In this way, duplicate video is recognized by using a feature database obtained by learning audio features, which improves the user experience. For example, during video playback the feature database can be used to automatically detect and skip duplicate video, improving the viewing experience; or, during video recording the feature database can be used to automatically detect and recognize duplicate video so that the duplicate video is skipped during recording, making subsequent viewing more convenient. Moreover, the feature database is obtained through self-learning; its creation and updating require no user operation, and it can conveniently be applied to recognizing duplicate video in videos of different scenes or types.
FIG. 2 is a schematic diagram of the data processing apparatus provided by an embodiment of the present application. As shown in FIG. 2, the data processing apparatus provided by this embodiment includes: an identification module 201, a processing module 202, and a feature database 203.
The identification module 201 is configured to determine, according to the audio features of a target video, whether the target video includes a duplicate video by using the feature database 203, where the feature database 203 is obtained by learning the audio features of at least one video.
The processing module 202 is configured to filter the duplicate video after the identification module 201 determines that the target video includes a duplicate video.
In an embodiment, the data processing apparatus of this embodiment may further include: an audio feature extraction module 200, configured to extract the audio features of the target video.
The identification module 201 may be configured to determine, according to the audio features of the target video, whether the target video includes a duplicate video by using the feature database 203 in the following manner: matching the audio features of the target video against the audio features of the video records in the feature database 203, and recognizing, according to the matching result, whether the target video includes a duplicate video.
For relevant descriptions of the data processing apparatus of this embodiment, reference may be made to the above method embodiment and the following examples, and details are not repeated here.
The solution of the present application is described below through multiple embodiments.
Embodiment One
This embodiment describes automatic detection and skipping of duplicate video while the terminal plays a video. In this embodiment, the target video is the currently played video.
FIG. 3 is a flowchart of this embodiment. As shown in FIG. 3, this embodiment includes the following steps S301 to S304.
In step S301, the terminal decodes the currently played video and outputs video images and audio sound.
In step S302, the terminal processes and analyzes the decoded audio sound, extracts audio features, and uses the feature database to identify whether the currently played video is a duplicate video. In this embodiment, the feature database is automatically created and continuously updated during the terminal's history of playing videos. If the currently played video is identified as a duplicate video, S303 is performed; otherwise, S304 is performed.
In step S303, when the currently played video is identified as a duplicate video, the terminal reads the information of this segment of duplicate video (for example, its duration) from the feature database, and then seeks directly to the time point after this duration to continue playback, that is, skips this segment of duplicate video and continues playing.
In step S304, when the currently played video is identified as not being a duplicate video, the terminal continues to play the current video in order, and adds a video record corresponding to the currently played video to the feature database, in which the audio features of the current video are saved.
In this embodiment, the end user can set whether to enable the above function of automatically skipping duplicate video. For example, when entering the interface of the playback software displayed by the terminal, the terminal prompts the user to choose whether to enable the function of automatically skipping duplicate video; after the user makes a choice, the selected setting may take effect permanently or only for the current session. In an embodiment, before executing the steps shown in FIG. 3, the terminal may first determine whether the function of automatically skipping duplicate video is enabled; if it is enabled, the steps shown in FIG. 3 are executed, and if not, ordinary sequential playback is performed.
FIG. 4 is a flowchart of the duplicate-video detection and recognition process of this embodiment. S302 to S304 are described in detail below with reference to FIG. 4. As shown in FIG. 4, the above process may include the following steps S401 to S414.
In step S401, the audio sound in the currently played video is analyzed and audio features are extracted; the audio features may include, but are not limited to, at least one of the following: an audio amplitude waveform, an audio spectrum, an audio zero-crossing rate, and text information generated by speech recognition.
In this embodiment, the feature database is an audio feature database created and continuously updated while the terminal plays videos; it saves the audio features of the repeated time segments, or of all time segments, of previously played videos. The information saved in one video record in the feature database may include: the occurrence count, whether the record is a duplicate video, the audio features, and duration information (for example, the total duration of the video, the played duration, and so on). If the occurrence count saved in a video record is 1, the information saved in the video record may further include the file name, file path, or Uniform Resource Locator (URL) of the video; if the occurrence count saved in a video record is greater than 1, the saved file name, file path, or URL has no effect. In this embodiment, the audio features of the currently played video extracted in step S401 are matched against the audio features of the video records in the feature database to find a video record matching the extracted audio features. The video records in the feature database are traversed in descending order of occurrence count, to increase the probability of a fast matching hit.
The process of matching the audio features of the currently played video against the audio features of a video record is described below.
In step S402, it is determined whether the occurrence count of a video record is 1, and whether at least one of the file name, file path, and URL of the currently played video is consistent with that of the video record whose occurrence count is 1 in the feature database; if the occurrence count of the video record is 1 and at least one of the file name, file path, and URL of the two is consistent, step S403 is performed; otherwise, step S404 is performed.
In step S403, the currently played video is not counted as a duplicate video, matching against the video records in the feature database is stopped, and the currently played video is played in order.
In this embodiment, if at least one of the file name, file path, and URL of the currently played video is the same as that of some video record with an occurrence count of 1 in the feature database, the currently played video is the video corresponding to that video record; therefore, the currently played video is not counted as a duplicate video, and the occurrence count of the video record is not incremented. If the played duration in the video record is less than the total duration, the video was not played to the end last time; if the current playback exceeds the previously played duration, the audio features of the excess portion can be saved to the video record and the played duration of the video record updated.
In step S404, it is determined whether the audio features of the currently played video match the audio features of a video record; if the two match, step S405 is performed; otherwise, step S406 is performed, that is, the next video record is selected for matching.
In this embodiment, if the audio features include text information generated by speech recognition, the text information can be divided into sentences, and the currently played video and the video record are matched in units of sentences. If the audio features include at least one of an audio amplitude waveform and an audio spectrum, the video can be divided into silent intervals and voiced intervals by Pulse Code Modulation (PCM) level values and then compared in units of voiced intervals, that is, each voiced interval of the currently played video is compared with each voiced interval of the video record. Since there is generally a silent interval between a repeated video, such as a title sequence or an advertisement, and the feature film, matching in units of voiced intervals both ensures matching accuracy and avoids starting the matching from every sampling point one by one, thereby greatly reducing the computational complexity.
In this embodiment, if, after all the video records in the feature database have been traversed, no video record matches the audio features of the currently played video, a video record corresponding to the currently played video may be added to the feature database; the information saved in the record may include the occurrence count (which may be recorded as 1), the audio features of the currently played video, and duration information, and may also record the file name, file path, or URL of the currently played video.
In step S405, if the audio features of the currently played video match the audio features of a certain video record and the matching point is not the starting point of the video record, the audio features before the matching point are deleted from the video record, only the audio features of the matched portion are retained, and matching continues with the subsequent audio features; the audio features before the matching point may be permanently deleted, or may be re-saved as a new video record for subsequent extraction of possible duplicate video.
In step S407, it is determined whether the continuous matching duration between the audio features of the currently played video and the audio features of the video record exceeds the duration threshold; if yes, step S409 is performed; otherwise, step S408 is performed, that is, the current video continues to be played without skipping any video, and the feature database continues to be used for duplicate-video recognition during playback.
In step S409, when the continuous matching duration exceeds the duration threshold (for example, 5 seconds), it is checked whether the matched video record in the feature database is marked as a duplicate video. If yes, step S410 is performed, indicating that the currently played video is a duplicate video; the duplicate video can be skipped according to its duration, playback then continues, and the feature database continues to be used for duplicate-video recognition during playback. If the video record is not marked as a duplicate video, step S411 is performed, that is, the subsequent audio features continue to be matched until the end of the video record in the feature database.
In step S412, it is determined whether the audio features of the currently played video match all the way to the end of the video record; if so, step S413 is performed; otherwise, step S414 is performed.
In step S413, if the match continues to the end of the video record, the currently played video completely matches the video record over this duration; the occurrence count of the video record may be increased by 1, and if the count after the increment exceeds the count threshold (for example, 2), the matched video record can be marked as a duplicate video.
In step S414, if the match does not reach the end of the video record, the information of the video record is updated: the unmatched audio features after the matching end point are deleted from the video record, and the occurrence count is increased by 1; if the count after the increment exceeds the count threshold, the matched video record can be marked as a duplicate video. The audio features after the matching end point may be re-saved as a new video record for subsequent extraction of possible duplicate video.
In this embodiment, if the currently played video is identified as a duplicate video, the duration of the duplicate video is read from the feature database and playback jumps directly to the time point after this duration. The above duplicate-video recognition process continues during subsequent playback; if another duplicate video is recognized, it is likewise skipped.
In this embodiment, to prevent the feature database from growing without bound, occupying too much storage space, and increasing the traversal and matching time, an aging mechanism is introduced to periodically age the video records that have existed in the feature database for too long and occurred too few times.
FIG. 5 is a flowchart of the aging processing of the feature database according to an embodiment of the present application. The aging processing flow of the feature database may be executed periodically by a separately created thread, or executed when a trigger instruction is received. However, this application is not limited thereto.
As shown in FIG. 5, the aging processing flow of the feature database includes the following steps S501 to S505.
In step S501, the total number of video records in the feature database is acquired.
In step S502, it is determined whether the total number of video records in the feature database exceeds the first aging threshold; if so, step S503 is performed; otherwise, the process returns to step S501.
In step S503, each video record in the feature database is traversed.
In step S504, it is determined whether the video record satisfies a set condition; for example, the set condition may include: the occurrence count of the video record is less than the first threshold, and the duration between the most recent occurrence of the video record and the current moment is greater than the second threshold. If the currently examined video record satisfies the set condition, step S505 is performed, that is, the video record is deleted; otherwise, the process returns to step S503 to traverse the next video record. The first threshold and the second threshold may be preset.
In this embodiment, if the occurrence count of a video record in the feature database is too small and a certain duration has already passed since its most recent occurrence, the video record is considered unlikely to be a duplicate video and can be deleted, so as to reduce the size of the feature database.
It should be noted that, in the aging process, the total occupied space of the video records in the feature database may also be acquired in step S501, and in step S502, whether to perform aging is determined by judging whether the total occupied space is greater than or equal to the second aging threshold. However, this application is not limited thereto. The first aging threshold and the second aging threshold may be preset according to the total storage capacity of the feature database.
In this embodiment, the creation and updating of the feature database may be performed as shown in FIG. 4, that is, the feature database is created and updated during video playback. However, this application is not limited thereto. In other implementations, the terminal may first create and update the feature database from the audio features of multiple locally stored videos and then play videos directly using the updated feature database; or, during the creation and updating of the feature database from multiple locally stored videos, duplicate videos may be marked with an identifier in each video, and the duplicate videos are then skipped directly according to the identifier when these videos are played.
Embodiment 2
This embodiment describes automatically detecting and skipping advertisements through speech recognition. Advertisements are often inserted into the films, TV series, and other video resources that users watch, and users usually want to skip them and watch the main content directly. This embodiment provides a playback device that automatically detects and skips advertisements. As shown in FIG. 9, the playback device of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio/video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the recognition module described above). The playback device of this embodiment may be a smart terminal device such as a smartphone or a tablet computer, or a player or other software installed on such a device. However, the present application is not limited thereto.
FIG. 6 is a schematic flowchart of this embodiment. As shown in FIG. 6, this embodiment includes the following steps S601 to S606.
In step S601, the playback device plays a locally recorded video or a network Video on Demand (VOD) video, and the data reading module obtains the video bitstream to be played from a local storage device or from the network.
In step S602, the audio/video decoding module of the playback device decodes the obtained video bitstream and outputs video images and audio sound.
In step S603, the audio feature extraction module analyzes the audio of the playing video, converts the speech into text through speech recognition, and divides the text into sentences, which serve as the audio features to be matched. The feature matching module uses the feature database to identify whether the currently playing video is a repeated video; in this example, a repeated video is an advertisement.
The feature matching module matches the sentences recognized in step S603 one by one against the sentence features of the video records in the feature database, comparing one complete sentence at a time; the video records in the feature database are traversed in descending order of occurrence count.
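Traversing records in descending order of occurrence count means the most frequently seen candidates, which are the likeliest advertisements, are checked first. A minimal sketch of this lookup; the record layout and the number of sentences required for a match are illustrative assumptions:

```python
def find_matching_record(db, sentences, min_match=3):
    """Return the first record whose leading sentences match the input, or None.

    Records are dicts with 'sentences' (recognized text split into sentences),
    'count', and 'is_repeat'; `min_match` consecutive whole sentences must
    agree. Both the layout and `min_match` are illustrative assumptions.
    """
    # Traverse in descending order of occurrence count: the most frequently
    # seen records are the likeliest repeats, so they are checked earliest.
    for record in sorted(db, key=lambda r: r['count'], reverse=True):
        if (len(sentences) >= min_match and
                record['sentences'][:min_match] == sentences[:min_match]):
            return record
    return None
```

Comparing whole sentences rather than raw text keeps the match robust to minor recognition timing differences, since a complete sentence is the unit the flow above compares.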
In step S604, if there is a matching video record in the feature database and that record is marked as a repeated video (in this example, an advertisement), the feature matching module reads the information of this repeated video, such as its duration, from the feature database and feeds it back to the data reading module of the playback device.
In step S606, the data reading module seeks directly to the time point after this duration and continues playback, i.e., it skips the advertisement by seeking to the advertisement's end time point and continuing playback.
In step S605, if there is no matching video record in the feature database, the data reading module continues to play the video in sequence and saves the sentence feature information to the feature database, thereby updating it.
It should be noted that the creation and updating of the feature database may proceed as shown in FIG. 4 and are not described again here.
Embodiment 3
This embodiment describes automatically detecting and skipping advertisements by means of the audio amplitude waveform. It differs from Embodiment 2 in that the audio amplitude waveform of the video program is used as the audio feature for matching.
This embodiment provides a playback device that automatically detects and skips advertisements. As shown in FIG. 9, the playback device of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio/video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the recognition module described above).
FIG. 7 is a schematic flowchart of this embodiment. As shown in FIG. 7, this embodiment includes the following steps S701 to S706.
In step S701, the playback device plays a locally recorded video or a network VOD video, and the data reading module obtains the video bitstream to be played from a local storage device or from the network.
In step S702, the audio/video decoding module of the playback device decodes the obtained video bitstream and outputs video images and audio sound.
In step S703, the audio feature extraction module analyzes the audio of the playing video, sampling the audio amplitude at regular intervals; the amplitudes at multiple instants trace out an audio amplitude waveform. The feature matching module uses the feature database to identify whether the currently playing video is a repeated video; in this example, a repeated video is an advertisement.
The feature matching module matches the audio amplitude waveform extracted in step S703 one by one against the audio amplitude waveforms of the video records in the feature database. Because there is generally a silent interval between repeated videos such as openings or advertisements and the main content, the video file can be divided by its Pulse Code Modulation (PCM) level values into silent intervals and voiced intervals, which alternate. Matching is performed per voiced interval, starting from the beginning of each voiced interval. If a video record in the feature database contains a repeated video identical to one in the currently playing video, this guarantees that the start of the repeated video will be found, rather than missed because the comparison time points are misaligned. It also avoids having to start matching from every individual sample point, which preserves matching accuracy while greatly reducing computational complexity.
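Splitting the PCM samples into alternating silent and voiced intervals can be sketched as follows; the amplitude threshold below which a sample counts as silent is an illustrative assumption:

```python
SILENCE_THRESHOLD = 100  # PCM amplitude below which a sample counts as silent (assumed)

def split_voiced_intervals(samples):
    """Return (start, end) index pairs of the voiced intervals in `samples`.

    Voiced and silent intervals alternate; matching then starts at the
    beginning of each voiced interval, so the comparison points line up
    with the start of any embedded repeated video.
    """
    intervals, start = [], None
    for i, amplitude in enumerate(samples):
        voiced = abs(amplitude) >= SILENCE_THRESHOLD
        if voiced and start is None:            # silent -> voiced transition
            start = i
        elif not voiced and start is not None:  # voiced -> silent transition
            intervals.append((start, i))
            start = None
    if start is not None:                       # trailing voiced interval
        intervals.append((start, len(samples)))
    return intervals
```

Matching only from each interval start, rather than from every sample, is what removes the per-sample alignment search the paragraph above describes.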
In step S704, if there is a matching video record in the feature database and that record is marked as a repeated video (in this example, an advertisement), the feature matching module reads the information of this repeated video, such as its duration, from the feature database and feeds it back to the data reading module of the playback device.
In step S706, the data reading module seeks directly to the time point after this duration, i.e., it skips the advertisement by seeking to the advertisement's end time point and continuing playback.
In step S705, if there is no matching video record in the feature database, the data reading module continues to play the video in sequence and saves the audio feature information to the feature database, thereby updating it.
It should be noted that the creation and updating of the feature database may proceed as shown in FIG. 4 and are not described again here.
Embodiment 4
This embodiment describes automatically detecting and skipping the opening sequence of a TV series by means of the audio amplitude waveform. Users sometimes binge-watch a series, downloading an entire TV series from the Internet for local viewing or watching it on demand through the TV-series module of a set-top box. Some series have a long opening before each episode; having to watch the same opening every time is tedious, and skipping it requires manually fast-forwarding with the remote control, degrading the user experience.
This embodiment provides a playback device that automatically detects and skips TV-series openings. As shown in FIG. 9, the playback device of this embodiment may include: a data reading module 901 (corresponding to the processing module described above), an audio/video decoding module 902, an audio feature extraction module 903, a feature database 905, and a feature matching module 904 (corresponding to the recognition module described above).
FIG. 8 is a schematic flowchart of this embodiment. As shown in FIG. 8, this embodiment includes the following steps S801 to S805.
In step S801, the playback device plays a locally recorded or network VOD TV-series video, and the data reading module obtains the video bitstream to be played from a local storage device or from the network.
In step S802, the audio/video decoding module of the playback device decodes the obtained video bitstream and outputs video images and audio sound.
In step S803, the audio feature extraction module processes and analyzes the decoded audio and traces out the audio amplitude waveform. Through playback of the first N episodes (where N may be an integer greater than or equal to 2), the audio amplitude waveform of the opening portion is identified, and a repeated-video record is created in the feature database; in this example, the repeated video is the series opening. The audio-feature matching may proceed as described in Embodiment 3, and the creation and updating of the feature database as shown in FIG. 4; neither is described again here.
In step S804, for subsequently played episodes, the audio feature extraction module extracts the audio amplitude waveform of the opening portion, and the feature matching module matches it against the audio amplitude waveforms of the video records in the feature database to identify the opening of the currently playing episode. The audio-feature matching may proceed as described in Embodiment 3 and is not described again here.
In step S805, the data reading module seeks to the opening end time recorded in the feature database and continues playback. In this way, users binge-watching a series no longer have to watch the same opening repeatedly, improving the user experience.
Embodiment 5
This embodiment describes automatically detecting and skipping repeated videos while a terminal records video. Users often use a set-top box to record programs they want to watch later to a local storage device. This embodiment provides a recording device that automatically detects and skips repeated videos. As shown in FIG. 11, the recording device provided in this embodiment may include: a data reading module 1101, an audio/video decoding module 1102, an audio feature extraction module 1103, a feature database 1105, a feature matching module 1104 (corresponding to the recognition module described above), a recording buffer 1106, and a recording module 1107 (corresponding to the processing module described above). The recording device of this embodiment may be a smart terminal device such as a set-top box, a smartphone, or a tablet computer. However, the present application is not limited thereto.
FIG. 10 is a flowchart of this embodiment. As shown in FIG. 10, this embodiment includes the following steps S1001 to S1007.
In step S1001, the recording device first saves the bitstream to be recorded into a recording buffer area in memory.
In step S1002, the audio/video decoding module decodes the bitstream while it is being recorded.
In step S1003, the audio feature extraction module processes and analyzes the decoded audio and extracts audio features.
In step S1004, the feature matching module uses the feature database to identify whether the video currently being recorded is a repeated video; the feature database is an audio feature database automatically created from previously played videos. The audio-feature matching and the creation and updating of the feature database may proceed as described in the above embodiments and are not described again here.
In step S1005, if the currently recorded content is identified as a repeated video, the recording module does not save this portion of the recording buffer to the storage device.
In step S1007, this portion of content is skipped and matching continues with the subsequent recorded content. As long as the audio features of the recorded content match the audio features of the repeated video in the feature database, the recording module keeps discarding that portion of the recording buffer; only when content that does not match the feature database is encountered does recording resume, i.e., saving to the storage device continues.
In step S1006, if the currently recorded content is identified as not being a repeated video, the recording module continues to record the video in sequence, i.e., it continues to save the contents of the recording buffer to the storage device.
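Steps S1001 to S1007 amount to gating the flush from the recording buffer to the storage device on the duplicate-detection result. A minimal sketch of that gating; the chunk stream, the `is_repeat` predicate standing in for the feature matching module, and the list standing in for the storage device are all illustrative assumptions:

```python
def record_stream(chunks, is_repeat, storage):
    """Flush buffered chunks to storage, dropping those flagged as repeats.

    `chunks` yields buffered bitstream segments, `is_repeat(chunk)` stands in
    for the feature-matching module (S1004), and `storage` is a list standing
    in for the storage device; all three interfaces are assumptions.
    """
    for chunk in chunks:
        if is_repeat(chunk):   # S1005/S1007: discard repeated content
            continue
        storage.append(chunk)  # S1006: save non-repeated content in sequence
    return storage
```

Because the decision is made per buffered chunk before anything reaches the storage device, repeated content never has to be deleted after the fact.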
In this way, the user can skip repeated videos and avoid recording them, which makes subsequent viewing more convenient.
FIG. 12 is a schematic diagram of the data processing device provided in this embodiment. As shown in FIG. 12, the data processing device 1200 provided in this embodiment, for example a terminal or a server, includes a memory 1201 and a processor 1202. The memory 1201 is configured to store a data processing program which, when executed by the processor 1202, implements the steps of the data processing method described above.
The processor 1202 may include, but is not limited to, a processing device such as a microprocessor (Microcontroller Unit, MCU) or a programmable logic device (Field Programmable Gate Array, FPGA). The memory 1201 may be configured to store software programs and modules of application software, such as the program instructions or modules corresponding to the data processing method in this embodiment; by running the software programs and modules stored in the memory 1201, the processor 1202 executes various functional applications and data processing, for example implementing the data processing method provided in this embodiment. The memory 1201 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1201 may include memory located remotely from the processor 1202, and such remote memory may be connected to the data processing device 1200 over a network. Instances of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In addition, an embodiment of the present application further provides a computer storage medium storing a data processing program which, when executed by a processor, implements the data processing method described above.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules or units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules or units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims (15)

  1. A data processing method, comprising:
    determining, according to audio features of a target video and by using a feature database, whether the target video comprises a repeated video, wherein the feature database is obtained by learning audio features of at least one video; and
    in response to determining that the target video comprises a repeated video, filtering the repeated video.
  2. The method according to claim 1, further comprising: extracting the audio features of the target video;
    wherein determining, according to the audio features of the target video and by using the feature database, whether the target video comprises a repeated video comprises:
    matching the audio features of the target video against audio features of video records in the feature database; and
    determining, according to a matching result between the audio features of the target video and the audio features of the video records in the feature database, whether the target video comprises a repeated video.
  3. The method according to claim 2, wherein determining, according to the matching result between the audio features of the target video and the audio features of the video records in the feature database, whether the target video comprises a repeated video comprises:
    determining that the target video comprises a repeated video based on determining that the audio features of the target video match the audio features of a video record in the feature database and that the video record is marked as a repeated video; or
    determining that the target video comprises a repeated video based on determining that the audio features of the target video match the audio features of a video record in the feature database and that the occurrence count of the video record, after being incremented by 1, is greater than or equal to a count threshold.
  4. The method according to claim 2, wherein the audio features comprise at least one of: an audio amplitude waveform, an audio spectrum, and text information generated by speech recognition;
    and matching the audio features of the target video against the audio features of the video records in the feature database comprises one of:
    based on determining that the audio features of the target video comprise text information generated by speech recognition, dividing the text information of the target video into sentences and matching, sentence by sentence, against the text information generated by speech recognition of the video records in the feature database; and
    based on determining that the audio features of the target video comprise at least one of an audio amplitude waveform and an audio spectrum, dividing the video records in the feature database into silent intervals and voiced intervals and matching the target video against the video records in the feature database per voiced interval.
  5. The method according to claim 2, further comprising:
    based on determining that a video record matching the audio features of the target video exists in the feature database, updating the information saved in the video record in the feature database that matches the audio features of the target video; and
    based on determining that no video record matching the audio features of the target video exists in the feature database, adding a video record corresponding to the target video to the feature database.
  6. The method according to claim 5, wherein updating the information saved in the video record in the feature database that matches the audio features of the target video comprises:
    based on determining that the continuous matching duration between the audio features of the target video and the audio features of a video record in the feature database is greater than or equal to a duration threshold, retaining, in the video record, the audio features corresponding to the continuous matching duration; and
    based on determining that the continuous matching duration between the audio features of the target video and the audio features of a video record in the feature database is greater than or equal to the duration threshold, incrementing the occurrence count of the video record in the feature database by 1, and based on determining that the incremented occurrence count of the video record is greater than or equal to a count threshold, marking the video record as a repeated video.
  7. The method according to claim 1, wherein filtering the repeated video in response to determining that the target video comprises a repeated video comprises:
    based on determining that the target video is a video being played, skipping the repeated video according to the duration of the repeated video in the feature database and continuing to play the target video; and
    based on determining that the target video is a video to be recorded, skipping the repeated video according to the duration of the repeated video in the feature database.
  8. The method according to claim 1, further comprising:
    deleting video records that satisfy a set condition, based on detecting that the total number of video records in the feature database is greater than or equal to a first aging threshold, or that the total storage space occupied by the video records in the feature database is greater than or equal to a second aging threshold.
  9. The method according to claim 8, wherein the set condition comprises at least one of: the occurrence count of the video record being less than or equal to a first threshold, and the time between the most recent occurrence of the video record and the current time being greater than or equal to a second threshold.
  10. The method according to claim 1, further comprising obtaining the feature database by learning audio features of at least one video in the following manner:
    based on determining that no video record exists in the feature database, adding a video record corresponding to any video to the feature database; and
    based on determining that video records exist in the feature database, updating the feature database according to a matching result between the audio features of the at least one video and the audio features of the video records in the feature database.
  11. The method according to claim 10, wherein the information saved in any video record in the feature database comprises: an occurrence count, whether the record is a repeated video, audio features, and duration information; and when the occurrence count is 1, the information saved in the video record further comprises at least one of: a file name, a path, and a uniform resource locator of the video.
  12. A data processing device, comprising:
    a recognition module, a feature database, and a processing module;
    the recognition module being configured to determine, according to audio features of a target video and by using the feature database, whether the target video comprises a repeated video, wherein the feature database is obtained by learning audio features of at least one video; and
    the processing module being configured to filter the repeated video when the recognition module determines that the target video comprises a repeated video.
  13. The device according to claim 12, further comprising:
    an audio feature extraction module configured to extract the audio features of the target video;
    the recognition module being configured to determine, according to the audio features of the target video and by using the feature database, whether the target video comprises a repeated video by:
    matching the audio features of the target video against the audio features of the video records in the feature database; and
    determining, according to the matching result between the audio features of the target video and the video records in the feature database, whether the target video comprises a repeated video.
  14. A data processing device, comprising a memory and a processor, the memory being configured to store a data processing program which, when executed by the processor, implements the data processing method according to any one of claims 1 to 11.
  15. A computer-readable medium storing a data processing program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 11.
PCT/CN2018/120770 2017-12-26 2018-12-13 Data processing method and device WO2019128724A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP18897396.0A EP3745727A4 (en) 2017-12-26 2018-12-13 DATA PROCESSING PROCESS AND DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711435400.5 2017-12-26
CN201711435400.5A CN108153882A (zh) 2017-12-26 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2019128724A1 true WO2019128724A1 (zh) 2019-07-04

Family

ID=62463127

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/120770 WO2019128724A1 (zh) 2017-12-26 2018-12-13 数据处理方法及装置

Country Status (3)

Country Link
EP (1) EP3745727A4 (zh)
CN (1) CN108153882A (zh)
WO (1) WO2019128724A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810737A (zh) * 2021-09-30 2021-12-17 深圳市雷鸟网络传媒有限公司 Video processing method and device, electronic device, and storage medium

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN108153882A (zh) * 2017-12-26 2018-06-12 中兴通讯股份有限公司 Data processing method and device
CN112181938A (zh) * 2019-07-05 2021-01-05 杭州海康威视数字技术股份有限公司 Database cleaning method and device, and computer-readable storage medium
CN110413603B (zh) * 2019-08-06 2023-02-24 北京字节跳动网络技术有限公司 Method and device for determining duplicate data, electronic device, and computer storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106034240A (zh) * 2015-03-13 2016-10-19 小米科技有限责任公司 Video detection method and device
CN107155138A (zh) * 2017-06-06 2017-09-12 深圳Tcl数字技术有限公司 Video playback jump method, device, and computer-readable storage medium
CN108153882A (zh) * 2017-12-26 2018-06-12 中兴通讯股份有限公司 Data processing method and device

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US20030093790A1 (en) * 2000-03-28 2003-05-15 Logan James D. Audio and video program recording, editing and playback systems using metadata
US7461392B2 (en) * 2002-07-01 2008-12-02 Microsoft Corporation System and method for identifying and segmenting repeating media objects embedded in a stream
WO2004004351A1 (en) * 2002-07-01 2004-01-08 Microsoft Corporation A system and method for providing user control over repeating objects embedded in a stream
US7788696B2 (en) * 2003-10-15 2010-08-31 Microsoft Corporation Inferring information about media stream objects
US8611422B1 (en) * 2007-06-19 2013-12-17 Google Inc. Endpoint based video fingerprinting
US8364671B1 (en) * 2009-02-23 2013-01-29 Mefeedia, Inc. Method and device for ranking video embeds
US8930980B2 (en) * 2010-05-27 2015-01-06 Cognitive Networks, Inc. Systems and methods for real-time television ad detection using an automated content recognition database
CN101650740B (zh) * 2009-08-27 2011-09-21 中国科学技术大学 Television advertisement detection method and device
CN106484837B (zh) * 2016-09-30 2020-08-04 腾讯科技(北京)有限公司 Method and device for detecting similar video files

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN106034240A (zh) * 2015-03-13 2016-10-19 小米科技有限责任公司 Video detection method and device
CN107155138A (zh) * 2017-06-06 2017-09-12 深圳Tcl数字技术有限公司 Video playback jump method, device, and computer-readable storage medium
CN108153882A (zh) * 2017-12-26 2018-06-12 中兴通讯股份有限公司 Data processing method and device

Non-Patent Citations (1)

Title
LIU HAIBO, ET AL: "Near-Duplicate video retrieval and copy video detection", CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION TECHNOLOGY, 15 August 2015 (2015-08-15), pages 1 - 53, XP009522079, ISSN: 1674-0246 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113810737A (zh) * 2021-09-30 2021-12-17 深圳市雷鸟网络传媒有限公司 Video processing method and device, electronic device, and storage medium
CN113810737B (zh) * 2021-09-30 2024-03-12 深圳市雷鸟网络传媒有限公司 Video processing method and device, electronic device, and storage medium

Also Published As

Publication number Publication date
EP3745727A4 (en) 2021-07-21
EP3745727A1 (en) 2020-12-02
CN108153882A (zh) 2018-06-12

Similar Documents

Publication Publication Date Title
WO2019128724A1 (zh) Data processing method and device
US9888279B2 (en) Content based video content segmentation
CN106686404B Video analysis platform, matching method, and method and system for precisely targeted advertising
CN111460219B Video processing method and device, and short-video platform
US10026446B2 (en) Intelligent playback method for video records based on a motion information and apparatus thereof
KR102197098B1 Content recommendation method and apparatus
JP5135024B2 Device, method, and program for notifying of scene appearance in content
US9813784B1 (en) Expanded previously on segments
US9215496B1 (en) Determining the location of a point of interest in a media stream that includes caption data
JP2008148077A Video playback device
CN1977262A Method and device for catching up with a broadcast or stored content being played
Dumont et al. Automatic story segmentation for tv news video using multiple modalities
CN112699787B Method and device for detecting advertisement insertion time points
JP2011504034A Method for determining the starting point of a semantic unit in an audiovisual signal
US10795932B2 (en) Method and apparatus for generating title and keyframe of video
CN114143575A Video editing method and device, computing device, and storage medium
CN114117120A System and method for intelligent index generation for video files based on content analysis
CN114845149B Video clip editing method, video recommendation method, device, equipment, and medium
CN110795597A Video keyword determination and video retrieval method and device, storage medium, and terminal
CN113347489A Video clip detection method, device, equipment, and storage medium
CN113423014A Method and device for pushing playback information, terminal device, and storage medium
Tsao et al. Thumbnail image selection for VOD services
CN113012723B Multimedia file playback method and device, and electronic device
CN116017088A Video subtitle processing method and device, electronic device, and storage medium
CN115080792A Video association method and device, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18897396

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018897396

Country of ref document: EP

Effective date: 20200727