CN110740389A - Video positioning method and device, computer readable medium and electronic equipment - Google Patents


Info

Publication number
CN110740389A
Authority
CN
China
Prior art keywords
video
target
video content
positioning
plot
Prior art date
Legal status
Granted
Application number
CN201911046300.2A
Other languages
Chinese (zh)
Other versions
CN110740389B (en)
Inventor
陈姿
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911046300.2A priority Critical patent/CN110740389B/en
Publication of CN110740389A publication Critical patent/CN110740389A/en
Application granted granted Critical
Publication of CN110740389B publication Critical patent/CN110740389B/en
Legal status: Active

Classifications

    • H04N21/8455: Structuring of content, e.g. decomposing content into time segments, involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • G06F16/735: Querying of video data, filtering based on additional data, e.g. user or group profiles
    • G06F16/783: Retrieval of video data characterised by using metadata automatically derived from the content
    • G06F16/7837: Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784: Retrieval using metadata automatically derived from the content, the detected or recognised objects being people
    • G06F16/7867: Retrieval using metadata generated manually, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • H04N21/232: Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H04N21/432: Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a video positioning method and apparatus, a computer-readable medium, and an electronic device. The video positioning method includes: obtaining scenario positioning data; determining a video content tag matching the scenario positioning data according to a preset video content tag library; and positioning a playing time point in a target video according to the video content tag, so as to play the target video from the playing time point in the target video.

Description

Video positioning method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer and communication technologies, and in particular, to video positioning methods and apparatuses, a computer-readable medium, and an electronic device.
Background
Currently, when a user wants to watch a certain scenario while watching a video, he or she generally locates the desired scenario position by searching for a scenario introduction or by manually scrubbing through the video, which requires a lot of time on manual search and makes it difficult to locate the desired scenario position quickly.
Disclosure of Invention
The embodiments of the present application provide a video positioning method and apparatus, a computer-readable medium, and an electronic device, so that the time a user spends on manual search can be reduced at least to a certain degree and the plot position the user wants to watch can be located quickly.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to one aspect of the embodiments of the present application, a video positioning method is provided, comprising: obtaining scenario positioning data; determining a video content tag matching the scenario positioning data according to a preset video content tag library; and positioning a playing time point in a target video according to the video content tag, so that the target video is played from the playing time point in the target video.
According to one aspect of the embodiments of the present application, a video positioning method is provided, comprising: displaying a plot search interface; displaying input plot positioning data in response to an input operation on the plot search interface; and displaying a video playing interface in response to a plot positioning instruction for the plot positioning data, wherein a target video starts to be played in the video playing interface at a located playing time point, the playing time point being a time point located in the target video according to the plot positioning data.
According to one aspect of the embodiments of the present application, a video positioning apparatus is provided, comprising a first obtaining unit, a matching unit and a positioning unit. The first obtaining unit is configured to obtain scenario positioning data; the matching unit is configured to determine a video content tag matching the scenario positioning data according to a preset video content tag library; and the positioning unit is configured to position a playing time point in a target video according to the video content tag, so as to play the target video from the playing time point in the target video.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: a second obtaining unit configured to obtain a pre-processed video; an identifying unit configured to identify a target object contained in the pre-processed video and a target playing time point at which the target object appears in the pre-processed video; and a generating unit configured to generate a video content tag according to identification information of the target object and store the generated video content tag in association with the target playing time point.
In some embodiments of the present application, based on the foregoing solution, the identifying unit is configured to: if the target objects include target characters, match, based on the features of the respective target characters, the target playing time points at which the respective target characters appear in the pre-processed video; if the target objects include target backgrounds, match, based on the features of the respective target backgrounds, the target playing time points at which the respective target backgrounds appear in the pre-processed video; and if the target objects include target actions, match, based on the features of the respective target actions, the target playing time points at which the respective target actions appear in the pre-processed video.
In some embodiments of the present application, based on the foregoing solution, if the target object includes only one of a target person, a target background and a target action, the generating unit is configured to generate a video content tag according to the identification information of that object, and store the generated video content tag in association with the target playing time point at which that object appears in the pre-processed video.
In some embodiments of the present application, based on the foregoing solution, if the target object includes at least two of a target person, a target background and a target human action, the generating unit is configured to determine a playing time point at which the at least two objects appear simultaneously in the pre-processed video, generate a video content tag according to the identification information of the at least two objects appearing simultaneously in the pre-processed video, and store the generated video content tag in association with the playing time point at which the at least two objects appear simultaneously.
In some embodiments of the present application, based on the foregoing scheme, the video positioning apparatus further includes: a third obtaining unit configured to obtain video search data, the video search data including a video screenshot and corpus data obtained by searching for the pre-processed video in a video library search engine; an extracting unit configured to extract keywords in the corpus data and determine the playing time point corresponding to the video screenshot in the pre-processed video; and a storage unit configured to generate a video content tag according to the keywords and store the generated video content tag in association with the playing time point corresponding to the video screenshot in the pre-processed video.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: perform word segmentation on the scenario positioning data to obtain a word segmentation result; determine the number of words of each video content tag in the video content tag library that match the word segmentation result; determine the similarity between each video content tag and the scenario positioning data according to the number of words of each video content tag that match the word segmentation result; and determine the video content tag matching the scenario positioning data according to the similarity.
In some embodiments of the present application, based on the foregoing solution, the matching unit is configured to calculate, according to the number of words of each video content tag that match the word segmentation result, the ratio of that number of matched words to the total number of words of the video content tag, and determine the similarity between each video content tag and the plot positioning data according to the ratio.
In some embodiments of the present application, based on the foregoing solution, the matching unit is configured to determine the similarity between each video content tag and the scenario positioning data according to the number of words of each video content tag that match the word segmentation result and a correspondence between the number of words and the similarity.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes an execution unit configured to determine, in response to a search range request for the scenario positioning data, the video content tag library used for matching the scenario positioning data, or configured to determine, from a plurality of preset video content tag libraries, the video content tag library used for matching the scenario positioning data according to the video content information indicated by the scenario positioning data.
In some embodiments of the present application, based on the foregoing scheme, if the scenario positioning data is voice data, the matching unit is configured to convert the voice data to obtain converted text data, and determine the video content tag matching the converted text data according to the preset video content tag library.
In some embodiments of the present application, based on the foregoing solution, if the video positioning apparatus is built into a video client, the video positioning apparatus further includes a playing unit configured to obtain the target video and play the target video from the playing time point in the target video; if the video positioning apparatus is built into a video server, the video positioning apparatus further includes a pushing unit configured to push the identification information of the target video and the playing time point located in the target video to the video client, so that the video client obtains the target video according to the identification information of the target video and plays the target video from the playing time point in the target video.
According to one aspect of the embodiments of the present application, a video positioning apparatus is provided, comprising a first display unit, a second display unit and a second playing unit. The first display unit is configured to display a plot search interface; the second display unit is configured to display input plot positioning data in response to an input operation on the plot search interface; and the second playing unit is configured to display a video playing interface in response to a plot positioning instruction for the plot positioning data, the target video being played in the video playing interface from the located playing time point, the playing time point being a time point located in the target video according to the plot positioning data.
According to one aspect of the embodiments of the present application, a computer-readable medium is provided, having stored thereon a computer program which, when executed by a processor, implements the video positioning method described in the above embodiments.
According to one aspect of the embodiments of the present application, an electronic device is provided, comprising one or more processors and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video positioning method described in the above embodiments.
According to the technical solutions provided by the embodiments of the present application, plot positioning data describing a plot in a video that the user wants to watch is input, a video content tag matching the plot positioning data is determined according to a preset video content tag library, and a playing time point is then located in the target video according to the matched video content tag. In this way, the plot position in the video that the user wants to watch can be located quickly according to the plot positioning data input by the user, the search response speed of the system is improved, and the user no longer needs to locate the desired plot position by manually searching plot introductions and manually playing the video, so that the time the user spends on manual search can be reduced to a certain degree and the user's viewing experience is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated herein and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. It is to be understood that the drawings in the following description illustrate only some embodiments of the application, and that other drawings may be derived from them by those skilled in the art without creative effort. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
Fig. 2 shows a flow diagram of a video positioning method according to embodiments of the present application.
Fig. 3 shows a flow chart of a video positioning method according to embodiments of the present application.
Fig. 4 shows a flow chart of face image recognition according to an embodiment of the present application.
Fig. 5 shows a schematic block diagram of motion recognition by the character motion recognition model according to an embodiment of the present application.
Fig. 6 shows a detailed flowchart of step S330 of the video positioning method according to embodiments of the present application.
Fig. 7 shows a flow chart of a video positioning method according to embodiments of the present application.
Fig. 8 shows a detailed flowchart of step S220 of the video positioning method according to embodiments of the present application.
Fig. 9 shows a detailed flowchart of step S220 of the video positioning method according to embodiments of the present application.
Fig. 10 shows a detailed flowchart of step S530 of the video positioning method according to embodiments of the present application.
Fig. 11A shows a flow diagram of a video positioning method according to embodiments of the present application.
Fig. 11B shows a flow diagram of a video positioning method according to embodiments of the present application.
Fig. 12 shows a flow diagram of voice storyline positioning according to embodiments of the present application.
Fig. 13 shows a flow diagram of voice storyline positioning according to embodiments of the present application.
Fig. 14 shows a flow diagram of voice storyline positioning according to embodiments of the present application.
Fig. 15 shows a flow diagram of a video positioning method according to embodiments of the present application.
Fig. 16 shows a plot search interface diagram according to embodiments of the present application.
Fig. 17 shows a video playback interface diagram according to embodiments of the present application.
Fig. 18 shows a video playback interface diagram according to embodiments of the present application.
FIG. 19 shows a block diagram of a video positioning device according to embodiments of the present application.
FIG. 20 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments in the following description. However, it will be recognized by one skilled in the art that the present teachings may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.
The block diagrams shown in the figures are merely functional entities and do not necessarily correspond to physically separate entities; that is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include a video client (e.g., one or more of the smart phone 101, tablet computer 102 and portable computer 103 shown in FIG. 1, and certainly also a desktop computer, etc.), a network 104, and a video server 105. The network 104 is the medium used to provide a communication link between the video client and the video server 105, and may include various connection types, such as wired communication links, wireless communication links, etc.
It should be understood that the number of video clients, networks, and video servers in fig. 1 is merely illustrative. There may be any number of video clients, networks, and video servers, as desired for an implementation. For example, the video server 105 may be a server cluster composed of a plurality of servers, or the like.
For example, a user can upload scenario positioning data to the video server 105 through the video client 103, where the scenario positioning data is data describing a certain scenario in a video that the user wants to watch. The video server 105 determines a video content tag matching the scenario positioning data according to a preset video content tag library, and locates a playing time point in a target video according to the matching video content tag, so that the video server 105 can push the playing time point located in the target video to the video client 103 and the video client 103 plays the target video from that playing time point. In this way, the scenario position in the video that the user wants to watch can be located quickly according to the scenario positioning data input by the user, the search response speed of the system is improved, and the user no longer needs to locate the desired scenario position by searching scenario introductions and playing the video manually, which reduces the time spent on manual search and improves the user's viewing experience.
It should be noted that the video positioning method provided in the embodiments of the present application is generally executed by the video server 105, and accordingly the video positioning apparatus is generally disposed in the video server 105. However, in other embodiments of the present application, the video client 103 (or the video client 101 or 102) may also have functions similar to those of the video server 105, so as to execute the video positioning method provided in the embodiments of the present application. For example, a user inputs scenario positioning data at the video client 103; the video client 103 determines a video content tag matching the scenario positioning data according to a preset video content tag library, and then locates a playing time point in a target video according to the matched video content tag, so that the target video can be played from the playing time point in the target video.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 2 shows a flow chart of a video positioning method according to an embodiment of the present application. The method may be performed by a server, which may be the video server 105 shown in fig. 1. Referring to fig. 2, the video positioning method comprises at least steps S210 to S230, which are described in detail below.
In step S210, scenario positioning data is acquired.
The plot positioning data is data describing a plot in a video that the user wants to watch, and may include one or more kinds of information such as character identification information, background information, and character actions. For example, when the user wants to watch the plot in Journey to the West in which Sun Wukong is pinned under the Five Finger Mountain, the plot positioning data input for plot positioning may be "when Sun Wukong is pressed under the Five Finger Mountain".
The scenario positioning data can be voice data or text data, and when the scenario positioning data is the voice data, a user can input voice as the scenario positioning data through a voice device in the video client; when the plot positioning data is text data, the user can input the text as the plot positioning data through the entity keys or the virtual keys in the video client. After receiving the plot positioning data input by the user, the video client uploads the received plot positioning data to the video server.
In step S220, a video content tag matching the scenario positioning data is determined according to a preset video content tag library.
The video content tag library comprises video content tags for performing scenario description on scenarios in videos, wherein the video content tags specifically comprise texts for describing character identification information, background information and character action information appearing in a certain section of scenarios in the videos and are used for matching with the scenario positioning data so as to position the scenario positions in the videos which the user wants to watch according to the matched video content tags.
Referring to fig. 3, fig. 3 shows a flowchart of a video positioning method according to an embodiment of the present application; the video positioning method may further include steps S310 to S330, which are described in detail below.
In step S310, a pre-processed video is acquired.
The preprocessed video is a video to which plot content tags need to be added for specific plots; it may be, for example, certain specific videos in a video library or all videos in the video library.
In step S320, a target object included in the preprocessed video and a target playing time point of the target object appearing in the preprocessed video are identified.
The target object refers to object information, such as character identification information, background information or character action information, in the preprocessed video, which can describe a scenario in the video. In order to be able to locate a scenario in a video by a target object, it is therefore necessary to determine a target playing time point of the target object appearing in the preprocessed video, thereby facilitating accurate location of the scenario in the video according to the target object.
In some embodiments of the present application, step S320 specifically includes: if the target objects include target characters, matching, in the pre-processed video, the target playing time points at which the respective target characters appear based on the features of the respective target characters; if the target objects include target backgrounds, matching, in the pre-processed video, the target playing time points at which the respective target backgrounds appear based on the features of the respective target backgrounds; and if the target objects include target actions, matching, in the pre-processed video, the target playing time points at which the respective target actions appear based on the features of the respective target actions.
When the target object is a target person to be matched, in identifying each target person contained in the pre-processed video and the target playing time point at which each target person appears in the pre-processed video, the features of each target person to be matched can be obtained first, and the target playing time point at which each target person appears in the pre-processed video is matched according to the features of each target person.
Referring to fig. 4, fig. 4 shows a flow chart of face image recognition according to an embodiment of the present application. The face recognition model is obtained by training a machine learning model with training sample data, where the machine learning model used for training may be a CNN (Convolutional Neural Network) model, a deep neural network model, or the like.
In the process of training the machine learning model to obtain the face recognition model, a face image and the known face label of that image are acquired as training sample data, and the training sample data are input into the machine learning model for training. The specific training process includes: preprocessing the face image in the training sample data to obtain a face picture for face region recognition and cropping; recognizing and cropping the face region in the face picture to obtain a face region image; extracting face image features from the face region image; and outputting the face label according to the face image features.
After the face recognition model is obtained, a target face image to be matched can be obtained first, and the preprocessed video is input into the face recognition model to determine the target face images contained in the preprocessed video and the playing time points at which they appear in the preprocessed video. The target persons contained in the preprocessed video are thereby recognized, so that video content tags describing plots in the video can be generated according to the identification information of the target persons.
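As an illustrative sketch only (the patent does not disclose code), the frame-scanning step described above could be organized as follows in Python. The recognize_faces helper is a hypothetical stand-in for the trained face recognition model, and OpenCV is used only to decode frames and read timestamps.

```python
import cv2  # OpenCV, used here only to decode frames and read playing timestamps

def recognize_faces(frame):
    # Placeholder for the trained face recognition model described above;
    # it should return the identity labels (e.g. character names) found in the frame.
    return []

def index_target_persons(video_path, target_persons, sample_every_n_frames=25):
    """Scan a pre-processed video and record, for each target person,
    the playing time points (in seconds) at which that person appears."""
    appearances = {person: [] for person in target_persons}
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % sample_every_n_frames == 0:
            timestamp = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
            for label in recognize_faces(frame):
                if label in appearances:
                    appearances[label].append(timestamp)
        frame_idx += 1
    cap.release()
    return appearances
```

Sampling every N frames rather than every frame is only one possible trade-off between indexing cost and time-point granularity.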
When a target object is a target background which needs to be matched, when each target background contained in a preprocessed video and a target playing time point of each target background appearing in the preprocessed video are recognized, the characteristics of each target background which needs to be matched can be obtained first, and the target playing time point of each target background appearing in the preprocessed video is matched according to the characteristics of each target background.
The background recognition model is also obtained by training a machine learning model with training sample data, and the machine learning model used for training may likewise be a CNN model or a deep neural network model. It should be noted that the training processes of the background recognition model and the face recognition model differ only in their training objects; for the specific training process, reference may be made to the training process of the face recognition model, which is not repeated here.
When the target object is a target action which needs to be matched, when each target action contained in the preprocessed video and a target playing time point of each target action appearing in the preprocessed video are identified, the characteristics of the target action which needs to be matched can be obtained first, and the target playing time point of each target action appearing in the preprocessed video is matched according to the characteristics of each target action. In this embodiment, the target action may be a character action occurring in a certain scenario in the preprocessed video, and the feature of the target action refers to a character action tag corresponding to the character action. Specifically, the preprocessed video may be input into a preset character motion recognition model, and the character motion recognition model matches the character motion labels corresponding to the target motions according to needs, and matches the target playing time points of the target motions in the preprocessed video.
The character motion recognition model is also obtained by training a machine learning model through training sample data, and the machine learning model during training may be a CNN model. In the process of training the machine learning model to obtain the character action recognition model, the machine learning model can be trained by acquiring a motion target video containing character actions and a human action label with known character actions as training samples.
Referring to fig. 5, fig. 5 shows a schematic block diagram of motion recognition of a character motion recognition model, and as can be seen from fig. 5, the process of training the character motion recognition model includes extracting feature characterization information corresponding to a character motion included in a moving target video, where the feature characterization information is a multidimensional vector representing the human motion, performing motion recognition and understanding according to the feature characterization information corresponding to the character motion, and outputting a character motion tag.
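Purely as a hedged illustration of the pipeline in fig. 5 (feature characterization information, then recognition, then action label), and not as the architecture actually used by the patent, a minimal PyTorch-style classification head might look like the sketch below; the feature dimension and the action label set are assumptions.

```python
import torch
import torch.nn as nn

ACTION_LABELS = ["hug", "fight", "run", "kiss"]  # illustrative label set

class ActionRecognizer(nn.Module):
    """Maps a pre-computed clip feature vector (the multidimensional
    'feature characterization information') to a character action label."""
    def __init__(self, feature_dim=512, num_actions=len(ACTION_LABELS)):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, clip_features):            # (batch, feature_dim)
        return self.classifier(clip_features)    # (batch, num_actions) logits

# usage sketch
model = ActionRecognizer()
features = torch.randn(1, 512)                   # stand-in for real clip features
label = ACTION_LABELS[model(features).argmax(dim=1).item()]
```

In practice the clip encoder that produces the feature vector would itself be a trained network, as noted above; only the final classification step is sketched here.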
After the character action recognition model is obtained, the target character action to be matched can be obtained firstly, the preprocessed video is input into the character action recognition model, so that the target character action contained in the preprocessed video and the playing time point of the target character action appearing in the preprocessed video are determined, the target character action contained in the preprocessed video is further recognized, and a video content label for performing plot description on a plot in the video is generated according to the identification information of the target character action.
Referring to fig. 3 again, in step S330, a video content tag is generated according to the identification information of the target object, and the generated video content tag is stored in association with the target playing time point.
The identification information of the target object is used as information for uniquely identifying the target object, and may be identification information such as name or character name of the target person when the target object is the target person, a background name when the target object is the target background, and a character action tag corresponding to the target action when the target object is the target action.
The video content label is generated according to the identification information of the target object for carrying out plot description on the plot in the video, and the generated video content label is stored in association with the target playing time point, so that the playing time point can be positioned according to the video content label, and the video can be conveniently and directly played at the plot position in the video.
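A minimal sketch of this associated storage, assuming a simple in-memory tag library keyed by video identifier; the field layout and example values are illustrative and not taken from the patent.

```python
from collections import defaultdict

# video identifier -> list of (video content tag, target playing time point in seconds)
video_content_tag_library = defaultdict(list)

def store_tag(video_id, tag_text, play_time_seconds):
    """Associate a generated video content tag with its target playing time point."""
    video_content_tag_library[video_id].append((tag_text, play_time_seconds))

# example: a tag generated from the identification information of a target person
store_tag("journey_to_the_west_ep01",
          "Sun Wukong pressed under Five Finger Mountain", 1520.0)
```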
In some embodiments of the present application, if the target object includes only one of a target person, a target background and a target action, step S330 specifically includes generating a video content tag according to the identification information of that object, and storing the generated video content tag in association with the target playing time point at which that object appears in the preprocessed video.
When the target object only includes one of the target character, the target background and the target action, the video content tag can be generated according to the identification information corresponding to that object, and the generated video content tag is stored in association with the target playing time point at which that object appears in the preprocessed video. In this way, the plot positions at which the object appears in the video, such as all plot positions at which a certain character appears, can be found conveniently and accurately according to the single object, thereby meeting the user's personalized plot positioning requirements.
Referring to fig. 6, fig. 6 shows a detailed flowchart of step S330 of a video positioning method according to embodiments of the present application, in which if the target object includes at least two objects of a target person, a target background and a target human body action, step S330 may include steps S3301 to S3302, which are described in detail below.
In step S3301, a play time point at which the at least two objects appear simultaneously in the preprocessed video is determined.
If the target object includes at least two objects of the target person, the target background, and the target human body motion, it is necessary to determine a playing time point at which the at least two objects appear in the preprocessed video at the same time, so as to generate a video content tag according to the identification information of the at least two different objects appearing at the same time.
In step S3302, a video content tag is generated according to the identification information of the at least two objects appearing simultaneously in the preprocessed video, and the generated video content tag is stored in association with the playing time point of the at least two objects appearing simultaneously.
When a video content tag is generated according to the identification information of at least two objects appearing simultaneously in the preprocessed video, the identification information corresponding to the different objects can be concatenated according to a preset arrangement order, for example in the order of target person, target background and target human action. For example, in the movie Titanic, for the specific scenario in which Jack and Rose embrace on the ship, a video content tag may be generated according to the identification information of the target persons, the target background and the target human action, the generated video content tag being "Jack and Rose hug on the ship". Of course, the preset arrangement order may also be another order, which is not limited here.
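The simultaneous-appearance logic of steps S3301 and S3302 could be sketched as follows, assuming per-object lists of time points (already rounded to a common granularity) were produced earlier; the concatenation order used in the tag string and the example data are illustrative only.

```python
def co_occurrence_tags(person, background, action,
                       person_times, background_times, action_times):
    """Generate tags only for time points at which the target person, background
    and action all appear simultaneously in the pre-processed video."""
    common = sorted(set(person_times) & set(background_times) & set(action_times))
    tag = f"{person} {action} {background}"   # illustrative concatenation order
    return [(tag, t) for t in common]

# illustrative data (time points in seconds)
tags = co_occurrence_tags(
    "Jack and Rose", "on the ship", "hug",
    person_times=[3600, 3605, 3610],
    background_times=[3600, 3610, 3620],
    action_times=[3610],
)
# -> [("Jack and Rose hug on the ship", 3610)]
```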
In the technical solution of the embodiment shown in fig. 6, for a video content tag formed by identification information of at least two different target objects, a scenario in a video can be described in more detail, so that the scenario can be more accurately matched with scenario positioning data, and the accuracy of positioning a scenario position that a user wants to watch according to the scenario positioning data is further improved.
Referring to fig. 7, fig. 7 shows a flowchart of a video positioning method according to embodiments of the present application, and in this embodiment, the video positioning method may further include steps S410 to S430, which are described in detail below.
In step S410, video search data is obtained, where the video search data includes a video screenshot and corpus data obtained by searching the preprocessed video in a video library search engine.
When the video content tag corresponding to the preprocessed video is generated, video search data can be obtained by searching the preprocessed video in a video library search engine, wherein the video search data is derived from forum data or comment area data set for the preprocessed video. The video search data includes a video screenshot obtained after searching a pre-processed video in a video library and corpus data, the video screenshot may be a video screen screenshot of a certain playing time point captured by a user on the pre-processed video, and the corpus data may be description information describing the video screenshot, such as information on relevant comments on the video screenshot.
In step S420, extracting the keywords in the corpus data, and determining a playing time point corresponding to the video screenshot in the preprocessed video.
In order to obtain a text for marking the plot content corresponding to the video screenshot according to the corpus data in the video search data, keywords in the corpus data can be extracted, wherein the types of the keywords in the corpus data can include a character name, a character role name, a background name, a character action tag and the like, so that a text for plot description of the plot content reflected by the video screenshot can be generated according to the keywords. In order to determine the video scenario position corresponding to the corpus data according to the video screenshot in the video search data, the playing time point of the video screenshot in the preprocessed video can be determined, so that the scenario position corresponding to the video screenshot can be conveniently positioned.
In step S430, a video content tag is generated according to the keyword, and the generated video content tag and the corresponding playing time point of the video screenshot in the preprocessed video are stored in an associated manner.
When the video content tags are generated according to the keywords, different types of keywords may be connected in series according to a preset arrangement sequence, for example, the video content tags may be generated by connecting in series according to the sequence of the names, the role names, the backgrounds and the actions of the people.
And for the video content label generated according to the keyword, the video content label and the corresponding playing time point of the video screenshot in the preprocessed video are stored in an associated manner, so that the corresponding plot position can be accurately positioned according to the video label.
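A hedged sketch of steps S410 to S430, assuming the typed keyword vocabularies are available from the video's metadata; the vocabularies, the preset order and the example corpus text are all illustrative and not taken from the patent.

```python
# Illustrative vocabularies; in practice these would come from the video's metadata.
KEYWORD_TYPES = {
    "person":     {"Jack", "Rose"},
    "role":       {"painter", "noblewoman"},
    "background": {"ship", "deck", "ocean"},
    "action":     {"hug", "draw", "jump"},
}
PRESET_ORDER = ["person", "role", "background", "action"]

def tag_from_corpus(corpus_text, screenshot_time_seconds):
    """Extract typed keywords from comment/forum corpus data and concatenate them
    in the preset order to form a video content tag for the screenshot's time point."""
    words = corpus_text.replace(",", " ").split()
    picked = {t: [w for w in words if w in vocab] for t, vocab in KEYWORD_TYPES.items()}
    tag = " ".join(w for t in PRESET_ORDER for w in picked[t])
    return tag, screenshot_time_seconds

tag, t = tag_from_corpus("Jack and Rose hug on the deck of the ship", 3610)
# tag == "Jack Rose deck ship hug", stored together with the screenshot's time point
```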
In the technical solution of the embodiment shown in fig. 7, by acquiring video search data from the video library search engine and generating video content tags for specific scenarios in the preprocessed video according to the video search data, the video content tags contained in the video content tag library are greatly enriched, so that corresponding video content tags can be matched more easily according to the scenario positioning data, and the accuracy of locating the scenario positions that the user wants to watch according to the scenario positioning data is improved to a certain degree.
In some embodiments of the present application, when generating a video content tag corresponding to the pre-processed video, the video content tag may also be generated according to other corpus data related to the pre-processed video, for example according to the lines (dialogue) data appearing in the pre-processed video or the barrage (bullet-screen comment) data corresponding to the pre-processed video.
Specifically, when determining the playing time point at which a target line appears in the preprocessed video, an OCR (Optical Character Recognition) method can be used to recognize the lines in the preprocessed video, and the recognized lines are compared with the target line to determine which recognized lines correspond to the target line, thereby determining the target line appearing in the preprocessed video and the playing time point at which it appears. Keywords in the target line can then be extracted, a video content tag generated according to the extracted keywords, and the generated video content tag stored in association with the playing time point of the target line.
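The comparison between recognized lines and the target line could, for example, be a simple fuzzy match; the sketch below uses Python's standard difflib and assumes an upstream OCR pass has already produced (playing time, subtitle text) pairs. The threshold and example lines are assumptions.

```python
from difflib import SequenceMatcher

def find_line_time_points(ocr_subtitles, target_line, threshold=0.8):
    """ocr_subtitles: list of (play_time_seconds, recognized_subtitle_text) pairs
    produced by an OCR pass over the pre-processed video.
    Returns the playing time points whose subtitle is close enough to the target line."""
    hits = []
    for play_time, text in ocr_subtitles:
        similarity = SequenceMatcher(None, text, target_line).ratio()
        if similarity >= threshold:
            hits.append(play_time)
    return hits

subtitles = [(120.0, "I'm the king of the world!"), (415.0, "You jump, I jump.")]
print(find_line_time_points(subtitles, "You jump, I jump"))   # -> [415.0]
```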
In the process of watching a video, when users see classic shots or a specific scenario, they often send barrage comments related to that scenario content. Therefore, the playing time point at which a target barrage appears in the preprocessed video can be determined according to the target barrage that needs to be recognized, a video content tag can be generated according to the target barrage, and the generated video content tag can be stored in association with the playing time point of the target barrage.
Generating video content tags according to the lines data appearing in the pre-processed video or the barrage data corresponding to the pre-processed video can further enrich the video content tags contained in the video content tag library, so that corresponding video content tags can be matched more easily according to the plot positioning data, and the accuracy of locating the plot position that the user wants to watch according to the plot positioning data can be further improved.
Referring to fig. 8, fig. 8 is a detailed flowchart illustrating step S220 of a video positioning method according to embodiments of the present application, where if the scenario positioning data is voice data, the step S220 includes steps S2201 to S2202, which are described in detail below.
In step S2201, the voice data is converted to obtain converted text data.
When the scenario positioning data is voice data, in order to enable the voice data to be subjected to similarity matching with the video content tag in the text form, the voice data needs to be converted into text data so as to determine the video content tag matched with the scenario positioning data. Specifically, speech recognition may be performed on the speech data through a natural language processing technique, and the converted text data may be obtained according to the speech recognition.
In step S2202, a video content tag matching the converted text data is determined from a preset video content tag library.
After the converted text data is obtained, matching each video content tag contained in the video content tag library with the converted text data to determine the video content tag matched with the converted text data.
Referring to fig. 9, fig. 9 shows a detailed flowchart of step S220 of the video positioning method according to embodiments of the present application, and in this embodiment, the step S220 may include step S510 to step S540, which are described in detail below.
In step S510, the scenario positioning data is segmented to obtain a segmentation result.
If the scenario positioning data is text data, word segmentation can be performed directly on the text data to obtain a word segmentation result, so that all the words in the scenario positioning data are obtained; if the scenario positioning data is voice data, the voice data first needs to be converted into text data. The word segmentation may be performed with a jieba-style Chinese word segmentation method, although other word segmentation methods may also be used, which is not limited here.
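A minimal sketch of this segmentation step using the jieba tokenizer; the example sentence and the exact token boundaries shown in the comment are illustrative.

```python
import jieba

scenario_text = "孙悟空被压在五指山下"   # "Sun Wukong pressed under the Five Finger Mountain"
segmentation_result = jieba.lcut(scenario_text)
# e.g. ['孙悟空', '被', '压', '在', '五指山', '下'] (exact boundaries depend on the dictionary)
```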
In step S520, the number of words of each video content tag in the video content tag library that matches the word segmentation result is determined.
The word segmentation result obtained by segmenting the scenario positioning data can be matched with each video content tag in the video content tag library respectively to determine the number of words which can be successfully matched, wherein the more the number of the matched words is, the higher the matching degree of the scenario positioning data and the video content tags is, and otherwise, the less the number of the matched words is, the lower the matching degree of the scenario positioning data and the video content tags is.
In step S530, determining a similarity between each video content tag and the scenario positioning data according to the number of words matched between each video content tag and the word segmentation result.
When determining whether the video content tags in the video content tag library are matched with the plot positioning data, determining the similarity between each video content tag in the video content tag library and the plot positioning data according to the number of matched vocabularies, wherein the similarity between the number of matched vocabularies and the plot positioning data is a positive correlation relationship, in other words, the more the number of matched vocabularies is, the higher the similarity is.
Referring to fig. 10, fig. 10 shows a detailed flowchart of step S530 of the video positioning method according to embodiments of the present application, and the step S530 may include steps S5301 to S5302, which are described in detail below.
In step S5301, a ratio between the number of words matched with the video content tags and the word segmentation result and the number of words of each video content tag is calculated according to the number of words matched with the word segmentation result.
When determining the similarity between each video content tag and the scenario positioning data according to the number of the vocabularies matched with each video content tag and the word segmentation result, calculating the ratio of the number of the vocabularies matched with each video content tag and the word segmentation result to the number of the vocabularies of each video content tag, wherein the larger the corresponding ratio is, the higher the similarity between the video content tags and the scenario positioning data is, and otherwise, the smaller the corresponding ratio is, the lower the similarity between the video content tags and the scenario positioning data is.
In step S5302, a similarity between each video content tag and the scenario positioning data is determined according to the ratio.
The similarity between each video content label and the plot positioning data is determined through the ratio of the number of the vocabularies matched with the word segmentation result to the number of the vocabularies of each video content label, the video content label with high similarity to the plot positioning data can be accurately determined, and the accuracy of positioning the specific plot position in the video through the input plot positioning data is further improved.
In embodiments of the present application, the step S530 may further include determining a similarity between each video content tag and the scenario positioning data according to the number of words matched with the word segmentation result and a corresponding relationship between the number of words and the similarity.
When the similarity between each video content tag and the plot positioning data is determined according to the number of words of each video content tag that match the word segmentation result, it can be determined from that number together with the correspondence between the number of words and the similarity. The similarity between each video content tag in the video content tag library and the plot positioning data can thus be determined directly from the number of matched words, which avoids the influence on the similarity calculation caused by differences in the number of words contained in different video content tags, and can further improve the accuracy of locating the plot position in the video from the input plot positioning data.
Referring to fig. 9 again, in step S540, a video content tag matching the scenario positioning data is determined according to the similarity.
In this embodiment, the video content tag with the highest similarity to the scenario positioning data may be used as the matching video content tag. Of course, the video content tags with the similarity higher than the predetermined similarity threshold may also be used as the video content tags matched with the scenario positioning data, and the method is not limited herein.
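Putting steps S510 to S540 together, the ratio-based matching could be sketched as follows; the tag library contents, the similarity threshold and the word lists are illustrative assumptions, not values from the patent.

```python
def similarity(tag_words, query_words):
    """Ratio of the number of tag words matched by the segmented query
    to the total number of words in the tag (steps S5301 and S5302)."""
    matched = sum(1 for w in tag_words if w in query_words)
    return matched / len(tag_words)

def match_tag(tag_library, query_words, threshold=0.5):
    """tag_library: list of (tag_words, play_time_seconds) entries.
    Returns the entry whose similarity to the query is highest,
    or None if no tag reaches the threshold (scenario not retrieved)."""
    best, best_sim = None, 0.0
    for tag_words, play_time in tag_library:
        sim = similarity(tag_words, query_words)
        if sim > best_sim:
            best, best_sim = (tag_words, play_time), sim
    return best if best_sim >= threshold else None

library = [
    (["Sun", "Wukong", "pressed", "under", "Five", "Finger", "Mountain"], 1520.0),
    (["Jack", "Rose", "hug", "ship"], 3610.0),
]
query = ["Sun", "Wukong", "under", "Five", "Finger", "Mountain"]
print(match_tag(library, query))   # -> (['Sun', 'Wukong', ...], 1520.0)
```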
Still referring to fig. 2, in step S230, a playing time point is located in a target video according to the video content tag, so that the target video is played from the playing time point in the target video.
And determining a playing time point in the target video according to the matched video content label so as to play the target video from the playing time point in the target video, thereby realizing that the plot position in the video which the user wants to watch is quickly positioned according to the plot positioning data input by the user.
When detecting that no video content tag matched with the scenario positioning data exists in the preset video content tag library, generating a notice that the scenario is not retrieved, so that the video client can perform voice playing or text display in an interface according to the notice that the scenario is not retrieved, so as to inform a user of a matching result, and prompt the user to input the corresponding scenario positioning data again.
It can be seen from the above that, by inputting plot positioning data describing a plot in the video that the user wants to watch, determining the video content tag matching the plot positioning data according to the preset video content tag library, and then locating the playing time point in the target video according to the matched video content tag, the plot position in the video that the user wants to watch can be located quickly according to the plot positioning data input by the user, and the search response speed of the system is improved. The user no longer needs to locate the desired plot position by searching plot introductions and playing the video manually, so the time the user spends on manual search can be reduced to a certain extent and the user's viewing experience is improved.
Referring to fig. 11A, fig. 11A shows a flowchart of a video positioning method according to some embodiments of the present application. In this embodiment, if the video positioning method is executed by a video client, the video positioning method further includes step S240, which is described in detail below.
In step S240, the target video is obtained, and the target video is played from the playing time point in the target video.
When the video positioning method is executed by a video client, after the playing time point is located in the target video according to the video content tag, the video client can directly obtain the target video according to identification information of the target video, where the identification information may be Uniform Resource Identifier (URI) information. After obtaining the target video, the client can start playing it from the located playing time point without any user operation, which improves the user experience.
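A rough client-side illustration of step S240 is sketched below; the VideoPlayer class and its open/seek/play methods are hypothetical stand-ins for whatever playback component the client actually uses, not an API defined in this application.

# Hypothetical client-side handling of step S240: obtain the target video
# by its URI and start playback at the located time point without any
# further user action. VideoPlayer is an illustrative stand-in, not a real API.

class VideoPlayer:
    def open(self, uri: str) -> None:
        print(f"opening {uri}")

    def seek(self, seconds: int) -> None:
        print(f"seeking to {seconds}s")

    def play(self) -> None:
        print("playing")

def play_from_located_point(uri: str, play_time_seconds: int) -> None:
    player = VideoPlayer()
    player.open(uri)                # obtain the target video via its URI
    player.seek(play_time_seconds)  # jump to the located playing time point
    player.play()                   # playback starts, no user operation needed

play_from_located_point("https://example.com/videos/42.mp4", 30 * 60 + 35)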
Referring to fig. 11B, fig. 11B is a flowchart illustrating a video positioning method according to some embodiments of the present application. In this embodiment, if the video positioning method is executed by a video server, the video positioning method further includes step S241, which is described in detail below.
In step S241, the identification information of the target video and the playing time point located in the target video are pushed to a video client, so that the video client obtains the target video according to the identification information of the target video, and starts to play the target video from the playing time point in the target video.
When the video positioning method is executed by the video server, after the video server locates the playing time point in the target video according to the video content tag, it needs to push the URI information of the target video and the located playing time point to the video client, so that the video client can obtain the target video according to the URI information and play it from the located playing time point.
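A minimal sketch of what the push of step S241 might carry is shown below, assuming a JSON payload; the field names are illustrative assumptions, not defined by this application.

import json

# Illustrative payload pushed from the video server to the video client in
# step S241: identification information (here a URI) plus the located playing
# time point. The field names are assumptions made for this example only.
push_message = {
    "target_video_uri": "https://example.com/videos/42.mp4",
    "play_time_point_seconds": 1835,  # e.g. 30 minutes 35 seconds
}

print(json.dumps(push_message))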
In some embodiments of the present application, the video positioning method further includes:
in response to a search range request for the plot positioning data, determining a video content tag library for matching the plot positioning data; or according to the video content information indicated by the plot positioning data, determining a video content tag library used for matching the plot positioning data from the plurality of preset video content tag libraries.
When the user only needs accurate scenario positioning within part of the video library, for example within a single video, a search range request can be input to limit the matching range of the scenario positioning data and avoid unnecessary work. The search range request contains a video identifier identifying the search range, such as a video name; the request may be triggered through a physical key or a virtual key provided by the video client, and the video client uploads the search range request input by the user to the video server.
In some embodiments of the present application, the video server responds to the search range request uploaded by the video client and, according to the video identifier contained in the request and the correspondence between video identifiers and video content tag libraries, determines from the plurality of preset video content tag libraries the tag library to be matched against the scenario positioning data. This avoids matching the scenario positioning data against every video content tag library, which reduces the matching workload while improving the efficiency of finding a matching video content tag.
If the scenario positioning data contains a character role name or a person name, that name can be taken as the video content information indicated by the scenario positioning data. The video server can then determine the videos related to this information, such as the videos in which the character appears, and use the video content tag libraries corresponding to those videos as the libraries to be matched against the scenario positioning data. This also reduces the matching workload and requires no extra operation from the user, which improves the viewing experience.
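Both ways of narrowing the search range can be illustrated with the short sketch below; it assumes the server keeps one in-memory tag library per video keyed by a video identifier, plus a mapping from role names to related videos, and all of the data and names are made up for the example.

# Illustrative selection of the video content tag library to match against:
# either from an explicit search range request (video identifier), or from
# video content information (e.g. a character role name) found in the
# scenario positioning data. The data below is invented for the example.

TAG_LIBRARIES = {
    "movie_a": ["hero fights on the bridge", "villain escapes by boat"],
    "movie_b": ["heroine sings in the rain"],
}

ROLE_TO_VIDEOS = {"hero": ["movie_a"], "heroine": ["movie_b"]}

def libraries_for_request(video_id=None, plot_query=""):
    # Case 1: the user explicitly limited the search range to one video.
    if video_id is not None:
        return {video_id: TAG_LIBRARIES[video_id]}
    # Case 2: infer related videos from a role name found in the query.
    words = plot_query.lower().split()
    for role, videos in ROLE_TO_VIDEOS.items():
        if role in words:
            return {v: TAG_LIBRARIES[v] for v in videos}
    # Fall back to matching against every library.
    return TAG_LIBRARIES

print(libraries_for_request(video_id="movie_a"))
print(libraries_for_request(plot_query="the scene where the heroine sings"))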
The following describes a video positioning method according to an embodiment of the present application, taking voice-based scenario positioning as an example.
Referring to fig. 12, fig. 12 is a flowchart of voice-based scenario positioning according to some embodiments of the present application. In this embodiment, when a user wants to watch the scenes of a particular lead actor in a movie, the actor's role name can be input to the video client by voice as the scenario positioning voice describing the scenario. The video client uploads the scenario positioning voice to the video server, which performs semantic recognition on it to obtain a recognized text and searches a locally pre-stored video content tag library to determine whether the library contains a video content tag matching the recognized text.
Referring to fig. 13, fig. 13 shows a flowchart of voice-based scenario positioning according to some embodiments of the present application, which may include steps S1301 to S1306, described in detail below.
Step S1301, the user inputs scenario positioning voice in the video client.
In step S1302, the video background corresponding to the video client obtains the scenario positioning voice input by the user.
In step S1303, the video background sends the scenario positioning voice to a voice server.
In step S1304, the speech server performs semantic recognition on the plot positioning speech to obtain a recognized text, and the recognized text is sent to the video tag server.
Step S1305, the video tag server matches the identified text according to a locally pre-stored video content tag library to determine a video content tag matched with the identified text, locates a playing time point of the matched video content tag, and feeds back the located playing time point in the target video to the video background.
In step S1306, the video client obtains the playing time point located in the target video received by the video background, displays the video playing interface, and starts playing the target video at the located playing time point in the video playing interface.
The video server may be a server cluster consisting of a voice server and a video tag server. The voice server performs semantic recognition on the scenario positioning voice to obtain a recognized text and sends the recognized text to the video tag server. The video tag server searches its locally pre-stored video content tag library to determine whether it contains a video content tag matching the recognized text, and feeds the matching result back to the video client.
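The division of labour between the two servers could look roughly like the sketch below; recognize_speech and match_tag are hypothetical stand-ins (a real deployment would call an actual speech recognition service and the tag-matching logic described earlier), and the tag library contents are invented for the example.

# Rough sketch of the voice-driven flow of fig. 13: the voice server turns the
# scenario positioning voice into text, and the video tag server matches the
# text against its tag library and returns the located playing time point.
# Both services are simulated with plain functions here.

def recognize_speech(audio_bytes: bytes) -> str:
    # Placeholder for the voice server's speech/semantic recognition.
    return "hero fights on the bridge"

TAG_TO_TIME_POINT = {"hero fights on the bridge": 1835}  # illustrative library

def match_tag(recognized_text: str):
    # Placeholder for the video tag server's retrieval step.
    return TAG_TO_TIME_POINT.get(recognized_text)

def locate_by_voice(audio_bytes: bytes):
    text = recognize_speech(audio_bytes)  # step S1304
    time_point = match_tag(text)          # step S1305
    if time_point is None:
        return {"status": "scenario_not_retrieved"}
    return {"status": "ok", "play_time_point_seconds": time_point}

print(locate_by_voice(b"...scenario positioning voice..."))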
Referring to fig. 14, fig. 14 shows a flowchart of voice-based scenario positioning according to some embodiments of the present application. As can be seen from fig. 14, the video server generates the video content tag library according to steps S1401 to S1404, described in detail below.
In step S1401, the operator uploads the preprocessed movie video to the video server, and uploads the face images of the main actors in the movie video to the video server.
In step S1402, the video server determines coordinates at which the face image of the starring actor appears and a time point at which the face image of the starring actor appears in the movie video through the face recognition model.
In step S1403, the time point at which the main actor's face image appears, the coordinates at which it appears, and the main actor's role name in the movie are stored in association, so that the scenario positions at which a given actor appears in the movie can later be searched by that actor's role name.
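The association made in step S1403 might be kept in memory as in the sketch below; the detection results, role names, and data structure are assumptions for illustration, and the face recognition model itself is not shown.

# Illustrative association of steps S1402-S1403: given detections produced by
# some face recognition model (time point + coordinates per appearance of a
# main actor's face), store them together with the actor's role name so that
# scenario positions can later be searched by role name. All data is made up.

from collections import defaultdict

detections = [
    # (actor_id, time_point_seconds, (x, y))
    ("actor_1", 1835, (320, 180)),
    ("actor_1", 2410, (300, 200)),
    ("actor_2", 95, (512, 260)),
]

ROLE_NAMES = {"actor_1": "Detective Li", "actor_2": "Captain Wang"}

tag_library = defaultdict(list)
for actor_id, time_point, coords in detections:
    role = ROLE_NAMES[actor_id]
    tag_library[role].append({"time_point": time_point, "coords": coords})

# Searching the scenario positions where a given role appears:
print(tag_library["Detective Li"])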
Still referring to fig. 12, if the video server detects that the preset video content tag library contains a video content tag matching the recognized text, the playing time point is determined from the matched tag, and the movie together with the determined playing time point is fed back to the video client as the result. The video client then displays a video playing interface and starts playing the movie at the located playing time point, so that the scenario position to be watched is located quickly. If the video server does not find a video content tag matching the recognized text in the preset video content tag library, it can generate a "scenario not retrieved" notification so that the video client can play a voice prompt according to the notification.
It can be seen from the above that, when a user wants to watch the scenes of a particular lead actor in a movie, the actor's role name can be input by voice as the scenario positioning voice. The role-name video content tag of that actor is determined from the video content tag library corresponding to the movie, and the playing time point at which the actor appears is then located in the movie according to the matched tag. The scenes in which the actor appears are thus located quickly from the role name, so the user no longer needs to find the desired scenario position by reading a plot synopsis and manually playing the video; to a certain extent this reduces the time spent on manual searching and improves the viewing experience.
Referring to fig. 15, fig. 15 shows a flowchart of a video positioning method according to some embodiments of the present application, in which the execution body of the video positioning method is a video client; the method includes steps S1510 to S1530, described in detail below.
In step S1510, a scenario search interface is displayed.
The user can enter the scenario search interface by clicking a corresponding virtual button in the video client, and can then input scenario search data to search for a specific scenario position in the video to be watched.
In step S1520, the inputted scenario positioning data is displayed in response to the input operation at the scenario search interface.
Referring to fig. 16, fig. 16 shows a scenario search interface diagram according to some embodiments of the present application. The user may enter a search text in an input box 1601 of the scenario search interface to perform the input operation, and the input box 1601 displays the scenario positioning data input by the user.
In step S1530, in response to the scenario positioning instruction for the scenario positioning data, a video playing interface is displayed, in which a target video starts to be played at a positioned playing time point, which is a time point positioned in the target video according to the scenario positioning data.
Referring to fig. 16, after the scenario positioning data is input in the input box 1601, the user may trigger a scenario positioning instruction by clicking a virtual button 1602 in the scenario search interface, so as to enable the video client to perform scenario search.
When the video client performs the scenario search, it determines the video content tag matching the scenario positioning data according to a preset video content tag library and then locates the playing time point in the target video according to the matched tag. The video content tag library contains video content tags describing scenarios in videos; specifically, a video content tag is a text describing the character identification information, background information, and character action information appearing in a certain scenario segment of a video, and it is matched against the scenario positioning data so that the scenario position in the video the user wants to watch can be located from the matched tag.
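One plausible in-memory representation of such a video content tag is sketched below; the field set mirrors the description above (character identification information, background information, character action information, and the associated playing time point), but the concrete structure and example values are assumptions for illustration only.

# Illustrative structure for a video content tag as described above: text
# describing the characters, background and actions of a scenario segment,
# stored together with the playing time point it is associated with.
from dataclasses import dataclass
from typing import List

@dataclass
class VideoContentTag:
    characters: List[str]          # character identification information
    background: str                # background information
    action: str                    # character action information
    play_time_point_seconds: int   # associated playing time point

    def text(self) -> str:
        # Flattened text form used for matching against scenario positioning data.
        return " ".join(self.characters + [self.background, self.action])

tag = VideoContentTag(["Detective Li"], "on the bridge", "fighting", 1835)
print(tag.text(), tag.play_time_point_seconds)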
The video client then determines the playing time point in the target video according to the matched video content tag, so that the target video can be played from that playing time point; in this way, the scenario position the user wants to watch is located quickly from the input scenario positioning data.
After the video client locates the corresponding playing time point in the target video according to the association between the video content tag and the playing time point, it displays a video playing interface in which the target video is played starting from the located playing time point.
Referring to fig. 17 and 18, fig. 17 and 18 show video playing interface diagrams according to some embodiments of the present application. As shown in fig. 17, after the video client locates the corresponding playing time point in the target video according to the association between the video content tag and the playing time point, it displays a video playing interface for playback; in this embodiment, the located playing time point is 30 minutes and 35 seconds.
The user can click the virtual play button 1701 in the video play interface to play the video, and then the target video starts to be played at the located play time point in the video play interface, that is, the target video starts to be played directly from the located play time point of 30 minutes and 35 seconds.
The user inputs scenario positioning data describing the scenario of the video they want to watch, the video content tag matching that data is determined from the preset video content tag library, and the playing time point is located in the target video according to the matched tag. The scenario position the user wants to watch can thus be located quickly from the input scenario positioning data, the search response speed of the system is improved, and the user no longer needs to find the desired scenario position by manually searching a plot synopsis and manually playing the video; to a certain extent this further reduces the time the user spends on manual searching and improves the viewing experience.
It should be noted that, in other embodiments, the video client may directly start playing the target video at the located playing time point in the video playing interface, so as to improve the playing efficiency, and the user does not need to click the virtual playing button 1701 in the video playing interface for manual playing, which may further reduce user operations and improve user experience.
The following describes apparatus embodiments of the present application, which can be used to perform the video positioning method in the above embodiments of the present application. For details that are not disclosed in the apparatus embodiments, please refer to the embodiments of the video positioning method described above.
FIG. 19 shows a block diagram of a video positioning apparatus according to some embodiments of the present application.
Referring to fig. 19, a video positioning apparatus 1900 according to some embodiments of the present application can be disposed in the video server or the video client shown in fig. 1. The video positioning apparatus 1900 includes a first obtaining unit 1910, a matching unit 1920, and a positioning unit 1930.
The first obtaining unit 1910 is configured to obtain scenario positioning data; the matching unit 1920 is configured to determine a video content tag matching the scenario positioning data according to a preset video content tag library; and the positioning unit 1930 is configured to locate a playing time point in a target video according to the video content tag, so that the target video is played from that playing time point in the target video.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes a second obtaining unit configured to obtain a pre-processed video, an identifying unit configured to identify a target object contained in the pre-processed video and a target playing time point at which the target object appears in the pre-processed video, and a generating unit configured to generate a video content tag according to identification information of the target object and store the generated video content tag in association with the target playing time point.
In some embodiments of the present application, based on the foregoing solution, the identifying unit is configured to: if the target object includes a target person, match, based on the features of each target person, the target playing time points at which each target person appears in the pre-processed video; if the target object includes a target background, match, based on the features of each target background, the target playing time points at which each target background appears in the pre-processed video; and if the target object includes a target action, match, based on the features of each target action, the target playing time points at which each target action appears in the pre-processed video.
In some embodiments of the present application, based on the foregoing solution, if the target object includes one kind of object selected from a target person, a target background, and a target action, the generating unit is configured to generate a video content tag according to the identification information of that object and store the generated video content tag in association with the corresponding target playing time point at which the object appears in the pre-processed video.
In some embodiments of the present application, based on the foregoing solution, if the target object includes at least two kinds of objects selected from a target person, a target background, and a target action, the generating unit is configured to determine the playing time points at which the at least two objects appear simultaneously in the pre-processed video, generate a video content tag according to the identification information of the at least two objects that appear simultaneously, and store the generated video content tag in association with the playing time points at which the at least two objects appear simultaneously.
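A small sketch of the "appear simultaneously" case follows; it assumes the playing time points of each recognized object are available as sets of second offsets, and the specific values and tag text are invented for the example.

# Sketch of the "at least two objects appear at the same time" case: given the
# playing time points (in seconds) at which each recognized object appears, a
# combined tag is generated only for the time points they share.
person_times = {95, 1835, 2410}        # target person appearances
background_times = {1835, 2410, 3100}  # target background appearances
action_times = {1835, 2600}            # target action appearances

simultaneous = person_times & background_times & action_times
for t in sorted(simultaneous):
    combined_tag = "Detective Li, on the bridge, fighting"
    print(f"store tag '{combined_tag}' at time point {t}s")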
In some embodiments of the present application, based on the foregoing scheme, the video positioning apparatus further includes a third obtaining unit configured to obtain video search data, the video search data including a video screenshot and corpus data obtained by searching for the pre-processed video in a video library search engine; an extracting unit configured to extract keywords from the corpus data and determine the playing time point corresponding to the video screenshot in the pre-processed video; and a storage unit configured to generate a video content tag according to the keywords and store the generated video content tag in association with the playing time point corresponding to the video screenshot in the pre-processed video.
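A rough sketch of these three units is given below; keyword extraction is reduced to a stop-word-filtered frequency count, and locate_screenshot is a hypothetical stand-in for matching the screenshot to a frame of the pre-processed video, neither of which is specified by this application.

# Rough sketch of the third obtaining/extracting/storage units: extract
# keywords from corpus data returned by a video library search engine and
# associate them with the playing time point of the accompanying screenshot.
from collections import Counter

STOP_WORDS = {"the", "a", "in", "of", "and", "scene"}

def extract_keywords(corpus: str, top_n: int = 3):
    words = [w for w in corpus.lower().split() if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(top_n)]

def locate_screenshot(screenshot_path: str) -> int:
    return 1835  # assumed result of matching the screenshot to a frame

corpus_data = "the bridge fight scene where Detective Li confronts the smuggler"
tag_entry = {
    "keywords": extract_keywords(corpus_data),
    "play_time_point_seconds": locate_screenshot("shot_001.png"),
}
print(tag_entry)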
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to segment the scenario positioning data to obtain a word segmentation result, determine, for each video content tag in the video content tag library, the number of words matched with the word segmentation result, determine the similarity between each video content tag and the scenario positioning data according to the number of words matched with the word segmentation result, and determine the video content tag matching the scenario positioning data according to the similarity.
In some embodiments of the present application, based on the foregoing solution, the matching unit is configured to calculate, according to the number of words of each video content tag matched with the word segmentation result, the ratio of the number of matched words to the number of words contained in the video content tag, and determine the similarity between each video content tag and the scenario positioning data according to the ratio.
In some embodiments of the present application, based on the foregoing solution, the matching unit is configured to determine the similarity between each video content tag and the scenario positioning data according to the number of words of each video content tag matched with the word segmentation result and the correspondence between the number of matched words and the similarity.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes an execution unit configured to determine, in response to a search range request for the scenario positioning data, a video content tag library for matching the scenario positioning data, or configured to determine, according to the video content information indicated by the scenario positioning data, a video content tag library for matching the scenario positioning data from the plurality of preset video content tag libraries.
In some embodiments of the present application, based on the foregoing scheme, if the scenario positioning data is voice data, the matching unit is configured to convert the voice data to obtain converted text data and determine the video content tag matching the converted text data according to the preset video content tag library.
In some embodiments of the present application, based on the foregoing solution, if the video positioning apparatus is built into a video client, the video positioning apparatus further includes a playing unit configured to obtain the target video and play the target video from the playing time point in the target video;
if the video positioning apparatus is built into a video server, the video positioning apparatus further includes a pushing unit configured to push the identification information of the target video and the playing time point located in the target video to a video client, so that the video client obtains the target video according to the identification information and plays it from the located playing time point.
According to an aspect of the embodiments of the present application, a video positioning apparatus is provided, which includes a first display unit configured to display a scenario search interface, a second display unit configured to display the input scenario positioning data in response to an input operation on the scenario search interface, and a second playing unit configured to display, in response to a scenario positioning instruction for the scenario positioning data, a video playing interface in which a target video starts to be played at a located playing time point, the playing time point being a time point located in the target video according to the scenario positioning data.
FIG. 20 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 2000 of the electronic device shown in fig. 20 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present application.
As shown in fig. 20, the computer system 2000 includes a Central Processing Unit (CPU)2001, which can perform various appropriate actions and processes, such as executing the method described in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 2002 or a program loaded from a storage section 2008 into a Random Access Memory (RAM) 2003. In the RAM 2003, various programs and data necessary for system operation are also stored. The CPU 2001, ROM 2002, and RAM 2003 are connected to each other via a bus 2004. An Input/Output (I/O) interface 2005 is also connected to bus 2004.
The following components are connected to the I/O interface 2005: an input portion 2006 including a keyboard, a mouse, and the like; an output section 2007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 2008 including a hard disk and the like; and a communication section 2009 including a network interface card such as a LAN (Local area network) card, a modem, or the like. The communication section 2009 performs communication processing via a network such as the internet. Drive 2010 is also connected to I/O interface 2005 as needed. A removable medium 2011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 2010 as necessary, so that a computer program read out therefrom is mounted in the storage section 2008 as necessary.
For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method illustrated by the flowchart.
A more specific example of a computer-readable storage medium may include, but is not limited to, an electrical connection having or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
It should also be noted that, in some alternative implementations, the functions noted in the blocks of the block diagrams or flowcharts may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
In another aspect, the present application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
In fact, according to embodiments of the present application, the features and functionality of two or more of the modules or units described above may be embodied in one module or unit, whereas the features and functionality of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash disk, a removable hard disk, etc.) or on a network, and includes several instructions for causing a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present application.
This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains and as may be applied to the disclosed embodiments.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (16)

1. A video positioning method, comprising:
obtaining plot positioning data;
determining a video content tag matched with the plot positioning data according to a preset video content tag library;
and positioning a playing time point in a target video according to the video content label so as to play the target video from the playing time point in the target video.
2. The video positioning method according to claim 1, further comprising:
acquiring a preprocessed video;
identifying a target object contained in the preprocessed video and a target playing time point of the target object appearing in the preprocessed video;
and generating a video content label according to the identification information of the target object, and storing the generated video content label in association with the target playing time point.
3. The method according to claim 2, wherein the identifying the target object included in the preprocessed video and the target playing time point of the target object appearing in the preprocessed video comprises:
if the target object comprises target characters, matching target playing time points of the target characters in the preprocessed video based on the characteristics of the target characters;
if the target object comprises a target background, matching a target playing time point of each target background in the preprocessed video based on the characteristics of each target background;
and if the target object comprises target actions, matching target playing time points of the target actions in the preprocessed video based on the characteristics of the target actions.
4. The video positioning method according to claim 2, wherein, if the target object includes one kind of object selected from a target person, a target background, and a target action, the generating a video content tag according to the identification information of the target object and storing the generated video content tag in association with the target playing time point comprises:
generating a video content tag according to the identification information of the object, and storing the generated video content tag in association with the corresponding target playing time point at which the object appears in the preprocessed video.
5. The video positioning method according to claim 2, wherein, if the target object includes at least two objects selected from a target person, a target background, and a target action, the generating a video content tag according to the identification information of the target object and storing the generated video content tag in association with the target playing time point comprises:
determining a play time point at which the at least two objects appear simultaneously in the preprocessed video;
and generating a video content label according to the identification information of the at least two objects which appear in the preprocessed video at the same time, and storing the generated video content label and the playing time point of the at least two objects which appear at the same time in a correlated manner.
6. The video positioning method according to claim 2, further comprising:
acquiring video search data, wherein the video search data comprises a video screenshot and corpus data obtained by searching the preprocessed video in a video library search engine;
extracting key words in the corpus data, and determining corresponding playing time points of the video screenshots in the preprocessed video;
and generating a video content label according to the keyword, and storing the generated video content label and a corresponding playing time point of the video screenshot in the preprocessed video in an associated manner.
7. The video positioning method according to claim 1, wherein the determining the video content tags matching the scenario positioning data according to a preset video content tag library comprises:
segmenting the scenario positioning data to obtain a segmentation result;
determining the number of vocabularies matched with the word segmentation result for each video content label in the video content label library;
determining the similarity between each video content label and the plot positioning data according to the number of words matched with the word segmentation result;
and determining the video content label matched with the plot positioning data according to the similarity.
8. The video positioning method according to claim 7, wherein the determining the similarity between each video content tag and the plot positioning data according to the vocabulary number of each video content tag matching with the word segmentation result comprises:
calculating, according to the number of words of each video content tag matched with the word segmentation result, the ratio of the number of matched words to the number of words contained in the video content tag;
and determining the similarity between each video content label and the plot positioning data according to the ratio.
9. The video positioning method according to claim 7, wherein the determining the similarity between each video content tag and the plot positioning data according to the vocabulary number of each video content tag matching with the word segmentation result comprises:
and determining the similarity between each video content label and the plot positioning data according to the number of the vocabularies matched with the word segmentation result and the corresponding relation between the number of the vocabularies and the similarity.
10. The video positioning method according to claim 1, wherein if there are a plurality of preset video content tag libraries, the video positioning method further comprises:
in response to a search range request for the plot positioning data, determining a video content tag library for matching the plot positioning data; or
And determining a video content tag library used for matching the plot positioning data from the plurality of preset video content tag libraries according to the video content information indicated by the plot positioning data.
11. The video positioning method of claim 1, wherein if the scenario positioning data is voice data, the determining a video content tag matching the scenario positioning data according to a preset video content tag library comprises:
converting the voice data to obtain converted text data;
and determining the video content label matched with the converted text data according to a preset video content label library.
12. The video positioning method according to any one of claims 1 to 11, wherein, if the video positioning method is performed by a video client, the video positioning method further comprises: obtaining the target video, and playing the target video from the playing time point in the target video;
if the video positioning method is performed by a video server, the video positioning method further comprises: pushing the identification information of the target video and the playing time point located in the target video to a video client, so that the video client obtains the target video according to the identification information of the target video and plays the target video from the playing time point in the target video.
13. A video positioning method, comprising:
displaying a plot search interface;
responding to the input operation on the plot searching interface, and displaying the input plot positioning data;
responding to a plot positioning instruction aiming at the plot positioning data, displaying a video playing interface, and starting playing a target video at a positioned playing time point in the video playing interface, wherein the playing time point is the time point positioned in the target video according to the plot positioning data.
14. A video positioning apparatus, comprising:
a first acquiring unit, used for acquiring plot positioning data;
the matching unit is used for determining the video content tags matched with the plot positioning data according to a preset video content tag library;
and the positioning unit is used for positioning a playing time point in the target video according to the video content label so as to play the target video from the playing time point in the target video.
15. A computer-readable medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the video positioning method according to any one of claims 1 to 13.
16. An electronic device, comprising:
one or more processors;
a storage device, configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the video positioning method according to any one of claims 1 to 13.
CN201911046300.2A 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment Active CN110740389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911046300.2A CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911046300.2A CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110740389A true CN110740389A (en) 2020-01-31
CN110740389B CN110740389B (en) 2023-05-02

Family

ID=69271886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911046300.2A Active CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110740389B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294660A (en) * 2020-03-12 2020-06-16 咪咕文化科技有限公司 Video clip positioning method, server, client and electronic equipment
CN111314732A (en) * 2020-03-19 2020-06-19 青岛聚看云科技有限公司 Method for determining video label, server and storage medium
CN111615007A (en) * 2020-05-27 2020-09-01 北京达佳互联信息技术有限公司 Video display method, device and system
CN111680189A (en) * 2020-04-10 2020-09-18 北京百度网讯科技有限公司 Method and device for retrieving movie and television play content
CN111711869A (en) * 2020-06-24 2020-09-25 腾讯科技(深圳)有限公司 Label data processing method and device and computer readable storage medium
CN112203115A (en) * 2020-10-10 2021-01-08 腾讯科技(深圳)有限公司 Video identification method and related device
CN112328829A (en) * 2020-10-27 2021-02-05 维沃移动通信(深圳)有限公司 Video content retrieval method and device
WO2024059959A1 (en) * 2022-09-19 2024-03-28 京东方科技集团股份有限公司 Demonstration control method, control device, demonstration device and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140053209A1 (en) * 2012-08-16 2014-02-20 Nuance Communications, Inc. User interface for entertainment systems
CN105898362A (en) * 2015-11-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Video content retrieval method and device
CN109947993A (en) * 2019-03-14 2019-06-28 百度国际科技(深圳)有限公司 Plot jump method, device and computer equipment based on speech recognition
CN110121033A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video categorization and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140053209A1 (en) * 2012-08-16 2014-02-20 Nuance Communications, Inc. User interface for entertainment systems
CN105898362A (en) * 2015-11-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Video content retrieval method and device
CN110121033A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video categorization and device
CN109947993A (en) * 2019-03-14 2019-06-28 百度国际科技(深圳)有限公司 Plot jump method, device and computer equipment based on speech recognition

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294660A (en) * 2020-03-12 2020-06-16 咪咕文化科技有限公司 Video clip positioning method, server, client and electronic equipment
CN111314732A (en) * 2020-03-19 2020-06-19 青岛聚看云科技有限公司 Method for determining video label, server and storage medium
CN111680189A (en) * 2020-04-10 2020-09-18 北京百度网讯科技有限公司 Method and device for retrieving movie and television play content
CN111680189B (en) * 2020-04-10 2023-07-25 北京百度网讯科技有限公司 Movie and television play content retrieval method and device
CN111615007A (en) * 2020-05-27 2020-09-01 北京达佳互联信息技术有限公司 Video display method, device and system
CN111711869A (en) * 2020-06-24 2020-09-25 腾讯科技(深圳)有限公司 Label data processing method and device and computer readable storage medium
CN111711869B (en) * 2020-06-24 2022-05-17 腾讯科技(深圳)有限公司 Label data processing method and device and computer readable storage medium
CN112203115A (en) * 2020-10-10 2021-01-08 腾讯科技(深圳)有限公司 Video identification method and related device
CN112203115B (en) * 2020-10-10 2023-03-10 腾讯科技(深圳)有限公司 Video identification method and related device
CN112328829A (en) * 2020-10-27 2021-02-05 维沃移动通信(深圳)有限公司 Video content retrieval method and device
WO2024059959A1 (en) * 2022-09-19 2024-03-28 京东方科技集团股份有限公司 Demonstration control method, control device, demonstration device and readable storage medium

Also Published As

Publication number Publication date
CN110740389B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
KR102394756B1 (en) Method and apparatus for processing video
CN109117777B (en) Method and device for generating information
CN104598644B (en) Favorite label mining method and device
CN109348275B (en) Video processing method and device
CN109034069B (en) Method and apparatus for generating information
CN110781347A (en) Video processing method, device, equipment and readable storage medium
CN110557659B (en) Video recommendation method and device, server and storage medium
CN109862397B (en) Video analysis method, device, equipment and storage medium
EP3872652A2 (en) Method and apparatus for processing video, electronic device, medium and product
CN110708607B (en) Live broadcast interaction method and device, electronic equipment and storage medium
US11302361B2 (en) Apparatus for video searching using multi-modal criteria and method thereof
CN113297891A (en) Video information processing method and device and electronic equipment
CN107679070B (en) Intelligent reading recommendation method and device and electronic equipment
CN110347866B (en) Information processing method, information processing device, storage medium and electronic equipment
CN112738556A (en) Video processing method and device
CN111460185A (en) Book searching method, device and system
CN110347869B (en) Video generation method and device, electronic equipment and storage medium
CN112866577B (en) Image processing method and device, computer readable medium and electronic equipment
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
CN113407775B (en) Video searching method and device and electronic equipment
CN113641837A (en) Display method and related equipment thereof
CN112104914B (en) Video recommendation method and device
WO2023239477A1 (en) Video recording processing
CN113840177B (en) Live interaction method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40020814

Country of ref document: HK

SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant