CN110740389B - Video positioning method, video positioning device, computer readable medium and electronic equipment - Google Patents


Info

Publication number
CN110740389B
CN110740389B
Authority
CN
China
Prior art keywords
video
target
video content
scenario
content tag
Legal status
Active
Application number
CN201911046300.2A
Other languages
Chinese (zh)
Other versions
CN110740389A
Inventor
陈姿
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911046300.2A
Publication of CN110740389A
Application granted
Publication of CN110740389B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455 Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/232 Content retrieval operation locally within server, e.g. reading video streams from disk arrays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors

Abstract

Embodiments of the present application provide a video positioning method, a video positioning device, a computer readable medium and an electronic device. The video positioning method includes the following steps: acquiring scenario positioning data; determining a video content tag that matches the scenario positioning data according to a preset video content tag library; and locating a playing time point in a target video according to the video content tag, so that the target video is played from that playing time point. With this technical solution, the scenario position that the user wants to watch can be located in the video quickly according to the scenario positioning data input by the user, and the search response speed of the system is improved.

Description

Video positioning method, video positioning device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer and communication technologies, and in particular, to a video positioning method, a video positioning device, a computer readable medium and an electronic device.
Background
Currently, when a user wants to watch a particular scenario in a video, the user usually searches for a plot synopsis or drags through the video manually to reach the position of that scenario. This process requires a great deal of manual searching time and makes it difficult to locate the desired scenario position quickly.
Disclosure of Invention
The embodiments of the present application provide a video positioning method, a video positioning device, a computer readable medium and an electronic device, so that the time a user spends on manual searching can be reduced at least to a certain extent and the user can quickly locate the scenario position to be watched.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a video positioning method, including: acquiring scenario positioning data; determining a video content tag matched with the scenario positioning data according to a preset video content tag library; and positioning a playing time point in the target video according to the video content label so as to play the target video from the playing time point in the target video.
According to an aspect of an embodiment of the present application, there is provided a video positioning method, including: displaying a scenario search interface; responding to the input operation on the scenario searching interface, and displaying the input scenario positioning data; and responding to a scenario positioning instruction aiming at the scenario positioning data, displaying a video playing interface, and starting to play a target video at a positioned playing time point in the video playing interface, wherein the playing time point is a time point positioned in the target video according to the scenario positioning data.
According to an aspect of an embodiment of the present application, there is provided a video positioning apparatus including: the first acquisition unit is used for acquiring scenario positioning data; the matching unit is used for determining a video content tag matched with the scenario positioning data according to a preset video content tag library; and the positioning unit is used for positioning a playing time point in the target video according to the video content label so as to play the target video from the playing time point in the target video.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: the second acquisition unit is used for acquiring the preprocessed video; the identification unit is used for identifying a target object contained in the preprocessed video and a target playing time point of the target object in the preprocessed video; and the generating unit is used for generating a video content label according to the identification information of the target object and storing the generated video content label in association with the target playing time point.
In some embodiments of the present application, based on the foregoing solution, the identifying unit is configured to: if the target object comprises a target person, matching target play time points of the occurrence of the target persons in the preprocessed video based on the characteristics of the target persons; if the target object comprises a target background, matching target playing time points of the occurrence of the target backgrounds in the preprocessed video based on the characteristics of the target backgrounds; and if the target object comprises target actions, matching target playing time points of the occurrence of the target actions in the preprocessed video based on the characteristics of the target actions.
In some embodiments of the present application, based on the foregoing solution, if the target object includes one of a target person, a target background, and a target action, the generating unit is configured to: generating a video content label according to the identification information of the one object, and storing the generated video content label and a target playing time point of the corresponding one object in the preprocessed video in an associated mode.
In some embodiments of the present application, based on the foregoing solution, if the target object includes at least two objects of a target person, a target background, and a target human action, the generating unit is configured to: determining a play time point at which the at least two objects appear simultaneously in the preprocessed video; generating a video content label according to the identification information of the at least two objects which are simultaneously appeared in the preprocessed video, and storing the generated video content label in association with the playing time point of the at least two objects which are simultaneously appeared.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: the third acquisition unit is used for acquiring video search data, wherein the video search data comprises video screenshot and corpus data obtained by searching the preprocessed video in a video library search engine; the extraction unit is used for extracting keywords in the corpus data and determining a corresponding playing time point of the video screenshot in the preprocessed video; and the storage unit is used for generating a video content label according to the keywords and storing the generated video content label and the corresponding playing time point of the video screenshot in the preprocessed video in an associated manner.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: word segmentation is carried out on the scenario positioning data, and word segmentation results are obtained; determining the number of words matched with the word segmentation result by each video content tag in the video content tag library; determining the similarity between each video content tag and the scenario positioning data according to the number of words matched with each video content tag and the word segmentation result; and determining the video content label matched with the scenario positioning data according to the similarity.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: calculating the ratio of the number of words matched with the word segmentation result of each video content tag to the number of words of each video content tag according to the number of words matched with the word segmentation result of each video content tag; and determining the similarity between each video content label and the scenario positioning data according to the ratio.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: and determining the similarity between each video content label and the scenario positioning data according to the number of words matched with the word segmentation result and the corresponding relation between the number of words and the similarity.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: an execution unit for determining a video content tag library for matching the scenario positioning data in response to a search range request for the scenario positioning data; or determining a video content tag library for matching the scenario positioning data from the plurality of preset video content tag libraries according to the video content information indicated by the scenario positioning data.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: converting the voice data to obtain converted text data; and determining the video content label matched with the converted text data according to a preset video content label library.
In some embodiments of the present application, based on the foregoing solution, if the video positioning device is built in a video client, the video positioning device further includes: the first playing unit is used for acquiring the target video and playing the target video from the playing time point in the target video; if the video positioning device is built in a video server, the positioning device further comprises: the pushing unit is used for pushing the identification information of the target video and the playing time point positioned in the target video to the video client so that the video client obtains the target video according to the identification information of the target video and plays the target video from the playing time point in the target video.
According to an aspect of an embodiment of the present application, there is provided a video positioning apparatus including: the first display unit is used for displaying a scenario search interface; the second display unit is used for responding to the input operation on the scenario search interface and displaying the input scenario positioning data; and the second playing unit is used for responding to the scenario positioning instruction aiming at the scenario positioning data, displaying a video playing interface, and starting to play the target video at the positioned playing time point in the video playing interface, wherein the playing time point is the time point positioned in the target video according to the scenario positioning data.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a video localization method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video localization method as described in the above embodiments.
According to the technical scheme provided by the embodiments of the application, the scenario positioning data describing the scenario is input in the scenario in the video which the user wants to watch, the video content label matched with the scenario positioning data is determined according to the preset video content label library, and then the playing time point is positioned in the target video according to the matched video content label, so that the scenario position in the video which the user wants to watch can be rapidly positioned according to the scenario positioning data input by the user, the search response speed of the system is improved, the user does not need to manually search the scenario introduction and manually play the video to be positioned at the scenario position which the user wants to watch, the time of manual search by the user can be reduced to a certain extent, and the watching experience of the user is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
Fig. 2 shows a flow chart of a video localization method according to one embodiment of the present application.
Fig. 3 shows a flow chart of a video localization method according to one embodiment of the present application.
Fig. 4 shows a flow chart of face image recognition in accordance with one embodiment of the present application.
FIG. 5 illustrates a functional block diagram of motion recognition of a character motion recognition model in accordance with one embodiment of the present application.
Fig. 6 shows a specific flowchart of step S330 of the video localization method according to one embodiment of the present application.
Fig. 7 shows a flow chart of a video localization method according to one embodiment of the present application.
Fig. 8 shows a specific flowchart of step S220 of the video localization method according to one embodiment of the present application.
Fig. 9 shows a specific flowchart of step S220 of the video localization method according to one embodiment of the present application.
Fig. 10 shows a specific flowchart of step S530 of the video localization method according to one embodiment of the present application.
Fig. 11A shows a flow chart of a video localization method according to one embodiment of the present application.
Fig. 11B shows a flowchart of a video localization method according to one embodiment of the present application.
FIG. 12 illustrates a flow chart of voice scenario localization according to one embodiment of the present application.
FIG. 13 illustrates a flow chart of voice scenario localization according to one embodiment of the present application.
FIG. 14 illustrates a flow chart of voice scenario localization according to one embodiment of the present application.
Fig. 15 shows a flow chart of a video localization method according to one embodiment of the present application.
FIG. 16 illustrates a scenario search interface diagram according to one embodiment of the present application.
Fig. 17 illustrates a video playback interface diagram according to one embodiment of the present application.
Fig. 18 illustrates a video playback interface diagram according to one embodiment of the present application.
Fig. 19 shows a block diagram of a video locating apparatus according to an embodiment of the present application.
Fig. 20 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a video client (e.g., one or more of the smartphone 101, the tablet 102 and the portable computer 103 shown in fig. 1, or of course a desktop computer or the like), a network 104, and a video server 105. The network 104 is the medium used to provide a communication link between the video clients and the video server 105, and may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the number of video clients, networks, and video servers in fig. 1 is merely illustrative. There may be any number of video clients, networks, and video servers, as desired for implementation. For example, the video server 105 may be a server cluster formed by a plurality of servers.
A user may interact with the video server 105 via the network 104 using the video client 103 (or the video client 101 or 102) to receive or send messages and the like, and the video server 105 may be a server that provides video services. For example, a user may upload scenario positioning data to the video server 105 through the video client 103, the scenario positioning data being data input by the user to describe a certain scenario in the video to be watched. The video server 105 determines a video content tag that matches the scenario positioning data according to a preset video content tag library, and locates a playing time point in the target video according to the matched video content tag; the video server 105 can then push the located playing time point to the video client 103, and the video client 103 starts playing the target video from that playing time point. In this way, the scenario position that the user wants to watch is located quickly according to the scenario positioning data input by the user, the search response speed of the system is improved, and the user no longer needs to search plot synopses or drag through the video manually to reach the desired scenario position, which reduces the time spent on manual searching to a certain extent and improves the viewing experience.
It should be noted that the video positioning method provided in the embodiments of the present application is generally performed by the video server 105, and accordingly the video positioning device is generally disposed in the video server 105. However, in other embodiments of the present application, the video client 103 (or the video client 101 or 102) may have functions similar to those of the video server 105 and thus execute the video positioning method provided in the embodiments of the present application. For example, the user inputs scenario positioning data at the video client 103, the video client 103 determines, according to a preset video content tag library, the video content tag matching the scenario positioning data, and then locates a playing time point in the target video according to the matched video content tag, so that the target video is played from that playing time point.
The implementation details of the technical solutions of the embodiments of the present application are described in detail below:
fig. 2 shows a flow chart of a video localization method according to one embodiment of the present application, which may be performed by a server, which may be the video server 105 shown in fig. 1. Referring to fig. 2, the video positioning method at least includes steps S210 to S230, and these steps are described in detail below.
In step S210, scenario positioning data is acquired.
The scenario positioning data is data input by the user to describe a scenario when the user wants to watch that scenario in a video, and may include one or more of person identification information, background information and character action information. For example, when the user wants to watch the scenario in Journey to the West in which the Monkey King is pinned under the Five Finger Mountain, "the Monkey King is pinned under the Five Finger Mountain" may be input as the scenario positioning data for scenario positioning.
The scenario positioning data may be voice data or text data. When it is voice data, the user may input speech through a voice device of the video client as the scenario positioning data; when it is text data, the user may input text through physical keys or virtual keys of the video client as the scenario positioning data. After receiving the scenario positioning data input by the user, the video client uploads it to the video server.
In step S220, a video content tag matching the scenario positioning data is determined according to a preset video content tag library.
The video content tag library includes video content tags that describe the scenarios in a video. A video content tag may specifically be text describing the person identification information, background information and character action information that appear in a certain segment of the video, and this text is matched against the scenario positioning data so that the scenario the user wants to watch can be located in the video according to the matched video content tag. A video content tag may include one or more of person identification information, background information and character action information.
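As a concrete illustration (not part of the claimed method itself), each entry of such a tag library can be thought of as descriptive text associated with a video identifier and a play time point. The following minimal Python sketch, with purely illustrative field names and values, shows one possible shape of the library:

```python
from dataclasses import dataclass

@dataclass
class VideoContentTag:
    """One entry of the video content tag library (field names are illustrative)."""
    tag_text: str         # e.g. "Jack and Rose hug on the ship"
    video_id: str         # identifies the target video
    play_time_sec: float  # play time point the tag is associated with

# a tiny in-memory tag library; a real system would persist this in a database
tag_library = [
    VideoContentTag("Jack and Rose hug on the ship", "titanic_1997", 2405.0),
    VideoContentTag("the Monkey King is pinned under the Five Finger Mountain",
                    "journey_to_the_west_ep07", 1310.0),
]
```

Entries of this kind are produced by the tag generation steps described below and consumed by the matching in step S220.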
Referring to fig. 3, fig. 3 shows a flowchart of a video positioning method according to an embodiment of the present application, and the video positioning method may further include steps S310 to S330, which are described in detail below.
In step S310, a preprocessed video is acquired.
The preprocessed video is a video to which video content tags need to be added for particular scenarios; it may be, for example, certain specified videos in a video library, or all videos in the video library.
In step S320, a target object contained in the preprocessed video and a target play time point at which the target object appears in the preprocessed video are identified.
The target object refers to object information in the preprocessed video that can describe a scenario of the video, such as person identification information, background information or character action information. In order to locate a scenario in the video by means of a target object, the target play time point at which the target object appears in the preprocessed video needs to be determined, so that the scenario position in the video can be located accurately according to the target object.
In one embodiment of the present application, step S320 specifically includes: if the target object includes target persons, matching, in the preprocessed video, the target play time points at which the respective target persons appear based on the characteristics of the respective target persons; if the target object includes target backgrounds, matching, in the preprocessed video, the target play time points at which the respective target backgrounds appear based on the characteristics of the respective target backgrounds; and if the target object includes target actions, matching, in the preprocessed video, the target play time points at which the respective target actions appear based on the characteristics of the respective target actions.
When the target object is a target person to be matched, in order to identify each target person contained in the preprocessed video and the target play time point at which each target person appears, the characteristics of each target person to be matched may first be acquired, and the target play time points at which the target persons appear in the preprocessed video are matched according to those characteristics. The characteristic of a person is information that uniquely identifies that person, such as a face image. When the characteristic of a target person is a face image, a target face image used to recognize the target person may be acquired, and the preprocessed video is input into a face recognition model to determine the target play time points at which each target person appears in the preprocessed video. In this embodiment, the target persons may be the main characters appearing in the preprocessed video.
Referring to fig. 4, fig. 4 illustrates a flow chart of face image recognition in accordance with one embodiment of the present application. The face recognition model is obtained by training a machine learning model with training sample data, and the machine learning model used for training may be a CNN (Convolutional Neural Network) model, a deep neural network model or the like.
In the process of training the machine learning model to obtain the face recognition model, face images and the known face labels of those images are acquired as training sample data, and the training sample data is input into the machine learning model for training. The specific training process includes: preprocessing the face images in the training sample data to obtain face pictures for face region recognition and cropping; recognizing and cropping the face regions in the face pictures to obtain face region images; extracting face image features from the face region images; and outputting face labels according to the face image features. When determining whether the face recognition model has been trained, if the proportion of output face labels that are consistent with the known face labels in the training sample data reaches a preset threshold, it can be determined that training of the face recognition model is complete.
After the face recognition model is obtained, the target face image to be matched may first be acquired, and the preprocessed video is input into the face recognition model to determine the target face images contained in the preprocessed video and the play time points at which they appear. The target persons contained in the preprocessed video are thereby identified, so that video content tags describing the scenarios in the video can be generated according to the identification information of the target persons.
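To make this step concrete, the sketch below samples frames of the preprocessed video and records the play time points at which a given target face appears. It is only an illustrative sketch: the face embedding function, the cosine-similarity threshold and the sampling interval are assumptions, not details fixed by the present application.

```python
import cv2          # OpenCV, used here only to decode video frames
import numpy as np

def find_person_time_points(video_path, target_embedding, embed_face,
                            threshold=0.8, sample_every_sec=1.0):
    """Return the play time points (seconds) at which the target person appears.

    `embed_face(frame) -> np.ndarray or None` stands in for the trained face
    recognition model: it returns a face embedding for the frame, or None if
    no face is detected.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(fps * sample_every_sec), 1)
    time_points, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            emb = embed_face(frame)
            if emb is not None:
                sim = float(np.dot(emb, target_embedding) /
                            (np.linalg.norm(emb) * np.linalg.norm(target_embedding)))
                if sim >= threshold:           # the target person appears in this frame
                    time_points.append(frame_idx / fps)
        frame_idx += 1
    cap.release()
    return time_points
```

The returned time points are what step S330 associates with the generated video content tags.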
When the target object is a target background to be matched, in order to identify each target background contained in the preprocessed video and the target play time point at which each target background appears, the characteristics of each target background to be matched may first be acquired, and the target play time points at which the target backgrounds appear in the preprocessed video are matched according to those characteristics. The characteristic of a target background is information that uniquely identifies it; in this embodiment it may be a background picture appearing in the preprocessed video, such as a ship, a seat or a vehicle. The preprocessed video is input into a preset background recognition model, and the target play time points at which each target background appears can be determined through the background recognition model.
The background recognition model is also obtained by training a machine learning model with training sample data, and the machine learning model may likewise be a CNN model, a deep neural network model or the like. It should be noted that the background recognition model is trained in the same way as the face recognition model, the only difference being the training objects; for the specific process, refer to the training of the face recognition model, which is not repeated here.
When the target object is a target action to be matched, in order to identify each target action contained in the preprocessed video and the target play time point at which each target action appears, the characteristics of the target actions to be matched may first be acquired, and the target play time points at which the target actions appear in the preprocessed video are matched according to those characteristics. In this embodiment, a target action may be a character action at a scenario in the preprocessed video, and the characteristic of a target action is the character action label corresponding to that action. Specifically, the preprocessed video may be input into a preset character action recognition model, which matches the character action labels corresponding to the target actions as required and matches, in the preprocessed video, the target play time points at which the respective target actions appear.
The character action recognition model is obtained by training a machine learning model with training sample data, and the machine learning model used for training may be a CNN model. In the process of training the machine learning model to obtain the character action recognition model, a moving target video containing character actions and the known human action labels of those actions are acquired as training samples.
Referring to fig. 5, fig. 5 shows a schematic block diagram of motion recognition by the character action recognition model. As can be seen from fig. 5, training the character action recognition model includes extracting the feature representation information corresponding to the character actions contained in the moving target video, where the feature representation information is a multidimensional vector representing a human action; performing action recognition and understanding according to that feature representation information; and outputting a character action label. When determining whether training of the character action recognition model is complete, if the proportion of output character action labels that are consistent with the known human action labels in the training sample data reaches a preset threshold, it can be determined that training of the character action recognition model is complete.
After the character action recognition model is obtained, the target character actions to be matched may first be acquired, and the preprocessed video is input into the character action recognition model to determine the target character actions contained in the preprocessed video and the play time points at which they appear. The target character actions contained in the preprocessed video are thereby identified, so that video content tags describing the scenarios in the video can be generated according to the identification information of the target character actions.
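Analogously to the face example above, the following short sketch shows one way the trained character action recognition model might be applied at inference time: the video is cut into fixed-length clips, each clip is classified into an action label, and the start times of clips whose label equals the target action are collected. The `classify_clip` function and the clip length are illustrative assumptions.

```python
def find_action_time_points(frames, fps, target_action_label, classify_clip,
                            clip_len=16):
    """Return the start time points (seconds) of clips recognised as the target action.

    `frames` is a list of decoded video frames; `classify_clip(clip) -> str`
    stands in for the trained character action recognition model and returns a
    character action label for the clip.
    """
    time_points = []
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        clip = frames[start:start + clip_len]
        if classify_clip(clip) == target_action_label:  # e.g. "hug"
            time_points.append(start / fps)
    return time_points
```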
Still referring to fig. 3, in step S330, a video content tag is generated according to the identification information of the target object, and the generated video content tag is stored in association with the target playing time point.
The identification information of the target object is used as information for uniquely identifying the target object, and when the target object is a target person, the identification information can be identification information such as a name or a role name of the target person; when the target object is a target background, the identification information may be a background name; when the target object is a target action, the identification information may be a character action tag corresponding to the target action.
By generating a video content tag according to the identification information of the target object that describes a scenario in the video, and storing the generated video content tag in association with the target play time point, a play time point can later be located according to the video content tag, so that playback can start directly at the corresponding scenario position in the video.
In one embodiment of the present application, if the target object includes one of a target character, a target background and a target action, the step S330 specifically includes: generating a video content label according to the identification information of the one object, and storing the generated video content label and a target playing time point of the corresponding one object in the preprocessed video in an associated mode.
When the target object includes only one of a target person, a target background and a target action, a video content tag can be generated according to the identification information of that one object, and the generated video content tag is stored in association with the target play time points at which that object appears in the preprocessed video. In this way, the scenario positions of that object in the video, for example all scenario positions at which a certain person appears, can be found conveniently and accurately, which further meets the user's personalized scenario positioning needs.
Referring to fig. 6, fig. 6 shows a specific flowchart of step S330 of the video positioning method according to an embodiment of the present application, in this embodiment, if the target object includes at least two objects of a target person, a target background, and a target human motion, step S330 may include steps S3301 to S3302, which are described in detail below.
In step S3301, a play time point at which the at least two objects appear simultaneously in the preprocessed video is determined.
If the target object includes at least two objects of a target person, a target background and a target human motion, a play time point when the at least two objects appear simultaneously in the preprocessed video needs to be determined, so that a video content label is generated according to identification information of the at least two different objects appearing simultaneously.
In step S3302, a video content tag is generated according to the identification information of the at least two objects that appear simultaneously in the preprocessed video, and the generated video content tag is stored in association with the play time points of the at least two objects that appear simultaneously.
When generating a video content tag according to the identification information of at least two objects that appear simultaneously in the preprocessed video, the identification information corresponding to the different objects can be concatenated according to a preset arrangement order, for example in the order of target person, target background and target human action. For example, in the movie Titanic, for the specific scenario in which Jack and Rose hug on the ship, the video content tag can be generated from the identification information of the target persons, the target background and the target human action, and the generated video content tag is "Jack and Rose hug on the ship". Of course, the preset arrangement order may also be another order, which is not limited here.
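A minimal sketch of this combination step is given below, under the assumption that the per-object play time points have already been produced by the recognition models above and rounded to a common granularity (whole seconds here); the data layout and the ordering convention are illustrative only.

```python
def build_combined_tag(object_time_points, order=("person", "background", "action")):
    """Generate one tag per time point at which all given objects appear simultaneously.

    `object_time_points` maps (object_type, identification_info) -> set of play
    time points (seconds), e.g. {("person", "Jack and Rose"): {2405, 2406},
    ("background", "on the ship"): {2405}, ("action", "hug"): {2405}}.
    """
    # time points at which every object appears at the same time
    common = set.intersection(*object_time_points.values())
    # concatenate identification info in the preset order: person, background, action
    ordered = sorted(object_time_points, key=lambda key: order.index(key[0]))
    tag_text = " ".join(info for _, info in ordered)
    return {t: tag_text for t in common}
```

For the example above, with the persons "Jack and Rose", the background "on the ship" and the action "hug" sharing a time point, this sketch would yield a tag along the lines of "Jack and Rose on the ship hug" for that time point.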
In the technical solution of the embodiment shown in fig. 6, a video content tag formed from the identification information of at least two different target objects describes the scenario in the video in more detail, so that the scenario positioning data can be matched more accurately, which in turn improves the accuracy of locating the scenario position that the user wants to watch according to the scenario positioning data.
Referring to fig. 7, fig. 7 shows a flowchart of a video positioning method according to an embodiment of the present application, in which the video positioning method may further include steps S410 to S430, which are described in detail below.
In step S410, video search data is acquired, where the video search data includes a video screenshot obtained by searching the preprocessed video in a video library search engine and corpus data.
When generating the video content tags corresponding to the preprocessed video, video search data can be obtained by searching for the preprocessed video in a video library search engine; the video search data may come from forum data or comment section data associated with the preprocessed video. The video search data includes video screenshots obtained by searching for the preprocessed video in the video library and corpus data. A video screenshot may be a screenshot captured by a user at a certain play time point of the preprocessed video, and the corpus data may be descriptive information about that screenshot, such as related comments on it.
In step S420, keywords in the corpus data are extracted, and a play time point corresponding to the video screenshot in the preprocessed video is determined.
In order to obtain text that marks the scenario content corresponding to a video screenshot, keywords in the corpus data can be extracted; the keyword types may include person names, character names, background names, character action labels and the like, so that text describing the scenario content reflected by the video screenshot can be generated from the keywords. In order to relate the corpus data to the video scenario it describes, the play time point of the video screenshot in the preprocessed video is determined, so that the scenario position corresponding to the video screenshot can be located.
In step S430, a video content tag is generated according to the keyword, and the generated video content tag and the play time point corresponding to the video screenshot in the preprocessed video are stored in association.
When generating a video content tag according to the keywords, keywords of different types can be concatenated according to a preset arrangement order, for example in the order of person name, character name, background and character action.
The video content tag generated from the keywords is stored in association with the play time point of the video screenshot in the preprocessed video, so that the corresponding scenario position can be located accurately according to the video content tag.
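The sketch below illustrates this step under the assumption that the corpus data is Chinese comment text and that a generic TF-IDF keyword extractor (here the third-party jieba utility) is an acceptable stand-in; the helper name and the number of keywords are illustrative.

```python
import jieba.analyse   # third-party keyword extraction utility; one possible choice

def tag_from_search_data(screenshot_time_point, comments, top_k=5):
    """Build a video content tag from corpus data attached to a video screenshot.

    `screenshot_time_point` is the play time point (seconds) of the screenshot in
    the preprocessed video; `comments` is a list of comment/forum strings about it.
    Returns (tag_text, time_point) to be stored in the tag library.
    """
    corpus = " ".join(comments)
    keywords = jieba.analyse.extract_tags(corpus, topK=top_k)  # TF-IDF keywords
    tag_text = " ".join(keywords)   # concatenation order here is simply extraction order
    return tag_text, screenshot_time_point
```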
In the technical solution of the embodiment shown in fig. 7, by acquiring video search data from a video library search engine and generating video content tags at specific scenarios in the preprocessed video according to the video search data, the video content tags included in the video content tag library are greatly enriched, so that the corresponding video content tags can be more easily matched according to scenario positioning data, and the accuracy of positioning to the scenario position which the user wants to watch according to the scenario positioning data is improved to a certain extent.
In one embodiment of the present application, when generating the video content tags corresponding to the preprocessed video, video content tags may also be generated from other corpus data related to the preprocessed video, for example from the line (subtitle) data appearing in the preprocessed video or from the bullet-screen comment data corresponding to the preprocessed video.
When generating a video content tag according to the line data appearing in the preprocessed video, the target line to be identified in the preprocessed video can first be acquired, and the play time point at which the target line appears is determined. The target line may be the line spoken at a specific scenario in the preprocessed video, and such a line can describe that scenario in detail. Specifically, when determining the play time point of the target line, OCR (Optical Character Recognition) may be used to recognize the lines in the preprocessed video, and the recognized lines are compared with the target line to find the one consistent with it, thereby determining the target line in the preprocessed video and the play time point at which it appears. When generating the video content tag according to the target line, keywords in the target line can be extracted, and the video content tag generated from the extracted keywords is stored in association with the time point at which the target line appears.
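As an illustration of the OCR comparison described above, the sketch below samples frames, runs OCR on an assumed subtitle strip at the bottom of each frame, and returns the first play time point whose recognized text contains the target line. The subtitle region, sampling interval and OCR engine (pytesseract with a Chinese language pack) are assumptions of the sketch, not requirements of the method.

```python
import cv2
import pytesseract   # OCR wrapper; any OCR service could be substituted

def find_line_time_point(video_path, target_line, sample_every_sec=1.0):
    """Return the first play time point (seconds) whose subtitle contains the target line."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(fps * sample_every_sec), 1)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            h = frame.shape[0]
            subtitle_area = frame[int(h * 0.85):, :]   # assumed subtitle strip (bottom 15%)
            text = pytesseract.image_to_string(subtitle_area, lang="chi_sim")
            if target_line in text.replace(" ", ""):
                cap.release()
                return frame_idx / fps
        frame_idx += 1
    cap.release()
    return None   # target line not found
```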
During video watching, users often send bullet-screen comments related to the scenario content when watching classic shots or specific scenarios. Therefore, the play time point of a target bullet-screen comment in the preprocessed video can be determined according to the target bullet-screen comments to be identified, a video content tag is generated according to the target bullet-screen comment, and the generated video content tag is stored in association with the play time point of that comment.
Generating video content tags from the line data appearing in the preprocessed video or from the bullet-screen comment data corresponding to the preprocessed video further enriches the video content tags contained in the video content tag library, so that the corresponding video content tag can be matched more easily according to the scenario positioning data, which further improves the accuracy of locating the scenario position that the user wants to watch.
Referring to fig. 8, fig. 8 is a specific flowchart of step S220 of the video positioning method according to an embodiment of the present application, and if the scenario positioning data is voice data, the step S220 includes steps S2201 to S2202, which are described in detail below.
In step S2201, the voice data is converted to obtain converted text data.
When the scenario positioning data is voice data, in order to perform similarity matching between the voice data and the video content tags, which are in text form, the voice data needs to be converted into text data so that the video content tag matching the scenario positioning data can be determined. Specifically, speech recognition may be performed on the voice data using natural language processing technology, and the converted text data is obtained from the recognition result.
In step S2202, a video content tag matching the converted text data is determined according to a preset video content tag library.
After the converted text data is obtained, each video content tag contained in the video content tag library is matched with the converted text data to determine the video content tag matched with the converted text data.
Referring to fig. 9, fig. 9 shows a specific flowchart of step S220 of the video positioning method according to an embodiment of the present application, in which step S220 may include steps S510 to S540, which are described in detail below.
In step S510, the scenario positioning data is segmented, and a segmentation result is obtained.
If the scenario positioning data is text data, word segmentation can be performed on it directly to obtain the word segmentation result, that is, the vocabulary contained in the scenario positioning data; if the scenario positioning data is voice data, it first needs to be converted into text data. The scenario positioning data may be segmented using the jieba word segmentation method, although other word segmentation methods may of course be used, and this is not limited here.
In step S520, the number of vocabularies that each video content tag in the video content tag library matches with the word segmentation result is determined.
The word segmentation result obtained by segmenting the scenario positioning data is matched against each video content tag in the video content tag library to determine the number of words that match successfully. The more words that match, the higher the degree of match between the scenario positioning data and the video content tag; conversely, the fewer words that match, the lower the degree of match.
In step S530, the similarity between each video content tag and the scenario positioning data is determined according to the number of words that match each video content tag with the word segmentation result.
When determining whether a video content tag in the video content tag library matches the scenario positioning data, the similarity between each video content tag in the library and the scenario positioning data can be determined according to the number of matched words. The number of matched words is positively correlated with the similarity; in other words, the more words that match, the higher the similarity.
Referring to fig. 10, fig. 10 shows a specific flowchart of step S530 of the video locating method according to an embodiment of the present application, and the step S530 may include steps S5301 to S5302, which are described in detail below.
In step S5301, according to the number of words that each video content tag matches with the word segmentation result, a ratio of the number of words that each video content tag matches with the word segmentation result to the number of words of each video content tag is calculated.
When determining the similarity between each video content tag and the scenario positioning data according to the number of words that the tag matches with the word segmentation result, the ratio of the number of matched words to the total number of words of the tag can be calculated. The larger this ratio, the higher the similarity between the video content tag and the scenario positioning data; conversely, the smaller the ratio, the lower the similarity.
In step S5302, a similarity between the video content tags and the scenario positioning data is determined according to the ratio.
Determining the similarity between each video content tag and the scenario positioning data through the ratio of the number of matched words to the number of words of the tag makes it possible to accurately identify the video content tags with high similarity to the scenario positioning data, thereby improving the accuracy of locating the specific scenario position in the video from the input scenario positioning data.
In an embodiment of the present application, the step S530 may further include: and determining the similarity between each video content label and the scenario positioning data according to the number of words matched with the word segmentation result and the corresponding relation between the number of words and the similarity.
When determining the similarity between each video content tag and the scenario positioning data according to the number of matched words, a preset correspondence between the number of matched words and the similarity can also be used, so that the similarity of each video content tag in the video content tag library is determined directly from its number of matched words. This avoids the influence on the similarity calculation caused by differences in the number of words contained in different video content tags, and can further improve the accuracy of the scenario position located in the video from the input scenario positioning data.
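A minimal sketch of this variant, assuming a hypothetical step-wise correspondence table between matched word counts and similarity values; the thresholds below are illustrative and are not specified by the application.

```python
# Hypothetical correspondence between the number of matched words and the
# similarity; the thresholds are illustrative, not taken from this application.
COUNT_TO_SIMILARITY = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.8}


def similarity_from_count(matched_count):
    # Counts beyond the table saturate at the highest similarity value.
    return COUNT_TO_SIMILARITY.get(matched_count, 1.0)


print(similarity_from_count(2))  # 0.6
print(similarity_from_count(7))  # 1.0
```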
Still referring to fig. 9, in step S540, a video content tag matching with the scenario positioning data is determined according to the similarity.
In this embodiment, the video content tag with the highest similarity to the scenario positioning data may be used as the matching video content tag. Of course, any video content tag whose similarity is higher than a predetermined similarity threshold may also be used as a video content tag matching the scenario positioning data, which is not limited herein.
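Both selection strategies can be sketched as follows; the function signature and the sample scores are assumptions for the example.

```python
def select_matching_tags(similarities, threshold=None):
    """Pick matching tags from a {tag: similarity} mapping.

    If `threshold` is None, return the single tag with the highest similarity;
    otherwise return every tag whose similarity exceeds the threshold. These
    correspond to the two options described above.
    """
    if not similarities:
        return []
    if threshold is None:
        return [max(similarities, key=similarities.get)]
    return [tag for tag, s in similarities.items() if s > threshold]


scores = {"hero rescues heroine on the rooftop": 0.67,
          "villain reveals his plan in the warehouse": 0.1}
print(select_matching_tags(scores))                 # best match only
print(select_matching_tags(scores, threshold=0.5))  # all tags above the threshold
```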
Still referring to fig. 2, in step S230, a playing time point is located in a target video according to the video content tag, so as to play the target video from the playing time point in the target video.
A playing time point is determined in the target video according to the matched video content tag, so that the target video can be played from that playing time point. In this way, the scenario position that the user wants to watch is located quickly according to the scenario positioning data input by the user.
When it is detected that no video content tag in the preset video content tag library matches the scenario positioning data, a notification that no scenario was retrieved can be generated, so that the video client can play a voice prompt or display text in the interface according to this notification, informing the user of the matching result and prompting the user to input new scenario positioning data.
It can be seen from the above that, by obtaining scenario positioning data describing the scenario the user wants to watch, determining the video content tag matching that data according to the preset video content tag library, and then locating the playing time point in the target video according to the matched tag, the scenario position the user wants to watch can be located quickly from the input scenario positioning data. This improves the search response speed of the system, and the user no longer needs to locate the desired scenario position by searching scenario introductions and manually playing the video, which reduces the time spent on manual searching to a certain extent and improves the viewing experience.
Referring to fig. 11A, fig. 11A shows a flowchart of a video positioning method according to an embodiment of the present application, in this embodiment, if the video positioning method is performed by a video client, the video positioning method further includes step S240, which is described in detail below.
In step S240, the target video is acquired, and the target video is played from the play time point in the target video.
When the video positioning method is executed by the video client, after locating the playing time point in the target video according to the video content tag, the video client can directly acquire the target video according to the identification information of the target video, where the identification information can be URI (Uniform Resource Identifier) information. After acquiring the target video, the target video can be played from the playing time point without any operation by the user, giving a good user experience.
Referring to fig. 11B, fig. 11B shows a flowchart of a video positioning method according to an embodiment of the present application, in this embodiment, if the video positioning method is performed by a video server, the video positioning method further includes step S241, which is described in detail below.
In step S241, the identification information of the target video and the playing time point located in the target video are pushed to the video client, so that the video client obtains the target video according to the identification information of the target video, and plays the target video from the playing time point in the target video.
When the video positioning method is executed by the video server, after the video server locates the playing time point in the target video according to the video content tag, it needs to push the URI information of the target video and the located playing time point to the video client, so that the video client acquires the target video according to the URI information and starts playing it from that playing time point.
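As a rough illustration, the push message from the video server might be serialized as a small JSON payload carrying the URI information and the located playing time point. The field names and the example URI are assumptions, not defined by the application.

```python
import json


def build_push_payload(video_uri, play_time_seconds):
    # The located playing time point is sent alongside the video identifier
    # so the client can fetch the video and seek before playback starts.
    return json.dumps({
        "video_uri": video_uri,                 # identification information of the target video
        "play_time_point": play_time_seconds,   # e.g. 30*60 + 35 for 30 min 35 s
    })


print(build_push_payload("https://example.com/videos/12345", 30 * 60 + 35))
```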
In one embodiment of the present application, the video positioning method further includes:
determining a video content tag library for matching the scenario positioning data in response to a search range request for the scenario positioning data; or determining a video content tag library for matching the scenario positioning data from the plurality of preset video content tag libraries according to the video content information indicated by the scenario positioning data.
When the user only needs accurate scenario positioning for part of the videos in the video library, for example for a single video, a search range request for determining the matching range of the scenario positioning data can be input in order to reduce unnecessary work. The search range request includes a video identifier that identifies the search range, and the video identifier can be a video name. The request can be triggered by a physical key or a virtual key provided by the video client, and the video client uploads the search range request input by the user to the video server.
In one embodiment of the application, the video server responds to the search range request uploaded by the video client and, according to the video identifier contained in the request and the correspondence between video identifiers and video content tag libraries, determines from the plurality of preset video content tag libraries the video content tag library to be matched against the scenario positioning data. This avoids matching the scenario positioning data against every video content tag library, so the efficiency of finding the matching video content tag is improved while the matching workload is reduced.
In one embodiment of the present application, when accurate video positioning is required for only part of the videos in the video library, the video server may also determine, according to the video content information indicated by the scenario positioning data, the video content tag library to be matched from the plurality of preset video content tag libraries. For example, if the scenario positioning data contains a character name or an actor name, that name can be used as the video content information indicated by the scenario positioning data. The video server can then determine the videos related to this information, such as the videos in which that character appears, and treat the video content tag libraries corresponding to those videos as the libraries to be matched against the scenario positioning data.
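The following sketch illustrates this selection step under assumed data structures: a mapping from video identifiers to their video content tag libraries and an index from character names to the videos they appear in. Both structures and the helper names are assumptions for the example.

```python
# Assumed index structures; in practice these would be built when the
# video content tag libraries are generated.
TAG_LIBRARIES = {
    "movie_a": ["hero rescues heroine on the rooftop"],
    "movie_b": ["villain reveals his plan in the warehouse"],
}
CHARACTER_INDEX = {"hero": ["movie_a"], "villain": ["movie_b"]}


def libraries_for_query(scenario_text):
    # Pick the tag libraries of videos related to any character name
    # mentioned in the scenario positioning data.
    related = set()
    for name, videos in CHARACTER_INDEX.items():
        if name in scenario_text.lower():
            related.update(videos)
    # Fall back to all libraries when no video content information is indicated.
    selected = related or TAG_LIBRARIES.keys()
    return {vid: TAG_LIBRARIES[vid] for vid in selected}


print(libraries_for_query("the hero on the rooftop"))
```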
The following describes a video positioning method in the embodiment of the present application by taking a scenario of voice scenario positioning as an example.
Referring to fig. 12, fig. 12 shows a flowchart of voice scenario positioning according to an embodiment of the present application. In this embodiment, when a user wants to watch the scenes of a certain leading actor in a certain movie, the role name of the actor can be input to the video client as the scenario positioning voice describing the scenario. The video client uploads the scenario positioning voice to the video server, the video server performs semantic recognition on the voice to obtain a recognized text, and searches the recognized text against a locally pre-stored video content tag library to determine whether a matching video content tag exists in the preset video content tag library.
Referring to fig. 13, fig. 13 shows a flowchart of a voice scenario positioning according to an embodiment of the present application, which may include steps S1301 to S1306, described in detail below.
In step S1301, the user inputs scenario positioning voice in the video client.
In step S1302, the video backend corresponding to the video client acquires the scenario positioning voice input by the user.
In step S1303, the video backend sends the scenario positioning voice to the voice server.
In step S1304, the voice server performs semantic recognition on the scenario positioning voice to obtain a recognized text, and sends the recognized text to the video tag server.
In step S1305, the video tag server matches the recognized text against its locally pre-stored video content tag library to determine the video content tag matching the recognized text, locates the playing time point associated with the matched video content tag, and feeds back the located playing time point in the target video to the video backend.
In step S1306, the video client obtains the located playing time point in the target video received by the video backend, displays the video playing interface, and starts playing the target video at the located playing time point in the video playing interface.
The video server may be a server cluster consisting of a voice server and a video tag server. The voice server performs semantic recognition on the scenario positioning voice to obtain a recognized text and sends the recognized text to the video tag server. The video tag server searches the recognized text against its locally pre-stored video content tag library to determine whether a matching video content tag exists in the preset video content tag library, and feeds back the matching result to the video client.
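A schematic sketch of this two-server flow is given below; recognize_speech is a placeholder standing in for the voice server's semantic recognition rather than a real API, and the word-overlap matcher stands in for the video tag server's matching logic.

```python
def recognize_speech(audio_bytes):
    # Placeholder for the voice server's semantic recognition; a real
    # deployment would call an ASR service here.
    return "hero rescues heroine rooftop"


def match_tag(recognized_text, tag_library):
    # Video tag server side: pick the tag sharing the most words with the text.
    query = set(recognized_text.lower().split())
    return max(tag_library, key=lambda t: len(query & set(t.lower().split())))


tag_library = ["hero rescues heroine on the rooftop",
               "villain reveals his plan in the warehouse"]
text = recognize_speech(b"...")          # voice server
print(match_tag(text, tag_library))      # video tag server
```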
Referring to fig. 14, fig. 14 shows a flowchart of voice scenario positioning according to an embodiment of the present application, and as can be seen from fig. 14, the video server generates a video content tag library according to steps S1401 to S1404, which will be described in detail below.
In step S1401, the operator uploads the preprocessed movie video to the video server, and uploads the face image of the starring actor in the movie video to the video server.
In step S1402, the video server determines, through a face recognition model, the coordinates at which the face image of the starring actor appears in the movie video and the time points at which it appears.
In step S1403, the time points at which the face image of the starring actor appears in the movie video, the coordinates at which it appears, and the role name of the starring actor are stored in association, so that the scenario positions of the actor in the movie can be found according to the role name.
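A minimal sketch of the association record produced in this step, assuming a simple in-memory structure; the dataclass, its field names, and the example role name are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FaceAppearance:
    role_name: str      # role name of the starring actor
    time_point: float   # seconds into the movie where the face appears
    coordinates: tuple  # (x, y) position of the detected face


# Stored in association so scenario positions can be looked up by role name.
appearances = [
    FaceAppearance("Detective Li", 30 * 60 + 35, (412, 188)),
    FaceAppearance("Detective Li", 47 * 60 + 2, (305, 240)),
]


def time_points_for_role(role_name):
    return [a.time_point for a in appearances if a.role_name == role_name]


print(time_points_for_role("Detective Li"))
```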
Still referring to fig. 12, if the video server detects that the preset video content tag library contains a video content tag matching the recognized text, it determines a playing time point according to the matched video content tag and feeds back the movie and the determined playing time point to the video client as the result. The video client then displays a video playing interface and starts playing the movie at the located playing time point, so that the movie position to be watched is located quickly. If the video server does not detect any video content tag matching the recognized text in the preset video content tag library, a notification that no scenario was retrieved can be generated, so that the video client can play a voice prompt according to this notification.
According to this method, when a user wants to watch the scenes of a certain leading actor in a certain movie, the role name of the actor can be input by voice as the scenario positioning voice. The video content tag containing that role name is determined according to the video content tag library corresponding to the movie, and the playing time point at which the actor appears is then located in the movie according to the matched tag. The scenes in which the actor appears are thus located quickly from the role name, so the user does not need to locate the desired scenario position by searching scenario introductions and manually playing the video, which reduces the time spent on manual searching to a certain extent and improves the viewing experience.
Referring to fig. 15, fig. 15 shows a flowchart of a video positioning method according to an embodiment of the present application, in which an execution subject of the video positioning method is a video client, and the video positioning method includes steps S1510 to S1530, which are described in detail below.
In step S1510, a scenario search interface is displayed.
The user may enter the scenario search interface by clicking a virtual button in the video client, in order to input scenario positioning data and find the specific scenario position of the video to be watched.
In step S1520, input scenario positioning data is displayed in response to an input operation at the scenario search interface.
Referring to fig. 16, fig. 16 illustrates a scenario search interface diagram according to an embodiment of the present application, a user may input a corresponding search text in an input box 1601 in the scenario search interface to perform an input operation, and the input box 1601 may display scenario positioning data input by the user.
In step S1530, in response to the scenario positioning instruction for the scenario positioning data, a video playing interface is displayed, in which a target video starts to be played at a located playing time point, which is a time point located in the target video according to the scenario positioning data.
Referring to fig. 16, after inputting scenario positioning data in the input box 1601, the user may trigger a scenario positioning instruction by clicking the virtual button 1602 in the scenario search interface, so that the video client performs the scenario search.
When the video client performs the scenario search, it can determine the video content tag matching the scenario positioning data according to the preset video content tag library, and then locate the playing time point in the target video according to the matched video content tag. The video content tag library contains video content tags describing the scenario in the video; a video content tag may specifically contain text describing the person identification information, background information, and person action information appearing in a certain section of the video's scenario, and is used for matching against the scenario positioning data so that the scenario the user wants to watch can be located according to the matched tag. A video content tag may include one or more of person identification information, background information, and person action information.
The video client determines and locates the playing time point in the target video according to the matched video content tag, so that the target video can be played from that playing time point, and the scenario position the user wants to watch is located quickly from the scenario positioning data input by the user.
After the video client locates the corresponding playing time point in the target video according to the association between the video content tag and the playing time point, a video playing interface is displayed in the video client, and the target video starts to play in the video playing interface at the located playing time point.
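A minimal sketch of this tag-to-playing-time-point association and the client-side lookup; the mapping and its contents are assumptions for the example.

```python
# Assumed association between video content tags and playing time points,
# keyed by the identifier of the target video.
TAG_TO_TIME_POINT = {
    ("movie_a", "hero rescues heroine on the rooftop"): 30 * 60 + 35,
}


def locate_play_time(video_id, matched_tag):
    # Returns the playing time point (in seconds) associated with the tag,
    # or None when no association is stored.
    return TAG_TO_TIME_POINT.get((video_id, matched_tag))


print(locate_play_time("movie_a", "hero rescues heroine on the rooftop"))  # 1835
```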
Referring to fig. 17 and 18, fig. 17 and 18 illustrate video playback interface diagrams according to one embodiment of the present application. As shown in fig. 17, after the video client locates a corresponding playing time point in the target video according to the association relationship between the video content tag and the playing time point, the video client displays a video playing interface for playing, and in this embodiment, the located playing time point is 30 minutes and 35 seconds.
The user can click the virtual play button 1701 in the video playing interface to play the video; the video playing interface then starts playing the target video at the located playing time point, that is, directly from 30 minutes 35 seconds.
By inputting scenario positioning data describing the scenario the user wants to watch, determining the video content tag matching that data according to the preset video content tag library, and then locating the playing time point in the target video according to the matched tag, the scenario position the user wants to watch can be located quickly from the input scenario positioning data. This improves the search response speed of the system, and the user does not need to locate the desired scenario position by manually searching scenario introductions and manually playing the video, which reduces the time spent on manual searching to a certain extent and improves the viewing experience.
It should be noted that, in other embodiments, the video client may directly start playing the target video at the located playing time point in the video playing interface, which improves playback efficiency: the user does not need to click the virtual play button 1701 to start playback manually, so user operations are further reduced and the user experience is improved.
The following describes an embodiment of an apparatus of the present application, which may be used to perform the video positioning method in the above embodiment of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the video positioning method described in the present application.
Fig. 19 shows a block diagram of a video locating apparatus according to an embodiment of the present application.
Referring to fig. 19, a video positioning apparatus 1900 according to an embodiment of the present application may be, for example, a video server or a video client as shown in fig. 1, where the video positioning apparatus 1900 includes: a first acquisition unit 1910, a matching unit 1920, and a positioning unit 1930.
The first acquiring unit 1910 is configured to acquire scenario positioning data; a matching unit 1920, configured to determine, according to a preset video content tag library, a video content tag that matches the scenario positioning data; a positioning unit 1930, configured to position a playing time point in a target video according to the video content tag, so as to play the target video from the playing time point in the target video.
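The cooperation of these three units can be sketched schematically as follows, under the assumption that the preset tag library is a simple mapping from video content tags to playing time points; the class and method names are illustrative only.

```python
class VideoLocator:
    """Schematic counterpart of the acquisition, matching and positioning units."""

    def __init__(self, tag_to_time_point):
        # tag_to_time_point: {video content tag: playing time point in seconds}
        self.tag_to_time_point = tag_to_time_point

    def acquire(self, scenario_positioning_data):
        # First acquiring unit: here the data is simply passed through.
        return scenario_positioning_data

    def match(self, scenario_positioning_data):
        # Matching unit: pick the tag sharing the most words with the query.
        query = set(scenario_positioning_data.lower().split())
        return max(self.tag_to_time_point,
                   key=lambda tag: len(query & set(tag.lower().split())))

    def locate(self, tag):
        # Positioning unit: look up the playing time point of the matched tag.
        return self.tag_to_time_point[tag]


locator = VideoLocator({"hero rescues heroine on the rooftop": 1835})
tag = locator.match(locator.acquire("rooftop rescue by the hero"))
print(tag, locator.locate(tag))
```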
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: the second acquisition unit is used for acquiring the preprocessed video; the identification unit is used for identifying a target object contained in the preprocessed video and a target playing time point of the target object in the preprocessed video; and the generating unit is used for generating a video content label according to the identification information of the target object and storing the generated video content label in association with the target playing time point.
In some embodiments of the present application, based on the foregoing solution, the identifying unit is configured to: if the target object comprises a target person, matching target play time points of the occurrence of the target persons in the preprocessed video based on the characteristics of the target persons; if the target object comprises a target background, matching target playing time points of the occurrence of the target backgrounds in the preprocessed video based on the characteristics of the target backgrounds; and if the target object comprises target actions, matching target playing time points of the occurrence of the target actions in the preprocessed video based on the characteristics of the target actions.
In some embodiments of the present application, based on the foregoing solution, if the target object includes one of a target person, a target background, and a target action, the generating unit is configured to: generating a video content label according to the identification information of the one object, and storing the generated video content label and a target playing time point of the corresponding one object in the preprocessed video in an associated mode.
In some embodiments of the present application, based on the foregoing solution, if the target object includes at least two objects of a target person, a target background, and a target human action, the generating unit is configured to: determining a play time point at which the at least two objects appear simultaneously in the preprocessed video; generating a video content label according to the identification information of the at least two objects which are simultaneously appeared in the preprocessed video, and storing the generated video content label in association with the playing time point of the at least two objects which are simultaneously appeared.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: the third acquisition unit is used for acquiring video search data, wherein the video search data comprises video screenshot and corpus data obtained by searching the preprocessed video in a video library search engine; the extraction unit is used for extracting keywords in the corpus data and determining a corresponding playing time point of the video screenshot in the preprocessed video; and the storage unit is used for generating a video content label according to the keywords and storing the generated video content label and the corresponding playing time point of the video screenshot in the preprocessed video in an associated manner.
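A minimal sketch of this screenshot-based tag generation, assuming the corpus data is plain comment text and using a naive frequency-based keyword picker; the application does not specify the keyword extraction method, so this choice is an assumption for the example.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "on", "is", "and"}


def extract_keywords(corpus_text, top_k=3):
    # Naive keyword extraction: most frequent non-stop words in the comments.
    words = [w for w in corpus_text.lower().split() if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(top_k)]


def build_tag_entry(corpus_text, screenshot_time_point):
    # Store the generated video content tag in association with the playing
    # time point of the screenshot in the preprocessed video.
    return {"tag": " ".join(extract_keywords(corpus_text)),
            "play_time_point": screenshot_time_point}


print(build_tag_entry("the hero rescues the heroine on the rooftop rooftop", 1835))
```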
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: word segmentation is carried out on the scenario positioning data, and word segmentation results are obtained; determining the number of words matched with the word segmentation result by each video content tag in the video content tag library; determining the similarity between each video content tag and the scenario positioning data according to the number of words matched with each video content tag and the word segmentation result; and determining the video content label matched with the scenario positioning data according to the similarity.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: calculating the ratio of the number of words matched with the word segmentation result of each video content tag to the number of words of each video content tag according to the number of words matched with the word segmentation result of each video content tag; and determining the similarity between each video content label and the scenario positioning data according to the ratio.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: and determining the similarity between each video content label and the scenario positioning data according to the number of words matched with the word segmentation result and the corresponding relation between the number of words and the similarity.
In some embodiments of the present application, based on the foregoing solution, the video positioning apparatus further includes: an execution unit for determining a video content tag library for matching the scenario positioning data in response to a search range request for the scenario positioning data; or determining a video content tag library for matching the scenario positioning data from the plurality of preset video content tag libraries according to the video content information indicated by the scenario positioning data.
In some embodiments of the present application, based on the foregoing scheme, the matching unit is configured to: converting the voice data to obtain converted text data; and determining the video content label matched with the converted text data according to a preset video content label library.
In some embodiments of the present application, based on the foregoing solution, if the video positioning device is built in a video client, the video positioning device further includes: the first playing unit is used for acquiring the target video and playing the target video from the playing time point in the target video;
if the video positioning device is built in a video server, the positioning device further comprises: the pushing unit is used for pushing the identification information of the target video and the playing time point positioned in the target video to the video client so that the video client obtains the target video according to the identification information of the target video and plays the target video from the playing time point in the target video.
According to an aspect of an embodiment of the present application, there is provided a video positioning apparatus including: the first display unit is used for displaying a scenario search interface; the second display unit is used for responding to the input operation on the scenario search interface and displaying the input scenario positioning data; and the second playing unit is used for responding to the scenario positioning instruction aiming at the scenario positioning data, displaying a video playing interface, and starting to play the target video at the positioned playing time point in the video playing interface, wherein the playing time point is the time point positioned in the target video according to the scenario positioning data.
Fig. 20 shows a schematic diagram of a computer system suitable for use in implementing the electronic device of the embodiments of the present application.
It should be noted that, the computer system 2000 of the electronic device shown in fig. 20 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 20, the computer system 2000 includes a central processing unit (Central Processing Unit, CPU) 2001, which can perform various appropriate actions and processes, such as performing the method described in the above embodiment, according to a program stored in a Read-Only Memory (ROM) 2002 or a program loaded from a storage section 2008 into a random access Memory (Random Access Memory, RAM) 2003. In the RAM 2003, various programs and data required for the system operation are also stored. The CPU 2001, ROM 2002, and RAM 2003 are connected to each other by a bus 2004. An Input/Output (I/O) interface 2005 is also connected to bus 2004.
The following components are connected to the I/O interface 2005: an input section 2006 including a keyboard, a mouse, and the like; an output section 2007 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 2008 including a hard disk and the like; and a communication section 2009 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 2009 performs communication processing via a network such as the Internet. A drive 2010 is also connected to the I/O interface 2005 as needed. A removable medium 2011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 2010 as needed, so that a computer program read therefrom can be installed into the storage section 2008 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising a computer program for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 2009 and/or installed from the removable medium 2011. When executed by a Central Processing Unit (CPU) 2001, the computer program performs the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with a computer-readable computer program embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A video positioning method, comprising:
acquiring video search data, wherein the video search data comprises video screenshots obtained by searching preprocessed videos in a video library search engine and description information corresponding to the video screenshots, and the description information comprises comment information for commenting on the video screenshots;
extracting keywords in the description information, and determining a corresponding playing time point of the video screenshot in the preprocessed video;
generating a video content tag according to the keyword, and storing the generated video content tag and the video screenshot in association with a corresponding playing time point in the preprocessed video;
acquiring scenario positioning data, wherein the scenario positioning data is scenario data described by a user for watching a certain section of scenario in a video;
word segmentation is carried out on the scenario positioning data, and word segmentation results are obtained;
respectively matching the word segmentation result with each video content tag in a preset video content tag library to determine the number of words matched with the word segmentation result by each video content tag in the preset video content tag library;
obtaining the number of matched words corresponding to each video content tag according to the number of words matched with each video content tag and the word segmentation result;
calculating the ratio of the number of matched words corresponding to each video content tag to the number of words of each video content tag;
determining the similarity between each video content tag and the scenario positioning data according to the ratio, wherein the greater the ratio is, the higher the similarity between the video content tag and the scenario positioning data is;
determining the video content labels with the similarity higher than a preset similarity threshold value as video content labels matched with the scenario positioning data;
and positioning a playing time point in the target video according to the video content label so as to play the target video from the playing time point in the target video.
2. The video positioning method of claim 1, further comprising:
acquiring a preprocessed video;
identifying a target object contained in the preprocessed video and a target play time point of the target object in the preprocessed video;
and generating a video content label according to the identification information of the target object, and storing the generated video content label in association with the target playing time point.
3. The video localization method of claim 2, wherein the identifying the target object contained in the preprocessed video and the target play time point at which the target object appears in the preprocessed video comprises:
if the target object comprises a target person, matching target play time points of the occurrence of the target persons in the preprocessed video based on the characteristics of the target persons;
if the target object comprises a target background, matching target playing time points of the occurrence of the target backgrounds in the preprocessed video based on the characteristics of the target backgrounds;
and if the target object comprises target actions, matching target playing time points of the occurrence of the target actions in the preprocessed video based on the characteristics of the target actions.
4. The video positioning method according to claim 2, wherein if the target object includes one of a target character, a target background, and a target action, the generating a video content tag according to the identification information of the target object, and storing the generated video content tag in association with the target play time point includes:
generating a video content label according to the identification information of the one object, and storing the generated video content label and a target playing time point of the corresponding one object in the preprocessed video in an associated mode.
5. The video positioning method according to claim 2, wherein if the target object includes at least two objects of a target person, a target background, and a target human action, the generating a video content tag according to the identification information of the target object, and storing the generated video content tag in association with the target play time point includes:
determining a play time point at which the at least two objects appear simultaneously in the preprocessed video;
generating a video content label according to the identification information of the at least two objects which are simultaneously appeared in the preprocessed video, and storing the generated video content label in association with the playing time points of the at least two objects which are simultaneously appeared.
6. The video localization method of claim 1, wherein the determining the similarity between the respective video content tags and the scenario localization data comprises:
and determining the similarity between each video content label and the scenario positioning data according to the number of words matched with the word segmentation result and the corresponding relation between the number of words and the similarity.
7. The video positioning method according to claim 1, wherein if there are a plurality of preset video content tag libraries, the video positioning method further comprises:
determining a video content tag library for matching the scenario positioning data in response to a search range request for the scenario positioning data; or alternatively
and determining a video content tag library for matching the scenario positioning data from the plurality of preset video content tag libraries according to the video content information indicated by the scenario positioning data.
8. The video positioning method according to claim 1, wherein if the scenario positioning data is voice data, the word segmentation is performed on the scenario positioning data to obtain a word segmentation result, including:
converting the voice data to obtain converted text data;
and determining the converted text data as the word segmentation result.
9. The video positioning method according to any one of claims 1 to 8, wherein if the video positioning method is performed by a video client, the video positioning method further comprises: acquiring the target video, and playing the target video from the playing time point in the target video;
if the video positioning method is executed by a video server, the video positioning method further comprises: pushing the identification information of the target video and the playing time point positioned in the target video to a video client, so that the video client obtains the target video according to the identification information of the target video, and playing the target video from the playing time point in the target video.
10. A video positioning method, comprising:
Displaying a scenario search interface;
responding to the input operation on the scenario searching interface, displaying the input scenario positioning data, wherein the scenario positioning data is scenario data described by a user for a section of scenario in a video to be watched;
responding to a scenario positioning instruction aiming at the scenario positioning data, displaying a video playing interface, starting to play a target video in the video playing interface at a positioned playing time point, wherein the playing time point is obtained by positioning in the target video according to a video content tag matched with the scenario positioning data, the process of associating and storing the video content tag with the playing time point comprises the steps of obtaining video search data, wherein the video search data comprises video screenshots obtained by searching preprocessed videos in a video library search engine and description information corresponding to the video screenshots, and the description information comprises comment information for commenting on the video screenshots; extracting keywords in the description information, and determining a corresponding playing time point of the video screenshot in the preprocessed video; generating a video content tag according to the keyword, and storing the generated video content tag and the video screenshot in association with a corresponding playing time point in the preprocessed video; when determining a video content tag matched with the scenario positioning data, performing word segmentation on the scenario positioning data to obtain a word segmentation result; respectively matching the word segmentation result with each video content tag in a preset video content tag library to determine the number of words matched with the word segmentation result by each video content tag in the preset video content tag library; obtaining the number of matched words corresponding to each video content tag according to the number of words matched with each video content tag and the word segmentation result; calculating the ratio of the number of matched words corresponding to each video content tag to the number of words of each video content tag; determining the similarity between each video content tag and the scenario positioning data according to the ratio, wherein the greater the ratio is, the higher the similarity between the video content tag and the scenario positioning data is; and determining the video content labels with the similarity higher than a preset similarity threshold value as the video content labels matched with the scenario positioning data.
11. A video positioning apparatus, comprising:
the third acquisition unit is used for acquiring video search data, wherein the video search data comprises video screenshots obtained by searching the preprocessed video in a video library search engine and description information corresponding to the video screenshots, and the description information comprises comment information for commenting on the video screenshots;
the extraction unit is used for extracting keywords in the description information and determining a corresponding playing time point of the video screenshot in the preprocessed video;
the storage unit is used for generating a video content label according to the keywords and storing the generated video content label and the corresponding playing time point of the video screenshot in the preprocessed video in an associated manner;
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring scenario positioning data, and the scenario positioning data is scenario data described by a user for a section of scenario in a video to be watched;
the matching unit is used for word segmentation of the scenario positioning data to obtain word segmentation results; respectively matching the word segmentation result with each video content tag in a preset video content tag library to determine the number of words matched with the word segmentation result by each video content tag in the preset video content tag library; obtaining the number of matched words corresponding to each video content tag according to the number of words matched with each video content tag and the word segmentation result; calculating the ratio of the number of matched words corresponding to each video content tag to the number of words of each video content tag; determining the similarity between each video content tag and the scenario positioning data according to the ratio, wherein the greater the ratio is, the higher the similarity between the video content tag and the scenario positioning data is; determining the video content labels with the similarity higher than a preset similarity threshold value as video content labels matched with the scenario positioning data;
and the positioning unit is used for positioning a playing time point in the target video according to the video content label so as to play the target video from the playing time point in the target video.
12. A computer readable medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the video localization method of any one of claims 1 to 10.
13. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the video localization method of any one of claims 1 to 10.
CN201911046300.2A 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment Active CN110740389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911046300.2A CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911046300.2A CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110740389A CN110740389A (en) 2020-01-31
CN110740389B true CN110740389B (en) 2023-05-02

Family

ID=69271886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911046300.2A Active CN110740389B (en) 2019-10-30 2019-10-30 Video positioning method, video positioning device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110740389B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111294660B (en) * 2020-03-12 2021-11-16 咪咕文化科技有限公司 Video clip positioning method, server, client and electronic equipment
CN111314732A (en) * 2020-03-19 2020-06-19 青岛聚看云科技有限公司 Method for determining video label, server and storage medium
CN111680189B (en) * 2020-04-10 2023-07-25 北京百度网讯科技有限公司 Movie and television play content retrieval method and device
CN111615007A (en) * 2020-05-27 2020-09-01 北京达佳互联信息技术有限公司 Video display method, device and system
CN111711869B (en) * 2020-06-24 2022-05-17 腾讯科技(深圳)有限公司 Label data processing method and device and computer readable storage medium
CN112203115B (en) * 2020-10-10 2023-03-10 腾讯科技(深圳)有限公司 Video identification method and related device
CN112328829A (en) * 2020-10-27 2021-02-05 维沃移动通信(深圳)有限公司 Video content retrieval method and device
WO2024059959A1 (en) * 2022-09-19 2024-03-28 京东方科技集团股份有限公司 Demonstration control method, control device, demonstration device and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8799959B2 (en) * 2012-08-16 2014-08-05 Hoi L. Young User interface for entertainment systems
CN105898362A (en) * 2015-11-25 2016-08-24 乐视网信息技术(北京)股份有限公司 Video content retrieval method and device
CN110121033A (en) * 2018-02-06 2019-08-13 上海全土豆文化传播有限公司 Video categorization and device
CN109947993B (en) * 2019-03-14 2022-10-21 阿波罗智联(北京)科技有限公司 Plot skipping method and device based on voice recognition and computer equipment

Also Published As

Publication number Publication date
CN110740389A (en) 2020-01-31

Similar Documents

Publication Publication Date Title
CN110740389B (en) Video positioning method, video positioning device, computer readable medium and electronic equipment
US10824874B2 (en) Method and apparatus for processing video
CN109117777B (en) Method and device for generating information
US10209782B2 (en) Input-based information display method and input system
CN108776676B (en) Information recommendation method and device, computer readable medium and electronic device
CN109543058B (en) Method, electronic device, and computer-readable medium for detecting image
CN110134931B (en) Medium title generation method, medium title generation device, electronic equipment and readable medium
CN104598644B (en) Favorite label mining method and device
CN112115299A (en) Video searching method and device, recommendation method, electronic device and storage medium
CN109034069B (en) Method and apparatus for generating information
CN110991187A (en) Entity linking method, device, electronic equipment and medium
WO2022134701A1 (en) Video processing method and apparatus
US11856277B2 (en) Method and apparatus for processing video, electronic device, medium and product
CN109582825B (en) Method and apparatus for generating information
CN113806588B (en) Method and device for searching video
CN110737824B (en) Content query method and device
CN115269913A (en) Video retrieval method based on attention fragment prompt
CN113407775B (en) Video searching method and device and electronic equipment
CN113869063A (en) Data recommendation method and device, electronic equipment and storage medium
US10499121B2 (en) Derivative media content systems and methods
CN116629236A (en) Backlog extraction method, device, equipment and storage medium
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN114827702A (en) Video pushing method, video playing method, device, equipment and medium
CN113497953A (en) Music scene recognition method, device, server and storage medium
CN114697762B (en) Processing method, processing device, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40020814

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant