CN110297943B

CN110297943B - Label adding method and device, electronic equipment and storage medium

Info

Publication number: CN110297943B
Application number: CN201910604615.8A
Authority: CN
Inventors: 庄凯; 肖剑锋; 崔恒利
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2022-07-26
Anticipated expiration: 2039-07-05
Also published as: CN110297943A

Abstract

The application provides a label adding method, a label adding device, electronic equipment and a storage medium, wherein at least one frame of image contained in a video is obtained firstly, and attribute information of an object contained in the at least one frame of image is extracted; if the attribute information is matched with the standard attribute information of the target object stored in advance, adding a label of the target object corresponding to the position of the at least one frame of image; the label corresponding to the target object is used for indicating the playing position of the image containing the target object in the video containing the at least one frame of image, so that the purpose of automatically adding the label of the target object in the video is achieved, namely, if a user needs to check the interested object in the video, the image containing the target object can be played through the label of the target object, the user does not need to search the interested object, and the time is saved. In the process of obtaining the image to obtain the video, the purpose of automatically adding the label of the target object to the video is achieved, and time and energy of a video post-production worker are saved.

Description

Label adding method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for adding a tag, an electronic device, and a storage medium.

Background

Currently, users may record events via video, such as maintenance machine events, family activity events, and so forth. If the user needs to view the content of interest in the video, for example, the maintenance process of the engine, the user needs to browse the video and find the interested part of the user in the video.

Disclosure of Invention

In view of this, the present application provides a method and an apparatus for adding a label, an electronic device, and a storage medium.

In order to achieve the above purpose, the present application provides the following technical solutions:

a method of adding a label, comprising:

in the process of obtaining an image to obtain a video, obtaining at least one frame of image from the currently obtained image;

acquiring attribute information of an object contained in the at least one frame of image;

if the attribute information of the object is matched with the standard attribute information of the pre-stored target object, adding a label of the target object corresponding to the position of the at least one frame of image;

the label of the target object is used for indicating a playing position of the video containing the at least one frame of image for playing the image containing the target object.

In an optional embodiment, the tag of the target object comprises at least one of a time tag and a location tag.

In an optional embodiment, the tag of the target object further includes a content tag, and the tag adding method further includes at least one of:

determining a content tag of the target object based on audio data corresponding to the at least one frame of image;

determining a content label of the target object based on characters contained in the at least one frame of image;

and determining a content label of the target object based on scene information of the object contained in the at least one frame of image.

In an alternative embodiment of the method of the invention,

the time tag includes at least one of: an initial time at which an image containing the target object appears for the first time in the video; a termination time at which an image containing the target object appears in the video for the last time;

the location tag includes at least one of: an initial position in the video where an image containing the target object appears for the first time; an end position where an image containing the target object appears last in the video.

In an optional embodiment, the obtaining of the attribute information of the object included in the at least one image includes at least one of:

obtaining appearance characteristic information of the object based on the at least one frame of image, wherein the appearance characteristic information of the object represents the appearance form of the object;

and obtaining attribute identification information of the object based on the at least one frame of image, wherein the attribute identification information of the object is arranged on the surface of the object and is used for representing the object class to which the object belongs.

In an optional embodiment, further comprising:

and storing the label of the target object to a database.

In an optional embodiment, further comprising:

if the label corresponding to the target object is triggered, generating a control instruction; the control instruction is used for controlling the video to jump to the playing position.

An apparatus for adding a label, comprising:

the first acquisition module is used for acquiring at least one frame of image from the currently acquired image in the process of acquiring the image to acquire the video;

the second acquisition module is used for acquiring attribute information of an object contained in the at least one frame of image;

the setting module is used for adding a label of the target object corresponding to the position of the at least one frame of image if the attribute information of the object is matched with the standard attribute information of the target object stored in advance;

the label of the target object is used for indicating a playing position of the image containing the target object in the video containing the at least one frame of image.

An electronic device, comprising:

a memory for storing a program;

a processor configured to execute the program, the program specifically configured to:

A readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of adding a tag as described in any one of the above.

According to the technical scheme, in the tag adding method provided by the application, at least one frame of image contained in a video is obtained firstly, and the attribute information of an object contained in the at least one frame of image is extracted; if the attribute information is matched with the standard attribute information of the target object stored in advance, adding a label of the target object corresponding to the position of the at least one frame of image; the label corresponding to the target object is used for indicating the playing position of the image containing the target object in the video containing the at least one frame of image, so that the purpose of automatically adding the label of the target object in the video is realized, namely, if a user needs to check the object interested in the video, the image containing the target object can be played by the video through the label of the target object, the user does not need to search the image, and the time is saved.

Furthermore, the method for adding the label achieves the purpose of automatically adding the label of the target object to the video in the process of obtaining the video by obtaining the image, and after the video is obtained, the label of the target object is added, so that the time and the energy of a video post-production worker are saved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a schematic structural diagram of an acquisition terminal, a server, and one or more display terminals provided in an embodiment of the present application;

fig. 2 is a flowchart of a method for adding a tag according to an embodiment of the present application;

fig. 3a to fig. 3b are schematic diagrams of two implementation manners of a time tag provided in an embodiment of the present application;

fig. 4a to 4b are schematic diagrams of two implementations of a position tag provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of an implementation of a content tag provided by an embodiment of the present application;

fig. 6 is a block diagram of an implementation manner of a tag adding apparatus provided in an embodiment of the present application;

fig. 7 is a structural diagram of an implementation manner of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

The embodiment of the application provides a label adding method and device, electronic equipment and a readable storage medium.

The tag adding device may include a tag adding device running in the terminal and a tag adding device running in the background server/platform.

The terminal may be an electronic device such as a desktop, a mobile terminal (e.g., a smart phone), an ipad, etc. In one example, the adding means of the tag running in the terminal may be a client running in the terminal. The client may be an application client or a web client.

The adding device of the tag running in the background server/platform can be a hardware component of the server/platform, and also can be a functional module or component.

The background server or the platform may be one server, a server cluster composed of a plurality of servers, or a cloud computing service center.

The method for adding the label provided by the embodiment of the application can be applied to various application scenarios, and the embodiment of the application provides, but is not limited to, the following application scenarios.

A first application scenario: remote assistance application scenarios, such as equipment routing inspection and recording of maintenance process issues and remote expert guidance.

The method for adding the label can be applied to a hardware structure consisting of the acquisition terminal, the server and the display terminal.

As shown in fig. 1, a schematic structural diagram of an acquisition terminal 11, a server 12, and one or more display terminals 13 provided in the embodiment of the present application is shown.

In an alternative embodiment, the collection terminal 11 may be an AR (Augmented Reality) terminal, such as AR glasses or an AR helmet, to free the hands of a user located on the scene; in another alternative embodiment, the capture terminal 11 may be a camera.

A user can carry the acquisition terminal 11 to the site; the acquisition terminal 11 can acquire an on-site image and upload the image to the server 12; the server 12 transmits the captured images to one or more display terminals 13.

As shown in fig. 1, the image captured by the capture terminal 11 is an image displayed by the display terminal 13.

In an alternative embodiment, the display terminal 13 may be a computer or a terminal device such as a smart phone or a head-mounted display device.

In an alternative embodiment, the holder of one or more display terminals, which may be a guidance expert, may guide the user who is present through the image displayed by the display terminal 13.

In an optional embodiment, the holder of the display terminal 13 and the holder of the acquisition terminal 11 can perform audio communication through the server 12; in an alternative embodiment, the holder of the display terminal 13 and/or the holder of the capture terminal 11 may be annotated with, for example, text messages based on the currently displayed image.

It is understood that the capture terminal 11 may send multiple frames of consecutive images to the display terminal 13 through the server 12, and may record the multiple frames of consecutive images to obtain a video.

In an optional embodiment, in order to allow other people to check the remote assistance process performed by the holder of the display terminal 13 on the holder of the acquisition terminal 11, multiple frames of continuous images may be recorded to obtain a video; therefore, if other people need to view the remote assistance process, the corresponding videos can be watched.

The method for adding the label provided by the embodiment of the application can be applied to the above-mentioned acquisition terminal 11, server 12 or display terminal 13. That is, in the process of obtaining an image by the capture terminal 11 or the server 12 or the display terminal 13 to obtain a video, if it is recognized that one or more frames of images contain attribute information of a target object, a tag of the target object is added, where the tag of the target object is used to indicate a playing position in the video where the image containing the target object is played. Therefore, after the video is obtained, the label of the target object is added, and the video can be directly stored. The post-production personnel of the video do not need to process the video again, and manpower and material resources are saved.

Furthermore, the label is automatically added by identifying the attribute information of the target object contained in the image, the object identification technology is utilized, the attribute information of the target object does not need to be manually added, and manpower and material resources are saved.

Second application scenario: video communication application scenarios.

The method for adding the label provided by the embodiment of the application can be applied to a hardware structure consisting of a server, at least one first terminal device and at least one second terminal device.

The first terminal equipment can collect images and transmit the images to the server; the server can transmit the image to the second terminal device, so that the second terminal device displays the image of the side where the first terminal device is located; the second terminal equipment can collect images and transmit the images to the server; the server may transmit the image to the first terminal device; the first terminal device may display an image of the side where the second terminal device is located.

For example, the user a and the user B perform video communication through a video communication function of WeChat.

In an optional embodiment, during the process of video communication between the first terminal device and the second terminal device, audio communication may also be performed through the server.

In an alternative embodiment, where the video communication may be important and the progress of the video communication needs to be recorded, the progress of the video communication may be recorded to obtain the video.

In an optional embodiment, the method for adding a tag provided in this embodiment of the present application may be applied to the first terminal device, the server, or the second terminal device. That is, the first terminal device or the server or the second terminal device may obtain the image acquired by the first terminal device and/or the image acquired by the second terminal device. In an optional embodiment, the first terminal device or the server or the second terminal device may further obtain audio data collected by the first terminal device and/or audio data collected by the second terminal device.

In the process of obtaining an image collected by first terminal equipment and an image collected by second terminal equipment to obtain a video, if one or more frames of images are identified to contain attribute information of a target object, adding a label of the target object, wherein the label of the target object is used for indicating a playing position of the image containing the target object in the video.

In summary, in the second application scenario, the video is obtained by recording an interactive image between at least one first participant and at least one second participant in the video communication; the participant participating in the video communication comprises the at least one first participant and the at least one second participant;

or the like, or, alternatively,

the video is obtained by recording interactive images and interactive audio data of at least one first party and at least one second party in video communication; the participant in the video communication includes the at least first participant and the at least one second participant.

The third application scenario: video recording application scenes, for example, application scenes such as recording of home images.

In a video recording application scenario, only the terminal device (e.g., a camera) may be involved, for example, the terminal device captures an image and stores it in its own contained database.

In a video recording application scenario, a terminal device and a server may be involved; after the terminal collects the image, the image is transmitted to the server, and the server stores the image. Optionally, the server comprises a database.

The method for adding the label provided by the embodiment of the application can be applied to the terminal equipment or the server. In the process of obtaining a video, if one or more frames of images are identified to contain the attribute information of the target object, adding a label of the target object, wherein the label of the target object is used for indicating the playing position of the video for playing the images containing the target object.

In the following, a method for adding a label provided in the embodiment of the present application is described with reference to the application scenario.

As shown in fig. 2, a flowchart of a method for adding a tag provided in an embodiment of the present application includes:

step S201: in the process of acquiring images to obtain a video, at least one frame of image is acquired from the currently obtained images.

There are various implementation manners of step S201, and the embodiments of the present application provide, but are not limited to, the following.

The first implementation mode comprises the following steps: and in the process of sequentially obtaining the images to obtain the video, obtaining the at least one frame of image contained in a set time period taking the current time as the termination time.

It is understood that the video includes a plurality of frames of ordered images, and the "sequentially obtained images" in the process of sequentially obtaining images to obtain the video is to sequentially obtain images according to the playing sequence of the plurality of frames of images forming the video.

It is understood that more and more images are obtained over time; after each frame or frames of images are obtained, the frame or frames of images may be processed (for example, the processing manner in the subsequent step S202). Therefore, the data volume of the image processed each time is small, and the processing time is shortened.

The above "current time", "end time", and "set time period" will be described below by way of example.

It is understood that the "current time" is constantly changing over time.

Assuming that the current time is 10 hours, 5 minutes and 15 seconds, the set period of time may be [10 hours, 4 minutes and 15 seconds, 10 hours, 5 minutes and 15 seconds ]. I.e. at least one image acquired within 1 minute from the current time.

The second implementation mode comprises the following steps: in the process of obtaining the image to obtain the video, at least one frame of image is obtained from the obtained image.

In an alternative embodiment, at least one frame of image is randomly obtained from the obtained images; alternatively, at least one frame of image not subjected to the processing of the subsequent step S202 is obtained from the obtained images.

It is to be understood that the above-mentioned "at least one frame image" may or may not include the target object.

In an alternative embodiment, the obtained image may not be detected to obtain "at least one frame of image" that may contain the target object; i.e., the "at least one image" may or may not include the target object. It is clear that this approach requires a large number of images to be processed.

To reduce the number of images that need to be processed, in an alternative embodiment, the resulting images may be detected to obtain "at least one frame of image" that may contain the target object. Optionally, the similarity between the object included in the "at least one frame of image" and the target object is greater than or equal to a first threshold.

Based on this, step S201 may also include the following implementation.

The third implementation mode comprises the following steps: in the process of sequentially obtaining the images to obtain the video, at least one frame of image with the similarity between the object and the target object being greater than or equal to a first threshold value is obtained from a plurality of frames of images contained in a set time period taking the current time as the termination time.

The fourth implementation mode comprises the following steps: in the process of obtaining the image to obtain the video, at least one frame of image containing an object with the similarity of the object and the target object being greater than or equal to a first threshold value is obtained from the obtained image.

In an alternative embodiment, the obtained global features of each frame of image may be extracted, and whether an object similar to the target object is included in the image is detected based on comparison between the global features of the image and the target object, for example, the global features of the image may be extracted by a BOW (Bag of Words) method.

In an alternative embodiment, the global features include: at least one of a color feature, a texture feature, and a shape feature.

The color features can be represented by color histograms and can represent visual information of the image; the texture features are linear textures presented on the image, and different objects can be distinguished by utilizing the textures; the shape features characterize contour information of objects comprised by the image.

Step S202: and acquiring attribute information of an object contained in the at least one frame of image.

In an alternative embodiment, the attribute information of the object included in the at least one frame of image may be obtained by using a local feature extraction method.

The embodiment of the present application provides, but is not limited to, the following local feature extraction method: SIFT (Scale-invariant feature transform), MSER (maximum Stable explicit Regions), HARISS AFFINE (affine invariant feature matching algorithm), HESSIAN AFFINE, and the like.

Step S203: and if the attribute information of the object is matched with the standard attribute information of the pre-stored target object, adding a label of the target object corresponding to the position of the at least one frame of image.

In an optional embodiment, the "position where the at least one frame of image is located" refers to a playing position where the at least one frame of image is played in the video.

In an alternative embodiment, the standard attribute information of the target object may be 3D model information of the target object.

In an alternative embodiment, the tag of the target object may also be stored to a database.

In an alternative embodiment, the tags of the target objects and the videos may be stored to a database.

In an optional embodiment, the process of "adding a tag of the target object corresponding to the position where the at least one frame of image is located" is a process of establishing an association relationship between a video file corresponding to a video and a tag file containing the tag of the target object.

In an alternative embodiment, "storing the tag and video of the target object to the database" refers to storing the video file and the tag file to the database. In an alternative embodiment, "storing the tag of the target object in the database" means storing the tag file in the database.

In an alternative embodiment, the tag file and the video file may be stored in the same database or in different databases.

In an optional embodiment, since the video file and the tag file have an association relationship, the tag of the target object can be displayed after the video is started; in an optional embodiment, since the video file and the tag file have an association relationship, the corresponding video can be opened after the tag file is opened.

In an optional embodiment, after a complete video is obtained, if it is detected that a tag corresponding to the target object is triggered, a control instruction is generated; the control instruction is used for controlling the video to jump to the playing position. In summary, in the process of playing the video, the user can quickly jump to the corresponding playing position in the video for watching through the tag of the target object.

According to the method for adding the label, at least one frame of image contained in a video is obtained firstly, and attribute information of an object contained in the at least one frame of image is extracted; if the attribute information is matched with the standard attribute information of the target object stored in advance, adding a label of the target object corresponding to the position of the at least one frame of image; the label corresponding to the target object is used for indicating the playing position of the image containing the target object in the video containing the at least one frame of image, so that the purpose of automatically adding the label of the target object in the video is realized, namely if a user needs to check the object interested in the video, the user does not need to search for the image containing the target object by himself through the label of the target object, and the time is saved.

Further, the method for adding the label achieves the purpose of automatically adding the label of the target object to the video in the process of obtaining the video by obtaining the image, and after the video is obtained, the label of the target object is added, so that the time and the energy of a video post-production worker are saved.

The "label" mentioned in the examples of the present application is explained below.

In an alternative embodiment, the time tag is used to indicate the time of the image in the video containing the target object.

In an alternative embodiment, the time tag is used to indicate the time of the image containing the target object in the video, and may refer to the time when the time tag includes the image containing the target object in the video, or the time tag has a time attribute, that is, the time tag is associated with the time.

In an alternative embodiment, the position tag is used to indicate the position of an image in the video that contains the target object.

In an alternative embodiment, the location tag is used to indicate the location of the image containing the target object in the video, and may refer to the location tag including the location of the image containing the target object in the video, or the location tag has a location attribute, i.e. the location tag is associated with the location.

In order to make those skilled in the art more understand the time tag and the location tag mentioned in the embodiments of the present application, the following description is made with reference to specific examples.

As shown in fig. 3a to fig. 3b, schematic diagrams of two implementation manners of the time tag provided in the embodiment of the present application are shown.

The window displaying the time stamp shown in fig. 3a to 3b is suspended on the surface of the window displaying the video, that is, the window displaying the time stamp is a suspended window, and the relative positions of the window displaying the time stamp and the window displaying the video shown in fig. 3a to 3b are only examples, for example, the window displaying the time stamp may also be located on the upper side or the lower side or the left side or the right side of the window displaying the video.

As shown in fig. 3a, the time tag may have a time attribute, and the time tag may be represented by the name or identification of the target object.

For example, the left graph of fig. 3a represents time labels by names of target objects, for example, the time label is represented by "name a", and one video may include at least one label corresponding to each target object, for example, the "name a" and the "name B" shown in fig. 3a are labels corresponding to two target objects.

After the user clicks "name a" in the left image of fig. 3a, the video may jump to the picture containing the star image of "name a" shown in the right image of fig. 3a, and continue playing the video.

In an alternative embodiment, if the user desires to view a video containing a star of "name B," then "more" in the right diagram of FIG. 3a may be clicked on to show the floating window containing "name A and name B" shown in the left diagram of FIG. 3 a.

The playing position of the currently presented video is indicated in fig. 3a by a rectangular square filled with a "grid pattern"; it can be seen that the image shown on the left side of FIG. 3a does not include a star for "name A"; the image shown on the right side of FIG. 3a includes a star of "name A"; if the user clicks "name a", the rectangular box of "grid pattern" jumps from the position of the left image of fig. 3a to the position of the right image of fig. 3a, so that the video containing the star of "name a" is played.

In an alternative embodiment, the multi-frame images in the video containing the target object are continuous in time, for example, if the total length of the video is 10 minutes, the images in 2 minutes and 5 seconds to 5 minutes and 6 seconds of the video all contain the target object, and the images in other times of the video do not contain the target object, it indicates that the multi-frame images in the video containing the target object are continuous in time. At this time, the time stamp may include: at least one of an initial time (e.g., 2 minutes 5 seconds described above) at which the video first appears in the image containing the target object and a terminal time (e.g., 5 minutes 6 seconds described above) at which the video last appears in the image containing the target object.

In an alternative embodiment, the multiple frames of images in the video containing the target object are not consecutive in time, for example, the total length of the video is 10 minutes, and the images in 1 minute 5 seconds to 2 minutes 6 seconds of the video all contain the target object; images included in 4 minutes 6 seconds to 7 minutes 8 seconds of the video each include a target object; no other time-contained image of the video includes the target object. It indicates that the multiple frames of images in the video containing the target object are not consecutive in time.

It is assumed that the continuous image of the plurality of frames including the target object is referred to as a sub-video, for example, the continuous image of the plurality of frames included in 1 minute 5 seconds to 2 minutes 6 seconds of the video is referred to as a sub-video, and the continuous image of the plurality of frames included in 4 minutes 6 seconds to 7 minutes 8 seconds of the video is referred to as another sub-video.

If a plurality of frames of images containing the target object in the video are not continuous in time, the video comprises a plurality of sub-videos corresponding to the target object; in an optional embodiment, after the user clicks the time tag corresponding to the target object, for example, "name a", the multiple sub-videos may be automatically played until the multiple sub-videos are all played, without any operation by the user.

If the multiple frames of images containing the target object in the video are not consecutive in time, in an optional embodiment, the time tag may include one or more time periods, where one time period corresponds to the one sub-video, and optionally, one time period corresponding to the target object (for example, 1 minute 5 seconds to 2 minutes 6 seconds of the video) includes: an initial time (e.g., 1 minute 5 seconds) at which the sub video corresponding to the period of time first appears in the image containing the target object and a final time (e.g., 2 minutes 6 seconds) at which the sub video corresponding to the period of time last appears in the image containing the target object.

As shown in fig. 3b, the time label — name a corresponds to time period 1 and time period 2; the time label- "name B" corresponds to time period 3 and time period 4.

As shown in fig. 3b, if the user clicks "name a" corresponding to time period 1, the playing of the video in time period 1 can be controlled to include multiple continuous images.

As shown in fig. 4a to 4b, schematic diagrams of two implementations of the position tag provided in the embodiment of the present application are shown.

The position tag as shown in fig. 4a may be located on a progress bar of the rectangular representation shown in fig. 4 a.

The relative position of the position label shown in fig. 4a and the window displaying the video is only an example and is not limiting to the present application, for example, the position label may be a floating window floating on the surface of the window displaying the video as the time label shown in fig. 3 a; alternatively, the position tag may not be suspended on the surface of the window showing the video, but be located on the left or right side or upper or lower side of the window showing the video.

In an alternative embodiment, the location tag of the target object may be represented by the name or logo or pattern of the target object, such as the location tag of the target object represented by the pattern of the target object in fig. 4 a.

If the user clicks the position tag on the progress bar, the video may be controlled to jump to the playing position indicated by the position tag, and continue to play the video, as shown in the right diagram of fig. 4a, which is a result of the position tag clicked in the left diagram of fig. 4 a.

The position tag shown in fig. 4a has a position attribute, i.e. although it does not contain any position, the position tag can indicate that the image containing the target object is located at the position of the video.

In an alternative embodiment, the location tag of the target object includes the location of the image in the video containing the target object.

In an optional embodiment, the multi-frame images including the target object in the video are continuous in position, for example, the images included in the positions 1 to 2 of the video all include the target object, and the images included in other positions do not include the target object, which indicates that the multi-frame images including the target object in the video are continuous in position. At this time, the position tag includes at least one of: an initial position (e.g., position 1 described above) at which an image containing the target object appears for the first time in the video; the last time an end position of the image containing the target object (e.g., position 2 described above) appears in the video.

As shown in fig. 4b, the location label- "name a" includes location 1 and location 2; the position 1 may be an initial position where an image including a star with a name of "name a" appears for the first time in the video; location 2 may be the last termination location in the video where an image containing the first name "name a" appears.

In an alternative embodiment, the multiple frames of images in the video containing the target object are not continuous in position, for example, the images contained in position 1 to position 2 of the video each include the target object; the images contained in position 3 (where position 2 and position 3 are not consecutive in the video) through position 4 of the video each include a target object; no other time-contained image of the video includes the target object. It is indicated that the multi-frame images of the video including the target object are not continuous in time.

It is assumed that the continuous images of the frames including the target object are referred to as one sub-video, for example, the continuous images of the frames included in positions 1 to 2 of the video are referred to as one sub-video, and the continuous images of the frames included in positions 3 to 4 of the video are referred to as another sub-video.

If a plurality of frames of images containing the target object in the video are not continuous in time, the video comprises a plurality of sub-videos corresponding to the target object; optionally, the target object may correspond to one or more location ranges. One position range corresponds to one sub video. One location range includes: an initial position (for example, the position 1 or the position 3) at which an image including the target object appears for the first time in the sub video corresponding to the position range; the end position (for example, the position 2 or the position 4 described above) at which the image including the target object appears last in the sub video corresponding to the position range.

In an optional embodiment, the tags of the target object may also include content tags. Wherein the content tag of the target object comprises introduction information for one or more frames of images containing the target object.

In an optional embodiment, the introduction information for one or more frames of images containing the target object may include at least one of the following:

scene introduction information for one or more frames of images including the target object;

and object introduction information of other objects except the target object is contained in the one or more frames of images containing the target object.

Fig. 5 is a schematic diagram of an implementation manner of a content tag provided in an embodiment of the present application.

As shown in the left diagram of fig. 5, the video may include one or more content tags, such as the content tag shown in the left diagram of fig. 5- "a scene of a peach blossom in three mouths" and the content tag- "a scene of fighting with a monster".

The relative position relationship between the window displaying the content label and the window displaying the video shown in fig. 5 is only an example, that is, the window displaying the content label may be a floating window floating on the surface of the window displaying the video.

Optionally, the window displaying the content tag may also be located on the upper side or the lower side or the left side or the right side of the window displaying the video, and the embodiment of the application does not limit the relative position of the window displaying the content tag and the window displaying the video.

After clicking a content tag, namely 'watching a peach blossom scene with three mouths', the user can control the video to jump to an interface shown in the right side diagram of fig. 5.

In an alternative embodiment, the content tag has a time attribute or a location attribute.

The content tag has a time attribute, which means that the content tag can indicate the time when the image containing the content tag is located in the video; a content tag having a location attribute means that the content tag is capable of indicating where in the video the image containing the content tag is located.

In an alternative embodiment, if the content tag does not have a time attribute or a location attribute, then, optionally, the time tag of the target object may correspond to the content tag, one target object may correspond to one or more time tags, and each time tag may correspond to one or more content tags; in an alternative embodiment, the location tags of the target objects may correspond to content tags, one target object may correspond to one or more location tags, and one location tag may correspond to one or more content tags.

In an alternative embodiment, the user may determine whether there is self-perceptual content in the video by looking at the content tag.

A method of acquiring the content tag mentioned in the embodiment of the present application is explained below. The embodiments of the present application provide, but are not limited to, the following ways.

The first implementation manner of obtaining the content tag is as follows: determining a content tag of the target object based on audio data corresponding to the at least one frame of image.

Through the description of the three application scenes, in the process of obtaining the image to obtain the video, the audio data corresponding to the image can be obtained, for example, in the video communication application scene, not only the image but also the audio data of multiple communication parties can be obtained.

The audio data corresponding to the "at least one frame image" mentioned in the above-mentioned step S202 may be subjected to speech recognition to obtain a content tag.

For example, the audio data is identified to obtain corresponding text information, and keywords are extracted from the text information to obtain the content tag.

The second implementation manner of obtaining the content tag is as follows: and determining a content label of the target object based on the characters contained in the at least one frame of image.

Through the description of the three application scenes, in the process of obtaining the image to obtain the video, characters can be added in the image, for example, in the first application scene, the holder of the display terminal 13 and/or the holder of the capture terminal 11 can label, for example, character information based on the currently displayed image.

Characters contained in the image can be recognized through a character recognition method in image recognition, and optionally, the content label of the target object can be determined based on keywords contained in the characters.

In the foregoing implementation manner of obtaining the content tag in the first kind and the implementation manner of obtaining the content tag in the second kind, optionally, the method for adding the tag provided in the embodiment of the present application may further include:

the method comprises the steps of pre-storing scene introduction information corresponding to at least one keyword set respectively, wherein one keyword set comprises one or more keywords.

For example, the scene introduction information corresponding to the keyword set including flowers, trees and benches is a park; scene introduction information corresponding to the keyword set containing the strange animals and the weapons is used for strange battles.

In the first implementation manner for obtaining the content tag and the second implementation manner for obtaining the content tag, optionally, the content tag may be automatically obtained by using a natural language processing technology based on the obtained keyword.

The third implementation manner of obtaining the content tag is as follows: and determining the content label of the target object based on the scene information of the object contained in the at least one frame of image.

Optionally, standard attribute information corresponding to one or more scenes may be pre-stored, the attribute information of the scene information where the object is located is matched with the standard attribute information corresponding to the one or more scenes that are pre-stored, and if the scene information where the object is located is matched with the standard attribute information of the target scene that is pre-stored, the content tag of the target object is determined based on the target scene.

Optionally, the process of "matching the attribute information of the scene information where the object is located with the standard attribute information corresponding to one or more scenes that are stored in advance" may refer to the matching process of the attribute information of the object and the attribute information of the target object, and is not described herein again.

A fourth implementation manner of obtaining the content tag: the content tag is obtained by at least one of the first, second and third methods.

The following describes an implementation of "acquiring attribute information of an object included in the at least one frame of image" in the embodiment of the present application. The embodiments of the present application provide, but are not limited to, the following two implementations.

The first implementation mode comprises the following steps: and obtaining appearance characteristic information of the object based on the at least one frame of image, wherein the appearance characteristic information of the object represents the appearance form of the object.

If the "at least one image" mentioned in step S202 includes multiple images, in an alternative embodiment, for each image, a contour corresponding to one or more surfaces of the object may be extracted from the image by using a contour tracing algorithm, for example, for each surface of the object, preprocessing such as edge finding and edge breaking connection may be performed, and after the preprocessing, the surface of the object included in the image becomes a contour composed of pixels.

After the contour of each face of the object contained in each frame of image is obtained, a connection map of the faces can be created to find all the adjacent faces of each face. For example, two surfaces are adjacent to each other if the contour of the two surfaces has a portion of the common pixels. Based on this, appearance characteristic information of the object can be obtained.

If the "at least one frame of image" mentioned in step S202 includes one frame of image, the respective corresponding contour of one or more surfaces of the object included in the frame of image may be obtained to obtain the appearance characteristic information of the object.

The second implementation mode comprises the following steps: and obtaining attribute identification information of the object based on the at least one frame of image, wherein the attribute identification information of the object is arranged on the surface of the object and is used for representing the object class to which the object belongs.

In an alternative embodiment, the attribute identification information may be a one-dimensional barcode or a two-dimensional barcode. Information of an object contained in a barcode, for example, a name of the object, may be obtained by a one-dimensional barcode or a two-dimensional barcode.

The one-dimensional bar code is a binary code consisting of parallel lines with different widths and intervals. The lines and spaces are arranged according to a predetermined pattern and express data items of the corresponding token system. The arrangement order of lines and spaces with different widths can be interpreted as numbers or letters.

The two-dimensional barcode technology is generated on the premise that the one-dimensional barcode cannot meet the requirements of practical application. Due to the limited information capacity, one-dimensional barcodes are typically used to identify items, rather than to describe them. Two-dimensional barcodes can express information in both transverse and longitudinal directions simultaneously, and thus can express a large amount of information in a small area.

The method is described in detail in the embodiments disclosed in the present application, and the method of the present application can be implemented by various types of apparatuses, so that several types of apparatuses are also disclosed in the present application, and specific embodiments are described in detail below.

As shown in fig. 6, a structure diagram of an implementation manner of a tag adding apparatus provided in an embodiment of the present application is provided, where the tag adding apparatus includes:

the first obtaining module 61 is configured to obtain at least one frame of image from the currently obtained image in the process of obtaining the image to obtain the video.

A second obtaining module 62, configured to obtain attribute information of an object included in the at least one frame of image.

And a setting module 63, configured to add a tag of the target object corresponding to the position of the at least one frame of image if the attribute information of the object matches with standard attribute information of a target object stored in advance.

In an optional embodiment, the tag of the target object further includes a content tag, and the tag adding device further includes at least one of the following modules:

a first determination module to determine a content tag of the target object based on audio data corresponding to the at least one frame of image;

the second determination module is used for determining the content label of the target object based on the characters contained in the at least one frame of image;

and the third determining module is used for determining the content label of the target object by applying the scene information of the object contained in the at least one frame of image.

In an optional embodiment, the time tag comprises at least one of: an initial time at which an image containing the target object appears for the first time in the video; a termination time at which an image containing the target object appears in the video for the last time;

In an alternative embodiment, the second obtaining module 62 includes at least one of the following units:

a first obtaining unit, configured to obtain appearance feature information of the object based on the at least one frame of image, where the appearance feature information of the object represents an appearance form of the object;

and the second acquisition unit is used for acquiring attribute identification information of the object based on the at least one frame of image, the attribute identification information of the object is arranged on the surface of the object, and the attribute identification information of the object is used for representing the class of the object to which the object belongs.

In an optional embodiment, the tag adding device further comprises:

and the storage module is used for storing the label of the target object to a database.

In an optional embodiment, the tag adding device further comprises:

the generation module is used for generating a control instruction if the label corresponding to the target object is triggered; the control instruction is used for controlling the video to jump to the playing position.

As shown in fig. 7, which is a structural diagram of an implementation manner of an electronic device provided in an embodiment of the present application, the electronic device includes:

the memory 71 stores programs.

A processor 72 configured to execute the program, the program specifically configured to:

The processor 72 may be a central processing unit CPU, or an application Specific Integrated circuit asic, or one or more Integrated circuits configured to implement embodiments of the present invention, or the like.

Optionally, the electronic device further includes: a communication interface 73 and a communication bus 74. In the embodiment of the present application, the memory 71, the processor 72, and the communication interface 73 are communicated with each other through a communication bus 74.

An embodiment of the present application further provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for adding a tag as described in any one of the above.

It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the embodiments of the apparatus or system, since they are substantially similar to the embodiments of the method, the description is simple, and reference may be made to the partial description of the embodiments of the method for relevant points.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of adding a label, comprising:

in the process of obtaining an image to obtain a video, automatically obtaining at least one frame of image from the currently obtained image;

the label of the target object is used for indicating a playing position for playing the image containing the target object in the video containing the at least one frame of image;

in the process of acquiring the image to obtain the video, acquiring at least one frame of image from the currently obtained image comprises:

in the process of sequentially obtaining the images to obtain the video, obtaining the at least one frame of image contained in a set time period taking the current time as the termination time; or the like, or a combination thereof,

in the process of obtaining the image to obtain the video, at least one frame of image is randomly obtained from the obtained image, or at least one frame of image with the similarity between the contained object and the target object being greater than or equal to a first threshold value is obtained from the obtained image, wherein the similarity between the contained object and the target object is obtained by comparing the global features of the image with the target object.

2. The method for adding a tag according to claim 1, wherein the tag of the target object includes at least one of a time tag and a location tag.

3. The method for adding labels as claimed in claim 2, wherein the labels of the target objects further comprise content labels, and the method for adding labels further comprises at least one of:

4. The method of adding a label according to claim 2 or 3,

the location tag includes at least one of: an initial position in the video at which an image containing the target object appears for the first time; an end position where an image containing the target object appears last in the video.

5. The method for adding a tag according to any one of claims 1 to 3, wherein the obtaining of the attribute information of the object included in the at least one frame of image includes at least one of:

6. The method of adding a label according to claim 1, further comprising:

and storing the label of the target object in a database.

7. The method of adding a label according to claim 1, further comprising:

8. An apparatus for adding a label, comprising:

the first acquisition module is used for automatically acquiring at least one frame of image from the currently acquired image in the process of acquiring the image to acquire the video;

9. An electronic device, comprising:

a memory for storing a program;

in the process of sequentially obtaining the images to obtain the video, obtaining the at least one frame of image contained in a set time period taking the current time as the termination time; or the like, or, alternatively,

10. A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of adding a tag according to any one of claims 1 to 7.