WO2018171234A1 - Video processing method and apparatus - Google Patents

Video processing method and apparatus

Info

Publication number
WO2018171234A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
target object
identification information
information
coordinate system
Prior art date
Application number
PCT/CN2017/112342
Other languages
French (fr)
Chinese (zh)
Inventor
徐异凌
张文军
黄巍
胡颖
马展
吴钊
李明
吴平
Original Assignee
上海交通大学
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 上海交通大学 and 中兴通讯股份有限公司
Publication of WO2018171234A1

Definitions

  • the present disclosure relates to the field of communications, and in particular, to a video processing method and apparatus.
  • the core of these application scenarios is often the research and processing of the Region of Interest (ROI).
  • ROI (Region of Interest): the user's area of interest, that is, the video area on which the user's line of sight mainly concentrates while watching the video media.
  • in the related art, the user must locate the video area of interest by inspecting the video content of a large number of received videos, which consumes a large amount of resources and time, and there is no reasonable solution at present.
  • An embodiment of the present disclosure provides a video processing method and apparatus, so as to at least solve the related-art problem that a user must locate a video area of interest by inspecting the video content of a large number of received videos, which consumes a large amount of resources and time.
  • a method for processing a video includes: marking a target object in a video, and generating identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of the following: a type of the target object, a content of the target object, and spatial location information of the target object in the video; acquiring instruction information, and indexing the specified identification information of the specified target object according to the instruction information; and pushing or displaying the part or all of the video corresponding to the specified identification information.
  • the identification information is set to indicate at least one of: a mark type of the identification information, a mark content type of the identification information, a mark content of the identification information, length information of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in that part or all of the video, time information corresponding to that part or all of the video, and spatial location information of that part or all of the video in the video.
  • the spatial location information of the part or all of the video in the video includes at least one of: a center point coordinate of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video.
  • the coordinate system in which the coordinates are located includes one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • in a two-dimensional space coordinate system, the value of the coordinate includes at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system; in a three-dimensional space coordinate system, the value of the coordinate includes at least one of the following: a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • the target object in the video is marked, and the identification information of the target object is generated according to the marking result, including: marking the target object in the video during video capture or editing, and generating the identification information of the target object according to the marking result; and/or marking the target object in the captured or edited video data, and generating the identification information of the target object according to the marking result.
  • acquiring the instruction information for indicating the at least one specified target object includes: acquiring first instruction information preset by the user; and/or acquiring second instruction information obtained after analyzing the user's video viewing behavior.
  • a video processing apparatus includes: a marking module configured to mark a target object in a video; a generating module configured to generate identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video; an obtaining module configured to acquire instruction information; an indexing module configured to index the specified identification information of the specified target object according to the instruction information; and a processing module configured to push or display the part or all of the video corresponding to the specified identification information.
  • the identification information is set to indicate at least one of: a mark type of the identification information, a mark content type of the identification information, length information of the identification information, a mark content of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in that part or all of the video, time information corresponding to that part or all of the video, and spatial location information of that part or all of the video in the video.
  • the spatial location information of the part or all of the video in the video includes at least one of: a center point coordinate of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video.
  • the coordinate system in which the coordinates are located includes one of the following: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
  • in a two-dimensional space coordinate system, the value of the coordinate includes at least one of the following: a value in a two-dimensional rectangular coordinate system and a value in a two-dimensional spherical coordinate system; in a three-dimensional space coordinate system, a value in a three-dimensional rectangular coordinate system and a value in a three-dimensional spherical coordinate system.
  • the marking module includes: a first marking unit configured to mark a target object in the video during video capture or editing; and a second marking unit configured to mark the target object in the captured or edited video data.
  • the obtaining module includes: a first acquiring unit configured to acquire first instruction information preset by the user; and a second acquiring unit configured to acquire second instruction information obtained after analyzing the video viewing behavior of the user.
  • a storage medium including a stored program, wherein when the program runs, the video processing method in the above embodiment is performed.
  • a processor configured to execute a program, wherein when the program is executed, the video processing method in the above embodiment is performed.
  • the target object in the video is marked, and the identification information of the target object is generated according to the marking result, the identification information including at least the spatial position information of the target object in the video; instruction information indicating the specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information, where the part or all of the video is contained in the entire video.
  • the above method solves the related-art problem that the user must detect the video area of interest by inspecting the video content of a large number of received videos, which consumes a large amount of resources and time; by indexing the identification information already present in the video, the user can quickly obtain the video of interest, which greatly saves resources and time in the video retrieval process.
  • FIG. 1 is a schematic diagram of an application environment of an optional video processing method according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a method of processing an optional video according to an embodiment of the present disclosure
  • FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 4 is a block diagram showing the structure of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of content of an optional identification information in an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure.
  • the video processing method may be, but is not limited to, applied to the application environment shown in FIG. 1.
  • the terminal 102 is connected to the server 106, wherein the server 106 may push the video file to the terminal 102.
  • An application client 104 that can receive and display video images is run on the terminal 102.
  • the server 106 marks the target object in the video image and generates identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video image. The server 106 acquires instruction information, wherein the instruction information is set to indicate at least one specified target object, and obtains the specified identification information of the specified target object according to the instruction information. The server 106 then pushes the video corresponding to the specified identification information, where the video includes part or all of the videos. It should be noted that each of the foregoing steps performed by the server 106 can also be performed at the terminal 102; this embodiment of the disclosure does not limit this.
  • Embodiments of the present disclosure also provide a method of processing a video.
  • FIG. 2 is a flow chart of an optional video processing method according to an embodiment of the present disclosure. As shown in FIG. 2, an optional flow of the video processing method includes:
  • Step S202: marking the target object in the video, and generating identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video;
  • Step S204: acquiring instruction information, and indexing the specified identification information of the specified target object according to the instruction information;
  • Step S206: pushing or displaying the part or all of the video corresponding to the specified identification information.
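Steps S202 to S206 can be sketched as a minimal in-memory pipeline. All names here (`Identification`, `mark_targets`, and so on) are illustrative assumptions rather than identifiers from the disclosure, and the "instruction information" is reduced to a requested object type for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical region encoding: (center_x, center_y, width, height), normalized to the frame.
Region = Tuple[float, float, float, float]

@dataclass
class Identification:
    object_type: str   # type of the target object
    content: str       # content of the target object
    region: Region     # spatial location of the target object in the video

def mark_targets(video_id: str, marks: List[Identification]) -> Dict[str, List[Identification]]:
    """S202: store the identification information generated from the marking result."""
    return {video_id: list(marks)}

def index_identifications(store: Dict[str, List[Identification]],
                          video_id: str, wanted_type: str) -> List[Identification]:
    """S204: index the specified identification information according to the instruction."""
    return [m for m in store[video_id] if m.object_type == wanted_type]

def regions_to_push(matches: List[Identification]) -> List[Region]:
    """S206: the spatial regions whose video would be pushed or displayed."""
    return [m.region for m in matches]
```

The point of the sketch is that S204/S206 operate only on the stored identification records, never on the video pixels themselves, which is where the claimed resource saving comes from.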
  • the target object in the video is marked, and the identification information of the target object is generated according to the marking result, the identification information including at least the spatial location information of the target object in the video; instruction information indicating the specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information, where the part or all of the video is contained in the entire video.
  • the above method solves the related-art problem that the user must detect the video area of interest by inspecting the video content of a large number of received videos, which consumes a large amount of resources and time; by indexing the identification information already present in the video, the user can quickly obtain the video of interest, which greatly saves resources and time in the video retrieval process.
  • the identification information is set to indicate at least one of: a tag type of the identification information, a tag content type of the identification information, a tag content of the identification information, length information of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in that part or all of the video, time information corresponding to that part or all of the video, and spatial location information of that part or all of the video in the video.
  • the spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • the coordinate system in which the coordinates are located includes one of the following: a two-dimensional space coordinate system, and a three-dimensional space coordinate system.
  • the value of the coordinate includes at least one of the following: a value of the two-dimensional rectangular coordinate system, and a value of the two-dimensional spherical coordinate system.
  • the value of the two-dimensional Cartesian coordinate system here can be expressed as (x, y), and the value of the two-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value).
  • the value of the coordinates is at least one of the following: the value of the three-dimensional space rectangular coordinate system, and the value of the three-dimensional spherical coordinate system.
  • the value of the three-dimensional rectangular coordinate system can be expressed as (x, y, z)
  • the value of the three-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value, roll angle).
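For panoramic video, a two-dimensional spherical coordinate of the form (pitch angle, yaw angle) is commonly mapped onto an equirectangular frame. The helper below is one hedged way to do that mapping; it assumes yaw in [-180, 180) and pitch in [-90, 90] degrees, matching the angle ranges used later in this disclosure, and is not a formula stated in the disclosure itself.

```python
def sphere_to_equirect(yaw_deg: float, pitch_deg: float,
                       frame_w: int, frame_h: int) -> tuple:
    """Map a 2-D spherical coordinate (yaw, pitch), in degrees, to a pixel
    position in an equirectangular panoramic frame.

    Assumes yaw in [-180, 180) maps left-to-right across the frame and
    pitch in [-90, 90] maps bottom-to-top (so +90 pitch is row 0)."""
    x = (yaw_deg + 180.0) / 360.0 * frame_w
    y = (90.0 - pitch_deg) / 180.0 * frame_h
    return (x, y)
```

For example, the sphere center (yaw 0, pitch 0) lands at the middle of the frame, which is the usual convention for equirectangular projections.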
  • the target object in the video is marked, and the identification information of the target object is generated according to the marking result, including: marking the target object in the video during video capture or editing. And generating identification information of the target object according to the marking result; and/or marking the target object in the video in the captured or edited video data, and generating identification information of the target object according to the marking result.
  • acquiring the instruction information set to indicate the at least one specified target object includes: acquiring first instruction information preset by the user; and/or acquiring second instruction information obtained after analyzing the user's video viewing behavior.
  • the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in Figure 3, the device comprises:
  • a marking module 302 configured to mark a target object in the video
  • the generating module 304 is configured to generate identification information of the target object according to the marking result, wherein the identification information is at least set to indicate one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video;
  • the obtaining module 306 is configured to acquire instruction information, where the instruction information is set to indicate at least one specified target object;
  • the indexing module 308 is configured to index the specified identification information of the specified target object according to the instruction information;
  • the processing module 310 is configured to push or display part or all of the video corresponding to the specified identification information in the video.
  • In the above apparatus, the marking module marks the target object in the video; the generating module generates the identification information of the target object according to the marking result, where the identification information includes at least the spatial location information of the target object in the video; the obtaining module acquires instruction information indicating the specified target object; the indexing module indexes the specified identification information of the specified target object according to the instruction information; and the processing module pushes or displays the part or all of the video corresponding to the specified identification information according to the spatial location information in the identification information.
  • the above apparatus solves the related-art problem that the user must detect the video area of interest by inspecting the video content of a large number of received videos, which consumes a large amount of resources and time; by indexing the identification information already present in the video, the user can quickly obtain the video push of interest, which greatly saves resources and time in the video retrieval process.
  • the identification information is set to indicate at least one of: a tag type of the identification information, a tag content type of the identification information, length information of the identification information, a tag content of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in that part or all of the video, time information corresponding to that part or all of the video, and spatial location information of that part or all of the video in the video.
  • the spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video.
  • the coordinate system in which the coordinates are located includes one of the following: a two-dimensional space coordinate system, and a three-dimensional space coordinate system.
  • the value of the coordinate includes at least one of the following: a value of the two-dimensional rectangular coordinate system, and a value of the two-dimensional spherical coordinate system.
  • the value of the two-dimensional Cartesian coordinate system here can be expressed as (x, y), and the value of the two-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value).
  • the value of the coordinates is at least one of the following: the value of the three-dimensional space rectangular coordinate system, and the value of the three-dimensional spherical coordinate system.
  • the value of the three-dimensional rectangular coordinate system can be expressed as (x, y, z)
  • the value of the three-dimensional spherical coordinate system can be expressed as (pitch angle coordinate value, yaw angle coordinate value, roll angle).
  • the embodiment of the present disclosure also provides an optional video processing device.
  • FIG. 4 is a block diagram showing the structure of an optional video processing apparatus according to an embodiment of the present disclosure.
  • the marking module 302 includes: a first marking unit 3020 configured to mark a target object in the video during video capture or editing; and a second marking unit 3022 configured to mark the target object in the captured or edited video data.
  • the obtaining module 306 includes: a first obtaining unit 3060 configured to acquire first instruction information preset by the user; and a second obtaining unit 3062 configured to acquire second instruction information obtained after analyzing the video viewing behavior of the user.
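The disclosure does not specify how the second obtaining unit analyzes viewing behavior. One plausible sketch is to rank labeled content by accumulated viewing time and treat the top entries as the "second instruction information"; the function name and event format below are assumptions for illustration only.

```python
from collections import Counter
from typing import List, Tuple

def derive_second_instruction(view_events: List[Tuple[str, float]],
                              top_n: int = 1) -> List[str]:
    """Derive hypothetical 'second instruction information' from viewing behavior.

    view_events: (content_label, seconds_watched) pairs collected while the
    user watches. Returns the top_n content labels by total dwell time."""
    dwell = Counter()
    for label, seconds in view_events:
        dwell[label] += seconds
    return [label for label, _ in dwell.most_common(top_n)]
```

A real system would presumably combine this with the first (preset) instruction information, but that policy is left open by the disclosure.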
  • the foregoing apparatus may be applied to any hardware device having the foregoing functional module in the server or the terminal, which is not limited in the embodiment of the present disclosure.
  • FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in Figure 5, the device includes:
  • the memory 52 is configured to store instructions executable by the processor 50; the processor 50 is configured, based on the instructions stored in the memory 52, to mark a target object in the video and generate identification information of the target object according to the marking result, wherein the identification information is set to indicate at least one of: a type of the target object, a content of the target object, and spatial location information of the target object in the video; to acquire instruction information and index the specified identification information of the specified target object according to the instruction information; and to push or display the part or all of the video corresponding to the specified identification information.
  • the processor 50 described above may also perform an implementation of any of the above-described video processing methods.
  • the processor marks the target object in the video and generates identification information of the target object according to the marking result, where the identification information includes at least the spatial location information of the target object in the video; instruction information indicating the specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and the part or all of the video corresponding to the specified identification information is pushed according to the spatial location information in the identification information, the part or all of the video being contained in the entire video.
  • this solves the related-art problem that the user must detect the video content of interest among a large number of received videos, which consumes a large amount of resources and time; the user can quickly obtain the video push of interest by indexing the information already present in the video, which greatly saves resources and time in the video retrieval process.
  • Embodiments of the present disclosure also provide a storage medium including a stored program, wherein when the program is executed, the video processing method in the above-described embodiments and their alternative examples is performed.
  • this embodiment introduces the technical solutions of the embodiments of the present disclosure based on an exemplary application scenario.
  • An embodiment of the present disclosure provides an identification information marking method based on video content and its spatial location, which can attach corresponding identification information to specific content or to a video area at a specific spatial location in the video medium, so that content of interest to the user can be associated, through the identification information provided in the embodiments of the present disclosure, with the corresponding video area.
  • the video area herein can be understood as a video image of a certain range around the target object to which the information is associated.
  • the size or shape of the area can be customized, which is not limited in this embodiment.
  • Video positioning: according to information acquired in advance about the user's habits, preferences, and so on, the identification information based on specific video content and spatial location is used to directly locate the matching video area, and that video area is pushed to the user.
  • the video positioning application of the present disclosure can realize panoramic-video applications such as selecting an initial viewing angle, or give priority to areas of interest to the user.
  • Video retrieval: directly retrieving the video content required by the user from a large number of videos. For example, in a video surveillance application scenario, an area of interest to the user needs to be processed quickly and centrally.
  • the present disclosure provides a marking method based on identification information of video content and its spatial location; by retrieving this identification information, the corresponding video area can be located quickly, which greatly improves the efficiency of video retrieval.
  • the embodiments of the present disclosure adopt the following technical solutions. It should be noted that the video content attachment tag or label mentioned in the embodiments of the present disclosure may be understood as the identification information based on the video content and its spatial location.
  • An object of the present disclosure is to provide an identification information marking method based on video content and its spatial position. Exemplarily, in the video picture finally presented to the user, a video area containing specific content or located at a specific spatial position is uniquely associated with specific video tag information.
  • the identification information based on the video content and its spatial location to be added may be varied.
  • the following set of information may be used as an example:
  • Information 3: set to indicate the label content of the video content attachment tag of the area;
  • Information 5: set to indicate the spatial location of the area in the overall video.
  • the disclosure attaches identification information to specific content or a specific spatial location of the video medium, and the identification information indicates the content category, content information, content quality, and content location of that portion of the video.
  • the video tag information provided by the present disclosure may, in one embodiment, be set for processing and presentation by a client application or service.
  • the server can analyze the video content through image processing, pattern recognition, and the like during the video capture and acquisition phase, and mark specific content or specific spatial locations of the video media based on the results of the analysis.
  • the server may mark specific content or specific spatial locations of the video media during the video editing process.
  • the server tags the specific content or specific spatial location of the video media in the captured or edited video data.
  • the server may place the tagged specific content or specific spatial location information in a reserved field in the video stream or codestream.
  • the server separately creates the tag data associated with the corresponding video data.
  • the client used by the user can separately create the tag data of the corresponding video according to the user's usage habits and feed it back to the server.
  • After receiving the video media, the user can learn the specific content and the spatial location of the video from the identification information, and then perform further application processing.
  • the server may first obtain the video area that matches the user information by matching preset user information against the identification information marked in the video, and then push the matched area according to the user's preferences or settings.
  • the server dynamically matches the identification information according to the video viewing requirement of the user for the specific content, and pushes the corresponding video area to the user.
  • the server pushes the complete video to the user, and the terminal obtains the video area that matches the user information according to the preset user information and the identification information marked in the video, and performs matching display according to the user's preferences or settings.
  • the server pushes the complete video to the user, and the terminal dynamically matches the identification information according to the user's viewing requirement for specific content and displays the corresponding video area to the user.
  • the user information herein may include, but is not limited to, at least one of: the user's viewing habits, the user's preference for particular content, the user's preferences, and the user's specific usage.
  • the identification information herein may be set to indicate, but is not limited to, at least one of the following: the label type of the area's video content attachment label, the label content type of the area's video content attachment label, the label content information of the area's video content attachment label, the quality level of the video content in the area, and the spatial location of the area in the overall video.
  • the server can directly locate the video area that matches the user information and push it to the user, and the terminal can directly locate the video area that matches the user information and display it to the user.
  • the user information here may be obtained in advance before the video is pushed, or may be obtained by collecting the user's feedback while the user watches the video, which is not limited in this embodiment. If the user information is collected in advance, the matched video area can be pushed to the user at the initial stage of viewing. If the user information is collected while the user watches the video, it can be analyzed and matched against the identification information in the video, and the matched video area is then pushed to the user during subsequent viewing.
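The matching step described above can be sketched as a simple filter: preset user preferences (the first instruction information) are compared against the tag content of each labeled area, and only the matching areas are selected for push or display. The dict keys used here are illustrative, not field names defined by the disclosure.

```python
from typing import Dict, List, Any

def match_video_areas(labels: List[Dict[str, Any]], preferences: List[str]) -> List[Any]:
    """Return the regions of labeled video areas whose tag content matches
    any preset user preference; unmatched labels are ignored.

    labels: list of dicts with at least 'content' (tag content) and
    'region' (spatial location of the area) keys."""
    preferred = set(preferences)
    return [lab["region"] for lab in labels if lab["content"] in preferred]
```

The same filter works for both deployment variants in the text: run on the server (push only matched areas) or on the terminal (receive the full video, display only matched areas).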
  • the above marking process can be implemented by adding new identification information to the video-media-related information. This can be implemented in various ways, for example by the following set of fields:
  • Quality_level indicates the quality level of the video content in the area
  • Label_center_yaw indicates the yaw coordinate value of the center point of the label area
  • Label_center_pitch indicates the pitch coordinate value of the center point of the label area
  • Label_width indicates the width of the label area
  • Label_height indicates the height of the label area
  • Label_type indicates the label type of the attached label of the area's video content
  • Label_info_type indicates the label content type of the attached label of the area's video content
  • Label_info_content_length indicates the content length of the attached label of the area's video content
  • Content_byte indicates the specific byte information of the label content of the attached label of the area's video content
  • the identification information based on the video content and its spatial position (that is, quality_level, label_center_yaw, label_center_pitch, label_width, label_height, label_type, label_info_type, label_info_content_length, and content_byte) is added as appropriate to identify a video area of specific content at a specific spatial position.
  • Label_number Indicates the number of labels included in this video area.
  • Quality_level Indicates the quality level of video content in this area. The higher the value, the higher the video quality.
  • Label_center_yaw Indicates the yaw coordinate value of the center point of the label area, in units of 0.01 degrees, ranging from [-18000, 18000).
  • Label_center_pitch Indicates the pitch coordinate value of the center point of the label area, in units of 0.01 degrees, in the range of [-9000, 9000].
  • Label_width Indicates the width of the label area, in units of 0.01 degrees.
  • Label_height Indicates the height of the label area, in units of 0.01 degrees.
  • Label_type indicates the label type of the attached label of the area's video content. The value and meaning of the label type are shown in Table 1.

  Table 1
  0: the attached label of the video content is a face
  1: the attached label of the video content is a license plate
  2: the attached label of the video content is a general moving target
  3: the attached label of the video content is a general static target
  4: the attached label of the video content is a product
  5: the attached label of the video content is a plant
  6-254: this part is reserved
  255: the attached label of the video content is a user-defined label
  • Label_info_type indicates the label content type of the attached label of the area's video content. The value and meaning of the label content type are shown in Table 2.

  Table 2
  0: the label content is text
  1: the label content is a URL
  2-255: this part is reserved
  • Label_info_content_length indicates the length of the label content of the attached label of the area's video content.
  • Content_byte indicates the specific byte information of the label content of the attached label of the area's video content.
  • the label group LabelBox corresponding to a video area includes label_number label information entries (LabelInfoBox) and the label area information (LabelRegionBox).
  • a label information entry LabelInfoBox contains a label type label_type, a label content type label_info_type, a label content length label_info_content_length, and label_info_content_length bytes of content information content_byte.
  • the label area information LabelRegionBox contains a quality level quality_level and spatial position information: the label area center point (label_center_yaw, label_center_pitch), the label area width label_width, and the label area height label_height.
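The box structure just described can be sketched as plain data classes. The field names follow the text above, but the concrete in-memory layout is an illustrative assumption, not the normative file-format syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LabelInfoBox:
    label_type: int        # label type, per Table 1
    label_info_type: int   # label content type, per Table 2
    content_byte: bytes    # the label content itself

    @property
    def label_info_content_length(self) -> int:
        # length of the label content in bytes
        return len(self.content_byte)

@dataclass
class LabelRegionBox:
    quality_level: int
    label_center_yaw: int    # units of 0.01 degree, range [-18000, 18000)
    label_center_pitch: int  # units of 0.01 degree, range [-9000, 9000]
    label_width: int         # units of 0.01 degree
    label_height: int        # units of 0.01 degree

    def __post_init__(self):
        if not (-18000 <= self.label_center_yaw < 18000):
            raise ValueError("yaw out of range")
        if not (-9000 <= self.label_center_pitch <= 9000):
            raise ValueError("pitch out of range")

@dataclass
class LabelBox:
    labels: List[LabelInfoBox] = field(default_factory=list)
    region: Optional[LabelRegionBox] = None

    @property
    def label_number(self) -> int:
        return len(self.labels)

box = LabelBox(
    labels=[LabelInfoBox(label_type=0, label_info_type=0, content_byte=b"some text")],
    region=LabelRegionBox(quality_level=3, label_center_yaw=1500,
                          label_center_pitch=-200, label_width=3000, label_height=2000),
)
print(box.label_number, box.labels[0].label_info_content_length)  # 1 9
```

The range checks in `__post_init__` mirror the stated coordinate ranges; yaw and pitch values are integers in 0.01-degree units (e.g. 1500 is 15 degrees).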
  • FIG. 6 is a schematic diagram of content of an optional identification information in an embodiment of the present disclosure.
  • Embodiment 1 Video Positioning Application
  • the panoramic video covers a 180-degree or 360-degree viewing angle range, but the human viewing angle is limited, so the entire panoramic video content cannot be viewed at the same time; only a part of the panoramic video is viewed. The user can therefore view different area videos in the panorama in different browsing orders. It is worth noting that the areas a user views in a panoramic video are not completely random: the user switches video areas according to personal preference.
  • the disclosure provides a label associated with a video, which is set to indicate the specific content and specific spatial location information of a partial video area; the corresponding video area can then be located directly according to user preferences, and that part of the video presented to the user. The following is illustrated with several examples.
  • the information of the corresponding video area is marked in the recorded panoramic video content, and during viewing, the video area containing the label is preferentially pushed to the user according to the user's preference for the label type.
  • FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure.
  • the tag may indicate that the corresponding area contains a face, a plant, or similar content. If the user likes to pay attention to the plants in the video, the server can, by locating the plant tag and using the corresponding spatial location information and rotation information when the user views the panoramic video, preferentially push the video content of that region to the user.
  • the user's viewing area at any moment is limited, and the entire panoramic video will not be viewed. Therefore, under limited bandwidth, the user's region of interest can be encoded at high quality and the user's non-interest regions at low quality.
  • the area containing the face in which the user is interested is encoded in a high-quality mode, and the other parts are encoded at low quality.
  • the user can set a plurality of tags of interest, and the optimal area video is pushed to the user according to various possible combinations of the tags.
  • the user is interested in a certain person and a certain car, and sets the two types as the tags of interest.
  • the video area containing both the person tag and the car tag is preferentially displayed; when no area contains both at the same time, an area containing either tag can be displayed.
  • the tag type added in the panoramic video can be preset, or it can be a label customized by the user according to his own needs.
  • the user can combine the related tags to define the type of combination that they need.
  • the user sets a custom label for an item in the video, feeds the information of the label to the server, and the server subsequently pushes the relevant video area to the user according to the set label.
  • the content form carried by different tags in the panoramic video may differ, as may the content itself. The tag content may be text, such as a person tag whose text content describes the person's name and resume.
  • the tag content can be a number, such as a product tag, and a digital content describing the price information.
  • the tag content can be a link, such as a plant tag, and the link content gives the URL address of the plant in detail.
  • a tag in the same video area can be associated with multiple types of content information.
  • text information such as the product name, and digital information such as the product price or production date, can be added to a goods tag.
  • the link information for the purchase path can be added.
  • a label set for a panoramic video area can be nested to contain multiple sub-tags.
  • multiple person sub-tags can be nested under the same sports tag for the user to view.
  • Embodiment 2 Virtual Reality Video Application
  • the video area viewed by the user is not a complete virtual reality video area, so the video content of interest can be pushed for the user by adding different labels.
  • a tag is added to multiple view videos, and the user sets a tag for the region of interest, and the best view video can be selected and pushed to the user according to the tag of the user's interest.
  • Embodiment 4 Video Retrieval Application
  • the acquired surveillance video is usually used to track target vehicles, target people, and so on, but because these tracking tasks often require analyzing and processing a large number of surveillance videos through image processing and other technologies in a short time, they bring a heavy workload to video surveillance applications.
  • if video areas of specific content, such as a face or a license plate, are marked while the surveillance video is being shot, the labels in the video can be retrieved directly after the surveillance video is received, greatly reducing the workload of video retrieval.
  • the following is exemplified by several examples.
  • FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure.
  • the specific content in the video is tagged during surveillance video capture, and the user can directly retrieve the tags after receiving the surveillance video. For example, all license plate tags can be retrieved, and from the information associated with the tags, the video information of all license plates contained in the video and the license plate number information are finally obtained.
  • tags can be set for video retrieval, and all relevant video regions are searched based on various combinations of these tags.
  • for example, a combined search on the tags of a certain person and a certain car finally obtains the video information containing both tags.
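The combined search in this example can be sketched as a simple filter over the labels already carried by each video region. The data layout below is hypothetical and stands in for the identification information defined earlier.

```python
def search_by_tags(regions, required_tags):
    """Return the ids of regions whose label set contains every required tag."""
    return [rid for rid, tags in regions if required_tags <= tags]

regions = [
    ("r1", {"person"}),
    ("r2", {"person", "car"}),
    ("r3", {"car", "license_plate"}),
]
# Combined search for a person together with a car.
print(search_by_tags(regions, {"person", "car"}))  # ['r2']
```

Because the tags are stored with the video, this lookup replaces frame-by-frame content recognition over the whole surveillance archive.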
  • video retrieval directly retrieves the video content required by the user from a large number of videos. The present disclosure provides an identification information marking method based on video content and its spatial location, so the corresponding video region can be quickly located by retrieving the identification information, greatly improving the efficiency of video retrieval.
  • Embodiments of the present disclosure also provide a storage medium.
  • the foregoing storage medium may be configured to save the program code executed by the video processing method provided in the first embodiment.
  • the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.
  • the storage medium is arranged to store program code arranged to perform the following steps:
  • the identification information is at least set to indicate one of the following: the type of the target object, the content of the target object, and the spatial location information of the target object in the video;
  • the disclosed technical contents may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present disclosure.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present disclosure, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium. The software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present disclosure.
  • the foregoing storage medium includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or the like.
  • in the video processing method provided by an embodiment of the present disclosure, a target object in a video is marked, identification information of the target object is generated according to the marking result, instruction information set to indicate a specified target object is acquired, the specified identification information of the specified target object is indexed according to the instruction information, and part or all of the video corresponding to the specified identification information is then pushed or displayed according to the spatial location information in the identification information. This solves the problem in the related art that the user needs to detect the video area of interest by identifying the video content in a large number of received videos, which consumes a large amount of resources and time; the user can quickly acquire the video of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.

Abstract

A video processing method and apparatus, comprising: labeling a target object within a video, and thereby generating label information of the target object according to a labeling result, wherein the label information is at least configured to indicate one of the following: the type of the target object, the content of the target object, and the spatial position information of the target object in the video (S202); acquiring instruction information, and indexing specified label information of a specified target object according to the instruction information (S204); and pushing or displaying part or all of the video corresponding to said specified label information within said video (S206).

Description

Video processing method and device

Technical field

The present disclosure relates to the field of communications, and in particular, to a video processing method and apparatus.

Background art

With the rapid development of digital media technology, the application scenarios in streaming media consumption are becoming more intelligent, personalized, and diversified. The core of these application scenarios often lies in the research and processing of the user's region of interest (Region of Interest, ROI). The region of interest is the video area on which the user's line of sight is mainly concentrated while watching video media.

In the related art, research on the ROI focuses on recognizing and retrieving video content after the user receives the video. For example, in a video surveillance application, if a user needs to find specific video content, the video content required by the user can be found through region-of-interest detection technology after the user receives the surveillance video; this requires detecting a large number of surveillance videos in a short time to identify the video images corresponding to the user's regions of interest. In addition, in the field of panoramic video, if the user is interested in the video content of certain regions, the required video regions likewise need to be found through region-of-interest detection after the user receives the panoramic video. In the above retrieval process, the user needs to detect the video areas of interest by recognizing the video content in a large number of received videos, which consumes a large amount of resources and time.

For this problem in the related art, that the user needs to detect the video areas of interest by identifying the video content in a large number of received videos, which consumes a large amount of resources and time, no reasonable solution currently exists.

Summary of the invention

The embodiments of the present disclosure provide a video processing method and apparatus, so as to at least solve the problem in the related art that a user needs to detect the video areas of interest by identifying the video content in a large number of received videos, which consumes a large amount of resources and time.

According to one aspect of the present disclosure, a video processing method is provided, including: marking a target object in a video, and generating identification information of the target object according to the marking result, wherein the identification information is at least set to indicate one of the following: the type of the target object, the content of the target object, and the spatial location information of the target object in the video; acquiring instruction information, and indexing the specified identification information of a specified target object according to the instruction information; and pushing or displaying part or all of the video corresponding to the specified identification information in the video.
Preferably, the identification information is at least set to indicate one of the following: the mark type of the identification information, the mark content type of the identification information, the mark content of the identification information, the length information of the identification information, the quality level of the part or all of the video in which the target object is located, the quantity of identification information contained in the part or all of the video in which the target object is located, the time information corresponding to the part or all of the video in which the target object is located, and the spatial location information, in the video, of the part or all of the video in which the target object is located.

Preferably, the spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video, wherein the coordinate system of the coordinates is one of the following: a two-dimensional space coordinate system or a three-dimensional space coordinate system.

Preferably, in a two-dimensional space coordinate system, the coordinate values include at least one of the following: two-dimensional rectangular coordinate values and two-dimensional spherical coordinate values; in a three-dimensional space coordinate system, the coordinate values are at least one of the following: three-dimensional rectangular coordinate values and three-dimensional spherical coordinate values.

Preferably, marking the target object in the video and generating the identification information of the target object according to the marking result includes: during video capture or editing, marking the target object in the video and generating the identification information of the target object according to the marking result; and/or, in captured or edited video data, marking the target object in the video and generating the identification information of the target object according to the marking result.

Preferably, acquiring the instruction information for indicating at least one specified target object includes: acquiring first instruction information preset by the user; and/or acquiring second instruction information obtained after analyzing the user's video viewing behavior.
According to another aspect of the present disclosure, a video processing apparatus is further provided, including: a marking module configured to mark a target object in a video; a generating module configured to generate identification information of the target object according to the marking result, wherein the identification information is at least set to indicate one of the following: the type of the target object, the content of the target object, and the spatial location information of the target object in the video; an acquiring module configured to acquire instruction information; an indexing module configured to index the specified identification information of the specified target object according to the instruction information; and a processing module configured to push or display part or all of the video corresponding to the specified identification information in the video.

Preferably, the identification information is at least set to indicate one of the following: the mark type of the identification information, the mark content type of the identification information, the length information of the identification information, the mark content of the identification information, the quality level of the part or all of the video in which the target object is located, the quantity of identification information contained in the part or all of the video in which the target object is located, the time information corresponding to the part or all of the video, and the spatial location information of the part or all of the video in the video.

Preferably, the spatial location information of the part or all of the video in the video includes at least one of the following: the center point coordinates of the part or all of the video, the width of the part or all of the video, and the height of the part or all of the video, wherein the coordinate system of the coordinates is one of the following: a two-dimensional space coordinate system or a three-dimensional space coordinate system.

Preferably, the coordinate values include at least one of the following: two-dimensional rectangular coordinate values, two-dimensional spherical coordinate values, three-dimensional rectangular coordinate values, and three-dimensional spherical coordinate values.

Preferably, the marking module includes: a first marking unit configured to mark the target object in the video during video capture or editing; and a second marking unit configured to mark the target object in the video in captured or edited video data.

Preferably, the acquiring module includes: a first acquiring unit configured to acquire first instruction information preset by the user; and a second acquiring unit configured to acquire second instruction information obtained after analyzing the user's video viewing behavior.
According to another aspect of the present disclosure, a storage medium is further provided, the storage medium including a stored program, wherein the program, when run, performs the video processing method of the above embodiments.

According to another aspect of the present disclosure, a processor is further provided, the processor being configured to run a program, wherein the program, when run, performs the video processing method of the above embodiments.

Through the present disclosure, a target object in a video is marked, and identification information of the target object is generated according to the marking result, the identification information containing at least the spatial location information of the target object in the video. Instruction information set to indicate a specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial location information in the identification information, where the part or all of the video is contained in the whole video. The above method solves the problem in the related art that the user needs to detect the video areas of interest by identifying the video content in a large number of received videos, which consumes a large amount of resources and time; the user can quickly acquire the video of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.
Brief description of the drawings

The drawings described herein are provided for a further understanding of the present disclosure and form a part of this application; the exemplary embodiments of the present disclosure and their description are used to explain the present disclosure and do not improperly limit it. In the drawings:

FIG. 1 is a schematic diagram of an application environment of an optional video processing method according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of an optional video processing method according to an embodiment of the present disclosure;

FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure;

FIG. 4 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure;

FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of the content of optional identification information in an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure.
Detailed description

The present disclosure will be described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, without conflict, the embodiments in this application and the features in those embodiments may be combined with each other.

It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
Embodiment 1
In an embodiment of the present disclosure, an embodiment of the above video processing method is provided. FIG. 1 is a schematic diagram of an application environment of an optional video processing method according to an embodiment of the present disclosure. As an optional implementation, the video processing method may be, but is not limited to being, applied in the application environment shown in FIG. 1. A terminal 102 is connected to a server 106, where the server 106 may push video files to the terminal 102. An application client 104 capable of receiving and displaying video images runs on the terminal 102. The server 106 marks a target object in a video image and generates identification information of the target object according to the marking result, where the identification information is set to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video image. The server 106 acquires instruction information and acquires, according to the instruction information, specified identification information of a specified target object, where the instruction information is set to indicate at least one specified target object. The server 106 then pushes the video corresponding to the specified identification information, where the video includes part or all of the video. It should be noted that each step performed by the server 106 may also be performed on the terminal 102 side, which is not limited in this embodiment of the present disclosure.
An embodiment of the present disclosure further provides a video processing method. FIG. 2 is a flowchart of an optional video processing method according to an embodiment of the present disclosure. As shown in FIG. 2, an optional flow of the video processing method includes the following steps.
Step S202: mark a target object in a video, and generate identification information of the target object according to the marking result, where the identification information is set to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video.
Step S204: acquire instruction information, and index, according to the instruction information, specified identification information of a specified target object.
Step S206: push or display the part or all of the video corresponding to the specified identification information.
With the method provided by the present disclosure, a target object in a video is marked and identification information of the target object is generated according to the marking result; the identification information contains at least the spatial position information of the target object in the video. Instruction information set to indicate a specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial position information in the identification information, where this part or all of the video is contained in the whole video. This solves the related-art problem that a user has to detect the video regions of interest by recognizing video content in a large number of received videos, which consumes considerable resources and time: the user can quickly obtain the video of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.
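The flow of steps S202 through S206 can be sketched informally in code. The following Python fragment is an illustrative assumption, not part of the disclosed embodiments: the detection results, field names, and region tuples are all hypothetical, and the marking step is stubbed out rather than performed by real image analysis.

```python
# Hypothetical sketch of steps S202-S206: mark targets, generate
# identification information, index it by instruction, push the match.

def mark_targets(video_frames):
    # Step S202 (stubbed): a real system would analyze the frames;
    # here a fixed detection result stands in for the marking step.
    return [
        {"type": "face", "content": "person A", "region": (100, 80, 40, 40)},
        {"type": "license_plate", "content": "XYZ-123", "region": (300, 200, 60, 20)},
    ]

def generate_identification_info(marks):
    # Identification information indicates at least the type, content,
    # or spatial position of each target object.
    return {m["content"]: m for m in marks}

def index_by_instruction(id_info, instruction):
    # Step S204: the instruction information designates target objects.
    return {k: v for k, v in id_info.items()
            if v["type"] in instruction["wanted_types"]}

def push_video_regions(matched):
    # Step S206: push/display only the regions given by the spatial
    # position information of the matched identification info.
    return [v["region"] for v in matched.values()]

id_info = generate_identification_info(mark_targets(video_frames=None))
matched = index_by_instruction(id_info, {"wanted_types": {"face"}})
regions = push_video_regions(matched)
print(regions)  # [(100, 80, 40, 40)]
```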
In an optional example of this embodiment of the present disclosure, the identification information is set to indicate at least one of: a label type of the identification information, a label content type of the identification information, label content of the identification information, length information of the identification information, a quality level of the part or all of the video where the target object is located, the number of pieces of identification information contained in the part or all of the video where the target object is located, time information corresponding to the part or all of the video, and spatial position information of the part or all of the video within the video.
In an optional example of this embodiment of the present disclosure, the spatial position information of the part or all of the video within the video includes at least one of: center point coordinates of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video, where the coordinate system of the coordinates is one of: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
In an optional example of this embodiment of the present disclosure, in a two-dimensional space coordinate system the coordinate values include at least one of: two-dimensional rectangular coordinate values, which may be expressed as (x, y), and two-dimensional spherical coordinate values, which may be expressed as (pitch coordinate value, yaw coordinate value). In a three-dimensional space coordinate system the coordinate values include at least one of: three-dimensional rectangular coordinate values, which may be expressed as (x, y, z), and three-dimensional spherical coordinate values, which may be expressed as (pitch coordinate value, yaw coordinate value, roll angle).
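As an informal illustration of the four coordinate forms (a sketch under assumed names, not a definition from the disclosure), the spatial position of a video region could be carried as a center point plus width and height, with the tuple layout depending on the coordinate system:

```python
# Hypothetical representation of a region's spatial position information.
def make_region(system, center, width, height):
    # 'system' names one of the four coordinate forms described above;
    # the center tuple arity must match that system.
    arity = {"2d_rect": 2,    # (x, y)
             "2d_sphere": 2,  # (pitch, yaw)
             "3d_rect": 3,    # (x, y, z)
             "3d_sphere": 3}  # (pitch, yaw, roll)
    if len(center) != arity[system]:
        raise ValueError("center arity does not match coordinate system")
    return {"system": system, "center": center, "width": width, "height": height}

r_rect = make_region("2d_rect", (320, 240), 100, 80)
r_sphere = make_region("3d_sphere", (-30.0, 45.0, 0.0), 20.0, 10.0)
```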
In an optional example of this embodiment of the present disclosure, marking the target object in the video and generating the identification information of the target object according to the marking result includes: marking the target object in the video during video capture or editing and generating the identification information of the target object according to the marking result; and/or marking the target object in the video in captured or edited video data and generating the identification information of the target object according to the marking result.
In an optional example of this embodiment of the present disclosure, acquiring the instruction information set to indicate the at least one specified target object includes: acquiring first instruction information preset by a user; and/or acquiring second instruction information obtained after analyzing the video viewing behavior of the user.
Example 2
This example further provides an optional video processing apparatus. The apparatus is configured to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware implementing a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
According to an embodiment of the present disclosure, a processing apparatus configured to implement the above video processing method is further provided. FIG. 3 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the apparatus includes:
a marking module 302, configured to mark a target object in a video;
a generating module 304, configured to generate identification information of the target object according to the marking result, where the identification information is set to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video;
an acquiring module 306, configured to acquire instruction information, where the instruction information is set to indicate at least one specified target object;
an indexing module 308, configured to index, according to the instruction information, specified identification information of the specified target object;
a processing module 310, configured to push or display the part or all of the video corresponding to the specified identification information.
With the above apparatus, the marking module marks the target object in the video, and the generating module generates identification information of the target object according to the marking result, the identification information containing at least the spatial position information of the target object in the video. The acquiring module then acquires instruction information set to indicate a specified target object, the indexing module indexes the specified identification information of the specified target object according to the instruction information, and the processing module pushes or displays, according to the spatial position information in the identification information, the part or all of the video corresponding to the specified identification information, where this part or all of the video is contained in the whole video. This solves the related-art problem that a user has to detect the video regions of interest by recognizing video content in a large number of received videos, which consumes considerable resources and time: the user can quickly obtain the video push of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.
In an optional example of this embodiment of the present disclosure, the identification information is set to indicate at least one of: a label type of the identification information, a label content type of the identification information, length information of the identification information, label content of the identification information, a quality level of the part or all of the video where the target object is located, the number of pieces of identification information contained in the part or all of the video where the target object is located, time information corresponding to the part or all of the video, and spatial position information of the part or all of the video within the video.
In an optional example of this embodiment of the present disclosure, the spatial position information of the part or all of the video within the video includes at least one of: center point coordinates of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video, where the coordinate system of the coordinates is one of: a two-dimensional space coordinate system and a three-dimensional space coordinate system.
In an optional example of this embodiment of the present disclosure, in a two-dimensional space coordinate system the coordinate values include at least one of: two-dimensional rectangular coordinate values, which may be expressed as (x, y), and two-dimensional spherical coordinate values, which may be expressed as (pitch coordinate value, yaw coordinate value). In a three-dimensional space coordinate system the coordinate values include at least one of: three-dimensional rectangular coordinate values, which may be expressed as (x, y, z), and three-dimensional spherical coordinate values, which may be expressed as (pitch coordinate value, yaw coordinate value, roll angle).
An embodiment of the present disclosure further provides an optional video processing apparatus. FIG. 4 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure.
As shown in FIG. 4, the marking module 302 includes: a first marking unit 3020, configured to mark the target object in the video during video capture or editing; and a second marking unit 3022, configured to mark the target object in the video in captured or edited video data.
The acquiring module 306 includes: a first acquiring unit 3060, configured to acquire first instruction information preset by a user; and a second acquiring unit 3062, configured to acquire second instruction information obtained after analyzing the video viewing behavior of the user.
It should be noted that, in the embodiments of the present disclosure, the above apparatus may be applied to a server, a terminal, or any hardware device having the above functional modules, which is not limited in the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a physical apparatus applying the above functional modules. FIG. 5 is a structural block diagram of an optional video processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus includes:
a processor 50; and a memory 52, where the memory 52 is configured to store instructions executable by the processor 50, and the processor 50 is configured to perform the following operations according to the instructions stored in the memory 52: mark a target object in a video and generate identification information of the target object according to the marking result, where the identification information is set to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video; acquire instruction information and index, according to the instruction information, specified identification information of a specified target object; and push or display the part or all of the video corresponding to the specified identification information.
The above processor 50 may also perform the implementation of any optional example of the above video processing method.
With the above apparatus, the processor marks the target object in the video and generates identification information of the target object according to the marking result, the identification information containing at least the spatial position information of the target object in the video; acquires instruction information set to indicate a specified target object; indexes the specified identification information of the specified target object according to the instruction information; and pushes, according to the spatial position information in the identification information, the part or all of the video corresponding to the specified identification information, where this part or all of the video is contained in the whole video. This solves the related-art problem that a user has to detect the video regions of interest by recognizing video content in a large number of received videos, which consumes considerable resources and time: the user can quickly obtain the video push of interest by indexing the identification information already present in the video, greatly saving resources and time in the video retrieval process.
An embodiment of the present disclosure further provides a storage medium including a stored program, where, when the program runs, the implementation of the video processing method in the above embodiments and their optional examples is executed.
Example 3
For a better understanding of the technical solutions in the above embodiments, this example introduces, in one embodiment, the technical solutions of the embodiments of the present disclosure based on exemplary application scenarios.
An embodiment of the present disclosure provides a method for marking identification information based on video content and its spatial position, which can attach corresponding information identifiers to video regions of specific content or specific spatial positions in video media, so that the identification information provided in the embodiments of the present disclosure can be associated with the corresponding video regions according to the content a user is interested in. A video region here can be understood as a video image within a certain range around the target object associated with the identification information; the size or shape of the region can be customized, which is not limited in this example.
Illustratively, this can be applied to applications and services such as video positioning and video retrieval. Video positioning matches information acquired in advance, such as user habits and preferences, against the identification information marked in the video itself; since the identification information is based on specific video content and spatial position, the corresponding video region can be located directly and pushed to the user. In particular, in the consumption of panoramic video, since a user cannot watch the entire panoramic video at once but only a part of it, the video positioning application of the present disclosure enables panoramic video applications such as an initial viewing angle, and can also give priority to presenting the regions the user is interested in. Video retrieval directly retrieves the video content a user needs from a large number of videos; for example, in a video surveillance application scenario, the regions of interest to the user need to be processed quickly and in a concentrated manner.
The present disclosure provides a method for marking identification information based on video content and its spatial position; therefore, the corresponding video region can be quickly retrieved by searching the identification information provided by the present disclosure, which greatly improves the efficiency of video retrieval.
To achieve the above object, the embodiments of the present disclosure adopt the following technical solutions. It should be noted that the video content subsidiary labels, or labels, mentioned in the embodiments of the present disclosure can be understood as identification information based on video content and its spatial position.
An object of the present disclosure is to provide a method for marking identification information based on video content and its spatial position, exemplarily: for the video picture finally presented to the user, specific video label information uniquely associated with a video region of specific content or a specific spatial position is attached to that region.
In the present disclosure, the identification information to be added based on video content and its spatial position can take many forms; preferably, it can be implemented by the following set of information, given as an example:
Information 1: set to indicate the label type of the subsidiary label of the video content of the region;
Information 2: set to indicate the label content type of the subsidiary label of the video content of the region;
Information 3: set as exemplary information indicating the label content of the subsidiary label of the video content of the region;
Information 4: set to indicate the quality level of the video content of the region;
Information 5: set to indicate the spatial position of the region in the overall video.
The present disclosure marks information on specific content or specific spatial positions of video media; the identification information indicates the specific content category, content information, content quality, and content position of that part of the video. In exemplary applications such as video conferencing, video surveillance, and video advertising, the video label information provided by the present disclosure may, in one embodiment, be set for the processing and presentation of a client application or service.
The present disclosure is described in detail below in conjunction with exemplary embodiments. The following embodiments will help those skilled in the art understand the present disclosure in one embodiment, but do not limit the present disclosure in any form. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the present disclosure; these all fall within the scope of protection of the present disclosure.
Illustratively, during the video shooting and capture stage, the server can simultaneously analyze the video content through techniques such as image processing and pattern recognition, and mark specific content or specific spatial positions of the video media according to the analysis results.
Alternatively, the server marks specific content or specific spatial positions of the video media during the video editing process.
Alternatively, the server marks specific content or specific spatial positions of the video media in the captured or edited video data.
Illustratively, the server may place the marked specific content or specific spatial position information in a reserved field of the video stream or bitstream.
Alternatively, the server may separately produce marking data associated with the corresponding video data.
Alternatively, the client used by the user may, according to the user's usage habits, separately produce the marking data of the corresponding video and feed it back to the server.
After receiving the video media, the user can learn the specific content in the video and its spatial position from these information identifiers, and thus perform further application processing.
Before pushing a video to the user, the server may first obtain the video region matching the user information by matching preset user information against the identification information marked in the video, and then perform matched pushing according to the user's preferences or settings.
Alternatively, during video pushing, the server dynamically matches the identification information according to the user's video viewing requirements for specific content, and pushes the corresponding video region to the user.
Alternatively, the server pushes the complete video to the user, and the terminal obtains the video region matching the user information according to the preset user information and the identification information marked in the video, and performs matched display according to the user's preferences or settings.
Alternatively, the server pushes the complete video to the user, and during the user's viewing the terminal dynamically matches the identification information according to the user's video viewing requirements for specific content and displays the corresponding video region to the user.
The user information here may include, but is not limited to, at least one of: the user's viewing habits, the user's preferences for specific content, the user's degree of preference, and the user's specific purposes. The identification information here may be set to indicate, but is not limited to, at least one of: the label type of the subsidiary label of the video content of the region, the label content type of the subsidiary label of the video content of the region, information on the label content of the subsidiary label of the video content of the region, the quality level of the video content of the region, and the spatial position of the region in the overall video.
Since the identification information in the video is marked based on specific video content and spatial position, the server can directly locate the video region matching the user information and push that region to the user, and the terminal can directly locate the video region matching the user information and display that region to the user. It should be noted that the user information here may be user information acquired in advance before the video is pushed, or may be obtained by collecting the user's feedback while the user watches the video, which is not limited in this example. For user information collected in advance, the matched video region can be pushed to the user at the initial stage of viewing; for user information collected while the user watches the video, after the user information is analyzed and matched against the identification information in the video, the matched video region is pushed to the user during the user's subsequent viewing.
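The matching described above can be illustrated with a minimal sketch; the label records, preference set, and function names below are hypothetical and only show the idea of filtering regions by the user's preferred label types:

```python
# Hypothetical matching of user preferences against label info.
labels = [
    {"label_type": "face", "content": "speaker", "region": (0, 0, 64, 64)},
    {"label_type": "product", "content": "phone", "region": (128, 0, 64, 64)},
    {"label_type": "plant", "content": "tree", "region": (0, 128, 64, 64)},
]

user_info = {"likes": {"face", "product"}}  # e.g. viewing habits/preferences

def match_regions(labels, user_info):
    # Keep only regions whose label type appears in the user's preferences;
    # these are the regions the server pushes or the terminal displays.
    return [l["region"] for l in labels if l["label_type"] in user_info["likes"]]

print(match_regions(labels, user_info))  # [(0, 0, 64, 64), (128, 0, 64, 64)]
```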
The above marking process can be implemented by adding new identification information to the video media related information. This information can take many forms, preferably implemented by the following set of information, given as an example:
quality_level: indicates the quality level of the video content of the region;
label_center_yaw: indicates the yaw coordinate value of the center point of the label region;
label_center_pitch: indicates the pitch coordinate value of the center point of the label region;
label_width: indicates the width of the label region;
label_height: indicates the height of the label region;
label_type: indicates the label type of the subsidiary label of the video content of the region;
label_info_type: indicates the label content type of the subsidiary label of the video content of the region;
label_info_content_length: indicates the content length of the subsidiary label of the video content of the region;
content_byte: indicates the specific byte information of the label content of the subsidiary label of the video content of the region.
For convenience of description, the following embodiments refer to the set of identification information described above, but in other embodiments other information may also, or instead, be used.
Taking the base media file format ISOBMFF as an example, the identification information based on video content and its spatial position, namely quality_level, label_center_yaw, label_center_pitch, label_width, label_height, label_type, label_info_type, label_info_content_length, and content_byte, is added appropriately to form the marking of video regions of specific content and specific spatial positions.
For the present disclosure, the following fields can be added as needed:
label_number: indicates the number of labels contained in the video region.
quality_level: indicates the quality level of the video content of the region; the higher the value, the higher the video quality.
label_center_yaw: indicates the yaw coordinate value of the center point of the label region, in units of 0.01 degrees, with a value range of [-18000, 18000).
label_center_pitch: indicates the pitch coordinate value of the center point of the label region, in units of 0.01 degrees, with a value range of [-9000, 9000].
label_width: indicates the width of the label region, in units of 0.01 degrees.
label_height: indicates the height of the label region, in units of 0.01 degrees.
label_type: indicates the label type of the subsidiary label of the video content of the region, where the values and meanings of the label type are shown in Table 1.
Table 1

Value    Description
0        The tag attached to the video content is a face
1        The tag attached to the video content is a license plate
2        The tag attached to the video content is a general moving target
3        The tag attached to the video content is a general static target
4        The tag attached to the video content is a commodity
5        The tag attached to the video content is a plant
6-254    Reserved
255      The tag attached to the video content is a user-defined tag
label_info_type: indicates the content type of the tag attached to the video content of the region; the values and meanings of the tag content type are listed in Table 2.
Table 2

Value    Description
0        The tag content is text
1        The tag content is a URL
2-255    Reserved
label_info_content_length: indicates the length of the content of the tag attached to the video content of the region.
content_byte: carries the specific byte information of the tag content.
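As a sketch of how the coordinate fields above could be consumed (the function name and the returned dict layout are illustrative, not part of the disclosure), the 0.01-degree integer units convert to degrees as follows:

```python
def decode_label_region(label_center_yaw, label_center_pitch,
                        label_width, label_height):
    """Decode a tag region whose fields are stored in 0.01-degree units."""
    # Enforce the value ranges stated in the field definitions above.
    if not (-18000 <= label_center_yaw < 18000):
        raise ValueError("label_center_yaw out of range [-18000, 18000)")
    if not (-9000 <= label_center_pitch <= 9000):
        raise ValueError("label_center_pitch out of range [-9000, 9000]")
    # Convert 0.01-degree units to degrees.
    return {
        "center_yaw_deg": label_center_yaw / 100.0,
        "center_pitch_deg": label_center_pitch / 100.0,
        "width_deg": label_width / 100.0,
        "height_deg": label_height / 100.0,
    }

region = decode_label_region(4500, -3000, 2000, 1500)
# region["center_yaw_deg"] == 45.0, region["center_pitch_deg"] == -30.0
```

Note the asymmetry of the ranges: yaw is half-open at +180 degrees (so that +180 and -180 are not both representable), while pitch is closed at both poles.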
Based on the above information, and taking ISOBMFF as an example, one possible organization of this information is given below. The tag group LabelBox corresponding to a video region contains label_number tag information boxes (LabelInfoBox) and one tag region information box (LabelRegionBox).

A LabelInfoBox contains a tag type label_type, a tag content type label_info_type, a tag content length label_info_content_length, and label_info_content_length bytes of content information content_byte.

A LabelRegionBox contains the quality level quality_level and the spatial position information: the center point of the tag region (label_center_yaw, label_center_pitch), the tag region width label_width, and the tag region height label_height.
[Figures PCTCN2017112342-appb-000001 and PCTCN2017112342-appb-000002: box syntax definitions for the structures described above; images not reproduced here.]
The meaning of each of the above fields has been explained above.

Note that the above fields and their sizes are only an example used to describe the tag attached to the video content; the disclosure is not limited to them. For a better understanding of these fields, see the application example shown in FIG. 6. FIG. 6 is a schematic diagram of the content of optional identification information in an embodiment of the present disclosure.
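As an illustration only (the normative box syntax appears in the figures of the original filing, and the Python class names here merely mirror the box names above), the described structure, one LabelRegionBox plus label_number LabelInfoBox entries per LabelBox, might be modeled as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LabelInfoBox:
    label_type: int       # e.g. 0 = face, 1 = license plate (Table 1)
    label_info_type: int  # e.g. 0 = text, 1 = URL (Table 2)
    content_byte: bytes   # tag content payload

    @property
    def label_info_content_length(self) -> int:
        # Derived from the payload rather than stored separately.
        return len(self.content_byte)

@dataclass
class LabelRegionBox:
    quality_level: int
    label_center_yaw: int    # all four position fields in 0.01-degree units
    label_center_pitch: int
    label_width: int
    label_height: int

@dataclass
class LabelBox:
    region: LabelRegionBox
    labels: List[LabelInfoBox] = field(default_factory=list)

    @property
    def label_number(self) -> int:
        return len(self.labels)

box = LabelBox(
    region=LabelRegionBox(3, 4500, -3000, 2000, 1500),
    labels=[LabelInfoBox(5, 0, "tulip".encode("utf-8"))],  # 5 = plant
)
# box.label_number == 1
```

Deriving label_number and label_info_content_length from the container sizes, as the properties above do, keeps the in-memory model free of the redundancy that the serialized form carries.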
Embodiment 4

For a better understanding of the technical solutions in the foregoing embodiments, this embodiment introduces them through the following preferred embodiments.

Preferred Embodiment 1: video positioning applications

A panoramic video covers a 180-degree or 360-degree field of view, but the human field of view is limited: a viewer cannot watch the whole panoramic video at once, only part of it. Users therefore watch different regions of the panorama in different browsing orders. Notably, which regions a user watches is not completely random; the user switches video regions according to personal preference. The present disclosure provides tags associated with the video that indicate the specific content and the specific spatial position of partial video regions, so that the video region matching the user's preferences can be located directly and presented to the user. The following examples illustrate this.
Example 1

According to preset tag types, the corresponding video regions are marked in the recorded panoramic video content; during viewing, the video regions carrying the tag types the user prefers are pushed to the user first.

Alternatively, based on the tag types already present in the video, the system can dynamically collect information on what the user watches, analyze the user's preferences, and push the video of the user's region of interest to the user.

See FIG. 7 for an example. FIG. 7 is a schematic diagram of an optional video positioning method according to an embodiment of the present disclosure. As shown in FIG. 7, a tag may indicate that the corresponding region contains a face, a plant, or other content. If the user likes to watch the plants in the video, then while the user watches the panoramic video, the plant tag can be located and, based on its spatial position information and rotation information, the video content of that region can be pushed to the user first.
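A minimal sketch of this tag-based selection (the dict layout is an assumption for illustration; label_type 5 is the plant type from Table 1):

```python
def regions_for_preference(label_boxes, preferred_type):
    """Return the regions that carry at least one tag of the preferred type."""
    return [box for box in label_boxes
            if any(label["label_type"] == preferred_type
                   for label in box["labels"])]

boxes = [
    {"labels": [{"label_type": 0}], "center": (0, 0)},       # face region
    {"labels": [{"label_type": 5}], "center": (4500, 100)},  # plant region
]
plant_regions = regions_for_preference(boxes, preferred_type=5)
# plant_regions contains only the second box; its spatial position
# information then drives which viewport is pushed to the user.
```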
Example 2

Because the user's field of view is limited at any moment, the user never watches the entire panoramic video at once. Under limited bandwidth, the user's regions of interest can therefore be encoded at high quality and the remaining regions at low quality.

For example, the region containing a face the user is interested in is encoded at high quality, and the other regions are encoded at low quality.
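One way this per-region decision could be driven by the tags (a sketch under assumed data structures; the quality_level values are arbitrary placeholders):

```python
HIGH_QUALITY, LOW_QUALITY = 5, 1

def assign_quality(label_boxes, interesting_types):
    """Give regions carrying an interesting tag a high quality level."""
    for box in label_boxes:
        types = {label["label_type"] for label in box["labels"]}
        box["quality_level"] = (HIGH_QUALITY if types & interesting_types
                                else LOW_QUALITY)
    return label_boxes

boxes = assign_quality(
    [{"labels": [{"label_type": 0}]},   # face: region of interest
     {"labels": [{"label_type": 3}]}],  # static target: not of interest
    interesting_types={0},
)
# boxes[0]["quality_level"] == 5, boxes[1]["quality_level"] == 1
```

The resulting quality_level per region would then be handed to the encoder as its rate-allocation hint.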
Example 3

While watching a panoramic video, the user can set multiple tags of interest; according to the possible combinations of these tags, the optimal video region is pushed to the user.

Alternatively, the user's viewing habits can be collected dynamically, the user's preferences analyzed, and multiple preferences combined into various forms.

For example, a user interested in a particular person and a particular kind of car sets both as tags of interest. When video is pushed, a region containing both the person tag and the car tag is displayed first; when no region contains both, a region containing the person or the car alone is selected and displayed.
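The "both tags beat either alone" rule falls out of scoring each region by how many preferred tag types it matches (a sketch; label_type 255 stands in for a user-defined car tag, per Table 1):

```python
def best_region(label_boxes, preferred_types):
    """Pick the region matching the largest number of preferred tag types."""
    def score(box):
        return len({label["label_type"] for label in box["labels"]}
                   & preferred_types)
    return max(label_boxes, key=score)

boxes = [
    {"name": "person_only",
     "labels": [{"label_type": 0}]},
    {"name": "person_and_car",
     "labels": [{"label_type": 0}, {"label_type": 255}]},
]
chosen = best_region(boxes, preferred_types={0, 255})
# chosen["name"] == "person_and_car": matching both tags scores 2,
# matching only the person tag scores 1.
```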
Example 4

The tag types added to a panoramic video may be preset, or may be tags the user defines according to their own needs,

or the user may combine related tags to define the combination types they need.

For example, the user sets a custom tag on an object in the video and feeds the tag information back to the server; the server then pushes the relevant video regions to the user according to the set tag.
Example 5

Different tags in a panoramic video can differ in both the form of the content they carry and the content itself. Tag content can be text, such as a person tag whose text describes the person's name and biography. Tag content can be a number, such as a commodity tag whose numeric content describes price information. Tag content can be a link, such as a plant tag whose link content gives the URL of a detailed introduction to the plant.
Example 6

A single tag in a video region can be associated with several types of content information. For example, to describe a product logo in a video, one can add text information for the product name, numeric information for the price or the production date, and link information for the purchase path.
Example 7

A tag set on a panoramic video region can nest multiple sub-tags. For example, in a sports panoramic video there are many athletes on the field, and the user cares not about any single athlete but about the whole scene and the interplay between the players; multiple person sub-tags can be nested under the same sports tag so that the user can watch them together.
Preferred Embodiment 2: virtual reality video applications

As with panoramic video, in virtual reality video applications the region the user watches is not the complete virtual reality video, so by adding different tags the content of interest can be pushed to the user.
Preferred Embodiment 3: multi-view video applications

Tags are added to the videos of multiple viewpoints, and the user sets tags on regions of interest; the best viewpoint video can then be selected and pushed to the user according to the user's tags of interest.
Preferred Embodiment 4: video retrieval applications

In video surveillance scenarios, captured surveillance video is typically used to track target vehicles, target persons, and so on. Because such tracking often requires analyzing and processing large amounts of surveillance video with image processing and similar techniques in a short time, it imposes a heavy workload on video surveillance applications. With the indication information provided by the present disclosure, video regions of specific content such as faces and license plates can be tagged while the surveillance video is being shot, so after the video is received its tags can be searched directly, greatly reducing the video retrieval workload. The following examples illustrate this.
Example 1

FIG. 8 is a schematic diagram of an optional video retrieval method according to an embodiment of the present disclosure. As shown in FIG. 8, specific content in the video is tagged while the surveillance video is captured. After receiving the surveillance video, the user can search these tags directly, for example retrieving the tags of all license plates and the information associated with those tags, finally obtaining the video information of every license plate contained in the video together with the plate numbers.
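A sketch of such a retrieval pass (the field names follow the disclosure; the dict layout is an assumption): scan every tagged region and collect the content of all license-plate tags, here label_type 1 with label_info_type 0 (text) per Tables 1 and 2.

```python
def find_license_plates(label_boxes):
    """Collect (plate text, region position) for every license-plate tag."""
    plates = []
    for box in label_boxes:
        for label in box["labels"]:
            if label["label_type"] == 1 and label["label_info_type"] == 0:
                plates.append((label["content_byte"].decode("utf-8"),
                               box["region"]))
    return plates

boxes = [
    {"region": (4500, -100, 500, 200),
     "labels": [{"label_type": 1, "label_info_type": 0,
                 "content_byte": b"A12345"}]},
    {"region": (0, 0, 300, 300),
     "labels": [{"label_type": 0, "label_info_type": 0,
                 "content_byte": b"someone"}]},
]
# find_license_plates(boxes) -> [("A12345", (4500, -100, 500, 200))]
```

Because the tags were written at capture time, this pass never decodes or analyzes pixels, which is exactly the workload saving the paragraph above describes.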
Example 2

Multiple tags can be set for video retrieval, and all relevant video regions are found according to the various combinations of these tags.

For example, a particular person and a particular kind of car are searched as a combined tag, finally obtaining the video information that contains both tags.

Video retrieval means finding the video content a user needs directly within a large number of videos. The present disclosure provides a method of marking identification information based on video content and its spatial position, so the corresponding video region can be retrieved quickly by searching that identification information, greatly improving the efficiency of video retrieval.
This disclosure uses ISOBMFF as an example to explain the proposed solutions, but the solutions can equally be used in other file encapsulation formats, transport systems, and protocols.
Embodiment 5

Embodiments of the present disclosure further provide a storage medium. Optionally, in an embodiment of the present disclosure, the storage medium may be configured to store the program code executed by the video processing method provided in the foregoing embodiments.

Optionally, in an embodiment of the present disclosure, the storage medium may be located in any computer terminal of a computer terminal group in a computer network, or in any mobile terminal of a mobile terminal group.

Optionally, in an embodiment of the present disclosure, the storage medium is configured to store program code for performing the following steps:
S1: mark a target object in a video, and generate identification information of the target object according to the marking result, where the identification information is configured to indicate at least one of: the type of the target object, the content of the target object, and the spatial position information of the target object in the video;

S2: acquire instruction information, and index the specified identification information of a specified target object according to the instruction information;

S3: push or display part or all of the video corresponding to the specified identification information.
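An end-to-end sketch of steps S1 to S3 (the storage layer and the video I/O are stubbed out; all function and field names are illustrative, not from the disclosure):

```python
def mark_targets(video_regions):                       # S1
    """Attach identification information to each detected target."""
    return [{"id_info": {"label_type": r["type"],
                         "content": r["content"],
                         "position": r["position"]}}
            for r in video_regions]

def index_by_instruction(marked, wanted_type):         # S2
    """Index the identification information matching the instruction."""
    return [m["id_info"] for m in marked
            if m["id_info"]["label_type"] == wanted_type]

def push(selected):                                    # S3
    """Push/display the video areas for the selected identification info."""
    return [s["position"] for s in selected]

marked = mark_targets([{"type": 1, "content": "A12345",
                        "position": (10, 20)}])
positions = push(index_by_instruction(marked, wanted_type=1))
# positions == [(10, 20)]
```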
The serial numbers of the above embodiments of the present disclosure are for description only and do not imply any ranking of the embodiments.

In the above embodiments of the present disclosure, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or of other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present disclosure.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above description covers only preferred embodiments of the present disclosure and is not intended to limit the disclosure; various changes and modifications will occur to those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.
Industrial applicability

In the video processing method provided by the embodiments of the present disclosure, a target object in a video is marked and identification information of the target object is generated from the marking result; instruction information indicating a specified target object is then acquired, the specified identification information of the specified target object is indexed according to the instruction information, and part or all of the video corresponding to the specified identification information is pushed or displayed according to the spatial position information in the identification information. This solves the problem in the related art that a user has to detect the video regions of interest by recognizing video content in a large number of received videos, which consumes considerable resources and time: the user can quickly obtain the video of interest by indexing the identification information already present in the video, greatly saving resources and time in video retrieval.

Claims (14)

  1. A video processing method, comprising:
    marking a target object in a video, and generating identification information of the target object according to the marking result, wherein the identification information is configured to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video;
    acquiring instruction information, and indexing, according to the instruction information, specified identification information of a specified target object;
    pushing or displaying part or all of the video corresponding to the specified identification information.
  2. The method according to claim 1, wherein the identification information is further configured to indicate at least one of: a tag type of the identification information, a tag content type of the identification information, tag content of the identification information, length information of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in the part or all of the video in which the target object is located, time information corresponding to the part or all of the video in which the target object is located, and spatial position information, within the video, of the part or all of the video in which the target object is located.
  3. The method according to claim 2, wherein the spatial position information of the part or all of the video within the video comprises at least one of: center point coordinates of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video; wherein the coordinate system of the coordinates is one of: a two-dimensional spatial coordinate system and a three-dimensional spatial coordinate system.
  4. The method according to claim 3, wherein
    in the two-dimensional spatial coordinate system, the coordinates take at least one of: values in a two-dimensional rectangular coordinate system and values in a two-dimensional spherical coordinate system;
    in the three-dimensional spatial coordinate system, the coordinates take at least one of: values in a three-dimensional rectangular coordinate system and values in a three-dimensional spherical coordinate system.
  5. The method according to claim 1, wherein marking the target object in the video and generating the identification information of the target object according to the marking result comprises:
    during video capture or editing, marking the target object in the video and generating the identification information of the target object according to the marking result; and/or
    in captured or edited video data, marking the target object in the video and generating the identification information of the target object according to the marking result.
  6. The method according to claim 1, wherein acquiring the instruction information comprises:
    acquiring first instruction information preset by a user; and/or
    acquiring second instruction information obtained by analyzing the user's video viewing behavior.
  7. A video processing apparatus, comprising:
    a marking module, configured to mark a target object in a video;
    a generating module, configured to generate identification information of the target object according to the marking result, wherein the identification information is configured to indicate at least one of: a type of the target object, content of the target object, and spatial position information of the target object in the video;
    an acquiring module, configured to acquire instruction information;
    an indexing module, configured to index, according to the instruction information, specified identification information of a specified target object;
    a processing module, configured to push or display part or all of the video corresponding to the specified identification information.
  8. The apparatus according to claim 7, wherein the identification information is configured to indicate at least one of: a tag type of the identification information, a tag content type of the identification information, length information of the identification information, tag content of the identification information, a quality level of the part or all of the video in which the target object is located, the number of pieces of identification information contained in the part or all of the video in which the target object is located, time information corresponding to the part or all of the video, and spatial position information of the part or all of the video within the video.
  9. The apparatus according to claim 8, wherein the spatial position information of the part or all of the video within the video comprises at least one of: center point coordinates of the part or all of the video, a width of the part or all of the video, and a height of the part or all of the video; wherein the coordinate system of the coordinates is one of: a two-dimensional spatial coordinate system and a three-dimensional spatial coordinate system.
  10. The apparatus according to claim 9, wherein
    in the two-dimensional spatial coordinate system, the coordinates take at least one of: values in a two-dimensional rectangular coordinate system and values in a two-dimensional spherical coordinate system;
    in the three-dimensional spatial coordinate system, the coordinates take at least one of: values in a three-dimensional rectangular coordinate system and values in a three-dimensional spherical coordinate system.
  11. The apparatus according to claim 7, wherein the marking module comprises:
    a first marking unit, configured to mark the target object in the video during video capture or editing;
    a second marking unit, configured to mark the target object in the video in captured or edited video data.
  12. The apparatus according to claim 7, wherein the acquiring module comprises:
    a first acquiring unit, configured to acquire first instruction information preset by a user;
    a second acquiring unit, configured to acquire second instruction information obtained by analyzing the user's video viewing behavior.
  13. A storage medium comprising a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 6.
  14. A processor configured to run a program, wherein the program, when run, performs the method according to any one of claims 1 to 6.
PCT/CN2017/112342 ("Video processing method and apparatus"), filed 2017-11-22, was published as WO2018171234A1 on 2018-09-27 and claims priority to Chinese application CN201710186180.0, filed 2017-03-24.

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110798736B (en) * 2019-11-28 2021-04-20 百度在线网络技术(北京)有限公司 Video playing method, device, equipment and medium
CN111487889A (en) * 2020-05-08 2020-08-04 北京金山云网络技术有限公司 Method, device and equipment for controlling intelligent equipment, control system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101068342A (en) * 2007-06-05 2007-11-07 西安理工大学 Video frequency motion target close-up trace monitoring method based on double-camera head linkage structure
CN101207807A (en) * 2007-12-18 2008-06-25 孟智平 Method for processing video and system thereof
US20080252723A1 (en) * 2007-02-23 2008-10-16 Johnson Controls Technology Company Video processing systems and methods
CN101420595A (en) * 2007-10-23 2009-04-29 华为技术有限公司 Method and equipment for describing and capturing video object

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003079952A (en) * 2001-09-14 2003-03-18 Square Co Ltd Computer readable record medium in which video game program is recorded, video game program, and method and device for processing video game
CN101930779B (en) * 2010-07-29 2012-02-29 Huawei Device Co., Ltd. Video commenting method and video player
CN104602128A (en) * 2014-12-31 2015-05-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Video processing method and device
CN104837034B (en) * 2015-03-09 2019-04-12 Tencent Technology (Beijing) Co., Ltd. A kind of information processing method, client and server
CN106303401B (en) * 2015-05-12 2019-12-06 Hangzhou Hikvision Digital Technology Co., Ltd. Video monitoring method, equipment and system thereof and video monitoring method based on shopping mall
CN105843541A (en) * 2016-03-22 2016-08-10 Leshi Internet Information & Technology Corp. (Beijing) Target tracking and displaying method and device in panoramic video
CN105847998A (en) * 2016-03-28 2016-08-10 Le Holdings (Beijing) Co., Ltd. Video playing method, playing terminal, and media server
CN105933650A (en) * 2016-04-25 2016-09-07 Beijing Megvii Technology Co., Ltd. Video monitoring system and method
CN106023261B (en) * 2016-06-01 2019-11-29 Wuxi Tianmai Juyuan Media Technology Co., Ltd. A kind of method and device of television video target following
CN106254925A (en) * 2016-08-01 2016-12-21 Le Holdings (Beijing) Co., Ltd. Destination object extracting method based on video identification, equipment and system
CN106303726B (en) * 2016-08-30 2021-04-16 Beijing QIYI Century Science & Technology Co., Ltd. Video tag adding method and device
CN106504187A (en) * 2016-11-17 2017-03-15 Le Holdings (Beijing) Co., Ltd. Video frequency identifying method and device
CN106534944B (en) * 2016-11-30 2020-01-14 Beijing ByteDance Network Technology Co., Ltd. Video display method and device

Also Published As

Publication number Publication date
CN108628913B (en) 2024-06-25
CN108628913A (en) 2018-10-09

Similar Documents

Publication Publication Date Title
US10380170B2 (en) Integrated image searching system and service method thereof
US9560323B2 (en) Method and system for metadata extraction from master-slave cameras tracking system
US20230351709A1 (en) Interaction analysis systems and methods
CN103577788A (en) Augmented reality realizing method and augmented reality realizing device
KR20130105542A (en) Object identification in images or image sequences
US9007396B2 (en) Methods and systems for analyzing parts of an electronic file
WO2012064494A1 (en) Aligning and annotating different photo streams
WO2018113659A1 (en) Method of displaying streaming medium data, device, process, and medium
CN111586474A (en) Live video processing method and device
CN105022773B (en) Image processing system including picture priority
CN105183739B (en) Image processing method
CN104520848A (en) Searching for events by attendants
CN110881131B (en) Classification method of live review videos and related device thereof
Kim et al. Key frame selection algorithms for automatic generation of panoramic images from crowdsourced geo-tagged videos
WO2018171234A1 (en) Video processing method and apparatus
US9538209B1 (en) Identifying items in a content stream
WO2018133321A1 (en) Method and apparatus for generating shot information
CN108833964B (en) Real-time continuous frame information implantation identification system
US9727890B2 (en) Systems and methods for registering advertisement viewing
US20130100296A1 (en) Media content distribution
JP6422259B2 (en) Information provision system
US9569440B2 (en) Method and apparatus for content manipulation
JP6740598B2 (en) Program, user terminal, recording device, and information processing system
JP5983745B2 (en) Image information providing apparatus, image information providing system, and image information providing method
CN112187851A (en) Multi-screen information pushing method and device based on 5G and edge calculation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17901791

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13/01/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17901791

Country of ref document: EP

Kind code of ref document: A1