WO2021208255A1 - Video clip marking method and device, and handheld camera - Google Patents

Video clip marking method and device, and handheld camera

Info

Publication number
WO2021208255A1
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
target image
information
mark
category
Application number
PCT/CN2020/099832
Other languages
French (fr)
Chinese (zh)
Inventor
康含玉 (Kang Hanyu)
梁峰 (Liang Feng)
Original Assignee
上海摩象网络科技有限公司 (Shanghai Moxiang Network Technology Co., Ltd.)
Application filed by 上海摩象网络科技有限公司
Publication of WO2021208255A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7328 Query by example, e.g. a complete video frame or video sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/695 Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects

Definitions

  • The embodiments of the present application relate to the field of image processing technology, and in particular to a video clip marking method and device, and a handheld camera.
  • Description information describing a video clip can be generated so that the video clip, or some of the image frames in it, can later be located according to that description information.
  • This supports frame search, clustering, and other processing.
  • One of the technical problems solved by the embodiments of the present invention is to provide a video segment marking method and device, and a handheld camera, that overcome a defect of the prior art: description information generated by multiple image recognition algorithms is recorded in inconsistent ways, which hinders subsequent data processing and storage.
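  • An embodiment of the present application provides a video segment marking method, including:
  • recognizing consecutive image frames in a video segment to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and obtaining, according to the attribute information corresponding to the target image frame, the mark description information of the video segment, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.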
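  • The attribute information includes identification mark information, which identifies the identification mark of at least one object category corresponding to the target image frame. Correspondingly, obtaining the tag description information of the video clip according to the attribute information corresponding to the target image frame includes:
  • determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and obtaining, according to that number, the mark description information of the video clip.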
  • The identification mark information includes at least one of the following:
  • object category information, identifying the identification mark of the object category corresponding to the target image frame; scene category information, identifying the identification mark of the scene object category corresponding to the target image frame; and face category information, identifying the recognition mark of the face object category corresponding to the target image frame.
  • The face category information includes at least one of the following sub-information: expression sub-attribute information, identifying the recognition mark of the expression category corresponding to the target image frame; orientation sub-attribute information, identifying the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information, identifying the identification mark of the gender corresponding to the target image frame.
  • The method further includes: obtaining mark recording information of the video clip according to the identification mark information corresponding to the target image frame, where the mark recording information records the identification mark corresponding to at least one second target category in the target image frame.
  • The attribute information includes time information identifying the timestamp corresponding to the target image frame.
  • The method further includes: obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame.
  • The first time description information records the timestamp of each target image frame that includes at least one target mark, and the second time description information records the start timestamp and the end timestamp of the video segment.
  • N is equal to 8.
  • An embodiment of the present application also provides a video clip marking device, including a memory, a processor, and a video collector, where the video collector is used to collect a target to be tracked in a target area, the memory is used to store program code, and the processor calls the program code and, when the program code is executed, performs the following operations:
  • recognizing consecutive image frames in a video segment to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and obtaining, according to the attribute information corresponding to the target image frame, the mark description information of the video segment, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
  • An embodiment of the present application also provides a handheld camera, including the foregoing video clip marking device and further including a carrier, which is fixedly connected to the video collector and carries at least part of the video collector.
  • The carrier includes, but is not limited to, a handheld gimbal (pan/tilt).
  • The handheld gimbal is a handheld three-axis gimbal.
  • The video collector includes, but is not limited to, a handheld three-axis gimbal camera.
  • In the embodiments, consecutive image frames in a video segment are recognized to obtain attribute information corresponding to at least one target image frame among them; the mark description information of the video segment is then obtained according to that attribute information, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4. The embodiments of the present invention can therefore record, in a unified way, the results that different image recognition algorithms produce on the consecutive image frames of a video segment, while greatly reducing storage space.
  • FIG. 1 is a schematic flowchart of a video clip marking method provided in Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a video clip marking method provided in Embodiment 2 of the present application;
  • FIG. 3 is a schematic flowchart of a video clip marking method provided in Embodiment 3 of the present application;
  • FIG. 4 is a schematic structural diagram of a video segment marking device provided in Embodiment 4 of the present application;
  • FIG. 5 is a schematic structural diagram of a handheld gimbal provided in Embodiment 5 of the present application;
  • FIG. 6 is a schematic structural diagram of a handheld gimbal connected to a mobile phone according to Embodiment 5 of the present application;
  • FIG. 7 is a schematic structural diagram of a handheld gimbal provided in Embodiment 5 of the present application.
  • Embodiment 1 of the present application provides a video segment marking method, as shown in FIG. 1.
  • FIG. 1 is a schematic flowchart of the video segment marking method provided by this embodiment. The method includes:
  • Step S101: Recognize consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames.
  • The video clip includes multiple consecutive image frames; the number of consecutive image frames in the clip is not limited.
  • For example, a long video can be divided into multiple short video segments, and the number of consecutive image frames in each segment can be fixed or variable.
  • One or more image recognition algorithms may be used to recognize the consecutive image frames in the video segment.
  • The type of image recognition algorithm is not limited; in practice, it can be chosen according to the video processing requirements or the hardware configuration that performs the processing.
  • The target image frames are some or all of the consecutive image frames in the video clip.
  • After at least one image recognition algorithm recognizes the consecutive image frames, attribute information identifying the recognition result of each target image frame can be generated.
  • The types of information included in the attribute information, and the way they are identified, are not limited; they mainly depend on the image recognition algorithm that recognizes the target image frame.
  • For example, an image recognition algorithm for object categories can produce attribute information identifying whether the target image frame includes objects such as people, cats, or dogs.
  • Likewise, an image recognition algorithm for scene categories can produce attribute information identifying whether the target image frame includes scene objects such as sky, sea, or grass.
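The recognition stage of Step S101 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the applicant's implementation: a clip is a plain sequence of frames, each recognizer is a hypothetical callable that returns the identification marks it found in one frame, and a frame is treated as a target image frame when at least one recognizer reports a mark. `split_into_clips` and `recognize_clip` are illustrative names.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical recognizer type: takes one decoded frame and returns the
# identification marks it detected (e.g. ["dog"] or ["sky", "grass"]).
Recognizer = Callable[[Any], List[str]]


def split_into_clips(frames: Sequence[Any], clip_len: int) -> List[Sequence[Any]]:
    """Divide a long video into clips of a fixed number of frames.

    The last clip may be shorter; the source also allows non-fixed lengths.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


def recognize_clip(
    clip: Sequence[Any], recognizers: Dict[str, Recognizer]
) -> Dict[int, Dict[str, List[str]]]:
    """Run every recognizer on every frame and keep the target image frames.

    Returns {frame index within clip: {recognizer name: marks}} for frames in
    which at least one recognizer reported a mark.
    """
    attributes: Dict[int, Dict[str, List[str]]] = {}
    for index, frame in enumerate(clip):
        info = {name: recognize(frame) for name, recognize in recognizers.items()}
        if any(info.values()):
            attributes[index] = info
    return attributes


if __name__ == "__main__":
    # Toy "frames" are integers; the stand-in recognizers are purely illustrative.
    recognizers = {
        "object": lambda f: ["dog"] if f % 2 == 0 else [],
        "scene": lambda f: ["sky"] if f > 2 else [],
    }
    for clip in split_into_clips(list(range(6)), clip_len=3):
        print(recognize_clip(clip, recognizers))
```

Keeping each recognizer behind the same callable signature is one way to let several algorithms contribute attribute information side by side, which is the precondition for the unified recording described in Step S102.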
  • Step S102: Obtain tag description information of the video clip according to the attribute information corresponding to the target image frame.
  • The mark description information records the image recognition results of the target image frames, so that subsequent video processing operations such as similarity comparison and clustering between video clips can be performed based on it.
  • In other words, the tag description information describes the image recognition results of the target image frames.
  • For example, the tag description information can describe how many cats appear in the video clip in total, or the order of magnitude of the number of cats that appear in it.
  • The mark description information includes information recorded in bits; the bit length is T*N, where T is the number of object categories in the target image frame and N is an integer greater than or equal to 4.
  • The value of T can be determined according to subsequent video processing requirements and/or the image recognition results of the consecutive image frames in the video clip; the value of N can be determined according to subsequent video processing requirements and/or the hardware storage space available for data processing.
  • For example, if the target image frames contain three object categories (human, cat, and dog, so T = 3), the bit length is 3N.
  • With N = 4, each category occupies 4 bits, so the three categories of human, cat, and dog require a total of 12 bits.
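The T*N layout can be illustrated by packing T fields of N bits each into one integer. This sketch rests on assumptions the source does not state: the first category occupies the most significant bits, the packed value is serialized big-endian and rounded up to whole bytes, and the sample field values for human, cat, and dog are hypothetical.

```python
from typing import List, Sequence


def pack_fields(fields: Sequence[int], n: int) -> int:
    """Concatenate T fields of n bits into one T*n-bit integer.

    The first field ends up in the most significant n bits.
    """
    if n < 4:
        raise ValueError("N must be an integer >= 4")
    packed = 0
    for value in fields:
        if not 0 <= value < (1 << n):
            raise ValueError(f"{value} does not fit in {n} bits")
        packed = (packed << n) | value
    return packed


def unpack_fields(packed: int, t: int, n: int) -> List[int]:
    """Inverse of pack_fields: split a T*n-bit integer back into T fields."""
    mask = (1 << n) - 1
    return [(packed >> (n * (t - 1 - i))) & mask for i in range(t)]


def to_bytes(packed: int, t: int, n: int) -> bytes:
    """Serialize the T*n bits, rounded up to whole bytes, big-endian."""
    return packed.to_bytes((t * n + 7) // 8, "big")


if __name__ == "__main__":
    T, N = 3, 4  # categories: human, cat, dog (field values are hypothetical)
    fields = [0b0010, 0b0100, 0b0001]
    packed = pack_fields(fields, N)
    print(format(packed, f"0{T * N}b"))  # prints 001001000001 (12 bits)
    print(to_bytes(packed, T, N))
    assert unpack_fields(packed, T, N) == fields
```

With N = 8, as in one embodiment, every field fills exactly one byte, so no rounding up is needed when serializing.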
  • A bit is the smallest unit of storage in a computer, and its value is 0 or 1.
  • This embodiment does not limit the rules for bit-based recording; in practice, they can be set according to subsequent video processing requirements and/or the content of the video clip.
  • For example, if the tag description information records the number of faces across all target image frames of the video clip and N is set to 4, 0001 can record 0 faces in total, 0010 one face, 0100 two faces, and 1000 three or more faces.
  • If N is set to 5, 00000 can record 0 faces in total, 00001 one face, 00010 two faces, 00011 three faces, 00100 four faces, and so on.
  • Recording information in bits has two advantages: the results of different image recognition algorithms are recorded in a unified way, which simplifies subsequent video processing, and storage space is greatly reduced.
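The two face-count encodings described above can be written as small functions. This is a hedged sketch: the function names are hypothetical, the N = 4 scheme is read as "one bit per bucket, with the top bit meaning three or more", and saturating the 5-bit binary code at 31 is an assumption the source does not state.

```python
def encode_count_one_hot(count: int, n: int = 4) -> int:
    """N = 4 scheme: 0001 = 0, 0010 = 1, 0100 = 2, 1000 = 3 or more.

    Generalized to n bits: bit k is set for k items, and the top bit covers
    n - 1 or more items.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return 1 << min(count, n - 1)


def encode_count_binary(count: int, n: int = 5) -> int:
    """N = 5 scheme: plain binary count (00000 = 0, 00001 = 1, 00010 = 2, ...).

    Counts that do not fit in n bits saturate at 2**n - 1 (an assumption).
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return min(count, (1 << n) - 1)


if __name__ == "__main__":
    for faces in range(5):
        print(
            faces,
            format(encode_count_one_hot(faces), "04b"),
            format(encode_count_binary(faces), "05b"),
        )
```

The bucketed code loses exact counts above two but always has exactly one bit set, whereas the binary code distinguishes up to 31 faces in five bits.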
  • N is equal to 8.
  • In summary, the embodiments of the present invention first recognize the consecutive image frames in the video clip to obtain attribute information corresponding to at least one target image frame among them, and then obtain the mark description information of the video segment according to that attribute information, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4. The embodiments can therefore record the results of multiple image recognition algorithms on the consecutive image frames of the video segment in a unified way, while greatly reducing data storage space.
  • FIG. 2 is a schematic flowchart of the video segment marking method provided in Embodiment 2 of the present application. The method includes:
  • Step S201: Recognize consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among them, where the attribute information includes identification mark information.
  • Several different image recognition algorithms can be used to recognize the consecutive image frames in the video clip and to classify the target image frame, or the objects it contains, from multiple perspectives, yielding identification mark information corresponding to at least one target image frame.
  • The identification mark information identifies the identification mark of at least one object category corresponding to the target image frame.
  • One target image frame, or one object in it, may correspond to the identification marks of one or more object categories.
  • The identification mark information can therefore contain several different identification marks.
  • For example, if a target image frame contains three dogs, the identification mark information can include the following:
  • "DOG" identifies the three dogs in the "object category", and "01", "02", and "03" identify the three individual dogs in the "dog category".
  • The identification mark information may include at least one of the following: object category information, identifying the identification mark of the object category corresponding to the target image frame; scene category information, identifying the identification mark of the scene object category corresponding to the target image frame; and face category information, identifying the recognition mark of the face object category corresponding to the target image frame.
  • The object category classifies the objects included in the target image frame; the classification criterion and the corresponding identification marks can be determined according to the video processing requirements or the image recognition algorithm used.
  • For example, the identification marks of the object category can distinguish animal categories such as "people", "cats", and "dogs", or broader categories such as "animals", "plants", and "daily necessities".
  • The scene object category classifies the scene objects included in the target image frame.
  • The classification criterion and the corresponding identification marks can likewise be determined according to the video processing requirements or the image recognition algorithm used.
  • For example, the identification marks of the scene object category can distinguish weather categories such as "rainy", "sunny", and "cloudy", or backgrounds such as "grassland", "sky", and "sea".
  • The face object category classifies the face objects included in the target image frame.
  • The classification criterion and the corresponding identification marks can be determined according to the video processing requirements or the image recognition algorithm used.
  • For example, the recognition marks of the face object category can distinguish age groups such as "elderly", "middle-aged", and "child", or face shapes such as "round face", "square face", and "oval (melon-seed) face".
  • Furthermore, the face objects in the target image frame can be recognized and identified in finer categories.
  • The face category information includes at least one of the following sub-information: expression sub-attribute information, identifying the recognition mark of the expression category corresponding to the target image frame; orientation sub-attribute information, identifying the recognition mark of the orientation category corresponding to the target image frame; and gender sub-attribute information, identifying the identification mark of the gender corresponding to the target image frame.
  • The expression category classifies the faces included in the target image frame by expression.
  • For example, the recognition marks of the expression category can identify expressions such as "laughing", "crying", and "in a daze".
  • The orientation category classifies the faces included in the target image frame by face orientation.
  • For example, the identification marks of the orientation category can identify orientations such as "front", "back", and "side".
  • The gender category classifies the faces included in the target image frame by gender.
  • For example, the identification marks of the gender category can identify "male", "female", and "uncertain".
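The identification mark information described in Step S201 can be modeled as a small typed record. The class and field names below are hypothetical, and the enum members simply mirror the example marks listed above; the source does not prescribe any in-memory representation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Expression(Enum):
    LAUGHING = "laughing"
    CRYING = "crying"
    IN_A_DAZE = "in a daze"


class Orientation(Enum):
    FRONT = "front"
    BACK = "back"
    SIDE = "side"


class Gender(Enum):
    MALE = "male"
    FEMALE = "female"
    UNCERTAIN = "uncertain"


@dataclass
class FaceInfo:
    """Face category information with its three optional sub-attributes."""
    expression: Optional[Expression] = None
    orientation: Optional[Orientation] = None
    gender: Optional[Gender] = None


@dataclass
class IdentificationMarkInfo:
    """Identification mark information for one target image frame."""
    # Object category -> individual marks, e.g. {"dog": ["01", "02"]}.
    objects: Dict[str, List[str]] = field(default_factory=dict)
    # Scene object marks, e.g. ["sunny", "grassland"].
    scenes: List[str] = field(default_factory=list)
    # One entry per detected face.
    faces: List[FaceInfo] = field(default_factory=list)


if __name__ == "__main__":
    frame_a = IdentificationMarkInfo(
        objects={"dog": ["01", "02"]},
        scenes=["sunny", "grassland"],
        faces=[FaceInfo(Expression.LAUGHING, Orientation.FRONT, Gender.FEMALE)],
    )
    print(frame_a)
```

Making every face sub-attribute optional reflects the "at least one of" wording: a recognizer that only estimates orientation can still fill in a `FaceInfo`.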
  • Step S202: Determine, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame.
  • To reduce data processing and storage, the tag description information obtained later may describe and record only the more important object categories.
  • Accordingly, at least one of all the object categories may be designated as a first target category, and the number of all identification marks corresponding to the first target category in the target image frames is determined according to their identification mark information.
  • For example, suppose target image frames A, B, and C in the video clip all include the "dog" identification mark.
  • The identification marks "01" and "02" mark the two dogs appearing in target image frame A.
  • Across the whole video clip, three dogs appear and are marked "01", "02", and "03"; that is, the number of identification marks corresponding to "dog" in the video clip is 3.
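Counting the identification marks of a first target category across a clip amounts to counting distinct marks over all target image frames. In this minimal sketch each frame is a dict from category to individual mark IDs; the contents of frames B and C are hypothetical, since the source only states that frame A contains dogs "01" and "02" and that the clip contains three dogs in total. `count_marks` is an illustrative name.

```python
from typing import Dict, Iterable, List


def count_marks(frames: Iterable[Dict[str, List[str]]], category: str) -> int:
    """Number of distinct identification marks of `category` in the clip.

    The same dog ("01") seen in several frames is counted once.
    """
    seen = set()
    for marks in frames:
        seen.update(marks.get(category, []))
    return len(seen)


if __name__ == "__main__":
    clip = [
        {"dog": ["01", "02"]},           # target image frame A (from the example)
        {"dog": ["02", "03"]},           # frame B (hypothetical contents)
        {"dog": ["03"], "cat": ["01"]},  # frame C (hypothetical contents)
    ]
    print(count_marks(clip, "dog"))  # prints 3
```

The resulting count is the value that Step S203 would then record in that category's N-bit field.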
  • Step S203: Obtain the mark description information of the video clip according to the number of identification marks corresponding to at least one first target category in the target image frame.
  • The mark description information of the video segment is recorded in the same way as in step S102 of Embodiment 1, so the details are not repeated here.
  • This embodiment may further include: obtaining mark recording information of the video segment according to the identification mark information corresponding to the target image frame, where the mark recording information records the identification mark corresponding to at least one second target category in the target image frame.
  • The second target category may be the same as or different from the first target category; in addition, the recorded identification marks may be all or only some of the identification marks corresponding to the second target category.
  • Which identification marks to record can be chosen according to subsequent video processing requirements in practice.
  • For example, if the target image frame includes the three expression-category identification marks "laughing", "crying", and "in a daze", the mark recording information may record only "laughing" and "crying", or it may record all three.
  • The same recording method can be used for all identification marks corresponding to the second target category.
  • For example, an int-type ID may be used to record each identification mark corresponding to the second target category, with each ID corresponding to one identification mark.
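The int-ID scheme for mark recording information can be sketched with a registry that assigns each identification mark a stable integer. `MarkRegistry` and `mark_record` are hypothetical names; assigning IDs in first-seen order is an assumption, and the optional `keep` filter models recording only some of the second-target-category marks.

```python
from typing import Dict, Iterable, List, Optional, Set


class MarkRegistry:
    """Maps each identification mark to one int ID, assigned in first-seen order."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def id_for(self, mark: str) -> int:
        if mark not in self._ids:
            self._ids[mark] = len(self._ids)
        return self._ids[mark]


def mark_record(
    frames: Iterable[Dict[str, List[str]]],
    category: str,
    registry: MarkRegistry,
    keep: Optional[Set[str]] = None,
) -> List[int]:
    """Collect the int IDs of the marks of `category` found in the frames.

    If `keep` is given, only those marks are recorded; each ID appears once.
    """
    ids: List[int] = []
    for marks in frames:
        for mark in marks.get(category, []):
            if keep is not None and mark not in keep:
                continue  # this mark is not selected for recording
            mark_id = registry.id_for(mark)
            if mark_id not in ids:
                ids.append(mark_id)
    return ids


if __name__ == "__main__":
    frames = [
        {"expression": ["laughing", "crying"]},
        {"expression": ["in a daze", "laughing"]},
    ]
    registry = MarkRegistry()
    print(mark_record(frames, "expression", registry, keep={"laughing", "crying"}))  # prints [0, 1]
    print(mark_record(frames, "expression", registry))  # prints [0, 1, 2]
```

Sharing one registry across clips keeps the IDs comparable, which is what makes a unified recording convenient for later management.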
  • By recording the number of identification marks corresponding to at least one first target category, the embodiment of the present invention obtains description information for the video clip; by recognizing the consecutive image frames only for object categories commonly used in subsequent video processing, it reduces data processing and storage; and by recording the mark recording information in a unified way, it makes the data easier to manage and use later.
  • FIG. 3 is a schematic flowchart of the video segment marking method provided in Embodiment 3 of the present application. The method includes:
  • Step S301: Recognize consecutive image frames in the video segment to obtain attribute information corresponding to at least one target image frame among them, where the attribute information includes identification mark information and time information.
  • Each of the consecutive image frames in the video clip has a corresponding timestamp. To describe time-related information about the video clip, time information identifying the timestamp of each target image frame can be obtained when the consecutive image frames are recognized.
  • Step S302: Obtain the tag description information of the video segment according to the identification mark information corresponding to the target image frame, and obtain the first time description information and/or the second time description information according to the time information corresponding to the target image frame.
  • The first time description information records the timestamp of each target image frame that includes at least one target mark, so that the time at which the object identified by the target mark (or the target image frame) appears in the video clip can be determined from it. This makes it easier to perform subsequent video processing operations, such as clustering and screening, on target image frames or video segments that include the target mark.
  • For example, a user may care mainly about when a cat appears.
  • The cat can be identified in the target image frames with a preset target mark; from the timestamps of the target image frames that include this mark, the total time the cat appears in the video segment can be determined, and first time description information describing the cat's appearance time in the video segment can finally be generated.
  • The first time description information can be recorded as an array, where each number stored in the array identifies the timestamp of a target image frame that includes at least one target mark.
  • The target mark usually marks objects needed for subsequent video processing or objects the user cares more about.
  • The target mark is one or more of the identification marks corresponding to at least one object category and can be preset according to video description requirements.
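The first time description information, an array of timestamps of target image frames that carry the target mark, can be sketched as below. Assumptions not taken from the source: timestamps are integer milliseconds, frames arrive as (timestamp, marks) pairs, and the total appearance time is estimated as the number of matching frames times a constant frame interval, which only holds for constant-frame-rate video. Both function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def first_time_description(
    frames: Sequence[Tuple[int, Set[str]]], target_mark: str
) -> List[int]:
    """Timestamps (ms) of the frames whose marks include `target_mark`."""
    return [timestamp for timestamp, marks in frames if target_mark in marks]


def total_appearance_ms(timestamps: Sequence[int], frame_interval_ms: int) -> int:
    """Estimate total on-screen time, assuming a constant frame interval."""
    return len(timestamps) * frame_interval_ms


if __name__ == "__main__":
    frames = [(0, {"cat"}), (40, set()), (80, {"cat", "dog"}), (120, {"cat"})]
    stamps = first_time_description(frames, "cat")
    print(stamps)                           # prints [0, 80, 120]
    print(total_appearance_ms(stamps, 40))  # prints 120
```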
  • The second time description information records the start timestamp and the end timestamp of the video segment, so that the start and end times of the video segment can later be determined from it.
  • The start timestamp of the video segment is the timestamp of the first of its consecutive image frames, and the end timestamp is the timestamp of the last of its consecutive image frames.
  • The second time description information may be recorded as a series of numbers identifying the start timestamp and the end timestamp.
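The second time description information reduces to the first and last timestamps of the clip's consecutive image frames. A minimal sketch, assuming the timestamps are given in frame order; the function name and the (start, end) tuple format are hypothetical.

```python
from typing import Sequence, Tuple


def second_time_description(frame_timestamps: Sequence[int]) -> Tuple[int, int]:
    """Return (start, end): the timestamps of the first and last frames."""
    if not frame_timestamps:
        raise ValueError("a video clip must contain at least one frame")
    return frame_timestamps[0], frame_timestamps[-1]


if __name__ == "__main__":
    start, end = second_time_description([1000, 1040, 1080, 1120])
    print(start, end, end - start)  # prints 1000 1120 120
```

Storing only these two numbers is enough to recover the clip's span in the source video without keeping every frame timestamp.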
  • By obtaining the first time description information and/or the second time description information according to the time information corresponding to the target image frame, the embodiment of the present invention can describe and record time-related information about the video clip.
  • The information describing the video segment may therefore include several kinds of tag description information, first time description information, and/or second time description information, which better meets subsequent video processing requirements.
  • FIG. 4 shows a video processing device 40 provided in Embodiment 4 of the present application, including a memory 401, a processor 402, and a video collector 403, where the video collector 403 is used to collect a target to be tracked in a target area; the memory 401 is used to store program code; and the processor 402 calls the program code and, when the program code is executed, performs the following operations:
  • recognizing consecutive image frames in a video segment to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and obtaining, according to the attribute information corresponding to the target image frame, the mark description information of the video segment, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
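  • The attribute information includes identification mark information, which identifies the identification mark of at least one object category corresponding to the target image frame. Correspondingly, obtaining the tag description information of the video clip according to the attribute information corresponding to the target image frame includes:
  • determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and obtaining, according to that number, the mark description information of the video clip.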
  • The identification mark information includes at least one of the following:
  • object category information, identifying the identification mark of the object category corresponding to the target image frame; scene category information, identifying the identification mark of the scene object category corresponding to the target image frame; and face category information, identifying the recognition mark of the face object category corresponding to the target image frame.
  • The face category information includes at least one of the following sub-information:
  • expression sub-attribute information, identifying the recognition mark of the expression category corresponding to the target image frame; orientation sub-attribute information, identifying the recognition mark of the orientation category corresponding to the target image frame; and gender sub-attribute information, identifying the identification mark of the gender corresponding to the target image frame.
  • The processor calls the program code and, when the program code is executed, further performs the following operation: obtaining mark recording information of the video clip according to the identification mark information corresponding to the target image frame, where the mark recording information records the identification mark corresponding to at least one second target category in the target image frame.
  • The attribute information includes time information identifying the timestamp corresponding to the target image frame; the processor calls the program code and, when the program code is executed, further performs the following operation: obtaining the first time description information and/or the second time description information of the video clip according to the time information corresponding to the target image frame, where the first time description information records the timestamp of each target image frame that includes at least one target mark, and the second time description information records the start timestamp and the end timestamp of the video segment.
  • N is equal to 8.
  • A handheld camera including the video processing device described in Embodiment 4 further includes a carrier, which is fixedly connected to the video collector and carries at least part of the video collector.
  • The carrier includes, but is not limited to, a handheld gimbal.
  • The handheld gimbal is a handheld three-axis gimbal.
  • The video collector includes, but is not limited to, a handheld three-axis gimbal camera.
  • the handheld pan/tilt head 1 of the embodiment of the present invention includes a handle 11 and a photographing device 12 loaded on the handle 11.
  • the photographing device 12 may include a three-axis pan/tilt camera; in other embodiments, it includes a pan/tilt camera with two axes or with more than three axes.
  • the handle 11 is provided with a display screen 13 for displaying the shooting content of the shooting device 12.
  • the invention does not limit the type of the display screen 13.
  • by arranging the display screen 13 on the handle 11 of the handheld pan/tilt 1, the display screen can show the content captured by the photographing device 12. The user can then quickly browse the pictures or videos shot by the photographing device 12 on the display screen 13. This makes the handheld pan/tilt 1 more interactive and enjoyable for the user and meets the user's diverse needs.
  • the handle 11 is further provided with an operating function unit for controlling the photographing device 12. By operating this unit, the user can control the photographing device 12. For example, the user can switch the photographing device 12 on and off, control its shooting, and control posture changes of its pan/tilt part, which allows quick operation of the photographing device 12.
  • the operation function part may be in the form of a button, a knob or a touch screen.
  • the operating function unit includes a shooting button 14 for controlling shooting by the photographing device 12, a power/function button 15 for switching the photographing device 12 on and off and controlling other functions, and a universal key 16 for controlling the pan/tilt.
  • the operating function unit may also include other control buttons, such as an image storage button and an image playback control button, which can be set according to actual needs.
  • the operation function part and the display screen 13 are arranged on the same side of the handle 11.
  • the operation function part and the display screen 13, arranged as shown in the figure, conform to ergonomics and also make the overall appearance and layout of the handheld pan/tilt 1 more reasonable and attractive.
  • a function operation key A is provided on the side of the handle 11, allowing the user to quickly produce an intelligently edited video with a single key press.
  • the handle 11 is further provided with a card slot 17 for inserting a storage element.
  • the card slot 17 is provided on the side of the handle 11 adjacent to the display screen 13, and a memory card is inserted into the card slot 17 to store the images taken by the camera 12.
  • arranging the card slot 17 on the side does not affect the use of other functions, and the user experience is better.
  • a power supply battery for supplying power to the handle 11 and the imaging device 12 may be provided inside the handle 11.
  • the power supply battery can be a lithium battery with large capacity and small size to realize the miniaturized design of the handheld pan/tilt 1.
  • the handle 11 is also provided with a charging interface/USB interface 18.
  • the charging interface/USB interface 18 is provided at the bottom of the handle 11 to facilitate connection with an external power source or storage device, so as to charge the power supply battery or perform data transmission.
  • the handle 11 is further provided with a sound pickup hole 19 for receiving audio signals, and the sound pickup hole 19 communicates with a microphone inside.
  • there may be one or more sound pickup holes 19. The handle 11 also includes an indicator light 20 for displaying status. Through the sound pickup hole 19, the user can interact with the display screen 13 by voice.
  • the indicator light 20 serves as a reminder: through it, the user can learn the battery status of the handheld pan/tilt 1 and the function currently being executed.
  • the sound pickup hole 19 and the indicator light 20 can also be arranged on the front of the handle 11, which is more in line with the user's usage habits and operation convenience.
  • the imaging device 12 includes a pan-tilt support and a camera mounted on the pan-tilt support.
  • the imager may be a camera, or an image pickup element composed of a lens and an image sensor (such as CMOS or CCD), etc., which can be specifically selected according to needs.
  • the camera may be integrated on the pan-tilt support, so that the photographing device 12 is a pan-tilt camera; it may also be an external photographing device, which can be detachably connected or clamped to be mounted on the pan-tilt support.
  • the pan/tilt support is a three-axis pan/tilt support
  • the photographing device 12 is a three-axis pan/tilt camera.
  • the three-axis pan/tilt head bracket includes a yaw axis assembly 22, a roll axis assembly 23 movably connected to the yaw axis assembly 22, and a pitch axis assembly 24 movably connected to the roll axis assembly 23.
  • the camera is mounted on the pitch axis assembly 24.
  • the yaw axis assembly 22 drives the camera 12 to rotate in the yaw direction.
  • the pan/tilt support can also be a two-axis pan/tilt, a four-axis pan/tilt, etc., which can be specifically selected according to needs.
  • a mounting portion is further provided at one end of the connecting arm connected to the roll axis assembly. The yaw axis assembly may be arranged in the handle, and it drives the camera 12 to rotate together in the yaw direction.
  • the handle 11 is provided with an adapter 26 for coupling with a mobile device 2 (such as a mobile phone), and the adapter 26 is detachably connected to the handle 11.
  • the adapter 26 protrudes from the side of the handle for connecting to the mobile device 2.
  • when the adapter 26 is connected to the mobile device 2, the handheld pan/tilt 1 docks with the adapter 26 and supports one end of the mobile device 2.
  • the handle 11 is provided with an adapter 26 for connecting with the mobile device 2 to connect the handle 11 and the mobile device 2 to each other.
  • the handle 11 can be used as a base of the mobile device 2.
  • the user can hold the other end of the mobile device 2 to pick up and operate it together with the handheld pan/tilt 1. The connection is convenient and fast, and the product looks attractive.
  • a communication connection between the handheld pan-tilt 1 and the mobile device 2 can be realized, and the camera 12 and the mobile device 2 can transmit data.
  • the adapter 26 and the handle 11 are detachably connected, that is, the adapter 26 and the handle 11 can be mechanically connected or removed. Further, the adapter 26 is provided with an electrical contact portion, and the handle 11 is provided with an electrical contact matching portion that matches with the electrical contact portion.
  • the adapter 26 can be removed from the handle 11.
  • when the adapter 26 is installed on the handle 11, the mechanical connection between the adapter 26 and the handle 11 is completed. At the same time, the connection between the electrical contact part and the electrical contact mating part ensures the electrical connection between the two, so that data can be transmitted between the camera 12 and the mobile device 2 through the adapter 26.
  • a receiving groove 27 is provided on the side of the handle 11, and the adapter 26 is slidably clamped in the receiving groove 27. After the adapter 26 is installed in the receiving slot 27, the adapter 26 partially protrudes from the receiving slot 27, and the portion of the adapter 26 protruding from the receiving slot 27 is used to connect with the mobile device 2.
  • when the adapter 26 is inserted into the receiving groove 27 so that its adapter part is flush with the receiving groove 27, the adapter 26 is stored in the receiving groove 27 of the handle 11.
  • the adapter 26 can be inserted into the receiving groove 27 so that the adapter part protrudes from the receiving groove 27, allowing the mobile device 2 and the handle 11 to be connected to each other.
  • the adapter 26 can be taken out of the receiving slot 27 of the handle 11 and then inserted into the receiving slot 27 in the reverse direction, so that the adapter 26 is stored in the handle 11.
  • the adapter 26 is flush with the receiving groove 27 of the handle 11. As a result, once the adapter 26 is stored in the handle 11, the surface of the handle 11 stays flat, and storing the adapter 26 in the handle 11 makes the device easier to carry.
  • the receiving groove 27 is semi-opened on one side surface of the handle 11, which makes it easier for the adapter 26 to be slidably connected to the receiving groove 27.
  • the adapter 26 can also be detachably connected to the receiving slot 27 of the handle 11 by means of a snap connection, a plug connection, or the like.
  • the receiving groove 27 is provided on the side of the handle 11.
  • the receiving groove 27 is covered by a snap-fit cover 28, which is convenient for the user to operate and does not affect the overall appearance of the front and sides of the handle.
  • the electrical contact part and the electrical contact mating part may be electrically connected by physical contact.
  • the electrical contact portion may be a telescopic probe, an electrical plug-in interface, or an electrical contact.
  • the electrical contact portion and the electrical contact mating portion can also be directly connected to each other in a surface-to-surface contact manner.
  • a method for marking video clips characterized in that it comprises:
  • the mark description information of the video segment is obtained, wherein the mark description information includes information recorded on a bit basis, the bit length is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
  • the video segment marking method, wherein the attribute information includes identification mark information for identifying the identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:
  • the mark description information of the video clip is obtained according to the number of identification marks corresponding to at least one first target category in the target image frame.
  • the identification mark information includes at least one of the following:
  • Object category information used to identify the identification mark of the object category corresponding to the target image frame
  • Scene category information used to identify the identification mark of the scene object category corresponding to the target image frame
  • Face category information used to identify the recognition mark of the face object category corresponding to the target image frame.
  • A4 The video clip marking method according to A3, wherein the face category information includes at least one of the following sub-information:
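  • expression sub-attribute information used to identify the identification mark of the expression category corresponding to the target image frame; orientation sub-attribute information used to identify the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information used to identify the identification mark of the gender corresponding to the target image frame.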
  • A5. The video segment marking method according to A2, wherein the method further includes:
  • the mark record information of the video clip is obtained according to the identification mark information corresponding to the target image frame, wherein the mark record information is used to record the identification marks corresponding to at least one second target category in the target image frame.
  • A6 The video segment marking method according to A2, wherein the attribute information includes time information used to identify a timestamp corresponding to the target image frame, and the method further includes:
  • the first time description information and/or the second time description information of the video clip is obtained according to the time information corresponding to the target image frame, wherein the first time description information is used to record the timestamps corresponding to the target image frames that include at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.
  • a video segment marking device, characterized by comprising a memory, a processor, and a video collector, wherein the video collector is used to capture a target to be tracked in a target area; the memory is used to store program code; and the processor calls the program code and, when the program code is executed, is used to perform the following operations:
  • the mark description information of the video segment is obtained, wherein the mark description information includes information recorded on a bit basis, the bit length is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
  • the video clip marking device, wherein the attribute information includes identification mark information for identifying the identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:
  • the mark description information of the video clip is obtained according to the number of identification marks corresponding to at least one first target category in the target image frame.
  • the identification mark information includes at least one of the following:
  • Object category information used to identify the identification mark of the object category corresponding to the target image frame
  • Scene category information used to identify the identification mark of the scene object category corresponding to the target image frame
  • Face category information used to identify the recognition mark of the face object category corresponding to the target image frame.
  • A11 The video clip marking device according to A10, wherein the face category information includes at least one of the following sub-information:
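  • expression sub-attribute information used to identify the identification mark of the expression category corresponding to the target image frame; orientation sub-attribute information used to identify the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information used to identify the identification mark of the gender corresponding to the target image frame.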
  • A12 The video clip marking device according to A9, wherein the processor calls the program code, and when the program code is executed, it is further configured to perform the following operations:
  • the mark record information of the video clip is obtained according to the identification mark information corresponding to the target image frame, wherein the mark record information is used to record the identification marks corresponding to at least one second target category in the target image frame.
  • the video clip marking device according to A9, wherein the attribute information includes time information used to identify the timestamp corresponding to the target image frame; the processor calls the program code, when the program code When executed, it is also used to perform the following operations:
  • the first time description information and/or the second time description information of the video clip is obtained according to the time information corresponding to the target image frame, wherein the first time description information is used to record the timestamps corresponding to the target image frames that include at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.
  • A14 The video segment marking device according to A8, wherein the N is equal to 8.
  • a handheld camera, characterized by comprising the video clip marking device according to any one of A8-A14 and further comprising a carrier, which is fixedly connected to the video collector and is used to carry at least part of the video collector.
  • A16 The handheld camera according to A15, wherein the carrier includes but is not limited to a handheld pan/tilt.
  • A17 The handheld camera according to A16, wherein the handheld pan/tilt is a handheld three-axis pan/tilt.
  • a programmable logic device (PLD), for example, a field programmable gate array (FPGA)
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • HDL Hardware Description Language
  • ABEL Advanced Boolean Expression Language
  • AHDL Altera Hardware Description Language
  • HDCal
  • JHDL
  • Lava
  • Lola
  • MyHDL
  • PALASM
  • RHDL
  • VHDL Very-High-Speed Integrated Circuit Hardware Description Language
  • Verilog
  • the controller can be implemented in any suitable manner.
  • the controller can take several forms. For example, it can be a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (such as software or firmware) executable by the (micro)processor. It can also take the form of logic gates, switches, application-specific integrated circuits (ASICs), programmable logic controllers, or embedded microcontrollers. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory's control logic.
  • the controller need not be implemented purely as computer-readable program code. The method steps can also be logically programmed so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices it includes for realizing various functions can be regarded as structures within that hardware component. Alternatively, the devices for realizing various functions can be regarded both as software modules implementing the method and as structures within a hardware component.
  • a typical implementation device is a computer.
  • the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or Any combination of these devices.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction device. This device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • this application can be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.

Abstract

A video clip marking method and device, and a handheld camera. The method comprises two steps. First, consecutive images in a video clip are recognized to obtain attribute information corresponding to at least one target image among the consecutive images (S101). Second, marking description information of the video clip is obtained according to the attribute information corresponding to the target image (S102). The marking description information comprises information recorded on a bit basis, and the bit length is T*N, where T represents the number of object categories in the target image and N is an integer greater than or equal to 4. The method can record, in a unified manner, the results that different image recognition algorithms obtain on consecutive images in a video clip, and it also greatly saves storage space.

Description

Video segment marking method, device, and handheld camera
Technical field
The embodiments of the present application relate to the field of image processing technology, and in particular to a video clip marking method, a device, and a handheld camera.
Background
With the development of image processing technology, more and more image recognition algorithms have emerged. An image recognition algorithm can recognize and mark the consecutive image frames in a video clip and then generate description information for the clip. This description information can later be used to search, cluster, or otherwise process the video clip or some of its image frames.
To meet different video processing needs, multiple image recognition algorithms are usually used to recognize the consecutive image frames in a video. However, different kinds of image recognition algorithms record their description information in different ways. This makes the description information inconvenient to use in subsequent processing such as searching and clustering.
Summary of the invention
In view of this, one of the technical problems solved by the embodiments of the present invention is to provide a video segment marking method, device, and handheld camera. These overcome a defect in the prior art: description information generated by multiple image recognition algorithms is recorded in inconsistent ways, which hampers subsequent data processing and storage.
An embodiment of the present application provides a video segment marking method, including:
recognizing consecutive image frames in a video segment to obtain attribute information corresponding to at least one target image frame among the consecutive image frames;
obtaining mark description information of the video segment according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded on a bit basis, the bit length is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
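The T*N bit layout above can be illustrated with a minimal sketch. The text does not specify how the bits are arranged, so this sketch assumes that each of the T object categories owns one contiguous unsigned N-bit field, with category i occupying bits i*N to (i+1)*N - 1. The function names `pack_mark_description`, `unpack_mark_description`, and `to_bytes` are hypothetical illustrations and are not part of the described method.

```python
def pack_mark_description(values, n_bits=8):
    """Pack one unsigned N-bit field per object category into a T*N-bit integer.

    values: T non-negative ints; values[i] is the field of category i.
    (Assumed layout, not specified by the application.)
    """
    if n_bits < 4:
        raise ValueError("N must be an integer >= 4")
    limit = (1 << n_bits) - 1
    packed = 0
    for i, value in enumerate(values):
        if not 0 <= value <= limit:
            raise ValueError(f"field {i} does not fit in {n_bits} bits")
        packed |= value << (i * n_bits)  # category i occupies bits [i*N, (i+1)*N)
    return packed


def unpack_mark_description(packed, t, n_bits=8):
    """Recover the T field values from a packed T*N-bit integer."""
    mask = (1 << n_bits) - 1
    return [(packed >> (i * n_bits)) & mask for i in range(t)]


def to_bytes(packed, t, n_bits=8):
    """Serialize the T*N bits little-endian, rounded up to whole bytes."""
    return packed.to_bytes((t * n_bits + 7) // 8, "little")


packed = pack_mark_description([3, 0, 12])  # T = 3, N = 8
print(unpack_mark_description(packed, 3))   # [3, 0, 12]
print(to_bytes(packed, 3).hex())            # 03000c
```

In this example, three categories with field values 3, 0, and 12 and N = 8 occupy 24 bits, stored as the three bytes `03 00 0c`. Giving every category the same fixed field width is one way to put results from different recognition algorithms into a common, compact format.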
Optionally, the attribute information includes identification mark information for identifying the identification marks of at least one object category corresponding to the target image frame. Correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:
determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame;
obtaining the mark description information of the video clip according to the number of identification marks corresponding to the at least one first target category in the target image frame.
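The counting step can be read in more than one way, so the following is a sketch under stated assumptions. Per-frame identification marks are given as hypothetical `(category, mark)` pairs. A mark that appears in several target image frames is counted once per category. Each count saturates at 2^N - 1 so that it fits in that category's N-bit field. The text fixes none of these choices, and `count_marks_per_category` is an illustrative name.

```python
def count_marks_per_category(frame_marks, categories, n_bits=8):
    """Count distinct identification marks per first target category.

    frame_marks: one list of (category, mark) pairs per target image frame.
    categories: the first target categories, in field order.
    """
    limit = (1 << n_bits) - 1
    seen = {category: set() for category in categories}
    for marks in frame_marks:
        for category, mark in marks:
            if category in seen:          # ignore categories outside the target set
                seen[category].add(mark)  # a mark seen in many frames counts once
    # Saturate so each count fits in its N-bit field.
    return [min(len(seen[category]), limit) for category in categories]


frames = [
    [("object", "dog"), ("scene", "beach")],
    [("object", "dog"), ("object", "ball"), ("face", "smile")],
]
print(count_marks_per_category(frames, ["object", "scene", "face"]))  # [2, 1, 1]
```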
Optionally, the identification mark information includes at least one of the following:
object category information for identifying the identification marks of the object categories corresponding to the target image frame; scene category information for identifying the identification marks of the scene object categories corresponding to the target image frame; and face category information for identifying the identification marks of the face object categories corresponding to the target image frame.
Optionally, the face category information includes at least one of the following items of sub-information: expression sub-attribute information for identifying the identification mark of the expression category corresponding to the target image frame; orientation sub-attribute information for identifying the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information for identifying the identification mark of the gender corresponding to the target image frame.
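The optional face sub-attributes could be modelled as a small record in which each of the three fields may be absent, reflecting the "at least one of" wording. This is a hypothetical sketch. The class name `FaceCategoryInfo`, the `face/<attribute>/<value>` mark format, and example labels such as "smile" are assumptions; the application does not define these vocabularies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaceCategoryInfo:
    """Face category information; every sub-attribute is optional."""
    expression: Optional[str] = None   # e.g. "smile" (hypothetical label)
    orientation: Optional[str] = None  # e.g. "frontal" (hypothetical label)
    gender: Optional[str] = None       # e.g. "female" (hypothetical label)

    def identification_marks(self) -> List[str]:
        """Return one mark per sub-attribute that is present, in a fixed order."""
        marks = []
        for name in ("expression", "orientation", "gender"):
            value = getattr(self, name)
            if value is not None:
                marks.append(f"face/{name}/{value}")
        return marks


info = FaceCategoryInfo(expression="smile", gender="female")
print(info.identification_marks())  # ['face/expression/smile', 'face/gender/female']
```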
Optionally, the method further includes obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, wherein the mark record information is used to record the identification marks corresponding to at least one second target category in the target image frame.
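Here is a minimal sketch of the mark record information. It assumes, hypothetically, that the record keeps the distinct identification marks of each second target category in first-seen order. The counts in the mark description information only say how many marks a category has. This record keeps the marks themselves, which would let a later search look up a specific label. `build_mark_record` and the `(category, mark)` input format are illustrative assumptions.

```python
def build_mark_record(frame_marks, second_categories):
    """Record the distinct identification marks of each second target category.

    frame_marks: one list of (category, mark) pairs per target image frame.
    """
    record = {category: [] for category in second_categories}
    for marks in frame_marks:
        for category, mark in marks:
            if category in record and mark not in record[category]:
                record[category].append(mark)  # keep first-seen order, no duplicates
    return record


frames = [
    [("object", "dog"), ("scene", "beach")],
    [("scene", "beach"), ("face", "smile")],
]
print(build_mark_record(frames, ["face", "scene"]))  # {'face': ['smile'], 'scene': ['beach']}
```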
Optionally, the attribute information includes time information for identifying the timestamp corresponding to the target image frame, and the method further includes obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame. The first time description information is used to record the timestamps corresponding to the target image frames that include at least one target mark. The second time description information is used to record the start timestamp and the end timestamp of the video segment.
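The two kinds of time description information can be sketched as follows. The sketch assumes that each target image frame is given as a hypothetical `(timestamp, marks)` pair. It also assumes that the segment's start and end timestamps are the earliest and latest frame timestamps. The application does not state how these are derived, and `time_descriptions` is an illustrative name.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def time_descriptions(
    frames: Sequence[Tuple[float, Iterable[str]]],
    target_marks: Set[str],
) -> Tuple[List[float], Optional[Tuple[float, float]]]:
    """Return (first, second) time description information.

    first: sorted timestamps of frames carrying at least one target mark.
    second: (start, end) of the segment, taken here as the earliest and
            latest frame timestamps, or None for an empty segment.
    """
    first = sorted(ts for ts, marks in frames if target_marks.intersection(marks))
    timestamps = [ts for ts, _ in frames]
    second = (min(timestamps), max(timestamps)) if timestamps else None
    return first, second


frames = [(0.0, ["dog"]), (0.5, []), (1.0, ["dog", "ball"]), (1.5, ["ball"])]
print(time_descriptions(frames, {"dog"}))  # ([0.0, 1.0], (0.0, 1.5))
```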
Optionally, N is equal to 8.
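Here is a worked example of the storage cost. It assumes that each category's N-bit field holds an unsigned value v_i. With N = 8, every category occupies exactly one byte. A frame with T = 3 object categories (an illustrative value) therefore needs 24 bits.

$$
L = T \cdot N, \qquad N = 8,\; T = 3 \;\Rightarrow\; L = 24\ \text{bits} = 3\ \text{bytes}, \qquad 0 \le v_i \le 2^{8} - 1 = 255
$$

By comparison, the minimum value N = 4 allows only values from 0 to 15 per category.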
An embodiment of the present application further provides a video clip marking device, including a memory, a processor, and a video collector. The video collector is used to capture a target to be tracked in a target area, and the memory is used to store program code. The processor calls the program code and, when the program code is executed, performs the following operations:
recognizing consecutive image frames in a video segment to obtain attribute information corresponding to at least one target image frame among the consecutive image frames;
obtaining mark description information of the video segment according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded on a bit basis, the bit length is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
An embodiment of the present application further provides a handheld camera. It includes the foregoing video clip marking device and also a carrier, which is fixedly connected to the video collector and is used to carry at least part of the video collector.
Optionally, the carrier includes, but is not limited to, a handheld pan/tilt.
Optionally, the handheld pan/tilt is a handheld three-axis pan/tilt.
Optionally, the video collector includes, but is not limited to, a camera for a handheld three-axis pan/tilt.
In the embodiments of the present application, consecutive image frames in a video segment are first recognized to obtain attribute information corresponding to at least one target image frame among them. Mark description information of the video segment is then obtained according to that attribute information. The mark description information includes information recorded on a bit basis, and the bit length is T*N, where T represents the number of object categories in the target image frame and N is an integer greater than or equal to 4. The embodiments of the present invention can therefore record, in a unified manner, the results that different image recognition algorithms produce on consecutive image frames in a video segment, and they also greatly save storage space.
Description of the drawings
Some specific embodiments of the present application are described in detail below with reference to the accompanying drawings, by way of example and not limitation. The same reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flowchart of a video clip marking method provided in Embodiment 1 of the present application;
FIG. 2 is a schematic flowchart of a video clip marking method provided in Embodiment 2 of the present application;
FIG. 3 is a schematic flowchart of a video clip marking method provided in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of a video segment marking device provided in Embodiment 4 of the present application;
FIG. 5 is a schematic structural diagram of a handheld pan/tilt provided in Embodiment 5 of the present application;
FIG. 6 is a schematic structural diagram of a handheld pan/tilt connected to a mobile phone, provided in Embodiment 5 of the present application;
FIG. 7 is a schematic structural diagram of a handheld pan/tilt provided in Embodiment 5 of the present application.
具体实施方式Detailed ways
The terms used in the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that "first", "second" and similar words used in the specification and claims of this application do not denote any order, quantity or importance, but are only used to distinguish different components. Similarly, words such as "a" or "one" do not denote a limitation of quantity, but rather denote the presence of at least one.
The specific implementation of the embodiments of the present invention is further described below with reference to the accompanying drawings.
Embodiment 1
Embodiment 1 of this application provides a video clip marking method. As shown in FIG. 1, which is a schematic flowchart of the video clip marking method according to this embodiment, the method includes:
Step S101: recognize consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames.
In this embodiment, a video clip includes multiple consecutive image frames, and the number of consecutive image frames in a clip is not limited. For example, when a long video is processed, it can be divided into multiple short video clips, and the number of consecutive image frames in each clip may be either fixed or variable.
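The division of a long video into clips of fixed or variable length can be sketched as follows. This is a minimal illustration, not the method of the application: frames are modelled as an arbitrary sequence, and the cut points used for variable-length clips (for example, detected scene changes) are an assumption supplied by the caller. The names `split_fixed` and `split_at` are hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_fixed(frames: Sequence[Frame], clip_len: int) -> List[List[Frame]]:
    """Split a long video into clips of clip_len frames; the last clip may be shorter."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def split_at(frames: Sequence[Frame], cut_points: Sequence[int]) -> List[List[Frame]]:
    """Split a long video at the given frame indices, giving clips of varying length."""
    if not frames:
        return []
    # Ignore duplicate or out-of-range cut points so that no clip is empty.
    inner = sorted({c for c in cut_points if 0 < c < len(frames)})
    edges = [0, *inner, len(frames)]
    return [list(frames[a:b]) for a, b in zip(edges, edges[1:])]


print(split_fixed(list(range(10)), 4))     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(split_at(list(range(10)), [3, 7]))   # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```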
In this embodiment, one or more image recognition algorithms may be used to recognize the consecutive image frames in the video clip. The type of algorithm is not limited; in practice, it can be chosen according to the video processing requirements or the hardware configuration that performs the processing.
In this embodiment, the target image frames are some or all of the consecutive image frames in the video clip. After at least one image recognition algorithm has recognized the consecutive image frames, attribute information can be generated that identifies the recognition result for each target image frame. The kinds of information included in the attribute information, and the way that information is identified, are not limited; they depend mainly on the image recognition algorithm applied to the target image frame.
For example, an image recognition algorithm that recognizes object categories can be used to obtain attribute information identifying whether the target image frame contains objects such as people, cats, or dogs; an image recognition algorithm that recognizes scene categories can be used to obtain attribute information identifying whether the target image frame contains scene objects such as sky, sea, or grass.
Step S102: obtain mark description information of the video clip according to the attribute information corresponding to the target image frame.
In this embodiment, the mark description information records a description of the image recognition results for the target image frames, so that subsequent video processing operations, such as similarity comparison and clustering of video clips, can be performed on the basis of the mark description information. The way in which the mark description information describes these recognition results is not limited. For example, it may describe how many cats appear in the video clip in total, or the order of magnitude of the number of cats that appear in the clip.
In this embodiment, the mark description information includes information recorded in bits, with a bit length of T*N, where T is the number of object categories in the target image frame and N is an integer greater than or equal to 4. The value of T can be determined according to subsequent video processing requirements and/or the image recognition results for the consecutive image frames in the video clip; the value of N can be determined according to subsequent video processing requirements and/or the hardware storage space available for data processing.
For example, if the object categories in the target image frame are person, cat, and dog, the bit length is 3N. When N is 4, that is, each category occupies 4 bits, the three categories person, cat, and dog require 12 bits in total.
In this embodiment, a bit is the smallest unit of computer storage and takes the value 0 or 1; the more bits are used, the more complex the image information that can be recorded. The rule for recording information in bits is not limited in this embodiment; in practice, it can be set according to subsequent video processing requirements and/or the content of the video clip.
For example, if the mark description information records the number of faces contained in all target image frames of the video clip and N is set to 4, 0001 can record a total of 0 faces, 0010 a total of 1 face, 0100 a total of 2 faces, and 1000 a total of 3 or more faces.
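The N = 4 face-count rule above is a one-hot field whose top bit is shared by all larger counts. A minimal sketch, assuming the field is represented as a bit string with bit 0 on the right; the function name `encode_count_one_hot` is hypothetical.

```python
def encode_count_one_hot(count: int, n: int = 4) -> str:
    """Encode a count as an n-bit one-hot field.

    A count of k sets bit k (bit 0 is the rightmost); every count of n - 1
    or more sets the leftmost bit, so the field saturates.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if n < 1:
        raise ValueError("n must be at least 1")
    return format(1 << min(count, n - 1), f"0{n}b")


for faces in range(5):
    print(faces, encode_count_one_hot(faces))
# 0 0001
# 1 0010
# 2 0100
# 3 1000
# 4 1000
```

With this rule an N-bit field distinguishes only N different values, which suits coarse "how many, roughly" descriptions.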
As another example, when N is set to 5, 00000 can record a total of 0 faces, 00001 a total of 1 face, 00010 a total of 2 faces, 00011 a total of 3 faces, 00100 a total of 4 faces, and so on.
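The N = 5 example above is plain unsigned binary. The sketch below encodes each category's count that way and concatenates one field per category, giving the T*N-bit layout from the person/cat/dog example. Two details are assumptions not stated in the text: counts that overflow the field saturate at 2**N - 1, and the category order is fixed and shared by all clips. The names are hypothetical.

```python
from typing import Mapping


def encode_count_binary(count: int, n: int = 5) -> str:
    """Encode a count as an n-bit unsigned binary field, saturating at 2**n - 1."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return format(min(count, (1 << n) - 1), f"0{n}b")


def pack_description(counts: Mapping[str, int], n: int) -> str:
    """Concatenate one n-bit field per category into a T*N-bit string.

    Field order follows the mapping's iteration order, which must be the
    same for every clip if descriptions are to be compared bit by bit.
    The result always has len(counts) * n bits, i.e. T * N.
    """
    return "".join(encode_count_binary(c, n) for c in counts.values())


print([encode_count_binary(c) for c in range(5)])
# ['00000', '00001', '00010', '00011', '00100']
print(pack_description({"person": 1, "cat": 0, "dog": 3}, 4))
# 000100000011  (3 categories x 4 bits = 12 bits)
```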
Recording information in bits has two advantages: the results produced by different image recognition algorithms are recorded in a unified form, which simplifies subsequent video processing operations, and storage space is greatly reduced.
Optionally, based on multiple application tests, N is preferably 8, so that the scheme applies widely to video clips with different content without occupying much storage space.
As can be seen from the above, this embodiment of the present invention first recognizes consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among them, and then obtains mark description information of the video clip from that attribute information. The mark description information includes information recorded in bits, with a bit length of T*N, where T is the number of object categories in the target image frame and N is an integer greater than or equal to 4. This embodiment can therefore record, in a unified manner, the results that multiple image recognition algorithms produce for consecutive image frames in a video clip, while greatly saving data storage space.
Embodiment 2
Embodiment 2 of this application provides a video clip marking method. As shown in FIG. 2, which is a schematic flowchart of the video clip marking method according to this embodiment, the method includes:
Step S201: recognize consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among them, where the attribute information includes identification mark information.
In this embodiment, multiple different image recognition algorithms can be used to recognize the consecutive image frames in the video clip, and the target image frame, or the objects it contains, can be classified from multiple perspectives to obtain identification mark information corresponding to at least one target image frame. The identification mark information identifies the identification marks of at least one object category corresponding to the target image frame. A target image frame, or an object in it, may correspond to identification marks of one or more object categories, and one object category may include multiple different identification marks.
For example, suppose the image recognition algorithm classifies the objects in the consecutive image frames under both an "object category" and a "dog category". If a target image frame contains three dogs, the identification mark information may use "DOG" as the identification mark of each of the three dogs in the "object category", and "01", "02", and "03", respectively, as their identification marks in the "dog category".
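The two-level marking in the three-dog example can be pictured as one record per detected object, holding one mark per object category. This is a hypothetical data layout for illustration only; the patent does not specify how recognizer output is stored.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical recognizer output for one target image frame: each detected
# object carries one identification mark per object category it falls under.
frame_objects: List[Dict[str, str]] = [
    {"object category": "DOG", "dog category": "01"},
    {"object category": "DOG", "dog category": "02"},
    {"object category": "DOG", "dog category": "03"},
]


def marks_by_category(objects: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Collect the distinct identification marks seen under each object category."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for obj in objects:
        for category, mark in obj.items():
            grouped[category].add(mark)
    return dict(grouped)


grouped = marks_by_category(frame_objects)
print(sorted(grouped["object category"]))  # ['DOG']
print(sorted(grouped["dog category"]))     # ['01', '02', '03']
```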
Optionally, to minimize the amount of data processing while still meeting subsequent video processing requirements, the consecutive image frames in the video clip may be recognized only for common object categories. Specifically, the identification mark information may include at least one of the following: object category information, which identifies the identification marks of the physical-object category corresponding to the target image frame; scene category information, which identifies the identification marks of the scene object category corresponding to the target image frame; and face category information, which identifies the identification marks of the face object category corresponding to the target image frame.
The physical-object category classifies the physical objects contained in the target image frame; the classification perspective and the corresponding identification marks can be determined according to video processing requirements or the image recognition algorithm used. For example, the identification marks of the physical-object category may identify objects of different animal categories such as "person", "cat", and "dog", or objects of different item categories such as "animal", "plant", and "daily necessities".
The scene object category classifies the scene objects contained in the target image frame; the classification perspective and the corresponding identification marks can be determined according to video processing requirements or the image recognition algorithm used. For example, the identification marks of the scene object category may identify scene objects of different weather categories such as "rainy", "sunny", and "cloudy", or of different background categories such as "grassland", "sky", and "sea".
The face object category classifies the face objects contained in the target image frame; the classification perspective and the corresponding identification marks can be determined according to video processing requirements or the image recognition algorithm used. For example, the identification marks of the face object category may identify face objects of different age groups such as "elderly", "middle-aged", and "child", or of different face shapes such as "round face", "square face", and "oval face".
Optionally, with the development of the Internet and video capture technologies, users pay increasing attention to recognizing and processing people when shooting or editing video. To meet the needs of most users, face objects in the target image frame can therefore be recognized and marked under more categories. Specifically, the face category information includes at least one of the following sub-items: expression sub-attribute information, which identifies the identification marks of the expression category corresponding to the target image frame; orientation sub-attribute information, which identifies the identification marks of the orientation category corresponding to the target image frame; and gender sub-attribute information, which identifies the identification marks of the gender corresponding to the target image frame.
The expression category classifies the faces contained in the target image frame by expression; for example, its identification marks may identify facial expressions such as "laughing", "crying", and "dazed".
The orientation category classifies the faces contained in the target image frame by facing direction; for example, its identification marks may identify face orientations such as "front", "back", and "side".
The gender category classifies the faces contained in the target image frame by gender; for example, its identification marks may identify "male", "female", and "uncertain".
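The categories and face sub-attributes listed above map naturally onto a small typed record. The sketch below is one possible layout, assuming string marks for objects and scenes and enumerations for the face sub-attributes; all class and field names are hypothetical, and every sub-attribute is optional because the text requires only "at least one" of them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Expression(Enum):
    LAUGHING = "laughing"
    CRYING = "crying"
    DAZED = "dazed"


class Orientation(Enum):
    FRONT = "front"
    BACK = "back"
    SIDE = "side"


class Gender(Enum):
    MALE = "male"
    FEMALE = "female"
    UNCERTAIN = "uncertain"


@dataclass
class FaceMarks:
    # Each sub-attribute may be absent if the recognizer does not produce it.
    expression: Optional[Expression] = None
    orientation: Optional[Orientation] = None
    gender: Optional[Gender] = None


@dataclass
class FrameMarks:
    """Identification mark information for one target image frame."""
    object_marks: List[str] = field(default_factory=list)  # e.g. "person", "dog"
    scene_marks: List[str] = field(default_factory=list)   # e.g. "sky", "sunny"
    faces: List[FaceMarks] = field(default_factory=list)


frame = FrameMarks(
    object_marks=["person", "dog"],
    scene_marks=["grassland", "sunny"],
    faces=[FaceMarks(Expression.LAUGHING, Orientation.FRONT, Gender.FEMALE)],
)
```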
Step S202: determine, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame.
In this embodiment, to reduce the amount of data processed and stored, the mark description information obtained later may describe and record only the more important object categories. Specifically, in step S202, at least one of all the object categories can be designated as a first target category, and the total number of identification marks corresponding to the first target category in the target image frames can then be determined from the identification mark information of the target image frames.
For example, suppose the first target category is "dog" and target image frames A, B, and C of a video clip all contain identification marks of the "dog" category: marks "01" and "02" label the two dogs in frame A, marks "01" and "03" label the two dogs in frame B, and mark "02" labels the one dog in frame C. Across the whole clip, the marks "01", "02", and "03" therefore label three distinct dogs, so the total number of identification marks corresponding to "dog" in the video clip is 3.
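The frame A/B/C example counts distinct marks across the clip, not detections per frame (which would give 5). A minimal sketch of that count, assuming each frame's identification mark information is a mapping from category to a set of marks; the function name `count_marks` is hypothetical.

```python
from typing import Dict, Iterable, Set


def count_marks(frames: Iterable[Dict[str, Set[str]]], category: str) -> int:
    """Count the distinct identification marks of `category` across all frames.

    A mark that recurs in several frames (the same dog seen again) is
    counted once.
    """
    seen: Set[str] = set()
    for marks in frames:
        seen |= marks.get(category, set())
    return len(seen)


frames = [
    {"dog": {"01", "02"}},  # target image frame A
    {"dog": {"01", "03"}},  # target image frame B
    {"dog": {"02"}},        # target image frame C
]
print(count_marks(frames, "dog"))  # 3
```

The resulting count is the value that would then be written into the category's N-bit field in step S203.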
Step S203: obtain mark description information of the video clip according to the number of identification marks corresponding to at least one first target category in the target image frame.
In this embodiment, the mark description information of the video clip is recorded in the same way as in step S102 of Embodiment 1, and the details are not repeated here.
In this embodiment, to record all or some of the more important identification marks obtained by recognition for use in subsequent video processing, the method may further include: obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, where the mark record information records the identification marks corresponding to at least one second target category in the target image frame.
The second target category may be the same as or different from the first target category. In addition, the recorded identification marks may be all or only some of the identification marks of the second target category; in practice, this can be chosen according to subsequent video processing requirements.
For example, if the second target category is the expression category and the target image frames contain the three expression marks "laughing", "crying", and "dazed", the mark record information may record only "laughing" and "crying", or all three marks.
Optionally, because different image recognition algorithms produce identification marks with different content or in different formats, the identification marks of all second target categories can be recorded in the same way to facilitate subsequent video processing.
Optionally, to save storage space, the identification marks of the second target category can be recorded as int-type IDs, each ID corresponding to one identification mark.
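Recording marks as int IDs amounts to interning each mark string once and storing only its integer. The registry below is a hypothetical sketch of that idea, assuming IDs are assigned in order of first appearance; the patent does not say how IDs are allocated.

```python
from typing import Dict, List


class MarkRegistry:
    """Map identification-mark strings to compact int IDs (one ID per mark)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._marks: List[str] = []

    def id_of(self, mark: str) -> int:
        """Return the ID of mark, assigning the next free ID on first use."""
        if mark not in self._ids:
            self._ids[mark] = len(self._marks)
            self._marks.append(mark)
        return self._ids[mark]

    def mark_of(self, mark_id: int) -> str:
        """Translate an ID back to the original mark string."""
        return self._marks[mark_id]


registry = MarkRegistry()
# Mark record information for the expression category, stored as int IDs.
expression_record = [registry.id_of(m) for m in ("laughing", "crying", "laughing")]
print(expression_record)  # [0, 1, 0]
```

Because every algorithm's marks pass through the same registry, the stored record has one uniform format regardless of which recognizer produced the marks.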
As can be seen from the above, this embodiment obtains, from the identification mark information of the target image frames, description information of the video clip that records the number of identification marks corresponding to at least one first target category. Recognizing the consecutive image frames only for the object categories commonly used in subsequent video processing reduces the amount of data processed and stored, and recording the mark record information in a unified way makes the data easier to manage and use later.
Embodiment 3
Embodiment 3 of this application provides a video clip marking method. As shown in FIG. 3, which is a schematic flowchart of the video clip marking method according to this embodiment, the method includes:
Step S301: recognize consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among them, where the attribute information includes identification mark information and time information.
In this embodiment, because every consecutive image frame in the video clip carries a corresponding timestamp, time information identifying the timestamp of each target image frame can be obtained when the consecutive image frames are recognized, so that time-related information about the video clip can be described.
Step S302: obtain mark description information of the video clip according to the identification mark information corresponding to the target image frame, and obtain first time description information and/or second time description information according to the time information corresponding to the target image frame.
In this embodiment, the first time description information records the timestamps of the target image frames that contain at least one target mark, so that it can be used to determine when the object identified by the target mark, or the target image frame itself, appears in the video clip. With the first time description information, subsequent operations such as clustering and filtering the target image frames or video clips that contain the target mark can be performed more conveniently.
For example, when a video clip is recognized and described, the user may be particularly interested in when a certain cat appears. To meet this need, the cat can first be marked with a preset target mark in the target image frames; by collecting the timestamps of the target image frames that contain the target mark, all times at which the cat appears in the video clip can be determined; finally, first time description information describing when the cat appears in the video clip can be generated.
Optionally, for efficient storage that saves space, the first time description information can be recorded as an array whose elements are the timestamps of the target image frames that contain at least one target mark.
In practice, the target mark is usually used to mark objects needed for subsequent video processing or objects of particular interest to the user. The target mark is one or more of the identification marks corresponding to at least one object category and can be preset according to video description requirements.
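Building the first time description information reduces to filtering frames by target mark and keeping their timestamps in an array. A minimal sketch, assuming each frame is given as a (timestamp, set of marks) pair with timestamps in milliseconds; the data and the mark string "cat-01" are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def first_time_description(
    frames: Iterable[Tuple[int, Set[str]]], target_marks: Set[str]
) -> List[int]:
    """Return, in frame order, the timestamps of frames containing any target mark."""
    return [ts for ts, marks in frames if marks & target_marks]


# Hypothetical data: (timestamp in ms, identification marks recognized in that frame)
frames = [
    (0, {"cat-01"}),
    (40, set()),
    (80, {"cat-01", "dog-02"}),
    (120, {"dog-02"}),
]
print(first_time_description(frames, {"cat-01"}))  # [0, 80]
```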
In this embodiment, the second time description information records the start timestamp and end timestamp of the video clip, so that the start and end times of the clip can subsequently be determined from it.
The start timestamp of the video clip is the timestamp of the first of its consecutive image frames, and the end timestamp is the timestamp of the last of them.
Optionally, for efficient recording that saves storage space, the second time description information can be recorded as a string of digits that encodes the start timestamp and the end timestamp.
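One way to pack the start and end timestamps into "a string of digits" is two zero-padded, fixed-width fields. The width of 10 digits and the non-negative integer timestamps are assumptions made for this sketch; the text does not fix an encoding, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def second_time_description(timestamps: Sequence[int], width: int = 10) -> str:
    """Encode a clip's start and end timestamps as one fixed-width digit string.

    `timestamps` are the clip's per-frame timestamps in frame order, so the
    first and last entries are the start and end timestamps. Timestamps are
    assumed to be non-negative integers that fit in `width` digits.
    """
    if not timestamps:
        raise ValueError("a video clip must contain at least one frame")
    start, end = timestamps[0], timestamps[-1]
    return f"{start:0{width}d}{end:0{width}d}"


def decode_second_time_description(digits: str, width: int = 10) -> Tuple[int, int]:
    """Split the digit string back into (start, end)."""
    return int(digits[:width]), int(digits[width:2 * width])


encoded = second_time_description([1000, 1040, 1080, 5040], width=8)
print(encoded)                                           # 0000100000005040
print(decode_second_time_description(encoded, width=8))  # (1000, 5040)
```

A fixed width keeps decoding trivial (no separator needed) at the cost of a few padding digits.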
As can be seen from the above, this embodiment obtains first time description information and/or second time description information from the time information corresponding to the target image frames, so that time-related information about the video clip can be described and recorded. The information describing a video clip can thus include mark description information, first time description information, and/or second time description information, which better meets subsequent video processing requirements.
Embodiment 4
As shown in FIG. 4, Embodiment 4 of this application provides a video processing device 40, including a memory 401, a processor 402, and a video collector 403. The video collector 403 is configured to capture a target to be tracked in a target area; the memory 401 is configured to store program code; and the processor 402 invokes the program code and, when the program code is executed, performs the following operations:
recognizing consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames;
obtaining mark description information of the video clip according to the attribute information corresponding to the target image frame, where the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
In one embodiment, the attribute information includes identification mark information that identifies the identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame includes:
determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame;
根据所述目标图像帧中的至少一第一目标类别对应的所述识别标记的数量,获得所述视频片段的标记描述信息。According to the number of the identification marks corresponding to at least one first target category in the target image frame, the mark description information of the video clip is obtained.
在一个实施例中,所述识别标记信息包括下述信息中的至少其一:In an embodiment, the identification mark information includes at least one of the following information:
用于标识所述目标图像帧对应的物体对象类别的识别标记的物体类别信息;用于标识所述目标图像帧对应的场景对象类别的识别标记的场景类别信息;用于标识所述目目标图像帧对应的人脸对象类别的识别标记的人脸类别信息。Object category information used to identify the identification mark of the object category corresponding to the target image frame; scene category information used to identify the identification mark of the scene object category corresponding to the target image frame; used to identify the target image Face category information of the recognition mark of the face object category corresponding to the frame.
在一个实施例中,所述人脸类别信息包括下述子信息中的至少其一:In an embodiment, the face category information includes at least one of the following sub-information:
用于标识所述目标图像帧对应的表情类别的识别标记的表情子属性信息;用于标识所述目标图像帧对应的朝向类别的识别标记的朝向子属性信息;用于标识所述目标图像帧对应的性别的识别标记的性别子属性信息。Expression sub-attribute information used to identify the recognition tag of the expression category corresponding to the target image frame; orientation sub-attribute information used to identify the recognition tag of the orientation category corresponding to the target image frame; used to identify the target image frame The gender sub-attribute information of the corresponding gender identification mark.
In one embodiment, when the program code is executed, the processor that calls it is further configured to perform the following operation: obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, where the mark record information records the identification marks corresponding to at least one second target category in the target image frame.
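The mark record information can be pictured as a per-category list of the distinct marks seen in the clip. The sketch below assumes that marks are (category, label) pairs and that the caller supplies the second target categories; both are illustrative assumptions, since the source does not define the record's format.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical mark representation: (category, label), e.g. ("scene", "beach").
Mark = Tuple[str, str]


def build_mark_record(frames: Iterable[Sequence[Mark]],
                      second_categories: Sequence[str]) -> Dict[str, List[str]]:
    """Record, per second target category, the distinct marks seen in the target frames.

    Labels are kept in first-seen order so the record is deterministic.
    """
    record: Dict[str, List[str]] = {c: [] for c in second_categories}
    for marks in frames:
        for category, label in marks:
            if category in record and label not in record[category]:
                record[category].append(label)
    return record
```

Keeping first-seen order instead of using a set makes the record reproducible from run to run; a category with no marks still appears, with an empty list.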
In one embodiment, the attribute information includes time information that identifies the timestamp corresponding to the target image frame. When the program code is executed, the processor is further configured to perform the following operation: obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame, where the first time description information records the timestamps of the target image frames that contain at least one target mark, and the second time description information records the start timestamp and the end timestamp of the video clip.
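A minimal sketch of the two kinds of time description information follows. It assumes each target frame is given as a (timestamp, marks) pair and that the clip's start and end timestamps are simply the earliest and latest frame timestamps; the source does not say how the bounds are derived, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def time_descriptions(frames: Sequence[Tuple[float, Set[str]]],
                      target_marks: Set[str]) -> Tuple[List[float], Optional[Tuple[float, float]]]:
    """Return (first, second) time description information for one clip.

    frames: (timestamp, marks) pairs for the target image frames, in any order.
    first:  sorted timestamps of frames containing at least one of target_marks.
    second: (start, end) timestamps of the clip, or None for an empty clip.
    """
    first = sorted(ts for ts, marks in frames if marks & target_marks)
    if not frames:
        return first, None
    stamps = [ts for ts, _ in frames]
    return first, (min(stamps), max(stamps))
```

For frames at 0.0 s ({"dog"}), 0.5 s (no marks), 1.0 s ({"person", "dog"}) and 1.5 s ({"beach"}) with target mark "dog", the first description is [0.0, 1.0] and the second is (0.0, 1.5).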
In one embodiment, N is equal to 8.
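With N = 8, the sizes work out as follows, assuming one N-bit field per object category, where v_i denotes the value stored for category i:

```latex
L = T \cdot N = 8T \ \text{bits} = T \ \text{bytes}, \qquad
0 \le v_i \le 2^{N} - 1 = 255 \quad (i = 1, \dots, T)
```

For example, T = 4 categories need 32 bits, that is, 4 bytes.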
Embodiment 5
A handheld camera includes the video processing device described in Embodiment 4 and further includes a carrier that is fixedly connected to the video capture device and configured to carry at least part of the video capture device.
In one embodiment, the carrier includes, but is not limited to, a handheld gimbal.
In one embodiment, the handheld gimbal is a handheld three-axis gimbal.
In one embodiment, the video capture device includes, but is not limited to, a camera for a handheld three-axis gimbal.
The basic structure of the handheld gimbal camera is briefly introduced below.
As shown in FIG. 5, the handheld gimbal 1 of this embodiment of the present invention includes a handle 11 and a shooting device 12 mounted on the handle 11. In this embodiment, the shooting device 12 may include a three-axis gimbal camera; in other embodiments, it may include a gimbal camera with two axes or with more than three axes.
The handle 11 is provided with a display screen 13 for displaying the content captured by the shooting device 12. The present invention does not limit the type of the display screen 13.
Because the display screen 13 is provided on the handle 11 of the handheld gimbal 1 and can display what the shooting device 12 captures, the user can quickly browse the photos or videos taken by the shooting device 12 on the display screen 13. This makes the handheld gimbal 1 more interactive and enjoyable to use and meets users' diverse needs.
In one embodiment, the handle 11 is further provided with an operation function unit for controlling the shooting device 12. By operating this unit, the user can control the shooting device 12, for example turning it on and off, controlling its shooting, and changing the attitude of its gimbal part, so that the shooting device 12 can be operated quickly. The operation function unit may take the form of buttons, a knob, or a touch screen.
In one embodiment, the operation function unit includes a shooting button 14 for controlling shooting by the shooting device 12, a power/function button 15 for turning the shooting device 12 on and off and controlling other functions, and a directional key 16 for controlling movement of the gimbal. The operation function unit may also include other control buttons, such as an image storage button or an image playback control button, configured according to actual needs.
In one embodiment, the operation function unit and the display screen 13 are arranged on the same side of the handle 11. As shown in FIG. 5, both are arranged on the front of the handle 11, which is ergonomic and also gives the handheld gimbal 1 a more balanced and attractive overall layout.
Further, a function operation key A is provided on the side of the handle 11 so that the user can quickly and intelligently produce a finished video with a single key. With the camera turned on, pressing the orange key on the right side of the body enables the function: the camera then automatically shoots a video segment at regular intervals, capturing N segments in total (N ≥ 2). After connecting a mobile device such as a mobile phone, the user selects the "one-click film" function, and the system intelligently screens the captured segments, matches a suitable template, and quickly generates a finished work.
In an optional implementation, the handle 11 is further provided with a card slot 17 for inserting a storage element. In this embodiment, the card slot 17 is located on the side of the handle 11 adjacent to the display screen 13; inserting a memory card into the card slot 17 allows the images captured by the shooting device 12 to be stored on the card. Placing the card slot 17 on the side also keeps it from interfering with other functions, giving a better user experience.
In one embodiment, a battery that powers the handle 11 and the shooting device 12 may be provided inside the handle 11. The battery may be a lithium battery, which offers high capacity in a small volume and thus allows a compact design of the handheld gimbal 1.
In one embodiment, the handle 11 is further provided with a charging/USB interface 18. In this embodiment, the charging/USB interface 18 is located at the bottom of the handle 11, making it easy to connect an external power source or storage device to charge the battery or transfer data.
In one embodiment, the handle 11 is further provided with a sound pickup hole 19 for receiving audio signals; the sound pickup hole 19 communicates internally with a microphone. There may be one or more sound pickup holes 19. The handle 11 is also provided with an indicator light 20 for displaying status. The user can interact by audio with the display screen 13 through the sound pickup hole 19. The indicator light 20 serves as a reminder: from it the user can learn the battery level of the handheld gimbal 1 and which function is currently running. Both the sound pickup hole 19 and the indicator light 20 may also be placed on the front of the handle 11, which better matches users' habits and is more convenient to operate.
In one embodiment, the shooting device 12 includes a gimbal bracket and an imager mounted on the gimbal bracket. The imager may be a camera, or an image pickup element composed of a lens and an image sensor (such as a CMOS or CCD sensor), selected as needed. The imager may be integrated on the gimbal bracket, in which case the shooting device 12 is a gimbal camera; alternatively, it may be an external shooting device that is detachably connected to or clamped onto the gimbal bracket.
In one embodiment, the gimbal bracket is a three-axis gimbal bracket and the shooting device 12 is a three-axis gimbal camera. The three-axis gimbal bracket includes a yaw axis assembly 22, a roll axis assembly 23 movably connected to the yaw axis assembly 22, and a pitch axis assembly 24 movably connected to the roll axis assembly 23; the imager is mounted on the pitch axis assembly 24. The yaw axis assembly 22 drives the shooting device 12 to rotate in the yaw direction. In other examples, the gimbal bracket may instead be a two-axis gimbal, a four-axis gimbal, or the like, selected as needed.
In one embodiment, a mounting portion is further provided at one end of the connecting arm connected to the roll axis assembly, and the yaw axis assembly may be arranged in the handle; the yaw axis assembly drives the shooting device 12 to rotate in the yaw direction.
In an optional implementation, as shown in FIG. 6, the handle 11 is provided with an adapter 26 for coupling to a mobile device 2 (such as a mobile phone), and the adapter 26 is detachably connected to the handle 11. The adapter 26 protrudes from the side of the handle to connect to the mobile device 2. Once the adapter 26 is connected to the mobile device 2, the handheld gimbal 1 is docked via the adapter 26 and supported at the end of the mobile device 2.
Providing the adapter 26 on the handle 11 for connecting to the mobile device 2 joins the handle 11 and the mobile device 2, so that the handle 11 can serve as a base for the mobile device 2. The user can hold the other end of the mobile device 2 to pick up and operate it together with the handheld gimbal 1; the connection is quick and convenient, and the product looks good. In addition, once the handle 11 is coupled to the mobile device 2 through the adapter 26, a communication link is established between the handheld gimbal 1 and the mobile device 2, allowing data transfer between the shooting device 12 and the mobile device 2.
In one embodiment, the adapter 26 is detachably connected to the handle 11, that is, the adapter 26 can be mechanically attached to and removed from the handle 11. Further, the adapter 26 is provided with an electrical contact portion, and the handle 11 is provided with a mating electrical contact portion that cooperates with it.
Thus, when the handheld gimbal 1 does not need to be connected to the mobile device 2, the adapter 26 can be removed from the handle 11. When the handheld gimbal 1 needs to be connected to the mobile device 2, the adapter 26 is attached to the handle 11 to complete the mechanical connection between them, while the connection between the electrical contact portion and the mating electrical contact portion ensures the electrical connection, so that data can be transferred between the shooting device 12 and the mobile device 2 through the adapter 26.
In one embodiment, as shown in FIG. 5, a receiving groove 27 is provided on the side of the handle 11, and the adapter 26 slides into and is held in the receiving groove 27. When the adapter 26 is installed in the receiving groove 27, part of the adapter 26 protrudes from the groove, and this protruding part is used to connect to the mobile device 2.
In one embodiment, referring to FIG. 5, when the adapter 26 is inserted into the receiving groove 27 in the reverse orientation, it sits flush with the receiving groove 27 and is thereby stowed in the receiving groove 27 of the handle 11.
Therefore, when the handheld gimbal 1 needs to be connected to the mobile device 2, the adapter 26 can be inserted into the receiving groove 27 with its adapter part facing outward, so that the adapter 26 protrudes from the receiving groove 27 and the mobile device 2 and the handle 11 can be connected to each other.
When the user has finished with the mobile device 2, or needs to detach it, the adapter 26 can be taken out of the receiving groove 27 of the handle 11 and inserted back into the receiving groove 27 in the reverse orientation, so that it is stowed in the handle 11. Because the adapter 26 is then flush with the receiving groove 27, the surface of the handle 11 stays flat, and stowing the adapter 26 in the handle 11 also makes it easier to carry.
In one embodiment, the receiving groove 27 is a semi-open groove on one side surface of the handle 11, which makes it easier for the adapter 26 to slide into and engage with it. In other examples, the adapter 26 may instead be detachably connected to the receiving groove 27 of the handle 11 by a snap fit, a plug-in connection, or the like.
In one embodiment, the receiving groove 27 is located on the side of the handle 11. When the adapter function is not in use, the receiving groove 27 is covered by a snap-on cover plate 28, which is convenient for the user and does not affect the overall appearance of the front and sides of the handle.
In one embodiment, the electrical contact portion and the mating electrical contact portion may be electrically connected by point contact. For example, the electrical contact portion may be a telescopic probe, an electrical plug interface, or an electrical contact. In other examples, the electrical contact portion and the mating electrical contact portion may instead be connected directly by surface-to-surface contact.
A1. A video clip marking method, comprising:
recognizing consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and
obtaining mark description information of the video clip according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
A2. The video clip marking method according to A1, wherein the attribute information includes identification mark information identifying the identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame comprises:
determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and
obtaining the mark description information of the video clip according to the number of identification marks corresponding to the at least one first target category in the target image frame.
A3. The video clip marking method according to A2, wherein the identification mark information includes at least one of the following:
object category information identifying the identification marks of the object categories corresponding to the target image frame;
scene category information identifying the identification marks of the scene categories corresponding to the target image frame; and
face category information identifying the identification marks of the face categories corresponding to the target image frame.
A4. The video clip marking method according to A3, wherein the face category information includes at least one of the following sub-items:
expression sub-attribute information identifying the identification marks of the expression category corresponding to the target image frame;
orientation sub-attribute information identifying the identification marks of the orientation category corresponding to the target image frame; and
gender sub-attribute information identifying the identification marks of the gender corresponding to the target image frame.
A5. The video clip marking method according to A2, further comprising:
obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, wherein the mark record information records the identification marks corresponding to at least one second target category in the target image frame.
A6. The video clip marking method according to A2, wherein the attribute information includes time information identifying the timestamp corresponding to the target image frame, and the method further comprises:
obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame, wherein the first time description information records the timestamps of the target image frames containing at least one target mark, and the second time description information records the start timestamp and the end timestamp of the video clip.
A7. The video clip marking method according to A1, wherein N is equal to 8.
A8. A video clip marking device, comprising a memory, a processor, and a video capture device, wherein the video capture device is configured to capture a target to be tracked in a target area, the memory is configured to store program code, and the processor calls the program code and, when the program code is executed, performs the following operations:
recognizing consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and
obtaining mark description information of the video clip according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded in bits, the bit length is T*N, T is the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
A9. The video clip marking device according to A8, wherein the attribute information includes identification mark information identifying the identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame comprises:
determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and
obtaining the mark description information of the video clip according to the number of identification marks corresponding to the at least one first target category in the target image frame.
A10. The video clip marking device according to A9, wherein the identification mark information includes at least one of the following:
object category information identifying the identification marks of the object categories corresponding to the target image frame;
scene category information identifying the identification marks of the scene categories corresponding to the target image frame; and
face category information identifying the identification marks of the face categories corresponding to the target image frame.
A11. The video clip marking device according to A10, wherein the face category information includes at least one of the following sub-items:
expression sub-attribute information identifying the identification marks of the expression category corresponding to the target image frame;
orientation sub-attribute information identifying the identification marks of the orientation category corresponding to the target image frame; and
gender sub-attribute information identifying the identification marks of the gender corresponding to the target image frame.
A12. The video clip marking device according to A9, wherein, when the program code is executed, the processor is further configured to perform the following operation:
obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, wherein the mark record information records the identification marks corresponding to at least one second target category in the target image frame.
A13. The video clip marking device according to A9, wherein the attribute information includes time information identifying the timestamp corresponding to the target image frame, and, when the program code is executed, the processor is further configured to perform the following operation:
obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame, wherein the first time description information records the timestamps of the target image frames containing at least one target mark, and the second time description information records the start timestamp and the end timestamp of the video clip.
A14. The video clip marking device according to A8, wherein N is equal to 8.
A15. A handheld camera, comprising the video clip marking device according to any one of A8 to A14, and further comprising a carrier that is fixedly connected to the video capture device and configured to carry at least part of the video capture device.
A16. The handheld camera according to A15, wherein the carrier includes, but is not limited to, a handheld gimbal.
A17. The handheld camera according to A16, wherein the handheld gimbal is a handheld three-axis gimbal.
A18. The handheld camera according to A15, wherein the video capture device includes, but is not limited to, a camera for a handheld three-axis gimbal.
Specific embodiments of the subject matter have now been described. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, or switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a physical hardware module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system themselves to "integrate" it onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most widely used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Indeed, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within a hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or a physical entity, or by a product having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related details, refer to the corresponding description of the method embodiment.
The above descriptions are merely embodiments of this application and are not intended to limit it. Various modifications and variations of this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (10)

  1. A video clip marking method, comprising:
    recognizing consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and
    obtaining mark description information of the video clip according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded in bits, the bit length is T*N, T denotes the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
  2. The video clip marking method according to claim 1, wherein the attribute information includes identification mark information for identifying an identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame comprises:
    determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and
    obtaining the mark description information of the video clip according to the number of identification marks corresponding to the at least one first target category in the target image frame.
  3. The video clip marking method according to claim 2, wherein the identification mark information includes at least one of the following:
    object category information for identifying an identification mark of an object category corresponding to the target image frame;
    scene category information for identifying an identification mark of a scene category corresponding to the target image frame;
    face category information for identifying an identification mark of a face category corresponding to the target image frame.
  4. The video clip marking method according to claim 3, wherein the face category information includes at least one of the following sub-items:
    expression sub-attribute information for identifying an identification mark of an expression category corresponding to the target image frame;
    orientation sub-attribute information for identifying an identification mark of an orientation category corresponding to the target image frame;
    gender sub-attribute information for identifying an identification mark of a gender corresponding to the target image frame.
  5. The video clip marking method according to claim 2, further comprising:
    obtaining mark record information of the video clip according to the identification mark information corresponding to the target image frame, wherein the mark record information records the identification marks corresponding to at least one second target category in the target image frame.
  6. The video clip marking method according to claim 2, wherein the attribute information includes time information for identifying a timestamp corresponding to the target image frame, and the method further comprises:
    obtaining first time description information and/or second time description information of the video clip according to the time information corresponding to the target image frame, wherein the first time description information records the timestamps of target image frames that contain at least one target mark, and the second time description information records a start timestamp and an end timestamp of the video clip.
  7. The video clip marking method according to claim 1, wherein N is equal to 8.
  8. A video clip marking device, comprising a memory, a processor, and a video collector, wherein the video collector is configured to capture a target to be tracked in a target area, the memory is configured to store program code, and the processor invokes the program code and, when the program code is executed, performs the following operations:
    recognizing consecutive image frames in a video clip to obtain attribute information corresponding to at least one target image frame among the consecutive image frames; and
    obtaining mark description information of the video clip according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded in bits, the bit length is T*N, T denotes the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
  9. The video clip marking device according to claim 8, wherein the attribute information includes identification mark information for identifying an identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame comprises:
    determining, according to the identification mark information corresponding to the target image frame, the number of identification marks corresponding to at least one first target category in the target image frame; and
    obtaining the mark description information of the video clip according to the number of identification marks corresponding to the at least one first target category in the target image frame.
  10. The video clip marking device according to claim 9, wherein the identification mark information includes at least one of the following:
    object category information for identifying an identification mark of an object category corresponding to the target image frame;
    scene category information for identifying an identification mark of a scene category corresponding to the target image frame;
    face category information for identifying an identification mark of a face category corresponding to the target image frame.
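The claims fix only the size of the mark description information (T*N bits, N >= 4, N = 8 in claim 7), not its layout. The sketch below is one possible reading, not the applicant's implementation: it assumes each of the T categories gets its own N-bit field holding a saturating count of identification marks (claims 1, 2 and 7), and that the time description information of claim 6 is derived from per-frame timestamps. `FrameAttributes`, `build_mark_description`, `unpack_mark_description` and `time_descriptions` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class FrameAttributes:
    """Hypothetical per-frame attribute record (layout is an assumption)."""
    timestamp_ms: int
    marks: List[str] = field(default_factory=list)  # one identification mark per detected instance


def build_mark_description(frames: Sequence[FrameAttributes],
                           categories: Sequence[str],
                           n_bits: int = 8) -> int:
    """Pack per-category mark counts into a T*N-bit integer.

    Assumed layout: category i occupies bits [i*N, (i+1)*N); counts that do
    not fit in N bits are clamped to 2**N - 1.
    """
    if n_bits < 4:
        raise ValueError("N must be an integer >= 4")
    max_count = (1 << n_bits) - 1
    counts: Dict[str, int] = {c: 0 for c in categories}
    for frame in frames:
        for mark in frame.marks:
            if mark in counts:  # marks outside the T tracked categories are ignored
                counts[mark] += 1
    packed = 0
    for i, category in enumerate(categories):
        packed |= min(counts[category], max_count) << (i * n_bits)
    return packed


def unpack_mark_description(packed: int,
                            categories: Sequence[str],
                            n_bits: int = 8) -> Dict[str, int]:
    """Recover each category's N-bit count with a shift and a mask."""
    mask = (1 << n_bits) - 1
    return {c: (packed >> (i * n_bits)) & mask for i, c in enumerate(categories)}


def time_descriptions(frames: Sequence[FrameAttributes]) -> Tuple[List[int], Tuple[int, int]]:
    """Return (first, second) time description information, as assumed here.

    first:  timestamps of frames carrying at least one identification mark
    second: (start, end) timestamps of the clip, taken from its frames
    """
    if not frames:
        raise ValueError("a video clip needs at least one frame")
    first = [f.timestamp_ms for f in frames if f.marks]
    stamps = [f.timestamp_ms for f in frames]
    return first, (min(stamps), max(stamps))


if __name__ == "__main__":
    clip = [
        FrameAttributes(0, ["face", "face"]),
        FrameAttributes(40, []),
        FrameAttributes(80, ["dog", "face"]),
    ]
    cats = ["face", "dog", "beach"]
    packed = build_mark_description(clip, cats, n_bits=8)
    print(packed, packed.bit_length() <= len(cats) * 8)
    print(unpack_mark_description(packed, cats))
    print(time_descriptions(clip))
```

In this example, with T = 3 categories and N = 8, three face marks and one dog mark pack into 3 + (1 << 8) = 259, which fits in the 24-bit field. Fixed-width fields keep the description compact and let any single category count be read back with one shift and one mask; clamping counts at 2**N - 1 is a choice made for this sketch, not something the claims require.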
PCT/CN2020/099832 2020-04-15 2020-07-02 Video clip marking method and device, and handheld camera WO2021208255A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010296290.4 2020-04-15
CN202010296290.4A CN112052357B (en) 2020-04-15 2020-04-15 Video clip marking method and device and handheld camera

Publications (1)

Publication Number Publication Date
WO2021208255A1 true WO2021208255A1 (en) 2021-10-21

Family

ID=73609655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099832 WO2021208255A1 (en) 2020-04-15 2020-07-02 Video clip marking method and device, and handheld camera

Country Status (2)

Country Link
CN (1) CN112052357B (en)
WO (1) WO2021208255A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113163086B (en) * 2021-04-07 2023-04-07 惠州Tcl云创科技有限公司 Be applied to display device's intelligence and shoot accessory structure
CN114598919B (en) * 2022-03-01 2024-03-01 腾讯科技(深圳)有限公司 Video processing method, device, computer equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN106777114A (en) * 2016-12-15 2017-05-31 北京奇艺世纪科技有限公司 A kind of video classification methods and system
CN108694217A (en) * 2017-04-12 2018-10-23 合信息技术(北京)有限公司 The label of video determines method and device
CN109165573A (en) * 2018-08-03 2019-01-08 百度在线网络技术(北京)有限公司 Method and apparatus for extracting video feature vector
US20190057258A1 (en) * 2015-10-30 2019-02-21 Hewlett-Packard Development Company, L.P. Video Content Summarization and Class Selection
CN110263217A (en) * 2019-06-28 2019-09-20 北京奇艺世纪科技有限公司 A kind of video clip label identification method and device
CN110781960A (en) * 2019-10-25 2020-02-11 Oppo广东移动通信有限公司 Training method, classification method, device and equipment of video classification model

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP3005297B1 (en) * 2013-06-04 2023-09-06 HRL Laboratories, LLC A system for detecting an object of interest in a scene
CN108337532A (en) * 2018-02-13 2018-07-27 腾讯科技(深圳)有限公司 Perform mask method, video broadcasting method, the apparatus and system of segment
CN109121022B (en) * 2018-09-28 2020-05-05 百度在线网络技术(北京)有限公司 Method and apparatus for marking video segments
CN110166827B (en) * 2018-11-27 2022-09-13 深圳市腾讯信息技术有限公司 Video clip determination method and device, storage medium and electronic device
CN110119711B (en) * 2019-05-14 2021-06-11 北京奇艺世纪科技有限公司 Method and device for acquiring character segments of video data and electronic equipment
CN110458008A (en) * 2019-07-04 2019-11-15 深圳壹账通智能科技有限公司 Method for processing video frequency, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112052357B (en) 2022-04-01
CN112052357A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
US20230212064A1 (en) Methods for camera movement compensation
US10430909B2 (en) Image retrieval for computing devices
US11368567B2 (en) System and method for improving a photographic camera feature on a portable electronic device
WO2021208253A1 (en) Tracking object determination method and device, and handheld camera
US9007431B1 (en) Enabling the integration of a three hundred and sixty degree panoramic camera within a consumer device case
WO2021208255A1 (en) Video clip marking method and device, and handheld camera
US20060239648A1 (en) System and method for marking and tagging wireless audio and video recordings
CN105704369B (en) A kind of information processing method and device, electronic equipment
US8760551B2 (en) Systems and methods for image capturing based on user interest
US20100287502A1 (en) Image search device and image search method
EP2092461A1 (en) User interface for face recognition
CN108886574A (en) A kind of shooting bootstrap technique, equipment and system
CN102158649A (en) Photographic device and photographic method thereof
WO2021208256A1 (en) Video processing method and apparatus, and handheld camera
CN105893997A (en) Image and text scanning pen and scanning method for area-of-interest for users
WO2021208251A1 (en) Face tracking method and face tracking device
CN109257649A (en) A kind of multimedia file producting method and terminal device
WO2021208252A1 (en) Tracking target determination method, device, and hand-held camera
WO2021208254A1 (en) Tracking target recovery method and device, and handheld camera
WO2021208258A1 (en) Method and apparatus for searching for tracked object, and hand-held camera thereof
WO2021208257A1 (en) Tracking state determination method and device, and handheld camera
WO2022206605A1 (en) Method for determining target object, and photographing method and device
WO2021208250A1 (en) Face tracking method and face tracking device
WO2021208259A1 (en) Gimbal driving method and device, and handheld camera
WO2021208260A1 (en) Method and device for displaying tracking frame of target object, and handheld camera

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20930843

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20930843

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20930843

Country of ref document: EP

Kind code of ref document: A1