WO2021208255A1

WO2021208255A1 - Video clip marking method and device, and handheld camera

Info

Publication number: WO2021208255A1
Application number: PCT/CN2020/099832
Authority: WO
Inventors: 康含玉; 梁峰
Original assignee: 上海摩象网络科技有限公司
Priority date: 2020-04-15
Filing date: 2020-07-02
Publication date: 2021-10-21
Also published as: CN112052357B; CN112052357A

Abstract

A video clip marking method and device, and a handheld camera. The method comprises: identifying continuous images in a video clip, so as to obtain attribute information corresponding to at least one target image in the continuous images (S101); according to the attribute information corresponding to the target image, obtaining marking description information of the video clip (S102), the marking description information comprising information recorded on the basis of a bit, the length of the bit being T*N, T representing the number of object categories in the target image, N being an integer greater than or equal to 4. Said method can not only record, in a unified manner, results obtained by identifying continuous images in a video clip by different image identification algorithms, but also greatly save storage space.

Description

Video segment marking method, equipment and handheld camera

Technical field

The embodiments of the present application relate to the field of image processing technologies, and in particular, to a video clip marking method, device, and handheld camera.

Background

With the development of image processing technology, more and more image recognition algorithms have appeared. After the continuous image frames in a video clip are identified and marked by an image recognition algorithm, description information describing the video clip can be generated, so that searching, clustering, and other processing can subsequently be performed on the video clip, or on some of its image frames, according to that description information.

To meet different video processing needs, several image recognition algorithms are usually used to identify the continuous image frames in a video. However, the description information generated by different types of image recognition algorithms is recorded in different ways, which makes it inconvenient to use in subsequent processing such as searching and clustering.

Summary of the invention

In view of this, one of the technical problems solved by the embodiments of the present invention is to provide a video segment marking method, device, and handheld camera that overcome a defect of the prior art: the description information generated by multiple image recognition algorithms is recorded in inconsistent ways, which hinders subsequent data processing and storage.

The embodiment of the application provides a video segment marking method, including:

Recognizing continuous image frames in the video segment to obtain attribute information corresponding to at least one target image frame in the continuous image frames;

Obtaining mark description information of the video segment according to the attribute information corresponding to the target image frame, wherein the mark description information includes information recorded based on bits, the length of the bits is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.

Optionally, the attribute information includes identification mark information, which identifies the identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:

Determining the number of identification marks corresponding to at least one first target category in the target image frame according to the identification mark information corresponding to the target image frame;

Obtaining the mark description information of the video segment according to the number of identification marks corresponding to the at least one first target category in the target image frame.

Optionally, the identification mark information includes at least one of the following information:

Object category information, used to identify the identification mark of the object category corresponding to the target image frame; scene category information, used to identify the identification mark of the scene object category corresponding to the target image frame; and face category information, used to identify the identification mark of the face object category corresponding to the target image frame.

Optionally, the face category information includes at least one of the following sub-information: expression sub-attribute information, used to identify the identification mark of the expression category corresponding to the target image frame; orientation sub-attribute information, used to identify the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information, used to identify the identification mark of the gender corresponding to the target image frame.

Optionally, the method further includes: obtaining mark recording information of the video segment according to the identification mark information corresponding to the target image frame, wherein the mark recording information is used to record the identification mark corresponding to at least one second target category in the target image frame.

Optionally, the attribute information includes time information used to identify the timestamp corresponding to the target image frame, and the method further includes: obtaining first time description information and/or second time description information of the video segment according to the time information corresponding to the target image frame; wherein the first time description information is used to record the timestamp corresponding to each target image frame that includes at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.

Optionally, N is equal to 8.

An embodiment of the present application also provides a video clip marking device, including: a memory, a processor, and a video collector, where the video collector is used to collect a target to be tracked in a target area; the memory is used to store program code; and the processor calls the program code and, when the program code is executed, performs the following operations:

An embodiment of the present application also provides a handheld camera, which includes the video clip marking device described above and is characterized in that it further includes a carrier, which is fixedly connected to the video collector and is used to carry at least part of the video collector.

Optionally, the carrier includes but is not limited to a handheld pan/tilt.

Optionally, the handheld pan/tilt is a handheld three-axis pan/tilt.

Optionally, the video collector includes, but is not limited to, a handheld three-axis pan/tilt camera.

In the embodiments of the present application, attribute information corresponding to at least one target image frame among the continuous image frames is obtained by recognizing the continuous image frames in the video segment; the mark description information of the video segment is then obtained according to the attribute information corresponding to the target image frame, where the mark description information includes information recorded based on bits, the length of the bits is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4. Therefore, the embodiments of the present invention can not only record, in a unified manner, the results of recognizing the continuous image frames in the video segment with different image recognition algorithms, but also greatly save storage space.

Description of the drawings

Hereinafter, some specific embodiments of the present application will be described in detail, in an exemplary rather than restrictive manner, with reference to the accompanying drawings. The same reference numerals in the drawings indicate the same or similar components or parts. Those skilled in the art should understand that these drawings are not necessarily drawn to scale. In the drawings:

FIG. 1 is a schematic flowchart of a method for marking video clips provided in Embodiment 1 of this application;

FIG. 2 is a schematic flowchart of a method for marking video clips provided in Embodiment 2 of the present application;

FIG. 3 is a schematic flowchart of a method for marking video clips provided in Embodiment 3 of the present application;

FIG. 4 is a schematic structural diagram of a video segment marking device provided in Embodiment 4 of this application;

FIG. 5 is a schematic structural diagram of a handheld pan/tilt head provided by Embodiment 5 of the application;

FIG. 6 is a schematic structural diagram of a handheld pan/tilt head connected with a mobile phone according to Embodiment 5 of the application;

FIG. 7 is a schematic structural diagram of a handheld pan/tilt head provided in Embodiment 5 of this application.

Detailed description

The terms used in the present invention are only for the purpose of describing specific embodiments, and are not intended to limit the present invention. The singular forms of "a", "said" and "the" used in the present invention and the appended claims are also intended to include plural forms, unless the context clearly indicates other meanings. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items.

It should be understood that "first", "second", and similar words used in the specification and claims of this application do not denote any order, quantity, or importance, but are only used to distinguish different components. Likewise, words such as "a" or "one" do not imply a limit on quantity, but mean that there is at least one.

The specific implementation of the embodiments of the present invention will be further described below in conjunction with the accompanying drawings of the embodiments of the present invention.

Embodiment 1

Embodiment 1 of the present application provides a video segment marking method. As shown in FIG. 1, which is a schematic flowchart of the video segment marking method provided by an embodiment of this application, the method includes:

Step S101: Recognizing continuous image frames in a video clip, and obtaining attribute information corresponding to at least one target image frame in the continuous image frames.

In this embodiment, the video clip includes multiple consecutive image frames, and the number of consecutive image frames in the video clip is not limited. For example, when processing a long video, one long video can be divided into multiple short video segments, and the number of consecutive image frames included in each video segment can be a fixed value or a non-fixed value.

In this embodiment, one or more image recognition algorithms may be used to recognize the consecutive image frames in the video segment. The type of image recognition algorithm is not limited; in practical applications it can be selected according to the video processing requirements or the configuration of the hardware that performs the processing.

In this embodiment, the target image frames are some or all of the continuous image frames in the video clip. After at least one image recognition algorithm recognizes the continuous image frames, attribute information identifying the recognition result of the target image frame can be generated. The types of information included in the attribute information and the way that information is identified are not limited; they mainly depend on the image recognition algorithm that recognizes the target image frame.

For example, an image recognition algorithm for identifying object categories can be used to obtain attribute information identifying whether the target image frame includes objects such as people, cats, or dogs; an image recognition algorithm for identifying scene categories can be used to obtain attribute information identifying whether the target image frame includes scene objects such as sky, sea, or grass.

Step S102: Obtain tag description information of the video clip according to the attribute information corresponding to the target image frame.

In this embodiment, the mark description information records a description of the image recognition result of the target image frame, so that subsequent video processing operations, such as similarity comparison and clustering between video clips, can be performed according to it. The manner in which the mark description information describes the image recognition result is not limited. For example, it can describe how many cats appear in the video clip in total, or the order of magnitude of the number of cats that appear in the video clip, and so on.

In this embodiment, the mark description information includes information recorded based on bits, the length of the bits is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4. Here, the value of T can be determined according to the subsequent video processing requirements and/or the image recognition results of the continuous image frames in the video clip; the value of N can be determined according to the subsequent video processing requirements and/or the hardware storage space available for data processing.

For example, if the object categories in the target image frame include human, cat, and dog, then T is 3 and the bit length is 3N. When N is 4, each category occupies 4 bits, so the three categories of human, cat, and dog require a total of 12 bits for recording.
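As a non-limiting illustration, the bit-length calculation in the preceding example can be sketched in Python as follows. The function name descriptor_bit_length and its parameters are hypothetical and are not part of the described method; the sketch only computes T*N and checks that N is at least 4.

```python
def descriptor_bit_length(categories, bits_per_category):
    """Return T*N, the total number of bits in the mark description information."""
    if bits_per_category < 4:
        raise ValueError("N must be an integer greater than or equal to 4")
    # T is the number of object categories; N is the number of bits per category.
    return len(categories) * bits_per_category


print(descriptor_bit_length(["human", "cat", "dog"], 4))  # 12
```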

In this embodiment, the bit is the smallest storage unit of a computer, and the value of a bit is either 0 or 1. The more bits are used, the more complex the image information that can be recorded. The bit-based recording rule is not limited in this embodiment; in practical applications, it can be set according to subsequent video processing requirements and/or the content of the video clip.

For example, if the mark description information is used to record the number of faces included in all target image frames of the video clip and N is set to 4, 0001 can record a total of 0 faces, 0010 a total of 1 face, 0100 a total of 2 faces, and 1000 a total of more than 3 faces.

As another example, when N is set to 5, 00000 can record a total of 0 faces, 00001 a total of 1 face, 00010 a total of 2 faces, 00011 a total of 3 faces, 00100 a total of 4 faces, and so on.
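The two example recording rules above may be sketched in Python as follows. This is only one possible reading: the function names are hypothetical, and the sketch assumes that in the 4-bit rule every count of 3 or more shares the code 1000, and that in the 5-bit rule counts too large for the field are clamped to the largest representable value. Neither assumption is stated in the examples themselves.

```python
def encode_one_hot_bucket(face_count):
    """N = 4 rule: exactly one bit is set, and its position names the bucket."""
    if face_count < 0:
        raise ValueError("face_count must be non-negative")
    # Assumption: counts of 3 and above all fall into the last bucket (1000).
    bucket = min(face_count, 3)
    return format(1 << bucket, "04b")


def encode_binary_count(face_count, n_bits=5):
    """N = 5 rule: the field stores the count itself as an unsigned binary number."""
    if face_count < 0:
        raise ValueError("face_count must be non-negative")
    # Assumption: counts that do not fit in n_bits are clamped to the maximum.
    max_count = (1 << n_bits) - 1
    return format(min(face_count, max_count), f"0{n_bits}b")


print([encode_one_hot_bucket(c) for c in range(3)])  # ['0001', '0010', '0100']
print([encode_binary_count(c) for c in range(5)])
# ['00000', '00001', '00010', '00011', '00100']
```

The first rule trades precision for a fixed, easily compared bucket; the second keeps the exact count up to the capacity of the field.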

Recording with bit-based information has two advantages: on the one hand, the results of image processing by different image recognition algorithms can be recorded in a unified manner, which is convenient for subsequent video processing operations; on the other hand, storage space can be greatly saved.

Optionally, based on multiple application tests, in order to be widely applicable to video clips with different contents without occupying much storage space, N is preferably equal to 8.

It can be seen from the above that the embodiments of the present invention first identify the continuous image frames in the video clip to obtain the attribute information corresponding to at least one target image frame in the continuous image frames, and then obtain the mark description information of the video segment according to the attribute information corresponding to the target image frame, where the mark description information includes information recorded based on bits, the length of the bits is T*N, T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4. Therefore, the embodiments of the present invention can not only record, in a unified manner, the results of recognizing the continuous image frames in the video segment with multiple image recognition algorithms, but also greatly save data storage space.
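With N equal to 8, each category occupies exactly one byte, so a descriptor for T categories fits in T bytes. The following Python sketch illustrates this under stated assumptions: the names pack_counts and unpack_counts are hypothetical, each field is assumed to hold a plain count, and counts above 255 are assumed to saturate at 255. The embodiment does not prescribe any of these choices.

```python
def pack_counts(counts, categories):
    """Pack one count per category into T bytes (N = 8 bits per category)."""
    # Assumption: a missing category counts as 0 and large counts saturate at 255.
    return bytes(min(counts.get(name, 0), 255) for name in categories)


def unpack_counts(blob, categories):
    """Recover the per-category counts from a packed descriptor."""
    return dict(zip(categories, blob))


categories = ["human", "cat", "dog"]
blob = pack_counts({"human": 2, "dog": 3}, categories)
print(len(blob) * 8)                    # 24 bits, i.e. T*N with T = 3 and N = 8
print(unpack_counts(blob, categories))  # {'human': 2, 'cat': 0, 'dog': 3}
```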

Embodiment 2

Embodiment 2 of the present application provides a video segment marking method. As shown in FIG. 2, which is a schematic flowchart of the video segment marking method provided by an embodiment of the application, the method includes:

Step S201: Recognizing continuous image frames in a video clip to obtain attribute information corresponding to at least one target image frame in the continuous image frames, where the attribute information includes identification mark information.

In this embodiment, a variety of different image recognition algorithms can be used to recognize the consecutive image frames in a video clip, and the target image frame, or the objects included in it, can be classified from multiple angles to obtain the identification mark information corresponding to at least one target image frame. The identification mark information is used to identify the identification mark of at least one object category corresponding to the target image frame. One target image frame, or one object in the target image frame, may correspond to identification marks of one or more object categories, and each object category may include a variety of different identification marks.

For example, when image recognition algorithms classify the objects in the continuous image frames of the video clip under both an "object category" and a "dog category", and the target image frame includes three dogs, the identification mark information can use "DOG" as the identification mark of the three dogs in the "object category", and use "01", "02", and "03" as the identification marks of the three dogs in the "dog category".
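One way to picture the identification mark information in this example is as a list of (category, label) pairs attached to each detected object. The Python sketch below is illustrative only: the class IdentificationMark and the helper labels_in_category are hypothetical names, and the embodiment does not specify any particular data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationMark:
    category: str  # the classification angle, e.g. "object category"
    label: str     # the mark within that category, e.g. "DOG" or "01"


# One target image frame containing three dogs; each inner list holds the
# identification marks attached to one detected object.
frame_objects = [
    [IdentificationMark("object category", "DOG"), IdentificationMark("dog category", "01")],
    [IdentificationMark("object category", "DOG"), IdentificationMark("dog category", "02")],
    [IdentificationMark("object category", "DOG"), IdentificationMark("dog category", "03")],
]


def labels_in_category(objects, category):
    """Return the labels that the frame's objects carry in one category."""
    return [mark.label for marks in objects for mark in marks if mark.category == category]


print(labels_in_category(frame_objects, "dog category"))     # ['01', '02', '03']
print(labels_in_category(frame_objects, "object category"))  # ['DOG', 'DOG', 'DOG']
```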

Optionally, to reduce the amount of data processing as much as possible while still meeting subsequent video processing requirements, the continuous image frames in the video segment may be identified only for common object categories. Specifically, the identification mark information may include at least one of the following: object category information, used to identify the identification mark of the object category corresponding to the target image frame; scene category information, used to identify the identification mark of the scene object category corresponding to the target image frame; and face category information, used to identify the identification mark of the face object category corresponding to the target image frame.

The object category classifies the objects included in the target image frame; the angle of classification and the corresponding identification marks can be determined according to the video processing requirements or the image recognition algorithm adopted. For example, the identification marks corresponding to the object category can identify objects of different animal categories such as "people", "cats", and "dogs", or objects of different broader categories such as "animals", "plants", and "daily necessities".

The scene object category classifies the scene objects included in the target image frame; the angle of classification and the corresponding identification marks can be determined according to the video processing requirements or the image recognition algorithm used. For example, the identification marks corresponding to the scene object category can identify scene objects of different weather categories such as "rainy", "sunny", and "cloudy", or scene objects of different background categories such as "grassland", "sky", and "sea".

The face object category classifies the face objects included in the target image frame; the angle of classification and the corresponding identification marks can be determined according to the video processing requirements or the image recognition algorithm used. For example, the identification marks corresponding to the face object category can identify face objects of different age groups such as "elderly", "middle-aged", and "child", or face objects of different face shapes such as "round face", "square face", and "melon-seed face".

Optionally, with the development of the Internet and of video shooting technologies, users pay increasing attention to the recognition and processing of human subjects when shooting or processing video. Therefore, to meet the needs of most users, the face objects in the target image frame can be recognized and marked in more categories. Specifically, the face category information includes at least one of the following sub-information: expression sub-attribute information, used to identify the identification mark of the expression category corresponding to the target image frame; orientation sub-attribute information, used to identify the identification mark of the orientation category corresponding to the target image frame; and gender sub-attribute information, used to identify the identification mark of the gender corresponding to the target image frame.

The expression category classifies the human faces included in the target image frame according to expression. For example, the identification marks corresponding to the expression category can identify facial expressions such as "laughing", "crying", and "in a daze".

The orientation category classifies the faces included in the target image frame according to face orientation. For example, the identification marks corresponding to the orientation category can identify face orientations such as "front", "back", and "side".

The gender category classifies the faces included in the target image frame according to gender. For example, the identification marks corresponding to gender can identify "male", "female", and "uncertain".
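The three face sub-attributes described above could, for instance, be modelled as enumerations with optional fields, reflecting that only at least one of them needs to be present. The Python sketch below is an assumption-laden illustration: the type names and the rule enforced in __post_init__ are hypothetical, and the enumeration members simply repeat the example values given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Expression(Enum):
    LAUGHING = "laughing"
    CRYING = "crying"
    IN_A_DAZE = "in a daze"


class Orientation(Enum):
    FRONT = "front"
    BACK = "back"
    SIDE = "side"


class Gender(Enum):
    MALE = "male"
    FEMALE = "female"
    UNCERTAIN = "uncertain"


@dataclass
class FaceCategoryInfo:
    # Each sub-attribute is optional because only "at least one" is required.
    expression: Optional[Expression] = None
    orientation: Optional[Orientation] = None
    gender: Optional[Gender] = None

    def __post_init__(self):
        if self.expression is None and self.orientation is None and self.gender is None:
            raise ValueError("at least one sub-attribute is required")


face = FaceCategoryInfo(expression=Expression.LAUGHING, orientation=Orientation.FRONT)
print(face.expression.value, face.orientation.value, face.gender)  # laughing front None
```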

Step S202: Determine the number of identification marks corresponding to at least one first target category in the target image frame according to the identification mark information corresponding to the target image frame.

In this embodiment, to reduce the amount of data processing and storage, the mark description information obtained subsequently may describe and record only the more important object categories. Specifically, in step S202, at least one of all object categories may be determined as the first target category, so that the number of all identification marks corresponding to the first target category in the target image frames can be determined according to the identification mark information corresponding to the target image frames.

For example, suppose the first target category is "dog", and target image frame A, target image frame B, and target image frame C in the video clip all include identification marks corresponding to "dog". If the identification marks "01" and "02" mark the two dogs appearing in target image frame A, the identification marks "01" and "03" mark the two dogs appearing in target image frame B, and the identification mark "02" marks the one dog appearing in target image frame C, then three dogs are marked in the video clip, with the identification marks "01", "02", and "03". That is, the number of all identification marks corresponding to "dog" in the video clip is 3.
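The counting in this example amounts to taking the union of the identification marks seen across the target image frames and measuring its size. A minimal Python sketch follows; the function name and the dictionary layout (frame name mapped to the marks found in that frame) are assumptions made only for illustration.

```python
def count_identification_marks(marks_by_frame):
    """Count the distinct identification marks of one category across frames."""
    seen = set()
    for marks in marks_by_frame.values():
        # A mark that recurs in several frames is still counted once.
        seen.update(marks)
    return len(seen)


dog_marks_by_frame = {
    "A": ["01", "02"],
    "B": ["01", "03"],
    "C": ["02"],
}
print(count_identification_marks(dog_marks_by_frame))  # 3
```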

Step S203: Obtain mark description information of the video clip according to the number of identification marks corresponding to at least one first target category in the target image frame.

In this embodiment, the recording method of the mark description information of the video segment is the same as that in step S102 in the first embodiment, and the details are not described herein again in this embodiment.

In this embodiment, in order to record all or some of the more important identification marks obtained by recognition for subsequent video processing, the method may further include: obtaining mark recording information of the video segment according to the identification mark information corresponding to the target image frame, where the mark recording information is used to record the identification mark corresponding to at least one second target category in the target image frame.

The second target category can be the same as or different from the aforementioned first target category. In addition, the identification marks recorded for the second target category can be all of the identification marks corresponding to the second target category or only some of them, and can be selected reasonably according to the subsequent video processing requirements in practical applications.

For example, if the determined second target category is the expression category, and the target image frame includes three identification marks corresponding to the expression category, namely "laughing", "crying", and "in a daze", then the mark recording information can record only the two identification marks "laughing" and "crying", or it can record all three identification marks.

Optionally, because the content of the identification marks, or the way they are expressed, differs between the image recognition algorithms that recognize the target image frame, all identification marks corresponding to the second target category can be recorded in the same way to facilitate subsequent video processing.

Optionally, in order to save storage space, an int type ID may be used to record the identification mark corresponding to the second target category, where each ID corresponds to an identification mark.
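A simple way to realize such int-type IDs is a registry that hands out the next unused integer the first time a mark is seen. The Python sketch below is hypothetical (the class MarkRegistry is not named in the embodiment) and assumes IDs start at 0 and are assigned in order of first appearance.

```python
class MarkRegistry:
    """Assign a small integer ID to each distinct identification mark."""

    def __init__(self):
        self._ids = {}

    def id_for(self, mark):
        # Reuse the existing ID, or hand out the next unused integer.
        return self._ids.setdefault(mark, len(self._ids))


registry = MarkRegistry()
# Marks reported by different algorithms end up in the same integer form.
mark_record = [registry.id_for(m) for m in ["laughing", "crying", "laughing", "in a daze"]]
print(mark_record)  # [0, 1, 0, 2]
```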

It can be seen from the above that, according to the identification mark information corresponding to the target image frame, the embodiment of the present invention can obtain description information of the video clip that records the number of identification marks corresponding to at least one first target category. Identifying the continuous image frames in the video clip only for the object categories commonly used in subsequent video processing reduces the amount of data processing and storage, and recording the mark recording information in a unified way facilitates subsequent management and use of the data.

Embodiment 3

Embodiment 3 of the present application provides a video segment marking method. As shown in FIG. 3, which is a schematic flowchart of the video segment marking method provided by an embodiment of the application, the method includes:

Step S301: Recognizing continuous image frames in the video segment to obtain attribute information corresponding to at least one target image frame in the continuous image frames, where the attribute information includes identification mark information and time information.

In this embodiment, because each of the continuous image frames in the video clip has a corresponding timestamp, time information identifying the timestamp corresponding to the target image frame can be obtained when the continuous image frames in the video clip are recognized, so that time-related information of the video clip can be described.

Step S302: Obtain the tag description information of the video segment according to the identification tag information corresponding to the target image frame, and obtain the first time description information and/or the second time description information according to the time information corresponding to the target image frame.

In this embodiment, the first time description information is used to record the timestamp corresponding to each target image frame that includes at least one target mark, so that the time at which the object or target image frame identified by the target mark appears in the video clip can be determined according to the first time description information. With the first time description information, subsequent video processing operations, such as clustering and screening of the target image frames or video clips that include the target mark, can be performed more conveniently.

For example, when a video clip is recognized and described, the user may be mainly interested in when a cat appears. To meet this requirement, the cat can be identified in the target image frames with a preset target mark; by obtaining the timestamp corresponding to each of the at least one target image frame that includes the target mark, the total appearance time of the cat in the video segment can be determined; finally, first time description information describing the appearance time of the cat in the video segment can be generated.

Optionally, in order to effectively store data and save storage space, the first time description information can be recorded using an array structure, where the numbers stored in the array are used to identify the timestamp corresponding to the target image frame including at least one target mark.

In practical applications, the target mark usually marks objects required for subsequent video processing or objects that the user pays more attention to. The target mark is one or more of the identification marks corresponding to at least one object category and can be preset according to video description requirements.
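The array-based first time description information described above may be sketched in Python as follows. The sketch assumes that each frame is given as a (timestamp, marks) pair, that timestamps are non-negative integers in milliseconds, and that "cat" is the preset target mark; the function name is hypothetical.

```python
from array import array


def first_time_description(frames, target_mark):
    """Collect the timestamps of the target image frames that contain the target mark."""
    # "Q" stores unsigned 64-bit integers compactly.
    # Assumption: timestamps are non-negative integers in milliseconds.
    stamps = array("Q")
    for timestamp_ms, marks in frames:
        if target_mark in marks:
            stamps.append(timestamp_ms)
    return stamps


frames = [(0, {"cat"}), (40, {"dog"}), (80, {"cat", "dog"})]
print(list(first_time_description(frames, "cat")))  # [0, 80]
```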

In this embodiment, the second time description information is used to record the start time stamp and the end time stamp of the video segment, so that the start and end time of the video segment can be determined subsequently based on the second time description information.

Wherein, the start timestamp of the video segment is the timestamp corresponding to the first one of the continuous image frames of the video segment, and the end timestamp of the video segment is the timestamp corresponding to the last one of the continuous image frames of the video segment.

Optionally, in order to effectively perform data recording and save storage space, the second time description information may be recorded using a series of numbers that identify the start time stamp and the end time stamp.
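Similarly, the second time description information reduces to two numbers: the timestamp of the first frame and the timestamp of the last frame. A minimal Python sketch follows, assuming the frame timestamps are supplied in playback order; the function name is hypothetical.

```python
def second_time_description(frame_timestamps):
    """Return [start, end] timestamps of a video segment."""
    if not frame_timestamps:
        raise ValueError("the video segment has no frames")
    # Assumption: timestamps are listed in playback order, so the first and
    # last entries belong to the first and last continuous image frames.
    return [frame_timestamps[0], frame_timestamps[-1]]


print(second_time_description([1000, 1040, 1080]))  # [1000, 1080]
```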

It can be seen from the above that the embodiment of the present invention obtains the first time description information and/or the second time description information according to the time information corresponding to the target image frame, and can thereby describe and record the time-related information of the video clip. The information describing the video segment may thus include multiple types, namely mark description information, first time description information, and/or second time description information, which better meets subsequent video processing requirements.

Embodiment 4

As shown in FIG. 4, a video processing device 40 provided in Embodiment 4 of this application includes a memory 401, a processor 402, and a video collector 403. The video collector 403 is used to collect a target to be tracked in a target area; the memory 401 is used to store program code; and the processor 402 calls the program code and, when the program code is executed, performs the following operations:

In one embodiment, the attribute information includes identification mark information for identifying the identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video clip according to the attribute information corresponding to the target image frame includes:

In an embodiment, the identification mark information includes at least one of the following information:

In an embodiment, the face category information includes at least one of the following sub-information:

Expression sub-attribute information used to identify the recognition tag of the expression category corresponding to the target image frame; orientation sub-attribute information used to identify the recognition tag of the orientation category corresponding to the target image frame; gender sub-attribute information used to identify the gender identification mark corresponding to the target image frame.

In one embodiment, the processor calls the program code, and when the program code is executed, it is further configured to perform the following operation: obtain the mark record information of the video clip according to the identification mark information corresponding to the target image frame, wherein the mark record information is used to record the identification mark corresponding to at least one second target category in the target image frame.
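A minimal sketch of how such mark record information might be assembled is given below in Python. It assumes that recognition yields, for each target image frame, a list of (category, mark) pairs; this data layout, the function name `build_mark_record`, and the example category names are hypothetical and are not taken from this application.

```python
def build_mark_record(frame_marks, second_target_categories):
    """Collect the distinct identification marks of the selected categories.

    frame_marks: one list of (category, mark) pairs per target image frame.
    second_target_categories: set of category names whose marks are recorded.
    Marks are returned in order of first appearance.
    """
    record = []
    for marks in frame_marks:
        for category, mark in marks:
            entry = (category, mark)
            if category in second_target_categories and entry not in record:
                record.append(entry)
    return record


frame_marks = [
    [("scene", "beach"), ("object", "dog")],
    [("scene", "beach"), ("face", "smile")],
]
record = build_mark_record(frame_marks, {"scene", "face"})
print(record)
```

Here the "object" marks are skipped because that category is not among the selected second target categories, and the repeated "beach" mark is recorded only once.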

In an embodiment, the attribute information includes time information used to identify the timestamp corresponding to the target image frame; the processor calls the program code, and when the program code is executed, it is further used to perform the following operation: obtain the first time description information and/or the second time description information of the video clip according to the time information corresponding to the target image frame, wherein the first time description information is used to record the timestamp corresponding to a target image frame that includes at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.
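The operation above can be sketched in Python as follows. The `Frame` class, its field names, the millisecond unit, and the use of string mark names are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp_ms: int   # assumed unit: milliseconds
    marks: frozenset    # identification marks recognized in this frame


def time_descriptions(frames, target_marks):
    """Return (first, second) time description information.

    first:  timestamps of frames containing at least one target mark.
    second: (start timestamp, end timestamp) of the whole video segment.
    """
    if not frames:
        raise ValueError("a video segment needs at least one image frame")
    first = [f.timestamp_ms for f in frames if f.marks & target_marks]
    second = (frames[0].timestamp_ms, frames[-1].timestamp_ms)
    return first, second


frames = [
    Frame(0, frozenset()),
    Frame(40, frozenset({"face"})),
    Frame(80, frozenset({"dog"})),
    Frame(120, frozenset({"face", "dog"})),
]
first, second = time_descriptions(frames, frozenset({"face"}))
print(first, second)
```

With "face" as the only target mark, the first time description lists the timestamps of the second and fourth frames, while the second time description spans the whole segment.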

In one embodiment, the N is equal to 8.
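To illustrate a bit record of length T*N with N equal to 8, the Python sketch below stores one N-bit field per object category, so that T categories occupy exactly T bytes. It assumes that each field holds the number of identification marks counted for that category and that counts above the field maximum are saturated; the function names and the saturation behaviour are assumptions, not details given in this application.

```python
def pack_mark_description(counts, n_bits=8):
    """Pack T per-category counts into a single integer of T * n_bits bits."""
    if n_bits < 4:
        raise ValueError("N must be an integer greater than or equal to 4")
    max_count = (1 << n_bits) - 1
    value = 0
    for index, count in enumerate(counts):
        if count < 0:
            raise ValueError("counts must be non-negative")
        clamped = min(count, max_count)       # assumption: saturate on overflow
        value |= clamped << (index * n_bits)  # category index -> bit offset
    return value


def unpack_mark_description(value, t, n_bits=8):
    """Recover the T per-category fields from the packed integer."""
    mask = (1 << n_bits) - 1
    return [(value >> (i * n_bits)) & mask for i in range(t)]


counts = [3, 0, 300]                    # T = 3 object categories
packed = pack_mark_description(counts)  # 3 * 8 = 24 bits
print(packed.to_bytes(3, "little"), unpack_mark_description(packed, 3))
```

In this sketch the third count (300) does not fit in 8 bits and is stored as 255, which shows the trade-off between the field width N and the largest value that can be recorded.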

Example five

A handheld camera, including the video processing device described in the fourth embodiment, further includes: a carrier, which is fixedly connected to the video collector, and is configured to carry at least a part of the video collector.

In one embodiment, the carrier includes, but is not limited to, a handheld pan/tilt.

In one embodiment, the handheld pan/tilt is a handheld three-axis pan/tilt.

In one embodiment, the video capture device includes, but is not limited to, a handheld three-axis pan-tilt camera.

The basic structure of the handheld pan/tilt camera is briefly introduced below.

As shown in FIG. 5, the handheld pan/tilt head 1 of the embodiment of the present invention includes a handle 11 and a photographing device 12 mounted on the handle 11. In this embodiment, the photographing device 12 may include a three-axis pan/tilt camera; in other embodiments, it may include a pan/tilt camera with two axes or with more than three axes.

The handle 11 is provided with a display screen 13 for displaying the shooting content of the shooting device 12. The invention does not limit the type of the display screen 13.

By arranging the display screen 13 on the handle 11 of the handheld PTZ 1, the display screen can display the shooting content of the shooting device 12, so that the user can quickly browse the pictures or videos shot by the shooting device 12 through the display screen 13. This improves the interactivity and enjoyment of the handheld PTZ 1 for the user and meets the diverse needs of the user.

In one embodiment, the handle 11 is further provided with an operating function unit for controlling the camera 12. By operating the operating function unit, the user can control the operation of the camera 12, for example, turning the camera 12 on and off, controlling the shooting of the camera 12, and controlling the posture change of the pan-tilt part of the camera 12, so that the user can quickly operate the camera 12. The operating function unit may be in the form of a button, a knob, or a touch screen.

In one embodiment, the operating function unit includes a photographing button 14 for controlling the photographing of the photographing device 12, a power/function button 15 for controlling the opening and closing of the photographing device 12 and other functions, and a universal key 16 for controlling the movement of the pan/tilt. Of course, the operating function unit may also include other control buttons, such as image storage buttons and image playback control buttons, which can be set according to actual needs.

In one embodiment, the operation function part and the display screen 13 are arranged on the same side of the handle 11. The arrangement of the operation function part and the display screen 13 shown in the figure is ergonomic, and at the same time makes the overall appearance and layout of the handheld PTZ 1 more reasonable and attractive.

Further, the side of the handle 11 is provided with a function operation key A, which lets the user quickly and intelligently generate a finished video with one key. When the camera is turned on, clicking the orange side button on the right side of the body turns on the function, and the camera automatically shoots a segment of video at regular intervals, capturing a total of N segments (N≥2). After the camera is connected to a mobile device such as a mobile phone and the "One-click to film" function is selected, the system intelligently screens the shots and matches suitable templates to quickly generate a polished work.

In an optional embodiment, the handle 11 is further provided with a card slot 17 for inserting a storage element. In this embodiment, the card slot 17 is provided on the side of the handle 11 adjacent to the display screen 13, and a memory card is inserted into the card slot 17 so that the images taken by the camera 12 are stored in the memory card. Moreover, arranging the card slot 17 on the side does not affect the use of other functions and gives a better user experience.

In an embodiment, a power supply battery for supplying power to the handle 11 and the imaging device 12 may be provided inside the handle 11. The power supply battery can be a lithium battery with large capacity and small size to realize the miniaturized design of the handheld pan/tilt 1.

In one embodiment, the handle 11 is also provided with a charging interface/USB interface 18. In this embodiment, the charging interface/USB interface 18 is provided at the bottom of the handle 11 to facilitate connection with an external power source or storage device, so as to charge the power supply battery or perform data transmission.

In one embodiment, the handle 11 is further provided with a sound pickup hole 19 for receiving audio signals, and the sound pickup hole 19 communicates with a microphone inside. There may be one or more sound pickup holes 19. The handle 11 also includes an indicator light 20 for displaying status. The user can realize audio interaction with the display screen 13 through the sound pickup hole 19. The indicator light 20 can serve as a reminder: through the indicator light 20, the user can learn the power status of the handheld PTZ 1 and the status of the function currently being executed. In addition, the sound pickup hole 19 and the indicator light 20 can also be arranged on the front of the handle 11, which better suits the user's habits and is more convenient to operate.

In one embodiment, the imaging device 12 includes a pan-tilt support and an imager mounted on the pan-tilt support. The imager may be a camera, or an image pickup element composed of a lens and an image sensor (such as CMOS or CCD), which can be selected according to needs. The imager may be integrated on the pan-tilt support, so that the photographing device 12 is a pan-tilt camera; it may also be an external photographing device that is detachably connected to, or clamped onto, the pan-tilt support.

In one embodiment, the pan/tilt support is a three-axis pan/tilt support, and the photographing device 12 is a three-axis pan/tilt camera. The three-axis pan/tilt head bracket includes a yaw axis assembly 22, a roll axis assembly 23 movably connected to the yaw axis assembly 22, and a pitch axis assembly 24 movably connected to the roll axis assembly 23. The camera is mounted on the pitch axis assembly 24. The yaw axis assembly 22 drives the camera 12 to rotate in the yaw direction. Of course, in other examples, the pan/tilt support can also be a two-axis pan/tilt, a four-axis pan/tilt, etc., which can be specifically selected according to needs.

In one embodiment, a mounting portion is further provided at one end of the connecting arm connected to the roll shaft assembly, and the yaw shaft assembly may be arranged in the handle, so that the yaw shaft assembly drives the camera 12 to rotate together in the yaw direction.

In an alternative embodiment, as shown in FIG. 6, the handle 11 is provided with an adapter 26 for coupling with a mobile device 2 (such as a mobile phone), and the adapter 26 is detachably connected to the handle 11. The adapter 26 protrudes from the side of the handle for connecting to the mobile device 2. When the adapter 26 is connected to the mobile device 2, the handheld pan/tilt 1 is docked with the adapter 26 and is supported at the end of the mobile device 2.

The handle 11 is provided with an adapter 26 for connecting with the mobile device 2, so that the handle 11 and the mobile device 2 are connected to each other. The handle 11 can serve as a base for the mobile device 2. The user can hold the other end of the mobile device 2 to pick up and operate the handheld PTZ 1 together with it; the connection is convenient and fast, and the product is attractive. In addition, after the handle 11 is coupled to the mobile device 2 through the adapter 26, a communication connection between the handheld pan-tilt 1 and the mobile device 2 can be realized, and the camera 12 and the mobile device 2 can transmit data.

In one embodiment, the adapter 26 and the handle 11 are detachably connected, that is, the adapter 26 and the handle 11 can be mechanically connected or removed. Further, the adapter 26 is provided with an electrical contact portion, and the handle 11 is provided with an electrical contact matching portion that matches with the electrical contact portion.

In this way, when the handheld PTZ 1 does not need to be connected to the mobile device 2, the adapter 26 can be removed from the handle 11. When the handheld PTZ 1 needs to be connected to the mobile device 2, the adapter 26 is installed on the handle 11 to complete the mechanical connection between the adapter 26 and the handle 11, while the connection between the electrical contact part and the electrical contact mating part ensures the electrical connection between the two, so as to realize data transmission between the camera 12 and the mobile device 2 through the adapter 26.

In one embodiment, as shown in FIG. 5, a receiving groove 27 is provided on the side of the handle 11, and the adapter 26 is slidably clamped in the receiving groove 27. After the adapter 26 is installed in the receiving slot 27, the adapter 26 partially protrudes from the receiving slot 27, and the portion of the adapter 26 protruding from the receiving slot 27 is used to connect with the mobile device 2.

In one embodiment, referring to FIG. 5, when the adapter 26 is inserted into the receiving groove 27 from the adapter part, the adapter 26 is flush with the receiving groove 27, and the adapter 26 is then stored in the receiving groove 27 of the handle 11.

Therefore, when the handheld platform 1 needs to be connected to the mobile device 2, the adapter 26 can be inserted into the receiving groove 27 from the adapter part, so that the adapter 26 protrudes from the receiving groove 27 and the mobile device 2 and the handle 11 can be connected to each other.

After the mobile device 2 is no longer in use, or when the mobile device 2 needs to be unplugged, the adapter 26 can be taken out of the receiving slot 27 of the handle 11 and then inserted into the receiving slot 27 in the reverse direction, so that the adapter 26 is stored in the handle 11. The adapter 26 is then flush with the receiving groove 27 of the handle 11. After the adapter 26 is stored in the handle 11, the surface of the handle 11 remains flat, and storing the adapter 26 in the handle 11 makes it easier to carry.

In one embodiment, the receiving groove 27 is semi-opened on one side surface of the handle 11, which makes it easier for the adapter 26 to be slidably connected to the receiving groove 27. Of course, in other examples, the adapter 26 can also be detachably connected to the receiving slot 27 of the handle 11 by means of a snap connection, a plug connection, or the like.

In one embodiment, the receiving groove 27 is provided on the side of the handle 11. When the transfer function is not used, the receiving groove 27 is clamped and covered by the cover 28, which is convenient for the user to operate and does not affect the overall appearance of the front and sides of the handle.

In an embodiment, the electrical contact part and the electrical contact mating part may be electrically connected by contact. For example, the electrical contact part may be a telescopic probe, an electrical plug-in interface, or an electrical contact. Of course, in other examples, the electrical contact part and the electrical contact mating part can also be directly connected to each other by surface-to-surface contact.

A1. A method for marking video clips, characterized in that it comprises:

A2. The video segment marking method according to A1, wherein the attribute information includes identification mark information for identifying identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:

A3. The video clip marking method according to A2, wherein the identification marking information includes at least one of the following information:

Object category information used to identify the identification mark of the object category corresponding to the target image frame;

Scene category information used to identify the identification mark of the scene object category corresponding to the target image frame;

Face category information used to identify the recognition mark of the face object category corresponding to the target image frame.

A4. The video clip marking method according to A3, wherein the face category information includes at least one of the following sub-information:

Expression sub-attribute information of the recognition tag used to identify the expression category corresponding to the target image frame;

Orientation sub-attribute information of the identification mark used to identify the orientation category corresponding to the target image frame;

The gender sub-attribute information of the identification mark used to identify the gender corresponding to the target image frame.

A5. The video segment marking method according to A2, wherein the method further includes:

According to the identification mark information corresponding to the target image frame, the mark record information of the video clip is obtained, wherein the mark record information is used to record the identification mark corresponding to at least one second target category in the target image frame.

A6. The video segment marking method according to A2, wherein the attribute information includes time information used to identify a timestamp corresponding to the target image frame, and the method further includes:

According to the time information corresponding to the target image frame, the first time description information and/or the second time description information of the video clip is obtained, wherein the first time description information is used to record the timestamp corresponding to a target image frame that includes at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.

A7. The video segment marking method according to A1, wherein the N is equal to 8.

A8. A video segment marking device, characterized by comprising: a memory, a processor, and a video collector, wherein the video collector is used to collect a target to be tracked in a target area; the memory is used to store program code; and the processor calls the program code, and when the program code is executed, it is used to perform the following operations:

A9. The video clip marking device according to A8, wherein the attribute information includes identification mark information for identifying identification marks of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:

A10. The video clip marking device according to A9, wherein the identification marking information includes at least one of the following information:

A11. The video clip marking device according to A10, wherein the face category information includes at least one of the following sub-information:

A12. The video clip marking device according to A9, wherein the processor calls the program code, and when the program code is executed, it is further configured to perform the following operations:

A13. The video clip marking device according to A9, wherein the attribute information includes time information used to identify the timestamp corresponding to the target image frame; the processor calls the program code, and when the program code is executed, it is further used to perform the following operations:

A14. The video segment marking device according to A8, wherein the N is equal to 8.

A15. A handheld camera, characterized by comprising the video clip marking device according to any one of A8-A14, and further comprising: a carrier, which is fixedly connected to the video collector and is used to carry at least a part of the video collector.

A16. The handheld camera according to A15, wherein the carrier includes but is not limited to a handheld pan/tilt.

A17. The handheld camera according to A16, wherein the handheld PTZ is a handheld three-axis PTZ.

A18. The handheld camera according to A15, wherein the video capture device includes but is not limited to a handheld three-axis pan-tilt camera.

So far, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown in order to achieve the desired result. In certain embodiments, multitasking and parallel processing may be advantageous.

In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program the device themselves to "integrate" a digital system onto a PLD, without requiring a chip manufacturer to design and manufacture a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this kind of programming is nowadays mostly realized with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must likewise be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. It should also be clear to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include but are not limited to the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of the memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, such a controller can be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component. The devices for realizing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, devices, modules, or units illustrated in the above embodiments may be specifically implemented by computer chips or entities, or implemented by products with certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cell phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.

This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device that realizes the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can guide a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

It should also be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or equipment including a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, commodity, or equipment. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, commodity, or equipment that includes the element.

Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system, or a computer program product. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

The various embodiments in this specification are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the difference from other embodiments. In particular, as for the system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for related parts, please refer to the part of the description of the method embodiment.

The above descriptions are only examples of the present application, and are not used to limit the present application. For those skilled in the art, this application can have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included in the scope of the claims of this application.

Claims

A video segment marking method, characterized in that it comprises:

Recognizing continuous image frames in the video segment to obtain attribute information corresponding to at least one target image frame in the continuous image frames;

According to the attribute information corresponding to the target image frame, the mark description information of the video segment is obtained, wherein the mark description information includes information recorded based on bits, and the length of the bits is T*N, where T represents the number of object categories in the target image frame and N is an integer greater than or equal to 4.
The video segment marking method according to claim 1, wherein the attribute information includes identification mark information used to identify an identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:

Determine the number of the identification marks corresponding to at least one first target category in the target image frame according to the identification mark information corresponding to the target image frame;

According to the number of the identification marks corresponding to at least one first target category in the target image frame, the mark description information of the video clip is obtained.
The video segment marking method according to claim 2, wherein the identification marking information includes at least one of the following information:

Object category information used to identify the identification mark of the object category corresponding to the target image frame;

Scene category information used to identify the identification mark of the scene object category corresponding to the target image frame;

Face category information used to identify the recognition mark of the face object category corresponding to the target image frame.
The video clip marking method according to claim 3, wherein the face category information includes at least one of the following sub-information:

Expression sub-attribute information of the recognition tag used to identify the expression category corresponding to the target image frame;

Orientation sub-attribute information of the identification mark used to identify the orientation category corresponding to the target image frame;

The gender sub-attribute information of the identification mark used to identify the gender corresponding to the target image frame.
The video segment marking method according to claim 2, wherein the method further comprises:

According to the identification mark information corresponding to the target image frame, the mark record information of the video clip is obtained, wherein the mark record information is used to record the identification mark corresponding to at least one second target category in the target image frame.
The video segment marking method according to claim 2, wherein the attribute information includes time information used to identify a timestamp corresponding to the target image frame, and the method further comprises:

According to the time information corresponding to the target image frame, the first time description information and/or the second time description information of the video clip is obtained, wherein the first time description information is used to record the timestamp corresponding to a target image frame that includes at least one target mark, and the second time description information is used to record the start timestamp and the end timestamp of the video segment.
The video segment marking method according to claim 1, wherein N is equal to 8.
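By way of a non-limiting illustration of a record whose length is T*N bits, the sketch below assumes that each of the T object categories occupies one N-bit field holding a count for that category. The function names `pack_counts` and `unpack_counts`, the ordering of the fields, and the saturation of counts that exceed N bits are assumptions made for this sketch and are not taken from the application.

```python
def pack_counts(counts, n_bits=8):
    """Pack one count per object category into an integer of T*N bits.

    Returns (packed_value, total_bit_length), where T = len(counts).
    """
    if n_bits < 4:
        raise ValueError("N must be an integer greater than or equal to 4")
    max_value = (1 << n_bits) - 1
    packed = 0
    for index, count in enumerate(counts):
        if count < 0:
            raise ValueError("counts must be non-negative")
        # Assumption: a count that does not fit in N bits is saturated.
        field_value = min(count, max_value)
        # Assumption: category 0 occupies the least significant field.
        packed |= field_value << (index * n_bits)
    return packed, len(counts) * n_bits


def unpack_counts(packed, num_categories, n_bits=8):
    """Recover the per-category fields from a packed T*N-bit value."""
    mask = (1 << n_bits) - 1
    return [(packed >> (i * n_bits)) & mask for i in range(num_categories)]
```

With N equal to 8, each category occupies exactly one byte, so a record for T categories takes T bytes and each field can hold a value from 0 to 255.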
A video segment marking device, characterized by comprising: a memory, a processor, and a video collector, wherein the video collector is used to collect a target to be tracked in a target area; the memory is used to store program code; and the processor calls the program code and, when the program code is executed, is used to perform the following operations:

Recognizing continuous image frames in the video segment to obtain attribute information corresponding to at least one target image frame in the continuous image frames;

According to the attribute information corresponding to the target image frame, the mark description information of the video segment is obtained, wherein the mark description information includes information recorded based on bits, and the length of the bits is T*N, where T represents the number of object categories in the target image frame, and N is an integer greater than or equal to 4.
The video clip marking device according to claim 8, wherein the attribute information includes identification mark information used to identify an identification mark of at least one object category corresponding to the target image frame; correspondingly, obtaining the mark description information of the video segment according to the attribute information corresponding to the target image frame includes:

Determine the number of the identification marks corresponding to at least one first target category in the target image frame according to the identification mark information corresponding to the target image frame;

According to the number of identification marks corresponding to at least one first target category in the target image frame, the mark description information of the video clip is obtained.
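By way of a non-limiting illustration, the two operations above might be sketched as follows: the identification marks of one target image frame are counted per first target category, and the counts are then written out one byte per category (that is, N equal to 8). The names `count_marks` and `describe_frame`, the representation of marks as (category, mark) pairs, and the clamping of counts to 255 are assumptions made for this sketch and are not taken from the application.

```python
from collections import Counter


def count_marks(frame_marks, first_target_categories):
    """Count identification marks per first target category.

    frame_marks: iterable of (category, mark) pairs recognized in one frame.
    Returns one count per category, in the order of first_target_categories.
    """
    counter = Counter(category for category, _mark in frame_marks)
    return [counter[category] for category in first_target_categories]


def describe_frame(frame_marks, first_target_categories):
    """Build a T-byte description (T*8 bits) from the per-category counts."""
    counts = count_marks(frame_marks, first_target_categories)
    # Assumption: counts above 255 are clamped so each fits in one byte.
    return bytes(min(count, 255) for count in counts)
```

For example, a frame with two face marks and one scene mark, described over the hypothetical categories object, scene, and face, produces a three-byte record.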
The video clip marking device according to claim 9, wherein the identification marking information includes at least one of the following information:

Object category information, which is used to identify the identification mark of the object category corresponding to the target image frame;

Scene category information, which is used to identify the identification mark of the scene object category corresponding to the target image frame;

Face category information, which is used to identify the identification mark of the face object category corresponding to the target image frame.