CN111079741A - Image frame position detection method and device, electronic equipment and storage medium


Info

Publication number
CN111079741A
Authority
CN
China
Prior art keywords
target image
position information
area
region
specific area
Legal status
Pending
Application number
CN201911215002.1A
Other languages
Chinese (zh)
Inventor
赵爽
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911215002.1A
Publication of CN111079741A

Classifications

    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI] (G Physics; G06 Computing, calculating or counting; G06V Image or video recognition or understanding; G06V 10/20 Image preprocessing)
    • G06F 18/22: Matching criteria, e.g. proximity measures (G06F Electric digital data processing; G06F 18/00 Pattern recognition; G06F 18/20 Analysing)
    • G06T 7/75: Determining position or orientation of objects or cameras using feature-based methods involving models (G06T Image data processing or generation, in general; G06T 7/00 Image analysis; G06T 7/70, 7/73 Position or orientation, feature-based)
    • G06T 2207/10016: Video; image sequence (G06T 2207/00 Indexing scheme for image analysis or image enhancement; G06T 2207/10 Image acquisition modality)

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • General Physics & Mathematics
  • Physics & Mathematics
  • Computer Vision & Pattern Recognition
  • Data Mining & Analysis
  • Bioinformatics & Computational Biology
  • Bioinformatics & Cheminformatics
  • Evolutionary Biology
  • Evolutionary Computation
  • Artificial Intelligence
  • General Engineering & Computer Science
  • Life Sciences & Earth Sciences
  • Multimedia
  • Image Analysis

Abstract

The application discloses an image frame position detection method and device, an electronic device, and a storage medium. The method acquires a target image; locates the position information of a specific region in the target image using a specific area template; and determines the frame position information of the region to be positioned according to that position information and the positional proportion between the specific region and the region to be positioned in the target image. Because the whole frame position detection process needs only the specific area template, it requires neither a large amount of training data with accurate frame position labels nor the subsequent training of models such as a detection network, so both the difficulty of extracting image frame positions and the cost of detection are reduced; it also avoids the low run-time efficiency of a detection network, whose programs are bulky.

Description

Image frame position detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for detecting a position of an image frame, an electronic device, and a storage medium.
Background
With the rapid development of live streaming, short video, and the like, people's requirements for video display content are increasingly diverse. For example, several region images often coexist in the same display interface, and a certain region image may need to be displayed full-screen on its own, or a user may need to crop a required region image from an interface containing several region images. In each of these scenarios, the frame position information of the required region image must be determined from the whole image, and the region image is then cropped according to the frame position information for subsequent use.
In the prior art, determining the frame position information usually requires detecting the images displayed in the entire display interface with trained models such as a detection network and computing the frame position information of the required region image; for example, the image is detected with a deep-learning-based detection network and the frame position of the desired region is regressed. However, this approach needs training data with accurate frame position labels to train the detection network model, such labels are costly, and the detection network processes slowly on a CPU. How to detect the position of the image frame quickly and efficiently is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the present application provides an image frame position detection method and device, an electronic device, and a storage medium, in which the frame position information of the region to be positioned in a target image can be obtained directly using only a specific area template, achieving zero-sample, fast image frame position detection without any annotation data and without training models such as a detection network.
In order to achieve the above object, in one aspect, the present application provides a method for detecting a frame position of an image, including:
acquiring a target image;
positioning the position information of a specific area in the target image by using a specific area template;
and determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
In a possible implementation manner, the determining, according to the position proportional relationship between the specific region and the region to be positioned in the target image and the position information, the frame position information of the region to be positioned includes:
acquiring initial frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information;
and processing the initial frame position information by using a mean algorithm to obtain the frame position information of the area to be positioned.
In yet another possible implementation, the target image is extracted from a received video by:
clipping a video to be extracted from the received video, the video to be extracted being a segment of preset duration in the middle of the video;
and extracting the target image from the video to be extracted.
In yet another possible implementation manner, the locating, by using a specific area template, of the position information of a specific area in the target image includes:
respectively extracting the preset area in the target image and the characteristic information of a specific area template;
and determining the position information corresponding to the specific area in the target image according to the specific area template and the feature information of the preset area.
In another possible implementation manner, the determining, according to the specific area template and the feature information of the predetermined area, the position information corresponding to the specific area in the target image includes:
matching the specific area template with the characteristic information of the preset area by using a nearest neighbor method, and counting the number of matched characteristic point pairs;
and when the number is larger than a preset threshold value, acquiring the position information corresponding to the specific area in the target image by using the matched characteristic point pair information.
In another possible implementation manner, the obtaining, by using the matched feature point pair information, of the position information corresponding to the specific area in the target image includes:
solving the homography matrix of the matched characteristic point pair information;
and performing projection transformation on the position of the specific area in the target image by using the homography matrix to obtain the position information of the specific area in the target image.
In another aspect, the present application provides an image frame position detecting apparatus, including:
the acquisition module is used for acquiring a target image;
the specific area positioning module is used for positioning the position information of the specific area in the target image by using a specific area template;
and the image frame positioning module is used for determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
In another aspect, the present application further provides an electronic device, including:
a memory for storing a computer program;
and the processor is used for realizing the image frame position detection method of any embodiment of the application when executing the computer program.
In another aspect, the present application further provides a storage medium, where computer-executable instructions are stored, and when the computer-executable instructions are loaded and executed by a processor, the method for detecting the position of an image frame according to any embodiment of the present application is implemented.
Therefore, in the embodiments of the application, the position information of the specific region in the target image can be determined directly using only the specific area template, and once that position information is obtained, the frame position information of the region to be positioned can be converted directly from the positional proportion between the specific region and the region to be positioned, completing the frame position detection of the image. That is, the whole process of determining the image frame position needs only a specific area template: it requires neither a large amount of training data with accurate frame position labels nor training models such as a detection network after acquiring such data, which reduces the difficulty of extracting image frame position information and avoids the high cost of labeled training data. Furthermore, because the programs corresponding to a detection network are bulky, running one is inefficient and unsuitable for large-scale video preprocessing, a drawback the present method also avoids.
Accordingly, embodiments of the present application further provide an image frame position detection apparatus, an electronic device, and a storage medium corresponding to the image frame position detection method, which have the above technical effects and are not described herein again.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram illustrating a hardware composition framework to which an image frame position detection method according to an embodiment of the present application is applied;
FIG. 2 is a schematic diagram of a hardware composition framework to which another image border position detection method according to the embodiment of the present application is applied;
fig. 3 is a schematic flowchart illustrating an image border position detection method according to an embodiment of the present application;
fig. 4 is a schematic flowchart illustrating an image border position detection method according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating an embodiment of locating position information of a specific area in a target image by using a specific area template;
FIG. 6 is a schematic diagram illustrating a result of matching a small map area in a game area by using a small map template in a frame of target image in a live game video according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a result of location information for a small map area according to an embodiment of the present application;
FIG. 8 is a diagram illustrating the result of frame position information of a game area according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a small map area template according to an embodiment of the present application;
fig. 10 is a schematic flowchart illustrating a process of determining frame position information of an area to be located according to a position proportional relationship and position information in an embodiment of the present application;
fig. 11 illustrates a schematic flow chart of determining frame position information of an area to be positioned according to a position proportional relationship and position information in an embodiment of the present application;
fig. 12 is a schematic diagram illustrating an application scenario embodiment according to an embodiment of the present application;
fig. 13 is a block diagram illustrating a configuration of an embodiment of an image frame position detection apparatus according to an embodiment of the present application.
Detailed Description
At present, when facing a scene in which a region image needs to be captured automatically, a user must detect the region to be positioned with trained models such as a detection network, regress its frame position information, and then crop the region image. For example, during a live broadcast, the picture of an MOBA game (e.g., Honor of Kings) usually contains a game area, a streamer avatar area, an advertisement area, and the like, and in this application scenario the game area usually needs to be cropped from the live picture. MOBA stands for Multiplayer Online Battle Arena. However, training a detection network not only requires a large amount of training data with accurate frame position labels, which is costly to acquire, but the detection network's processing efficiency is also low, making it unsuitable for large-scale video preprocessing. The technical solution provided by the embodiments of this application can extract the frame position information of an image quickly and efficiently: the frame position information of the region to be positioned in the target image is determined using only a specific area template, completing the frame position detection of the image.
For convenience of understanding, a hardware composition framework to which the scheme corresponding to the image frame position detection method of the present application is applied is introduced first. Reference may be made to fig. 1, where fig. 1 is a schematic diagram illustrating a hardware composition framework to which an image frame position detection method of the present application is applied.
As can be seen from fig. 1, the hardware composition framework may include an electronic device 10, which may include: a processor 11, a memory 12, a communication interface 13, an input unit 14, a display 15, and a communication bus 16.
The processor 11, the memory 12, the communication interface 13, the input unit 14 and the display 15 all communicate with each other through a communication bus 16.
In the embodiment of the present application, the processor 11 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array, or another programmable logic device. The processor may call a program stored in the memory 12; specifically, it may perform the operations performed on the electronic-device side in the following embodiments of the image border position detection method.
The memory 12 is used for storing one or more programs, which may include program codes including computer operation instructions, and in this embodiment, the memory stores at least the programs for implementing the following functions:
acquiring a target image;
positioning the position information of a specific area in the target image by using the specific area template;
and determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
In one possible implementation, the memory 12 may include a program storage area and a data storage area, wherein the program storage area may store the operating system, the application programs required by at least one function (such as an image playing function), and the like, and the data storage area may store data created during use of the computer, such as user data, user access data, and audio or video data.
In addition, the memory 12 may include high-speed random access memory as well as non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 13 may be an interface of a communication module, such as an interface of a GSM module.
The electronic device may also include the display 15, the input unit 14, and the like.
Of course, the structure of the electronic device shown in fig. 1 does not constitute a limitation of the electronic device in the embodiment of the present application, and in practical applications, the electronic device may include more or less components than those shown in fig. 1, or some components may be combined.
The electronic device 10 in fig. 1 may be a terminal (e.g., a mobile terminal such as a mobile phone or a tablet computer, or a fixed terminal such as a PC) or a server.
In the embodiment of the present application, the electronic device 10 may receive videos or images sent by other external devices over a network through the communication interface 13, or acquire the video or image through its own input unit 14 (e.g., a scanner) or memory.
Correspondingly, the processor 11 in the electronic device 10 may obtain the target image from the communication interface 13 or the input unit 14 through the communication bus 16, call a program stored in the memory 12 to locate the specific region in the target image and obtain its position information, convert that position information using the positional proportion between the specific region and the region to be positioned to obtain the frame position information of the region to be positioned in the target image, and then crop the image corresponding to the region to be positioned from the target image according to the frame position information, thereby implementing fast and efficient image frame position detection.
In one possible case, to improve processing efficiency, the electronic device 10 in this embodiment extracts the target image from a video, for example at equal intervals from a video to be extracted. The video to be extracted may be a video received by the electronic device 10, or a segment of preset duration in the middle of the received video; the size of the preset duration is not limited here.
In another possible case, to ensure the accuracy of the frame position information of the region to be positioned in the target image, the image frame position detection method may be performed on multiple target images to obtain multiple pieces of initial frame position information of the region to be positioned, and the final frame position information is then calculated from them. The electronic device 10 may further use an outlier rejection algorithm to remove outlying values from the initial frame position information and average the remaining values to obtain the frame position information of the region to be positioned.
It is to be understood that the number of electronic devices is not limited in this embodiment; several electronic devices may cooperate to carry out the image border position detection method. In one possible scenario, please refer to fig. 2: the hardware composition framework may include a first electronic device 101 and a second electronic device 102, communicatively connected through a network 103.
In this embodiment, the hardware structures of the first electronic device 101 and the second electronic device 102 may both follow the electronic device 10 in fig. 1; this can be understood as two electronic devices 10 exchanging data to jointly determine the image frame position information. The form of the network 103 is not limited either; for example, it may be a wireless network (e.g., WiFi or Bluetooth) or a wired network.
The first electronic device 101 and the second electronic device 102 may be the same type of electronic device, for example both servers; or different types, for example, the first electronic device 101 may be a terminal or an intelligent electronic device and the second electronic device 102 a server. In yet another possible case, referring in particular to fig. 3, a server with strong computing power may serve as the second electronic device 102 to improve data processing efficiency and reliability, and thereby the efficiency of image frame position detection, while a low-cost, widely applicable terminal or intelligent electronic device serves as the first electronic device 101 to handle the interaction between the second electronic device 102 and the user. The interaction may proceed as follows: after acquiring the target image, the terminal sends it to the server. After receiving the target image, the server locates the position information of the specific region in the target image using the specific area template, and determines the frame position information of the region to be positioned in the target image from the positional proportion between the specific region and the region to be positioned together with that position information. Of course, after calculating the frame position information, the server may also use it as one of the labels of the target image (or of the video from which the target image was captured) and store the label together with the target image (or the corresponding video) in a repository for subsequent use and retrieval.
Based on the above, refer to fig. 4, which shows a flowchart of an embodiment of the image frame position detection method of the present application; the method of this embodiment may include:
and S101, acquiring a target image.
The embodiment of the application does not limit the number of target images: a single target image may be processed to obtain the frame position information of the region to be positioned in it, or a preset number of target images may be processed to obtain the final frame position information; the preset number itself is likewise not limited. Nor is the way the target image is determined limited: one, several, or all of the acquired original images or video frames may be used directly as target images (for example, target images extracted from a received video), or they may first be preprocessed into target images. The preprocessing process is not restricted either, since it depends on the choices made in the subsequent steps: if those steps can process the original image or video directly, no preprocessing is needed; if they require a preprocessed image, the extracted original image or video must be preprocessed accordingly to obtain the target image. Of course, the electronic device may also directly receive a final target image that another electronic device has obtained in the above manner.
The manner of extracting a partial image as the target image from an original image or video is likewise not limited. The target image may be extracted from the original image or video directly at equal intervals; or a part of the original image or video may be clipped directly as the target image, for example a segment of preset duration in the middle of the video, or the last segment of preset duration. The value of the preset duration is not limited, and the "middle" segment may mean a segment lasting the preset duration that starts from a preset frame of the video. Alternatively, a video to be extracted may first be clipped from the received video (e.g., a middle segment of preset duration, or the final segment of preset duration), and the target image then extracted from it, either at equal intervals or at random; this embodiment does not limit the choice.
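A minimal sketch of this sampling step, assuming OpenCV is used; all parameter values are invented for illustration, since the patent fixes neither the segment bounds nor the sampling interval:

```python
import cv2

def extract_target_images(video_path, start_sec=60.0, duration_sec=120.0, interval_sec=2.0):
    """Clip a middle segment of the video and sample frames at equal intervals."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0          # fall back if FPS is unreadable
    start = int(start_sec * fps)
    end = int((start_sec + duration_sec) * fps)
    step = max(1, int(interval_sec * fps))
    frames = []
    for idx in range(start, end, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)        # seek to the sampled frame index
        ok, frame = cap.read()
        if not ok:                                   # segment runs past the video's end
            break
        frames.append(frame)
    cap.release()
    return frames
```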
It should be noted that the manner of extracting the target image may be chosen according to the specific application scenario. For example, when the frame position of the region to be positioned is consistent across all frames of the video, only a certain number of video frames need be selected as target images, which further improves processing efficiency while preserving the accuracy of the frame position information. When not every frame of the received video contains the region to be positioned, a video segment in which regions to be positioned appear relatively concentrated can be clipped from the video to improve detection efficiency. Where the extraction rules do not conflict, they can be superimposed according to the actual scenario: for example, when the frame position of the region to be positioned is consistent across frames but not every frame contains such a region, the segment where the regions are concentrated may be clipped first, and target images then extracted from it at equal intervals.
It should be understood that the manner of acquiring the original image or video is not limited in the embodiments of the present application. For example, the original image or video uploaded by another electronic device over a network (wired or wireless) may be received directly, the input original image or video may be obtained locally, or the original image or video may be read from an attached USB disk.
S102, positioning the position information of the specific area in the target image by using the specific area template.
The embodiment of the application does not limit the selection mode of the specific area, as long as the selected specific area has fixed characteristics and has a certain position proportional relationship with the area to be positioned. The specific area has a fixed characteristic, so that the specific area can be identified from the target image, and the position information of the specific area can be obtained. The specific area and the area to be positioned have a certain position proportional relationship, so that after the position information of the specific area is obtained, the frame position information of the area to be positioned can be converted through the position proportional relationship. Of course, the specific position of the specific region in the target image is not limited in the embodiment of the present application, for example, the specific region may be a part of the region to be located, or may be a certain region other than the region to be located in the target image.
The specific area template in the embodiment of the present application corresponds to the specific region. When the selected specific region does not change at all (e.g., an icon in the target image whose features never change), the template can simply be the specific region itself, for example an image of the specific region cropped from an image. When the selected specific region has fixed features but also varies (for example, the minimap in the game area of a game video: features such as roads are fixed, but the hero avatars appearing on the minimap change from match to match), the template may likewise be an image of the specific region cropped from any one image, or the specific-region image whose fixed features are clearest may be selected from several images as the template.
The number of specific regions is not limited in the embodiments of the present application: there may be one or several. When there are several, if they differ, a corresponding specific area template must be set for each; if their image content is the same but their positions in the target image differ, one shared template may be set for all of them, or, of course, one template per region.
Because the specific area template provided by the embodiment of the application and the specific area have the same characteristics or the same fixed characteristics, the embodiment of the application can match the specific area from the target image through the specific area template, and further can acquire the position information of the specific area. It should be noted that the embodiment of the present application does not limit a specific way of matching a specific region from a target image by using a specific region template, for example, a local feature matching way may be used to implement matching of the specific region template and the specific region.
The size of the region of the target image that is feature-matched against the specific area template is not limited. The whole target image may be matched against the template, or only a predetermined region of it. The size of the predetermined region is not limited either, but it must contain the specific region in the target image; it is understood that the predetermined region is larger than the specific region but smaller than the whole target image. Provided the predetermined region reliably contains the specific region, the smaller it is, the fewer feature points need matching and the faster the position information of the specific region is located. Hence, to improve the efficiency of determining the image frame position information, only a predetermined region of the target image may be matched against the template. That is, the embodiment may extract the feature information of the predetermined region in the target image and of the specific area template respectively, and determine the position information corresponding to the specific region in the target image from the template and the predetermined region's feature information.
The method of extracting the feature information of the predetermined region and the specific area template is not limited in the embodiment of the present application; for example, SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF, a fast feature point extraction and description algorithm), or BRISK (Binary Robust Invariant Scalable Keypoints) may be used. The user can choose a suitable algorithm according to the appearance of the specific region in the actual application scenario, the required detection efficiency for the image frame position information, and the hardware processing capability of the electronic device. SURF is an accelerated version of the SIFT algorithm; ORB and BRISK are considerably less robust than SIFT but much faster.
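A brief sketch of this extraction step, assuming OpenCV (SIFT requires opencv-python 4.4 or later); the function name and the ORB parameter are illustrative only:

```python
import cv2

def extract_features(image_gray, use_sift=True):
    """Detect keypoints and compute descriptors on a grayscale image.

    SIFT is the robust choice; ORB is a faster, less robust alternative,
    matching the trade-off described above.
    """
    detector = cv2.SIFT_create() if use_sift else cv2.ORB_create(nfeatures=2000)
    keypoints, descriptors = detector.detectAndCompute(image_gray, None)
    return keypoints, descriptors
```

The same function is applied to both the specific area template and the predetermined region, and the two descriptor sets are then matched.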
It can be understood that after the feature information of the predetermined region and the specific region template is extracted, a feature matching operation needs to be performed, so as to complete matching of the specific region in the target image. The embodiment of the present application does not limit the manner of matching the specific area template and the feature information of the predetermined area, for example, a nearest neighbor method may be selected to match the specific area template and the feature information of the predetermined area. Of course, the user may adaptively select the corresponding feature matching method according to the selected feature extraction manner.
It should be noted that, when there is only one target image, feature matching has two possible outcomes. One is that the specific region is matched from the predetermined region of the target image using the specific area template; in that case, the embodiment can calculate the position information corresponding to the specific region in the target image from the successfully matched feature point pair information. The other is that no specific region can be matched from the predetermined region using the template. The subsequent handling in that case is not limited: a prompt that image frame position detection failed may be output, or a prompt to reacquire the target image, or both; or the operation may simply end. The user can choose according to the actual situation.
When there are multiple target images, the same two outcomes exist. One is that the specific region is matched from the predetermined region of at least some target images (all of them, or a subset); the position information corresponding to the specific region is then calculated for each successfully matched target image from its matched feature point pair information. The other is that the template cannot be matched to a specific region in the predetermined region of any target image; the handling options are as above, and the user can choose according to the actual situation.
It can be understood that only when matching to the specific region in the predetermined region succeeds does the embodiment calculate the position information of the specific region in the target image from the matched feature point pair information. How success is judged is not limited and depends on the selected feature matching method. For example, when the nearest neighbor method is used to match the specific area template with the feature information of the predetermined region, success can be judged by the number of successfully matched feature point pairs: when that number is greater than a preset threshold, a specific region is considered to exist in the predetermined region of the target image; otherwise, no specific region is considered to exist there. The numerical value of the preset threshold is not limited, and the user may determine and modify it according to the actual situation.
It should be noted that, in the embodiment of the present application, a manner of calculating the position information corresponding to the specific region in the target image by using the feature point pair information that is successfully matched is also not limited, for example, a corresponding homography matrix may be calculated by using the feature point pair information that is successfully matched, and the position information of the specific region in the target image may be determined according to the homography matrix (the position information may be coordinate information of the specific region in the target image). Wherein the homography matrix of a plane is defined as the projection mapping of one plane to another.
S103, determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
After the position information of the specific region in the target image is determined, it can be converted according to the positional proportion between the specific region and the region to be positioned, yielding the frame position information of the region to be positioned in the target image. The positional proportion may be expressed either as the specific region relative to the region to be positioned or the reverse. The positional relationship between the two is likewise not limited: the specific region may be a part of the region to be positioned, or a region outside it. As long as a fixed positional proportion exists between them, the frame position information of the region to be positioned can be calculated from that proportion and the position information of the specific region. The size of the region to be positioned is also not limited: it may be a small part of the target image or the whole of it.
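An illustrative sketch of this conversion; the ratio values and function names are invented for the example and are not prescribed by the patent:

```python
def to_region_border(specific_box, ratios):
    """Convert the located specific region's box into the border box of the
    region to be positioned, using a fixed positional proportion.

    specific_box: (x, y, w, h) of the located specific region.
    ratios: (rx, ry, rw, rh) offsets and scales of the region to be
            positioned, in units of the specific region's width and height.
    """
    x, y, w, h = specific_box
    rx, ry, rw, rh = ratios
    return (x + rx * w, y + ry * h, rw * w, rh * h)

# Assumed example: the game area shares the minimap's top-left corner and is
# 6 minimap-widths wide and 5 minimap-heights tall.
border = to_region_border((40, 30, 200, 200), (0.0, 0.0, 6.0, 5.0))
```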
It can be understood that, when position information is obtained for only one specific region, the frame position information converted from it and the positional proportion may serve directly as the final frame position information of the region to be positioned. When position information is obtained for several specific regions, the frame position information of the region to be positioned can be converted from each of them, and the final frame position information derived from those per-region results. Further, to improve the accuracy of the final result, the embodiment may first obtain initial frame position information of the region to be positioned from the positional proportion between the specific region and the region to be positioned together with the position information, and then process the initial frame position information with a mean algorithm to obtain the frame position information of the region to be positioned. In other words, mean-processing the obtained initial frame position information reduces the error of the final frame position information and improves its accuracy.
Of course, the embodiment of the present application does not limit how the mean algorithm processes the initial frame position information to obtain the frame position information of the region to be positioned; that is, the mean algorithm itself is not limited. For example, the initial frame position information of the region to be positioned corresponding to each specific region may simply be averaged to give the final frame position information; the mean algorithm is then plain averaging. Or the mean algorithm may first screen out, from the initial frame position information corresponding to each specific region, the values with smaller error and average only those; it may then be a linear regression algorithm (data fitting by the regression analysis methods of mathematical statistics) or a distance-based outlier rejection algorithm that discards bad data. Or the initial frame position information corresponding to each specific region may be sorted in ascending order and a preset number of middle values averaged as the final frame position information; the preset number is, again, not limited.
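A hedged sketch of the outlier-rejection-plus-mean variant, assuming NumPy; the distance threshold is an invented parameter:

```python
import numpy as np

def fuse_borders(initial_boxes, max_dist=20.0):
    """Reject outlying initial border boxes by distance to the median box,
    then average the survivors into the final frame position information."""
    boxes = np.asarray(initial_boxes, dtype=np.float64)   # shape (n, 4): x, y, w, h
    median = np.median(boxes, axis=0)
    dist = np.linalg.norm(boxes - median, axis=1)         # each box's distance to the median box
    kept = boxes[dist <= max_dist]
    if kept.size == 0:                                    # if everything was rejected, keep all
        kept = boxes
    return kept.mean(axis=0)
```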
Therefore, the frame position information of the region to be positioned in the target image can be extracted automatically, which also enables automatic labeling of that information, for example adding a frame position label of the region to be positioned to the video. Using that label, the region to be positioned can later be extracted quickly from subsequently uploaded videos of the same type and picture content irrelevant to it removed, improving the accuracy and reliability of detecting and recognizing specific elements inside the region to be positioned. For example, when the region to be positioned is a game area, the accuracy of subsequently recognizing key elements in it, such as heroes and event broadcasts, is improved. That is, the embodiment of the present application provides an efficient preprocessing method for the detection and recognition of key image elements. Of course, the concrete form of the label is not limited in the embodiments of the present application.
In the embodiment of the application, the position information of the specific region can be located in the target image using only one specific area template corresponding to it, and the frame position information of the region to be positioned can then be calculated directly from the positional proportion between the specific region and the region to be positioned, completing the image frame position detection process. Only one specific area template is needed to determine the position information of the specific region directly; no large amount of training data with accurate frame position labels has to be acquired before detection, and no detection network has to be trained afterwards, so the difficulty and cost of extracting image frame positions are reduced. Moreover, whereas the bulky programs of a detection network execute inefficiently and suit large-scale video preprocessing poorly, locating the position information of a specific region with a single template is a simple process: it is a zero-sample, efficient, and robust method of determining the frame position information of the region to be positioned, executes quickly, and is suitable for large-scale video preprocessing. Further, since only the positional proportion between the specific region and the region to be positioned is needed to calculate the frame position information directly, the detection becomes still more efficient when the specific region is smaller than the region to be positioned; that is, the embodiment can also improve the efficiency of image frame position detection by locating a large region via a small one.
In one possible case, the embodiment of the present application provides a process for locating the position information of a specific region in the target image by using the specific area template; referring to fig. 5, the specific execution flow is as follows:
S501, respectively extracting the feature information of the predetermined region in the target image and of the specific area template.
In the embodiment of the application, in order to improve the efficiency of detecting the position information of the specific region in the target image, only the feature information of the predetermined region of the target image containing the specific region is extracted. The user can choose the predetermined region, the specific region, and the specific area template according to the actual situation; the relevant selection criteria are described in the embodiments above. For example, when the target image is a live game frame and the region whose frame position information must be determined is the game area in that frame, the specific region and template may be the minimap area inside the game area. The minimap is generally in the upper-left corner of the game picture, so the upper-left quarter of the live frame contains the minimap area, and that quarter can be chosen as the predetermined region.
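A tiny sketch of restricting matching to such a predetermined region; the top-left-quarter choice is the assumption from the example above, and the image is a NumPy array as OpenCV loads it:

```python
def top_left_quarter(image):
    """Return the predetermined region: the top-left quarter of the frame."""
    h, w = image.shape[:2]
    return image[: h // 2, : w // 2]   # NumPy slicing returns a view, no copy needed
```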
It should be noted that the manner of extracting the feature information of the predetermined region and of the specific area template is not limited in the embodiment of the present application; for example, it may be extracted with the SIFT algorithm, the SURF algorithm, the ORB algorithm, or the BRISK algorithm. The concrete feature extraction process is likewise not limited, and the user performs the extraction operations corresponding to the chosen algorithm.
For example, extracting the feature information of the predetermined region in the target image and of the specific area template with the SIFT algorithm may proceed as follows: detect the corner features in the specific area template and in the predetermined region respectively, and represent each corner by a feature vector (i.e., the feature information, here a SIFT feature vector) built from the detected corner's scale, position, and orientation. Let f_k^tmpl denote the feature vector of the k-th corner point of the specific area template, and f_k^pre the feature vector of the k-th corner point of the predetermined region. The corner points are the feature points; SIFT is a general-purpose algorithm for detecting local features, its features are scale-invariant, and good detection results are obtained even when the rotation angle, image brightness, or shooting viewpoint changes.
S502, matching the specific area template with the feature information of the predetermined region by the nearest neighbor method, and counting the number of matched feature point pairs.
The embodiment of the application matches the specific area template with the feature information of the predetermined region using the nearest neighbor method; the concrete matching operation depends on the content of the obtained feature information. For example, matching the SIFT feature vectors f_1^tmpl, ..., f_M^tmpl of the M feature points of the specific area template against the SIFT feature vectors f_1^pre, ..., f_N^pre of the N feature points of the predetermined region, and counting the matched feature point pairs, may proceed as follows: compute the Euclidean distance d between the SIFT feature vector of each feature point of the specific area template and that of every feature point of the predetermined region; for each template feature point, find the two feature points of the predetermined region at the smallest Euclidean distances d, and if the nearest distance divided by the second-nearest distance is smaller than a set ratio threshold, regard the template feature point and the predetermined-region feature point at the nearest distance as a successfully matched feature point pair. The Euclidean distance d serves as the similarity measure between a template feature point and a predetermined-region feature point: for template feature point f_i^tmpl and predetermined-region feature point f_j^pre, d(f_i^tmpl, f_j^pre) = ||f_i^tmpl - f_j^pre||_2.
In the embodiment of the present application, the value of the ratio threshold is not limited (for example, it may be set to the empirical value 0.7); the smaller its value, the fewer feature point pairs match successfully, and accordingly the more stable and reliable the matched pairs. The user can set the ratio threshold according to the actual application scenario and the required reliability of image frame position detection, and can also adjust it against the actual frame-detection results for the region to be positioned. For example, fig. 6 shows the result of matching the minimap area inside the game area of one target frame of a live game video using the minimap area template. In the figure, a frame of the live game picture is the target image, the minimap area is the specific region, the minimap area template is the specific area template, and the game area is the region to be positioned. The template is on the left; each straight line connecting the template to the video frame joins a successfully matched pair of feature points, and the positions of the successfully matched feature points yield the concrete position of the minimap area in the frame.
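A sketch of this nearest-neighbor matching, assuming OpenCV's brute-force matcher and SIFT descriptors; the 0.7 ratio is the empirical value mentioned above:

```python
import cv2

def match_features(desc_template, desc_region, ratio=0.7):
    """Ratio test: keep a template descriptor's nearest region descriptor
    only if it is clearly closer than the second-nearest one."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)   # Euclidean distance, suited to SIFT
    candidates = matcher.knnMatch(desc_template, desc_region, k=2)
    good = [pair[0] for pair in candidates
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    return good                            # len(good) is the matched-pair count of S502
```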
S503, judging whether the number is larger than a preset threshold value; when the number is not greater than the preset threshold, step S504 is performed, and when the number is greater than the preset threshold, step S505 is performed.
S504, outputting prompt information indicating that the specific area was not matched.
S505, solving a homography matrix from the matched feature point pair information.
It should be noted that the specific value of the preset threshold is not limited in the embodiment of the present application (for example, the preset threshold may be set to 15). The larger the value of the preset threshold, the higher the reliability of the detected specific area, that is, the false detection rate of the specific area can be reduced; however, the rate of missed detections of the specific area may also increase, since a genuinely present specific area may fail to reach the threshold. Therefore, the user can set the preset threshold according to the actual application scene and the required reliability of image frame position detection, and can of course also adjust it according to the frame position detection results for the actual region to be positioned. When the number is larger than the preset threshold, the specific area is considered to exist in the predetermined area of the target image; when the number is not larger than the preset threshold, the specific area is considered not to exist in the predetermined area of the target image. For example, when the preset threshold is 15 and the specific area is a small map area: if the number of matched feature point pairs is higher than 15, it is determined that a small map exists in the predetermined area of the target image; if not, it is determined that no small map exists in the predetermined area of the target image.
Of course, when the specific area does not exist in the predetermined area of the target image, prompt information indicating that the specific area was not matched may be output according to step S504. The output mode of the prompt information is not limited here: it may be output through a display screen, or through voice. Alternatively, the detection of the current target image may simply be ended and the detection process of the next frame of target image entered.
S506, performing projection transformation on the position of the specific area in the target image by using the homography matrix to obtain the position information of the specific area in the target image.
In the embodiment of the application, when the specific area is detected in the predetermined area of the target image, the position information of the specific area in the target image can be calculated; that is, the position information of the specific area in the target image is calculated from the matched feature point pair information. In order to improve the reliability and efficiency of detecting the position information of the specific area in the target image, the embodiment of the application uses a homography matrix to solve for that position information. The process may be: first, solve the homography matrix H from the matched feature point pair information; then, perform a projection transformation of the specific area position src(x, y) using the homography matrix H to obtain the position information dst(x, y) of the specific area in the target image, where dst(x, y) = src(x, y) × H.
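Continuing the sketch above for steps S503 to S506, again assuming OpenCV (the RANSAC reprojection tolerance of 5.0 is an illustrative choice that the application does not specify):

```python
import cv2
import numpy as np

def locate_specific_area(good, kp_t, kp_r, template_shape, preset_threshold=15):
    """Solve a homography from the matched pairs and project the template
    border into the target image, yielding the specific-area position."""
    if len(good) <= preset_threshold:
        return None                                       # S504: no specific area matched
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_r[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # S505: solve H
    if H is None:
        return None
    h, w = template_shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)           # S506: dst = src projected by H
```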
It should be noted that, in the embodiment of the present application, the content of the position information is not limited. It may be coordinate information of every point in the specific area, or coordinate information of the border points of the specific area; when the specific area has a regular shape such as a rectangle, it may be coordinate information of all or some of the corner points of the border of the specific area. The content of the position information of the specific area in the embodiment of the application is related to the shape of the region to be detected and to the position proportional relationship. In practice, the user can determine the content of the position information of the specific area as required.
By way of example, reference may be made to fig. 7, 8 and 9, which respectively present a schematic diagram of the position information of a small map area, a schematic diagram of the frame position information of the game area, and a schematic diagram of the small map area template. In the figures, a frame of the live game image is the target image, the small map area (the small rectangle) is the specific area, and the game area (the large rectangle) is the region to be positioned. Because the small map area is located at the upper-left corner of the game area, their top-left vertices coincide; and because both areas are rectangles whose sizes stand in a fixed proportional relationship, the top-left vertex coordinates and the bottom-right vertex coordinates of the game area can be determined once the vertex coordinates of the small map area and the position proportional relationship are known. Since the game area is a rectangle, its picture position is uniquely determined by these two diagonal coordinates, and the frame position information of the game area is thus obtained. Fig. 7 shows the top-left and bottom-right vertex coordinates of the small map area, where the origin of the coordinate system is the top-left corner of the target image, i.e., the live game image. Fig. 8 shows the top-left and bottom-right vertex coordinates of the game area, with the same coordinate origin. Fig. 9 shows a schematic diagram of the small map area template.
Therefore, in the embodiment of the application, there is no need to acquire a large amount of training data with accurate frame position labels before detecting the frame position of an image, nor to train a detection network after acquiring such data; the position information of the specific area in the target image is obtained directly by a projection transformation with the homography matrix corresponding to the successfully matched feature point pair information. The calculation process is simple and efficient.
In another possible case, the embodiment of the present application provides a process for determining frame position information of a region to be located according to a position proportional relationship between a specific region and the region to be located in a target image and position information; referring to fig. 10, the specific execution flow is as follows:
S1001, acquiring initial frame position information of the region to be positioned according to the position proportional relationship between the specific area and the region to be positioned in the target image and the position information;
S1002, removing outlier initial frame position information from the initial frame position information by using an outlier rejection algorithm, and averaging the initial frame position information remaining after removal to obtain the frame position information of the region to be positioned.
According to the method and the device of the application, when the specific area is detected in a plurality of target images, the position information of the specific area is calculated for each target image in which the specific area is detected; that position information can then be converted according to the position proportional relationship between the specific area and the region to be positioned, giving the initial frame position information of the region to be positioned for each such target image. That is, each target image in which the specific area is detected yields one corresponding piece of initial frame position information. It should be noted that the embodiment of the present application does not limit the content of the initial frame position information or the frame position information, as long as it uniquely determines the position of the region to be positioned in the target image. For example, when the region to be positioned is rectangular, its position in the picture can be uniquely determined by two diagonal corner coordinates.
For example, when the target image is an Honor of Kings live game image, the region to be positioned is the game area in the live game image, the specific area and the specific area template correspond to the small map area in the game area, and the predetermined area may be the upper-left quarter of the live game image. The position proportional relationship between the small map area and the game area is as follows: the top-left corner coordinates of the small map area are the same as the top-left corner coordinates of the game area; the ratio of the picture width of the small map area to the picture width of the game area in an Honor of Kings live game image is fixed at 0.3167; and the aspect ratio of the game area picture is 16:9. Correspondingly, the process of calculating the initial frame position information may be as follows. The position information of the small map area corresponding to the k-th live game image consists of the top-left vertex coordinates (x_tl^k, y_tl^k) and the bottom-right vertex coordinates (x_br^k, y_br^k) of the small map area. Since the top-left vertex of the small map area coincides with the top-left vertex of the game area, the top-left corner coordinates of the game area are (X_tl^k, Y_tl^k) = (x_tl^k, y_tl^k). Because the ratio of the picture width of the small map area to that of the game area is fixed at 0.3167, combining the width of the small map area w^k = x_br^k - x_tl^k gives the width of the game area W^k = w^k / 0.3167, so the abscissa of the bottom-right vertex of the game area is X_br^k = X_tl^k + W^k. Further, since the aspect ratio of the game area picture is 16:9, the ordinate of the bottom-right vertex of the game area is Y_br^k = Y_tl^k + W^k × 9/16. The initial frame position information corresponding to the game area in this frame of target image is then determined from the top-left vertex coordinates and the bottom-right vertex coordinates of the game area.
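As a minimal sketch of this proportional computation (the function name and the sample coordinates are illustrative; the ratio 0.3167 and the 16:9 aspect come from the example above):

```python
def game_area_from_minimap(tl, br, width_ratio=0.3167, aspect_w=16, aspect_h=9):
    """Derive the game-area border from the minimap border, assuming the
    top-left vertices coincide, a fixed minimap-to-game width ratio and a
    16:9 game-area picture."""
    x_tl, y_tl = tl                          # minimap top-left == game-area top-left
    game_w = (br[0] - tl[0]) / width_ratio   # game width from the fixed width ratio
    game_h = game_w * aspect_h / aspect_w    # 16:9 picture: height = width * 9 / 16
    return (x_tl, y_tl), (x_tl + game_w, y_tl + game_h)

# For instance, a hypothetical minimap spanning (35, 27) to (342, 334) implies a
# game area of width (342 - 35) / 0.3167, roughly 969, and height roughly 545 pixels.
```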
In the embodiment of the application, in order to prevent the position information of falsely detected specific areas from affecting the subsequent frame position information of the region to be positioned, and thereby to improve the accuracy of that frame position information, an outlier rejection algorithm is used to remove outlier initial frame position information from the initial frame position information, and the mean of the remaining initial frame position information is taken as the frame position information of the region to be positioned. The processing procedure of the outlier rejection algorithm is illustrated below. When the target image is an Honor of Kings live game image, the region to be positioned is the game area in the live game image, and the specific area template is a small map area template, the above steps are applied to each video frame (i.e., target image) in which the small map area was successfully matched, yielding the initial frame position information of the corresponding game area: (X_tl^k, Y_tl^k) records the top-left vertex coordinates of the game area in the k-th target image, and (X_br^k, Y_br^k) records its bottom-right vertex coordinates, which together give the set of all initial frame position coordinates. The frequency of each coordinate value in the set of top-left vertex abscissas {X_tl^k} is counted, and the distinct values are sorted from high to low by frequency to obtain a candidate coordinate value set, for example a set of three candidate values. Candidate values are then rejected one by one, starting from the lowest-frequency candidate: each time a candidate value is rejected, the variance of the remaining candidate values is calculated; if the variance is still higher than a preset threshold, the next candidate value is rejected, and rejection stops once the variance of the remaining candidate values falls below the preset threshold, at which point the mean of all currently remaining candidate values is calculated as the detected value of that coordinate. That is, when the calculated variance is smaller than the preset threshold, the outliers in the top-left-vertex abscissa set are considered eliminated, and the mean of the remaining coordinate values in that set is taken as the final abscissa of the top-left vertex in the frame position information of the game area. Applying the same procedure to the set of top-left vertex ordinates {Y_tl^k}, the set of bottom-right vertex abscissas {X_br^k} and the set of bottom-right vertex ordinates {Y_br^k} respectively yields the final ordinate of the top-left vertex and the final abscissa and ordinate of the bottom-right vertex in the frame position information of the game area.
Of course, the embodiment of the present application does not limit the value of the variance threshold; for example, it may be set to 5. Further, since the finally obtained mean coordinate value may be a decimal, the user may set the precision of the final mean coordinate value. For example, each coordinate in the frame position information of the region to be positioned may be obtained by rounding each mean coordinate value to an integer, or by keeping two digits after the decimal point. This is not limited in the embodiments of the present application.
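The following is an illustrative sketch of one reading of this frequency-then-variance rejection (the function name, the use of population variance, and the sample values are assumptions, not specified by the application):

```python
from collections import Counter
import statistics

def robust_coordinate(values, var_thresh=5.0):
    """Sort the distinct candidate coordinate values by frequency, drop the
    least frequent one at a time until the variance of the remainder falls
    below the threshold, then return the mean of what is left."""
    candidates = [v for v, _ in Counter(values).most_common()]  # high -> low frequency
    while len(candidates) > 1 and statistics.pvariance(candidates) > var_thresh:
        candidates.pop()        # reject the current lowest-frequency candidate
    return statistics.mean(candidates)

# Hypothetical usage: robust_coordinate([810, 810, 810, 812, 812, 650]) drops
# the rare outlier 650 and returns the mean of 810 and 812, i.e. 811.
```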
In the embodiment of the application, when multiple pieces of initial frame position information of the region to be positioned are obtained through detection, statistical intervention is applied to the calculated pieces of initial frame position information to eliminate false detections, thereby improving the accuracy of the frame position information of the region to be positioned. Frame detection is performed on only a certain number of video frames, and the statistical intervention on the detection results removes false detections, so that the frame position information of the region to be positioned is located reliably.
In yet another possible case, the embodiment of the present application provides a process for acquiring a target image; referring to fig. 11, the specific execution flow is as follows:
S1101, intercepting a video to be extracted from the received video; the video to be extracted is a video of a preset time length in the middle section of the video;
S1102, extracting a target image from the video to be extracted.
It can be understood that in videos uploaded by users such as live broadcasts, the video head is generally a video corresponding to the start phase, and the video tail is generally a video corresponding to the end phase, and there is a great uncertainty in both of them. For example, the video head may have an advertisement, leader, or preparation stage for game start, while the video tail may have a summary, trailer, or exit stage after game end. Therefore, it is likely that there are images that do not contain a region to be located at both the video head and the video tail. In order to avoid selecting an image not containing a region to be positioned as a target image and improve the detection efficiency of the image frame position, the embodiment of the application intercepts a video to be extracted from a received video; the video to be extracted is the video with the preset time length in the middle section of the video. That is, in the embodiment of the present application, the video to be extracted is a video obtained after removing the head and the tail of the received video. Of course, the lengths of the head and the tail of the video are not limited in the embodiment of the present application, and the user may determine the length of the middle-segment video to be intercepted according to a specific application scenario.
It should be noted that, in the embodiment of the present application, the manner of intercepting the video to be extracted from the received video is not limited. The portion of the received video whose duration exceeds a first preset value at the head and a second preset value at the tail may be removed, with the remainder used as the video to be extracted; for example, the first 20% and the last 20% of the received video are removed, and the middle 60% is used as the video to be extracted. Alternatively, an image a preset number of frames or a preset time length after the start of the received video may be taken as the starting frame of the middle-section video, and the video continued for a preset time length from there to obtain the video to be extracted; for example, the video starting 1 minute after the start of the received video and lasting 5 minutes is taken as the video to be extracted. Alternatively, the video to be extracted may be obtained by extending a preset number of frames or a preset time length to the left and to the right from the middle point of the received video; for example, for a received video of 101 frames, counting 20 frames to the left and 20 frames to the right of the 51st frame gives 41 frames as the video to be extracted. Of course, the embodiment of the present application does not limit the specific values of the first preset value, the second preset value, the preset number of frames and the preset time length.
In general, the frame position information of the same region to be positioned is consistent across the same video, so frame position detection does not need to be performed on every frame of the acquired video; it suffices to perform it on a certain number of video frames (i.e., target images). This improves the efficiency of detecting the frame position information of the region to be positioned in the video and saves computing, storage and other resources. After the video to be extracted is determined, the target images are extracted from it. The embodiment of the application does not limit the specific manner of extracting target images from the video to be extracted: target images may be extracted directly from the video to be extracted at equal intervals; or the number of target images to be detected may be set first, and then the corresponding number of target images extracted uniformly or randomly from the video to be extracted. Of course, the embodiment of the present application does not limit the specific values of the interval or of the number of target images to be detected.
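A minimal sketch of the middle-segment clipping plus equal-interval sampling, assuming OpenCV for video decoding (the head/tail fractions and the number of target images are illustrative parameters):

```python
import cv2

def extract_target_images(video_path, head_frac=0.2, tail_frac=0.2, num_targets=30):
    """Clip the middle segment of the received video and sample target
    images from it at equal intervals."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    start = int(total * head_frac)                # drop the video head
    end = int(total * (1.0 - tail_frac))          # drop the video tail
    step = max(1, (end - start) // num_targets)   # equal-interval sampling
    frames = []
    for idx in range(start, end, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)     # seek to the sampled frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```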
In the embodiment of the application, by intercepting a part of the received video as the video to be extracted and extracting the target images from it, the processing of images that do not contain the region to be positioned is reduced to the greatest extent; and since only part of the frame images in the video are processed, the number of target images to be detected can be further reduced while the reliability of the frame position information detection of the region to be positioned is preserved. Therefore, the method for acquiring the target image provided by the embodiment of the application can greatly improve the processing efficiency of video frame detection and save computing, storage and other resources of the electronic device.
For convenience of understanding, please refer to fig. 12. Taking a live video of Honor of Kings, a MOBA (Multiplayer Online Battle Arena) game, as the processing object, the following describes, for this application scenario, the process of determining the game area border position information in the video frames of a game live broadcast.
During play, the game area of an Honor of Kings live video generally contains a small map area. The small map area has a similar shape and relatively fixed characteristics across frames, its position is basically at the upper-left corner of the game area, and its size stands in a fixed proportion to the size of the game area. Therefore, after the position information of the small map area is located, the top-left vertex coordinates of the small map area can be used directly as the top-left vertex coordinates of the game area, and the bottom-right vertex coordinates of the game area can then be determined using the position proportional relationship between the small map area and the game area, giving the frame position information of the game area. This application scenario embodiment may take a live game image as the target image, the small map area as the specific area, the small map area template as the specific area template, the game area as the region to be positioned, and the upper-left quarter of the target image as the predetermined area. The corresponding processing procedure may be as follows:
A user uploads an Honor of Kings live game video to the server through a terminal. The server first extracts a preset number of video frames from the received live video as target images; it then extracts feature information of the upper-left quarter of each target image and of the small map area template respectively using the SIFT algorithm, matches the small map area template against the feature information of the upper-left quarter of the target image using the nearest neighbor method, and counts the number of matched feature point pairs. When the number is larger than 15, the homography matrix is solved from the matched feature point pair information, and the position of the small map area is projected into the target image using the homography matrix to obtain the position information of the small map area in the target image. The initial frame position information of each target image in which the small map area was detected is then calculated from the position proportional relationship between the small map area and the game area and from the position information of the small map area; outlier initial frame position information is removed using the outlier rejection algorithm, and the remaining initial frame position information is averaged to obtain the final frame position information of the game area. The located frame position information of the game area can be used as one of the labels of the Honor of Kings live video, and stored in a repository together with the live video content.
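Tying the steps together, an end-to-end sketch of this scenario might look as follows; it reuses the hypothetical helpers from the earlier examples (extract_target_images, match_template_features, locate_specific_area, game_area_from_minimap, robust_coordinate), none of which are names from the application:

```python
import cv2

def detect_game_border(video_path, template_path, preset_threshold=15):
    """Locate the game-area border of a live video from its minimap template."""
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    tl_x, tl_y, br_x, br_y = [], [], [], []
    for frame in extract_target_images(video_path):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        h, w = gray.shape
        # The upper-left quarter is the predetermined area; the crop shares the
        # image origin, so matched coordinates are already full-image coordinates.
        quarter = gray[: h // 2, : w // 2]
        good, kp_t, kp_r = match_template_features(template, quarter)
        corners = locate_specific_area(good, kp_t, kp_r, template.shape, preset_threshold)
        if corners is None:
            continue                               # minimap not matched in this frame
        xs, ys = corners[:, 0, 0], corners[:, 0, 1]
        # Bounding box of the projected minimap, then the proportional rule.
        tl, br = game_area_from_minimap((xs.min(), ys.min()), (xs.max(), ys.max()))
        tl_x.append(round(tl[0])); tl_y.append(round(tl[1]))
        br_x.append(round(br[0])); br_y.append(round(br[1]))
    if not tl_x:
        return None                                # no frame contained the minimap
    return ((robust_coordinate(tl_x), robust_coordinate(tl_y)),
            (robust_coordinate(br_x), robust_coordinate(br_y)))
```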
Honor of Kings (王者荣耀) is a MOBA-type mobile game developed and operated by Tencent's TiMi Studio Group, running on the Android, iOS and NS platforms. The gameplay mainly consists of competitive battles: players fight PVP battles in various modes such as 1V1, 3V3 and 5V5, can also take part in the game's PVE adventure mode, and can join ranked matches once the relevant conditions are met. Game live streaming refers to broadcasting gameplay over the internet while playing.
On the other hand, the application also provides an image frame position detection device. For example, referring to fig. 13, which shows a schematic diagram of a composition structure of an embodiment of an image frame position detection apparatus according to the present application, the apparatus of the present embodiment may be applied to an electronic device as in the above embodiment, and the apparatus includes:
an obtaining module 101, configured to obtain a target image;
a specific region positioning module 102, configured to position location information of a specific region in a target image by using a specific region template;
and the image frame positioning module 103 is configured to determine frame position information of the region to be positioned according to the position proportional relationship between the specific region and the region to be positioned in the target image and the position information.
Optionally, the obtaining module 101 may include:
and the extraction sub-module is used for extracting the target image from the received video.
Optionally, the extracting sub-module may include:
the intercepting unit is used for intercepting a video to be extracted from the received video; the video to be extracted is a video with a preset time length in the middle section of the video;
and the extraction unit is used for extracting the target image from the video to be extracted.
Optionally, the extracting unit may include:
and the extraction subunit is used for extracting the target image from the video to be extracted at equal intervals.
Optionally, the specific area location module 102 may include:
the feature extraction submodule is used for respectively extracting feature information of the predetermined area in the target image and of the specific area template;
and the specific area position acquisition submodule is used for determining the position information corresponding to the specific area in the target image according to the specific area template and the feature information of the predetermined area.
Optionally, the feature extraction sub-module may include:
and the feature extraction unit is used for respectively extracting feature information of the predetermined area in the target image and of the specific area template by using a SIFT algorithm.
Optionally, the specific area position obtaining sub-module may include:
the feature matching unit is used for matching the specific area template with the feature information of the predetermined area by using a nearest neighbor method and counting the number of matched feature point pairs;
and the specific area position acquisition unit is used for acquiring the position information corresponding to the specific area in the target image by using the matched characteristic point pair information when the number is larger than a preset threshold value.
Optionally, the specific area position obtaining unit may include:
the homography matrix subunit is used for solving the homography matrix of the matched characteristic point pair information;
and the specific area position acquisition subunit is used for performing projection transformation on the specific area position in the target image by using the homography matrix to obtain the position information of the specific area in the target image.
Optionally, the image frame positioning module 103 may include:
the initial frame position information acquisition unit is used for acquiring initial frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information;
and the image frame positioning unit is used for processing the initial frame position information by using a mean value algorithm to obtain the frame position information of the region to be positioned.
In another aspect, the present application also provides an electronic device that may include a processor and a memory. The relationship between the processor and the memory in the electronic device can be referred to fig. 1.
Wherein the processor of the electronic device is configured to execute the program stored in the memory;
the memory of the electronic device is for storing a program for at least:
acquiring a target image;
positioning the position information of a specific area in the target image by using the specific area template;
and determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
Of course, the electronic device may further include a communication interface, a display unit, an input device, and the like, which is not limited herein.
In another aspect, the present application further provides a storage medium, in which a computer program is stored, and when the computer program is loaded and executed by a processor, the computer program is configured to implement the image frame position detection method described in any one of the above embodiments.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (10)

1. An image frame position detection method is characterized by comprising the following steps:
acquiring a target image;
positioning the position information of a specific area in the target image by using a specific area template;
and determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
2. The method according to claim 1, wherein the determining the frame position information of the region to be positioned according to the position proportional relationship between the specific region and the region to be positioned in the target image and the position information includes:
acquiring initial frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information;
and processing the initial frame position information by using a mean algorithm to obtain the frame position information of the area to be positioned.
3. The method according to claim 1, wherein the acquiring the target image includes:
intercepting a video to be extracted from a received video; the video to be extracted is a video with a preset time length in the middle section of the video;
and extracting a target image from the video to be extracted.
4. The method according to any one of claims 1 to 3, wherein the locating the position information of the specific region in the target image by using the specific region template includes:
respectively extracting feature information of a predetermined area in the target image and of a specific area template;
and determining the position information corresponding to the specific area in the target image according to the specific area template and the feature information of the predetermined area.
5. The method according to claim 4, wherein the determining the position information corresponding to the specific area in the target image according to the specific area template and the feature information of the predetermined area includes:
matching the specific area template with the feature information of the predetermined area by using a nearest neighbor method, and counting the number of matched feature point pairs;
and when the number is larger than a preset threshold value, acquiring the position information corresponding to the specific area in the target image by using the matched characteristic point pair information.
6. The method according to claim 5, wherein the obtaining the position information corresponding to the specific area in the target image by using the matched feature point pair information includes:
solving the homography matrix of the matched characteristic point pair information;
and performing projection transformation on the position of the specific area in the target image by using the homography matrix to obtain the position information of the specific area in the target image.
7. The image frame position detection method according to claim 1, wherein when the target image is a live game image, the specific area is a small map area, the specific area template is a small map area template, and the area to be positioned is a game area.
8. An image frame position detection device, comprising:
the acquisition module is used for acquiring a target image;
the specific area positioning module is used for positioning the position information of the specific area in the target image by using a specific area template;
and the image frame positioning module is used for determining the frame position information of the region to be positioned according to the position proportional relation between the specific region and the region to be positioned in the target image and the position information.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the image border position detection method according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium having stored thereon computer-executable instructions that, when loaded and executed by a processor, implement the image border position detection method according to any one of claims 1 to 7.
CN201911215002.1A 2019-12-02 2019-12-02 Image frame position detection method and device, electronic equipment and storage medium Pending CN111079741A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911215002.1A CN111079741A (en) 2019-12-02 2019-12-02 Image frame position detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111079741A true CN111079741A (en) 2020-04-28

Family

ID=70312412

Country Status (1)

Country Link
CN (1) CN111079741A (en)

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40022133

Country of ref document: HK

SE01 Entry into force of request for substantive examination