CN112287169B - Data acquisition method, device and system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112287169B
CN112287169B (application CN202011177236.4A)
Authority
CN
China
Prior art keywords
video
video frame
target landmark
data acquisition
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011177236.4A
Other languages
Chinese (zh)
Other versions
CN112287169A
Inventor
陈志立
杨建朝
刘晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ByteDance Inc
Original Assignee
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ByteDance Inc
Priority to CN202011177236.4A
Publication of CN112287169A
Application granted
Publication of CN112287169B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/743 Browsing; Visualisation of a collection of video files or sequences
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7328 Query by example, e.g. a complete video frame or video sequence
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval using metadata automatically derived from the content
    • G06F16/7837 Retrieval using objects detected or recognised in the video content
    • G06F16/787 Retrieval using geographical or spatial information, e.g. location

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A video-based data acquisition method, device, and system, an electronic device, and a non-transitory readable storage medium. The data acquisition method comprises the following steps: acquiring a video set corresponding to a target landmark according to retrieval parameters, wherein the video set comprises a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises the extracted video frames; performing target recognition on each video frame in the video frame set; and, in response to a first video frame among the plurality of video frames that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark. Because no on-site collection is required, the data acquisition method obtains data conveniently and rapidly, thereby reducing the cost of acquiring landmark data; in addition, setting the preset condition reduces the workload of data acquisition and improves data acquisition efficiency.

Description

Data acquisition method, device and system, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to a video-based data acquisition method, a video-based data acquisition device, a video-based data acquisition system, an electronic apparatus, and a non-transitory readable storage medium.
Background
With the development of communication technology, terminal devices such as mobile phones and tablet computers have become an indispensable part of people's work and life, and as terminal devices have grown more popular, video interaction applications have become a main channel for communication and entertainment.
Currently, landmark AR (augmented reality) special effects are one of the hot spots in the short-video field. A landmark AR special effect can make shooting more interesting and encourage users to shoot and record more actively.
Disclosure of Invention
Conventional landmark data collection methods require on-site collection, which increases collection cost and reduces collection efficiency. To address these problems, at least one embodiment of the present disclosure provides a video-based data acquisition method, device, and system, an electronic device, and a non-transitory readable storage medium. The data acquisition method obtains data conveniently and rapidly without on-site collection, thereby reducing the cost of acquiring landmark data, reducing the workload of data acquisition, and improving data acquisition efficiency.
At least one embodiment of the present disclosure provides a video-based data acquisition method, including: acquiring a video set corresponding to a target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames; performing target recognition on each video frame in the video frame set; and responding to a first video frame which comprises the target landmark and meets a preset condition in the plurality of video frames, and taking the first video frame as data of the target landmark.
At least one embodiment of the present disclosure also provides a video-based data acquisition device, including: a video set acquisition module, a video frame acquisition module, a target recognition module, and a judgment module. The video set acquisition module is configured to acquire a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos. The video frame acquisition module is configured to extract video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames. The target recognition module is configured to perform target recognition on each video frame in the video frame set. The judgment module is configured to, in response to a first video frame among the plurality of video frames that includes the target landmark and satisfies a preset condition, take the first video frame as data of the target landmark.
At least one embodiment of the present disclosure also provides a video-based data acquisition system, including a terminal and a data server. The terminal is configured to send request data to the data server. The data server is configured to: in response to the request data, determine a video set corresponding to the target landmark according to the retrieval parameters in the request data, and send the video set to the terminal, wherein the video set comprises a plurality of videos. The terminal is further configured to: extract video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames; perform target recognition on each video frame in the video frame set; and, in response to a first video frame among the plurality of video frames that includes the target landmark and satisfies a preset condition, take the first video frame as data of the target landmark.
At least one embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory storing one or more computer program modules. The one or more computer program modules configured to be executed by the processor, the one or more computer program modules comprising instructions for performing the data acquisition method of any of the embodiments described above.
At least one embodiment of the present disclosure also provides a non-transitory readable storage medium having computer instructions stored thereon. When the computer instructions are executed by a processor, the data acquisition method of any of the above embodiments is performed.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. Like reference numerals refer to like elements throughout the drawings. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is an example flow chart of a video-based data acquisition method provided by at least one embodiment of the present disclosure;
FIG. 2 is a schematic illustration of a feature image of a target landmark provided in accordance with at least one embodiment of the present disclosure;
Fig. 3 is an example flowchart of steps involved in step S101 shown in fig. 1;
FIG. 4A is a flowchart of one example operation of screening an initial video set in accordance with at least one embodiment of the present disclosure;
FIG. 4B is another example flowchart of operations for screening an initial video set in accordance with at least one embodiment of the present disclosure;
FIG. 5 is a network architecture diagram of a denoising convolutional neural network provided by at least one embodiment of the present disclosure;
FIG. 6 is an example block diagram of a video-based data acquisition device provided in accordance with at least one embodiment of the present disclosure;
FIG. 7 is an example block diagram of a video-based data acquisition system provided by at least one embodiment of the present disclosure;
FIG. 8 is an example block diagram of an electronic device provided by at least one embodiment of the present disclosure;
FIG. 9 is an exemplary block diagram of a terminal provided in accordance with at least one embodiment of the present disclosure;
FIG. 10 is an example block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure; and
Fig. 11 illustrates an exemplary scene graph of a video-based data acquisition system provided by at least one embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality of" in this disclosure are intended to be illustrative rather than limiting; those of ordinary skill in the art will appreciate that these should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
With the increasing popularity of terminal devices, video interaction applications have become a major channel for communication and entertainment. For example, short videos have the characteristics of strong social attributes, ease of creation, and short duration, and better match users' habits of consuming fragmented content in the mobile-Internet era. Augmented reality (AR) technology skillfully fuses virtual information with the real world. It makes wide use of technical means such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing, and applies computer-generated virtual information such as text, images, three-dimensional models, music, and video to the real world after simulation, so that the two kinds of information complement each other and the real world is thereby enhanced. The distinctive virtual-real fusion effect of AR technology gives it essentially unlimited room for expansion in the short-video field.
At present, landmark AR special effects are one of the hot spots in the short-video field. A landmark AR special effect can make shooting more interesting and encourage users to shoot and record more actively. Landmark AR special effects require modeling the landmark (e.g., a feature building of a city) using collected data; however, different data collection methods affect the efficiency and effectiveness of the modeling differently. Conventional methods of collecting landmark data may require personnel to travel by vehicle (e.g., aircraft, high-speed rail, train, etc.) to the site for collection (e.g., photography), which increases collection cost and reduces collection efficiency.
At least one embodiment of the present disclosure provides a video-based data acquisition method, a data acquisition device, and a data acquisition system, as well as an electronic device and a non-transitory readable storage medium. The video-based data acquisition method comprises the following steps: acquiring a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames; performing target recognition on each video frame in the video frame set; in response to a first video frame including a target landmark among a plurality of video frames and satisfying a preset condition, the first video frame is taken as data of the target landmark.
By acquiring data about the target landmark with the data acquisition method provided by at least one embodiment of the present disclosure, data can be obtained conveniently and quickly without on-site collection, thereby reducing the cost of acquiring landmark data; moreover, by setting a preset condition, the workload of data acquisition can be reduced and the efficiency of data acquisition improved.
The data acquisition method provided according to at least one embodiment of the present disclosure is described below by way of several examples or embodiments, and as described below, different features of these specific examples or embodiments may be combined with each other without contradiction, thereby resulting in new examples or embodiments, which are also within the scope of the present disclosure.
Fig. 1 is an example flowchart of a video-based data acquisition method provided in at least one embodiment of the present disclosure.
The video-based data acquisition method 10 provided by at least one embodiment of the present disclosure may be applied to various video interaction applications, video editing applications, and the like. For example, in at least one embodiment, the video-based data acquisition method 10 may acquire data about a target landmark for subsequent use in three-dimensional modeling, generating landmark AR special effects, and the like. For example, in at least one embodiment, as shown in FIG. 1, the video-based data acquisition method 10 includes the operations of:
Step S101: acquiring a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos;
Step S102: extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
step S103: performing target recognition on each video frame in the video frame set;
step S104: in response to a first video frame including a target landmark among a plurality of video frames and satisfying a preset condition, the first video frame is taken as data of the target landmark.
For example, by collecting data of the target landmark with the data acquisition method 10, data can be obtained conveniently and quickly without personnel collecting it on site, thereby reducing the cost of acquiring landmark data and improving data acquisition efficiency.
For example, steps S101 to S104 may be performed sequentially, or in another order after adjustment; the embodiments of the present disclosure do not limit the execution order of the steps, which may be adjusted according to the actual situation. For example, steps S101-S104 may be implemented by a server or on the local side, which is not limited by the embodiments of the present disclosure. For example, in some examples, an implementation of the data acquisition method 10 provided by at least one embodiment of the present disclosure may optionally perform only some of steps S101-S104, and may also perform additional steps beyond steps S101-S104; the embodiments of the present disclosure are not specifically limited in this regard.
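The four steps S101-S104 can be sketched as a single pipeline. The following is an illustrative sketch only: the helper callables (`matches`, `recognize`, `preset_ok`) and the data shapes are assumptions for illustration, not part of the patent.

```python
# Sketch of the S101-S104 pipeline. All helper names and data
# shapes are hypothetical; real matching, frame extraction, and
# recognition would be far more involved.

def acquire_landmark_data(retrieval_params, videos, recognize, preset_ok):
    """Return frames usable as data of the target landmark."""
    # Step S101: videos matching the retrieval parameters form the video set
    video_set = [v for v in videos if v["matches"](retrieval_params)]
    # Step S102: extract video frames from each video in the video set
    frame_set = [f for v in video_set for f in v["frames"]]
    # Steps S103/S104: perform target recognition on each frame and keep
    # frames that contain the target landmark and satisfy the preset condition
    return [f for f in frame_set if recognize(f) and preset_ok(f)]
```

A frame passes only if both the recognizer and the preset condition accept it, mirroring the "includes the target landmark and satisfies a preset condition" wording of step S104.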
The video-based data acquisition method 10 provided by the present disclosure is described in detail below, by way of example, with reference to the accompanying drawings.
Step S101: and acquiring a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos.
For example, in at least one embodiment of the present disclosure, the target landmark includes a target building, a target landscape, or a target animal or plant. For example, in one example, the target landmark may be a feature building of a city, such as Taikoo Li in Sanlitun, Beijing, the Oriental Pearl Tower (commonly known as the "Oriental Pearl") in Shanghai, or the Guangzhou Tower (commonly known as the "Slim Waist") in Guangzhou. For example, in another example, the target landmark may be a famous landscape (or tourist attraction, etc.) of a city, e.g., Mount Huangshan in Anhui or Mount Qomolangma in Tibet. For example, in yet another example, the target landmark may be a well-known animal or plant, such as the Guest-Greeting Pine in the Huangshan scenic area.
It should be noted that, the target landmark in the embodiment of the present disclosure includes, but is not limited to, the above listed content, and the target landmark may be any content associated with a place of interest to a user, and the embodiment of the present disclosure is not limited thereto and may be set according to actual requirements.
For example, in at least one embodiment of the present disclosure, the retrieval parameters may include at least one of a keyword of the target landmark, a feature image corresponding to the target landmark, positioning information corresponding to the target landmark, and classification information corresponding to the target landmark. For example, in at least one embodiment of the present disclosure, the keywords of the target landmark include at least one of a name, an abbreviation, a common name, and a feature description of the target landmark. For example, in one example, the target landmark is the Guangzhou Tower in Guangdong Province, China, and its retrieval parameters may include keywords: its name "Guangzhou Tower"; its other names "Slim Waist", "Guangzhou New TV Tower", or "Haixin Tower"; its feature descriptions "the first tower of China" and "the fourth tallest tower in the world"; its foreign name "Canton Tower"; the foreign abbreviation "CT"; and so on, which the embodiments of the present disclosure do not limit. For example, in another example, the target landmark is the Hall of Prayer for Good Harvests at the Temple of Heaven in Beijing, China, and the retrieval parameters may include a feature image corresponding to the target landmark (i.e., the Temple of Heaven, Beijing), as shown in FIG. 2. For example, in yet another example, the target landmark is Taikoo Li in the Sanlitun area of Beijing, and the retrieval parameters may include positioning information corresponding to the target landmark, that is, Courtyard 19, Sanlitun Road, Chaoyang District, Beijing, and may further include classification information corresponding to the target landmark, for example, a business-district landmark of Beijing, and so on.
It should be noted that, the content of the search parameter in the above example is merely exemplary, and the search parameter about the target landmark in the embodiment of the present disclosure includes, but is not limited to, the above content, and may be set according to actual requirements.
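The retrieval parameters described above can be grouped into one record. The following sketch is an assumption about how such a record might be shaped; the field names and types are illustrative and not specified by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical container for the retrieval parameters of step S101.
@dataclass
class RetrievalParams:
    keywords: List[str] = field(default_factory=list)  # name, abbreviation, common name, feature description
    feature_image: Optional[bytes] = None              # reference image of the target landmark
    positioning: Optional[Tuple[float, float]] = None  # e.g. (latitude, longitude)
    classification: Optional[str] = None               # e.g. "business-district landmark"

# Example for the Guangzhou Tower case described in the text.
params = RetrievalParams(
    keywords=["Guangzhou Tower", "Canton Tower", "Slim Waist", "CT"],
    classification="business-district landmark",
)
```

Any subset of the fields may be populated, matching the "at least one of" language above.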
Fig. 3 is an example flowchart of steps included in step S101 shown in fig. 1. For example, in at least one embodiment of the present disclosure, acquiring a video set corresponding to a target landmark (i.e., step S101) may include the following operations according to the retrieval parameters, as shown in fig. 3.
Step S301: searching according to the search parameters, and matching from a video database or the Internet to obtain an initial video set corresponding to the target landmark;
Step S302: a video set is derived based on the initial video set.
For example, in at least one embodiment of the present disclosure, for step S301, the initial video set corresponding to the target landmark is obtained by matching against a video database or the internet according to the retrieval parameters; methods such as entity recognition, keyword matching, or deep learning may be adopted. The embodiments of the present disclosure do not limit the specific matching method, as long as the initial video set corresponding to the target landmark can be obtained. For example, the initial video set includes a plurality of videos obtained preliminarily by retrieval and matching, and some or all of these videos may constitute the video set used for subsequent processing.
For example, in at least one embodiment of the present disclosure, the video database may be already established, and the already established video database may be stored in a local or server in advance, or may be established by a server, for example, when the data acquisition method 10 is implemented, or may be read from another device, which is not particularly limited, and may be set according to actual requirements.
Typically, many users upload their videos or interesting videos to the internet for other users to browse or collect. For example, in at least one embodiment of the present disclosure, an initial video set corresponding to a target landmark may be obtained from internet matching directly according to the retrieval parameters.
For example, in at least one embodiment of the present disclosure, for step S302, deriving the video set based on the initial video set may include: taking the initial video set as the video set. For example, in one example, after the initial video set corresponding to the target landmark is obtained by matching against the retrieval parameters, the obtained initial video set may be used directly as the video set. However, in some cases, because many videos in the video database or on the internet may match the retrieval parameters, the initial video set may contain a large total number of videos; using it directly as the video set may then introduce a large amount of repeated or invalid data, resulting in a heavy workload and low data-acquisition efficiency. Thus, in at least one embodiment of the present disclosure, for step S302, deriving the video set based on the initial video set may include: filtering the initial video set to obtain the video set.
Fig. 4A is a flowchart of one example operation of screening an initial video set in accordance with at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, filtering an initial video set to obtain a video set may include the following operations, as shown in fig. 4A.
Step S401: determining the duration of each video in the initial video set;
step S402: and selecting videos with duration within a preset duration range to obtain a video set.
For example, in at least one embodiment of the present disclosure, the initial video set includes three videos: video 1, video 2, and video 3. First, the duration of each video in the initial video set is determined: for example, video 1 is 50 seconds, video 2 is 90 seconds, and video 3 is 150 seconds. Then, videos whose duration falls within a preset duration range are selected to obtain the video set. For example, in one example, the preset duration range is 60 seconds to 180 seconds, so video 2 and video 3 in the initial video set are selected to join the video set because their durations fall within the preset duration range.
It should be noted that the foregoing embodiment is merely exemplary; the preset duration range may be 60 to 180 seconds, 120 to 300 seconds, etc. The embodiments of the present disclosure do not limit the preset duration range, which may be set according to actual requirements. A video with a very short duration yields only a limited number of extractable video frames, while a video with a very long duration leads to an excessive amount of computation; by limiting the duration range, both can be excluded, thereby improving the efficiency of video-frame acquisition and reducing the amount of computation. In addition, the embodiments of the present disclosure do not limit parameters such as the resolution of the video (e.g., 720P, 1080P, etc.) or the shooting frame rate (e.g., 30 FPS, 60 FPS, etc.).
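The duration screening of steps S401/S402 reduces to a simple range filter. The sketch below reproduces the three-video example from the text; the function name and the dict representation of the initial video set are illustrative assumptions.

```python
PRESET_RANGE = (60, 180)  # seconds; the range itself is configurable

def filter_by_duration(videos, duration_range=PRESET_RANGE):
    """Keep videos whose duration falls within the preset range
    (steps S401/S402). `videos` maps a video name to its duration."""
    lo, hi = duration_range
    return {name: dur for name, dur in videos.items() if lo <= dur <= hi}

# The example from the text: video 1 (50 s) falls outside 60-180 s.
initial_set = {"video 1": 50, "video 2": 90, "video 3": 150}
video_set = filter_by_duration(initial_set)  # keeps video 2 and video 3
```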
Fig. 4B is another example flow diagram of operations for screening an initial video set in accordance with at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, filtering an initial video set to obtain a video set may include the following operations, as shown in fig. 4B.
Step S410: determining a shooting angle of each video in the initial video set relative to the target landmark;
step S420: a plurality of videos whose shooting angles are different from each other are selected to obtain a video set.
For example, in at least one embodiment of the present disclosure, the shooting angle relative to the target landmark may include a shooting height, a shooting direction, and a shooting distance, and may be obtained, for example, by analyzing feature points contained in the video pictures themselves and camera parameters (e.g., pose) recorded with the video. Different videos with the same shooting angle contain substantially the same images of the target landmark. Therefore, selecting videos with mutually different shooting angles to join the video set avoids collecting repeated data and enriches the data of the target landmark.
For example, in at least one embodiment of the present disclosure, a plurality of videos having a specific shooting angle, or within a specific shooting-angle range, and with shooting angles different from each other may be selected to screen the video set from the initial video set. The embodiments of the present disclosure do not limit the selection of shooting angles, which may be set according to actual needs.
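The angle screening of steps S410/S420 amounts to keeping at most one video per shooting angle. The sketch below assumes the angle has already been estimated (how is outside its scope); the `(height, direction, distance)` tuple and the function names are illustrative, not from the patent.

```python
def dedupe_by_angle(videos, angle_of):
    """Keep at most one video per shooting angle (steps S410/S420).
    `angle_of` maps a video to a (height, direction, distance) tuple;
    estimating that tuple from feature points and camera pose is
    assumed to happen elsewhere."""
    seen, kept = set(), []
    for v in videos:
        angle = angle_of(v)
        if angle not in seen:  # a new angle contributes new data
            seen.add(angle)
            kept.append(v)
    return kept
```

Videos sharing an angle are redundant, so dropping them avoids repeated data while different angles enrich the landmark data, as the text argues.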
It should be noted that, in the embodiment of the present disclosure, the screening manner of the initial video set is not limited to the above-described screening according to the duration and the screening according to the shooting angle, and may also be selected according to any other applicable parameters and rules, which may be determined according to the actual needs, and the embodiment of the present disclosure is not limited thereto.
Step S102: video frames are decimated for each video in a video set to obtain a video frame set that includes decimated video frames.
For example, in at least one embodiment of the present disclosure, video frames may be extracted from a video at an extraction frame rate. The extraction frame rate may be preset, or may differ according to the respective shooting frame rates of different videos; embodiments of the present disclosure are not limited in this regard. For example, in one example, frame extraction may be set to one frame per second for all videos. In another example, a 30 FPS video may be set to extract one frame per second, while a 60 FPS video may be set to extract two frames per second.
It should be noted that the above embodiments are merely exemplary, and the embodiments of the present disclosure do not limit the setting of the extraction frame rate, and may be set according to actual requirements.
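The frame-rate-dependent extraction above can be sketched as a simple index computation (function and parameter names are ours, for illustration only):

```python
def extraction_indices(total_frames, shooting_fps, frames_per_second=1):
    """Return indices of the frames to extract from a video.

    Extracts `frames_per_second` frames for every second of footage,
    evenly spaced: with a 30 FPS source and one frame per second, every
    30th frame is kept; with a 60 FPS source and two frames per second,
    every 30th frame is likewise kept.
    """
    step = shooting_fps // frames_per_second  # frames skipped between picks
    return list(range(0, total_frames, step))
```

For a 3-second clip at 30 FPS (90 frames) this yields frames 0, 30, and 60; a 60 FPS clip extracted at two frames per second yields the same 30-frame spacing.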
For example, in at least one embodiment of the present disclosure, a set of video frames may be derived based on the extracted video frames, the set of video frames comprising a plurality of video frames.
Step S103: target identification is performed for each video frame in the set of video frames.
For example, in at least one embodiment of the present disclosure, performing target recognition on each video frame in the video frame set includes: inputting each video frame in the video frame set into a neural network for target recognition. For example, in at least one embodiment of the present disclosure, a denoising convolutional neural network (DnCNN) may be employed. For example, in at least one embodiment of the present disclosure, the denoising convolutional neural network is obtained based on a residual learning algorithm and a batch normalization algorithm. For example, the denoising convolutional neural network may remove additive Gaussian noise using a multi-layer deep convolutional neural network.
For example, in at least one embodiment of the present disclosure, the DnCNN model employs a residual learning algorithm. Unlike conventional methods, which stack many small residual units, the DnCNN model treats the entire network as one large residual unit and thereby predicts the residual image (i.e., the noise image). Assuming the input of the DnCNN model is a noisy observation y = x + v, the DnCNN model learns a mapping R(y) ≈ v, where v can be understood as the residual image (i.e., the noise image), so that the clean image x = y − R(y) can be recovered.
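The residual formulation x = y − R(y) can be sketched in a few lines. This is not a trained DnCNN; `residual_predictor` stands in for the learned mapping R, and the example below substitutes an oracle predictor purely to show the reconstruction arithmetic.

```python
import numpy as np

def denoise_residual(noisy, residual_predictor):
    """Recover a clean image from a noisy observation y = x + v.

    `residual_predictor` plays the role of the trained DnCNN mapping
    R(y) ~= v; here it may be any callable returning the predicted
    noise image. The clean image is reconstructed as x = y - R(y).
    """
    return noisy - residual_predictor(noisy)

# Illustration with a perfect (oracle) residual predictor:
rng = np.random.default_rng(0)
clean = rng.random((4, 4))
noise = 0.1 * rng.standard_normal((4, 4))
noisy = clean + noise
restored = denoise_residual(noisy, lambda y: noise)  # oracle: R(y) = v exactly
assert np.allclose(restored, clean)
```

In the real model, `residual_predictor` would be the stacked Conv/BN/ReLU network described below, trained so that its output approximates the noise component v.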
Fig. 5 shows a DnCNN network architecture provided in accordance with at least one embodiment of the present disclosure. As shown in fig. 5, DnCNN employs a stacked fully convolutional structure. Not counting the noisy input image and the output layer, and assuming a total of D layers (in DnCNN, the depth D is typically set to 17 or 20), three different types of convolution blocks are distributed over the first, middle, and last portions shown in fig. 5.
As shown in fig. 5, the first convolution block is Conv+ReLU, which constitutes the first layer: the input image is convolved (Conv) and then passed through a rectified linear unit (ReLU), also known as the activation layer. The second type of convolution block, i.e., layers 2 through D−1 in the middle, uses a Conv+BN+ReLU combination, in which a batch normalization (BN) layer is added between the convolution layer and the ReLU. This is an important layer: DnCNN benefits greatly from the combination of the residual learning algorithm and the batch normalization algorithm. When the network is trained with mini-batch stochastic gradient descent (SGD), batch normalization reduces the shift of the input distribution to the nonlinear units during training, thereby speeding up convergence. The last convolution block uses only a convolution layer to reconstruct the output layer. For example, in at least one embodiment of the present disclosure, a DnCNN model obtained based on the residual learning algorithm and the batch normalization algorithm can achieve better denoising results.
It should be noted that the above embodiments are merely exemplary, and embodiments of the present disclosure may also use other methods to perform target recognition on each video frame in the video frame set; embodiments of the present disclosure are not limited in this regard. When a neural network is used for target recognition, the structure of the neural network may be any applicable structure, which may be determined according to practical requirements, and is not limited to the DnCNN model described above.
Step S104: in response to a first video frame including a target landmark among a plurality of video frames and satisfying a preset condition, the first video frame is taken as data of the target landmark.
For example, in at least one embodiment of the present disclosure, the preset conditions include at least one of the following conditions: the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value, the ratio of the portion of the identified target landmark in the first video frame to the entirety of the target landmark reaches a second predetermined value, or the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
For example, in at least one embodiment of the present disclosure, assume the image of the first video frame comprises 640 × 840 pixels, for a total of about 540,000 pixels, that the image includes a target landmark, and that 300,000 pixels in the first video frame display the target landmark. When the first predetermined value is set to 250,000, the number of pixels of the target landmark identified in the first video frame (i.e., 300,000) reaches the first predetermined value (250,000), and the first video frame may be used as data of the corresponding target landmark; for example, the first video frame may be stored for subsequent use. When the first predetermined value is set to 400,000, the number of pixels of the target landmark identified in the first video frame (i.e., 300,000) does not reach the first predetermined value (400,000), and the first video frame cannot serve as data of the corresponding target landmark.
It should be noted that the first predetermined value, the number of vertical pixels and horizontal pixels of the first video frame, and the number of pixels of the target landmark described in the above embodiments are all exemplary, and the embodiments of the present disclosure are not limited thereto and may be set according to actual situations.
For example, in at least one embodiment of the present disclosure, when the second predetermined value is 0.5, then whenever the ratio of the portion of the target landmark displayed in the first video frame to the entirety of the actual target landmark exceeds half, the first video frame may be used as data for the corresponding target landmark and may then, for example, be stored for subsequent use.
It should be noted that, in the above embodiment, the ratio of the portion of the target landmark identified in the first video frame to the whole of the target landmark may refer to the ratio of the area of the target landmark displayed in the first video frame to the actual area of the target landmark, or may refer to the ratio of the volume of the target landmark displayed in the first video frame to the actual volume of the target landmark; embodiments of the present disclosure are not limited in this regard.
It should be further noted that the second predetermined value of 0.5 is merely exemplary; the second predetermined value may also be any other value such as 0.6 or 0.4, which is not limited by the embodiments of the present disclosure and may be set according to actual needs.
For example, in at least one embodiment of the present disclosure, assume the image of the first video frame comprises 640 × 840 pixels, for a total of about 540,000 pixels, that the image includes a target landmark, and that 200,000 pixels in the first video frame display the target landmark. When the third predetermined value is set to 0.3, the ratio of the number of pixels of the identified target landmark in the first video frame (i.e., 200,000) to the total number of pixels of the first video frame (i.e., 540,000), which is approximately 0.37, exceeds the third predetermined value (0.3), and the first video frame may be used as data for the corresponding target landmark; for example, the first video frame may be stored for subsequent use. When the third predetermined value is set to 0.5, the same ratio (about 0.37) is lower than the third predetermined value (0.5), and the first video frame cannot serve as data of the corresponding target landmark.
It should be noted that the third predetermined value, the number of vertical pixels and horizontal pixels of the first video frame, and the number of pixels of the target landmark described in the above embodiments are all exemplary, and the embodiments of the present disclosure are not limited thereto and may be set according to actual situations.
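The three preset conditions of step S104 can be sketched as a single check over a recognition mask. This is an illustrative sketch only: the function name, the mask representation, and the default thresholds (taken from the numeric examples above) are our assumptions, not the patent's implementation.

```python
import numpy as np

def satisfies_preset_conditions(mask, visible_fraction,
                                min_pixels=250_000,
                                min_visible_fraction=0.5,
                                min_image_fraction=0.3):
    """Check the preset conditions on one video frame.

    `mask` is a boolean array marking the pixels recognized as the
    target landmark; `visible_fraction` is the estimated ratio of the
    visible portion of the landmark to the whole landmark. The frame
    qualifies as landmark data if at least one condition holds.
    """
    landmark_pixels = int(mask.sum())
    conditions = (
        landmark_pixels >= min_pixels,                      # pixel count
        visible_fraction >= min_visible_fraction,           # visible portion
        landmark_pixels / mask.size >= min_image_fraction,  # share of frame
    )
    return any(conditions)
```

With the figures from the examples above, a 640 × 840 frame showing 300,000 landmark pixels passes via the first condition, and one showing 200,000 pixels (about 0.37 of the frame) passes via the third.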
For example, in at least one embodiment of the present disclosure, the preset conditions may include: the location of the identified target landmark in the first video frame is within a preset position range of the image of the first video frame. For example, in one example, the image may be analyzed by a deep learning method (e.g., the denoising convolutional neural network described above), for instance by determining the outer contour data of the target landmark within the first video frame, or the center point data of the target landmark, or the like, so as to determine the relative position of the target landmark within the image of the first video frame. A position range is further preset; for example, the preset position range is the middle portion of the image of the first video frame. More specifically, in one example, the image of the first video frame includes 1,000,000 pixels in total, and the preset position range may be set as a region of 400,000 pixels around the central axis of the image of the first video frame.
It should be noted that the preset position range described in the above embodiment, i.e., the region of 400,000 pixels near the central axis of the image, is merely exemplary; embodiments of the present disclosure do not limit the preset position range, which may be set according to actual requirements.
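The position condition can be sketched as follows. The interpretation of "near the central axis" as a vertical band containing a fixed fraction of the image's width, as well as all names and defaults, are our assumptions for illustration.

```python
import numpy as np

def landmark_in_center(mask, center_fraction=0.4):
    """Check whether the landmark's center lies in a central band of the image.

    `mask` is a boolean landmark mask. The preset position range is taken
    to be a vertical band around the image's central axis spanning
    `center_fraction` of the image width; the landmark's horizontal
    center is the mean column of its mask pixels.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return False  # no landmark pixels recognized at all
    cx = xs.mean()                       # landmark center (horizontal)
    width = mask.shape[1]
    half_band = center_fraction * width / 2
    return abs(cx - width / 2) <= half_band
```

A landmark whose mask sits in the middle of the frame passes; one confined to a corner falls outside the central band and is rejected.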
The data acquisition method 10 provided by the embodiments of the present disclosure can acquire data conveniently and rapidly, without requiring workers to collect data physically in the field, thereby reducing the acquisition cost of landmark data. By setting preset conditions and screening the video set, repeated or invalid data can be removed from the collected data, which reduces the workload of data collection and improves its efficiency.
For example, in at least one embodiment of the present disclosure, the data acquisition method 10 may further include: data of the target landmark is stored for three-dimensional modeling, e.g., for three-dimensional reconstruction.
For example, in at least one embodiment of the present disclosure, data of the target landmark acquired based on the data acquisition method 10 provided in the above embodiment is stored for subsequent operations such as performing three-dimensional modeling on the landmark, making AR special effects, etc., to which embodiments of the present disclosure are not limited.
At least one embodiment of the present disclosure also provides a video-based data acquisition device. Fig. 6 is a schematic block diagram of a video-based data acquisition device provided in accordance with at least one embodiment of the present disclosure. As shown in fig. 6, the data acquisition device 60 includes: the video set acquisition module 601, the video frame acquisition module 602, the target recognition module 603, and the judgment module 604 may be implemented by software, hardware, firmware, or any combination thereof, for example, may be implemented as a video set acquisition circuit, a video frame acquisition circuit, a target recognition circuit, and a judgment circuit, respectively.
For example, in at least one embodiment of the present disclosure, the video set acquisition module 601 is configured to acquire a video set corresponding to a target landmark, the video set including a plurality of videos, according to the retrieval parameters. For example, the video frame acquisition module 602 is configured to extract video frames for each video in a video set to obtain a video frame set comprising a plurality of extracted video frames. For example, the object recognition module 603 is configured to object recognize each video frame in the set of video frames. For example, the determination module 604 is configured to, in response to a first video frame including a target landmark among a plurality of video frames and satisfying a preset condition, take the first video frame as data of the target landmark.
For example, in at least one embodiment of the present disclosure, the preset conditions include at least one of the following conditions: the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value, the ratio of the portion of the identified target landmark in the first video frame to the entirety of the target landmark reaches a second predetermined value, or the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
For example, in at least one embodiment of the present disclosure, the preset conditions include: the location of the identified target landmark in the first video frame is within a preset range of locations of the image of the first video frame.
It should be noted that, the setting of the preset condition is already described in detail in the above embodiment of the data acquisition method 10, and will not be described herein again.
For example, the specific operations that the video set acquisition module 601, the video frame acquisition module 602, the target identification module 603, and the determination module 604 are configured to perform may be referred to the relevant descriptions of the steps S101 to S104 of the data acquisition method 10 provided in at least one embodiment of the present disclosure, which are not described herein.
It should be noted that, in the embodiment of the present disclosure, the data acquisition device 60 may further include more modules, which are not limited to the video set acquisition module 601, the video frame acquisition module 602, the target recognition module 603, and the determination module 604, which may be determined according to actual requirements, which is not limited by the embodiment of the present disclosure.
It should be understood that the data acquisition device 60 provided in the embodiments of the present disclosure may implement the foregoing data acquisition method 10, and may also achieve similar technical effects as those of the foregoing data acquisition method 10, which is not described herein.
At least one embodiment of the present disclosure also provides a video-based data acquisition system. Fig. 7 is an example block diagram of a video-based data acquisition system provided by at least one embodiment of the present disclosure. As shown in fig. 7, the data acquisition system 70 includes a terminal 710 and a data server 720, and the terminal 710 and the data server 720 are signal-connected. The terminal 710 is configured to transmit the request data to the data server 720. The data server 720 is configured to: in response to the request data, a video set corresponding to the target landmark is determined according to the retrieval parameters in the request data, and the video set is transmitted to the terminal 710, the video set including a plurality of videos.
For example, the terminal 710 is further configured to extract video frames from each video in the video set to obtain a video frame set including the extracted plurality of video frames, perform target recognition on each video frame in the video frame set, and, in response to a first video frame among the plurality of video frames including the target landmark and satisfying a preset condition, take the first video frame as data of the target landmark.
For example, the above operations that the terminal 710 and the data server 720 are configured to perform may be referred to the data acquisition method 10 provided in at least one embodiment of the present disclosure, and will not be described herein.
For example, in one example, in the data acquisition system 70, the terminal 710 may be implemented as a client device (e.g., a mobile phone or a computer), and the data server 720 may be implemented as a server.
For example, in one example, as shown in FIG. 7, data acquisition system 70 may include a database server 730 that stores a video database in addition to terminal 710 and data server 720. The database server 730 is in signal connection with the data server 720 and is configured to respond to the request information of the data server 720 and to return the data corresponding to the request information in the video database to the data server 720. It should be noted that, when the data acquisition system 70 does not include the database server 730, the data in the video database may be directly stored on the data server 720 or stored in another storage device that is provided separately, or the data server 720 may establish the video database by itself and then store the data in the data server 720 or stored in another storage device that is provided separately, which is not limited in particular by the embodiments of the present disclosure.
The video-based data acquisition system 70 according to at least one embodiment of the present disclosure may implement the data acquisition method 10 according to the foregoing embodiment, and may also achieve similar technical effects as the data acquisition method 10 according to the foregoing embodiment, which is not described herein.
At least one embodiment of the present disclosure also provides an electronic device. Fig. 8 is a schematic diagram of an electronic device provided in at least one embodiment of the present disclosure. For example, as shown in fig. 8, the electronic device 80 includes a processor 810 and a memory 820. Memory 820 includes one or more computer program modules 821. One or more computer program modules 821 are stored in memory 820 and configured to be executed by processor 810, the one or more computer program modules 821 include instructions for performing any of the data acquisition methods provided by at least one embodiment of the present disclosure, which when executed by processor 810, can perform one or more steps in the data acquisition methods provided by at least one embodiment of the present disclosure. The memory 820 and the processor 810 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, the memory 820 and the processor 810 may be disposed at a server side (or cloud), such as the aforementioned data server 720, for performing one or more steps of the data acquisition methods described in fig. 1,3, 4A, and 4B.
For example, the processor 810 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other form of processing unit having data processing and/or program execution capabilities, such as a Field Programmable Gate Array (FPGA), or the like; for example, the Central Processing Unit (CPU) may be an X86 or ARM architecture, or the like. The processor 810 may be a general purpose processor or a special purpose processor that may control other components in the electronic device 80 to perform the desired functions.
For example, memory 820 may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, random Access Memory (RAM) and/or cache memory (cache) and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules 821 may be stored on the computer-readable storage medium, and the processor 810 may execute the one or more computer program modules 821 to implement the various functions of the electronic device 80. Various applications and various data, as well as various data used and/or generated by the applications, etc., may also be stored in the computer readable storage medium. The specific functions and technical effects of the electronic device 80 may be referred to the above description of the data acquisition method, and will not be repeated here.
Fig. 9 is an example block diagram of a terminal provided in accordance with at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, the terminal is a terminal 900 having data processing capability, such as may be used in the data acquisition method 10 provided in embodiments of the present disclosure. For example, the terminal 900 may transmit request data to a server (e.g., the data server 720) and may receive a video set from the server (e.g., the data server 720); it may then extract video frames from each video in the video set to obtain a video frame set including the extracted plurality of video frames, perform target recognition on each video frame in the video frame set, and, in response to a first video frame among the plurality of video frames including the target landmark and satisfying a preset condition, take the first video frame as data of the target landmark. It should be noted that the terminal 900 shown in fig. 9 is only one example and does not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 9, the terminal 900 may include a processing device (e.g., a central processor, a graphic processor, etc.) 910, which may perform various suitable actions and processes according to a program stored in a Read Only Memory (ROM) 920 or a program loaded from a storage device 980 into a Random Access Memory (RAM) 930. In the RAM 930, various programs and data required for the operation of the terminal 900 are also stored. The processing device 910, the ROM 920, and the RAM 930 are connected to each other by a bus 940. An input/output (I/O) interface 950 is also connected to bus 940.
In general, the following devices may be connected to the I/O interface 950: input devices 960 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 970 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, etc.; a storage device 980 including, for example, magnetic tape, hard disk, etc.; communication device 990. Communication device 990 may allow terminal 900 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 9 illustrates a terminal 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided, and that terminal 900 may alternatively be implemented or provided with more or fewer means.
For example, the video-based data acquisition method 10 shown in fig. 1 may be implemented as a computer software program in accordance with an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the data acquisition method 10 described above. In such embodiments, the computer program may be downloaded and installed from a network via communication device 990, or from storage device 980, or from ROM 920. The functions defined in the data acquisition method 10 provided by the embodiments of the present disclosure may be performed when the computer program is executed by the processing device 910.
At least one embodiment of the present disclosure also provides a non-transitory computer readable storage medium storing non-transitory computer readable instructions that, when executed by a computer, may implement the data acquisition method 10 of any of the embodiments of the present disclosure. By utilizing the non-transient readable storage medium, data can be conveniently and rapidly acquired without field acquisition, so that the acquisition cost of landmark data is reduced; the workload of data acquisition can be reduced, and the efficiency of data acquisition is improved.
Fig. 10 is a schematic block diagram of a non-transitory readable storage medium 100 provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 10, the non-transitory readable storage medium 100 includes computer program instructions 111 stored thereon. The computer program instructions 111, when executed by a processor, perform one or more steps in the data acquisition method 10 provided by at least one embodiment of the present disclosure.
For example, the storage medium may be any combination of one or more computer-readable storage media, such as one containing computer-readable program code for acquiring a video set corresponding to a target landmark based on the retrieval parameters, another containing computer-readable program code for extracting video frames for each video in the video set to obtain a video frame set, yet another containing computer-readable program code for performing target recognition for each video frame in the video frame set, and yet another containing computer-readable program code for taking the first video frame as data of the target landmark in response to the first video frame including the target landmark among the plurality of video frames and satisfying a preset condition. Of course, the various program codes described above may also be stored on the same computer-readable medium, as embodiments of the present disclosure are not limited in this regard. For example, when the program code is read by a computer, the computer may execute the program code stored in the computer storage medium, performing a data acquisition method such as provided by any of the embodiments of the present disclosure.
For example, the storage medium may include a memory card of a smart phone, a memory component of a tablet computer, a hard disk of a personal computer, random Access Memory (RAM), read Only Memory (ROM), erasable Programmable Read Only Memory (EPROM), portable compact disc read only memory (CD-ROM), flash memory, or any combination of the foregoing, as well as other suitable storage media. For example, the readable storage medium may also be the memory 820 in fig. 8, and the related description may refer to the foregoing, which is not repeated herein.
It should be noted that the storage medium 100 may be applied to the terminal 710 and/or the data server 720, and the skilled person may select according to a specific scenario, which is not limited herein.
Fig. 11 illustrates an exemplary scene graph of a video-based data acquisition system provided by at least one embodiment of the present disclosure. As shown in fig. 11, the data acquisition system 300 may include a user terminal 310, a network 320, a server 330, and a database 340.
For example, the user terminal 310 may be a computer 310-1 or a portable terminal 310-2 shown in fig. 11. It is understood that the user terminal may also be any other type of electronic device capable of performing the reception, processing and display of data, which may include, but is not limited to, desktop computers, notebook computers, tablet computers, smart home devices, wearable devices, vehicle-mounted electronic devices, medical electronic devices, etc.
For example, network 320 may be a single network, or a combination of at least two different networks. For example, network 320 may include, but is not limited to, one or a combination of several of a local area network, a wide area network, a public network, a private network, the Internet, a mobile communication network, and the like.
For example, the server 330 may be a single server or a group of servers, with each server in the group of servers being connected via a wired network or a wireless network. The wired network may use twisted pair, coaxial cable or optical fiber transmission, and the wireless network may use 3G/4G/5G mobile communication network, bluetooth, zigbee or WiFi, for example. The present disclosure is not limited herein with respect to the type and functionality of the network. The one server group may be centralized, such as a data center, or distributed. The server may be local or remote. For example, the server 330 may be a general-purpose server or a dedicated server, and may be a virtual server or a cloud server.
For example, database 340 may be used to store various data utilized, generated, and output in the operation of user terminal 310 and server 330. Database 340 may be interconnected or in communication with server 330 or a portion of server 330 via network 320, directly with server 330, or with server 330 via a combination of the two. In some embodiments, database 340 may be a stand-alone device. In other embodiments, database 340 may also be integrated in at least one of user terminal 310 and server 330. For example, the database 340 may be provided on the user terminal 310 or may be provided on the server 330. For another example, the database 340 may be distributed, with one portion provided on the user terminal 310 and another portion provided on the server 330.
For example, in one example, first, the user terminal 310 (e.g., a user's computer) may send the requested data to the server 330 via the network 320 or other technology (e.g., bluetooth communication, infrared communication, etc.). Next, the server 330 determines a video set corresponding to the target landmark according to the retrieval parameters in the request data in response to the request data, and transmits the video set including a plurality of videos to the user terminal 310. Finally, the user terminal 310 extracts video frames for each video in the received video set to obtain a video frame set including the extracted plurality of video frames, performs object recognition for each video frame in the video frame set, and, in response to a first video frame including a target landmark among the plurality of video frames and satisfying a preset condition, takes the first video frame as data of the target landmark.
In the foregoing, the video-based data acquisition method, apparatus and system, electronic device and non-transitory readable storage medium provided in the embodiments of the present disclosure are described in connection with fig. 1 to 11. The video-based data acquisition method provided by the embodiment of the disclosure can acquire data conveniently and rapidly, and does not need to acquire the data in the field, so that the acquisition cost of landmark data is reduced, the workload of data acquisition can be reduced by setting preset conditions, and the efficiency of data acquisition is improved.
It should be noted that the storage medium (computer readable medium) described in the present disclosure may be a computer readable signal medium or a non-transitory computer readable storage medium, or any combination of the above. The non-transitory computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the non-transitory computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a non-transitory computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a non-transitory computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), or the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects an internet protocol address from the at least two internet protocol addresses and returns the internet protocol address; receiving an Internet protocol address returned by the node evaluation equipment; wherein the acquired internet protocol address indicates an edge node in the content distribution network.
Or the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), etc.
In this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above description is merely illustrative of some embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by interchanging the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (20)

1. A video-based data acquisition method, comprising:
acquiring a video set corresponding to a target landmark according to a retrieval parameter, wherein the video set comprises a plurality of videos;
extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
performing target recognition on each video frame in the video frame set;
in response to a first video frame, among the plurality of video frames, including the target landmark and satisfying a preset condition, taking the first video frame as data of the target landmark;
wherein acquiring the video set corresponding to the target landmark according to the retrieval parameter comprises:
searching according to the retrieval parameter, and matching from the Internet to obtain an initial video set corresponding to the target landmark; and
obtaining the video set based on the initial video set.
2. The data acquisition method of claim 1, wherein the preset condition comprises at least one of the following conditions:
the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value,
the ratio of the identified portion of the target landmark in the first video frame to the entirety of the target landmark reaches a second predetermined value, or
the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
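A minimal numeric sketch of the "at least one of" check in claim 2. The three threshold values below are arbitrary assumptions chosen for illustration; the patent does not specify them:

```python
def meets_preset_condition(landmark_pixels, visible_ratio, frame_pixels,
                           first_value=5000,   # min pixel count (assumed)
                           second_value=0.8,   # min visible fraction (assumed)
                           third_value=0.1):   # min frame coverage (assumed)
    # Condition 1: pixel count of the identified landmark reaches a first value.
    cond1 = landmark_pixels >= first_value
    # Condition 2: identified portion / whole landmark reaches a second value.
    cond2 = visible_ratio >= second_value
    # Condition 3: landmark pixels / total frame pixels reaches a third value.
    cond3 = landmark_pixels / frame_pixels >= third_value
    # Claim 2 requires at least one of the three conditions to hold.
    return cond1 or cond2 or cond3

print(meets_preset_condition(6000, 0.5, 1_000_000))  # True: condition 1 holds
print(meets_preset_condition(100, 0.2, 1_000_000))   # False: none hold
```

The disjunction mirrors the claim language: a frame qualifies as landmark data as soon as any one of the three measurements crosses its predetermined value.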
3. The data acquisition method according to claim 1, wherein the preset condition includes:
the position of the target landmark identified in the first video frame is within a preset position range of the image of the first video frame.
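Claim 3's position check can be sketched as a simple bounding test on the landmark's position within the frame. The central-region default below is an illustrative assumption, not a value taken from the disclosure:

```python
def landmark_in_preset_region(cx, cy, frame_w, frame_h,
                              region=(0.25, 0.25, 0.75, 0.75)):
    # region is (x0, y0, x1, y1) expressed as fractions of the frame size;
    # here the central half of the image, an assumed default.
    x0, y0, x1, y1 = region
    return (x0 * frame_w <= cx <= x1 * frame_w and
            y0 * frame_h <= cy <= y1 * frame_h)

print(landmark_in_preset_region(960, 540, 1920, 1080))  # True: frame centre
print(landmark_in_preset_region(10, 10, 1920, 1080))    # False: corner
```

Expressing the range as fractions keeps the same preset usable across videos of different resolutions.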
4. The data acquisition method according to claim 1, wherein acquiring the video set corresponding to the target landmark according to the retrieval parameter further comprises:
searching according to the retrieval parameter, and matching from a video database to obtain an initial video set corresponding to the target landmark; and
obtaining the video set based on the initial video set.
5. The data acquisition method of claim 4, wherein deriving the video set based on the initial video set comprises:
screening the initial video set to obtain the video set; or
taking the initial video set as the video set.
6. The data acquisition method of claim 5, wherein filtering the initial video set to obtain the video set comprises:
determining a duration of each video in the initial video set;
selecting videos whose duration is within a preset duration range to obtain the video set.
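Claim 6's duration screening is a straightforward filter over the initial video set. The duration bounds below are assumed values for illustration only:

```python
def screen_by_duration(initial_set, min_s=5.0, max_s=300.0):
    # Keep only videos whose duration falls within the preset range
    # (5 s to 5 min here, purely illustrative bounds).
    return [v for v in initial_set if min_s <= v["duration_s"] <= max_s]

videos = [{"id": "a", "duration_s": 2.0},
          {"id": "b", "duration_s": 60.0},
          {"id": "c", "duration_s": 3600.0}]
print([v["id"] for v in screen_by_duration(videos)])  # ['b']
```

Such a filter discards clips too short to contain usable landmark views and overly long videos that would inflate the frame-extraction workload.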
7. The data acquisition method of claim 5, wherein filtering the initial video set to obtain the video set comprises:
determining a shooting angle of each video in the initial video set relative to the target landmark;
selecting a plurality of videos with different shooting angles to obtain the video set.
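One way to realize claim 7's angle-based screening is to bucket shooting angles into coarse bins and keep one video per bin, so the resulting set spans distinct viewpoints of the landmark. The 30-degree bin width and the `angle_deg` field are assumptions for illustration:

```python
def screen_by_shooting_angle(initial_set, bin_deg=30):
    # Group videos by coarse angle bins around the landmark and keep the
    # first video seen in each bin, yielding diverse shooting angles.
    chosen = {}
    for v in initial_set:
        bucket = int(v["angle_deg"] % 360) // bin_deg
        chosen.setdefault(bucket, v)
    return list(chosen.values())

videos = [{"id": "a", "angle_deg": 5},
          {"id": "b", "angle_deg": 10},   # same 0-30 degree bin as "a"
          {"id": "c", "angle_deg": 95}]
print([v["id"] for v in screen_by_shooting_angle(videos)])  # ['a', 'c']
```

Viewpoint diversity matters here because the collected frames are ultimately intended for three-dimensional modeling (claim 14), which benefits from coverage of the landmark from many sides.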
8. The data collection method according to claim 1, wherein the retrieval parameter includes at least one of a keyword of the target landmark, a feature image corresponding to the target landmark, positioning information corresponding to the target landmark, and classification information corresponding to the target landmark.
9. The data collection method of claim 8, wherein the keywords of the target landmark include at least one of a name, an abbreviation, a generic name, a feature description of the target landmark.
10. The data acquisition method of claim 1, wherein performing target recognition on each video frame in the video frame set comprises:
inputting each video frame in the video frame set to a neural network for target recognition.
11. The data acquisition method of claim 10, wherein the neural network comprises a de-noised convolutional neural network.
12. The data acquisition method of claim 11, wherein the de-noised convolutional neural network is derived based on a residual learning algorithm and a batch normalization algorithm.
13. The data acquisition method of claim 1, wherein the target landmark comprises a target building, a target landscape, or a target animal and plant.
14. The data acquisition method according to any one of claims 1 to 13, further comprising:
storing the data of the target landmark for three-dimensional modeling.
15. A video-based data acquisition device, comprising:
a video set acquisition module configured to acquire a video set corresponding to a target landmark according to a retrieval parameter, wherein the video set comprises a plurality of videos;
a video frame acquisition module configured to extract video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
a target recognition module configured to perform target recognition on each video frame in the video frame set; and
a judging module configured to, in response to a first video frame, among the plurality of video frames, including the target landmark and satisfying a preset condition, take the first video frame as data of the target landmark;
wherein the video set acquisition module is further configured to:
search according to the retrieval parameter, and match from the Internet to obtain an initial video set corresponding to the target landmark; and
obtain the video set based on the initial video set.
16. The data acquisition device of claim 15, wherein the preset condition comprises at least one of the following conditions:
the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value,
the ratio of the identified portion of the target landmark in the first video frame to the entirety of the target landmark reaches a second predetermined value, or
the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
17. The data acquisition device of claim 15, wherein the preset condition comprises:
the position of the target landmark identified in the first video frame is within a preset position range of the image of the first video frame.
18. A video-based data acquisition system, comprising a terminal and a data server, wherein:
the terminal is configured to send request data to the data server;
the data server is configured to:
in response to the request data, determine a video set corresponding to a target landmark according to a retrieval parameter in the request data, and send the video set to the terminal, wherein the video set comprises a plurality of videos;
the terminal is further configured to:
extracting video frames from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
performing target recognition on each video frame in the video frame set;
in response to a first video frame, among the plurality of video frames, including the target landmark and satisfying a preset condition, take the first video frame as data of the target landmark;
wherein the data server is further configured to: search according to the retrieval parameter, and match from the Internet to obtain an initial video set corresponding to the target landmark; and obtain the video set based on the initial video set.
19. An electronic device, comprising:
a processor; and
a memory storing one or more computer program modules;
wherein the one or more computer program modules are configured to be executed by the processor and comprise instructions for performing the data acquisition method of any one of claims 1-14.
20. A non-transitory readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the data acquisition method of any one of claims 1-14.
CN202011177236.4A 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium Active CN112287169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011177236.4A CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011177236.4A CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112287169A CN112287169A (en) 2021-01-29
CN112287169B true CN112287169B (en) 2024-04-26

Family

ID=74373822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011177236.4A Active CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287169B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293052A (en) * 2015-06-25 2017-01-04 意法半导体国际有限公司 Reinforced augmented reality multimedia system
CN110046572A (en) * 2019-04-15 2019-07-23 重庆邮电大学 A kind of identification of landmark object and detection method based on deep learning
CN110809166A (en) * 2019-10-31 2020-02-18 北京字节跳动网络技术有限公司 Video data processing method and device and electronic equipment
CN111274426A (en) * 2020-01-19 2020-06-12 深圳市商汤科技有限公司 Category labeling method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6686541B2 (en) * 2016-03-04 2020-04-22 日本電気株式会社 Information processing system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293052A (en) * 2015-06-25 2017-01-04 意法半导体国际有限公司 Reinforced augmented reality multimedia system
CN110046572A (en) * 2019-04-15 2019-07-23 重庆邮电大学 A kind of identification of landmark object and detection method based on deep learning
CN110809166A (en) * 2019-10-31 2020-02-18 北京字节跳动网络技术有限公司 Video data processing method and device and electronic equipment
CN111274426A (en) * 2020-01-19 2020-06-12 深圳市商汤科技有限公司 Category labeling method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112287169A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN108885639B (en) Content collection navigation and automatic forwarding
US9564175B2 (en) Clustering crowdsourced videos by line-of-sight
Workman et al. A unified model for near and remote sensing
CN108256404B (en) Pedestrian detection method and device
US20140343984A1 (en) Spatial crowdsourcing with trustworthy query answering
CN106165386A (en) For photo upload and the automatic technology of selection
US20150147045A1 (en) Computer ecosystem with automatically curated video montage
WO2021093679A1 (en) Visual positioning method and device
CN106663196A (en) Computerized prominent person recognition in videos
KR102637042B1 (en) Messaging system for resurfacing content items
CN112927363A (en) Voxel map construction method and device, computer readable medium and electronic equipment
CN115769260A (en) Photometric measurement based 3D object modeling
US20200401811A1 (en) Systems and methods for target identification in video
Kim et al. Development of mobile AR tour application for the national palace museum of Korea
CN109902681A (en) User group's relationship determines method, apparatus, equipment and storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN111191553A (en) Face tracking method and device and electronic equipment
US20150161198A1 (en) Computer ecosystem with automatically curated content using searchable hierarchical tags
CN112287169B (en) Data acquisition method, device and system, electronic equipment and storage medium
CN109688381B (en) VR monitoring method, device, equipment and storage medium
CN114677620A (en) Focusing method, electronic device and computer readable medium
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN110781797B (en) Labeling method and device and electronic equipment
CN113920023A (en) Image processing method and device, computer readable medium and electronic device
CN111652831A (en) Object fusion method and device, computer-readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant