CN112287169A - Data acquisition method, device and system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112287169A
CN112287169A (application CN202011177236.4A)
Authority
CN
China
Prior art keywords
video
video frame
data acquisition
target
target landmark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011177236.4A
Other languages
Chinese (zh)
Other versions
CN112287169B (en)
Inventor
陈志立
杨建朝
刘晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ByteDance Inc
Original Assignee
ByteDance Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ByteDance Inc filed Critical ByteDance Inc
Priority to CN202011177236.4A priority Critical patent/CN112287169B/en
Publication of CN112287169A publication Critical patent/CN112287169A/en
Application granted granted Critical
Publication of CN112287169B publication Critical patent/CN112287169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06F — Electric digital data processing
    • G06F16/743 — Browsing; visualisation of a collection of video files or sequences
    • G06F16/7328 — Querying; query by example, e.g. a complete video frame or video sequence
    • G06F16/7837 — Retrieval using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/787 — Retrieval using metadata, using geographical or spatial information, e.g. location

Abstract

A video-based data acquisition method, apparatus, and system, an electronic device, and a non-transitory readable storage medium are provided. The data acquisition method includes: acquiring a video set corresponding to a target landmark according to retrieval parameters, the video set including a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set including the extracted video frames; performing target recognition on each video frame in the video frame set; and, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark. The data acquisition method collects data conveniently and quickly without on-site collection, thereby reducing the acquisition cost of landmark data; by setting the preset condition, it also reduces the workload of data acquisition and improves data acquisition efficiency.

Description

Data acquisition method, device and system, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to a video-based data acquisition method, a video-based data acquisition apparatus, a video-based data acquisition system, an electronic device, and a non-transitory readable storage medium.
Background
With the development of communication technology and terminal devices, terminal devices such as mobile phones and tablet computers have become an indispensable part of people's work and life; and as terminal devices grow in popularity, video interaction applications have become a main channel for communication and entertainment.
Currently, landmark AR (augmented reality) special effects are one of the hot spots in the short-video field. A landmark AR effect makes shooting more engaging and encourages users to shoot and record more actively.
Disclosure of Invention
At least one embodiment of the present disclosure provides a video-based data acquisition method, device, and system, an electronic device, and a non-transitory readable storage medium, which can acquire data conveniently and quickly without on-site collection, thereby reducing the acquisition cost of landmark data, reducing the workload of data acquisition, and improving data acquisition efficiency.
At least one embodiment of the present disclosure provides a video-based data acquisition method, including: acquiring a video set corresponding to a target landmark according to retrieval parameters, the video set including a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set including the extracted video frames; performing target recognition on each video frame in the video frame set; and, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark.
At least one embodiment of the present disclosure also provides a video-based data acquisition apparatus, including: a video set acquisition module, a video frame acquisition module, a target recognition module, and a determination module. The video set acquisition module is configured to acquire a video set corresponding to a target landmark according to retrieval parameters, the video set including a plurality of videos. The video frame acquisition module is configured to extract video frames from each video in the video set to obtain a video frame set including the extracted video frames. The target recognition module is configured to perform target recognition on each video frame in the video frame set. The determination module is configured to, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, take the first video frame as data of the target landmark.
At least one embodiment of the present disclosure further provides a video-based data acquisition system, which includes a terminal and a data server. The terminal is configured to transmit request data to the data server. The data server is configured to: in response to the request data, determine a video set corresponding to a target landmark according to retrieval parameters in the request data, and transmit the video set to the terminal, the video set including a plurality of videos. The terminal is further configured to: extract video frames from each video in the video set to obtain a video frame set including the extracted video frames; perform target recognition on each video frame in the video frame set; and, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, take the first video frame as data of the target landmark.
At least one embodiment of the present disclosure also provides an electronic device including: a processor; and a memory storing one or more computer program modules. The one or more computer program modules are configured to be executed by the processor and include instructions for performing the data acquisition method of any of the embodiments described above.
At least one embodiment of the present disclosure also provides a non-transitory readable storage medium having computer instructions stored thereon. The computer instructions, when executed by a processor, perform the data acquisition method of any of the embodiments described above.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Like reference symbols in the various drawings indicate like elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is an example flow chart of a video-based data acquisition method provided by at least one embodiment of the present disclosure;
fig. 2 is a schematic diagram of a feature image of a target landmark according to at least one embodiment of the present disclosure;
FIG. 3 is an exemplary flowchart of the steps involved in step S101 shown in FIG. 1;
fig. 4A is a flowchart of an example operation of filtering an initial set of videos in accordance with at least one embodiment of the present disclosure;
fig. 4B is a flowchart of another example operation of filtering an initial set of videos in accordance with at least one embodiment of the present disclosure;
FIG. 5 is a network architecture diagram of a denoising convolutional neural network provided by at least one embodiment of the present disclosure;
fig. 6 is an example block diagram of a video-based data acquisition apparatus according to at least one embodiment of the present disclosure;
fig. 7 is an example block diagram of a video-based data acquisition system in accordance with at least one embodiment of the present disclosure;
fig. 8 is an example block diagram of an electronic device provided by at least one embodiment of the present disclosure;
fig. 9 is an example block diagram of a terminal provided in at least one embodiment of the present disclosure;
FIG. 10 is an example block diagram of a non-transitory readable storage medium provided by at least one embodiment of this disclosure; and
fig. 11 illustrates an exemplary scene diagram of a video-based data acquisition system provided by at least one embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a," "an," and "the" used in the present disclosure are illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
With the increasing popularity of terminal devices, video interaction applications have become a main channel for communication and entertainment. For example, short videos have strong social attributes, are easy to create, and are short in duration, which fits the fragmented content-consumption habits of users in the mobile-internet era. Augmented reality (AR) is a technology that seamlessly fuses virtual information with the real world: computer-generated virtual information such as text, images, three-dimensional models, music, and video is simulated using techniques including multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing, and is then applied to the real world, where the two kinds of information complement each other and the real world is thereby enhanced. The unique virtual-real fusion effect of AR technology gives it essentially unlimited room to expand in the short-video field.
Currently, landmark AR special effects are one of the hot spots in the short-video field. A landmark AR effect makes shooting more engaging and encourages users to shoot and record more actively. Landmark AR effects require modeling landmarks (e.g., the featured buildings of a city) from collected data; however, different data collection methods affect the efficiency and effectiveness of the modeling differently. Conventional collection of landmark data may require a worker to travel (e.g., by airplane, high-speed rail, or train) to the site for collection (e.g., photography), which increases collection cost and reduces collection efficiency.
At least one embodiment of the present disclosure provides a video-based data acquisition method, a data acquisition apparatus, a data acquisition system, an electronic device, and a non-transitory readable storage medium. The video-based data acquisition method includes: acquiring a video set corresponding to a target landmark according to retrieval parameters, the video set including a plurality of videos; extracting video frames from each video in the video set to obtain a video frame set including the extracted video frames; performing target recognition on each video frame in the video frame set; and, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark.
The data acquisition method provided by at least one embodiment of the present disclosure acquires data about a target landmark, so that data can be collected conveniently and quickly without on-site collection, reducing the acquisition cost of landmark data; and by setting a preset condition, the workload of data acquisition can be reduced and its efficiency improved.
In the following, a data acquisition method provided according to at least one embodiment of the present disclosure is described in a non-limiting manner by using several examples or embodiments, and as described below, different features of these specific examples or embodiments may be combined with each other without mutual conflict, so as to obtain new examples or embodiments, and these new examples or embodiments also belong to the protection scope of the present disclosure.
Fig. 1 is an exemplary flowchart of a video-based data acquisition method according to at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a video-based data acquisition method 10 that can be applied to various video interaction applications, video editing applications, and the like. For example, in at least one embodiment, the video-based data acquisition method 10 may collect data about target landmarks for subsequent use in three-dimensional modeling, generating landmark AR special effects, and so on. For example, in at least one embodiment, as shown in FIG. 1, the video-based data acquisition method 10 includes the following operations:
step S101: acquiring a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos;
step S102: extracting a video frame from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
step S103: performing target identification on each video frame in the video frame set;
step S104: in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark.
For example, acquiring the data of the target landmark through the data acquisition method 10 allows data to be collected conveniently and quickly without sending personnel to the site, thereby reducing the acquisition cost of landmark data and improving data acquisition efficiency.
For example, steps S101 to S104 may be executed sequentially or in another order; the execution order of the steps is not limited in the embodiments of the present disclosure and may be adjusted according to the actual situation. For example, steps S101 to S104 may be implemented by a server or by a local terminal, and the embodiments of the present disclosure are not limited in this respect. For example, in some examples, the data acquisition method 10 provided by at least one embodiment of the present disclosure may optionally perform only some of steps S101 to S104, or may perform additional steps beyond steps S101 to S104; embodiments of the present disclosure are not limited in this respect.
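As a concrete illustration, steps S101 to S104 can be sketched in Python. The search, frame extraction, and landmark recognition logic below is stubbed with toy stand-ins; all function names (`search_videos`, `extract_frames`, `collect_landmark_data`) and data shapes are hypothetical and not taken from the patent itself.

```python
# Hypothetical sketch of steps S101-S104. The recognizer and the preset
# condition are passed in as plain callables, standing in for real models.

def search_videos(retrieval_params, database):
    """Step S101: match videos against retrieval parameters (keyword match)."""
    keyword = retrieval_params["keyword"]
    return [v for v in database if keyword in v["tags"]]

def extract_frames(video, frames_per_second=1):
    """Step S102: sample frames from one video at a fixed extraction rate."""
    step = max(1, video["fps"] // frames_per_second)
    return video["frames"][::step]

def collect_landmark_data(retrieval_params, database,
                          contains_landmark, meets_preset):
    """Steps S103-S104: keep frames that show the target landmark
    and also satisfy the preset condition."""
    collected = []
    for video in search_videos(retrieval_params, database):
        for frame in extract_frames(video):
            if contains_landmark(frame) and meets_preset(frame):
                collected.append(frame)
    return collected

# Toy run: one matching video shot at 2 FPS, sampled at 1 frame/s.
database = [{"tags": ["Canton Tower"], "fps": 2,
             "frames": ["tower_sharp", "sky", "tower_blurry", "tower_far"]}]
data = collect_landmark_data(
    {"keyword": "Canton Tower"}, database,
    contains_landmark=lambda f: f.startswith("tower"),
    meets_preset=lambda f: "blurry" not in f)
```

In this toy run, only the first sampled frame both shows the landmark and passes the preset condition, so it alone becomes data of the target landmark.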
The video-based data acquisition method 10 provided by the present disclosure is described in detail below, by way of example, with reference to the accompanying drawings.
Step S101: and acquiring a video set corresponding to the target landmark according to the retrieval parameters, wherein the video set comprises a plurality of videos.
For example, in at least one embodiment of the present disclosure, the target landmark includes a target building, a target landscape, or a target animal or plant. For example, in one example, the target landmark may be a featured building of a city, such as Taikoo Li at Sanlitun in Beijing, the Oriental Pearl Tower in Shanghai, or the Canton Tower in Guangzhou (colloquially known as the "Slim Waist"). For example, in another example, the target landmark may be a famous landscape (or tourist attraction, etc.) of a region, such as Mount Huangshan in Anhui or Mount Everest in Tibet. For example, in yet another example, the target landmark may be a locally well-known animal or plant, such as the Greeting-Guests Pine within the Huangshan scenic area.
It should be noted that the target landmark in the embodiment of the present disclosure includes, but is not limited to, the above listed contents, and the target landmark may be any content associated with a place in which a user is interested, and the embodiment of the present disclosure is not limited to this, and may be set according to actual needs.
For example, in at least one embodiment of the present disclosure, the retrieval parameters may include at least one of a keyword of the target landmark, a feature image corresponding to the target landmark, positioning information corresponding to the target landmark, and classification information corresponding to the target landmark. For example, in at least one embodiment of the present disclosure, the keyword of the target landmark includes at least one of a name, an abbreviation, an alias, and a feature description of the target landmark. For example, in one example, the target landmark is the Canton Tower ("Guangzhou Tower") in Guangdong Province, China, and its retrieval parameters may include keywords: its name "Guangzhou Tower"; its other names such as "Slim Waist" or "Guangzhou New TV Tower"; feature descriptions such as "the first tower in China" or "the fourth tower in the world"; and its foreign name "Canton Tower" and foreign abbreviation "CT". Embodiments of the present disclosure are not limited in this respect. For example, in another example, the target landmark is the Hall of Prayer for Good Harvests in Beijing, China, and the retrieval parameters may include a feature image corresponding to the target landmark, as shown in fig. 2. For example, in yet another example, the target landmark is Taikoo Li at Sanlitun, Beijing, and the retrieval parameters may include positioning information corresponding to the target landmark, namely Courtyard 19, Chaoyang District, Beijing, and may further include classification information corresponding to the target landmark, for example, a landmark of a Beijing business district.
It should be noted that, the content of the search parameter in the above example is only an example, and the search parameter related to the target landmark in the embodiment of the present disclosure includes, but is not limited to, the above content, and may be set according to actual needs.
Fig. 3 is an exemplary flowchart of steps included in step S101 shown in fig. 1. For example, in at least one embodiment of the present disclosure, acquiring a video set corresponding to a target landmark (i.e., step S101) according to the retrieval parameters may include the following operations, as shown in fig. 3.
Step S301: searching according to the retrieval parameters, and matching from a video database or the Internet to obtain an initial video set corresponding to the target landmark;
step S302: a video set is derived based on the initial video set.
For example, in at least one embodiment of the present disclosure, for step S301, the initial video set corresponding to the target landmark is obtained by matching from a video database or the Internet according to the retrieval parameters; methods such as entity recognition, keyword matching, and deep learning may be used. For example, the initial video set includes a plurality of videos preliminarily obtained by searching and matching; some or all of these videos may constitute the video set for subsequent processing.
For example, in at least one embodiment of the present disclosure, the video database may be pre-established and stored locally or on a server in advance, may be established by the server when the data acquisition method 10 is carried out, or may be read from another device.
In general, many users upload videos they have shot, or videos they find interesting, to the Internet for other users to browse or collect. For example, in at least one embodiment of the present disclosure, the initial video set corresponding to the target landmark may be obtained directly by matching on the Internet according to the retrieval parameters.
For example, in at least one embodiment of the present disclosure, for step S302, deriving a video set based on the initial video set may include: taking the initial video set as the video set. For example, in one example, after the initial video set corresponding to the target landmark is obtained by matching against the retrieval parameters, it may be used directly as the video set. However, in some cases, the number of videos in a video database or on the Internet that match the retrieval parameters may be large, so the initial video set contains a large total number of videos; using it directly as the video set may then yield a large amount of repeated or invalid data, resulting in a heavy workload and low data acquisition efficiency. Thus, in at least one embodiment of the present disclosure, for step S302, deriving a video set based on the initial video set may include: screening the initial video set to obtain the video set.
Fig. 4A is a flowchart of an example operation of filtering an initial video set according to at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, screening the initial video set to obtain the video set may include the following operations, as shown in fig. 4A.
Step S401: determining the duration of each video in the initial video set;
step S402: and selecting the video with the duration within the preset duration range to obtain a video set.
For example, in at least one embodiment of the present disclosure, the initial video set includes three videos, e.g., video 1, video 2, and video 3. The duration of each video in the initial video set is first determined; e.g., video 1 lasts 50 seconds, video 2 lasts 90 seconds, and video 3 lasts 150 seconds. The videos whose durations fall within the preset duration range are then selected to obtain the video set. For example, in one example, the preset duration range is 60 to 180 seconds; video 2 and video 3 in the initial video set are therefore selected to join the video set, because their durations lie within the preset duration range.
It should be noted that the foregoing embodiments are merely exemplary; the preset duration range may be 60 to 180 seconds, 120 to 300 seconds, and so on. A video with a short duration yields only a limited number of extractable frames, while a video with a long duration makes the computation excessive; limiting the duration range therefore excludes overly short and overly long videos, improving the efficiency of video frame acquisition and reducing the amount of computation. In addition, embodiments of the present disclosure do not limit parameters such as a video's resolution (e.g., 720P, 1080P) or shooting frame rate (e.g., 30 FPS, 60 FPS).
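Under the assumption that each video's duration is already known, steps S401 and S402 reduce to a single filter. The sketch below is illustrative only; the dictionary layout and function name are hypothetical.

```python
def filter_by_duration(initial_videos, min_seconds=60, max_seconds=180):
    """Steps S401-S402: keep videos whose duration lies in the preset range."""
    return [v for v in initial_videos
            if min_seconds <= v["duration"] <= max_seconds]

# The three-video example from the text: 50 s, 90 s, and 150 s clips.
initial_video_set = [
    {"name": "video 1", "duration": 50},
    {"name": "video 2", "duration": 90},
    {"name": "video 3", "duration": 150},
]
video_set = filter_by_duration(initial_video_set)  # video 2 and video 3 remain
```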
Fig. 4B is a flowchart of another example operation of filtering an initial set of videos in accordance with at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, screening the initial video set to obtain the video set may include the following operations, as shown in fig. 4B.
Step S410: determining a shooting angle of each video in the initial video set relative to the target landmark;
step S420: a plurality of videos having shooting angles different from each other are selected to obtain a video set.
For example, in at least one embodiment of the present disclosure, the shooting angle relative to the target landmark may include a shooting height, a shooting direction, and a shooting distance; for example, the shooting angle may be obtained by analyzing feature points in the video pictures themselves and camera parameters (e.g., pose) recorded with the video. Different videos with the same shooting angle contain substantially the same images of the target landmark. Therefore, selecting videos with different shooting angles to join the video set avoids collecting repeated data and enriches the data of the target landmark.
For example, in at least one embodiment of the present disclosure, a plurality of videos that have a specific shooting angle, or lie within a specific shooting-angle range, and whose shooting angles differ from each other, may be selected to obtain the video set from the initial video set.
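One minimal way to realize steps S410 and S420 is to keep the first video seen for each distinct shooting angle. The angle representation here (a rounded height/direction/distance tuple) is an assumption for illustration, not the patent's method of deriving angles from feature points and camera pose.

```python
def filter_by_shooting_angle(initial_videos):
    """Steps S410/S420: keep videos whose shooting angles
    (height, direction, distance) are mutually different."""
    seen_angles = set()
    selected = []
    for video in initial_videos:
        # Round so that near-identical viewpoints count as the same angle.
        angle = (round(video["height"]), round(video["direction"]),
                 round(video["distance"]))
        if angle not in seen_angles:
            seen_angles.add(angle)
            selected.append(video)
    return selected

videos = [
    {"name": "a", "height": 10.0, "direction": 90.0, "distance": 200.0},
    {"name": "b", "height": 10.2, "direction": 90.1, "distance": 199.8},  # ~same as "a"
    {"name": "c", "height": 50.0, "direction": 180.0, "distance": 400.0},
]
video_set = filter_by_shooting_angle(videos)  # "b" is dropped as a near-duplicate
```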
It should be noted that, in the embodiment of the present disclosure, the screening manner for screening the initial video set is not limited to the above-described screening according to the duration and screening according to the shooting angle, and may also be the screening according to any other applicable parameters and rules, which may be determined according to actual needs, and the embodiment of the present disclosure is not limited thereto.
Step S102: extracting video frames from each video in the video set to obtain a video frame set that includes the extracted video frames.
For example, in at least one embodiment of the present disclosure, video frames may be extracted from a video at a certain extraction frame rate. For example, the extraction frame rate may be preset, or may differ according to the shooting frame rates of different videos; embodiments of the present disclosure are not limited in this respect. For example, in one example, frame extraction may be set to one frame per second for all videos. For another example, in another example, a 30 FPS video may be set to one extracted frame per second, while a 60 FPS video may be set to two extracted frames per second.
It should be noted that the above embodiments are merely exemplary, and the embodiments of the present disclosure do not limit the setting of the extraction frame rate, and may be set according to actual requirements.
For example, in at least one embodiment of the present disclosure, based on the extracted video frames, a video frame set may be obtained, the video frame set including a plurality of video frames.
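Both sampling policies just described (a fixed extraction rate for all videos, or a rate chosen per shooting frame rate) reduce to slicing a frame list with a step derived from the two rates. The helper below is an illustrative sketch; the function name and frame representation are assumptions.

```python
def decimate(frames, shooting_fps, extraction_fps=1):
    """Extract frames at `extraction_fps` from a video shot at `shooting_fps`."""
    step = max(1, round(shooting_fps / extraction_fps))
    return frames[::step]

# 3-second clip at 30 FPS, sampled at 1 frame/s: every 30th frame.
frames_30fps = decimate(list(range(90)), shooting_fps=30, extraction_fps=1)
# 2-second clip at 60 FPS, sampled at 2 frames/s: also every 30th frame.
frames_60fps = decimate(list(range(120)), shooting_fps=60, extraction_fps=2)
```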
Step S103: target recognition is performed for each video frame in the set of video frames.
For example, in at least one embodiment of the present disclosure, performing target recognition on each video frame in the video frame set comprises: inputting each video frame in the video frame set to a neural network for target recognition. For example, in at least one embodiment of the present disclosure, a denoising convolutional neural network (DnCNN) may be employed. For example, in at least one embodiment of the present disclosure, the denoising convolutional neural network is obtained based on a residual learning algorithm and a batch normalization algorithm. For example, a denoising convolutional neural network may use a convolutional neural network of a suitable depth to remove additive Gaussian noise.
For example, in at least one embodiment of the present disclosure, the DnCNN model employs a residual learning algorithm. Unlike conventional methods that use many small residual units, the DnCNN model uses the entire network as one large residual unit to predict the residual image (i.e., the noise image). Assuming the input of the DnCNN model is a noisy sample y = x + v, the DnCNN model learns a function R(y) ≈ v, where v can be understood as the residual image (i.e., the noise image), so that the original image x = y − R(y) can be restored.
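The recovery step above, subtracting the predicted residual from the noisy input, can be illustrated with a toy numerical example. Here the residual predictor is a stand-in for a trained DnCNN; for the demo it is assumed to predict the noise exactly, whereas a real network only approximates it:

```python
# Toy illustration of residual-learning recovery: x = y - R(y).
# The "perfect" predictor below is a hypothetical stand-in for DnCNN.

def recover(noisy, predict_residual):
    """Subtract the predicted residual (noise) from the noisy signal."""
    return [y - r for y, r in zip(noisy, predict_residual(noisy))]

clean = [10.0, 20.0, 30.0]
noise = [0.5, -0.25, 0.125]                      # v
noisy = [c + v for c, v in zip(clean, noise)]    # y = x + v

def perfect_r(y):
    # Assumed-perfect residual predictor for illustration only.
    return noise

print(recover(noisy, perfect_r))  # -> [10.0, 20.0, 30.0]
```

The point of the formulation is that the network's learning target is the (statistically simpler) noise image v rather than the clean image x itself.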
Fig. 5 shows a network architecture of a DnCNN according to at least one embodiment of the present disclosure. As shown in fig. 5, DnCNN employs a stacked fully convolutional structure. Excluding the input noisy image and the output layer, and assuming a total of D layers (typically, the depth D is set to 17 or 20 in DnCNN), three different kinds of convolution blocks are distributed over the first, middle, and last portions in fig. 5.
As shown in fig. 5, the first convolution block is Conv + ReLU, which constitutes the first layer: a convolution (Conv) of the input image followed by a rectified linear unit (ReLU, also called an activation layer). The second kind of convolution block, used for the middle layers 2 to (D − 1), combines Conv + BN + ReLU; that is, a batch normalization (BN) layer is added between the convolution layer and the ReLU. This is a relatively important layer, and DnCNN benefits greatly from combining the residual learning algorithm with the batch normalization algorithm. When the network is trained with mini-batch stochastic gradient descent (SGD), batch normalization slows the change in the input distribution of the nonlinear units during training, thereby accelerating convergence. The last convolution block uses only a convolution layer to reconstruct the output. For example, in at least one embodiment of the present disclosure, a DnCNN model obtained based on a residual learning algorithm and a batch normalization algorithm can achieve a better denoising result.
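The layer layout just described (layer 1: Conv + ReLU; layers 2 to D − 1: Conv + BN + ReLU; layer D: Conv) can be written out schematically. This sketch only builds a textual specification of the stack, not an actual network, so it makes no assumptions about any deep learning framework:

```python
# Schematic layer layout of a DnCNN of depth D, as described above.
# Builds only a textual spec; an actual implementation would map each
# entry to convolution / batch-norm / activation layers in a framework.

def dncnn_layer_spec(depth=17):
    layers = ["Conv+ReLU"]                      # first block (layer 1)
    layers += ["Conv+BN+ReLU"] * (depth - 2)    # middle blocks (2 .. D-1)
    layers += ["Conv"]                          # last block (layer D)
    return layers

spec = dncnn_layer_spec(17)
print(len(spec), spec[0], spec[1], spec[-1])
# -> 17 Conv+ReLU Conv+BN+ReLU Conv
```

With D = 17 this yields 15 middle Conv + BN + ReLU blocks between the first and last layers.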
It should be noted that the foregoing embodiments are merely exemplary, and other methods may be adopted by the embodiments of the present disclosure to perform object identification on each video frame in the video frame set, and the embodiments of the present disclosure are not limited in this respect. When the neural network is used for target identification, the structure of the neural network used may be any suitable structure, which may be determined according to actual requirements, and is not limited to the above-described DnCNN model.
Step S104: in response to a first video frame that includes the target landmark and satisfies a preset condition among the plurality of video frames, the first video frame is taken as data of the target landmark.
For example, in at least one embodiment of the present disclosure, the preset condition includes at least one of the following conditions: the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value; the ratio of the identified portion of the target landmark in the first video frame to the whole target landmark reaches a second predetermined value; or the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
For example, in at least one embodiment of the present disclosure, suppose the image of the first video frame includes 640 × 840 pixels, approximately 540,000 pixels in total, the image includes a target landmark, and the number of pixels displaying the target landmark in the first video frame is 300,000. When the first predetermined value is set to 250,000, the number of pixels of the identified target landmark in the first video frame (i.e., 300,000) reaches the first predetermined value (250,000), so the first video frame may be used as data of the corresponding target landmark; for example, it may be stored for subsequent use. When the first predetermined value is set to 400,000, the number of pixels of the identified target landmark in the first video frame (i.e., 300,000) does not reach the first predetermined value (400,000), so the first video frame cannot be used as data of the corresponding target landmark.
It should be noted that the first predetermined value, the number of vertical pixels and horizontal pixels of the first video frame, and the number of pixels of the target landmark, which are described in the above embodiments, are all exemplary, and the embodiments of the present disclosure are not limited to this, and may be set according to actual situations.
For example, in at least one embodiment of the present disclosure, when the second predetermined value is 0.5 and the ratio of the portion of the target landmark displayed in the first video frame to the whole of the actual target landmark exceeds half, the first video frame may be used as data of the corresponding target landmark and, for example, may be stored for subsequent use.
It should be noted that, in the above embodiment, the ratio of the part of the target landmark identified in the first video frame to the whole target landmark may be a ratio of an area of the target landmark displayed in the first frame image to an actual area of the target landmark, or may also be a ratio of a volume of the target landmark displayed in the first frame image to an actual volume of the target landmark, which is not limited in this embodiment of the present disclosure.
It should be further noted that, the second predetermined value of 0.5 described in the foregoing embodiments is only an example, and the second predetermined value may also be any value such as 0.6, 0.4, and the embodiments of the present disclosure do not limit the specific value of the second predetermined value, and may be set according to actual requirements.
For example, in at least one embodiment of the present disclosure, suppose the image of the first video frame includes 640 × 840 pixels, approximately 540,000 pixels in total, the image includes a target landmark, and the number of pixels displaying the target landmark in the first video frame is 200,000. When the third predetermined value is set to 0.3, the ratio (approximately 0.37) of the number of pixels of the identified target landmark (i.e., 200,000) to the total number of pixels of the first video frame (i.e., approximately 540,000) exceeds the third predetermined value (0.3), so the first video frame may be used as data of the corresponding target landmark; for example, it may be stored for subsequent use. When the third predetermined value is set to 0.5, the ratio (approximately 0.37) is lower than the third predetermined value (0.5), so the first video frame cannot be used as data of the corresponding target landmark.
It should be noted that the third predetermined value, the number of vertical pixels and horizontal pixels of the first video frame, and the number of pixels of the target landmark, which are described in the above embodiments, are all exemplary, and the embodiments of the present disclosure are not limited to this, and may be set according to actual situations.
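The three threshold conditions of step S104 can be sketched as a single predicate: a frame qualifies if any one of them is satisfied. The function name, parameter names, and default thresholds below are illustrative, not part of the disclosure; the numeric examples follow the ones worked through above:

```python
# Hedged sketch of the preset conditions in step S104.
# A frame qualifies if at least one threshold is met. All names and
# default threshold values are illustrative assumptions.

def satisfies_preset(landmark_pixels, frame_pixels, visible_fraction,
                     first_value=250_000, second_value=0.5, third_value=0.3):
    return (landmark_pixels >= first_value                      # condition 1
            or visible_fraction >= second_value                 # condition 2
            or landmark_pixels / frame_pixels >= third_value)   # condition 3

frame_pixels = 640 * 840  # 537,600 pixels, roughly 540,000

# Example from the text: 200,000 landmark pixels, pixel ratio ~0.37 >= 0.3.
print(satisfies_preset(200_000, frame_pixels, visible_fraction=0.2))  # -> True

# With a stricter third value of 0.5 and the other conditions unmet:
print(satisfies_preset(200_000, frame_pixels, 0.2,
                       first_value=400_000, third_value=0.5))  # -> False
```

Qualifying frames would then be retained as data of the target landmark; the rest are discarded.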
For example, in at least one embodiment of the present disclosure, the preset condition may include: the position of the identified target landmark in the first video frame is within a preset position range of the image of the first video frame. For example, in one example, the image may be analyzed by a deep learning method (e.g., the denoising convolutional neural network described above), for instance to determine the outer contour data or the center point data of the target landmark within the first video frame, so as to determine the relative position of the target landmark within the image of the first video frame. A position range is then preset; for example, the preset position range may be the middle part of the image of the first video frame. More specifically, in one example, if the image of the first video frame includes 1,000,000 pixels in total, the preset position range may be set to the 400,000 pixels nearest the central axis of the image.
It should be noted that the preset position range described in the above embodiment (the 400,000 pixels nearest the central axis of the image) is only an example; the embodiments of the present disclosure do not limit the preset position range, which may be set according to actual requirements.
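The position condition above can be sketched as a simple geometric check: the landmark's detected center must fall inside a central band of the frame. The "central band" here is a hypothetical rectangle around the image's vertical central axis, chosen only to make the idea concrete:

```python
# Illustrative check of the position condition: the landmark center
# must lie within the middle `band_fraction` of the frame width.
# The band definition is an assumption for the demo.

def center_in_range(center_x, width, band_fraction=0.4):
    """True if `center_x` lies within the central band of the frame."""
    half_band = width * band_fraction / 2
    return abs(center_x - width / 2) <= half_band

print(center_in_range(center_x=500, width=1000))  # -> True  (dead center)
print(center_in_range(center_x=100, width=1000))  # -> False (near left edge)
```

In a real pipeline, `center_x` would come from the recognition network's output (e.g., the landmark's center-point or contour data mentioned above).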
The data acquisition method 10 provided by the embodiments of the present disclosure can acquire data conveniently and quickly without sending personnel to collect data on site, thereby reducing the cost of collecting landmark data. By setting preset conditions and screening the video set, repeated or invalid data can be removed from the collected data, which reduces the workload of data collection and improves its efficiency.
For example, in at least one embodiment of the present disclosure, the data acquisition method 10 may further include: data of the target landmark is stored for three-dimensional modeling, e.g., for three-dimensional reconstruction.
For example, in at least one embodiment of the present disclosure, the data of the target landmark acquired based on the data acquisition method 10 provided in the above embodiments is stored for subsequent operations, such as performing three-dimensional modeling on the landmark, making an AR special effect, and the like, which is not limited by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a video-based data acquisition device. Fig. 6 is a schematic block diagram of a video-based data acquisition apparatus according to at least one embodiment of the present disclosure. As shown in fig. 6, the data acquisition device 60 includes: the video set obtaining module 601, the video frame obtaining module 602, the object identifying module 603, and the determining module 604 may be implemented by software, hardware, firmware, or any combination thereof, for example, as a video set obtaining circuit, a video frame obtaining circuit, an object identifying circuit, and a determining circuit, respectively.
For example, in at least one embodiment of the present disclosure, the video set obtaining module 601 is configured to obtain a video set corresponding to the target landmark according to the retrieval parameter, where the video set includes a plurality of videos. For example, the video frame acquisition module 602 is configured to decimate a video frame for each video in the video set to obtain a video frame set, which includes a plurality of decimated video frames. For example, the target recognition module 603 is configured to perform target recognition on each video frame in the set of video frames. For example, the determination module 604 is configured to take a first video frame, which includes the target landmark and satisfies a preset condition, among the plurality of video frames, as data of the target landmark.
For example, in at least one embodiment of the present disclosure, the preset condition includes at least one of the following conditions: the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value; the ratio of the identified portion of the target landmark in the first video frame to the whole target landmark reaches a second predetermined value; or the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
For example, in at least one embodiment of the present disclosure, the preset conditions include: the position of the identified target landmark in the first video frame is within a preset position range of the image of the first video frame.
It should be noted that the setting of the preset condition is described in detail in the above embodiment of the data acquisition method 10, and is not described herein again.
For example, the specific operations that the video set obtaining module 601, the video frame obtaining module 602, the target identifying module 603, and the determining module 604 are configured to execute may all refer to the above description related to step S101 to step S104 of the data acquisition method 10 provided in at least one embodiment of the present disclosure, and are not repeated herein.
It should be noted that, in the embodiment of the present disclosure, the data acquisition apparatus 60 may further include more modules, which are not limited to the video set acquisition module 601, the video frame acquisition module 602, the target identification module 603, and the determination module 604, and this may be determined according to practical needs, and the embodiment of the present disclosure is not limited thereto.
It should be understood that the data acquisition device 60 provided in the embodiment of the present disclosure may implement the foregoing data acquisition method 10, and also may achieve similar technical effects to the foregoing data acquisition method 10, which are not described herein again.
At least one embodiment of the present disclosure also provides a video-based data acquisition system. Fig. 7 is an example block diagram of a video-based data acquisition system provided by at least one embodiment of the present disclosure. As shown in fig. 7, the data acquisition system 70 includes a terminal 710 and a data server 720, and the terminal 710 and the data server 720 are in signal connection. The terminal 710 is configured to transmit the requested data to the data server 720. The data server 720 is configured to: in response to the request data, a video set corresponding to the target landmark is determined according to the retrieval parameters in the request data, and the video set is transmitted to the terminal 710, the video set comprising a plurality of videos.
For example, the terminal 710 is further configured to extract a video frame for each video in the video set to obtain a video frame set including the extracted plurality of video frames, perform target recognition for each video frame in the video frame set, and take a first video frame as data of a target landmark in response to the first video frame including the target landmark and satisfying a preset condition among the plurality of video frames.
For example, the above operations performed by the terminal 710 and the data server 720 may refer to the data acquisition method 10 provided in at least one embodiment of the present disclosure, and are not described herein again.
For example, in one example, in the data collection system 70, the terminal 710 may be implemented as a client device (e.g., a mobile phone, a computer, etc.), and the data server 720 may be implemented as a server.
For example, in one example, as shown in FIG. 7, the data acquisition system 70 may include a database server 730 storing a video database in addition to the terminal 710 and the data server 720. The database server 730 is in signal connection with the data server 720 and is configured to return data in the video database corresponding to the requested information to the data server 720 in response to the requested information from the data server 720. It should be noted that, when the data acquisition system 70 does not include the database server 730, the data in the video database may be directly stored on the data server 720 or stored in another storage device provided separately, or the video database may be built by the data server 720 and then stored on the data server 720 or stored in another storage device provided separately, which is not limited in this embodiment of the disclosure.
At least one embodiment of the present disclosure provides a video-based data acquisition system 70, which can implement the data acquisition method 10 provided in the foregoing embodiment, and also can achieve the technical effects similar to the data acquisition method 10 provided in the foregoing embodiment, and details are not repeated here.
At least one embodiment of the present disclosure also provides an electronic device. Fig. 8 is a schematic diagram of an electronic device according to at least one embodiment of the present disclosure. For example, as shown in FIG. 8, the electronic device 80 includes a processor 810 and a memory 820. The memory 820 stores one or more computer program modules 821 configured to be executed by the processor 810. The one or more computer program modules 821 include instructions for performing any of the data acquisition methods provided by at least one embodiment of the present disclosure, and when executed by the processor 810 they may perform one or more steps of those data acquisition methods. The memory 820 and the processor 810 may be interconnected by a bus system and/or another form of connection mechanism (not shown).
For example, the memory 820 and the processor 810 may be disposed at a server (or a cloud), such as the data server 720, for performing one or more steps of the data acquisition methods described in fig. 1, 3, 4A, and 4B.
For example, the processor 810 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other form of processing unit having data processing capabilities and/or program execution capabilities, such as a Field Programmable Gate Array (FPGA), or the like; for example, the Central Processing Unit (CPU) may be an X86 or ARM architecture or the like. The processor 810 may be a general-purpose processor or a special-purpose processor that may control other components in the electronic device 80 to perform desired functions.
For example, memory 820 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory, and the like. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules 821 may be stored on the computer-readable storage medium and executed by processor 810 to implement various functions of electronic device 80. Various applications and various data, as well as various data used and/or generated by the applications, and the like, may also be stored in the computer-readable storage medium. Specific functions and technical effects of the electronic device 80 can be found in the above description of the data acquisition method, and are not described here again.
Fig. 9 is an example block diagram of a terminal according to at least one embodiment of the present disclosure. For example, in at least one embodiment of the present disclosure, the terminal is a terminal 900 having a data processing capability, and may be applied to the data acquisition method 10 provided in the embodiment of the present disclosure, for example. For example, the terminal 900 may transmit the request data to a server (e.g., the data server 720), the terminal 900 may further receive a video set from the server (e.g., the data server 720), and then extract a video frame for each video in the video set to obtain a video frame set, the video frame set includes a plurality of extracted video frames, perform target recognition for each video frame in the video frame set, and in response to a first video frame that includes the target landmark and satisfies a preset condition among the plurality of video frames, take the first video frame as data of the target landmark. It should be noted that the terminal 900 shown in fig. 9 is only an example, and does not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 9, terminal 900 can include a processing device (e.g., central processing unit, graphics processor, etc.) 910 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)920 or a program loaded from a storage device 980 into a Random Access Memory (RAM) 930. In the RAM 930, various programs and data necessary for the operation of the terminal 900 are also stored. The processing device 910, the ROM 920, and the RAM 930 are connected to each other by a bus 940. An input/output (I/O) interface 950 is also connected to bus 940.
Generally, the following devices may be connected to the I/O interface 950: input devices 960 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 970 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, or the like; storage 980 including, for example, magnetic tape, hard disk, etc.; and a communication device 990. The communication device 990 may allow the terminal 900 to perform wireless or wired communication with other electronic devices to exchange data. While fig. 9 illustrates a terminal 900 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided; the terminal 900 may alternatively be implemented or provided with more or fewer devices.
For example, the video-based data acquisition method 10 shown in fig. 1 may be implemented as a computer software program in accordance with an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program comprising program code for performing the data acquisition method 10 described above. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 990, or installed from the storage device 980, or installed from the ROM 920. When executed by the processing device 910, the computer program may perform the functions defined in the data acquisition method 10 provided by the embodiments of the present disclosure.
At least one embodiment of the present disclosure also provides a non-transitory readable storage medium for storing non-transitory computer readable instructions that, when executed by a computer, may implement the data acquisition method 10 of any embodiment of the present disclosure. By utilizing this non-transitory readable storage medium, data can be acquired conveniently and quickly without on-site collection, thereby reducing the cost of collecting landmark data; the workload of data collection can be reduced, and the efficiency of data collection improved.
Fig. 10 is a schematic block diagram of a non-transitory readable storage medium 100 provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 10, the non-transitory readable storage medium 100 includes computer program instructions 111 stored thereon. The computer program instructions 111, when executed by a processor, perform one or more steps of the data acquisition method 10 provided by at least one embodiment of the present disclosure.
For example, the storage medium may be any combination of one or more computer-readable storage media, such as one containing computer-readable program code for retrieving a video set corresponding to a target landmark based on search parameters, another containing computer-readable program code for decimating video frames of each video in the video set to obtain a video frame set, yet another containing computer-readable program code for object recognition of each video frame in the video frame set, and yet another containing computer-readable program code for taking the first video frame as data of the target landmark in response to the first video frame including the target landmark and satisfying a preset condition among a plurality of video frames. Of course, the above program codes may also be stored in the same computer readable medium, and the embodiments of the disclosure are not limited thereto. For example, when the program code is read by a computer, the computer can execute the program code stored in the computer storage medium to perform a data acquisition method such as that provided by any of the embodiments of the present disclosure.
For example, the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a flash memory, or any combination of the above, as well as other suitable storage media. For example, the readable storage medium may also be the memory 820 in fig. 8, and reference may be made to the foregoing description for related description, which is not described herein again.
It should be noted that the storage medium 100 can be applied to the terminal 710 and/or the data server 720, and a skilled person can select the storage medium according to a specific scenario, which is not limited herein.
Fig. 11 illustrates an exemplary scene diagram of a video-based data acquisition system provided by at least one embodiment of the present disclosure. As shown in fig. 11, the data collection system 300 may include a user terminal 310, a network 320, a server 330, and a database 340.
The user terminal 310 may be, for example, a computer 310-1 or a portable terminal 310-2 shown in fig. 11. It will be appreciated that the user terminal may also be any other type of electronic device capable of performing the receiving, processing and displaying of data, which may include, but is not limited to, a desktop computer, a laptop computer, a tablet computer, a smart home device, a wearable device, a vehicle-mounted electronic device, a medical electronic device, and the like.
For example, network 320 may be a single network, or a combination of at least two different networks. For example, the network 320 may include, but is not limited to, one or a combination of local area networks, wide area networks, public networks, private networks, the internet, mobile communication networks, and the like.
For example, the server 330 may be a single server or a group of servers, and each server in the group of servers is connected via a wired network or a wireless network. The wired network may communicate by using twisted pair, coaxial cable, or optical fiber transmission, for example, and the wireless network may communicate by using 3G/4G/5G mobile communication network, bluetooth, Zigbee, or WiFi, for example. The present disclosure is not limited herein as to the type and function of the network. The one group of servers may be centralized, such as a data center, or distributed. The server may be local or remote. For example, the server 330 may be a general-purpose server or a dedicated server, may be a virtual server or a cloud server, and the like.
For example, database 340 may be used to store various data utilized, generated, and output during operation of the user terminal 310 and the server 330. Database 340 may be interconnected or in communication with the server 330, or a portion of the server 330, via the network 320, or directly interconnected or in communication with the server 330, or a combination of both. In some embodiments, database 340 may be a stand-alone device. In other embodiments, the database 340 may also be integrated into at least one of the user terminal 310 and the server 330. For example, the database 340 may be provided on the user terminal 310, or may be provided on the server 330. For another example, the database 340 may be distributed, with one part provided in the user terminal 310 and another part provided in the server 330.
For example, in one example, first, a user terminal 310 (e.g., a user's computer) may send request data to a server 330 via a network 320 or other technology (e.g., bluetooth communication, infrared communication, etc.). Next, the server 330 determines a video set corresponding to the target landmark according to the retrieval parameter in the request data in response to the request data, and transmits the video set to the user terminal 310, the video set including a plurality of videos. Finally, the user terminal 310 extracts a video frame for each video in the received video set to obtain a video frame set including the extracted plurality of video frames, performs target recognition on each video frame in the video frame set, and takes a first video frame as data of a target landmark in response to a first video frame including the target landmark and satisfying a preset condition among the plurality of video frames.
In the above, a video-based data acquisition method, an apparatus and a system, an electronic device and a non-transitory readable storage medium provided by the embodiments of the present disclosure are described with reference to fig. 1 to 11. The data acquisition method based on the video can conveniently and quickly acquire data without acquiring the data on site, thereby reducing the acquisition cost of landmark data, reducing the workload of data acquisition and improving the efficiency of data acquisition by setting the preset conditions.
It should be noted that the storage medium (computer readable medium) described above in the present disclosure may be a computer readable signal medium or a non-transitory computer readable storage medium or any combination of the two. The non-transitory computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the non-transitory computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a non-transitory computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a non-transitory computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as the Hypertext Transfer Protocol (HTTP), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire at least two internet protocol addresses; send a node evaluation request comprising the at least two internet protocol addresses to a node evaluation device, wherein the node evaluation device selects an internet protocol address from the at least two internet protocol addresses and returns it; and receive the internet protocol address returned by the node evaluation device; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a node evaluation request comprising at least two internet protocol addresses; select an internet protocol address from the at least two internet protocol addresses; and return the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
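For illustration only, the node-evaluation exchange described above could be sketched as below. The selection policy (lowest simulated latency) and the request format are assumptions; the disclosure does not fix a particular policy.

```python
# Toy sketch of the node-evaluation exchange: the client supplies at least two
# internet protocol addresses, and the node evaluation device returns one of
# them, indicating an edge node in the content distribution network.

def evaluate_nodes(request):
    """Node evaluation device: pick one IP from the candidates."""
    candidates = request["ips"]
    # Assumed policy: choose the candidate with the lowest reported latency.
    return min(candidates, key=lambda ip: request["latency_ms"][ip])

def choose_edge_node(ips, latency_ms):
    """Client: send a node evaluation request and receive the chosen edge node."""
    assert len(ips) >= 2, "at least two internet protocol addresses required"
    request = {"ips": ips, "latency_ms": latency_ms}
    return evaluate_nodes(request)  # returned IP indicates a CDN edge node

edge = choose_edge_node(["10.0.0.1", "10.0.0.2"],
                        {"10.0.0.1": 42, "10.0.0.2": 17})
```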
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the present disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is merely an illustration of the embodiments of the disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. A video-based data collection method, comprising:
acquiring, according to retrieval parameters, a video set corresponding to a target landmark, wherein the video set comprises a plurality of videos;
extracting a video frame from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
performing target identification on each video frame in the video frame set;
in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark.
2. The data acquisition method according to claim 1, wherein the preset condition comprises at least one of the following conditions:
the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value,
the ratio of the identified part of the target landmark to the whole target landmark in the first video frame reaches a second predetermined value, or
the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
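By way of illustration only, the three alternative conditions above might be checked as in the following sketch. The threshold values and the detection format are assumptions, not fixed by the claim.

```python
# Illustrative check of the three alternative preset conditions of claim 2:
# absolute pixel count, visible fraction of the landmark, and the landmark's
# share of the whole frame. Default thresholds are arbitrary examples.

def meets_preset_condition(landmark_pixels, visible_fraction,
                           frame_total_pixels,
                           first_val=10_000, second_val=0.6, third_val=0.05):
    return (
        landmark_pixels >= first_val                           # pixel count
        or visible_fraction >= second_val                      # part vs. whole landmark
        or landmark_pixels / frame_total_pixels >= third_val   # pixels vs. frame size
    )

# Passes via the second condition (70% of the landmark is visible).
ok = meets_preset_condition(landmark_pixels=8_000, visible_fraction=0.7,
                            frame_total_pixels=1920 * 1080)
```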
3. The data acquisition method according to claim 1, wherein the preset condition includes:
the position of the identified target landmark in the first video frame is within a preset position range of the image of the first video frame.
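For illustration only, a position check of this kind might look like the following sketch; the central-region margin and the coordinate convention are assumptions.

```python
# Illustrative check of the claim-3 condition: the identified landmark's
# position must fall within a preset position range of the frame image.
# Here the preset range is assumed to be a central region of the frame.

def in_preset_range(center_xy, frame_wh, margin=0.2):
    """True if the landmark center lies at least `margin` of the frame's
    width/height away from every border (i.e., in the central region)."""
    x, y = center_xy
    w, h = frame_wh
    return (margin * w <= x <= (1 - margin) * w
            and margin * h <= y <= (1 - margin) * h)

inside = in_preset_range((960, 540), (1920, 1080))   # frame center
outside = in_preset_range((50, 540), (1920, 1080))   # near the left border
```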
4. The data acquisition method as set forth in claim 1, wherein obtaining a video set corresponding to the target landmark in accordance with the retrieval parameters comprises:
searching according to the retrieval parameters, and matching from a video database or the Internet to obtain an initial video set corresponding to the target landmark;
obtaining the video set based on the initial video set.
5. The data acquisition method of claim 4, wherein deriving the video set based on the initial video set comprises:
screening the initial video set to obtain the video set; or
taking the initial video set as the video set.
6. The data acquisition method as claimed in claim 5, wherein the screening of the initial video set to obtain the video set comprises:
determining the duration of each video in the initial video set;
selecting videos whose durations are within a preset duration range to obtain the video set.
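By way of illustration only, the duration-based screening of claim 6 might be sketched as below. The 5–120-second range is an assumed example, not fixed by the claim.

```python
# Sketch of duration-based screening: keep only videos whose duration
# falls within a preset range. Video records are toy dicts.

def screen_by_duration(initial_videos, min_s=5, max_s=120):
    """Keep videos whose duration falls within the preset range."""
    return [v for v in initial_videos if min_s <= v["duration_s"] <= max_s]

initial = [{"id": "a", "duration_s": 3},     # too short
           {"id": "b", "duration_s": 45},    # within range
           {"id": "c", "duration_s": 600}]   # too long
video_set = screen_by_duration(initial)
```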
7. The data acquisition method as claimed in claim 5, wherein the screening of the initial video set to obtain the video set comprises:
determining a shooting angle of each video in the initial video set relative to the target landmark;
selecting a plurality of videos with different shooting angles to obtain the video set.
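For illustration only, one way to realize the angle-based screening of claim 7 is to bin shooting angles and keep one video per bin, so that the resulting set covers different viewpoints. The 45-degree bin width is an assumption.

```python
# Sketch of angle-based screening: keep at most one video per (binned)
# shooting angle relative to the target landmark.

def screen_by_angle(initial_videos, bin_deg=45):
    chosen, seen_bins = [], set()
    for v in initial_videos:
        b = int(v["angle_deg"] % 360) // bin_deg  # coarse angle bin
        if b not in seen_bins:
            seen_bins.add(b)
            chosen.append(v)
    return chosen

videos = [{"id": "a", "angle_deg": 10},    # bin 0 — kept
          {"id": "b", "angle_deg": 20},    # bin 0 — duplicate viewpoint
          {"id": "c", "angle_deg": 100}]   # bin 2 — kept
video_set = screen_by_angle(videos)
```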
8. The data acquisition method as claimed in claim 1, wherein the retrieval parameters include at least one of a keyword of the target landmark, a feature image corresponding to the target landmark, positioning information corresponding to the target landmark, and classification information corresponding to the target landmark.
9. The data acquisition method as claimed in claim 8, wherein the keywords of the target landmark comprise at least one of a name, an abbreviation, an alias, a feature description of the target landmark.
10. The data acquisition method of claim 1, wherein target identifying each video frame in the set of video frames comprises:
inputting each video frame in the set of video frames into a neural network for target recognition.
11. The data acquisition method as set forth in claim 10, wherein the neural network comprises a denoising convolutional neural network.
12. The data acquisition method of claim 11, wherein the denoising convolutional neural network is obtained based on a residual learning algorithm and a batch normalization algorithm.
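For illustration only, the two ingredients named in claim 12 can be shown conceptually in pure Python, with no deep-learning framework: residual learning (the network predicts the noise, and the clean signal is the input minus the predicted noise) and batch normalization (activations normalized to zero mean and unit variance). The "network" here is a stub, not the claimed denoising network.

```python
# Conceptual sketch of residual learning and batch normalization,
# the two algorithms named in claim 12 for the denoising network.

def batch_norm(values, eps=1e-5):
    """Normalize a batch of activations to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

def denoise_residual(noisy, predict_noise):
    """Residual learning: subtract the predicted noise from the noisy input."""
    noise = predict_noise(noisy)
    return [x - r for x, r in zip(noisy, noise)]

# Stub "network" that pretends the noise is a constant offset of +1.0.
clean = denoise_residual([2.0, 3.0, 4.0], lambda xs: [1.0] * len(xs))
normalized = batch_norm([1.0, 2.0, 3.0])
```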
13. The data acquisition method as recited in claim 1, wherein the target landmark comprises a target building, a target landscape, or a target animal or plant.
14. The data acquisition method as set forth in any one of claims 1-13, further including:
storing data of the target landmark for three-dimensional modeling.
15. A video-based data acquisition apparatus comprising:
a video set acquisition module configured to acquire, according to retrieval parameters, a video set corresponding to a target landmark, wherein the video set comprises a plurality of videos;
a video frame acquisition module configured to extract a video frame for each video in the video set to obtain a video frame set, wherein the video frame set includes a plurality of extracted video frames;
a target identification module configured to perform target identification for each video frame in the set of video frames;
a determination module configured to, in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, take the first video frame as data of the target landmark.
16. The data acquisition device of claim 15, wherein the preset conditions include at least one of:
the number of pixels of the identified target landmark in the first video frame reaches a first predetermined value,
the ratio of the identified part of the target landmark to the whole target landmark in the first video frame reaches a second predetermined value, or
the ratio of the number of pixels of the identified target landmark in the first video frame to the total number of pixels of the first video frame reaches a third predetermined value.
17. The data acquisition apparatus according to claim 15, wherein the preset condition includes:
the position of the identified target landmark in the first video frame is within a preset position range of the image of the first video frame.
18. A video-based data acquisition system comprises a terminal and a data server, wherein,
the terminal is configured to send request data to the data server;
the data server is configured to:
responding to the request data, determining, according to retrieval parameters in the request data, a video set corresponding to a target landmark, and sending the video set to the terminal, wherein the video set comprises a plurality of videos;
the terminal is further configured to:
extracting a video frame from each video in the video set to obtain a video frame set, wherein the video frame set comprises a plurality of extracted video frames;
performing target identification on each video frame in the video frame set;
in response to a first video frame, among the plurality of video frames, that includes the target landmark and satisfies a preset condition, taking the first video frame as data of the target landmark.
19. An electronic device, comprising:
a processor;
a memory storing one or more computer program modules;
wherein the one or more computer program modules are configured to be executed by the processor, the one or more computer program modules comprising instructions for performing the data acquisition method of any one of claims 1-14.
20. A non-transitory readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform a data acquisition method as recited in any one of claims 1-14.
CN202011177236.4A 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium Active CN112287169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011177236.4A CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011177236.4A CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112287169A true CN112287169A (en) 2021-01-29
CN112287169B CN112287169B (en) 2024-04-26

Family

ID=74373822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011177236.4A Active CN112287169B (en) 2020-10-29 2020-10-29 Data acquisition method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287169B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293052A (en) * 2015-06-25 2017-01-04 意法半导体国际有限公司 Reinforced augmented reality multimedia system
US20170257659A1 (en) * 2016-03-04 2017-09-07 Nec Corporation Information processing system
CN110046572A (en) * 2019-04-15 2019-07-23 重庆邮电大学 A kind of identification of landmark object and detection method based on deep learning
CN110809166A (en) * 2019-10-31 2020-02-18 北京字节跳动网络技术有限公司 Video data processing method and device and electronic equipment
CN111274426A (en) * 2020-01-19 2020-06-12 深圳市商汤科技有限公司 Category labeling method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN112287169B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN107133325B (en) Internet photo geographic space positioning method based on street view map
US9564175B2 (en) Clustering crowdsourced videos by line-of-sight
CN110189246B (en) Image stylization generation method and device and electronic equipment
CN110400363A (en) Map constructing method and device based on laser point cloud
CN112288853B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, and storage medium
Li et al. Camera localization for augmented reality and indoor positioning: a vision-based 3D feature database approach
WO2021093679A1 (en) Visual positioning method and device
CN109272543B (en) Method and apparatus for generating a model
CN112927363A (en) Voxel map construction method and device, computer readable medium and electronic equipment
CN116261710A (en) Augmented reality content generator for destination activity
CN116325663A (en) Augmented reality content generator for identifying geographic locations
CN110991373A (en) Image processing method, image processing apparatus, electronic device, and medium
CN111292420A (en) Method and device for constructing map
CN115769260A (en) Photometric measurement based 3D object modeling
CN116250012A (en) Method, system and computer readable storage medium for image animation
CN113033677A (en) Video classification method and device, electronic equipment and storage medium
CN112308977A (en) Video processing method, video processing apparatus, and storage medium
CN113688839B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN111652675A (en) Display method and device and electronic equipment
CN111191553A (en) Face tracking method and device and electronic equipment
CN115908679A (en) Texture mapping method, device, equipment and storage medium
CN110197459B (en) Image stylization generation method and device and electronic equipment
CN111710017A (en) Display method and device and electronic equipment
CN112287169B (en) Data acquisition method, device and system, electronic equipment and storage medium
CN111586295B (en) Image generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant