CN116109964A

CN116109964A - Intelligent extraction method and device for video map, storage medium and computer equipment

Info

Publication number: CN116109964A
Application number: CN202211515520.7A
Authority: CN
Inventors: 任加新; 刘万增; 陈军; 李然; 翟曦; 王新鹏
Original assignee: NATIONAL GEOMATICS CENTER OF CHINA
Current assignee: NATIONAL GEOMATICS CENTER OF CHINA
Priority date: 2022-11-30
Filing date: 2022-11-30
Publication date: 2023-05-12

Abstract

The invention discloses an intelligent extraction method, an intelligent extraction device, a storage medium and computer equipment for a video map, relates to the field of electronic map extraction and computer vision, and can solve the technical problems of large auditing amount and low efficiency caused by manually auditing videos frame by frame. The method comprises the following steps: constructing a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture; training a basic network by utilizing the video map data set to obtain an optimal video map extraction model; and obtaining a file to be extracted, and extracting the file to be extracted by using an optimal video map extraction model to obtain a first extraction result.

Description

Intelligent extraction method and device for video map, storage medium and computer equipment

Technical Field

The invention relates to the field of electronic map extraction and computer vision, in particular to an intelligent video map extraction method, an intelligent video map extraction device, a storage medium and computer equipment.

Background

With the rapid rise of short video platforms, a large number of videos containing maps now circulate on the Internet. These may include illegal or non-compliant "problem maps" with serious consequences, so auditing video maps is of great significance.

Because maps appear only rarely in videos, and a video is composed of a series of temporally consecutive video frames (pictures) whose frame-to-frame variation reflects how the video content changes over time, a video needs to be audited frame by frame to avoid omissions. At present, map auditing relies heavily on geographic information experts and on volunteers who report map errors. Traditional map auditing therefore consumes a great deal of manpower and material resources, that is, it has high cost and low efficiency, and it is difficult to meet all-weather, large-scale and large-batch map auditing requirements.

Disclosure of Invention

In view of the above, the invention provides an intelligent extraction method, an intelligent extraction device, a storage medium and computer equipment for video maps, which can solve the technical problems of large auditing amount and low efficiency caused by manually auditing videos frame by frame.

According to an aspect of the present invention, there is provided an intelligent extraction method of a video map, the method comprising:

constructing a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture;

training a basic network by utilizing the video map data set to obtain an optimal video map extraction model;

and obtaining a file to be extracted, and extracting the file to be extracted by using the optimal video map extraction model to obtain a first extraction result.

Preferably, the constructing a video map data set includes:

acquiring a video containing the target map picture, and extracting the target map picture and the target non-map picture from the video containing the target map picture; and/or,

acquiring a map picture and a non-map picture, taking the map picture as the target map picture, and taking the non-map picture as the target non-map picture; and/or,

performing video processing on the map picture to obtain the target map picture, and performing video processing on the non-map picture to obtain the target non-map picture.

Preferably, the video processing includes:

shooting the map picture and the non-map picture dynamically and randomly from multiple angles; or,

processing the map picture and the non-map picture by using a generative adversarial network.

Preferably, the extracting the file to be extracted by using the optimal video map extraction model to obtain a first extraction result includes:

if the file type of the file to be extracted is video, determining the file to be extracted as video to be extracted, reading the video to be extracted according to a preset time interval to obtain a first picture to be extracted, and extracting a video map from the first picture to be extracted by using the optimal video map extraction model;

If the file type of the file to be extracted is a picture, determining the file to be extracted as a second picture to be extracted, and extracting a picture map from the second picture to be extracted by using the optimal video map extraction model.

Preferably, the method further comprises:

acquiring the first extraction result, checking whether the label of the first extraction result is correct, if not, modifying the label of the first extraction result, adding the first extraction result with the correct label into the video map data set, and training the optimal video map extraction model by using the updated video map data set, wherein the label comprises a map and a non-map;

and extracting the file to be extracted by using the updated optimal video map extraction model to obtain a second extraction result.

Preferably, the training the base network by using the video map data set to obtain an optimal video map extraction model includes:

determining a model training strategy;

setting network model hyper-parameters, creating a training set and a verification set of the video map data set, training a basic network according to the model training strategy, the network model hyper-parameters and the training set, verifying the trained basic network by using the verification set until the preset number of training iterations is reached, obtaining a first basic network, and monitoring model evaluation indexes of the first basic network;

Adjusting the network model hyper-parameters, training a basic network according to the model training strategy, the adjusted network model hyper-parameters and the training set, verifying the trained basic network by utilizing the verification set until the preset training times are reached, obtaining a second basic network, and monitoring model evaluation indexes of the second basic network;

selecting, from the first basic network and the second basic network, the one with the optimal model evaluation index as the optimal video map extraction model, wherein the model evaluation index comprises at least one of the following: training loss, training precision, training recall, training accuracy, verification loss, verification precision, verification recall, verification accuracy.

Preferably, the model training strategy comprises at least one of the following: a data enhancement strategy, an image and label mixing strategy, a label smoothing and regularization strategy, and a sample proportion balancing strategy.

According to still another aspect of the present invention, there is provided an intelligent extraction apparatus for a video map, the apparatus comprising:

the construction module is used for constructing a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture;

The training module is used for training the basic network by utilizing the video map data set to obtain an optimal video map extraction model;

the extraction module is used for obtaining a file to be extracted, extracting the file to be extracted by using the optimal video map extraction model, and obtaining a first extraction result.

According to still another aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described intelligent extraction method of a video map.

According to still another aspect of the present invention, there is provided a computer device including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the intelligent extraction method of a video map as described above when executing the program.

By means of the technical scheme, the intelligent extraction method, the intelligent extraction device, the storage medium and the computer equipment for the video map can firstly construct a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture; then, training a basic network by utilizing the video map data set to obtain an optimal video map extraction model; and finally, acquiring a file to be extracted, and extracting the file to be extracted by using an optimal video map extraction model to obtain a first extraction result. According to the technical scheme, the file type of the file to be extracted can be video or picture, specifically, aiming at the problems that the map rarely appears in the video but needs to be checked frame by frame to avoid missing inspection, so that a large amount of manpower and material resources are wasted in traditional map checking, the optimal video map extraction model is utilized to carry out intelligent video map extraction, the map can be extracted from massive video streams in advance through the optimal video map extraction model, and then a map checking expert only needs to check whether the extracted map is correct or not, the map does not need to be found out frame by frame first, and then the map is checked, so that the speed and efficiency of map checking are greatly improved. And meanwhile, the optimal video map extraction model can be used for intelligently extracting massive pictures, map pictures are extracted from the massive pictures, and then a map examining expert only needs to examine whether the extracted map is correct or not, and does not need to firstly find out the map from the massive pictures and then examine the map, so that the speed and efficiency of map examination are greatly improved.

The foregoing is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments are set out below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute an undue limitation to the present application. In the drawings:

fig. 1 shows a flow diagram of an intelligent extraction method of a video map according to an embodiment of the present invention;

fig. 2 is a schematic flow chart of another intelligent video map extraction method according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an intelligent video map extraction device according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of another intelligent video map extraction device according to an embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

It should be noted that, the video map auditing is a sub-field of map auditing, and because the existence probability of a map in a video is extremely low compared with that of a natural picture, if the video is audited manually frame by frame, the auditing amount is far greater than that of auditing a single picture, and the efficiency is necessarily low; if only one part of the video is checked, the missed check is inevitably caused, and the propagation of a 'problem map' is promoted.

With the development of the electronic map market, only by advancing auditing technology can a safety net be woven before electronic maps enter the market. For a long time, owing to the lack of intelligent map review technology, software and platform support, map review work in China has depended heavily on manual interpretation. In particular, when reviewing videos, the videos to be reviewed must be watched frame by frame, which makes map review labor-intensive, seriously reduces review efficiency, and makes it difficult to meet all-weather, large-scale and large-batch map review requirements.

Aiming at the technical problems of large auditing amount and low efficiency caused by manually auditing videos frame by frame, the embodiment provides an intelligent extraction method of a video map, as shown in fig. 1, the method comprises the following steps:

101. and constructing a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture.

It should be noted that, since the video is composed of a series of temporally consecutive video frames (pictures), it is necessary to construct a video map data set including a target map picture and a target non-map picture, wherein sources of the target map picture and the target non-map picture include, but are not limited to: any one or more of a picture in a non-video, a picture in a video, and a picture after the video processing is performed on the picture in the non-video.

The pictures in a video have the following characteristics: 1. the dynamic nature of video makes the pictures in it dynamic as well; 2. compared with standalone pictures, which are generally orthographic with little deformation, pictures in a video are likely to be multi-angle and low-definition; 3. the continuity of video produces a large number of extremely similar pictures at adjacent times. Therefore, video processing needs to be performed on non-video pictures, the video processing including: shooting the map pictures and non-map pictures dynamically and randomly from multiple angles; or processing the map pictures and non-map pictures with a generative adversarial network (GAN) to generate pictures with video characteristics.

Accordingly, constructing a video map data set includes: source 1: acquiring a video containing a target map picture (for example, obtained jointly with map auditing departments or collected from the Internet), and extracting the target map picture and a target non-map picture from the video containing the target map picture; and/or, source 2: acquiring a map picture and a non-map picture, taking the map picture as a target map picture, and taking the non-map picture as a target non-map picture; and/or, source 3: performing video processing on the map picture to obtain a target map picture, and performing video processing on the non-map picture to obtain a target non-map picture. Taking as an example a video map data set that includes target map pictures and target non-map pictures from source 1, source 2 and source 3, the samples in the video map data set are fully mixed so that they are randomly arranged.

The map pictures comprise maps in broad sense such as topography maps, line maps, color block maps, remote sensing images, earth surface coverage classification maps and the like, historical maps such as ancient maps and the like, and the non-map pictures are pictures of other types besides map pictures, such as natural pictures and the like.

Preferably, the target map pictures and target non-map pictures in the video map data set are preprocessed. For example, invalid pictures are removed, where invalid pictures include, but are not limited to, pictures with a wrong format, incomplete information, or low definition, and pictures that cannot be opened. As another example, each target map picture or target non-map picture is checked for being an RGB channel image, and non-RGB channel images are converted into RGB channel images.

102. And training the basic network by utilizing the video map data set to obtain an optimal video map extraction model.

The basic networks used include, but are not limited to: VGG, Inception, ResNet, ResNeXt, Xception, InceptionResNet.

For this embodiment, as an implementation manner, the video map data set is first divided into the training set and the verification set according to a proportion. The pictures in the video map data set may all be labeled, or only some of them may be labeled; likewise, the pictures in the training set may be fully or partly labeled, and the pictures in the verification set may be fully or partly labeled, which is not limited here.

The basic network is then trained with the training set and verified with the verification set, and an initial video map extraction model is obtained once the preset number of training iterations is reached. An initial video map extraction model is obtained for each basic network in the same way, the model evaluation indexes are monitored, and the initial video map extraction model with the optimal model evaluation indexes is selected as the optimal video map extraction model, wherein the model evaluation indexes comprise at least one of the following: training loss, training precision, training recall, training accuracy, verification loss, verification precision, verification recall, verification accuracy.

Preferably, the video map data set may include a test set in addition to the training set and the verification set, for example with training set : verification set : test set = 8:1:1. Because the target map pictures and target non-map pictures in the test set are never input to the optimal video map extraction model during training, testing the model on the test set yields model evaluation index values that truly represent its extraction capability. The values of the model evaluation indexes at this time are recorded as shown in Table 1:

TABLE 1

Table 1 lists the following dimensions of extraction capability: loss, precision, recall, F1 measure, and accuracy. It can be seen intuitively that the optimal video map extraction model obtained in step 102 of the embodiment has significantly improved extraction capability compared with the basic network.
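As a minimal sketch of how the precision, recall, F1 measure and accuracy dimensions could be computed for the two-class (map / non-map) case, the following assumes labels encoded as 1 for map and 0 for non-map. The function name and the example labels are hypothetical and are not taken from the source; the example values are not results of the described model.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy with 'map' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # maps correctly extracted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-maps wrongly extracted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # maps that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-maps correctly rejected

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Because maps are rare in video, recall on the map class is usually the figure to watch: a missed map is never shown to the review expert, whereas a false positive only costs a little review time.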

103. And obtaining a file to be extracted, and extracting the file to be extracted by using an optimal video map extraction model to obtain a first extraction result.

For this embodiment, as an implementation manner, the optimal video map extraction model may extract, in addition to a video map (the file type of the video map is a picture, and the content of the video map is a map) in the video stream, a picture map (the file type of the picture map is a picture, and the content of the picture map is a map) from a huge amount of pictures, so that the optimal video map extraction model is deployed on the internet, when a user uploads a file to be extracted, the optimal video map extraction model automatically acquires the file to be extracted, first determines the file type of the file to be extracted, and then performs an extraction operation to obtain a first extraction result, where the first extraction result is the video map when the file type of the file to be extracted is the video type, and where the first extraction result is the picture map when the file type of the file to be extracted is the picture type.

The map is intelligently extracted from the massive information through the optimal video map extraction model, and an image-examining expert can directly examine the extracted map without finding out the map from the massive information through manpower, so that the efficiency is improved, and the cost is saved.

The invention provides an intelligent extraction method, a device, a storage medium and computer equipment of a video map, which can firstly construct a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture; then, training a basic network by utilizing the video map data set to obtain an optimal video map extraction model; and finally, acquiring a file to be extracted, and extracting the file to be extracted by using an optimal video map extraction model to obtain a first extraction result. According to the technical scheme, the file type of the file to be extracted can be video or picture, specifically, aiming at the problems that the map rarely appears in the video but needs to be checked frame by frame to avoid missing inspection, so that a large amount of manpower and material resources are wasted in traditional map checking, the optimal video map extraction model is utilized to carry out intelligent extraction of the video map, the map can be extracted from massive video streams in advance through a computer, and then a map checking expert only needs to check whether the extracted map is correct or not, the map does not need to be found out frame by frame first, and then the map is checked, so that the speed and efficiency of map checking are greatly accelerated. And meanwhile, the optimal video map extraction model can be used for intelligently extracting massive pictures, map pictures are extracted from the massive pictures, and then a map examining expert only needs to examine whether the extracted map is correct or not, and does not need to firstly find out the map from the massive pictures and then examine the map, so that the speed and efficiency of map examination are greatly improved.

Further, as a refinement and extension of the specific implementation manner of the foregoing embodiment, in order to fully describe the specific implementation process in this embodiment, another intelligent video map extraction method is provided, as shown in fig. 2, where the method includes:

201. and constructing a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture.

For the present embodiment, the specific implementation is the same as that of embodiment step 101, and will not be described here again.

202. A model training strategy is determined.

For the present embodiment, as an implementation, the model training strategy includes at least one of the following: a data enhancement strategy, an image and label mixing strategy, a label smoothing and regularization strategy, and a sample proportion balancing strategy. Applying the model training strategy during training of the basic network improves the training effect and makes the trained model more accurate.

Specifically, the diversity of the samples is improved through the data enhancement strategy, the original samples are not changed, and the transformation types are linearly increased in a selectable range according to the increase of training iteration times:

where k is the maximum number of transformation methods that may be applied, n is the current training round, epoch is the total number of rounds, count is the number of transformation types, and ⌊·⌋ denotes rounding down. The data enhancement strategies include, but are not limited to: data centering, data standardization, data ZCA whitening, data graying, random rotation, horizontal shift, vertical shift, random channel shift, random horizontal flip, random vertical flip, data binarization and random cropping.
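The formula itself is not reproduced in this text, so the following is only a sketch under a stated assumption: that the schedule is k = ⌊n / epoch × count⌋, which matches the symbol definitions and the linear increase described above but may differ from the original expression. The transformation identifiers and function names are illustrative.

```python
import random

# The 12 transformation types listed above (identifiers are illustrative).
TRANSFORMS = [
    "center", "standardize", "zca_whiten", "grayscale",
    "random_rotate", "shift_horizontal", "shift_vertical", "channel_shift",
    "flip_horizontal", "flip_vertical", "binarize", "random_crop",
]


def max_transform_count(n, epoch, count):
    """ASSUMED schedule: k = floor(n / epoch * count), using integer arithmetic."""
    return (n * count) // epoch


def choose_transforms(n, epoch, rng):
    """Pick at most k distinct transformation types for training round n."""
    k = max_transform_count(n, epoch, len(TRANSFORMS))
    how_many = rng.randint(0, k)  # anywhere from no transformation up to k of them
    return rng.sample(TRANSFORMS, how_many)


rng = random.Random(0)
for n in (1, 15, 30):
    k = max_transform_count(n, 30, len(TRANSFORMS))
    print(n, k, choose_transforms(n, 30, rng))
```

Under this assumed schedule, early rounds see nearly unmodified samples and later rounds see progressively more varied ones, while the original samples themselves are never altered.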

Specifically, the image and label mixing strategy reduces the intra-class distance and increases the inter-class distance, which improves model identification precision, maintains model robustness and reduces overfitting. In this strategy, images are mixed and their corresponding labels are mixed as well. Specifically, any two pictures (two target map pictures, two target non-map pictures, or a target map picture and a target non-map picture) are superimposed pixel by pixel, for example as the target map picture multiplied by a first preset weight of 0.8 plus the target non-map picture multiplied by a second preset weight of 0.2 (the first preset weight corresponds to the target map picture and the second to the target non-map picture). The labels are superimposed correspondingly, for example map with non-map. The superposition yields a new picture, which is also used for training the basic network. Here a label is the category of a picture, either map or non-map.
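The pixel-by-pixel superposition can be sketched as follows, using the 0.8 and 0.2 factors from the example above. Representing pictures as nested lists of pixel values and labels as one-hot vectors `[map, non-map]`, and applying the same factors to the labels as to the pixels, are assumptions made for illustration; the function name is hypothetical.

```python
def mix(a, b, weight_a=0.8, weight_b=0.2):
    """Element-wise weighted sum of two equally shaped nested lists (or two numbers)."""
    if isinstance(a, (int, float)):
        return weight_a * a + weight_b * b
    if len(a) != len(b):
        raise ValueError("both inputs must have the same shape")
    return [mix(x, y, weight_a, weight_b) for x, y in zip(a, b)]


# Two tiny 2x2 single-channel "pictures" (made-up pixel values).
map_picture = [[100, 100], [100, 100]]
non_map_picture = [[200, 0], [0, 200]]

# One-hot labels in the order [map, non-map].
map_label = [1.0, 0.0]
non_map_label = [0.0, 1.0]

mixed_picture = mix(map_picture, non_map_picture)
mixed_label = mix(map_label, non_map_label)
print(mixed_picture)
print(mixed_label)
```

The mixed picture and its soft label form one additional training sample; the two source pictures remain in the data set unchanged.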

Specifically, model overfitting is prevented by a label smoothing and regularization strategy, which includes, but is not limited to: L1 regularization, L2 regularization, and L1-L2 hybrid regularization.
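A minimal sketch of these techniques in their standard textbook forms follows. The smoothing factor `epsilon`, the penalty coefficients and the function names are assumed values for illustration; the source does not specify them.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: y * (1 - epsilon) + epsilon / number_of_classes."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]


def l1_penalty(weights, lam=0.01):
    """L1 regularization term: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam=0.01):
    """L2 regularization term: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def l1_l2_penalty(weights, lam1=0.01, lam2=0.01):
    """Hybrid penalty: the L1 and L2 terms added together."""
    return l1_penalty(weights, lam1) + l2_penalty(weights, lam2)


print(smooth_labels([1.0, 0.0]))        # hard 'map' label softened
print(l1_l2_penalty([1.0, -2.0, 3.0]))  # penalty added to the training loss
```

Label smoothing keeps the model from becoming over-confident on either class, while the penalty terms discourage large weights; both act against overfitting.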

In particular, maps appear in videos with very low probability, so the map and non-map samples can be extremely imbalanced. The sample proportion balancing strategy therefore gives the minority-class samples greater weight during model training, using methods that include but are not limited to weighting by the inverse of the sample count and focal loss. More attention is thus paid to the map samples, the proportion of positive and negative samples is balanced at the algorithm level, and the identification accuracy for maps is improved.
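Both balancing methods can be sketched as below. The normalization used for the inverse-count weights, the focal loss parameters `gamma=2.0` and `alpha=0.25` (commonly used defaults), and the sample counts are assumptions for illustration and are not values given in the source.

```python
import math


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


def focal_loss(p_map, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample; label 1 = map, 0 = non-map.

    p_map is the predicted probability that the picture is a map.
    """
    p_t = p_map if label == 1 else 1.0 - p_map
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-7), 1.0 - 1e-7)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up class counts: maps are the minority class.
print(inverse_frequency_weights({"map": 100, "non-map": 900}))
# A confidently correct map prediction contributes far less loss than a hesitant one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

The inverse-count weights scale the loss per class, whereas focal loss down-weights easy samples of either class so that training concentrates on the hard, typically rare, map samples.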

203. Setting network model hyper-parameters, creating a training set and a verification set of the video map data set, training a basic network according to the model training strategy, the network model hyper-parameters and the training set, verifying the trained basic network by using the verification set until the preset number of training iterations is reached, obtaining a first basic network, and monitoring model evaluation indexes of the first basic network.

204. Adjusting the network model hyper-parameters, training the basic network according to the model training strategy, the adjusted network model hyper-parameters and the training set, verifying the trained basic network by using the verification set until the preset number of training iterations is reached, obtaining a second basic network, and monitoring model evaluation indexes of the second basic network.

205. Selecting, from the first basic network and the second basic network, the one with the optimal model evaluation index as the optimal video map extraction model.

For steps 203-205 of the embodiment, as one implementation, the ratio of the training set to the verification set is determined and the video map data set is divided accordingly. The training set and the verification set should maintain the same data distribution, and each contains only two types of samples: positive samples (target map pictures) and negative samples (target non-map pictures).
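One way to keep the class distribution the same in both sets is a stratified split, sketched below. The 8:2 ratio, the `(file name, label)` sample format and the function name are assumptions for illustration; the source only states that a ratio is determined.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratio=(8, 2), seed=0):
    """Split (name, label) samples so each label keeps the same share in both sets."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        cut = len(group) * ratio[0] // sum(ratio)  # integer arithmetic, no rounding drift
        train.extend(group[:cut])
        val.extend(group[cut:])

    # Shuffle again so positive and negative samples are interleaved.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Made-up data set: label 1 = target map picture, 0 = target non-map picture.
samples = ([(f"map_{i}.jpg", 1) for i in range(20)]
           + [(f"other_{i}.jpg", 0) for i in range(80)])
train, val = stratified_split(samples)
print(len(train), len(val))
```

Splitting per class matters here because maps are rare: a purely random split of a small data set could leave the verification set with too few map pictures to give meaningful evaluation indexes.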

Wherein, the network model hyper-parameters may include: learning rate, iteration number, weight initialization method (one of ImageNet, none and random initialization can be adopted), monitoring index and loss function.

The network model hyper-parameters may be set according to empirical values, for example: learning rate = 0.001, iteration number = 30, weight initialization method = ImageNet, monitoring index = verification set loss and precision, loss function = cross-entropy function. The basic network is then trained according to the network model hyper-parameters, the model training strategy and the training set, and verified with the verification set until the preset number of training iterations is reached, yielding a first basic network whose model evaluation indexes are monitored. The network model hyper-parameters are then adjusted; some or all of them may be adjusted, which is not limited here. The basic network is trained again according to the adjusted network model hyper-parameters, the model training strategy and the training set, and verified with the verification set until the preset number of training iterations is reached, yielding a second basic network whose model evaluation indexes are monitored. Preferably, the number of times the network model hyper-parameters are adjusted may be preset, with each adjustment yielding a corresponding second basic network. In that case, selecting the optimal one from the first basic network and the second basic network in step 205 of the embodiment means selecting from the first basic network and the plurality of second basic networks.

Wherein the model evaluation index comprises at least one of the following: training loss, training precision, training recall, training accuracy, verification loss, verification precision, verification recall, verification accuracy. The one of the first basic network and the second basic network with the optimal model evaluation index is selected as the optimal video map extraction model.
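The selection in step 205 can be sketched as a comparison over the recorded evaluation indexes of each trained network. The source does not state how "optimal" is decided when several indexes are monitored, so ranking by lowest verification loss and breaking ties by highest verification accuracy is an assumption, and every number below is a placeholder rather than a reported result.

```python
def select_optimal(candidates):
    """Return the candidate with the lowest verification loss.

    ASSUMED criterion: ties are broken by the higher verification accuracy.
    """
    return min(candidates, key=lambda c: (c["val_loss"], -c["val_accuracy"]))


# Placeholder records: one first basic network and two second basic networks,
# each trained with different (hypothetical) hyper-parameters.
candidates = [
    {"name": "first_basic_network", "learning_rate": 0.001,
     "val_loss": 0.30, "val_accuracy": 0.90},
    {"name": "second_basic_network_1", "learning_rate": 0.0005,
     "val_loss": 0.25, "val_accuracy": 0.92},
    {"name": "second_basic_network_2", "learning_rate": 0.0001,
     "val_loss": 0.25, "val_accuracy": 0.93},
]

best = select_optimal(candidates)
print(best["name"])
```

Any other monitored index, such as verification recall on the map class, could replace the sort key without changing the structure of the selection.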

206. And obtaining a file to be extracted, and extracting the file to be extracted by using an optimal video map extraction model to obtain a first extraction result.

For this embodiment, as an implementation manner, after the file to be extracted is obtained, its file type is determined. If the file type is video, the file to be extracted is determined to be a video to be extracted, the video is read at a preset time interval to obtain first pictures to be extracted, and a video map is extracted from the first pictures to be extracted by using the optimal video map extraction model. Each preset time interval corresponds to a key frame at a specific time in the video, so sampling the key frames converts the video into pictures, and the preset time interval sets how often a picture is generated; for example, with a preset time interval of 10 seconds, a 1-minute video to be extracted yields 6 pictures. If the file type is picture, the file to be extracted is determined to be a second picture to be extracted, and a picture map is extracted from it by using the optimal video map extraction model.
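The sampling arithmetic in the example above can be sketched as follows. Starting the sampling at time 0, the frame rate of 25 frames per second and the function names are assumptions; actually decoding the frames would additionally require a video library, which is outside this sketch.

```python
import math


def sample_timestamps(duration_s, interval_s):
    """Timestamps (in seconds) at which one picture is taken from the video."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    n_pictures = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(n_pictures)]


def frame_indices(duration_s, interval_s, fps):
    """Frame numbers corresponding to the sampled timestamps."""
    return [round(t * fps) for t in sample_timestamps(duration_s, interval_s)]


timestamps = sample_timestamps(60, 10)   # 1-minute video, 10-second interval
print(len(timestamps), timestamps)       # 6 pictures, matching the example above
print(frame_indices(60, 10, fps=25))
```

A shorter interval lowers the risk of skipping a briefly shown map but produces more pictures for the model to classify, so the interval trades recall against processing cost.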

Preferably, after the file to be extracted is acquired, the optimal video map extraction model may first judge whether a map exists in the file. If no map exists, the file type of the file to be extracted does not need to be judged; if a map exists, the file type of the file to be extracted is then judged, which improves the extraction efficiency.

207. And acquiring a first extraction result, checking whether the label of the first extraction result is correct, if not, modifying the label of the first extraction result, adding the first extraction result with the correct label into a video map data set, and training an optimal video map extraction model by utilizing the updated video map data set.

Here, the label is either map or non-map.

For this embodiment, as an implementation manner, the label of the first extraction result is a prediction label, which may be correct or incorrect. The first extraction result is sent to a map review expert for further checking, and any incorrect prediction labels are corrected. For example, the first extraction result includes 20 pictures whose prediction labels are all map; if checking finds that the true labels of 2 pictures are non-map, the labels of those 2 pictures are changed to non-map while the other 18 labels remain map. The 20 pictures with correct labels are then added to the video map data set, and the updated video map data set is used to train the optimal video map extraction model.
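The label check in the example above can be sketched as follows. This is an illustration only: the dictionary layout, the picture identifiers and the function name `apply_review` are hypothetical, and the expert's corrections are assumed to arrive as a mapping from picture identifier to true label.

```python
from typing import Dict

LABELS = ("map", "non-map")


def apply_review(predicted: Dict[str, str], corrections: Dict[str, str]) -> Dict[str, str]:
    """Return labels after expert review; only the corrected pictures change."""
    checked = dict(predicted)  # copy, so the prediction labels are preserved
    for picture_id, true_label in corrections.items():
        if picture_id not in checked:
            raise KeyError(f"unknown picture: {picture_id}")
        if true_label not in LABELS:
            raise ValueError(f"unknown label: {true_label}")
        checked[picture_id] = true_label
    return checked


# 20 pictures all predicted as map; the expert finds that 2 are non-map.
predicted = {f"pic_{i:02d}": "map" for i in range(20)}
corrections = {"pic_03": "non-map", "pic_11": "non-map"}
checked = apply_review(predicted, corrections)

# All 20 pictures, now with correct labels, are added to the data set.
video_map_dataset: Dict[str, str] = {}
video_map_dataset.update(checked)
```

After the review, 18 pictures keep the label map and 2 carry the label non-map, and all 20 enter the data set.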

Specifically, training the optimal video map extraction model by using the updated video map data set includes two training modes. One is incremental training: whenever data is newly added to the video map data set, the optimal video map extraction model is trained by using only the newly added data; this mode is characterized by high frequency and a small amount of training data. The other is full training: a preset time period is first determined, and the optimal video map extraction model is trained by using the historical data and the newly added data within the preset time period.
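The difference between the two training modes lies in which samples are fed to the model, which the following sketch illustrates. It rests on assumptions: each sample is taken to carry a timestamp `added_at`, incremental training is taken to use the samples added since the last training run, and the preset time period is interpreted as a window ending at the current time. The names `Sample`, `incremental_batch` and `full_batch` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Sample:
    picture_id: str
    label: str          # "map" or "non-map"
    added_at: datetime  # assumed: when the sample entered the data set


def incremental_batch(dataset: List[Sample], last_trained_at: datetime) -> List[Sample]:
    """Incremental training: only the samples added since the last training run."""
    return [s for s in dataset if s.added_at > last_trained_at]


def full_batch(dataset: List[Sample], now: datetime, period: timedelta) -> List[Sample]:
    """Full training: historical and newly added samples within the preset period."""
    start = now - period
    return [s for s in dataset if start <= s.added_at <= now]


dataset = [
    Sample("a", "map", datetime(2022, 6, 1)),
    Sample("b", "non-map", datetime(2023, 1, 10)),
    Sample("c", "map", datetime(2023, 1, 30)),
]
new_only = incremental_batch(dataset, last_trained_at=datetime(2023, 1, 20))
windowed = full_batch(dataset, now=datetime(2023, 1, 31), period=timedelta(days=30))
```

Here `new_only` holds the single sample added after the last run, while `windowed` holds every sample from the last 30 days, old or new.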

By iterating the optimal video map extraction model, the extraction accuracy of the model can be continuously improved. Map reviewers can carry out map auditing without watching videos frame by frame, which effectively safeguards national sovereignty, security and interests. Map auditing is thereby changed from time-consuming and labor-intensive manual auditing into semi-automatic, intelligent auditing, filling the technical gap in intelligent map auditing, in particular intelligent auditing of video maps.

208. Extracting the file to be extracted by using the updated optimal video map extraction model to obtain a second extraction result.

For the present embodiment, the first extraction result is different from the second extraction result in that the first extraction result is extracted by the pre-update optimal video map extraction model, the second extraction result is extracted by the updated optimal video map extraction model, and the second extraction result has higher accuracy than the first extraction result.

In summary, the technical effects of this scheme are as follows:

1. The efficiency of map auditing is improved. Map auditing has long relied on manual interpretation by map review experts; in video auditing in particular, the experts must watch the video frame by frame, which wastes a great deal of manpower and material resources. This scheme introduces artificial intelligence technology to automatically extract the maps in the video to be audited, so that reviewers can carry out map auditing without watching the video frame by frame;

2. The bottleneck problem of multi-source map extraction is solved. Maps exist in various forms, such as pictures and videos; a map of the same area can be drawn by different methods for different purposes, such as coloring by province, color-block maps and the like; and maps are of various types, such as topographic maps, navigation maps, land use maps and the like. This scheme can accurately extract maps of different styles and different types;

3. A bridge is built from picture-form map auditing to video map auditing. Maps in videos often have poor quality because of improper shooting angles or poor sharpness. In this scheme, video maps are generated from picture-form maps to construct a multi-angle, multi-source video map data set, so that video maps at multiple angles can be extracted successfully.

Further, as a specific implementation of the method shown in fig. 1 and fig. 2, an embodiment of the present invention provides an intelligent video map extracting apparatus, as shown in fig. 3, including: a construction module 31, a training module 32 and an extraction module 33;

a construction module 31 operable to construct a video map data set, wherein the video map data set comprises a target map picture and a target non-map picture;

the training module 32 is configured to train the base network by using the video map data set to obtain an optimal video map extraction model;

the extraction module 33 may be configured to obtain a file to be extracted, and extract the file to be extracted by using the optimal video map extraction model, so as to obtain a first extraction result.

Accordingly, to construct a video map data set, the construction module 31 may be specifically configured to: acquiring a video containing the target map picture, and extracting the target map picture and the target non-map picture from the video containing the target map picture; and/or acquiring a map picture and a non-map picture, wherein the map picture is used as the target map picture, and the non-map picture is used as the target non-map picture; and/or acquiring the target map picture after the map picture is subjected to the video processing, and acquiring the target non-map picture after the non-map picture is subjected to the video processing.

Correspondingly, to perform the video processing, the construction module 31 may be further specifically configured to shoot the map picture and the non-map picture dynamically and randomly from multiple angles; or to process the map picture and the non-map picture by using a generative adversarial network.

Accordingly, in order to extract the file to be extracted by using the optimal video map extraction model to obtain the first extraction result, the extraction module 33 is specifically configured to: if the file type of the file to be extracted is video, determine the file to be extracted to be a video to be extracted, read the video to be extracted at a preset time interval to obtain a first picture to be extracted, and extract a video map from the first picture to be extracted by using the optimal video map extraction model; if the file type of the file to be extracted is picture, determine the file to be extracted to be a second picture to be extracted, and extract a picture map from the second picture to be extracted by using the optimal video map extraction model.

In a specific application scenario, as shown in fig. 4, the intelligent extraction device for a video map further includes an updating module 34, which is specifically configured to: obtain the first extraction result, check whether the label of the first extraction result is correct, and if not, modify the label of the first extraction result; add the first extraction result with the correct label to the video map data set, and train the optimal video map extraction model by using the updated video map data set, where the label is either map or non-map; and extract the file to be extracted by using the updated optimal video map extraction model to obtain a second extraction result.

Accordingly, in order to train the basic network by using the video map data set to obtain the optimal video map extraction model, the training module 32 is specifically configured to: determine a model training strategy comprising at least one of the following: a data enhancement strategy, an image and label mixing strategy, a label smoothing and regularization strategy, and a sample proportion balancing strategy; set network model hyperparameters, create a training set and a verification set from the video map data set, train the basic network according to the model training strategy, the network model hyperparameters and the training set, verify the trained basic network by using the verification set until a preset number of training iterations is reached to obtain a first basic network, and monitor the model evaluation indexes of the first basic network; adjust the network model hyperparameters, train the basic network according to the model training strategy, the adjusted network model hyperparameters and the training set, verify the trained basic network by using the verification set until the preset number of training iterations is reached to obtain a second basic network, and monitor the model evaluation indexes of the second basic network; and select, from the first basic network and the second basic network, the one with the better model evaluation indexes as the optimal video map extraction model, wherein the model evaluation indexes comprise at least one of the following: training loss, training accuracy, training recall, training precision, verification loss, verification accuracy, verification recall, and verification precision.
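The final selection step, in which the candidate networks trained under different hyperparameters are compared by a monitored evaluation index, can be sketched as below. The record layout, the function name `select_best` and all numeric values are illustrative placeholders rather than results from this embodiment, and using a single index for the comparison is an assumption.

```python
from typing import Any, Dict, List


def select_best(runs: List[Dict[str, Any]], metric: str = "val_accuracy") -> Dict[str, Any]:
    """Pick the run whose monitored index is best.

    Loss-type indexes are minimized; accuracy, recall and precision are maximized.
    """
    if not runs:
        raise ValueError("no training runs to compare")
    pick = min if metric.endswith("loss") else max
    return pick(runs, key=lambda run: run["metrics"][metric])


# Placeholder records for two candidate networks (values are made up).
runs = [
    {
        "name": "first_basic_network",
        "hyperparameters": {"learning_rate": 0.01, "batch_size": 32},
        "metrics": {"val_loss": 0.31, "val_accuracy": 0.91},
    },
    {
        "name": "second_basic_network",
        "hyperparameters": {"learning_rate": 0.001, "batch_size": 64},
        "metrics": {"val_loss": 0.24, "val_accuracy": 0.94},
    },
]
optimal = select_best(runs, metric="val_accuracy")
```

Comparing on a verification index rather than a training index is a common choice, since it reflects performance on data the network was not trained on.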

It should be noted that, other corresponding descriptions of each functional unit related to the intelligent extraction device for video map provided in this embodiment may refer to corresponding descriptions of fig. 1 to 2, and are not described herein again.

Based on the above-mentioned method shown in fig. 1 to 2, correspondingly, the present embodiment further provides a storage medium, which may be specifically volatile or nonvolatile, and has a computer program stored thereon, where the program when executed by a processor implements the above-mentioned intelligent video map extraction method shown in fig. 1 to 2.

Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method of each implementation scenario of the present invention.

Based on the method shown in fig. 1 to fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, in order to achieve the above objects, the embodiments of the present application further provide a computer device, which may specifically be a personal computer, a server, a network device, etc., where the computer device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the intelligent extraction method of video maps as shown in fig. 1 and 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display), an input unit such as a keyboard (Keyboard), etc., and optionally the user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), etc.

It will be appreciated by those skilled in the art that the computer device structure provided in this embodiment does not limit the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.

The storage medium may also include an operating system, a network communication module. An operating system is a program that manages the computer device hardware and software resources described above, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among all components in the nonvolatile storage medium and communication with other hardware and software in the information processing entity equipment.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware.

Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario, and that the modules or flows in the drawing are not necessarily required to practice the invention. Those skilled in the art will also appreciate that the modules of a device in an implementation scenario may be distributed in that device as described for the implementation scenario, or may be located, with corresponding changes, in one or more devices different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above serial numbers of the invention are merely for description and do not represent the merits of the implementation scenarios. The foregoing disclosure is merely illustrative of some embodiments of the invention, and the invention is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the invention.

Claims

1. An intelligent extraction method of a video map is characterized by comprising the following steps:

2. The method of claim 1, wherein the constructing a video map data set comprises:

3. The method of claim 2, wherein the video processing comprises:

4. The method according to claim 1, wherein the extracting the file to be extracted using the optimal video map extraction model to obtain a first extraction result includes:

5. The method as recited in claim 1, further comprising:

6. The method according to claim 1, wherein training the base network with the video map data set to obtain an optimal video map extraction model comprises:

determining a model training strategy;

7. The method of claim 6, wherein the model training strategy comprises at least one of: a data enhancement strategy, an image and label mixing strategy, a label smoothing and regularization strategy, and a sample proportion balancing strategy.

8. An intelligent extraction device for video maps, the device comprising:

9. A storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the intelligent extraction method of a video map according to any one of claims 1 to 7.

10. A computer device comprising a memory, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the intelligent extraction method of a video map according to any one of claims 1 to 7 when executing the program.