CN111832578A - Interest point information processing method and device, electronic equipment and storage medium - Google Patents

Interest point information processing method and device, electronic equipment and storage medium

Info

Publication number
CN111832578A
CN111832578A
Authority
CN
China
Prior art keywords
image
poi
signboard
candidate poi
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010700108.7A
Other languages
Chinese (zh)
Inventor
刘慧
孙靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010700108.7A priority Critical patent/CN111832578A/en
Publication of CN111832578A publication Critical patent/CN111832578A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for processing point of interest information, relating to the field of image processing and in particular to intelligent traffic technology. The specific implementation scheme includes the following steps: acquiring an image to be processed; extracting at least one candidate point of interest (POI) signboard image from the image to be processed; and, for each candidate POI signboard image of the at least one candidate POI signboard image: extracting first-class features and second-class features of the candidate POI signboard image; classifying the candidate POI signboard image based on the first-class features and the second-class features to determine the type to which it belongs; and, when the candidate POI signboard image is determined to belong to a predetermined type, determining the candidate POI signboard image to be the POI signboard image to be recognized. The application also discloses a point of interest information processing apparatus, an electronic device, and a storage medium.

Description

Interest point information processing method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of image processing, and in particular to intelligent traffic technology. More specifically, the application provides a point of interest information processing method, apparatus, device, and storage medium.
Background
In the prior art, point of interest signboard images are handled in a detect-then-recognize manner. However, limited by the accuracy of the detection algorithm, the detection results for point of interest signboard images may contain errors. For example, a billboard image in a street view image may be erroneously detected as a point of interest signboard image. Such detection errors lead to subsequent invalid recognition processing of images that do not contain real point of interest signboard information, so that the recognition accuracy and recognition efficiency for point of interest signboard images are low.
Disclosure of Invention
Provided are a point of interest information processing method, apparatus, device, and storage medium.
According to a first aspect, a point of interest information processing method is provided, which includes: acquiring an image to be processed; extracting at least one candidate point of interest (POI) signboard image from the image to be processed; and, for each candidate POI signboard image of the at least one candidate POI signboard image: extracting first-class features and second-class features of the candidate POI signboard image; classifying the candidate POI signboard image based on the first-class features and the second-class features to determine the type to which it belongs; and, when the candidate POI signboard image is determined to belong to a predetermined type, determining the candidate POI signboard image to be the POI signboard image to be recognized.
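The steps of the first aspect can be sketched as a minimal pipeline. All component functions below (the detector, the two feature extractors, and the classifier) are hypothetical stand-ins for the components the application describes, with toy inputs; this is a sketch of the control flow, not the application's implementation.

```python
def detect_candidate_signs(image):
    # Stand-in for the target detection step: return candidate POI signboard crops.
    return list(image.get("candidates", []))

def extract_first_class_features(sign):
    # Stand-in for hand-designed visual features (e.g. a color histogram entry).
    return [sign["brightness"]]

def extract_second_class_features(sign):
    # Stand-in for features learned by a deep model.
    return [sign["depth_score"]]

def classify(first, second):
    # Stand-in classifier over the fused features: returns a predicted type.
    fused = first + second
    return "real_poi_sign" if sum(fused) > 1.0 else "other"

def select_signs_to_recognize(image, predetermined_type="real_poi_sign"):
    # For each candidate: extract both feature classes, classify, keep the
    # candidates whose type matches the predetermined type.
    to_recognize = []
    for sign in detect_candidate_signs(image):
        first = extract_first_class_features(sign)
        second = extract_second_class_features(sign)
        if classify(first, second) == predetermined_type:
            to_recognize.append(sign)
    return to_recognize

image = {"candidates": [
    {"name": "shop sign", "brightness": 0.9, "depth_score": 0.8},
    {"name": "billboard", "brightness": 0.2, "depth_score": 0.1},
]}
selected = select_signs_to_recognize(image)
print([s["name"] for s in selected])  # the billboard candidate is filtered out
```

Only the candidates that survive the classification step would proceed to the recognition stage described in the later aspects.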
According to a second aspect, there is provided a point of interest information processing apparatus comprising an acquisition module, an image extraction module, a feature extraction module, a classification module, and a determination module. The acquisition module is used for acquiring an image to be processed. The image extraction module is used for extracting at least one candidate point of interest (POI) signboard image from the image to be processed. The feature extraction module is used for extracting first-class features and second-class features of each candidate POI signboard image of the at least one candidate POI signboard image. The classification module is used for classifying each candidate POI signboard image based on its first-class features and second-class features to determine the type to which it belongs. The determination module is used for determining a candidate POI signboard image to be the POI signboard image to be recognized when the candidate POI signboard image is determined to belong to a predetermined type.
According to a third aspect, there is provided an electronic device comprising: at least one processor, and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the point of interest information processing method provided by the application.
According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the point of interest information processing method provided according to the present application.
After the candidate POI signboard images are extracted from the image to be processed, each candidate POI signboard image is further classified, so that the expected candidate POI signboard images (namely, those belonging to a predetermined type) are screened out as the POI signboard images to be recognized. In this way, later invalid recognition caused by earlier detection errors can be avoided, the time and computing resources consumed by invalid images in the subsequent recognition process are reduced, and the subsequent recognition process becomes more targeted. In addition, the classification process jointly considers the first-class features and the second-class features of the candidate POI signboard images; since the two classes of features describe the image from different levels, the content of a candidate POI signboard image can be characterized more accurately and comprehensively, which improves the precision and recall of the classification process.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1A is an exemplary schematic diagram of a POI sign image in accordance with an embodiment of the present application;
FIG. 1B is an exemplary system architecture for applying the point of interest information processing method and apparatus according to an embodiment of the present application;
FIG. 2 is a flow chart of a point of interest information processing method according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a point of interest information processing process according to an embodiment of the application;
FIG. 4 is an exemplary flow chart of a point of interest information processing method according to another embodiment of the present application;
FIG. 5 is an exemplary illustration of a first class of features of a candidate POI sign image in accordance with an embodiment of the present application;
FIG. 6 is an exemplary flow diagram for constructing and training a classification model according to an embodiment of the present application;
fig. 7 is a block diagram of a point of interest information processing apparatus according to an embodiment of the present application; and
fig. 8 is a block diagram of an electronic device for implementing a point of interest information processing method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
A point of interest (POI) is coordinate-point annotation data on an electronic map, used to mark the place represented by the associated coordinate point, such as a government department, a commercial establishment (a gas station, department store, supermarket, restaurant, hotel, convenience store, etc.), a tourist attraction (a park, historic site, etc.), or a transportation facility (stations of various kinds, parking lots, speed cameras, speed-limit signs, etc.). To present a POI on an electronic map, the name, category, geographic location, and other relevant information of the POI need to be known.
In one approach, information such as the name and category of a POI may be determined based on a POI signboard image. For example, FIG. 1A is an exemplary schematic diagram of a POI signboard image according to an embodiment of the present application. After the POI signboard image shown in FIG. 1A is acquired, a machine can recognize the text information contained in it, such as "convenience store", so that the name and category of the POI can be determined based on the recognition result. Illustratively, an actual street view image can be acquired through various channels and, as the image to be processed, detected and localized using a target detection algorithm to obtain one or more POI signboard images. Limited by the accuracy of the target detection algorithm, the detection result may contain errors, such as misdetecting a billboard image in the street view image as a POI signboard image. Such detection errors lead to subsequent invalid recognition processing of images that do not contain real POI signboard information, so that the recognition accuracy and recognition efficiency for POI signboard images are low.
Fig. 1B is an exemplary system architecture 100 to which the point of interest information processing method and apparatus may be applied, according to one embodiment of the present application. It should be noted that fig. 1B is only an example of a system architecture to which the embodiments of the present application may be applied to help those skilled in the art understand the technical content of the present application, and does not mean that the embodiments of the present application may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1B, the system architecture 100 according to this embodiment may include a plurality of terminal devices 110, a network 120, and a server 130. The terminal device 110 may be various terminal devices capable of capturing images, such as a smart phone, a tablet computer, a video camera, a camera, and the like, which is not limited herein. The server 130 may be any electronic device with certain computing capabilities, and is not limited herein. The following illustrates the interaction between the terminal device 110 and the server 130 via the network 120.
Illustratively, the terminal device 110 may be a professional panoramic capture device mounted on a motor vehicle such as an automobile or a motorcycle, or in some cases even a capture device carried by a person, used to sweep real-world streets exhaustively and capture street images. After collection, the collected street images and some associated information (e.g., time information, geographic location information, etc.) are sent to the server 130 via the network 120 for analysis. Similarly, the terminal device 110 may be a user's personal terminal device; the user may capture some street view images during daily work and life and feed them back to the server 130 for analysis. The server 130 extracts effective POI signboard information from the acquired street view images, and thereby labels and updates POIs in the electronic map 140. Furthermore, the electronic map 140 with the labeled and updated POIs can be pushed to a plurality of terminal devices (which may be the same as or different from the terminal device 110) to implement functions such as map display and navigation.
According to an embodiment of the application, a point of interest information processing method is provided. The method is illustrated with the figures below. It should be noted that the sequence numbers of the operations in the following methods are merely labels for describing the operations and should not be construed as indicating their execution order. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 is a flowchart of a point of interest information processing method according to an embodiment of the present application, which may be performed, for example, on the server side shown in fig. 1B.
As shown in fig. 2, the method may include operations S201 to S205.
In operation S201, an image to be processed is acquired.
Exemplary ways of acquiring the image to be processed have been described above and are not repeated here.
In operation S202, at least one candidate POI signboard image is extracted from the image to be processed.
Next, for each of the at least one candidate POI sign image, the following operations S203 to S205 are performed.
In operation S203, the first-class features and second-class features of the candidate POI signboard image are extracted.
Illustratively, the first type of feature and the second type of feature are extracted in different ways, and feature information of the candidate POI signboard image at different levels is respectively represented.
In operation S204, the candidate POI signboard image is classified based on the first-class features and the second-class features to determine the type to which it belongs.
In operation S205, when it is determined that the candidate POI signboard image belongs to the predetermined type, the candidate POI signboard image is determined to be the POI signboard image to be recognized.
Illustratively, the predetermined type may be set according to actual business needs. For example, images containing real POI signboard information may be set as the predetermined type, or, at a finer granularity, images containing real shop POI signboard information may be set as the predetermined type; this is not limited here. Candidate POI signboard images that do not belong to the predetermined type may be regarded as invalid images, for which no subsequent recognition of POI signboard content is needed.
As those skilled in the art can understand, after the candidate POI signboard images are extracted from the image to be processed, the POI information processing method according to the embodiment of the present application further classifies each candidate POI signboard image to screen out the expected candidate POI signboard images (i.e., those belonging to the predetermined type) as the POI signboard images to be recognized. In this way, later invalid recognition caused by earlier detection errors can be avoided, the time and computing resources consumed by invalid images in the subsequent recognition process are reduced, and the subsequent recognition process becomes more targeted. In addition, the classification process jointly considers the first-class features and the second-class features of the candidate POI signboard images; since the two classes of features describe the image from different levels, the content of a candidate POI signboard image can be characterized more accurately and comprehensively, which improves the precision and recall of the classification process.
Fig. 3 is an exemplary diagram of a point of interest information processing procedure according to an embodiment of the present application.
As shown in fig. 3, after the image to be processed 310 is acquired, it may be detected using an object detection algorithm, such as Faster R-CNN (Faster Region-based Convolutional Neural Network), Fast R-CNN, or the like. For example, the image to be processed 310 may be input to the target detection network 320 for processing to obtain the detection result 330. The detection result 330 includes at least one detection frame (bounding box) 331 (two detection frames in this example), together with the category and position, in the image to be processed 310, of the target object enclosed by each detection frame. In this example, POI signboard images are the target objects. Based on the detection result, the local image region enclosed by each of the at least one detection frame 331 may be segmented from the image to be processed as a candidate POI signboard image. As shown in fig. 3, this example yields 2 candidate POI signboard images 340. Next, each candidate POI signboard image 340 is classified using a pre-trained classification model 350.
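Segmenting the local image region enclosed by each detection frame can be sketched as plain array slicing. The box format (x_min, y_min, x_max, y_max) in pixel coordinates is an assumption about the detector's output, and the image is modeled as a row-major grid of pixel values.

```python
def crop_candidates(image, boxes):
    """Cut the local region enclosed by each detection box out of the image.

    `image` is a row-major grid (list of rows) of pixel values; each box is an
    assumed (x_min, y_min, x_max, y_max) tuple in pixel coordinates, exclusive
    of the max edge, as a typical detector would emit.
    """
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        crop = [row[x_min:x_max] for row in image[y_min:y_max]]
        crops.append(crop)
    return crops

# A 4x6 toy "street view" image with two sign-like regions of distinct values.
street_view = [
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
]
candidates = crop_candidates(street_view, [(2, 0, 4, 2), (0, 2, 2, 4)])
print(candidates[0])  # [[1, 1], [1, 1]]
```

Each crop would then be passed to the classification model as one candidate POI signboard image.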
According to the embodiment of the application, when a candidate POI signboard image 340 is determined to belong to the predetermined type, it is determined to be a POI signboard image to be recognized, and text recognition may be performed on it using the recognition model 360 to obtain a text recognition result. The recognition process can be implemented using, for example, OCR (optical character recognition) technology. The POI corresponding to the POI signboard image to be recognized can then be labeled with the text recognition result in the electronic map 370.
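Once the recognition model yields a text result, labeling the POI in the electronic map amounts to attaching the recognized name to the associated coordinates. A minimal sketch, with the map modeled as a dict and the record fields ("name", "source") being illustrative assumptions rather than the application's schema:

```python
def label_poi(electronic_map, recognized_text, location):
    """Attach a recognized signboard text to its coordinates in a toy map store.

    `electronic_map` is a dict keyed by (lat, lon); `recognized_text` stands in
    for the OCR output of the recognition model. Field names are assumptions.
    """
    electronic_map[location] = {"name": recognized_text, "source": "sign_ocr"}
    return electronic_map

emap = {}
label_poi(emap, "convenience store", (39.9042, 116.4074))
print(emap[(39.9042, 116.4074)]["name"])  # convenience store
```

A real electronic map backend would also store the POI category and deduplicate against existing entries, which this sketch omits.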
It can be understood that, in the above process, target objects resembling POI signboards are detected in the image to be processed using a target detection algorithm, so the detection accuracy for real POI signboard images is high and the detection speed is fast. In some cases, however, the above process may also detect images of other objects resembling POI signboards (e.g., billboards, signs, etc.) as candidate POI signboard images. To avoid the loss of accuracy and efficiency that such detection errors would bring to the subsequent recognition process, according to the embodiment of the application, the candidate POI signboard images are further classified, those determined to be POI signboard images to be recognized according to the classification result are then recognized, and the electronic map is labeled according to the recognition result.
Fig. 4 is an exemplary flowchart of a point of interest information processing method according to another embodiment of the present application, which exemplarily describes the process of the above operations S203 to S205 of classifying a candidate POI signboard image by combining its two classes of features.
As shown in fig. 4, the method may include operations S401 to S406.
In operation S401, candidate POI sign images are acquired.
The candidate POI signboard image acquired in operation S401 may be any POI signboard image extracted from the image to be processed. The extraction process has been illustrated above and is not repeated here.
In operation S402, a first-class feature of the candidate POI sign image is extracted by a first-class feature extraction method.
Illustratively, a manually designed visual feature extraction approach may be employed to extract the first-class features of the candidate POI signboard image. For example, the first-class features may include color histogram features, color correlogram features, Histogram of Oriented Gradients (HOG) features, scale-invariant feature transform (SIFT) features, and the like, which express the color and texture information of an image. The extraction principle of the first-class features is simple, and through careful design they can characterize the candidate POI signboard image well.
For example, the first-class features of the POI signboard image may also be determined based on the geographic location information associated with the POI signboard image.
In operation S403, a second type of feature of the candidate POI signboard image is extracted by using a second type of feature extraction method.
According to an embodiment of the present application, the process of extracting the second-class features of the candidate POI signboard image may include: performing image representation learning on the candidate POI signboard image with a deep learning model to obtain the second-class features. Under a deep learning framework, image representation becomes a data-driven, learning-based expression: the model directly receives the raw image signal of the candidate POI signboard image as input and expresses the image layer by layer under the supervision of image category information and the like. When there are sufficient training samples and sufficient computing resources, feature learning can be performed conveniently with a deep learning framework. For example, a deep convolutional neural network (CNN) may be employed to extract the second-class features of the candidate POI signboard image. The CNN is an artificial neural network architecture inspired by the visual nervous system studied in bionics. A convolution kernel holds the weights of the neural network; different image sub-regions (local receptive fields) in the POI signboard image are linearly weighted under a shared-weight strategy, and nonlinear function mappings are realized through multi-level processing such as nonlinear activation functions, pooling, and sampling operations. The content of the candidate POI signboard image can thereby be characterized well.
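The weight-sharing, nonlinear-activation, and pooling mechanisms described above can be illustrated with a hand-rolled toy convolution. This is only a sketch of the mechanism on a tiny grid, not the application's network; the edge-detecting kernel is an arbitrary choice.

```python
def conv2d_valid(image, kernel):
    # One shared kernel is slid over every local receptive field ("valid"
    # padding), followed by a ReLU non-linearity: max(0, weighted sum).
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def max_pool2(fmap):
    # 2x2 max pooling: keep only the strongest response in each region.
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

toy_image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [1, 1, 0, 0],
             [1, 1, 0, 0]]
edge_kernel = [[-1, 1]]  # responds to a left-to-right intensity increase
feature_map = conv2d_valid(toy_image, edge_kernel)
pooled = max_pool2(feature_map)
print(pooled)  # [[1], [0]]
```

In a real CNN many such kernels are learned from data, and the stacked convolution/pooling layers produce the second-class feature vector.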
In operation S404, a fusion process is performed on the first class of features and the second class of features to obtain a fusion feature.
For example, the first-class features, manually designed according to business needs in operation S402, may be called "shallow features", and the second-class features, extracted based on a deep learning framework in operation S403, may be called "deep features". In operation S404, the two classes of features are fused to obtain a fusion feature that describes the candidate POI signboard image more comprehensively and accurately, so that the subsequent classification of the fusion feature is more accurate.
According to an embodiment of the present application, the first-class features and the second-class features may each have their own vector representation (embedding), and the process of fusing them to obtain the fusion feature may include: concatenating the vector representation of the first-class features and the vector representation of the second-class features to obtain a fused feature vector.
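The concatenation-style fusion amounts to joining the two vector representations end to end, so the fused dimensionality is the sum of the two. A minimal sketch with toy vectors (the example values are arbitrary):

```python
def fuse(first_class_vec, second_class_vec):
    # Concatenate the shallow (hand-designed) and deep (learned) feature
    # vectors into one fused feature vector.
    return list(first_class_vec) + list(second_class_vec)

shallow = [0.2, 0.5, 0.3]   # e.g. a 3-bin color histogram (first-class)
deep = [1.7, -0.4]          # e.g. two learned activations (second-class)
fused = fuse(shallow, deep)
print(fused)  # [0.2, 0.5, 0.3, 1.7, -0.4]
```

The fused vector is what the pre-trained classification model receives in operation S405.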
In operation S405, the fusion features are classified by using a pre-trained classification model to obtain a classification result of the fusion features.
In operation S406, the type to which the candidate POI signboard image belongs is determined according to the classification result.
For example, the classification model may be trained with labels indicating whether the input image contains real POI signboard information, so that the classification result characterizes whether the input image contains real POI signboard information.
When the classification result of the fusion feature indicates that a candidate POI signboard image contains real POI signboard information, it is determined that the candidate POI signboard image belongs to the predetermined type. Such a candidate POI signboard image is an image of a real POI signboard, and the subsequent recognition of the POI signboard content can be performed on it. When the classification result of the fusion feature indicates that a candidate POI signboard image does not contain real POI signboard information, it is determined that the candidate POI signboard image does not belong to the predetermined type. Such a candidate POI signboard image is not an image of a real POI signboard (it may be an image of a billboard, a sign, or the like), and no subsequent recognition of POI signboard content needs to be performed on it. This embodiment screens the candidate POI signboard images again through the classification process to screen out invalid images that contain no valid POI signboard information, which reduces the time and computing resources occupied by invalid images in the subsequent recognition process and improves recognition efficiency and accuracy.
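The screening step above (keep only the candidates whose classification result says they contain real POI signboard information) can be sketched as a simple partition. The classifier here is a hypothetical stand-in, represented by a stored flag instead of a trained model:

```python
def screen_candidates(candidates, classify):
    """Partition candidates into those to recognize and invalid images.

    `classify` is a hypothetical stand-in for the trained classification model:
    it returns True when a candidate contains real POI signboard information.
    """
    to_recognize, invalid = [], []
    for cand in candidates:
        (to_recognize if classify(cand) else invalid).append(cand)
    return to_recognize, invalid

candidates = [{"id": 1, "real_sign": True},
              {"id": 2, "real_sign": False},   # e.g. a billboard
              {"id": 3, "real_sign": True}]
keep, drop = screen_candidates(candidates, lambda c: c["real_sign"])
print([c["id"] for c in keep])  # [1, 3]
```

Only the `keep` list would be passed on to the (more expensive) text recognition stage, which is where the efficiency gain described above comes from.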
FIG. 5 is an exemplary illustration of a first class of features of a candidate POI sign image in accordance with one embodiment of the present application.
As shown in fig. 5, the first-class features of the candidate POI sign image 540 may include one or more of a color distribution feature 541, a texture feature 542, and a geographic location feature 543, according to embodiments of the present application. The following is an exemplary description of each of the first class features.
The color distribution feature 541 is one of the most widely used visual features in image retrieval, mainly because color tends to be strongly correlated with the objects or scenes contained in an image. In addition, compared with other visual features, color distribution features depend little on factors such as the size, orientation, and viewing angle of the image and are therefore more robust. When extracting the color distribution feature of the candidate POI signboard image 540, an appropriate color space needs to be selected to describe the feature, and a quantization method needs to be designed to express the color distribution feature in vector form.
For example, a color histogram feature of the candidate POI signboard image 540 may be taken as its color distribution feature 541. The color histogram feature describes the proportion of different colors in the candidate POI signboard image. Color histogram features can be characterized in different color spaces and coordinate systems, such as the RGB (red, green, blue) space or the HSV (hue, saturation, value) space. To compute a color histogram, the color space may be divided into several small color bins, and the number of pixels whose colors fall into each bin is counted to obtain the color histogram feature. Fig. 5 illustrates, by way of example, the distribution of the number of pixels over a predetermined intensity range in each of the three color channels red, green, and blue.
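The bin-counting just described can be sketched for a single channel: divide the intensity range into bins, count pixels per bin, and normalize so the feature expresses proportions. The choice of four bins is arbitrary for illustration.

```python
def color_histogram(channel, n_bins=4, max_value=256):
    """Normalized histogram of one color channel: share of pixels per bin."""
    counts = [0] * n_bins
    bin_width = max_value / n_bins
    pixels = [p for row in channel for p in row]
    for p in pixels:
        # Map each intensity to its bin; clamp the top edge into the last bin.
        counts[min(int(p // bin_width), n_bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

red_channel = [[10, 20, 200, 250],
               [30, 100, 130, 240]]
hist = color_histogram(red_channel)
print(hist)  # [0.375, 0.125, 0.125, 0.375]
```

Repeating this per channel (or over a joint quantization of the color space) and concatenating the results gives the color distribution feature vector.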
According to an embodiment of the present application, the HOG feature of the candidate POI signboard image 540 may be taken as its texture feature 542. The HOG feature is constructed by computing and accumulating histograms of gradient orientations over local regions of the image, and it remains relatively invariant to both geometric and photometric distortions of the image. In implementation, the image is decomposed into several image blocks, each of which contains several cell units (cells); a gradient-orientation histogram is computed for each cell, the histograms of all cells in the same block are concatenated to form the gradient-orientation histogram feature of that block, and the feature is normalized. Finally, the feature descriptors of all image blocks in the image are concatenated to obtain the HOG feature of the whole candidate POI signboard image 540. Fig. 5 illustrates, by way of example, the gradient magnitudes in the respective orientations of each cell unit in one image block.
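A simplified per-cell gradient-orientation histogram can be sketched as below (one cell, four unsigned-orientation bins, magnitude-weighted voting). Real HOG additionally groups cells into blocks and normalizes, so this is only a sketch of the core step.

```python
import math

def cell_orientation_histogram(cell, n_bins=4):
    """Histogram of gradient orientations for one cell (unsigned, 0-180 deg).

    Central differences give the gradient; each interior pixel votes its
    gradient magnitude into the bin of its gradient orientation.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = cell[i][j + 1] - cell[i][j - 1]
            gy = cell[i + 1][j] - cell[i - 1][j]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    return hist

# A cell with a pure vertical edge: all gradients point horizontally,
# so all the magnitude lands in the first orientation bin.
cell = [[0, 0, 9, 9]] * 4
hist = cell_orientation_histogram(cell)
print(hist)  # [36.0, 0.0, 0.0, 0.0]
```

Standard HOG uses 9 bins and interpolated voting; four bins are used here only to keep the example small.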
According to an embodiment of the present application, for example, the candidate POI sign image 540 is extracted from a certain image to be processed, and the POI geographic location information for that image to be processed can be acquired. For example, the geographic location information uploaded by the terminal device together with the image to be processed is obtained as the POI geographic location information for the image. A vectorized representation is then produced from the POI geographic location information to determine the geographic location feature 543 of the candidate POI sign image 540. In practice, geographic location information helps estimate how likely it is that POI signboard information exists at a given place, so using it as one of the first-class features in the subsequent classification process helps improve the accuracy of the classification model.
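One possible vectorized representation — purely illustrative, since the embodiment does not specify its encoding — maps latitude/longitude onto the unit sphere, so that nearby places receive nearby feature vectors and the longitude wrap-around at ±180° causes no discontinuity:

```python
import math

def geo_feature(lat, lon):
    """Illustrative 3-D unit-sphere encoding of a POI's geographic
    location (an assumption for this example, not the application's
    actual encoding). Input is in degrees."""
    la, lo = math.radians(lat), math.radians(lon)
    return [math.cos(la) * math.cos(lo),
            math.cos(la) * math.sin(lo),
            math.sin(la)]
```

The resulting 3-vector always has unit length and can be concatenated with the other first-class features.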
In the point of interest information processing method according to the embodiment of the application, before the fusion features of the candidate POI signboard images are input to the classification model, the acquisition of the classification model may be performed. For example, the classification model is constructed and trained.
FIG. 6 is an exemplary flow diagram for constructing and training a classification model according to one embodiment of the present application.
As shown in fig. 6, the process of constructing and training the classification model may include operations S601 to S606. It should be noted that the process of constructing and training the classification model may be implemented in the same electronic device as the above operations S201 to S205, or may be implemented in a different electronic device, which is not limited herein.
In operation S601, an initial classification model is constructed.
In operation S602, a plurality of sample images are acquired.
The multiple sample images may be obtained in the same manner as the image to be processed, which is not repeated here. They may also be taken from a large number of candidate POI sign images that have been processed historically.
Next, the following operations S603 to S606 are performed for each of the plurality of sample images.
In operation S603, first and second type sample features of the sample image are extracted.
The process of extracting the first type sample feature and the second type sample feature of each sample image is similar to the process of extracting the first type feature and the second type feature of each candidate POI signboard image, and is not described herein again.
In operation S604, a fusion process is performed on the first type sample feature and the second type sample feature to obtain a sample fusion feature of the sample image.
The process of performing the fusion processing on the first type of sample features and the second type of sample features is similar to the process of performing the fusion processing on the first type of features and the second type of features in the foregoing, and is not described herein again.
In operation S605, a label is added to the sample image, the label being used to characterize whether the sample image contains real POI signboard information.
In operation S606, the initial classification model is trained using the sample fusion features and labels of the plurality of sample images, so as to obtain a pre-trained classification model.
It can be understood that, during training, because the first-type and second-type sample features of a sample image represent features of different levels of that image, the fused feature obtained by combining them describes the content of the sample image more comprehensively and accurately. Training the initial classification model with these fused features and their corresponding labels therefore enables the model to quickly learn to distinguish whether an image contains real POI signboard information.
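Operations S603 to S606 can be sketched end to end with a toy stand-in: feature fusion by vector concatenation and a minimal logistic-regression classifier trained on labeled sample fusion features. Everything here (the synthetic data, the choice of logistic regression, the function names) is an assumption for illustration, not the classification model of the embodiment:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(first_type, second_type):
    # S604: feature fusion by concatenating the two vector representations.
    return np.concatenate([first_type, second_type])

def train_classifier(features, labels, lr=0.1, epochs=200):
    # S606: train a logistic-regression classifier on the sample fusion
    # features; a stand-in for whatever model the embodiment actually uses.
    X = np.asarray(features, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability
        grad = p - y                            # gradient of log-loss w.r.t. logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, x):
    # Classification result: True = "contains real POI signboard information".
    return 1.0 / (1.0 + np.exp(-(x @ w + b))) > 0.5

# S602/S605: synthetic "sample images" reduced to 4-D feature vectors plus labels.
pos = rng.normal(+1.0, 0.3, size=(50, 4))   # label 1: real signboard
neg = rng.normal(-1.0, 0.3, size=(50, 4))   # label 0: not a signboard
samples = np.vstack([pos, neg])
fused = [fuse(s[:2], s[2:]) for s in samples]   # S603 + S604
labels = [1] * 50 + [0] * 50                    # S605
w, b = train_classifier(fused, labels)          # S606
```

On this well-separated toy data, the trained weights place samples near the positive cluster above the 0.5 decision threshold and negatives below it.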
Fig. 7 is a block diagram of a point of interest information processing apparatus according to an embodiment of the present application.
As shown in fig. 7, the point of interest information processing apparatus 700 may include: an acquisition module 710, an image extraction module 720, a feature extraction module 730, a classification module 740, and a determination module 750.
The obtaining module 710 is used for obtaining an image to be processed.
The image extraction module 720 is configured to extract at least one candidate POI signboard image from the image to be processed.
The feature extraction module 730 is configured to extract, for each candidate POI sign image of the at least one candidate POI sign image, a first class feature and a second class feature of the candidate POI sign image.
The classification module 740 is configured to classify the candidate POI sign image based on the first class features and the second class features of the candidate POI sign image to determine the type to which the candidate POI sign image belongs.
The determining module 750 is configured to determine the candidate POI sign image as the POI sign image to be recognized when it is determined that the candidate POI sign image belongs to the predetermined category.
According to an embodiment of the application, the first type of feature includes at least one of: color distribution features, texture features, and geo-location features.
For example, the feature extraction module 730 may include a first extraction sub-module for extracting a color histogram feature of the candidate POI sign image as a color distribution feature of the candidate POI sign image.
For example, the feature extraction module 730 may include a second extraction sub-module for extracting a histogram of directional gradients feature of the candidate POI sign image as a texture feature of the candidate POI sign image.
Illustratively, the feature extraction module 730 may include a third extraction sub-module for acquiring POI geographical location information for the image to be processed; and determining geographic location features of the candidate POI sign images based on the POI geographic location information.
According to an embodiment of the present application, the classification module 740 may include: the fusion submodule is used for carrying out fusion processing on the first type of features and the second type of features to obtain fusion features; and the classification processing sub-module is used for classifying the fusion features by using a pre-trained classification model to obtain a classification result of the fusion features, and determining the type of the candidate POI signboard image according to the classification result.
For example, the fusion sub-module is configured to concatenate the vector representation of the first type of features with the vector representation of the second type of features to obtain a fused feature vector.
For example, the classification processing sub-module is configured to determine that the candidate POI sign image belongs to the predetermined category when the classification result of the fused feature indicates that the candidate POI sign image contains real POI sign information.
According to an embodiment of the present application, the apparatus 700 may further include: a construction module for constructing an initial classification model before the classification module classifies the fused features using the pre-trained classification model; a sample acquisition and processing module for acquiring a plurality of sample images and, for each sample image, extracting its first-type and second-type sample features, fusing them to obtain the sample fusion feature of the sample image, and adding a label to the sample image, the label characterizing whether the sample image contains real POI signboard information; and a training module for training the initial classification model with the sample fusion features and labels of the plurality of sample images to obtain the pre-trained classification model.
According to an embodiment of the present application, the feature extraction module 730 may include a fourth extraction sub-module, configured to perform image characterization learning on the candidate POI signboard image by using a deep learning model to obtain the second class of features.
According to an embodiment of the present application, the image extraction module 720 is specifically configured to detect the image to be processed by using a target detection algorithm, so as to generate at least one detection frame in the image to be processed; and segmenting an image area which is respectively surrounded by the at least one detection frame from the image to be processed to serve as at least one candidate POI signboard image.
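The detect-then-segment step performed by this module can be illustrated with simple array slicing. The (x1, y1, x2, y2) box format and the function name are assumptions for the example; a real pipeline would obtain the detection boxes from an object detection algorithm:

```python
import numpy as np

def crop_candidates(image, boxes):
    """Segment the image regions enclosed by detection boxes as candidate
    POI sign images. `boxes` holds (x1, y1, x2, y2) pixel coordinates;
    clipping guards against boxes that extend past the image border."""
    h, w = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, x2 = max(0, int(x1)), min(w, int(x2))
        y1, y2 = max(0, int(y1)), min(h, int(y2))
        if x2 > x1 and y2 > y1:  # skip boxes that fall entirely outside
            crops.append(image[y1:y2, x1:x2])
    return crops
```

Each returned crop is then fed to the feature extraction module as one candidate POI signboard image.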
According to an embodiment of the present application, the apparatus 700 may further include: the identification module is used for performing character identification on the POI signboard image to be identified after the candidate POI signboard image is determined to be the POI signboard image to be identified so as to obtain a character identification result; and the marking module is used for marking the POI of the to-be-identified POI signboard image by using the character identification result in the map.
It should be noted that the implementation, the technical problems solved, the functions achieved, and the technical effects obtained by each module/unit/sub-unit in the apparatus embodiment are the same as or similar to those of the corresponding steps in the method embodiment, and are not repeated here.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present application may be implemented in one module. Any one or more of the modules, sub-modules, units and sub-units according to the embodiments of the present application may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to the embodiments of the present application may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the application may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 8, the electronic apparatus includes: one or more processors 801, memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 illustrates an example of a processor 801.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for processing the point of interest information provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the point-of-interest information processing method provided by the present application.
The memory 802, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the point of interest information processing method in the embodiments of the present application (for example, the obtaining module 710, the image extracting module 720, the feature extracting module 730, the classifying module 740, and the determining module 750 shown in fig. 7). The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the point of interest information processing method in the above-described method embodiment.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the point of interest information processing method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means; fig. 8 takes connection by a bus as an example.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the point-of-interest information processing method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the application, after candidate POI signboard images are extracted from the image to be processed, the POI information processing method further classifies each candidate POI signboard image so as to screen out the expected candidates (i.e., those belonging to the predetermined type) as the POI signboard images to be recognized. This avoids invalid later recognition caused by early detection errors, reduces the time and computing resources that invalid images would otherwise consume in the subsequent recognition process, and makes that process more targeted. In addition, the classification comprehensively considers the first-class and second-class features of the candidate POI signboard images; since the two classes of features describe the image from different levels, the content of the candidate POI signboard images can be described more accurately and comprehensively, which helps improve the accuracy and recall of the classification processing.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; the present application is not limited thereto as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (26)

1. An interest point information processing method includes:
acquiring an image to be processed;
extracting at least one candidate POI signboard image from the image to be processed; and
for each of the at least one candidate POI sign image,
extracting a first class of features and a second class of features of the candidate POI signboard images;
classifying the candidate POI sign image based on the first class of features and the second class of features to determine a type to which the candidate POI sign image belongs; and
when the candidate POI signboard image is determined to belong to the preset type, determining the candidate POI signboard image as a POI signboard image to be identified.
2. The method of claim 1, wherein the first type of feature comprises at least one of:
a color distribution feature, a texture feature, or a geographic location feature.
3. The method of claim 2, wherein said extracting first class features of the candidate POI sign image comprises:
and extracting the color histogram feature of the candidate POI signboard image as the color distribution feature of the candidate POI signboard image.
4. The method of claim 2, wherein said extracting first class features of the candidate POI sign image comprises:
and extracting the directional gradient histogram characteristics of the candidate POI signboard image as the texture characteristics of the candidate POI signboard image.
5. The method of claim 2, wherein said extracting first class features of the candidate POI sign image comprises:
acquiring POI geographical position information aiming at the image to be processed; and
and determining the geographic position characteristics of the candidate POI signboard images based on the POI geographic position information.
6. The method of claim 1, wherein the classifying the candidate POI sign image based on the first class of features and the second class of features comprises:
fusing the first type of features and the second type of features to obtain fused features;
classifying the fusion features by using a pre-trained classification model to obtain classification results of the fusion features; and
and determining the type of the candidate POI signboard image according to the classification result.
7. The method of claim 6, further comprising:
prior to said classifying said fused features using a pre-trained classification model,
constructing an initial classification model;
acquiring a plurality of sample images;
extracting a first type sample feature and a second type sample feature of the sample image aiming at each sample image in the plurality of sample images, carrying out fusion processing on the first type sample feature and the second type sample feature to obtain a sample fusion feature of the sample image, and adding a label for the sample image, wherein the label is used for representing whether the sample image contains real POI signboard information or not; and
and training the initial classification model by utilizing the respective sample fusion characteristics and labels of the plurality of sample images to obtain the pre-trained classification model.
8. The method of claim 7, wherein said determining a type to which the candidate POI sign image belongs from the classification result comprises:
and when the classification result of the fusion feature represents that the candidate POI signboard image contains real POI signboard information, determining that the candidate POI signboard image belongs to a preset type.
9. The method of claim 6, wherein the fusing the first class of features and the second class of features to obtain fused features comprises:
and connecting and combining the vector representation of the first type of features and the vector representation of the second type of features to obtain a fusion feature vector.
10. The method of claim 1, wherein said extracting a second class of features of the candidate POI sign image comprises:
and performing image characterization learning on the candidate POI signboard image by using a deep learning model to obtain the second class of characteristics.
11. The method of claim 1, wherein extracting at least one candidate POI sign image from the image to be processed comprises:
detecting the image to be processed by using a target detection algorithm so as to generate at least one detection frame in the image to be processed; and
and segmenting an image area which is respectively surrounded by the at least one detection frame from the image to be processed to serve as the at least one candidate POI signboard image.
12. The method according to any one of claims 1-11, further comprising:
after the determination that the candidate POI sign image is a POI sign image to be recognized,
performing character recognition on the POI signboard image to be recognized to obtain a character recognition result; and
and marking the POI of the to-be-identified POI signboard image by using the character identification result in a map.
13. A point-of-interest information processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the image extraction module is used for extracting at least one candidate POI signboard image from the image to be processed;
a feature extraction module to extract, for each of the at least one candidate POI sign image, a first class of features and a second class of features of the candidate POI sign image;
a classification module to classify the candidate POI sign image based on the first class of features and the second class of features to determine the type to which the candidate POI sign image belongs; and
the determination module is used for determining the candidate POI signboard image as the POI signboard image to be identified when the candidate POI signboard image is determined to belong to the preset type.
14. The apparatus of claim 13, wherein the first type of feature comprises at least one of:
a color distribution feature, a texture feature, or a geographic location feature.
15. The apparatus of claim 14, wherein the feature extraction module comprises:
and the first extraction sub-module is used for extracting the color histogram characteristics of the candidate POI signboard image as the color distribution characteristics of the candidate POI signboard image.
16. The apparatus of claim 14, wherein the feature extraction module comprises:
and the second extraction sub-module is used for extracting the directional gradient histogram characteristics of the candidate POI signboard image as the texture characteristics of the candidate POI signboard image.
17. The apparatus of claim 14, wherein the feature extraction module comprises:
the third extraction submodule is used for acquiring POI geographic position information aiming at the image to be processed; and determining geographic location features of the candidate POI sign images based on the POI geographic location information.
18. The apparatus of claim 13, wherein the classification module comprises:
the fusion submodule is used for carrying out fusion processing on the first type of characteristics and the second type of characteristics to obtain fusion characteristics; and
and the classification processing sub-module is used for classifying the fusion features by utilizing a pre-trained classification model to obtain a classification result of the fusion features, and determining the types of the candidate POI signboard images according to the classification result.
19. The apparatus of claim 18, further comprising:
the building module is used for building an initial classification model before the fusion features are classified by using the pre-trained classification model;
the system comprises a sample acquisition and processing module, a data processing module and a data processing module, wherein the sample acquisition and processing module is used for acquiring a plurality of sample images, extracting a first type sample feature and a second type sample feature of each sample image in the plurality of sample images, fusing the first type sample feature and the second type sample feature to obtain a sample fusion feature of the sample images, and adding a label to the sample images, wherein the label is used for representing whether the sample images contain real POI signboard information; and
and the training module is used for training the initial classification model by utilizing the respective sample fusion characteristics and labels of the plurality of sample images to obtain the pre-trained classification model.
20. The apparatus of claim 19, wherein,
and the classification processing sub-module is used for determining that the candidate POI signboard image belongs to a preset type when the classification result of the fusion feature indicates that the candidate POI signboard image contains real POI signboard information.
21. The apparatus of claim 18, wherein,
and the fusion submodule is used for connecting and merging the vector representation of the first type of characteristics and the vector representation of the second type of characteristics to obtain a fusion characteristic vector.
22. The apparatus of claim 13, wherein the feature extraction module comprises:
and the fourth extraction submodule is used for performing image characterization learning on the candidate POI signboard image by using a deep learning model so as to obtain the second class of characteristics.
23. The apparatus of claim 13, wherein,
the image extraction module is used for detecting the image to be processed by utilizing a target detection algorithm so as to generate at least one detection frame in the image to be processed; and segmenting an image area which is respectively surrounded by the at least one detection frame from the image to be processed to serve as the at least one candidate POI signboard image.
24. The apparatus of any of claims 13-23, further comprising:
an identification module, configured to perform character recognition on the to-be-identified POI signboard image, after the candidate POI signboard image is determined to be a to-be-identified POI signboard image, so as to obtain a character recognition result; and
a marking module, configured to mark, in a map, the POI corresponding to the to-be-identified POI signboard image by using the character recognition result.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-12.
CN202010700108.7A 2020-07-20 2020-07-20 Interest point information processing method and device, electronic equipment and storage medium Pending CN111832578A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700108.7A CN111832578A (en) 2020-07-20 2020-07-20 Interest point information processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111832578A true CN111832578A (en) 2020-10-27

Family

ID=72924151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010700108.7A Pending CN111832578A (en) 2020-07-20 2020-07-20 Interest point information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111832578A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160267358A1 (en) * 2015-03-11 2016-09-15 Microsoft Technology Licensing, Llc Methods and systems for low-energy image classification
KR101804471B1 (en) * 2016-06-30 2017-12-04 주식회사 이누씨 Method And Apparatus for Analyzing Video
CN110084235A (en) * 2019-04-09 2019-08-02 百度在线网络技术(北京)有限公司 Information collecting method, device and acquisition equipment based on point of interest
CN110321885A (en) * 2018-03-30 2019-10-11 高德软件有限公司 A kind of acquisition methods and device of point of interest


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200190A (en) * 2020-10-30 2021-01-08 北京百度网讯科技有限公司 Method and device for determining position of point of interest, electronic equipment and storage medium
CN112200190B (en) * 2020-10-30 2024-04-19 北京百度网讯科技有限公司 Method and device for determining position of interest point, electronic equipment and storage medium
CN112541496A (en) * 2020-12-24 2021-03-23 北京百度网讯科技有限公司 Method, device and equipment for extracting POI name and computer storage medium
CN112633380A (en) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Interest point feature extraction method and device, electronic equipment and storage medium
CN112541496B (en) * 2020-12-24 2023-08-22 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for extracting POI (point of interest) names
CN112801078A (en) * 2020-12-25 2021-05-14 北京百度网讯科技有限公司 Point of interest (POI) matching method and device, electronic equipment and storage medium
CN112860993A (en) * 2021-02-04 2021-05-28 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for classifying points of interest
CN112860993B (en) * 2021-02-04 2023-08-04 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for classifying points of interest
CN113705442A (en) * 2021-10-09 2021-11-26 广东博媒广告传播有限公司 Outdoor large-board advertising picture monitoring and identifying system and method

Similar Documents

Publication Publication Date Title
CN111832578A (en) Interest point information processing method and device, electronic equipment and storage medium
US12020473B2 (en) Pedestrian re-identification method, device, electronic device and computer-readable storage medium
Sharma et al. YOLOrs: Object detection in multimodal remote sensing imagery
CN107067003B (en) Region-of-interest boundary extraction method, device, equipment and computer storage medium
US9317732B2 (en) Method and apparatus of determining air quality
CN111340131B (en) Image labeling method and device, readable medium and electronic equipment
Chen et al. Vehicle detection in high-resolution aerial images based on fast sparse representation classification and multiorder feature
CN110008956B (en) Invoice key information positioning method, invoice key information positioning device, computer equipment and storage medium
Šegvić et al. A computer vision assisted geoinformation inventory for traffic infrastructure
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN111767878B (en) Deep learning-based traffic sign detection method and system in embedded device
CN103995889A (en) Method and device for classifying pictures
CN110348463B (en) Method and device for identifying vehicle
CN113723377B (en) Traffic sign detection method based on LD-SSD network
CN111931859B (en) Multi-label image recognition method and device
CN111832658B (en) Point-of-interest information processing method and device, electronic equipment and storage medium
US20220357176A1 (en) Methods and data processing systems for predicting road attributes
WO2020259416A1 (en) Image collection control method and apparatus, electronic device, and storage medium
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN110609879A (en) Interest point duplicate determination method and device, computer equipment and storage medium
CN113361303A (en) Temporary traffic sign board identification method, device and equipment
CN109871903B (en) Target detection method based on end-to-end deep network and counterstudy
CN115482436B (en) Training method and device for image screening model and image screening method
Osuna-Coutiño et al. Structure extraction in urbanized aerial images from a single view using a CNN-based approach
CN113345101B (en) Three-dimensional point cloud labeling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination