WO2023065731A1 - Method for training target map model, positioning method, and related apparatuses - Google Patents

Method for training target map model, positioning method, and related apparatuses

Info

Publication number
WO2023065731A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
expression
image
map
character
Prior art date
Application number
PCT/CN2022/104939
Other languages
French (fr)
Chinese (zh)
Inventor
黄际洲
王海峰
卓安
孙一博
Original Assignee
北京百度网讯科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.)
Publication of WO2023065731A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 - Geographical information databases

Definitions

  • The present disclosure relates to the field of data processing technology, specifically to artificial intelligence technologies such as deep learning, natural language understanding, and intelligent search, and in particular to a method for training a target map model and a positioning method, as well as corresponding apparatuses, electronic devices, computer-readable storage media, and computer program products.
  • Pre-trained models have driven great progress in natural language processing and in products across multiple industries.
  • By learning from large-scale data, a pre-trained model can better model representations of characters, words, sentences, and so on.
  • Based on a pre-trained model, fine-tuning with labeled samples of a specific task can usually achieve very good results.
  • The map field is special: information processing in this field often needs to be associated with the real world. For example, in a map retrieval engine, when a user inputs a query word, the location of each candidate and its distance from the user's current location are very important ranking features.
  • At present, text data in the map field is mainly structured data, and the information it contains is relatively condensed and limited, usually only names, aliases, addresses, and categories.
  • Meanwhile, the map-field information most strongly correlated with the real world often cannot be expressed intuitively through text.
  • Embodiments of the present disclosure provide a method for training a target map model, a positioning method, and corresponding apparatuses, electronic devices, computer-readable storage media, and computer program products.
  • In a first aspect, an embodiment of the present disclosure proposes a method for training a target map model, including: obtaining a text expression, a coordinate vector expression, and a signboard image expression of each map location point; training a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression corresponding to the same map location point; training a second sub-model according to a second training sample composed of the text expression and the signboard image expression corresponding to the same map location point; and fusing the first sub-model and the second sub-model to obtain the target map model.
  • In a second aspect, an embodiment of the present disclosure proposes an apparatus for training a target map model, including: a parameter acquisition unit configured to acquire the text expression, coordinate vector expression, and signboard image expression of each map location point; a first sub-model training unit configured to train a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression corresponding to the same map location point; a second sub-model training unit configured to train a second sub-model according to a second training sample composed of the text expression and the signboard image expression corresponding to the same map location point; and a sub-model fusion unit configured to fuse the first sub-model and the second sub-model to obtain the target map model.
  • In a third aspect, an embodiment of the present disclosure proposes a positioning method, including: acquiring a positioning image obtained by photographing the signboard of a target building, and the current position; determining an actual coordinate vector expression according to the current position, and determining an actual signboard image expression according to the positioning image; calling the target map model to determine a shooting-position text expression corresponding to the actual coordinate vector expression; calling the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression; adjusting the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and that candidate text expression; and locating the actual position of the target building based on the candidate text expression sequence after the presentation-priority adjustment; wherein the target map model is obtained according to the training method described in any implementation of the first aspect.
  • In a fourth aspect, an embodiment of the present disclosure proposes a positioning apparatus, including: a positioning image and current position acquisition unit configured to acquire the positioning image obtained by photographing the signboard of the target building, and the current position; an actual coordinate vector expression and actual signboard image expression determination unit configured to determine the actual coordinate vector expression according to the current position and the actual signboard image expression according to the positioning image; a shooting-position text expression determination unit configured to call the target map model to determine the shooting-position text expression corresponding to the actual coordinate vector expression; a candidate text expression sequence determination unit configured to call the target map model to determine the candidate text expression sequence corresponding to the actual signboard image expression; a presentation priority adjustment unit configured to adjust the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and that candidate text expression; and an actual position determination unit configured to locate the actual position of the target building based on the candidate text expression sequence after the presentation-priority adjustment; wherein the target map model is obtained according to the apparatus for training the target map model described in the second aspect.
  • An embodiment of the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to implement the method for training a target map model described in any implementation of the first aspect or the positioning method described in any implementation of the third aspect.
  • An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to implement the method for training a target map model described in any implementation of the first aspect or the positioning method described in any implementation of the third aspect.
  • An embodiment of the present disclosure further provides a computer program product including a computer program; when the computer program is executed by a processor, it implements the method for training a target map model described in any implementation of the first aspect or the positioning method described in any implementation of the third aspect.
  • The training method and positioning method for the target map model provided by the embodiments of the present disclosure are based not only on the text expression of map location points during training, but additionally introduce the coordinate vector expression and the signboard image expression. Pre-training across these multiple dimensions makes full use of the spatio-temporal big data of the map field, so that the information contained in the pre-trained model is more closely related to the real world; in practical applications, the model can then better combine the user's current position with the captured positioning image to obtain more accurate positioning results.
  • FIG. 1 is an exemplary system architecture to which the present disclosure may be applied;
  • FIG. 2 is a flowchart of a method for training a target map model provided by an embodiment of the present disclosure
  • FIG. 3 is a flow chart of a method for obtaining a coordinate vector representation of a map location point provided by an embodiment of the present disclosure
  • FIG. 4 is a flow chart of a method for acquiring a signboard image representation of a map location point provided by an embodiment of the present disclosure
  • FIG. 5 is a flowchart of a positioning method provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a fusion of a picture sequence and a text sequence provided by an embodiment of the present disclosure
  • FIG. 7 is a structural block diagram of a training device for a target map model provided by an embodiment of the present disclosure.
  • FIG. 8 is a structural block diagram of a positioning device provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an electronic device suitable for performing a target map model training method and/or a positioning method provided by an embodiment of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for training a target map model, the positioning method, and the corresponding apparatuses, electronic devices, and computer-readable storage media of the present disclosure can be applied.
  • a system architecture 100 may include terminal devices 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is used as a medium for providing communication links between the terminal devices 101 , 102 , 103 and the server 105 .
  • Network 104 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various applications for information communication with the server 105 can be installed on the terminal devices 101, 102, 103, such as map retrieval model training applications, map retrieval applications, positioning applications, and the like.
  • the terminal devices 101, 102, 103 and the server 105 may be hardware or software.
  • When the terminal devices 101, 102, 103 are hardware, they can be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like; when the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above and can be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited here.
  • When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server; when the server 105 is software, it can be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited here.
  • The server 105 can provide various services through various built-in applications. Taking a positioning application that provides positioning services for users as an example, the server 105 can achieve the following effects when running the positioning application: first, receive, through the network 104, the positioning images and the current positions sent by the terminal devices 101, 102, 103; then, determine the actual coordinate vector expression according to the current position, and determine the actual signboard image expression according to the positioning image; then, call the target map model to determine the shooting-position text expression corresponding to the actual coordinate vector expression; next, call the target map model to determine the candidate text expression sequence corresponding to the actual signboard image expression; next, adjust the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and that candidate text expression; finally, return the candidate text expression sequence after the presentation-priority adjustment to the terminal devices 101, 102, 103 through the network 104, so that the user can locate the actual position of the target building according to the presented results.
  • The target map model can be trained by the map retrieval model training application built into the server 105 according to the following steps: obtain the text expression, coordinate vector expression, and signboard image expression of each map location point; train a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression corresponding to the same map location point; train a second sub-model according to a second training sample composed of the text expression and the signboard image expression corresponding to the same map location point; and fuse the first sub-model and the second sub-model to obtain the target map model.
  • Since model training requires strong computing power and considerable computing resources, the training method for the target map model provided by the subsequent embodiments of the present disclosure is generally executed by the server 105, and correspondingly, the apparatus for training the target map model is generally also arranged in the server 105.
  • The terminal devices 101, 102, 103 can, however, also complete, through the target map model training applications installed on them, the above computations otherwise performed by the server 105, and then output the same results as the server 105. Correspondingly, the apparatus for training the target map model can also be arranged in the terminal devices 101, 102, 103. In this case, the exemplary system architecture 100 may also exclude the server 105 and the network 104.
  • the server used to train the target map model may be different from the server used to call the trained target map model.
  • In particular, the target map model trained by the server 105 can also be distilled into a lightweight target map model suitable for placement in the terminal devices 101, 102, 103; that is, depending on the recognition accuracy actually required, one can flexibly choose between using the lightweight target map model in the terminal devices 101, 102, 103 and using the more complex target map model in the server 105.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are only illustrative; there can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 is a flow chart of a method for training a target map model provided by an embodiment of the present disclosure, wherein the process 200 includes the following steps:
  • Step 201 Obtain the text expression, coordinate vector expression, and signboard image expression of each map location point;
  • This step aims to obtain the text expression, coordinate vector expression, and signboard image expression of each map location point by the execution subject of the training method of the target map model (for example, the server 105 shown in FIG. 1 ).
  • The text expression is the map location point described in text form, for example "XX Building" (or square, hospital, restaurant, and so on); the coordinate vector expression is the vectorized expression of the geographic coordinates of the map location point (for example, the building at a given street number) in the real world.
  • The signboard image expression is the image description of the signboard of the object (usually a building) at the map location point; that is, the signboard style customized for the building is used to reflect the building's location, embodying the strong relationship between the two.
  • The coordinate vector expression can be obtained by converting coordinates in various forms into a vector using various vectorized encoding methods: for example, the boundary coordinate sequence of the map location point corresponding to a building can be converted into a vector expression, and on this basis the boundary coordinate sequence can be processed further, such as by introducing a geocoding algorithm to convert the boundary coordinate sequence into another form of representation, or by simply using the four corner coordinates of the rectangle framing the building as the boundary coordinates and obtaining a vectorized expression of those boundary coordinates through, for example, a hash algorithm. This is not specifically limited here, and an appropriate processing method can be selected according to the actual application scenario.
  • The signboard image expression is an expression that captures the image characteristics of the signboard based on the image taken of it, involving, for example, the shooting resolution, character clarity, character image extraction, and related processing methods.
  • Step 202 According to the first training sample composed of text expression and coordinate vector expression corresponding to the same map location point, train to obtain the first sub-model;
  • On the basis of step 201, this step aims to construct, by the above-mentioned execution subject, a first training sample that takes the text expression corresponding to a map location point as the sample input and the coordinate vector expression corresponding to the same map location point as the sample output, and to train the first sub-model with such samples, so that the trained first sub-model establishes the correspondence between the text expression and the coordinate vector expression of the same map location point. This makes it convenient to subsequently match the corresponding output information for given input information according to this correspondence; for example, the coordinate vector of the current position can be input to match the text description of the current location.
  • Step 203 According to the second training sample composed of text expression and signboard image expression corresponding to the same map location point, train to obtain the second sub-model;
  • Similarly, this step aims to construct a second training sample that takes the text expression corresponding to a map location point as the sample input and the signboard image expression corresponding to the same map location point as the sample output, and to train the second sub-model with such samples, so that the trained second sub-model establishes the correspondence between the text expression and the signboard image expression of the same map location point. This makes it convenient to subsequently match the corresponding output information for given input information; for example, an image taken of a building's signboard can be input to match the text description of the corresponding building.
  • Step 204 Fusion of the first sub-model and the second sub-model to obtain the target map model.
  • On the basis of steps 202 and 203, this step aims to fuse, by the above-mentioned execution subject, the first sub-model and the second sub-model, so that the text expressions in the two sub-models serve as connection points, thereby establishing the correspondence among the text expression, the coordinate vector expression, and the signboard image expression of the same map location point (similar to a correspondence among A, B, and C). The fused target map model can thus accurately determine the remaining item from any one or two of the three.
  • It should be noted that although the first sub-model and the second sub-model are trained independently, fusion does not mean that the fused initial map model no longer needs training; a few more rounds of training are often needed so that the overall parameters of the fused map model become optimal as a whole. For example, first fuse the first sub-model and the second sub-model to obtain an initial map model; then adjust the parameters of the initial map model until a preset iteration exit condition is met, and output the initial map model that meets the iteration exit condition as the target map model.
  • In most cases, the iteration exit condition set for the fused map model differs from the iteration exit conditions set for the first sub-model and the second sub-model, unless in some scenarios the exit condition is a generic one, such as whether the accuracy difference between adjacent iterations meets the requirement.
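  • A minimal sketch of the fuse-then-fine-tune flow described above, assuming PyTorch; the fused model, data loader, and exit threshold are placeholders rather than values from the disclosure, and the exit condition shown is the "difference between adjacent iterations" style condition mentioned above.

```python
import torch

def finetune_fused_model(fused_model, loader, max_rounds=10, tol=1e-3):
    """Adjust the fused (initial) map model until the preset iteration exit condition is met."""
    optimizer = torch.optim.AdamW(fused_model.parameters(), lr=1e-5)
    prev_loss = float("inf")
    for _ in range(max_rounds):
        total_loss = 0.0
        for batch in loader:
            optimizer.zero_grad()
            loss = fused_model(**batch)   # assume the model returns its training loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if abs(prev_loss - total_loss) < tol:   # exit condition met
            break
        prev_loss = total_loss
    return fused_model   # output as the target map model
```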
  • The first sub-model, the second sub-model, and the fused map model in this embodiment can be implemented with various model frameworks, for example based on the BERT (Bidirectional Encoder Representations from Transformers) model widely used in natural language processing; other models with similar effects can also be used and are not listed here one by one.
  • The training method for the target map model provided by the embodiments of the present disclosure is based not only on the text expression of map location points, but additionally introduces the coordinate vector expression and the signboard image expression during training, so that pre-training across multiple dimensions makes full use of the spatio-temporal big data of the map field; the information contained in the pre-trained model is thus more closely related to the real world, and in practical applications the model can better combine the user's current position with the captured positioning image to obtain more accurate positioning results.
  • FIG. 3 is a flowchart of a method for obtaining the coordinate vector expression of a map location point provided by an embodiment of the present disclosure; it provides a specific implementation for the coordinate vector expression in step 201 of the process 200 shown in FIG. 2. The other steps in the process 200 are not adjusted, and a new complete embodiment is obtained by replacing the corresponding step with the specific implementation provided in this embodiment.
  • the process 300 includes the following steps:
  • Step 301 Acquire the boundary coordinate sequences of each map location point respectively;
  • For a map location point, the coordinate sequence of the outer contours of all buildings belonging to that point constitutes the boundary coordinate sequence; for example, when a location point covers several buildings, the geographic coordinate sequence of the outer contours of the outermost buildings is taken as the boundary coordinate sequence. The frequency or interval at which points are sampled from the continuous outer contour can be set as needed.
  • Step 302 Using the geocoding algorithm and the boundary coordinate sequence, calculate the geocoding set covering the geographic block where the corresponding map location point is located;
  • On the basis of step 301, this step aims for the above-mentioned execution subject to use the geocoding algorithm and the boundary coordinate sequence to calculate the geocode set covering the geographic block where the corresponding map location point is located; that is, each geocode in the geocode set corresponds to a boundary coordinate in the boundary coordinate sequence.
  • the geocoding algorithm can specifically choose the Geohash algorithm, Google s2 algorithm, etc.
  • the Geohash algorithm is an address encoding method that encodes two-dimensional spatial longitude and latitude data into a string
  • The Google S2 algorithm comes from geometric mathematics: S² is the mathematical symbol for the unit sphere, and the S2 algorithm is designed to solve various geometric problems on the sphere; since the real world is essentially spherical, the algorithm can also be used for address encoding.
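  • As a concrete illustration of step 302, the sketch below encodes each boundary coordinate with a standard Geohash encoder, so that the resulting geocode set covers the geographic block of the map location point; the function names and the precision value are illustrative choices, not taken from the disclosure.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet

def geohash_encode(lat, lon, precision=7):
    """Encode a (latitude, longitude) pair into a Geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, chars, even = [], [], True  # Geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        even = not even
        if len(bits) == 5:
            chars.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(chars)

def boundary_to_geocode_set(boundary_coords, precision=7):
    # one geocode per boundary coordinate; together they cover the geographic block
    return {geohash_encode(lat, lon, precision) for lat, lon in boundary_coords}
```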
  • Step 303 converting the geocode set containing each geocode into a geographic string
  • On the basis of step 302, this step aims for the above-mentioned execution subject to convert the geocode set containing each geocode into a geographic character string; for example, each geocode in the geocode set can be organized into a tree structure according to its hierarchy and traversed in a fixed order, converting the set into a geographic string that represents the geographic area within which this map location point falls.
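  • One simple fixed-order traversal consistent with step 303 is sketched below, assuming the geocode set from the previous sketch; the separator and the lexicographic ordering (which groups geocodes sharing a parent cell, approximating a pre-order walk of the hierarchy tree) are illustrative choices.

```python
def geocode_set_to_string(geocodes, separator="|"):
    """Convert a geocode set into a geographic string via a fixed traversal order."""
    # sorting places geocodes with common prefixes (the same parent cell) next to each other
    return separator.join(sorted(geocodes))
```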
  • Step 304: convert the geographic character string into a geographic vector, and use the geographic vector as the coordinate vector expression of the corresponding map location point.
  • On the basis of step 303, this step aims for the execution subject to convert the geographic character string into a geographic vector and to use that geographic vector as the coordinate vector expression of the corresponding map location point.
  • The conversion rule between geographic strings and geographic vectors can be defined as needed, or a model that outputs results in vector form can be used: for example, the geographic string is input into a preset vector-expression conversion model, where the vector-expression conversion model characterizes the correspondence between geographic strings and geographic vectors (for example, a convolutional neural network or a recurrent neural network), and the geographic vector output by the conversion model is then received.
  • In addition, a vectorized vocabulary of geographic blocks can be constructed at the levels of country, province, city, district/county, and road, so that each geographic entity is assigned its corresponding geographic block vector.
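  • A minimal sketch of such a geographic-block vocabulary lookup, assuming PyTorch; the vocabulary contents, token separator, pooling, and embedding dimension are placeholders rather than details from the disclosure.

```python
import torch
import torch.nn as nn

class GeoBlockEmbedding(nn.Module):
    """Maps a geographic string (a sequence of block tokens) to a geographic vector."""
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}  # country/province/city/district/road tokens
        self.embed = nn.Embedding(len(vocab), dim)

    def forward(self, geo_string: str) -> torch.Tensor:
        ids = torch.tensor([self.token_to_id[t] for t in geo_string.split("/")])  # "/" is an assumed separator
        return self.embed(ids).mean(dim=0)  # pooled vector used as the coordinate vector expression
```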
  • FIG. 4 is a flowchart of a method for obtaining the signboard image expression of a map location point provided by an embodiment of the present disclosure; it provides a specific implementation for the signboard image expression in step 201 of the process 200 shown in FIG. 2.
  • the process 400 includes the following steps:
  • Step 401 Obtain the signboard image of the building corresponding to each map location point respectively;
  • This step aims firstly at obtaining, by the above-mentioned execution subject, the signboard image of the building corresponding to each map location point. Shooting parameters should be kept as consistent as possible across different signboards, such as the shooting equipment, lighting, angle, resolution, and weather, so as to avoid differences between different signboard images.
  • Step 402 Identify the character part from the signboard image, and cut out the character image corresponding to each character of the character part;
  • Step 403 Arranging each character image according to the sequence of each character in the character part, and using the obtained character image array as a signboard image representation of the corresponding map location point.
  • Step 402 aims to recognize, by the above-mentioned execution subject, the character part from the signboard image and to cut out a character image corresponding to each character of the character part; then, through step 403, the character images are arranged according to the order of the characters in the character part, and the resulting character image sequence is used as the signboard image expression describing the features of the signboard.
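  • A simplified sketch of steps 402-403, assuming a hypothetical single-character detector that returns one bounding box per character (the detector itself is a placeholder, not the model used in the disclosure); the crops are ordered left to right to form the signboard image expression.

```python
from PIL import Image

def signboard_image_expression(signboard_path, detect_characters):
    """detect_characters(image) is assumed to return (left, top, right, bottom) boxes, one per character."""
    image = Image.open(signboard_path)
    boxes = detect_characters(image)
    boxes = sorted(boxes, key=lambda box: box[0])  # arrange in the order the characters appear
    return [image.crop(box) for box in boxes]      # character image sequence
```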
  • Optionally, before the character part is recognized from the signboard image, image abnormality recognition may be performed on the signboard image, where the image abnormality recognition may include at least one of fuzzy recognition, noise recognition, and skew recognition.
  • On the basis of the target map model obtained in the above embodiments, a positioning method is further provided; refer to the steps included in the process 500 shown in FIG. 5:
  • Step 501 Obtain the image for positioning and the current position obtained by shooting the signboard of the target building;
  • This step is aimed at obtaining the image for positioning and the current position obtained by photographing the signboard of the target building by the above-mentioned execution subject.
  • The target building is necessarily a building within the user's field of vision, that is, located in the vicinity of the user who initiated the positioning request, and the current position is the geographic coordinates returned by the internal positioning component (such as a GPS component or a base-station interaction component) of the device held by the user.
  • Step 502 Determine the actual coordinate vector expression according to the current position, and determine the actual signboard image expression according to the positioning image;
  • this step aims to determine the actual coordinate vector expression according to the current position, and determine the actual signboard image expression according to the positioning image, that is, convert the coordinates into the corresponding vector expression, and convert the image into the corresponding image expression.
  • Step 503 call the target map model to determine the text expression of the shooting location corresponding to the actual coordinate vector expression
  • On the basis of step 502, this step aims for the above-mentioned execution subject to call the correspondence between coordinate vector expressions and text expressions recorded by the target map model, and to determine the shooting-position text expression corresponding to the actual coordinate vector expression.
  • Step 504 Calling the target map model to determine an alternative text expression sequence corresponding to the actual signboard image expression
  • On the basis of step 502, the execution subject calls the correspondence between signboard image expressions and text expressions recorded by the target map model, and determines the candidate text expression sequence corresponding to the actual signboard image expression. A sequence of candidate text expressions is used because, owing to shooting conditions or factors affecting the photographer, the actual signboard image often does not contain complete image information, so in most cases there will be multiple candidate text expressions.
  • Step 505: adjust the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and each candidate text expression in the candidate text expression sequence;
  • On the basis of steps 503 and 504, this step aims to adjust the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and each candidate text expression in the candidate text expression sequence. That is, the larger the distance, the lower the presentation priority of the corresponding candidate text expression in the sequence (for example, the lower it is ranked); conversely, the smaller the distance, the higher the presentation priority of the corresponding candidate text expression (for example, the higher it is ranked).
  • Step 506 Based on the candidate text expression sequence after the presentation priority adjustment, locate the actual location of the target building.
  • On the basis of step 505, this step aims to locate the actual position of the target building based on the candidate text expression sequence after the presentation-priority adjustment. In this way, the actual position of the target building to be located can be determined more quickly and accurately.
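  • A simplified end-to-end sketch of steps 502-506, assuming the target map model exposes lookup helpers and that the "distance" is computed between vector representations of the text expressions; every name below is an illustrative placeholder rather than an interface defined in the disclosure.

```python
import numpy as np

def locate_target_building(current_position, positioning_image, model):
    coord_vec = model.encode_coordinates(current_position)        # actual coordinate vector expression
    sign_expr = model.encode_signboard(positioning_image)         # actual signboard image expression
    shoot_text = model.text_for_coordinates(coord_vec)            # shooting-position text expression
    candidates = model.candidate_texts_for_signboard(sign_expr)   # candidate text expression sequence

    def distance(a, b):
        return np.linalg.norm(model.embed_text(a) - model.embed_text(b))

    # smaller distance to the shooting-position text expression => higher presentation priority
    ranked = sorted(candidates, key=lambda cand: distance(shoot_text, cand))
    return ranked[0]   # the top-priority candidate is taken as the target building's location
```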
  • the embodiment of the present disclosure also provides a model pre-training method based on the guiding idea of multimodal geographic knowledge enhancement.
  • Multimodal geographic knowledge enhancement means having the model explicitly learn knowledge beyond universal text knowledge during the pre-training process, by improving the model structure or adding pre-training tasks.
  • The map usage scenario targeted by this embodiment utilizes geographic-domain data in three modalities: text, geographic coordinates, and signboard images.
  • In the model pre-training stage, multiple tasks are used together with changes to the model structure, so that geographic knowledge fully related to the real world is incorporated into the pre-trained model.
  • the main parts of the above model pre-training method include: integrating geographic coordinate information into the model, and multimodal geographic information fusion learning.
  • As shown in FIG. 6, the input can be divided into a text sequence (the text sequence of "XXX Eye Hospital, No. 18, Chuanhui Road, Shanghai") and a picture sequence (the character image sequence of "XXX Eye Hospital").
  • The text sequence is processed using the above-mentioned improved model structure that additionally introduces geographic coordinate vectors; that is, the Embed of the text sequence represents the text feature sequence, incorporating geographic location information, that is generated before being input into the transformer layers. For the picture sequence, a pre-trained single-character text detection model recognizes the picture of each character on the signboard image, and the Embed of the picture sequence represents the image feature sequence extracted by an existing image feature extraction model (such as ResNet, a residual network) from those character pictures, fused with the geographic coordinate vector.
  • In FIG. 6, TRM stands for a transformer layer, and Co-TRM stands for a layer in which the two different modalities perform information interaction.
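  • A minimal sketch of one such cross-modal interaction (Co-TRM-like) layer, assuming PyTorch; it is only an illustrative reconstruction of "information interaction between the two modalities", not the exact layer used in the disclosure.

```python
import torch.nn as nn

class CrossModalLayer(nn.Module):
    """Text features attend to image features and vice versa, then pass through TRM-like blocks."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_trm = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.image_trm = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # each modality queries the other (the Co-TRM information interaction)
        text_ctx, _ = self.text_to_image(text_feats, image_feats, image_feats)
        image_ctx, _ = self.image_to_text(image_feats, text_feats, text_feats)
        # followed by ordinary transformer (TRM) layers per modality
        return self.text_trm(text_feats + text_ctx), self.image_trm(image_feats + image_ctx)
```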
  • the model is pre-trained with two tasks:
  • Signboard-text matching task: given a text sequence and a signboard picture sequence, predict whether the description in the text is consistent with what is expressed in the signboard pictures.
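  • A minimal sketch of how the signboard-text matching task could be scored during pre-training, assuming PyTorch and that pooled text/image features are available from cross-modal layers like the one above; the pooling and classification head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SignboardTextMatchingHead(nn.Module):
    """Binary prediction: is the text description consistent with the signboard picture sequence?"""
    def __init__(self, dim=768):
        super().__init__()
        self.classifier = nn.Linear(2 * dim, 2)

    def forward(self, text_feats, image_feats, labels=None):
        pooled = torch.cat([text_feats.mean(dim=1), image_feats.mean(dim=1)], dim=-1)
        logits = self.classifier(pooled)
        if labels is not None:
            return nn.functional.cross_entropy(logits, labels)  # pre-training loss for this task
        return logits
```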
  • In this way, this embodiment simultaneously utilizes geographic-domain data in the three modalities of text, geographic coordinates, and signboard images and, through multiple tasks in the model pre-training stage together with changes to the model structure, incorporates geographic knowledge fully related to the real world into the pre-trained model; more complete spatio-temporal semantics are thus modeled for downstream tasks, improving various related functions such as search in map products.
  • As implementations of the methods shown in the above figures, the present disclosure further provides an embodiment of an apparatus for training a target map model and an embodiment of a positioning apparatus. The apparatus embodiment for training the target map model corresponds to the method embodiment shown in FIG. 2, and the positioning apparatus embodiment corresponds to the positioning method embodiment. The above apparatuses can be specifically applied to various electronic devices.
  • the target map model training apparatus 700 of this embodiment may include: a parameter acquisition unit 701 , a first sub-model training unit 702 , a second sub-model training unit 703 , and a sub-model fusion unit 704 .
  • the parameter acquisition unit 701 is configured to acquire the text expression, coordinate vector expression, and signboard image expression of each map location point;
  • The first sub-model training unit 702 is configured to train a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression corresponding to the same map location point;
  • the second sub-model training unit 703 is configured to train a second sub-model according to a second training sample composed of the text expression and the signboard image expression corresponding to the same map location point;
  • the sub-model fusion unit 704 is configured to fuse the first sub-model and the second sub-model to obtain the target map model.
  • In the target map model training apparatus 700, for the specific processing of the parameter acquisition unit 701, the first sub-model training unit 702, the second sub-model training unit 703, and the sub-model fusion unit 704, as well as the resulting technical effects, reference may be made to the related descriptions of steps 201-204 in the embodiment corresponding to FIG. 2, which will not be repeated here.
  • the parameter acquisition unit 701 may include a coordinate vector expression acquisition subunit configured to acquire the coordinate vector expression of each map location point, and the coordinate vector expression acquisition subunit may include:
  • the boundary coordinate sequence acquisition module is configured to respectively acquire the boundary coordinate sequences of each map location point;
  • the geocoding set calculation module is configured to use the geocoding algorithm and the boundary coordinate sequence to calculate the geocoding set covering the geographic block where the corresponding map location point is located;
  • a geographic string conversion module configured to convert a geocode set containing each geocode into a geographic string
  • the geographic vector conversion module is configured to convert the geographic character string into a geographic vector, and express the geographic vector as a coordinate vector of a corresponding map location point.
  • the geographic vector conversion module may be further configured to:
  • the parameter acquiring unit 701 may include a signboard image expression acquiring subunit configured to acquire the signboard image expression of each map location point, and the signboard image expression acquiring subunit may include:
  • the signboard image acquisition module is configured to respectively acquire the signboard images of the buildings corresponding to each map location point;
  • the character recognition and character image cutting module is configured to recognize the character part from the signboard image, and cut out a character image corresponding to each character of the character part;
  • the character image sorting module is configured to arrange the character images according to the character sequence of the character part, and use the obtained character image queue as a signboard image expression of the corresponding map location point.
  • the signboard image expression acquisition subunit may also include:
  • the abnormality recognition module is configured to perform image abnormality recognition on the signboard image before identifying the character part from the signboard image; wherein, the image abnormality recognition includes at least one of fuzzy recognition, noise recognition, and skew recognition;
  • the character recognition sub-module in the character recognition and character image cutting module can be further configured as:
  • the sub-model fusion unit 704 may be further configured to:
  • The positioning apparatus 800 of this embodiment may include: a positioning image and current position acquisition unit 801, an actual coordinate vector expression and actual signboard image expression determination unit 802, a shooting-position text expression determination unit 803, a candidate text expression sequence determination unit 804, a presentation priority adjustment unit 805, and an actual position determination unit 806.
  • The positioning image and current position acquisition unit 801 is configured to acquire the positioning image obtained by photographing the signboard of the target building, and the current position; the actual coordinate vector expression and actual signboard image expression determination unit 802 is configured to determine the actual coordinate vector expression according to the current position and the actual signboard image expression according to the positioning image;
  • the shooting-position text expression determination unit 803 is configured to call the target map model to determine the shooting-position text expression corresponding to the actual coordinate vector expression; the candidate text expression sequence determination unit 804 is configured to call the target map model to determine the candidate text expression sequence corresponding to the actual signboard image expression;
  • the presentation priority adjustment unit 805 is configured to adjust the presentation priority of each candidate text expression in the sequence based on the distance between the shooting-position text expression and each candidate text expression in the candidate text expression sequence;
  • the actual position determination unit 806 is configured to locate the actual position of the target building based on the candidate text expression sequence after the presentation-priority adjustment; wherein the target map model is obtained according to the training apparatus 700 for the target map model.
  • In the positioning apparatus 800, for the specific processing of the positioning image and current position acquisition unit 801, the actual coordinate vector expression and actual signboard image expression determination unit 802, the shooting-position text expression determination unit 803, the candidate text expression sequence determination unit 804, the presentation priority adjustment unit 805, and the actual position determination unit 806, as well as the resulting technical effects, reference may be made to the related descriptions in the corresponding method embodiments, which will not be repeated here.
  • This embodiment exists as a device embodiment corresponding to the above-mentioned method embodiment.
  • The training apparatus and positioning apparatus for the target map model provided by this embodiment are based not only on the text expression of map location points during training, but additionally introduce the coordinate vector expression and the signboard image expression, so that pre-training across multiple dimensions makes full use of the spatio-temporal big data of the map field and the information contained in the pre-trained model is more closely related to the real world; in practical applications, the apparatuses can then better combine the user's current position with the captured positioning image to obtain more accurate positioning results.
  • According to an embodiment of the present disclosure, the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to implement the training method and/or positioning method for the target map model described in any of the above embodiments.
  • According to an embodiment of the present disclosure, the present disclosure also provides a readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to implement the training method and/or positioning method for the target map model described in any of the above embodiments.
  • An embodiment of the present disclosure further provides a computer program product including a computer program; when the computer program is executed by a processor, the training method and/or positioning method for the target map model described in any of the above embodiments can be implemented.
  • FIG. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure.
  • Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random-access memory (RAM) 903. The RAM 903 can also store various programs and data necessary for the operation of the device 900.
  • the computing unit 901, ROM 902, and RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904 .
  • the I/O interface 905 includes: an input unit 906, such as a keyboard, a mouse, etc.; an output unit 907, such as various types of displays, speakers, etc.; a storage unit 908, such as a magnetic disk, an optical disk, etc. ; and a communication unit 909, such as a network card, a modem, a wireless communication transceiver, and the like.
  • the communication unit 909 allows the device 900 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • The computing unit 901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like.
  • the computing unit 901 executes various methods and processes described above, such as a training method and/or a positioning method of a target map model.
  • In some embodiments, the target map model training method and/or positioning method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908.
  • part or all of the computer program may be loaded and/or installed on the device 900 via the ROM 902 and/or the communication unit 909 .
  • When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method and/or positioning method for the target map model described above can be performed.
  • the computing unit 901 may be configured in any other appropriate way (for example, by means of firmware) to execute a target map model training method and/or positioning method.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • The programmable processor can be a special-purpose or general-purpose programmable processor, and can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that when the program codes are executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, eg, a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN) and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the defects of difficult management and weak business scalability in traditional physical host and virtual private server (VPS) services.
  • The technical solutions of the embodiments of the present disclosure are based not only on the text expression of map location points, but additionally introduce the coordinate vector expression and the signboard image expression during training, so that pre-training across multiple dimensions makes full use of the spatio-temporal big data of the map field; the information contained in the pre-trained model is thus more closely related to the real world, and in practical applications the model can better combine the user's current position with the captured positioning image to obtain more accurate positioning results.
  • steps may be reordered, added or deleted using the various forms of flow shown above.
  • each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the technical field of artificial intelligence, such as deep learning, natural language understanding and intelligent search. Provided are a method and apparatus for training a target map model, a positioning method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product. The method for training a target map model comprises: acquiring a text expression, a coordinate vector expression and a signboard image expression of each map location point; performing training according to a first training sample composed of the text expression and the coordinate vector expression that correspond to the same map location point, so as to obtain a first sub-model; performing training according to a second training sample composed of the text expression and the signboard image expression that correspond to the same map location point, so as to obtain a second sub-model; and fusing the first sub-model and the second sub-model, so as to obtain a target map model. A target map model that is obtained by means of training using the method can better combine the current location of a user with an image that is captured for positioning, so as to obtain a more accurate positioning result.

Description

Target Map Model Training Method, Positioning Method, and Related Apparatuses
Cross-Reference to Related Applications
This patent application claims priority to the Chinese patent application No. 202111211145.2, filed on October 18, 2021 and entitled "Target Map Model Training Method, Positioning Method, and Related Apparatuses", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of data processing technology, specifically to the field of artificial intelligence technology such as deep learning, natural language understanding, and intelligent search, and in particular to a training method and a positioning method for a target map model, as well as corresponding apparatuses, electronic devices, computer-readable storage media, and computer program products.
Background
Pre-trained models have made great progress in the field of natural language processing and in products across multiple industries. By learning from large-scale data, a pre-trained model can better model the representations of characters, words, sentences, and the like. Based on a pre-trained model, fine-tuning the model with labeled samples of a specific task can usually achieve very good results.
The map field is special in that its information processing often needs to be associated with the real world. For example, in a map retrieval engine, when a user inputs a query word, the position of a candidate itself and its distance from the user's current location are very important ranking features.
The text data in the map field is currently dominated by structured data, and the information it contains is relatively condensed and limited, usually covering only names, aliases, addresses, and categories. However, the information in the map field that is strongly correlated with the real world often cannot be expressed intuitively through text.
Summary
Embodiments of the present disclosure provide a training method and a positioning method for a target map model, as well as corresponding apparatuses, electronic devices, computer-readable storage media, and computer program products.
In a first aspect, an embodiment of the present disclosure provides a training method for a target map model, including: acquiring a text expression, a coordinate vector expression, and a signboard image expression of each map location point; training a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression that correspond to the same map location point; training a second sub-model according to a second training sample composed of the text expression and the signboard image expression that correspond to the same map location point; and fusing the first sub-model and the second sub-model to obtain the target map model.
In a second aspect, an embodiment of the present disclosure provides a training apparatus for a target map model, including: a parameter acquisition unit configured to acquire a text expression, a coordinate vector expression, and a signboard image expression of each map location point; a first sub-model training unit configured to train a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression that correspond to the same map location point; a second sub-model training unit configured to train a second sub-model according to a second training sample composed of the text expression and the signboard image expression that correspond to the same map location point; and a sub-model fusion unit configured to fuse the first sub-model and the second sub-model to obtain the target map model.
In a third aspect, an embodiment of the present disclosure provides a positioning method, including: acquiring a positioning image obtained by photographing the signboard of a target building, and a current position; determining an actual coordinate vector expression according to the current position, and determining an actual signboard image expression according to the positioning image; calling the target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression; calling the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression; adjusting the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and locating the actual position of the target building based on the candidate text expression sequence after the presentation priority adjustment; where the target map model is obtained by the training method for a target map model as described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a positioning apparatus, including: a positioning-image and current-position acquisition unit configured to acquire a positioning image obtained by photographing the signboard of a target building, and a current position; an actual coordinate vector expression and actual signboard image expression determination unit configured to determine an actual coordinate vector expression according to the current position and an actual signboard image expression according to the positioning image; a shooting position text expression determination unit configured to call the target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression; a candidate text expression sequence determination unit configured to call the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression; a presentation priority adjustment unit configured to adjust the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and an actual position determination unit configured to locate the actual position of the target building based on the candidate text expression sequence after the presentation priority adjustment; where the target map model is obtained by the training apparatus for a target map model as described in any implementation of the second aspect.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can implement the training method for a target map model as described in any implementation of the first aspect or the positioning method as described in any implementation of the third aspect.
In a sixth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to enable a computer to implement the training method for a target map model as described in any implementation of the first aspect or the positioning method as described in any implementation of the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the training method for a target map model as described in any implementation of the first aspect or the positioning method as described in any implementation of the third aspect.
In the training method and positioning method for a target map model provided by the embodiments of the present disclosure, training is not only based on the text expressions of map location points but additionally introduces coordinate vector expressions and signboard image expressions, so that the model pre-trained across multiple dimensions makes full use of the spatiotemporal big data of the map field and the information contained in the pre-trained model becomes more closely related to the real world; in actual applications, the user's current position and the captured positioning image can therefore be better combined to obtain a more accurate positioning result.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be readily understood from the following description.
Brief Description of the Drawings
Other features, objects, and advantages of the present disclosure will become more apparent by reading the following detailed description of non-limiting embodiments made with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture to which the present disclosure may be applied;
Fig. 2 is a flowchart of a training method for a target map model provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of a method for acquiring a coordinate vector expression of a map location point provided by an embodiment of the present disclosure;
Fig. 4 is a flowchart of a method for acquiring a signboard image expression of a map location point provided by an embodiment of the present disclosure;
Fig. 5 is a flowchart of a positioning method provided by an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of fusing a picture sequence and a text sequence provided by an embodiment of the present disclosure;
Fig. 7 is a structural block diagram of a training apparatus for a target map model provided by an embodiment of the present disclosure;
Fig. 8 is a structural block diagram of a positioning apparatus provided by an embodiment of the present disclosure;
Fig. 9 is a schematic structural diagram of an electronic device suitable for executing the training method for a target map model and/or the positioning method provided by an embodiment of the present disclosure.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user personal information involved are all in compliance with the relevant laws and regulations, and do not violate public order and good customs.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the training method and positioning method for a target map model, the corresponding apparatuses, the electronic device, and the computer-readable storage medium of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various applications for realizing information communication between the terminal devices 101, 102, 103 and the server 105 may be installed on them, such as map retrieval model training applications, map retrieval applications, and positioning applications.
The terminal devices 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like; when the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited here. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server; when the server is software, it may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module, which is not specifically limited here.
The server 105 can provide various services through the various applications built into it. Taking a positioning application that can provide a positioning service for users as an example, the server 105 can achieve the following effects when running the positioning application: first, receiving, through the network 104, a positioning image obtained by photographing the signboard of a target building and the current position of the terminal devices 101, 102, 103, as transmitted by the terminal devices 101, 102, 103; then, determining an actual coordinate vector expression according to the current position and an actual signboard image expression according to the positioning image; next, calling the target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression; then, calling the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression; next, adjusting the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and finally, returning the candidate text expression sequence after the presentation priority adjustment to the terminal devices 101, 102, 103 through the network 104, so that the user can locate the actual position of the target building according to the result presented by the terminal devices 101, 102, 103.
The target map model can be trained by a map retrieval model training application built into the server 105 according to the following steps: acquiring a text expression, a coordinate vector expression, and a signboard image expression of each map location point; training a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression that correspond to the same map location point; training a second sub-model according to a second training sample composed of the text expression and the signboard image expression that correspond to the same map location point; and fusing the first sub-model and the second sub-model to obtain the target map model.
Since training the target map model requires considerable computing resources and strong computing power, the training method for a target map model provided by the subsequent embodiments of the present disclosure is generally executed by the server 105, which has strong computing power and abundant computing resources; correspondingly, the training apparatus for the target map model is generally also set in the server 105. However, it should also be pointed out that, when the terminal devices 101, 102, 103 also have the required computing power and computing resources, they may likewise complete, through the target map model training applications installed on them, the computations otherwise assigned to the server 105, and then output the same results as the server 105. Correspondingly, the training apparatus for the target map model may also be set in the terminal devices 101, 102, 103. In this case, the exemplary system architecture 100 may also exclude the server 105 and the network 104.
Of course, the server used to train the target map model may be different from the server that calls the trained target map model for use. In particular, a lightweight target map model suitable for being placed in the terminal devices 101, 102, 103 may also be obtained from the target map model trained by the server 105 through model distillation; that is, according to the recognition accuracy actually required, one may flexibly choose between using the lightweight target map model in the terminal devices 101, 102, 103 and using the more complex target map model in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Referring to Fig. 2, Fig. 2 is a flowchart of a training method for a target map model provided by an embodiment of the present disclosure, in which the process 200 includes the following steps:
Step 201: acquiring a text expression, a coordinate vector expression, and a signboard image expression of each map location point.
This step aims at acquiring, by the execution subject of the training method for the target map model (for example, the server 105 shown in Fig. 1), the text expression, the coordinate vector expression, and the signboard image expression of each map location point. The text expression is the map location point described in text form, for example, "XX Building (tower, plaza, hospital, restaurant, etc.), No. XX, XX Road, XX City"; the coordinate vector expression is a vectorized expression of the real-world geographic coordinates of the object (usually a building) corresponding to the map location point; and the signboard image expression is an image description of the signboard of the object (usually a building) at the map location point, i.e., the signboard image description reflects the signboard style customized by the party to which the building belongs and thus embodies the strong association between the two.
The coordinate vector expression may convert coordinates of various forms into a vector expression by various vectorized encoding methods, for example, converting the boundary coordinate sequence of the building corresponding to the map location point into a vector expression. On this basis, the boundary coordinate sequence may be further processed, for example by introducing a geocoding algorithm to convert the boundary coordinate sequence into another representation, or by simply using the four-corner coordinates of a rectangle enclosing the building as the boundary coordinates and obtaining their vectorized expression by means such as a hash algorithm. No specific limitation is imposed here, and a suitable processing method may be selected according to the actual application scenario.
The signboard image expression is an expression capable of reflecting the image features of the signboard, obtained on the basis of an image taken of the signboard, for example by setting the shooting resolution, character definition, character image extraction, processing method, and so on.
Step 202: training a first sub-model according to a first training sample composed of the text expression and the coordinate vector expression that correspond to the same map location point.
On the basis of step 201, this step aims at taking, by the above execution subject, the text expression corresponding to the same map location point as the sample input and the coordinate vector expression corresponding to the same map location point as the sample output, and training the first sub-model with the first training sample thus constructed, so that the trained first sub-model can establish the correspondence between the text expression and the coordinate vector expression of the same map location point, and output information corresponding to input information can subsequently be matched according to this correspondence; for example, inputting the coordinate vector of the current position matches the text description of the current position.
Step 203: training a second sub-model according to a second training sample composed of the text expression and the signboard image expression that correspond to the same map location point.
On the basis of step 201, this step aims at taking, by the above execution subject, the text expression corresponding to the same map location point as the sample input and the signboard image expression corresponding to the same map location point as the sample output, and training the second sub-model with the second training sample thus constructed, so that the trained second sub-model can establish the correspondence between the text expression and the signboard image expression of the same map location point, and output information corresponding to input information can subsequently be matched according to this correspondence; for example, inputting an image taken of a building's signboard matches the text description of the corresponding building.
Step 204: fusing the first sub-model and the second sub-model to obtain the target map model.
On the basis of steps 202 and 203, this step aims at fusing, by the above execution subject, the first sub-model and the second sub-model, taking the text expression in the first sub-model and in the second sub-model as the connection point, so as to establish the correspondence among the text expression, the coordinate vector expression, and the signboard image expression of the same map location point (in the form of a correspondence among A, B, and C), so that the finally fused target map model can accurately determine the remaining one of the three according to one or two of them.
It should be noted that the training processes of the first sub-model and the second sub-model are carried out independently, but fusion does not mean that the fused initial map model no longer needs training; a few more rounds of training are often still required so that the overall parameters of the fused map model reach an overall optimum. For example, the first sub-model and the second sub-model are first fused to obtain an initial map model; then the parameters of the initial map model are adjusted until a preset iteration exit condition is met, and the initial map model meeting the iteration exit condition is output as the target map model. Specifically, the iteration exit condition set for the fused map model at this point is often different from the iteration exit conditions set for the first sub-model and the second sub-model, unless the iteration exit condition set in certain scenarios is a general-purpose condition such as whether the accuracy difference between adjacent iteration results meets the requirement.
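As a minimal sketch of this two-stage procedure (a PyTorch-style loop and illustrative dimensions are assumed; the disclosure does not prescribe a concrete network structure), the two sub-models can be trained separately and the fused model then fine-tuned until an iteration exit condition is met, here the loss change between adjacent iterations falling below a tolerance:

```python
import torch
from torch import nn, optim

def train_until_exit(model, batches, loss_fn, lr=1e-4, tol=1e-4):
    """Adjust parameters until the iteration exit condition is met (loss change < tol)."""
    opt = optim.Adam(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for inputs, targets in batches:
        loss = loss_fn(model(inputs), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if abs(prev_loss - loss.item()) < tol:  # iteration exit condition
            break
        prev_loss = loss.item()
    return model

# First sub-model: text expression -> coordinate vector expression (illustrative shapes).
sub_model_1 = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 64))
# Second sub-model: text expression -> signboard image expression.
sub_model_2 = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 128))

class InitialMapModel(nn.Module):
    """Fusion of the two sub-models via their shared text-expression input."""
    def __init__(self, coord_branch, image_branch):
        super().__init__()
        self.coord_branch = coord_branch
        self.image_branch = image_branch

    def forward(self, text_vec):
        return self.coord_branch(text_vec), self.image_branch(text_vec)

initial_map_model = InitialMapModel(sub_model_1, sub_model_2)
# A few more rounds of training on fused samples (hypothetical `fused_batches`/`fused_loss`)
# would then yield the target map model:
# target_map_model = train_until_exit(initial_map_model, fused_batches, fused_loss)
```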
Specifically, the first sub-model, the second sub-model, and the fused map model in this embodiment may be implemented with various model frameworks, for example based on the BERT (Bidirectional Encoder Representations from Transformers) model commonly used in the field of natural language processing; other models with similar effects may also be used and are not enumerated here.
In the training method for a target map model provided by this embodiment of the present disclosure, training is not only based on the text expressions of map location points but additionally introduces coordinate vector expressions and signboard image expressions, so that the model pre-trained across multiple dimensions makes full use of the spatiotemporal big data of the map field and the information contained in the pre-trained model becomes more closely related to the real world; in actual applications, the user's current position and the captured positioning image can therefore be better combined to obtain a more accurate positioning result.
Referring to Fig. 3, Fig. 3 is a flowchart of a method for acquiring a coordinate vector expression of a map location point provided by an embodiment of the present disclosure; that is, it provides a specific implementation of the coordinate vector expression in step 201 of the process 200 shown in Fig. 2. The other steps in the process 200 are not adjusted, and a new complete embodiment is obtained by substituting the specific implementation provided by this embodiment for the corresponding step. The process 300 includes the following steps:
Step 301: acquiring the boundary coordinate sequence of each map location point.
Taking buildings as an example, the coordinate sequence of the outer contours of all buildings belonging to the map location point is the boundary coordinate sequence.
Taking a hospital composed of five buildings as an example, the geographic coordinate sequence of the outer contours of the three outermost buildings is the boundary coordinate sequence. The frequency or interval at which points are sampled from the continuous outer contour may be set as needed.
Step 302: calculating, by using a geocoding algorithm and the boundary coordinate sequence, a geocode set covering the geographic block where the corresponding map location point is located.
On the basis of step 301, this step aims at calculating, by the above execution subject using a geocoding algorithm and the boundary coordinate sequence, the geocode set covering the geographic block where the corresponding map location point is located; that is, each geocode in the geocode set corresponds to one boundary coordinate in the boundary coordinate sequence.
Specifically, the geocoding algorithm may be the Geohash algorithm, the Google S2 algorithm, and so on. The Geohash algorithm is an address encoding method that encodes two-dimensional spatial latitude-longitude data into a character string, while the Google S2 algorithm takes its name from the mathematical symbol S² in geometry, which denotes the unit sphere; the S2 algorithm is designed to solve various geometric problems on a sphere, and since the real world is in fact a sphere, it can also be used as an address encoding algorithm.
Step 303: converting the geocode set containing each geocode into a geographic character string.
On the basis of step 302, this step aims at converting, by the above execution subject, the geocode set containing each geocode into a geographic character string, for example by converting the geocodes in the geocode set into a tree structure according to their hierarchy and traversing it in a fixed order, thereby converting it into a geographic character string representing the geographic area where the map location point is located.
Step 304: converting the geographic character string into a geographic vector, and taking the geographic vector as the coordinate vector expression of the corresponding map location point.
On the basis of step 303, this step aims at converting, by the above execution subject, the geographic character string into a geographic vector and taking the geographic vector as the coordinate vector expression of the corresponding map location point.
Given the geographic character string, the conversion rule between the geographic character string and the geographic vector may be defined freely, or a model capable of producing results in vector form may be used, for example by inputting the geographic character string into a preset vector expression conversion model, where the vector expression conversion model is used to characterize the correspondence between geographic character strings and geographic vectors (for example a convolutional neural network or a recurrent neural network), and then receiving the geographic vector output by the vector expression conversion model.
According to the above method, a vectorized vocabulary of geographic blocks can be constructed at the levels of country, province, city, district, county, and road. In the prediction stage of the pre-trained model, each geographic entity is assigned its corresponding geographic block vector.
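A minimal sketch of this pipeline is given below. The `geohash2` package is assumed for the geocoding step (the Google S2 library could be used instead), and a simple hashing trick stands in for the vector expression conversion model; a convolutional or recurrent network, as mentioned above, could replace it.

```python
import hashlib
import numpy as np
import geohash2  # assumed geocoding library; geohash2.encode(lat, lng, precision)

def boundary_to_geocode_set(boundary_coords, precision=7):
    """Step 302: map each boundary coordinate of the location point to a geohash cell."""
    return {geohash2.encode(lat, lng, precision) for lat, lng in boundary_coords}

def geocode_set_to_string(cells):
    """Step 303: serialize the cell set into one geographic string (fixed traversal order)."""
    return "|".join(sorted(cells))

def string_to_vector(geo_string, dim=64):
    """Step 304: hashing-trick stand-in for the vector expression conversion model."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in geo_string.split("|"):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec / max(np.linalg.norm(vec), 1e-8)

# Example: a few (lat, lng) points sampled from the outer contour of a building compound.
boundary = [(31.2304, 121.4737), (31.2306, 121.4741), (31.2301, 121.4743)]
coordinate_vector_expression = string_to_vector(
    geocode_set_to_string(boundary_to_geocode_set(boundary)))
```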
Besides the above way of obtaining the coordinate vector expression of a map location point given in this embodiment, certain steps of this embodiment may also be improved or adjusted according to the actual needs of the application scenario, to obtain other implementations that differ from this embodiment but better suit those needs.
Referring to Fig. 4, Fig. 4 is a flowchart of a method for acquiring a signboard image expression of a map location point provided by an embodiment of the present disclosure; that is, it provides a specific implementation of the signboard image expression in step 201 of the process 200 shown in Fig. 2. The other steps in the process 200 are not adjusted, and a new complete embodiment is obtained by substituting the specific implementation provided by this embodiment for the corresponding step. The process 400 includes the following steps:
Step 401: acquiring the signboard image of the building corresponding to each map location point.
This step aims at first acquiring, by the above execution subject, the signboard image of the building corresponding to the map location point. When photographing different signboards, the shooting conditions, such as the shooting device, lighting, angle, resolution, and weather, should be kept as consistent as possible to avoid differences among different signboard images.
Step 402: recognizing the character part from the signboard image, and cutting out a character image corresponding to each character of the character part.
Step 403: arranging the character images according to the order of the characters in the character part, and taking the resulting character image queue as the signboard image expression of the corresponding map location point.
On the basis of step 401, step 402 aims at recognizing, by the above execution subject, the character part from the signboard image and cutting out the character image corresponding to each character of the character part; step 403 then arranges the character images in the correct order to obtain the signboard image expression used to describe the image features of the signboard.
It should be understood that, besides the implementation given in this embodiment of taking the character image queue as the signboard image expression, there are various other implementations, for example, directly performing image processing on the signboard image, such as erosion or engraving, that highlights the character image features of the signboard, and directly taking the processed image as the signboard image expression. The reason this embodiment chooses to cut out a character image for each character is to correspond as closely as possible to the text expression of the map location point and to construct the correspondence between the character expression and the signboard image expression of the same character, thereby strengthening the association between the two.
Further, to improve the character recognition effect, image anomaly recognition (which may include at least one of blur recognition, noise recognition, and skew recognition) may also be performed on the signboard image before the character part is recognized from it, so that the character part is recognized only from signboard images identified as non-abnormal images. Alternatively, a signboard image identified as an abnormal image may first undergo de-anomaly processing before an attempt is made to recognize the characters it contains.
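The following sketch illustrates one way steps 401-403 and the anomaly check could be realized. It assumes OpenCV is available and that `detect_chars` is some single-character text detector returning bounding boxes; the disclosure does not fix a particular detector, and the blur check is only one possible instance of image anomaly recognition.

```python
import cv2
import numpy as np

def is_abnormal(image_bgr, blur_threshold=100.0):
    """One possible anomaly check: low Laplacian variance indicates a blurred signboard photo."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold

def signboard_to_char_queue(image_bgr, detect_chars):
    """Steps 402-403: cut out one sub-image per detected character, ordered in reading order.

    `detect_chars` is an assumed single-character detector returning (x, y, w, h) boxes.
    """
    if is_abnormal(image_bgr):
        return []  # only non-abnormal images are used for character recognition
    boxes = sorted(detect_chars(image_bgr), key=lambda b: b[0])  # left-to-right order
    return [image_bgr[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```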
The above embodiments explain from various aspects how the target map model is trained. In order to highlight, from an actual usage scenario, the effect achieved by the trained target map model, the present disclosure further provides a solution that uses the trained target map model to solve a practical problem. A positioning method may refer to the steps included in the process 500:
Step 501: acquiring a positioning image obtained by photographing the signboard of a target building, and the current position.
This step aims at acquiring, by the above execution subject, the positioning image obtained by photographing the signboard of the target building and the current position. The target building is necessarily a building within the user's field of view, i.e., it is located in the vicinity of the user who initiated the positioning request, while the current position is the geographic coordinates returned by the positioning component (for example a GPS component or a base-station interaction component) inside the device held by the user that captured the positioning image.
Step 502: determining an actual coordinate vector expression according to the current position, and determining an actual signboard image expression according to the positioning image.
On the basis of step 501, this step aims at determining, by the above execution subject, the actual coordinate vector expression according to the current position and the actual signboard image expression according to the positioning image, i.e., converting the coordinates into the corresponding vector expression and the image into the corresponding image expression.
Step 503: calling the target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression.
On the basis of step 502, this step aims at calling, by the above execution subject, the correspondence between coordinate vector expressions and text expressions recorded by the target map model, to determine the shooting position text expression corresponding to the actual coordinate vector expression.
Step 504: calling the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression.
On the basis of step 502, this step calls, by the above execution subject, the correspondence between signboard image expressions and text expressions recorded by the target map model, to determine the candidate text expression sequence corresponding to the actual signboard image expression (it is a sequence of candidate text expressions because the actual signboard image, owing to the photographer's shooting conditions or other influencing factors, does not necessarily contain complete image information, so in most cases multiple candidate text expressions are produced).
Step 505: adjusting the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence.
On the basis of steps 503 and 504, this step aims at adjusting the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence. That is, the larger the distance, the lower the presentation priority of the corresponding candidate text expression in the sequence (for example, the further back it is ranked); conversely, the smaller the distance, the higher the presentation priority of the corresponding candidate text expression in the sequence (for example, the further forward it is ranked).
Step 506: locating the actual position of the target building based on the candidate text expression sequence after the presentation priority adjustment.
On the basis of step 505, this step aims at locating the actual position of the target building based on the candidate text expression sequence after the presentation priority adjustment.
That is, by presenting the adjusted candidate text expression sequence to the user, the actual position of the target building to be located is determined more quickly and more accurately.
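A sketch of the re-ranking in steps 505 and 506 is shown below. It assumes each candidate text expression carries an associated location and that the distance used is a geographic one; the disclosure only requires some distance between the shooting position text expression and each candidate, so this is one possible instantiation.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Candidate:
    text_expression: str  # e.g. "XXX Eye Hospital, No. 18 Chuanhui Road"
    location: tuple       # (lat, lng) associated with this candidate text expression

def distance(a, b):
    """Planar stand-in; a haversine distance would be more accurate in practice."""
    return hypot(a[0] - b[0], a[1] - b[1])

def rerank(candidates, shooting_position):
    """Smaller distance to the shooting position -> higher presentation priority."""
    return sorted(candidates, key=lambda c: distance(c.location, shooting_position))

ranked = rerank(
    [Candidate("A Hospital (east branch)", (31.40, 121.60)),
     Candidate("A Hospital", (31.2306, 121.4741))],
    shooting_position=(31.2305, 121.4739),
)
# ranked[0] is presented first and indicates the most likely actual position of the target building.
```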
To deepen understanding, an embodiment of the present disclosure further provides a model pre-training method based on the guiding idea of multimodal geographic knowledge enhancement.
Multimodal geographic knowledge enhancement refers to a model that explicitly learns non-general text knowledge during pre-training, by improving the model structure or adding pre-training tasks. Specifically, the map usage scenario targeted by this embodiment simultaneously utilizes geographic domain data in three modalities, namely text, geographic coordinates, and signboard images, and, during the pre-training stage, integrates geographic knowledge fully associated with the real world into the pre-trained model through multiple tasks and changes to the model structure. The main parts of this model pre-training method include: integrating geographic coordinate information into the model, and multimodal geographic information fusion learning.
1. Integrating geographic coordinate information into the model
As input for model training, most texts representing geographic entities can be accurately associated with the real geographic blocks to which they correspond in the real world. Therefore, on top of the existing model (taking the pre-training architecture BERT as an example, which converts each character of the received plain text into the superposition of a token embedding, a segment embedding, and a position embedding, feeds the result into subsequent semantic representation layers such as transformers for context modeling, and finally uses the vectors produced by the semantic representation layers to train on pre-training tasks such as the masked language model), a geographic coordinate vector (GEO embedding) is additionally added at the character representation layer and fused with the token embedding, segment embedding, and position embedding.
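A sketch of this character representation layer is given below (PyTorch assumed; vocabulary sizes and dimensions are illustrative): each character's input representation is the sum of its token, segment, position, and GEO embeddings, which is then fed to the transformer layers.

```python
import torch
from torch import nn

class GeoAwareEmbedding(nn.Module):
    """Token + segment + position + GEO embeddings, summed per character (illustrative sizes)."""
    def __init__(self, vocab=21128, geo_blocks=50000, dim=768, max_len=512):
        super().__init__()
        self.token = nn.Embedding(vocab, dim)
        self.segment = nn.Embedding(2, dim)
        self.position = nn.Embedding(max_len, dim)
        self.geo = nn.Embedding(geo_blocks, dim)  # one vector per geographic block
        self.norm = nn.LayerNorm(dim)

    def forward(self, token_ids, segment_ids, geo_block_ids):
        pos_ids = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        x = (self.token(token_ids) + self.segment(segment_ids)
             + self.position(pos_ids) + self.geo(geo_block_ids))
        return self.norm(x)  # fed into the subsequent transformer layers
```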
To fuse image features with text features, the input can be divided, as shown in Fig. 6, into a text sequence (the character sequence of "XXX Eye Hospital, No. 18 Chuanhui Road, Shanghai") and a picture sequence (the character image sequence of the "XXX Eye Hospital" signboard text). The text sequence is trained with the above improved model structure that additionally introduces the geographic coordinate vector; that is, the Embed step of the text branch represents the text feature sequence, with geographic location information fused in, produced before entering the transformer layers. The image sequence consists of the picture of each character recognized on the signboard image by a pre-trained single-character text detection model, and the Embed step of the picture branch represents the fusion of the geographic coordinate vector with the image feature sequence extracted from the picture sequence using an existing image feature extraction model (such as ResNet, a residual network). TRM denotes a transformer layer, and Co-TRM denotes the information interaction performed between the two different modalities.
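On the image side, the Embed step of the picture branch can be sketched as follows: each character image cut from the signboard is encoded with an image feature extraction backbone (ResNet-18 from torchvision is used here purely as an illustration) and the geographic block vector is added, producing the feature sequence that enters the Co-TRM layers.

```python
import torch
from torch import nn
from torchvision.models import resnet18  # example backbone; any image feature extractor would do

class SignboardImageStream(nn.Module):
    """Encode each character image and add the geographic block vector (illustrative)."""
    def __init__(self, geo_blocks=50000, dim=768):
        super().__init__()
        backbone = resnet18(weights=None)  # no pretrained weights (torchvision >= 0.13 API)
        backbone.fc = nn.Linear(backbone.fc.in_features, dim)
        self.backbone = backbone
        self.geo = nn.Embedding(geo_blocks, dim)

    def forward(self, char_images, geo_block_id):
        # char_images: (seq_len, 3, H, W) - one cropped image per signboard character
        feats = self.backbone(char_images)                  # (seq_len, dim)
        return feats + self.geo(geo_block_id).unsqueeze(0)  # broadcast the geo vector
```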
2. Multimodal geographic information fusion learning
After the picture and text representations are obtained, the model is pre-trained with two tasks:
1) Signboard picture masking task: overall, 15% of the input text sequence and image regions are masked, and the model is asked to predict the masked parts given the remaining input. For the text sequence, the masking follows the classic MLM (Masked Language Model) approach. For the image sequence, 90% of a selected image region's features are set to zero and 10% are left unchanged. The character recognition probability distribution obtained by optical character recognition is used as the label of the image, the model is asked to predict the same distribution, and finally the KL divergence (relative entropy) between the two distributions is used as the supervision signal to train the image side.
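The image-side part of this masking task can be sketched as below: regions are selected and mostly zeroed according to the stated ratios, and the KL divergence between the OCR-derived character distribution (the label) and the model's predicted distribution on the selected positions supervises the image side. The tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mask_image_regions(feats, select_prob=0.15, zero_prob=0.9):
    """Select ~15% of the regions; zero 90% of the selected regions' features, keep 10%."""
    selected = torch.rand(feats.size(0)) < select_prob
    zeroed = selected & (torch.rand(feats.size(0)) < zero_prob)
    masked = feats.clone()
    masked[zeroed] = 0.0
    return masked, selected

def masked_image_loss(pred_logits, ocr_probs, selected):
    """KL divergence between the OCR label distribution and the model's prediction,
    computed only on the selected (masked) positions."""
    log_pred = F.log_softmax(pred_logits[selected], dim=-1)
    return F.kl_div(log_pred, ocr_probs[selected], reduction="batchmean")
```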
2) Signboard text matching task: given a text sequence and a signboard picture sequence, predict whether the description in the text is consistent with what is expressed in the signboard picture.
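A minimal head for this matching task might look as follows, where `fused_cls` is assumed to be the pooled cross-modal vector produced by the Co-TRM layers:

```python
import torch
from torch import nn

class SignboardTextMatchHead(nn.Module):
    """Predict whether the text sequence and the signboard picture sequence describe the same entity."""
    def __init__(self, dim=768):
        super().__init__()
        self.classifier = nn.Linear(dim, 2)  # two classes: match / no match

    def forward(self, fused_cls, labels=None):
        logits = self.classifier(fused_cls)
        if labels is None:
            return logits
        return nn.functional.cross_entropy(logits, labels)
```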
即本实施例通过同时利用了文本、地理坐标、招牌图像三个模态的地理领域数据,在模型的预训练阶段通过多任务,改变模型结构的方式,将与现实世界充分关联的地理知识融入预训练模型,为下游任务建模了更完整的时空语义,以实现提升地图产品中搜索等多种相关功能的效果。That is to say, this embodiment utilizes the geographical field data of three modalities of text, geographic coordinates, and signboard images at the same time, and through multi-task in the pre-training stage of the model, changes the way of the model structure, and integrates the geographical knowledge fully related to the real world into the The pre-trained model models more complete spatio-temporal semantics for downstream tasks, so as to achieve the effect of improving various related functions such as search in map products.
进一步参考图7和图8,作为对上述各图所示方法的实现,本公开分别提供了一种目标地图模型的训练装置实施例和一种定位装置的实施例,目标地图模型的训练装置实施例与图2所示的目标地图模型的训练方法实施例相对应,定位装置实施例与定位方法实施例相对应。上述装置具体可以应用于各种电子设备中。Further referring to FIG. 7 and FIG. 8, as the realization of the methods shown in the above figures, the present disclosure provides an embodiment of a training device for a target map model and an embodiment of a positioning device respectively, and the training device for a target map model implements The example corresponds to the embodiment of the training method for the target map model shown in FIG. 2 , and the embodiment of the positioning device corresponds to the embodiment of the positioning method. The above device can be specifically applied to various electronic devices.
如图7所示,本实施例的目标地图模型的训练装置700可以包括:参数获取单元701、第一子模型训练单元702、第二子模型训练单元703、子模型融合单元704。其中,参数获取单元701,被配置成获取各地图位置点的文本表达、坐标向量表达、招牌图像表达;第一子模型训练单元 702,被配置成根据由对应相同地图位置点的文本表达和坐标向量表达构成的第一训练样本,训练得到第一子模型;第二子模型训练单元703,被配置成根据由对应相同地图位置点的文本表达和招牌图像表达构成的第二训练样本,训练得到第二子模型;子模型融合单元704,被配置成融合第一子模型和第二子模型,得到目标地图模型。As shown in FIG. 7 , the target map model training apparatus 700 of this embodiment may include: a parameter acquisition unit 701 , a first sub-model training unit 702 , a second sub-model training unit 703 , and a sub-model fusion unit 704 . Among them, the parameter acquisition unit 701 is configured to acquire the text expression, coordinate vector expression, and signboard image expression of each map location point; the first sub-model training unit 702 is configured to obtain the text expression and coordinates corresponding to the same map location point The first training sample composed of vector expressions is trained to obtain the first sub-model; the second sub-model training unit 703 is configured to train the second training sample composed of text expressions and signboard image expressions corresponding to the same map location points to obtain The second sub-model; the sub-model fusion unit 704 is configured to fuse the first sub-model and the second sub-model to obtain the target map model.
在本实施例中,目标地图模型的训练装置700中:参数获取单元701、第一子模型训练单元702、第二子模型训练单元703、子模型融合单元704的具体处理及其所带来的技术效果可分别参考图2对应实施例中的步骤201-204的相关说明,在此不再赘述。In this embodiment, in the target map model training device 700: the specific processing of the parameter acquisition unit 701, the first sub-model training unit 702, the second sub-model training unit 703, and the sub-model fusion unit 704 and the resulting For the technical effects, reference may be made to the related descriptions of steps 201-204 in the embodiment corresponding to FIG. 2 , which will not be repeated here.
在本实施例的一些可选的实现方式中,参数获取单元701可以包括被配置成获取各地图位置点的坐标向量表达的坐标向量表达获取子单元,坐标向量表达获取子单元可以包括:In some optional implementations of this embodiment, the parameter acquisition unit 701 may include a coordinate vector expression acquisition subunit configured to acquire the coordinate vector expression of each map location point, and the coordinate vector expression acquisition subunit may include:
边界坐标序列获取模块,被配置成分别获取各地图位置点的边界坐标序列;The boundary coordinate sequence acquisition module is configured to respectively acquire the boundary coordinate sequences of each map location point;
地理编码集合计算模块,被配置成利用地理编码算法和边界坐标序列,计算得到覆盖相应地图位置点所在地理区块的地理编码集合;The geocoding set calculation module is configured to use the geocoding algorithm and the boundary coordinate sequence to calculate the geocoding set covering the geographic block where the corresponding map location point is located;
地理字符串转换模块,被配置成将包含各地理编码的地理编码集合转换为地理字符串;a geographic string conversion module configured to convert a geocode set containing each geocode into a geographic string;
地理向量转换模块,被配置成将地理字符串转换为地理向量,并将地理向量作为相应地图位置点的坐标向量表达。The geographic vector conversion module is configured to convert the geographic character string into a geographic vector, and express the geographic vector as a coordinate vector of a corresponding map location point.
在本实施例的一些可选的实现方式中,地理向量转换模块可以被进一步配置成:In some optional implementations of this embodiment, the geographic vector conversion module may be further configured to:
将地理字符串输入预设的向量表达转换模型;其中,向量表达转换模型用于表征地理字符串与地理向量之间的对应关系;Inputting geographic character strings into a preset vector expression transformation model; wherein, the vector expression transformation model is used to characterize the correspondence between geographic character strings and geographic vectors;
接收向量表达转换模型输出的地理向量。Receives a vector representing the geographic vector output by the transformed model.
In some optional implementations of this embodiment, the parameter acquisition unit 701 may include a signboard image expression acquisition subunit configured to acquire the signboard image expression of each map location point, and the signboard image expression acquisition subunit may include:
a signboard image acquisition module, configured to acquire a signboard image of the building corresponding to each map location point;
a character recognition and character image cutting module, configured to recognize a character part from the signboard image and cut out a character image corresponding to each character of the character part; and
a character image sorting module, configured to arrange the character images in the order of the characters in the character part and take the resulting character image queue as the signboard image expression of the corresponding map location point.
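A minimal sketch of the cutting-and-ordering step described for these modules follows, assuming a character-level OCR stage has already produced (character, bounding box) detections; the detection format and the left-to-right ordering rule are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np
from typing import List, Tuple

# Each detection: (character, (x, y, width, height)) in pixel coordinates of the signboard image.
Detection = Tuple[str, Tuple[int, int, int, int]]

def signboard_image_expression(image: np.ndarray, detections: List[Detection]) -> List[np.ndarray]:
    """Cut one sub-image per recognized character and return them in reading order."""
    # Sort by the x coordinate of each box so the queue follows the character order on the signboard.
    ordered = sorted(detections, key=lambda det: det[1][0])
    char_images = []
    for _, (x, y, w, h) in ordered:
        char_images.append(image[y:y + h, x:x + w].copy())
    return char_images

# Example with a dummy image and two detections.
dummy = np.zeros((100, 300, 3), dtype=np.uint8)
queue = signboard_image_expression(dummy, [("店", (120, 10, 50, 80)), ("书", (20, 10, 50, 80))])
print([img.shape for img in queue])
```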
In some optional implementations of this embodiment, the signboard image expression acquisition subunit may further include:
an anomaly recognition module, configured to perform image anomaly recognition on the signboard image before the character part is recognized from the signboard image, where the image anomaly recognition includes at least one of blur recognition, noise recognition, and skew recognition.
Correspondingly, the character recognition sub-module in the character recognition and character image cutting module may be further configured to:
recognize the character part only from signboard images recognized as non-anomalous images.
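As one concrete example of such image anomaly recognition, blur can be screened with the variance of the Laplacian before any character recognition is attempted. The sketch below assumes OpenCV is available; the threshold value is an assumption, and noise or skew checks would be added analogously.

```python
import cv2
import numpy as np

def is_non_anomalous(image: np.ndarray, blur_threshold: float = 100.0) -> bool:
    """Return True if the signboard image passes a simple blur check.

    The variance of the Laplacian is a common sharpness measure: low variance
    suggests a blurred image.  Noise and skew recognition could be added as
    further checks before the image is passed on to character recognition.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    return sharpness >= blur_threshold

# Only non-anomalous images would be passed on to character recognition.
image = np.random.randint(0, 255, (200, 400, 3), dtype=np.uint8)
if is_non_anomalous(image):
    print("pass to character recognition")
```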
In some optional implementations of this embodiment, the sub-model fusion unit 704 may be further configured to:
fuse the first sub-model and the second sub-model to obtain an initial map model; and
adjust parameters of the initial map model until a preset iteration exit condition is met, and output the initial map model that meets the iteration exit condition as the target map model.
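The sketch below is one plausible reading of this fuse-then-refine step, assuming the two sub-models are PyTorch encoders whose outputs are concatenated and passed through a joint head, with parameters adjusted until a loss threshold or step budget (standing in for the preset iteration exit condition) is reached. The architecture, loss, optimizer, and exit condition are illustrative assumptions, not the disclosed training procedure.

```python
import torch
import torch.nn as nn

class FusedMapModel(nn.Module):
    """Initial map model obtained by fusing two sub-models via output concatenation."""

    def __init__(self, text_coord_model: nn.Module, text_sign_model: nn.Module, dim: int = 128):
        super().__init__()
        self.text_coord_model = text_coord_model   # first sub-model (text + coordinate vector)
        self.text_sign_model = text_sign_model     # second sub-model (text + signboard image)
        self.joint_head = nn.Linear(2 * dim, dim)

    def forward(self, text_coord_feat, text_sign_feat):
        fused = torch.cat([self.text_coord_model(text_coord_feat),
                           self.text_sign_model(text_sign_feat)], dim=-1)
        return self.joint_head(fused)

def refine(model: FusedMapModel, batches, loss_fn, exit_loss: float = 0.01, max_steps: int = 1000):
    """Adjust parameters until an exit condition (loss threshold or step budget) is met."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step, (tc, ts, target) in enumerate(batches):
        loss = loss_fn(model(tc, ts), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < exit_loss or step >= max_steps:
            break
    return model  # output as the target map model

# Illustrative usage with dummy linear sub-models and random data.
sub1, sub2 = nn.Linear(16, 128), nn.Linear(16, 128)
batches = [(torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 128)) for _ in range(3)]
target_map_model = refine(FusedMapModel(sub1, sub2), batches, nn.MSELoss())
```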
As shown in FIG. 8, the positioning apparatus 800 of this embodiment may include: a positioning image and current position acquisition unit 801, an actual coordinate vector expression and actual signboard image expression determination unit 802, a shooting position text expression determination unit 803, a candidate text expression sequence determination unit 804, a presentation priority ranking adjustment unit 805, and an actual position determination unit 806. The positioning image and current position acquisition unit 801 is configured to acquire a positioning image obtained by photographing the signboard of a target building, and a current position; the actual coordinate vector expression and actual signboard image expression determination unit 802 is configured to determine an actual coordinate vector expression according to the current position and determine an actual signboard image expression according to the positioning image; the shooting position text expression determination unit 803 is configured to call a target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression; the candidate text expression sequence determination unit 804 is configured to call the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression; the presentation priority ranking adjustment unit 805 is configured to adjust the presentation priority ranking of each candidate text expression in the sequence based on the distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and the actual position determination unit 806 is configured to locate the actual position of the target building based on the candidate text expression sequence whose presentation priority has been adjusted. The target map model is obtained by the apparatus 700 for training a target map model.
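To make the priority-adjustment step of unit 805 concrete, the sketch below reranks candidate text expressions by their distance to the shooting position, assuming each candidate can be resolved to coordinates; whether the distance is geographic (as here, via the haversine formula) or computed in some embedding space, as well as the resolver, are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def rerank_candidates(shooting_position: Tuple[float, float],
                      candidates: List[str],
                      coords_of: Dict[str, Tuple[float, float]]) -> List[str]:
    """Adjust presentation priority: the nearer a candidate is to the shooting position, the earlier it appears."""
    return sorted(candidates, key=lambda name: haversine_km(shooting_position, coords_of[name]))

coords_of = {"XX Bookstore (West Gate)": (39.9080, 116.3920), "XX Bookstore (East Branch)": (39.9500, 116.4600)}
ranked = rerank_candidates((39.9078, 116.3918), list(coords_of), coords_of)
print(ranked[0])  # the candidate closest to the shooting position is presented first
```

Ranking nearer candidates first reflects the rationale of the method: the building actually photographed is expected to lie close to where the user is standing.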
In this embodiment, for the specific processing of the positioning image and current position acquisition unit 801, the actual coordinate vector expression and actual signboard image expression determination unit 802, the shooting position text expression determination unit 803, the candidate text expression sequence determination unit 804, the presentation priority ranking adjustment unit 805, and the actual position determination unit 806 in the positioning apparatus 800, and for the technical effects brought about thereby, reference may be made to the corresponding descriptions in the method embodiments, which are not repeated here.
These embodiments exist as apparatus embodiments corresponding to the foregoing method embodiments. The apparatus for training a target map model and the positioning apparatus provided by these embodiments perform training not only on the text expressions of map location points but additionally introduce coordinate vector expressions and signboard image expressions. A model pre-trained across these multiple dimensions makes full use of the spatio-temporal big data of the map field, so that the information contained in the pre-trained model is more closely associated with the real world; in practical applications, it can therefore be better combined with the user's current position and the captured positioning image to produce more accurate positioning results.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to implement the method for training a target map model and/or the positioning method described in any of the foregoing embodiments.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to implement the method for training a target map model and/or the positioning method described in any of the foregoing embodiments.
An embodiment of the present disclosure provides a computer program product which, when executed by a processor, implements the method for training a target map model and/or the positioning method described in any of the foregoing embodiments.
FIG. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 may also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A plurality of components in the device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard or a mouse; an output unit 907, such as various types of displays and speakers; a storage unit 908, such as a magnetic disk or an optical disc; and a communication unit 909, such as a network card, a modem, or a wireless communication transceiver. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 901 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and the like. The computing unit 901 performs the methods and processes described above, such as the method for training a target map model and/or the positioning method. For example, in some embodiments, the method for training a target map model and/or the positioning method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the method for training a target map model and/or the positioning method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured in any other appropriate manner (for example, by means of firmware) to perform the method for training a target map model and/or the positioning method.
Various implementations of the systems and techniques described above herein may be implemented in a digital electronic circuit system, an integrated circuit system, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, where the programmable processor may be a special-purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input apparatus, and at least one output apparatus.
Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatuses, so that when the program codes are executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program codes may be executed entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More specific examples of the machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer that has: a display apparatus (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be used to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or haptic feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a backend component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a frontend component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend, middleware, or frontend components. The components of the system may be interconnected by digital data communication (for example, a communication network) in any form or medium. Examples of the communication network include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship between the client and the server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or a cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical host and virtual private server (VPS) services.
In the technical solutions of the embodiments of the present disclosure, training is based not only on the text expressions of map location points but additionally introduces coordinate vector expressions and signboard image expressions, so that a model pre-trained across multiple dimensions makes full use of the spatio-temporal big data of the map field and the information contained in the pre-trained model is more closely associated with the real world; in practical applications, it can therefore be better combined with the user's current position and the captured positioning image to produce more accurate positioning results.
It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, which is not limited herein.
The foregoing specific implementations do not constitute a limitation on the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (17)

  1. A method for training a target map model, comprising:
    acquiring a text expression, a coordinate vector expression, and a signboard image expression of each map location point;
    training a first sub-model according to a first training sample composed of a text expression and a coordinate vector expression corresponding to the same map location point;
    training a second sub-model according to a second training sample composed of a text expression and a signboard image expression corresponding to the same map location point; and
    fusing the first sub-model and the second sub-model to obtain the target map model.
  2. The method according to claim 1, wherein acquiring the coordinate vector expression of each map location point comprises:
    acquiring a boundary coordinate sequence of each map location point;
    calculating, by using a geocoding algorithm and the boundary coordinate sequence, a geocode set covering the geographic block in which the corresponding map location point is located;
    converting the geocode set containing the geocodes into a geographic string; and
    converting the geographic string into a geographic vector, and taking the geographic vector as the coordinate vector expression of the corresponding map location point.
  3. The method according to claim 2, wherein converting the geographic string into a geographic vector comprises:
    inputting the geographic string into a preset vector expression conversion model, wherein the vector expression conversion model is used to characterize the correspondence between geographic strings and geographic vectors; and
    receiving the geographic vector output by the vector expression conversion model.
  4. The method according to claim 1, wherein acquiring the signboard image expression of each map location point comprises:
    acquiring a signboard image of the building corresponding to each map location point;
    recognizing a character part from the signboard image, and cutting out a character image corresponding to each character of the character part; and
    arranging the character images in the order of the characters in the character part, and taking the resulting character image queue as the signboard image expression of the corresponding map location point.
  5. The method according to claim 4, wherein, before recognizing the character part from the signboard image, the method further comprises:
    performing image anomaly recognition on the signboard image, wherein the image anomaly recognition comprises at least one of blur recognition, noise recognition, and skew recognition;
    correspondingly, recognizing the character part from the signboard image comprises:
    recognizing the character part only from signboard images recognized as non-anomalous images.
  6. The method according to any one of claims 1-5, wherein fusing the first sub-model and the second sub-model to obtain the target map model comprises:
    fusing the first sub-model and the second sub-model to obtain an initial map model; and
    adjusting parameters of the initial map model until a preset iteration exit condition is met, and outputting the initial map model that meets the iteration exit condition as the target map model.
  7. A positioning method, comprising:
    acquiring a positioning image obtained by photographing a signboard of a target building, and a current position;
    determining an actual coordinate vector expression according to the current position, and determining an actual signboard image expression according to the positioning image;
    calling a target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression, wherein the target map model is obtained according to the method for training a target map model according to any one of claims 1-6;
    calling the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression;
    adjusting a presentation priority ranking of each candidate text expression in the sequence based on a distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and
    locating an actual position of the target building based on the candidate text expression sequence whose presentation priority has been adjusted.
  8. An apparatus for training a target map model, comprising:
    a parameter acquisition unit, configured to acquire a text expression, a coordinate vector expression, and a signboard image expression of each map location point;
    a first sub-model training unit, configured to train a first sub-model according to a first training sample composed of a text expression and a coordinate vector expression corresponding to the same map location point;
    a second sub-model training unit, configured to train a second sub-model according to a second training sample composed of a text expression and a signboard image expression corresponding to the same map location point; and
    a sub-model fusion unit, configured to fuse the first sub-model and the second sub-model to obtain the target map model.
  9. The apparatus according to claim 8, wherein the parameter acquisition unit comprises a coordinate vector expression acquisition subunit configured to acquire the coordinate vector expression of each map location point, and the coordinate vector expression acquisition subunit comprises:
    a boundary coordinate sequence acquisition module, configured to acquire a boundary coordinate sequence of each map location point;
    a geocode set calculation module, configured to calculate, by using a geocoding algorithm and the boundary coordinate sequence, a geocode set covering the geographic block in which the corresponding map location point is located;
    a geographic string conversion module, configured to convert the geocode set containing the geocodes into a geographic string; and
    a geographic vector conversion module, configured to convert the geographic string into a geographic vector and take the geographic vector as the coordinate vector expression of the corresponding map location point.
  10. The apparatus according to claim 9, wherein the geographic vector conversion module is further configured to:
    input the geographic string into a preset vector expression conversion model, wherein the vector expression conversion model is used to characterize the correspondence between geographic strings and geographic vectors; and
    receive the geographic vector output by the vector expression conversion model.
  11. The apparatus according to claim 8, wherein the parameter acquisition unit comprises a signboard image expression acquisition subunit configured to acquire the signboard image expression of each map location point, and the signboard image expression acquisition subunit comprises:
    a signboard image acquisition module, configured to acquire a signboard image of the building corresponding to each map location point;
    a character recognition and character image cutting module, configured to recognize a character part from the signboard image and cut out a character image corresponding to each character of the character part; and
    a character image sorting module, configured to arrange the character images in the order of the characters in the character part and take the resulting character image queue as the signboard image expression of the corresponding map location point.
  12. The apparatus according to claim 11, wherein the signboard image expression acquisition subunit further comprises:
    an anomaly recognition module, configured to perform image anomaly recognition on the signboard image before the character part is recognized from the signboard image, wherein the image anomaly recognition comprises at least one of blur recognition, noise recognition, and skew recognition;
    correspondingly, the character recognition sub-module in the character recognition and character image cutting module is further configured to:
    recognize the character part only from signboard images recognized as non-anomalous images.
  13. The apparatus according to any one of claims 8-12, wherein the sub-model fusion unit is further configured to:
    fuse the first sub-model and the second sub-model to obtain an initial map model; and
    adjust parameters of the initial map model until a preset iteration exit condition is met, and output the initial map model that meets the iteration exit condition as the target map model.
  14. A positioning apparatus, comprising:
    a positioning image and current position acquisition unit, configured to acquire a positioning image obtained by photographing a signboard of a target building, and a current position;
    an actual coordinate vector expression and actual signboard image expression determination unit, configured to determine an actual coordinate vector expression according to the current position and determine an actual signboard image expression according to the positioning image;
    a shooting position text expression determination unit, configured to call a target map model to determine a shooting position text expression corresponding to the actual coordinate vector expression, wherein the target map model is obtained by the apparatus for training a target map model according to any one of claims 8-13;
    a candidate text expression sequence determination unit, configured to call the target map model to determine a candidate text expression sequence corresponding to the actual signboard image expression;
    a presentation priority ranking adjustment unit, configured to adjust a presentation priority ranking of each candidate text expression in the sequence based on a distance between the shooting position text expression and each candidate text expression in the candidate text expression sequence; and
    an actual position determination unit, configured to locate an actual position of the target building based on the candidate text expression sequence whose presentation priority has been adjusted.
  15. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for training a target map model according to any one of claims 1-6 and/or the positioning method according to claim 7.
  16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method for training a target map model according to any one of claims 1-6 and/or the positioning method according to claim 7.
  17. A computer program product, comprising a computer program which, when executed by a processor, implements the steps of the method for training a target map model according to any one of claims 1-6 and/or the steps of the positioning method according to claim 7.
PCT/CN2022/104939 2021-10-18 2022-07-11 Method for training target map model, positioning method, and related apparatuses WO2023065731A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111211145.2A CN113947147B (en) 2021-10-18 2021-10-18 Training method, positioning method and related device of target map model
CN202111211145.2 2021-10-18

Publications (1)

Publication Number Publication Date
WO2023065731A1 true WO2023065731A1 (en) 2023-04-27

Family

ID=79331241

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104939 WO2023065731A1 (en) 2021-10-18 2022-07-11 Method for training target map model, positioning method, and related apparatuses

Country Status (2)

Country Link
CN (1) CN113947147B (en)
WO (1) WO2023065731A1 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947147B (en) * 2021-10-18 2023-04-18 北京百度网讯科技有限公司 Training method, positioning method and related device of target map model
CN114842464A (en) * 2022-05-13 2022-08-02 北京百度网讯科技有限公司 Image direction recognition method, device, equipment, storage medium and program product
CN114926655B (en) * 2022-05-20 2023-09-26 北京百度网讯科技有限公司 Training method and position determining method of geographic and visual cross-mode pre-training model
CN114998684B (en) * 2022-05-20 2023-06-23 北京百度网讯科技有限公司 Training method and positioning adjustment method for geographic and visual cross-mode pre-training model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170004374A1 (en) * 2015-06-30 2017-01-05 Yahoo! Inc. Methods and systems for detecting and recognizing text from images
CN111461203A (en) * 2020-03-30 2020-07-28 北京百度网讯科技有限公司 Cross-modal processing method and device, electronic equipment and computer storage medium
CN111737383A (en) * 2020-05-21 2020-10-02 百度在线网络技术(北京)有限公司 Method for extracting spatial relation of geographic position points and method and device for training extraction model
CN112633380A (en) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Interest point feature extraction method and device, electronic equipment and storage medium
CN113947147A (en) * 2021-10-18 2022-01-18 北京百度网讯科技有限公司 Training method and positioning method of target map model and related devices

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052962B (en) * 2021-04-02 2022-08-19 北京百度网讯科技有限公司 Model training method, information output method, device, equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185908A (en) * 2021-12-13 2022-03-15 北京百度网讯科技有限公司 Map data processing method and device, electronic equipment and storage medium
CN114185908B (en) * 2021-12-13 2024-02-06 北京百度网讯科技有限公司 Map data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113947147A (en) 2022-01-18
CN113947147B (en) 2023-04-18
