WO2022198854A1 - Method and apparatus for extracting multi-modal POI feature - Google Patents

Publication number: WO2022198854A1
Application number: PCT/CN2021/107383
Authority: WIPO (PCT)
Prior art keywords: poi, feature representation, feature, sample, image
Other languages: French (fr), Chinese (zh)
Inventors: Fan Miao (范淼), Huang Jizhou (黄际洲), Wang Haifeng (王海峰)
Original assignee: Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)
Priority applications: KR1020227044369A (published as KR20230005408A), JP2022576469A (published as JP2023529939A)

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F16/00 Information retrieval, database structures and file system structures therefor
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/29 Geographical information databases
    • G06F16/908 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/909 Retrieval characterised by using metadata using geographical or spatial information, e.g. location
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • the present disclosure relates to the field of computer application technology, and in particular to big data technology in the field of artificial intelligence.
  • POI: Point of Interest.
  • the number of POIs represents the value of the entire system to a certain extent.
  • Comprehensive POI information is the necessary information to enrich the map information system.
  • each POI includes at least multiple modal information, such as name, coordinates, and image.
  • the digital medium and presentation of this information vary. For example, names are generally text in a certain language, coordinates are generally numbers in at least two dimensions, and images are in the form of images. Therefore, a multimodal POI refers to a physical entity described by multiple digital media.
  • the POI information is stored in a relational database.
  • the POI information needs to be queried from the relational database. This requires the ability to quickly calculate the similarity between multi-modal POIs; since similarity calculation is based on POI features, how to extract POI features becomes the key.
  • the present disclosure provides a method and apparatus for extracting multimodal POI features.
  • a method for extracting multimodal POI features including:
  • extracting a visual feature representation of the POI from an image of the POI by using an image feature extraction model; extracting a semantic feature representation from text information of the POI by using a text feature extraction model; extracting a spatial feature representation from spatial location information of the POI by using a spatial feature extraction model; and fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multi-modal feature representation of the POI.
  • a device for extracting multimodal POI features comprising:
  • a visual feature extraction module for extracting the visual feature representation of the POI from the image of the POI by using an image feature extraction model
  • a semantic feature extraction module for extracting semantic feature representation from the text information of the POI by using a text feature extraction model
  • a spatial feature extraction module for extracting a spatial feature representation from the spatial location information of the POI by using a spatial feature extraction model
  • the feature fusion module is used for fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multi-modal feature representation of the POI.
  • an electronic device comprising:
  • at least one processor; and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method as described above.
  • a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
  • the embodiments of the present disclosure provide a method to extract feature vector representations of multiple modal fusions for each POI, thereby providing a basis for subsequent similarity calculation between POIs.
  • FIG. 1 is a flowchart of a method for extracting multimodal POI features provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a training image feature extraction model provided by an embodiment of the present disclosure
  • FIG. 3 is a training flow chart of a fully connected network provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an apparatus for extracting multi-modal POI features provided by an embodiment of the present disclosure
  • FIG. 5 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
  • the similarity calculation is usually performed separately on the images of two POIs, on their names, and on their coordinates. That is to say, the similarity of the features of each modality must be calculated independently, which is computationally complex and time-consuming.
  • the core idea of the present disclosure is to extract feature representations fused by multiple modalities for each POI, so as to provide a basis for subsequent similarity calculation between POIs. The method provided by the present disclosure will be described in detail below with reference to the embodiments.
  • FIG. 1 is a flowchart of a method for extracting multi-modal POI features according to an embodiment of the present disclosure, and the execution body of the method is an apparatus for extracting multi-modal POI features.
  • the device can be embodied as an application located on the server side, or as a functional unit such as a plug-in or a software development kit (SDK) within an application located on the server side, or can also be located in a terminal with strong computing power, which is not particularly limited in this embodiment of the present disclosure.
  • the method may include the following steps:
  • In step 101, a visual feature representation of the POI is extracted from the image of the POI using an image feature extraction model.
  • In step 102, a semantic feature representation is extracted from the text information of the POI using a text feature extraction model.
  • In step 103, a spatial feature representation is extracted from the spatial location information of the POI using a spatial feature extraction model.
  • In step 104, the visual feature representation, semantic feature representation and spatial feature representation of the POI are fused to obtain a multimodal feature representation of the POI.
  • Steps 101 to 103 shown in the above embodiment follow only one possible execution order; they may also be executed sequentially in other orders, or in parallel.
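Taken together, steps 101 to 104 can be sketched as a simple pipeline. Everything below is illustrative: the extractor functions, vector dimensions, and plain list concatenation are stand-ins for the trained models described later, not the patent's actual implementation.

```python
def extract_visual(image_pixels):
    # stand-in for the image feature extraction model (step 101)
    return [sum(image_pixels) / len(image_pixels)] * 4

def extract_semantic(text):
    # stand-in for the text feature extraction model (step 102)
    return [float(len(text))] * 3

def extract_spatial(lat, lon):
    # stand-in for the spatial feature extraction model (step 103)
    return [lat, lon]

def extract_multimodal_poi_feature(image_pixels, text, lat, lon):
    # step 104: fuse (here, simply concatenate) the three representations
    return (extract_visual(image_pixels)
            + extract_semantic(text)
            + extract_spatial(lat, lon))

feature = extract_multimodal_poi_feature([0.2, 0.4, 0.6], "Baidu Building", 40.056, 116.308)
```

The result is a single fused vector per POI, which is what makes a single similarity computation between two POIs possible.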
  • step 101 ie "using an image feature extraction model to extract a visual feature representation of POI from an image containing a POI signboard" will be described in detail.
  • the image in the POI information is usually the image containing the POI signboard.
  • For example, a real-scene picture of a store contains the store's signboard, which usually shows the store's name and sometimes also its slogan.
  • As another example, a real-scene picture of a building contains the building's signboard, which usually shows the building's name.
  • As another example, a real-scene picture of a school contains the school's signboard, which usually shows the school's name.
  • a visual feature representation can be extracted from an image containing the shape of the main body of the building. Images of these POIs can be obtained from the POI database.
  • this step may specifically include the following steps S11 to S12:
  • step S11 the signboard area is extracted from the image containing the POI signboard using the object detection technique.
  • a pre-trained signboard discrimination model can be used. The real-scene image is first segmented into regions; since a signboard is generally a closed area in the image, the closed areas can be identified and segmented out. Each identified closed area is then input into the signboard discrimination model, which outputs a judgment of whether that closed area is a signboard area.
  • the signboard discrimination model is actually a classification model. Some real-scene images can be collected in advance, with signboard areas and non-signboard areas annotated as positive and negative samples respectively; the signboard discrimination model is then obtained by training the classification model on these samples.
  • step S12 the visual feature representation of POI is extracted from the signboard area by using the image feature extraction model obtained by pre-training.
  • the image feature extraction model can be pre-trained based on a deep neural network. After the signboard area is input into the image feature extraction model, the image feature extraction model extracts the visual feature representation of POI from the signboard area.
  • Training samples can be obtained first.
  • the training sample used for training the image feature extraction model is referred to as the first training sample.
  • the expressions such as “first” and “second” involved in the present disclosure do not have a limiting effect on quantity, order, size, etc., but are only used to distinguish names.
  • the above-mentioned first training samples include image samples and category labels of the image samples.
  • the category annotation can be the object depicted in the image; for example, an image containing a cat is annotated as "cat", and an image containing a dog is annotated as "dog".
  • the category annotation can also be the category of the object represented by the image. For example, an image containing a specific hospital is marked as a hospital, and an image containing a specific school is marked as a school.
  • the image samples are then used as the input of the deep neural network, as shown in Figure 2, and the category annotations of the image samples are used as the target output of the classification network.
  • two networks are involved, that is, a deep neural network and a classification network.
  • the deep neural network extracts the visual feature representation from the image sample and then inputs it into the classification network, and the classification network outputs the classification result of the image sample according to the visual feature representation.
  • the training objective is to minimize the difference between the classification results output by the classification network and the corresponding class labels.
  • after training, the trained deep neural network is taken as the image feature extraction model. That is to say, both a deep neural network and a classification network are used during training, but the final image feature extraction model consists only of the deep neural network; the classification network merely assists the training of the deep neural network.
  • the deep neural network used in the above training process can be, but is not limited to, ResNet (Residual Network) 50, ResNet101, EfficientNet, and the like.
  • the loss function adopted by the classification network can be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, etc.
  • step 102 ie "using a text feature extraction model to extract semantic feature representation from text information of POI" will be described in detail.
  • the text information of the POI involved in this step may be text information obtained from the POI database, such as the POI name, description information, and evaluation information. It may also be text information of the POI recognized, using text recognition technology, from the image containing the POI signboard. That is, after the signboard area is identified from the image containing the POI signboard, OCR (Optical Character Recognition) is used to recognize text from the signboard area, such as the POI name and advertising slogans, as the text information of the POI.
  • the text feature extraction model utilized in this step can adopt, but is not limited to, the following:
  • the first is the Word Embedding model.
  • Word Embedding models such as Word2Vec (word vectors), GloVe, etc. can be used.
  • the second is a pre-trained language model.
  • pre-trained language models such as BERT (Bidirectional Encoder Representations from Transformers) and ERNIE (Enhanced Representation through kNowledge IntEgration, which uses entity information to enhance language representation) can be used.
  • the third is to use the existing POI text data to fine-tune the pre-trained language model.
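A minimal sketch of the first option (Word Embedding): represent POI text as the average of its word vectors. The tiny vector table and three dimensions are invented for illustration; a real system would load pre-trained Word2Vec or GloVe vectors.

```python
# toy word-vector table (hypothetical values; real systems load Word2Vec/GloVe)
EMBED = {
    "coffee": [0.9, 0.1, 0.0],
    "shop":   [0.7, 0.2, 0.1],
    "school": [0.0, 0.1, 0.9],
}

def text_feature(text, dim=3):
    # average the word vectors; unknown words map to zero vectors
    vecs = [EMBED.get(w, [0.0] * dim) for w in text.lower().split()]
    if not vecs:
        return [0.0] * dim
    return [sum(component) / len(vecs) for component in zip(*vecs)]

f = text_feature("Coffee Shop")
```

Averaging is the simplest pooling choice; a pre-trained language model (the second and third options) would instead produce a contextual sentence representation.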
  • step 103 that is, "using a spatial feature extraction model to extract a spatial feature representation from the spatial location information of the POI" will be described in detail below.
  • the spatial position information of the POI involved in this step mainly refers to the information for marking the spatial position of the POI in a certain form, such as coordinate information.
  • the spatial feature representation can be extracted directly from the spatial location information of the POI by using the spatial feature extraction model.
  • the present disclosure provides a preferred embodiment, which may specifically include the following steps S21 to S22:
  • step S21 hash coding is performed on the spatial location information of the POI to obtain a hash code.
  • geohash: a latitude and longitude address encoding.
  • geohash uses a string to represent a pair of longitude and latitude coordinates. After geohash encoding, the hash codes of two coordinates located in the same block share the same leading characters and differ only in the last few characters.
  • step S22 the hash code is converted into a spatial feature representation using a spatial feature extraction model.
  • the spatial feature extraction model used in this step can use the Word Embedding model, that is, the hash code is converted into a quantifiable spatial feature representation by this embedding method.
  • a similarity task can be used to further train the model.
  • the training target is: the closer the positions of two POIs, the higher the similarity between the spatial feature representations output by the Word Embedding model.
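Step S21 can be illustrated with a minimal geohash encoder (a sketch of the standard algorithm, not the patent's code; the Word Embedding conversion of step S22 is omitted). Each base-32 character refines the previous cell, so nearby coordinates share a common prefix:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=11):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, bits, bit_count = [], 0, 0
    even = True  # even bit positions refine longitude, odd ones latitude
    while len(code) < precision:
        rng = lon_range if even else lat_range
        val = lon if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:          # every 5 bits become one base-32 character
            code.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(code)

h = geohash_encode(57.64911, 10.40744)   # a commonly cited test point
```

Because `geohash_encode(lat, lon, 6)` is always a prefix of `geohash_encode(lat, lon, 11)`, coordinates in the same block agree on their leading characters, which is exactly the property the embedding step exploits.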
  • step 104 namely, "merging the visual feature representation, semantic feature representation and spatial feature representation of POI to obtain a multi-modal feature representation of POI" will be described in detail below.
  • the visual feature representation, semantic feature representation and spatial feature representation of the above POI can be directly spliced, and the spliced feature can be used as the multi-modal feature representation of the POI.
  • this method is relatively rigid and lacks learning ability, so its expressiveness is naturally limited.
  • the present disclosure provides a preferred fusion method, which may specifically include the following steps S31 to S32:
  • step S31 the visual feature representation, semantic feature representation and spatial feature representation of the POI are spliced to obtain spliced features.
  • the visual feature representation, the semantic feature representation and the spatial feature representation can be spliced end to end in a preset order.
  • when the dimensions of the feature vectors differ, a preset value such as 0 can be used to pad them.
  • step S32 the spliced feature is input into a pre-trained fully connected network, and the multimodal feature representation of the POI output by the fully connected network is obtained.
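Steps S31 and S32 amount to a concatenation followed by one learned layer. In this sketch the per-modality vectors and the fully connected weights are random stand-ins (in practice the weights come from the training procedure below), and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# per-modality representations (illustrative dimensions)
visual = rng.normal(size=4)    # from the image feature extraction model
semantic = rng.normal(size=3)  # from the text feature extraction model
spatial = rng.normal(size=2)   # from the spatial feature extraction model

# step S31: splice the three representations end to end in a preset order
spliced = np.concatenate([visual, semantic, spatial])   # 9-dim vector

# step S32: a fully connected layer (random here, pre-trained in practice)
W = rng.normal(size=(9, 6))
b = np.zeros(6)
multimodal = np.maximum(spliced @ W + b, 0.0)           # ReLU activation
```

Unlike raw concatenation, the fully connected layer has trainable weights, which is what gives this fusion method its learning ability.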
  • the process may include the following steps:
  • a second training sample is obtained, where the second training sample includes POI samples and category labels for the POI samples.
  • Some POIs with image, text and spatial location information can be obtained in advance as POI samples, and the categories of these POIs can be annotated.
  • the labels are hospitals, buildings, schools, bus stops, shops, etc. These POI samples and their category labels are used as second training samples to train the fully connected network used in feature fusion.
  • a visual feature representation of the POI sample is extracted from the image of the POI sample using an image feature extraction model.
  • a textual feature extraction model is used to extract semantic feature representations from the textual information of the POI samples.
  • a spatial feature representation is extracted from the spatial location information of the POI samples using a spatial feature extraction model.
  • Similarly, steps 302 to 304 follow only one possible execution order; they may also be executed sequentially in other orders, or in parallel.
  • the visual feature representation, the semantic feature representation and the spatial feature representation of the POI samples are spliced to obtain splicing features of the POI samples.
  • the spliced feature of the POI sample is input into the fully connected network, and the multimodal feature representation of the POI sample output by the fully connected network is obtained; the multimodal feature representation is input into the classification network, and the category label of the POI sample is used as the target output of the classification network to train the fully connected network and the classification network.
  • the loss function used by the classification network can be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, etc.
  • the model parameters of the image feature extraction model, the text feature extraction model and the spatial feature extraction model can remain unchanged, or can be updated in the above training process.
  • a multimodal feature representation is obtained for each POI in the manner of the above method embodiment, and the multimodal feature representation of each POI can be stored in a database.
  • the multimodal feature representation of POIs can be used to calculate the similarity between POIs. Specific application scenarios may include, but are not limited to, automated POI production, intelligent retrieval, recommendation, and so on.
  • a collector or a collection device shoots an image containing a POI signboard, and saves the POI's image, name, coordinates and other information.
  • the massive POI data collected historically is extracted and stored in a database using the method in the above-mentioned embodiments of the present disclosure.
  • distributed Redis can be used as the feature library for the multi-modal feature representations.
  • the storage structure can take the form of key (key)-value (value) pairs.
  • for a newly collected POI, the method in the above-mentioned embodiments of the present disclosure is also used to extract its multi-modal feature representation, which is then used for retrieval and matching in the feature database, for example with retrieval methods such as NN (Nearest Neighbor) and ANN (Approximate Nearest Neighbor).
  • the retrieval process is based on calculating the similarity between the multimodal feature representation of the newly collected POI and the multimodal feature representations of existing POIs in the database, so as to judge whether the newly collected POI data belongs to an existing POI. POI data that is not retrieved and matched, or that cannot be processed automatically due to unrecognized text, insufficient image clarity, wrong coordinates, and so on, is submitted to a manual operation platform for processing.
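The storage-and-matching flow above can be sketched as follows. The POI ids, vectors, and 0.9 threshold are invented for illustration, and the exhaustive cosine-similarity scan is a stand-in for the Redis-backed NN/ANN retrieval described above.

```python
import math

# toy feature library keyed by POI id (stand-in for the Redis key-value store)
feature_db = {
    "poi:coffee_shop_001": [0.9, 0.1, 0.3],
    "poi:school_017":      [0.1, 0.9, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def match(new_feature, threshold=0.9):
    # exhaustive nearest-neighbour scan; production systems use ANN indexes
    best_id = max(feature_db, key=lambda k: cosine(new_feature, feature_db[k]))
    best_sim = cosine(new_feature, feature_db[best_id])
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

poi_id, sim = match([0.88, 0.12, 0.28])  # a newly collected POI's fused feature
```

When `match` returns `None`, the new record would go to the manual operation platform, mirroring the fallback described above.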
  • FIG. 4 is a schematic diagram of an apparatus for extracting multi-modal POI features provided by an embodiment of the present disclosure.
  • the apparatus may include: a visual feature extraction module 401 , a semantic feature extraction module 402 , a spatial feature extraction module 403 and
  • the feature fusion module 404 may further include a first model training unit 405 , a text acquisition unit 406 , a second model training unit 407 and a similarity calculation unit 408 .
  • the main functions of each unit are as follows:
  • the visual feature extraction module 401 is used for extracting the visual feature representation of the POI from the image of the POI by using the image feature extraction model.
  • the semantic feature extraction module 402 is used for extracting the semantic feature representation from the text information of the POI by using the text feature extraction model.
  • the spatial feature extraction module 403 is used for extracting the spatial feature representation from the spatial location information of the POI by using the spatial feature extraction model.
  • the feature fusion module 404 is used to fuse the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multimodal feature representation of the POI.
  • the visual feature extraction module 401 can use the target detection technology to extract the signboard area from the image containing the POI signboard; use the image feature extraction model obtained by pre-training to extract the visual feature representation of the POI from the signboard area.
  • the first model training unit 405 is used to pre-train the image feature extraction model in the following manner: obtain first training samples, each including an image sample and a category label for the image sample; take the image sample as the input of a deep neural network and the category label of the image sample as the target output of a classification network, and train the deep neural network and the classification network; the deep neural network extracts a visual feature representation from the image sample and inputs it into the classification network, and the classification network outputs the classification result of the image sample according to the visual feature representation.
  • after training, the trained deep neural network is taken as the image feature extraction model.
  • the text obtaining unit 406 is configured to obtain the text information of the POI from the POI database; and/or, use the text recognition technology to recognize and obtain the text information of the POI from the image containing the POI signboard.
  • the text feature extraction model may include, but is not limited to: a Word Embedding model, a pre-trained language model, or a model obtained by fine-tuning the pre-trained language model with existing POI text data.
  • the spatial feature extraction module 403 is specifically configured to perform hash coding on the spatial location information of the POI to obtain a hash code; and use a spatial feature extraction model to convert the hash code into a spatial feature representation.
  • the spatial feature extraction model may include the Word Embedding model.
  • the feature fusion module 404 can be specifically used to splice the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a spliced feature, and to input the spliced feature into a pre-trained fully connected network to obtain the multimodal feature representation of the POI output by the fully connected network.
  • the second model training unit 407 is used for pre-training to obtain a fully connected network in the following manner:
  • obtain second training samples, each including a POI sample and the category label of the POI sample; use the image feature extraction model to extract the visual feature representation of the POI sample from the image of the POI sample; use the text feature extraction model to extract the semantic feature representation from the text information of the POI sample; use the spatial feature extraction model to extract the spatial feature representation from the spatial location information of the POI sample; splice the visual feature representation, semantic feature representation and spatial feature representation of the POI sample to obtain the spliced feature of the POI sample; input the spliced feature of the POI sample into the fully connected network to obtain the multimodal feature representation of the POI sample output by the fully connected network; and input the multimodal feature representation into the classification network, using the category label of the POI sample as the target output of the classification network, to train the fully connected network and the classification network.
  • the similarity calculation unit 408 is configured to calculate the similarity between POIs based on the multimodal feature representation of POIs.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 5 it is a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500.
  • the computing unit 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disk; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • Computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, microcontroller, etc.
  • the computing unit 501 performs the various methods and processes described above, such as the extraction method of multimodal POI features.
  • the extraction method of multimodal POI features may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508 .
  • part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509.
  • When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for extracting multimodal POI features described above may be performed.
  • the computing unit 501 may be configured to perform the extraction method of multimodal POI features by any other suitable means (eg, by means of firmware).
  • Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on the remote machine or server.
  • A machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • The systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented on a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
  • A computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for extracting a multi-modal POI feature, which relate to big data technology in the field of artificial intelligence. The method comprises: extracting a visual feature representation of a POI from an image of the POI by using an image feature extraction model; extracting a semantic feature representation from text information of the POI by using a text feature extraction model; extracting a spatial feature representation from spatial position information of the POI by using a spatial feature extraction model; and fusing the visual feature representation, the semantic feature representation and the spatial feature representation of the POI, so as to obtain a multi-modal feature representation of the POI. By means of the method, a feature vector representation that fuses multiple modalities is extracted for each POI, thereby providing a basis for subsequent calculation of the similarity between POIs.

Description

Method and Apparatus for Extracting Multimodal POI Features

This application claims priority to Chinese patent application No. 202110312700.4, filed on March 24, 2021, and entitled "Method and Apparatus for Extracting Multimodal POI Features".
Technical Field

The present disclosure relates to the field of computer application technology, and in particular to big data technology in the field of artificial intelligence.

Background

A POI (Point of Interest) in a geographic information system may be an actually existing geographic entity such as a building, a shop, a school, or a bus station. For a geographic information system, the number of POIs represents, to a certain extent, the value of the entire system. Comprehensive POI information is essential for enriching a map information system. Generally speaking, each POI includes information of at least multiple modalities, such as a name, coordinates, and an image. The digital media and forms of expression of these kinds of information differ: a name is generally text in some language, coordinates are generally numbers in at least two dimensions, and an image is in image form. A multimodal POI therefore refers to a physical entity described by multiple kinds of digital media.

POI information is usually stored in a relational database, and in many application scenarios POI information needs to be queried from that database. This requires the ability to quickly compute the similarity between multimodal POIs, and similarity computation is in turn based on POI features, so how to extract POI features becomes the key problem.
Summary of the Invention

In view of this, the present disclosure provides a method and apparatus for extracting multimodal POI features.
According to a first aspect of the present disclosure, a method for extracting multimodal POI features is provided, including:

extracting a visual feature representation of a POI from an image of the POI by using an image feature extraction model;

extracting a semantic feature representation from text information of the POI by using a text feature extraction model;

extracting a spatial feature representation from spatial location information of the POI by using a spatial feature extraction model; and

fusing the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI to obtain a multimodal feature representation of the POI.
According to a second aspect of the present disclosure, an apparatus for extracting multimodal POI features is provided, including:

a visual feature extraction module configured to extract a visual feature representation of a POI from an image of the POI by using an image feature extraction model;

a semantic feature extraction module configured to extract a semantic feature representation from text information of the POI by using a text feature extraction model;

a spatial feature extraction module configured to extract a spatial feature representation from spatial location information of the POI by using a spatial feature extraction model; and

a feature fusion module configured to fuse the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI to obtain a multimodal feature representation of the POI.
According to a third aspect of the present disclosure, an electronic device is provided, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described above.
According to a fourth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method described above.

According to a fifth aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the method described above.

It can be seen from the above technical solutions that the embodiments of the present disclosure provide a method that extracts, for each POI, a feature vector representation fusing multiple modalities, thereby providing a basis for subsequent similarity computation between POIs.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Brief Description of the Drawings

The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure, wherein:
Fig. 1 is a flowchart of a method for extracting multimodal POI features provided by an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of training an image feature extraction model provided by an embodiment of the present disclosure;

Fig. 3 is a flowchart of training a fully connected network provided by an embodiment of the present disclosure;

Fig. 4 is a schematic diagram of an apparatus for extracting multimodal POI features provided by an embodiment of the present disclosure; and

Fig. 5 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

In existing traditional similarity computation approaches, similarity is usually computed separately for the images of two POIs, for the names of the two POIs, and for the coordinates of the two POIs. That is, similarity must be computed separately for the features of each modality, which is computationally complex and time-consuming. To address this problem, the core idea of the present disclosure is to extract, for each POI, a feature representation fusing multiple modalities, thereby providing a basis for subsequent similarity computation between POIs. The method provided by the present disclosure is described in detail below with reference to embodiments.
Fig. 1 is a flowchart of a method for extracting multimodal POI features provided by an embodiment of the present disclosure. The method is executed by an apparatus for extracting multimodal POI features. The apparatus may be embodied as an application located on a server side, or as a functional unit such as a plug-in or software development kit (SDK) in an application located on the server side, or may be located in a computer terminal with strong computing power, which is not particularly limited in this embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
In 101, a visual feature representation of a POI is extracted from an image of the POI by using an image feature extraction model.

In 102, a semantic feature representation is extracted from text information of the POI by using a text feature extraction model.

In 103, a spatial feature representation is extracted from spatial location information of the POI by using a spatial feature extraction model.

In 104, the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI are fused to obtain a multimodal feature representation of the POI.

Steps 101 to 103 shown in the above embodiment represent only one possible execution order; they may also be executed sequentially in other orders, or executed in parallel.
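The data flow of steps 101 to 104 can be sketched as follows. This is a minimal illustration only: the three extractor functions are stubs with assumed output dimensions (128/64/32), and fusion is shown as plain concatenation rather than the trained fully connected network described later.

```python
import numpy as np

# Stand-ins for the three extraction models of steps 101-103. The dimensions
# and constant outputs are illustrative assumptions, not the real models.
def extract_visual(image: np.ndarray) -> np.ndarray:
    return np.ones(128)   # visual feature representation (step 101)

def extract_semantic(text: str) -> np.ndarray:
    return np.ones(64)    # semantic feature representation (step 102)

def extract_spatial(lat: float, lng: float) -> np.ndarray:
    return np.ones(32)    # spatial feature representation (step 103)

def extract_multimodal_poi_feature(image, text, lat, lng) -> np.ndarray:
    v = extract_visual(image)
    s = extract_semantic(text)
    p = extract_spatial(lat, lng)
    # Step 104: fuse the three representations (here by simple concatenation).
    return np.concatenate([v, s, p])

feat = extract_multimodal_poi_feature(np.zeros((224, 224, 3)), "POI name", 39.9, 116.4)
print(feat.shape)  # (224,)
```

A single fused vector per POI is what makes downstream POI-to-POI similarity a single vector comparison instead of three per-modality comparisons.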
Each of the above steps is described in detail below with reference to embodiments. First, the above step 101, i.e., "extracting a visual feature representation of the POI from an image containing a POI signboard by using an image feature extraction model", is described in detail.

The image in the POI information is usually an image containing a POI signboard. For example, a real-scene photo of a shop contains the shop's signboard, which usually includes the shop's name and sometimes its advertising slogan. Similarly, a real-scene photo of a building contains the building's signboard, which is usually the building's name, and a real-scene photo of a school contains the school's signboard, which is usually the school's name. Such images containing POI signboards are highly distinctive within the POI information, so as a preferred embodiment, the present disclosure may extract the visual feature representation of the POI from an image containing the POI signboard.

In addition to extracting the visual feature representation of the POI from an image containing a POI signboard, it may also be extracted from other types of POI images. For example, for a building-type POI with a distinctive shape, the visual feature representation may be extracted from an image containing the shape of the building's main body. These POI images can be obtained from a POI database.
As one preferred embodiment, this step may specifically include the following steps S11 to S12.

In step S11, a signboard region is extracted from the image containing the POI signboard by using an object detection technique.

In this step, object detection techniques such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or Faster R-CNN (Faster Region-based Convolutional Neural Networks) may be used to identify the signboard region in the image containing the POI signboard; on top of these object detection techniques, optimizations such as FPN (Feature Pyramid Networks) may further be combined. These object detection methods are relatively mature technologies and are not described in detail here.
In addition to object detection techniques, other approaches may be used to extract the signboard region. For example, a pre-trained signboard discrimination model may be used. The real-scene image is first divided into regions: since a signboard in a real-scene image is generally a closed region, closed regions can be identified and segmented in the image, each identified closed region is input into the signboard discrimination model, and the model outputs a judgment of whether the closed region is a signboard region.

The signboard discrimination model is actually a classification model. Some real-scene images can be collected in advance, with signboard regions and non-signboard regions annotated as positive and negative samples respectively, and the classification model is then trained to obtain the signboard discrimination model.
In step S12, the visual feature representation of the POI is extracted from the signboard region by using a pre-trained image feature extraction model.

The image feature extraction model may be pre-trained based on a deep neural network. After the signboard region is input into the image feature extraction model, the model extracts the visual feature representation of the POI from the signboard region.
The training process of the image feature extraction model is described below. Training samples are obtained first. In this embodiment, the training samples used to train the image feature extraction model are referred to as first training samples. It should be noted that expressions such as "first" and "second" in the present disclosure do not imply any limitation on quantity, order, or size, and are used only to distinguish names.

The first training samples include image samples and category labels of the image samples. A category label may be the object depicted in the image; for example, an image containing a cat is labeled as cat, and an image containing a dog is labeled as dog. A category label may also be the type of the depicted object; for example, an image containing a specific hospital is labeled as hospital, and an image containing a specific school is labeled as school.

The image samples are then used as the input of a deep neural network, and, as shown in Fig. 2, the category labels of the image samples are used as the target output of a classification network. The training of the image feature extraction model in this embodiment involves two networks: a deep neural network and a classification network. The deep neural network extracts a visual feature representation from an image sample and inputs it into the classification network, which outputs a classification result for the image sample based on the visual feature representation. The training objective is to minimize the difference between the classification results output by the classification network and the corresponding category labels. After training ends, for example when the value of the loss function falls below a preset threshold or the number of training iterations reaches a preset threshold, the image feature extraction model is obtained from the trained deep neural network. That is, both the deep neural network and the classification network are used during training, but the final image feature extraction model uses only the deep neural network; the classification network serves to assist the training of the deep neural network.

The deep neural network used in the above training process may be, but is not limited to, ResNet (Residual Network)-50, ResNet-101, EfficientNet, and the like. The loss function used by the classification network may be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, and the like.
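The "train with a classification head, keep only the backbone" pattern above can be sketched at toy scale. This is a hedged illustration, not the disclosure's implementation: the ResNet/EfficientNet backbone is replaced by a one-layer ReLU network, the margin losses (ArcFace etc.) by plain softmax cross-entropy, and the images by random 16-dimensional vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image samples": 100 16-dim inputs; the label is a simple function of them.
X = rng.normal(size=(100, 16))
y = (X[:, 0] > 0).astype(int)

W_backbone = rng.normal(scale=0.1, size=(16, 8))   # "deep neural network": 16 -> 8 features
W_head = rng.normal(scale=0.1, size=(8, 2))        # classification network: 8 -> 2 classes

def forward(X):
    feats = np.maximum(X @ W_backbone, 0.0)        # visual feature representation (ReLU)
    logits = feats @ W_head
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return feats, probs

losses = []
for _ in range(200):
    feats, probs = forward(X)
    loss = -np.log(probs[np.arange(len(y)), y] + 1e-9).mean()
    losses.append(loss)
    # Backpropagation of softmax cross-entropy through both networks.
    d_logits = probs.copy()
    d_logits[np.arange(len(y)), y] -= 1.0
    d_logits /= len(y)
    dW_head = feats.T @ d_logits
    d_feats = d_logits @ W_head.T
    d_feats[feats <= 0] = 0.0                      # ReLU gradient
    dW_backbone = X.T @ d_feats
    W_head -= 0.5 * dW_head
    W_backbone -= 0.5 * dW_backbone

# After training, the classification head is discarded; the backbone alone
# plays the role of the image feature extraction model.
extract_visual = lambda x: np.maximum(x @ W_backbone, 0.0)
print(losses[0] > losses[-1], extract_visual(X[:1]).shape)
```

The key point the sketch shows is that the classification network exists only to shape the backbone's features during training; inference uses the backbone output directly.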
The above step 102, i.e., "extracting a semantic feature representation from the text information of the POI by using a text feature extraction model", is described in detail below.

The text information of the POI involved in this step may be text information obtained from a POI database, such as the POI name, description information, or review information. It may also be text information of the POI recognized from an image containing the POI signboard by using a character recognition technique; that is, after the signboard region is identified in the image containing the POI signboard, OCR (Optical Character Recognition) is used to recognize characters in the signboard region, such as the POI name and advertising slogans, as the text information of the POI.
The text feature extraction model used in this step may be, but is not limited to, one of the following:

First, a word embedding model.

For example, word embedding models such as Word2Vec or GloVe may be used.

Second, a pre-trained language model.

For example, pre-trained language models such as BERT (Bidirectional Encoder Representations from Transformers) or ERNIE (Enhanced Representation through kNowledge IntEgration) may be used.

Third, a model obtained by fine-tuning a pre-trained language model on existing POI text data.
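The first option (a word embedding model) can be illustrated by averaging word vectors over the POI text. The tiny vocabulary and 4-dimensional vectors below are made up for illustration; in practice real Word2Vec/GloVe vectors, or a pre-trained language model such as BERT/ERNIE, would supply the representation.

```python
import numpy as np

# Hypothetical mini-vocabulary of word vectors (illustrative values only).
EMB = {
    "peking":     np.array([0.9, 0.1, 0.0, 0.2]),
    "duck":       np.array([0.1, 0.8, 0.3, 0.0]),
    "restaurant": np.array([0.0, 0.2, 0.9, 0.1]),
}
UNK = np.zeros(4)  # fallback vector for out-of-vocabulary tokens

def semantic_feature(text: str) -> np.ndarray:
    """Semantic feature representation as the mean of the word vectors."""
    tokens = text.lower().split()
    vecs = [EMB.get(t, UNK) for t in tokens]
    return np.mean(vecs, axis=0) if vecs else UNK

feat = semantic_feature("Peking Duck Restaurant")
print(feat.shape)  # (4,)
```

A pre-trained language model would replace the averaging with a contextual encoder, but the output plays the same role: a fixed-size vector for the POI's text.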
The above step 103, i.e., "extracting a spatial feature representation from the spatial location information of the POI by using a spatial feature extraction model", is described in detail below.

The spatial location information of the POI involved in this step mainly refers to information that marks the spatial position of the POI in a certain form, such as coordinate information. The spatial feature extraction model may be used directly to extract the spatial feature representation from the spatial location information of the POI.

Considering that many POIs are actually very close to each other, that current positioning accuracy can be controlled at the meter level, and that in a map information system it is more desirable to group POIs by block, the present disclosure provides a preferred embodiment, which may specifically include the following steps S21 to S22.
In step S21, the spatial location information of the POI is hash-encoded to obtain a hash code.

Coordinate information may be encoded using, for example, geohash (an encoding of latitude and longitude). Geohash represents the latitude and longitude coordinates as a single string; after geohash encoding, the hash codes of two coordinates located in the same block share the same leading characters and differ only in the trailing characters.
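The shared-prefix property of geohash can be demonstrated with a minimal hand-rolled encoder. This is a sketch for illustration (production code would normally use an existing geohash library); it follows the standard scheme of interleaving longitude and latitude bits, longitude first, into base-32 characters.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash base-32 alphabet

def geohash_encode(lat: float, lng: float, precision: int = 8) -> str:
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    code = []
    even = True          # bits are interleaved, starting with longitude
    bit_count, ch = 0, 0
    while len(code) < precision:
        rng_ = lng_range if even else lat_range
        val = lng if even else lat
        mid = (rng_[0] + rng_[1]) / 2
        if val >= mid:                # binary-subdivide the range, emit one bit
            ch = (ch << 1) | 1
            rng_[0] = mid
        else:
            ch <<= 1
            rng_[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:            # every 5 bits become one base-32 character
            code.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(code)

# Two nearby POIs share a long common prefix.
a = geohash_encode(39.9042, 116.4074)   # near Tiananmen, Beijing
b = geohash_encode(39.9150, 116.4040)   # roughly 1 km away
print(a[:4], b[:4])  # both start with "wx4g"
```

The common prefix is what lets an embedding over hash codes (step S22) place POIs in the same block close together in feature space.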
In step S22, the hash code is converted into a spatial feature representation by using the spatial feature extraction model.

The spatial feature extraction model used in this step may be a word embedding model; that is, the hash code is converted into a quantifiable spatial feature representation by way of embedding.

In this embodiment, the word embedding model may be further trained with a similarity task, with the training objective that the closer two POIs are in position, the higher the similarity between the spatial feature representations output by the word embedding model.
The above step 104, i.e., "fusing the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI to obtain a multimodal feature representation of the POI", is described in detail below.

In this step, the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI may be directly concatenated, and the concatenated feature used as the multimodal feature representation of the POI. However, this approach is rather rigid, lacks learning capability, and the resulting representation is naturally less accurate.
Therefore, the present disclosure provides a preferred fusion approach, which may specifically include the following steps S31 to S32.

In step S31, the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI are concatenated to obtain a concatenated feature.

In this step, the visual feature representation, the semantic feature representation, and the spatial feature representation may be concatenated end to end in a preset order. Where the feature representations have different vector dimensions, a preset value such as 0 may be used for padding.

In step S32, the concatenated feature is input into a pre-trained fully connected network, and the multimodal feature representation of the POI output by the fully connected network is obtained.
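Steps S31 and S32 can be sketched as follows. The feature dimensions, the random (untrained) weight matrix, and the tanh activation are illustrative assumptions only; in the disclosure the fully connected network is trained as described next with reference to Fig. 3.

```python
import numpy as np

rng = np.random.default_rng(0)

visual   = rng.normal(size=128)   # visual feature representation
semantic = rng.normal(size=64)    # semantic feature representation
spatial  = rng.normal(size=32)    # spatial feature representation

# Step S31: concatenate end to end in a preset order (128 + 64 + 32 = 224).
concat = np.concatenate([visual, semantic, spatial])

# Step S32: pass through a fully connected layer to get the fused representation.
W = rng.normal(scale=0.05, size=(224, 96))   # untrained weights, for shape only
b = np.zeros(96)
multimodal = np.tanh(concat @ W + b)

print(concat.shape, multimodal.shape)  # (224,) (96,)
```

Unlike raw concatenation, the learned projection can weight and mix the modalities, which is why the text calls plain concatenation rigid.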
The training process of the above fully connected network is described in detail below. As shown in Fig. 3, the process may include the following steps:

In 301, second training samples are obtained, the second training samples including POI samples and category labels of the POI samples.

Some POIs having image, text, and spatial location information may be obtained in advance as POI samples, and the categories of these POIs annotated, for example as hospital, building, school, bus station, shop, and so on. These POI samples and their category labels are used as the second training samples to train the fully connected network used in feature fusion.
在302中,利用图像特征提取模型从POI样本的图像中提取POI样本的视觉特征表示。At 302, a visual feature representation of the POI sample is extracted from the image of the POI sample using an image feature extraction model.
在303中,利用文本特征提取模型从POI样本的文本信息中提取语义特征表示。In 303, a textual feature extraction model is used to extract semantic feature representations from the textual information of the POI samples.
在304中,利用空间特征提取模型从POI样本的空间位置信息中提取空间特征表示。In 304, a spatial feature representation is extracted from the spatial location information of the POI samples using a spatial feature extraction model.
上述步骤302~步骤304中的特征提取方式参见之前方法实施例中的相关记载，在此不做赘述。同样，示出的步骤302~304仅为其中一种实现顺序，也可以采用其他顺序先后执行，也可以并行执行。For the feature extraction methods in the foregoing steps 302 to 304, reference may be made to the relevant descriptions in the previous method embodiments, and details are not repeated here. Likewise, the order of steps 302 to 304 shown here is only one possible implementation order; they may also be executed sequentially in another order, or executed in parallel.
在305中,将POI样本的视觉特征表示、语义特征表示以及空间特征表示进行拼接,得到POI样本的拼接特征。In 305, the visual feature representation, the semantic feature representation and the spatial feature representation of the POI samples are spliced to obtain splicing features of the POI samples.
在306中，将POI样本的拼接特征输入全连接网络，获取全连接层输出的POI样本的多模态特征表示；将多模态特征表示输入分类网络，将POI样本的类别标注作为分类网络的目标输出，训练全连接网络和分类网络。In 306, the spliced feature of the POI sample is input into the fully connected network, and the multimodal feature representation of the POI sample output by the fully connected layer is obtained; the multimodal feature representation is input into the classification network, the category label of the POI sample is used as the target output of the classification network, and the fully connected network and the classification network are trained.
其中，分类网络采用的损失函数可以采用但不限于Large-Softmax、A-Softmax、AM-Softmax、CosFace、ArcFace等。The loss function used by the classification network may be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, etc.
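As one illustration of the listed margin-based losses, an AM-Softmax-style loss (cross entropy over scaled cosine logits, with an additive margin subtracted from the target-class cosine similarity) can be sketched as follows. The scale s=30 and margin m=0.35 are commonly used values assumed here for illustration; the disclosure does not fix them.

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """AM-Softmax sketch: L2-normalise features and class weights, subtract an
    additive margin m from the target-class cosine, scale by s, then apply
    cross entropy."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = f @ w                                  # (batch, num_classes) cosines
    cos[np.arange(len(labels)), labels] -= m     # additive margin on target class
    logits = s * cos
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
loss = am_softmax_loss(rng.normal(size=(4, 8)), rng.normal(size=(8, 3)),
                       np.array([0, 1, 2, 0]))
```

During training, the value of this loss would drive the parameter updates of the fully connected network and the classification network described below.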
在上述训练过程中，重点训练全连接网络和分类网络，利用损失函数的值更新全连接网络和分类网络的参数。对于图像特征提取模型、文本特征提取模型和空间特征提取模型的模型参数，可以保持不变，也可以在上述训练过程中参与更新。In the above training process, the training focuses on the fully connected network and the classification network, and the value of the loss function is used to update the parameters of the fully connected network and the classification network. The model parameters of the image feature extraction model, the text feature extraction model and the spatial feature extraction model may remain unchanged, or may also be updated during the above training process.
在采用上述方法实施例中的方式针对各POI分别得到各POI的多模态特征表示后，可以将各POI的多模态特征表示存储于数据库。POI的多模态特征表示可以用以进行POI之间的相似度计算。具体的应用场景可以包括但不限于POI的自动化生产、智能检索与推荐等。After the multimodal feature representation of each POI is obtained in the manner of the above method embodiments, the multimodal feature representation of each POI can be stored in a database. The multimodal feature representations of POIs can be used to calculate the similarity between POIs. Specific application scenarios may include, but are not limited to, automated POI production, intelligent retrieval and recommendation, and the like.
以POI的自动化生产为例，采集员或采集装置拍摄包含POI招牌的图像，并保存POI的图像、名称、坐标等信息。对于历史采集到的海量POI数据，采用本公开上述实施例中的方式提取多模态特征表示后存储于数据库，例如采用分布式redis作为多模态特征表示的特征库。存储结构可以采用key-value（键-值）对的形式。Taking the automated production of POIs as an example, a collector or a collection device shoots an image containing a POI signboard, and saves the image, name, coordinates and other information of the POI. For the massive POI data collected historically, multimodal feature representations are extracted in the manner of the above embodiments of the present disclosure and stored in a database; for example, distributed redis may be used as the feature library for the multimodal feature representations. The storage structure may take the form of key-value pairs.
对于新采集的POI数据，同样采用本公开上述实施例中的方式提取多模态特征表示，然后利用多模态特征表示在特征库中进行检索匹配，例如采用NN（Nearest Neighbor，最近邻检索）、ANN（Approximate Nearest Neighbor，近似最近邻检索）等检索方式。检索过程基于新采集的POI的多模态特征表示与数据库中已有POI的多模态特征表示之间的相似度的计算，以此判断该新采集的POI数据是否为已有POI的数据。对于一些未检索匹配上的POI数据，或者因为诸如文本无法识别、图像清晰度不足、坐标错误等所引起的自动化无法处理的POI数据，则提交给人工平台进行作业。For newly collected POI data, the multimodal feature representation is likewise extracted in the manner of the above embodiments of the present disclosure, and then used for retrieval and matching in the feature library, for example using retrieval methods such as NN (Nearest Neighbor) and ANN (Approximate Nearest Neighbor) retrieval. The retrieval process is based on calculating the similarity between the multimodal feature representation of the newly collected POI and the multimodal feature representations of existing POIs in the database, so as to judge whether the newly collected POI data belongs to an existing POI. POI data that fails to match, or POI data that cannot be processed automatically due to, for example, unrecognizable text, insufficient image clarity or wrong coordinates, is submitted to a manual platform for processing.
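The store-and-match flow described above can be sketched as follows. A plain Python dict stands in for the distributed redis key-value feature library, and brute-force nearest-neighbor search over cosine similarity stands in for an NN/ANN index; the key format `poi:<id>` and the similarity threshold of 0.9 are illustrative assumptions, not values given in the disclosure.

```python
import numpy as np

feature_store = {}  # key -> multimodal feature; stands in for a distributed redis library

def add_poi(poi_id, feature):
    """Store an L2-normalised multimodal feature under a key-value pair."""
    feature_store[poi_id] = feature / np.linalg.norm(feature)

def match(feature, threshold=0.9):
    """Brute-force nearest neighbor over cosine similarity. A deployment would
    use an ANN index instead of scanning every stored POI."""
    query = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for poi_id, stored in feature_store.items():
        sim = float(query @ stored)
        if sim > best_sim:
            best_id, best_sim = poi_id, sim
    if best_sim >= threshold:
        return best_id, best_sim   # judged to be data of an existing POI
    return None, best_sim          # unmatched: submit to the manual platform

add_poi("poi:1001", np.array([1.0, 0.0, 0.2]))
add_poi("poi:1002", np.array([0.1, 1.0, 0.0]))
matched, sim = match(np.array([0.9, 0.05, 0.2]))  # feature of a newly collected POI
```

The same similarity computation also underlies the retrieval and recommendation scenarios mentioned above.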
以上是对本公开所提供方法进行的详细描述,下面结合实施例对本公开所提供的装置进行详细描述。The above is a detailed description of the method provided by the present disclosure, and the device provided by the present disclosure is described in detail below with reference to the embodiments.
图4为本公开实施例提供的多模态POI特征的提取装置的示意图，如图4中所示，该装置可以包括：视觉特征提取模块401、语义特征提取模块402、空间特征提取模块403和特征融合模块404，还可以进一步包括第一模型训练单元405、文本获取单元406、第二模型训练单元407和相似度计算单元408。其中各组成单元的主要功能如下：FIG. 4 is a schematic diagram of an apparatus for extracting multimodal POI features provided by an embodiment of the present disclosure. As shown in FIG. 4, the apparatus may include: a visual feature extraction module 401, a semantic feature extraction module 402, a spatial feature extraction module 403 and a feature fusion module 404, and may further include a first model training unit 405, a text acquisition unit 406, a second model training unit 407 and a similarity calculation unit 408. The main functions of each unit are as follows:
视觉特征提取模块401,用于利用图像特征提取模型从POI的图像中提取POI的视觉特征表示。The visual feature extraction module 401 is used for extracting the visual feature representation of the POI from the image of the POI by using the image feature extraction model.
语义特征提取模块402,用于利用文本特征提取模型从POI的文本信息中提取语义特征表示。The semantic feature extraction module 402 is used for extracting the semantic feature representation from the text information of the POI by using the text feature extraction model.
空间特征提取模块403,用于利用空间特征提取模型从POI的空间位置信息中提取空间特征表示。The spatial feature extraction module 403 is used for extracting the spatial feature representation from the spatial location information of the POI by using the spatial feature extraction model.
特征融合模块404，用于对POI的视觉特征表示、语义特征表示以及空间特征表示进行融合，得到POI的多模态特征表示。The feature fusion module 404 is used to fuse the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multimodal feature representation of the POI.
作为一种优选的实施方式,视觉特征提取模块401可以利用目标检测技术从包含POI招牌的图像中提取招牌区域;利用预先训练得到的图像特征提取模型从招牌区域中提取POI的视觉特征表示。As a preferred embodiment, the visual feature extraction module 401 can use the target detection technology to extract the signboard area from the image containing the POI signboard; use the image feature extraction model obtained by pre-training to extract the visual feature representation of the POI from the signboard area.
第一模型训练单元405，用于采用如下方式预先训练得到图像特征提取模型：获取第一训练样本，第一训练样本包括：图像样本以及对图像样本的类别标注；将图像样本作为深度神经网络的输入，将图像样本的类别标注作为分类网络的目标输出，训练深度神经网络和分类网络；其中，深度神经网络从图像样本中提取视觉特征表示后输入分类网络，分类网络依据视觉特征表示输出对图像样本的分类结果；训练结束后，利用训练得到的深度神经网络得到图像特征提取模型。The first model training unit 405 is configured to pre-train the image feature extraction model in the following manner: obtaining a first training sample, where the first training sample includes an image sample and a category label for the image sample; taking the image sample as the input of a deep neural network and the category label of the image sample as the target output of a classification network, and training the deep neural network and the classification network, where the deep neural network extracts a visual feature representation from the image sample and inputs it into the classification network, and the classification network outputs a classification result of the image sample according to the visual feature representation; and after the training, obtaining the image feature extraction model from the trained deep neural network.
文本获取单元406,用于从POI数据库中获取POI的文本信息;和/或,利用文字识别技术从包含POI招牌的图像中识别得到POI的文本信息。The text obtaining unit 406 is configured to obtain the text information of the POI from the POI database; and/or, use the text recognition technology to recognize and obtain the text information of the POI from the image containing the POI signboard.
其中,文本特征提取模型可以包括但不限于:Word Embedding模型、预训练语言模型或者利用已有的POI文本数据对预训练语言模型进行微调后得到的模型。The text feature extraction model may include, but is not limited to: a Word Embedding model, a pre-trained language model, or a model obtained by fine-tuning the pre-trained language model with existing POI text data.
作为一种优选的实施方式,空间特征提取模块403,具体用于对POI的空间位置信息进行哈希编码,得到哈希码;利用空间特征提取模型将哈希码转化为空间特征表示。As a preferred embodiment, the spatial feature extraction module 403 is specifically configured to perform hash coding on the spatial location information of the POI to obtain a hash code; and use a spatial feature extraction model to convert the hash code into a spatial feature representation.
其中,空间特征提取模型可以包括Word Embedding模型。Among them, the spatial feature extraction model may include the Word Embedding model.
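As a concrete illustration of the spatial feature extraction described above, the sketch below hash-encodes a coordinate with a geohash-style interleaved binary code and then looks the code up in an embedding table, in the spirit of a Word Embedding model. The specific hash scheme, the 20-bit code length, and the 32-dimensional embedding are assumptions made for illustration; the disclosure only specifies hash coding of the spatial position followed by an embedding-based conversion.

```python
import numpy as np

def geo_hash(lng, lat, bits=20):
    """A geohash-style interleaved binary code (an assumed concrete hash scheme).
    Even bits bisect the longitude range, odd bits the latitude range, so nearby
    points share a common prefix."""
    lng_lo, lng_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    code = []
    for i in range(bits):
        if i % 2 == 0:  # even bits split longitude
            mid = (lng_lo + lng_hi) / 2
            code.append("1" if lng >= mid else "0")
            lng_lo, lng_hi = (mid, lng_hi) if lng >= mid else (lng_lo, mid)
        else:           # odd bits split latitude
            mid = (lat_lo + lat_hi) / 2
            code.append("1" if lat >= mid else "0")
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
    return "".join(code)

rng = np.random.default_rng(0)
embedding_table = {}  # hash code -> spatial embedding (Word-Embedding-style lookup)

def spatial_feature(lng, lat, dim=32):
    """Convert a hash code into a spatial feature representation. Embeddings are
    randomly initialised here; in practice they would be learned."""
    code = geo_hash(lng, lat)
    if code not in embedding_table:
        embedding_table[code] = rng.normal(size=dim)
    return embedding_table[code]
```

Two POIs falling into the same hash cell map to the same code and therefore share one spatial embedding.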
作为一种优选的实施方式，特征融合模块404可以具体用于将POI的视觉特征表示、语义特征表示以及空间特征表示进行拼接，得到拼接特征；将拼接特征输入预先训练得到的全连接网络，获取全连接网络输出的POI的多模态特征表示。As a preferred embodiment, the feature fusion module 404 may be specifically configured to splice the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a spliced feature, input the spliced feature into a pre-trained fully connected network, and obtain the multimodal feature representation of the POI output by the fully connected network.
第二模型训练单元407,用于采用如下方式预先训练得到全连接网络:The second model training unit 407 is used for pre-training to obtain a fully connected network in the following manner:
获取第二训练样本，第二训练样本包括POI样本以及对POI样本的类别标注；利用图像特征提取模型从POI样本的图像中提取POI样本的视觉特征表示；利用文本特征提取模型从POI样本的文本信息中提取语义特征表示；利用空间特征提取模型从POI样本的空间位置信息中提取空间特征表示；将POI样本的视觉特征表示、语义特征表示以及空间特征表示进行拼接，得到POI样本的拼接特征；将POI样本的拼接特征输入全连接网络，获取全连接层输出的POI样本的多模态特征表示；将多模态特征表示输入分类网络，将POI样本的类别标注作为分类网络的目标输出，训练全连接网络和分类网络。Obtaining a second training sample, where the second training sample includes POI samples and category labels for the POI samples; extracting the visual feature representation of a POI sample from the image of the POI sample using the image feature extraction model; extracting a semantic feature representation from the text information of the POI sample using the text feature extraction model; extracting a spatial feature representation from the spatial location information of the POI sample using the spatial feature extraction model; splicing the visual feature representation, semantic feature representation and spatial feature representation of the POI sample to obtain the spliced feature of the POI sample; inputting the spliced feature of the POI sample into the fully connected network, and obtaining the multimodal feature representation of the POI sample output by the fully connected layer; and inputting the multimodal feature representation into the classification network, taking the category label of the POI sample as the target output of the classification network, and training the fully connected network and the classification network.
相似度计算单元408,用于基于POI的多模态特征表示,计算POI之间的相似度。The similarity calculation unit 408 is configured to calculate the similarity between POIs based on the multimodal feature representation of POIs.
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in this specification is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for related parts.
根据本公开的实施例,本公开还提供了一种电子设备、一种可读存储介质和一种计算机程序产品。According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
如图5所示,是根据本公开实施例的电子设备的框图。电子设备旨在表示各种形式的数字计算机,诸如,膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本公开的实现。As shown in FIG. 5 , it is a block diagram of an electronic device according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
如图5所示，设备500包括计算单元501，其可以根据存储在只读存储器（ROM）502中的计算机程序或者从存储单元508加载到随机访问存储器（RAM）503中的计算机程序，来执行各种适当的动作和处理。在RAM 503中，还可存储设备500操作所需的各种程序和数据。计算单元501、ROM 502以及RAM 503通过总线504彼此相连。输入/输出（I/O）接口505也连接至总线504。As shown in FIG. 5, the device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a read only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
设备500中的多个部件连接至I/O接口505，包括：输入单元506，例如键盘、鼠标等；输出单元507，例如各种类型的显示器、扬声器等；存储单元508，例如磁盘、光盘等；以及通信单元509，例如网卡、调制解调器、无线通信收发机等。通信单元509允许设备500通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard, a mouse, etc.; an output unit 507, such as various types of displays, speakers, etc.; a storage unit 508, such as a magnetic disk, an optical disk, etc.; and a communication unit 509, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
计算单元501可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元501的一些示例包括但不限于中央处理单元（CPU）、图形处理单元（GPU）、各种专用的人工智能（AI）计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器（DSP）、以及任何适当的处理器、控制器、微控制器等。计算单元501执行上文所描述的各个方法和处理，例如多模态POI特征的提取方法。例如，在一些实施例中，多模态POI特征的提取方法可被实现为计算机软件程序，其被有形地包含于机器可读介质，例如存储单元508。The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the method for extracting multimodal POI features. For example, in some embodiments, the method for extracting multimodal POI features may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508.
在一些实施例中，计算机程序的部分或者全部可以经由ROM 502和/或通信单元509而被载入和/或安装到设备500上。当计算机程序加载到RAM 503并由计算单元501执行时，可以执行上文描述的多模态POI特征的提取方法的一个或多个步骤。备选地，在其他实施例中，计算单元501可以通过其他任何适当的方式（例如，借助于固件）而被配置为执行多模态POI特征的提取方法。In some embodiments, part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for extracting multimodal POI features described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for extracting multimodal POI features by any other suitable means (for example, by means of firmware).
此处描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、场可编程门阵列（FPGA）、专用集成电路（ASIC）、专用标准产品（ASSP）、芯片上系统的系统（SOC）、复杂可编程逻辑设备（CPLD）、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括：实施在一个或者多个计算机程序中，该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释，该可编程处理器可以是专用或者通用可编程处理器，可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令，并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs, which are executable and/or interpretable on a programmable system including at least one programmable processor; the programmable processor, which may be a special-purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
用于实施本公开的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器或控制器，使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可完全在机器上执行、部分地在机器上执行，作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
为了提供与用户的交互，可以在计算机上实施此处描述的系统和技术，该计算机具有：用于向用户显示信息的显示装置（例如，CRT（阴极射线管）或者LCD（液晶显示器）监视器）；以及键盘和指向装置（例如，鼠标或者轨迹球），用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互；例如，提供给用户的反馈可以是任何形式的传感反馈（例如，视觉反馈、听觉反馈、或者触觉反馈）；并且可以用任何形式（包括声输入、语音输入或者触觉输入）来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
可以将此处描述的系统和技术实施在包括后台部件的计算系统（例如，作为数据服务器）、或者包括中间件部件的计算系统（例如，应用服务器）、或者包括前端部件的计算系统（例如，具有图形用户界面或者网络浏览器的用户计算机，用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互）、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信（例如，通信网络）来将系统的部件相互连接。通信网络的示例包括：局域网（LAN）、广域网（WAN）和互联网。The systems and techniques described herein may be implemented in a computing system that includes a backend component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or web browser through which a user may interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN) and the Internet.
计算机系统可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。A computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
应该理解，可以使用上面所示的各种形式的流程，重新排序、增加或删除步骤。例如，本申请中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行，只要能够实现本公开公开的技术方案所期望的结果，本文在此不进行限制。It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially or in a different order; as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, no limitation is imposed herein.
上述具体实施方式,并不构成对本公开保护范围的限制。本领域技术人员应该明白的是,根据设计要求和其他因素,可以进行各种修改、组合、子组合和替代。任何在本公开的精神和原则之内所作的修改、等同替换和改进等,均应包含在本公开保护范围之内。The above-mentioned specific embodiments do not constitute a limitation on the protection scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present disclosure should be included within the protection scope of the present disclosure.

Claims (23)

  1. 一种多模态兴趣点POI特征的提取方法，包括：A method for extracting a multimodal point of interest (POI) feature, comprising:
    利用图像特征提取模型从POI的图像中提取所述POI的视觉特征表示;Extract the visual feature representation of the POI from the image of the POI using an image feature extraction model;
    利用文本特征提取模型从所述POI的文本信息中提取语义特征表示;Utilize a text feature extraction model to extract semantic feature representation from the text information of the POI;
    利用空间特征提取模型从所述POI的空间位置信息中提取空间特征表示;Extract spatial feature representation from the spatial location information of the POI using a spatial feature extraction model;
    对所述POI的视觉特征表示、语义特征表示以及空间特征表示进行融合,得到所述POI的多模态特征表示。The visual feature representation, semantic feature representation and spatial feature representation of the POI are fused to obtain the multimodal feature representation of the POI.
  2. 根据权利要求1所述的方法,其中,所述利用图像特征提取模型从POI的图像中提取所述POI的视觉特征表示包括:The method according to claim 1, wherein the extracting the visual feature representation of the POI from the image of the POI using an image feature extraction model comprises:
    利用目标检测技术从包含POI招牌的图像中提取招牌区域;Extract signboard regions from images containing POI signboards using object detection techniques;
    利用预先训练得到的图像特征提取模型从所述招牌区域中提取所述POI的视觉特征表示。The visual feature representation of the POI is extracted from the signboard area using a pre-trained image feature extraction model.
  3. 根据权利要求1或2所述的方法,其中,所述图像特征提取模型采用如下方式预先训练得到:The method according to claim 1 or 2, wherein the image feature extraction model is pre-trained in the following manner:
    获取第一训练样本,所述第一训练样本包括:图像样本以及对图像样本的类别标注;Obtain a first training sample, where the first training sample includes: an image sample and a category label for the image sample;
    将所述图像样本作为深度神经网络的输入，将所述图像样本的类别标注作为分类网络的目标输出，训练所述深度神经网络和所述分类网络；其中，所述深度神经网络从所述图像样本中提取视觉特征表示后输入所述分类网络，所述分类网络依据所述视觉特征表示输出对所述图像样本的分类结果；taking the image sample as the input of a deep neural network and the category label of the image sample as the target output of a classification network, and training the deep neural network and the classification network, wherein the deep neural network extracts a visual feature representation from the image sample and inputs it into the classification network, and the classification network outputs a classification result of the image sample according to the visual feature representation;
    训练结束后,利用训练得到的所述深度神经网络得到所述图像特征提取模型。After the training, the image feature extraction model is obtained by using the deep neural network obtained by training.
  4. 根据权利要求1所述的方法,其中,所述POI的文本信息包括:The method according to claim 1, wherein the text information of the POI comprises:
    从POI数据库中获取的所述POI的文本信息;和/或,Textual information of the POI obtained from the POI database; and/or,
    利用文字识别技术从包含POI招牌的图像中识别得到的所述POI的文本信息。The text information of the POI is recognized from the image containing the POI signboard by using the text recognition technology.
  5. 根据权利要求1所述的方法,其中,所述文本特征提取模型包括:The method according to claim 1, wherein the text feature extraction model comprises:
    词嵌入Word Embedding模型、预训练语言模型或者利用已有的POI文本数据对预训练语言模型进行微调后得到的模型。a word embedding (Word Embedding) model, a pre-trained language model, or a model obtained by fine-tuning a pre-trained language model with existing POI text data.
  6. 根据权利要求1所述的方法,其中,利用空间特征提取模型从所述POI的空间位置信息中提取空间特征表示包括:The method according to claim 1, wherein extracting a spatial feature representation from the spatial location information of the POI by using a spatial feature extraction model comprises:
    对所述POI的空间位置信息进行哈希编码,得到哈希码;Hash coding is performed on the spatial location information of the POI to obtain a hash code;
    利用空间特征提取模型将所述哈希码转化为空间特征表示。The hash code is converted into a spatial feature representation using a spatial feature extraction model.
  7. 根据权利要求1或6所述的方法,其中,所述空间特征提取模型包括词嵌入模型。The method of claim 1 or 6, wherein the spatial feature extraction model comprises a word embedding model.
  8. 根据权利要求1所述的方法,其中,对所述POI的视觉特征表示、语义特征表示以及空间特征表示进行融合,得到所述POI的多模态特征表示包括:The method according to claim 1, wherein, fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multimodal feature representation of the POI comprises:
    将所述POI的视觉特征表示、语义特征表示以及空间特征表示进行拼接,得到拼接特征;Splicing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain splicing features;
    将所述拼接特征输入预先训练得到的全连接网络,获取所述全连接网络输出的所述POI的多模态特征表示。The splicing feature is input into a pre-trained fully connected network, and a multimodal feature representation of the POI output by the fully connected network is obtained.
  9. 根据权利要求8所述的方法,其中,所述全连接网络采用如下方式预先训练得到:The method according to claim 8, wherein the fully connected network is pre-trained in the following manner:
    获取第二训练样本,所述第二训练样本包括POI样本以及对所述POI样本的类别标注;obtaining a second training sample, where the second training sample includes a POI sample and a category label for the POI sample;
    利用所述图像特征提取模型从所述POI样本的图像中提取所述POI样本的视觉特征表示;Extract the visual feature representation of the POI sample from the image of the POI sample using the image feature extraction model;
    利用所述文本特征提取模型从所述POI样本的文本信息中提取语义特征表示;Using the text feature extraction model to extract semantic feature representations from the text information of the POI samples;
    利用空间特征提取模型从所述POI样本的空间位置信息中提取空间特征表示;Extract spatial feature representation from the spatial location information of the POI sample by using a spatial feature extraction model;
    将所述POI样本的视觉特征表示、语义特征表示以及空间特征表示进行拼接,得到所述POI样本的拼接特征;Splicing the visual feature representation, semantic feature representation and spatial feature representation of the POI sample to obtain the splicing feature of the POI sample;
    将所述POI样本的拼接特征输入全连接网络,获取所述全连接层输出的所述POI样本的多模态特征表示;Input the splicing feature of the POI sample into a fully connected network, and obtain the multimodal feature representation of the POI sample output by the fully connected layer;
    将所述多模态特征表示输入分类网络，将所述POI样本的类别标注作为所述分类网络的目标输出，训练所述全连接网络和所述分类网络。The multimodal feature representation is input into the classification network, the category label of the POI sample is taken as the target output of the classification network, and the fully connected network and the classification network are trained.
  10. 根据权利要求1所述的方法,该方法还包括:The method of claim 1, further comprising:
    基于POI的多模态特征表示,计算POI之间的相似度。Based on the multimodal feature representation of POIs, the similarity between POIs is calculated.
  11. 一种多模态POI特征的提取装置,包括:A device for extracting multimodal POI features, comprising:
    视觉特征提取模块,用于利用图像特征提取模型从POI的图像中提取所述POI的视觉特征表示;A visual feature extraction module for extracting the visual feature representation of the POI from the image of the POI by using an image feature extraction model;
    语义特征提取模块,用于利用文本特征提取模型从所述POI的文本信息中提取语义特征表示;a semantic feature extraction module for extracting semantic feature representation from the text information of the POI by using a text feature extraction model;
    空间特征提取模块,用于利用空间特征提取模型从所述POI的空间位置信息中提取空间特征表示;a spatial feature extraction module for extracting a spatial feature representation from the spatial location information of the POI by using a spatial feature extraction model;
    特征融合模块,用于对所述POI的视觉特征表示、语义特征表示以及空间特征表示进行融合,得到所述POI的多模态特征表示。The feature fusion module is used for fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multi-modal feature representation of the POI.
  12. 根据权利要求11所述的装置，其中，所述视觉特征提取模块，具体用于利用目标检测技术从包含POI招牌的图像中提取招牌区域；利用预先训练得到的图像特征提取模型从所述招牌区域中提取所述POI的视觉特征表示。The apparatus according to claim 11, wherein the visual feature extraction module is specifically configured to extract a signboard area from an image containing a POI signboard using a target detection technology, and to extract the visual feature representation of the POI from the signboard area using a pre-trained image feature extraction model.
  13. 根据权利要求11或12所述的装置,还包括:The apparatus of claim 11 or 12, further comprising:
    第一模型训练单元，用于采用如下方式预先训练得到所述图像特征提取模型：获取第一训练样本，所述第一训练样本包括：图像样本以及对图像样本的类别标注；将所述图像样本作为深度神经网络的输入，将所述图像样本的类别标注作为分类网络的目标输出，训练所述深度神经网络和所述分类网络；其中，所述深度神经网络从所述图像样本中提取视觉特征表示后输入所述分类网络，所述分类网络依据所述视觉特征表示输出对所述图像样本的分类结果；训练结束后，利用训练得到的所述深度神经网络得到所述图像特征提取模型。a first model training unit, configured to pre-train the image feature extraction model in the following manner: obtaining a first training sample, where the first training sample includes an image sample and a category label for the image sample; taking the image sample as the input of a deep neural network and the category label of the image sample as the target output of a classification network, and training the deep neural network and the classification network, wherein the deep neural network extracts a visual feature representation from the image sample and inputs it into the classification network, and the classification network outputs a classification result of the image sample according to the visual feature representation; and after the training, obtaining the image feature extraction model from the trained deep neural network.
  14. The apparatus according to claim 11, further comprising:
    a text acquisition unit configured to acquire text information of the POI from a POI database, and/or to recognize the text information of the POI from an image containing a POI signboard by using a text recognition technology.
  15. The apparatus according to claim 11, wherein the text feature extraction model comprises:
    a word embedding model, a pre-trained language model, or a model obtained by fine-tuning a pre-trained language model with existing POI text data.
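As one hypothetical illustration of the word-embedding option in claim 15, a semantic feature for a POI's text can be taken as the mean of per-word embedding vectors. The toy vocabulary and 16-dimensional table below are invented for the sketch; a real system would use a pre-trained (or fine-tuned) language model instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and randomly initialized 16-dim embedding table.
vocab = {"coffee": 0, "shop": 1, "book": 2, "store": 3}
emb = rng.normal(size=(len(vocab), 16))

def semantic_feature(poi_text):
    """Mean of word embeddings as a simple semantic feature representation."""
    ids = [vocab[w] for w in poi_text.lower().split() if w in vocab]
    return emb[ids].mean(axis=0)
```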
  16. The apparatus according to claim 11, wherein the spatial feature extraction module is specifically configured to: perform hash coding on the spatial location information of the POI to obtain a hash code; and convert the hash code into the spatial feature representation by using the spatial feature extraction model.
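The hash coding of claim 16 can be illustrated with a geohash-style interleaved binary subdivision of latitude/longitude, followed by a word-embedding-style lookup (consistent with claim 17) that maps the code to a dense spatial feature. The 12-bit code length and 8-dimensional table are arbitrary choices for this sketch:

```python
import numpy as np

def geo_hash(lat, lon, bits=12):
    """Geohash-style code: alternately bisect the longitude and latitude ranges."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    code = 0
    for i in range(bits):
        rng_ = lon_rng if i % 2 == 0 else lat_rng   # even bits: longitude
        val = lon if i % 2 == 0 else lat
        mid = (rng_[0] + rng_[1]) / 2
        bit = val >= mid
        if bit:
            rng_[0] = mid                           # keep upper half
        else:
            rng_[1] = mid                           # keep lower half
        code = (code << 1) | int(bit)
    return code

# Word-embedding-style lookup table over all 2**12 possible hash codes.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(2**12, 8))       # assumed 8-dim spatial features

def spatial_feature(lat, lon):
    """Spatial feature representation: embedding of the location's hash code."""
    return embedding_table[geo_hash(lat, lon)]
```

Nearby coordinates fall into the same cell and therefore share a hash code and embedding, which is the property the spatial feature relies on.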
  17. The apparatus according to claim 11 or 16, wherein the spatial feature extraction model comprises a word embedding model.
  18. The apparatus according to claim 11, wherein the feature fusion module is specifically configured to: concatenate the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI to obtain a concatenated feature; and input the concatenated feature into a pre-trained fully connected network to obtain the multi-modal feature representation of the POI output by the fully connected network.
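The fusion of claim 18 — concatenating the three per-modality representations and passing them through a fully connected network — reduces to a few array operations. The per-modality dimensions (128/768/32), the single tanh layer, and the random weights below are placeholders standing in for a pre-trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-modality feature representations of one POI (dimensions are assumptions).
visual = rng.normal(size=128)    # from the image feature extraction model
semantic = rng.normal(size=768)  # from the text feature extraction model
spatial = rng.normal(size=32)    # from the spatial feature extraction model

# Concatenate the three representations into one vector.
concat = np.concatenate([visual, semantic, spatial])

# Stand-in for the pre-trained fully connected network: one tanh layer.
W_fc = rng.normal(scale=0.01, size=(concat.size, 256))
b_fc = np.zeros(256)
multimodal = np.tanh(concat @ W_fc + b_fc)   # multi-modal POI feature representation
```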
  19. The apparatus according to claim 18, further comprising:
    a second model training unit configured to pre-train the fully connected network in the following manner:
    obtaining a second training sample, the second training sample comprising a POI sample and a category label of the POI sample; extracting a visual feature representation of the POI sample from an image of the POI sample by using the image feature extraction model; extracting a semantic feature representation from text information of the POI sample by using the text feature extraction model; extracting a spatial feature representation from spatial location information of the POI sample by using the spatial feature extraction model; concatenating the visual feature representation, the semantic feature representation, and the spatial feature representation of the POI sample to obtain a concatenated feature of the POI sample; inputting the concatenated feature of the POI sample into the fully connected network to obtain a multi-modal feature representation of the POI sample output by the fully connected network; and inputting the multi-modal feature representation into a classification network, taking the category label of the POI sample as a target output of the classification network, and training the fully connected network and the classification network.
  20. The apparatus according to claim 11, further comprising:
    a similarity calculation unit configured to calculate a similarity between POIs based on their multi-modal feature representations.
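Claim 20 does not fix the similarity metric, so the choice below is an assumption; a common option for comparing the multi-modal feature vectors of two POIs is cosine similarity:

```python
import numpy as np

def poi_similarity(f1, f2):
    """Cosine similarity between two multi-modal POI feature vectors."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which makes the value usable directly as a ranking signal, e.g. for POI deduplication or retrieval.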
  21. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-10.
  22. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-10.
  23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
PCT/CN2021/107383 2021-03-24 2021-07-20 Method and apparatus for extracting multi-modal poi feature WO2022198854A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020227044369A KR20230005408A (en) 2021-03-24 2021-07-20 Method and apparatus for extracting multi-modal POI features
JP2022576469A JP2023529939A (en) 2021-03-24 2021-07-20 Multimodal POI feature extraction method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110312700.4A CN113032672A (en) 2021-03-24 2021-03-24 Method and device for extracting multi-modal POI (Point of interest) features
CN202110312700.4 2021-03-24

Publications (1)

Publication Number Publication Date
WO2022198854A1 (en)

Family

ID=76473210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107383 WO2022198854A1 (en) 2021-03-24 2021-07-20 Method and apparatus for extracting multi-modal poi feature

Country Status (4)

Country Link
JP (1) JP2023529939A (en)
KR (1) KR20230005408A (en)
CN (1) CN113032672A (en)
WO (1) WO2022198854A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032672A (en) * 2021-03-24 2021-06-25 北京百度网讯科技有限公司 Method and device for extracting multi-modal POI (Point of interest) features
CN113657274B (en) 2021-08-17 2022-09-20 北京百度网讯科技有限公司 Table generation method and device, electronic equipment and storage medium
CN113807102B (en) * 2021-08-20 2022-11-01 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for establishing semantic representation model
CN113807218B (en) * 2021-09-03 2024-02-20 科大讯飞股份有限公司 Layout analysis method, device, computer equipment and storage medium
CN114821622B (en) * 2022-03-10 2023-07-21 北京百度网讯科技有限公司 Text extraction method, text extraction model training method, device and equipment
CN114911787B (en) * 2022-05-31 2023-10-27 南京大学 Multi-source POI data cleaning method integrating position and semantic constraint
CN114861889B (en) * 2022-07-04 2022-09-27 北京百度网讯科技有限公司 Deep learning model training method, target object detection method and device
CN115455129B (en) * 2022-10-14 2023-08-25 阿里巴巴(中国)有限公司 POI processing method, POI processing device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109472232A (en) * 2018-10-31 2019-03-15 山东师范大学 Video semanteme characterizing method, system and medium based on multi-modal fusion mechanism
CN112101165A (en) * 2020-09-07 2020-12-18 腾讯科技(深圳)有限公司 Interest point identification method and device, computer equipment and storage medium
CN112200317A (en) * 2020-09-28 2021-01-08 西南电子技术研究所(中国电子科技集团公司第十研究所) Multi-modal knowledge graph construction method
CN113032672A (en) * 2021-03-24 2021-06-25 北京百度网讯科技有限公司 Method and device for extracting multi-modal POI (Point of interest) features

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104166982A (en) * 2014-06-30 2014-11-26 复旦大学 Image optimization clustering method based on typical correlation analysis
KR102092392B1 (en) * 2018-06-15 2020-03-23 네이버랩스 주식회사 Method and system for automatically collecting and updating information about point of interest in real space
CN111460077B (en) * 2019-01-22 2021-03-26 大连理工大学 Cross-modal Hash retrieval method based on class semantic guidance
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN115966061A (en) * 2022-12-28 2023-04-14 上海帜讯信息技术股份有限公司 Disaster warning processing method, system and device based on 5G message
CN115966061B (en) * 2022-12-28 2023-10-24 上海帜讯信息技术股份有限公司 Disaster early warning processing method, system and device based on 5G message
CN116665228A (en) * 2023-07-31 2023-08-29 恒生电子股份有限公司 Image processing method and device
CN116665228B (en) * 2023-07-31 2023-10-13 恒生电子股份有限公司 Image processing method and device
CN116805531A (en) * 2023-08-24 2023-09-26 安徽通灵仿生科技有限公司 Pediatric remote medical system
CN116805531B (en) * 2023-08-24 2023-12-05 安徽通灵仿生科技有限公司 Pediatric remote medical system

Also Published As

Publication number Publication date
CN113032672A (en) 2021-06-25
JP2023529939A (en) 2023-07-12
KR20230005408A (en) 2023-01-09

Similar Documents

Publication Publication Date Title
WO2022198854A1 (en) Method and apparatus for extracting multi-modal poi feature
CN112949415B (en) Image processing method, apparatus, device and medium
CN112084790A (en) Relation extraction method and system based on pre-training convolutional neural network
WO2021093308A1 (en) Method and apparatus for extracting poi name, device, and computer storage medium
WO2018177316A1 (en) Information identification method, computing device, and storage medium
CN110826335B (en) Named entity identification method and device
WO2022227769A1 (en) Training method and apparatus for lane line detection model, electronic device and storage medium
WO2022174552A1 (en) Method and apparatus for obtaining poi state information
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
CN114490998B (en) Text information extraction method and device, electronic equipment and storage medium
US20230041943A1 (en) Method for automatically producing map data, and related apparatus
CN112989097A (en) Model training and picture retrieval method and device
CN114092948B (en) Bill identification method, device, equipment and storage medium
CN115359383A (en) Cross-modal feature extraction, retrieval and model training method, device and medium
CN113705716B (en) Image recognition model training method and device, cloud control platform and automatic driving vehicle
CN113407610B (en) Information extraction method, information extraction device, electronic equipment and readable storage medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN114764874B (en) Deep learning model training method, object recognition method and device
CN113807102B (en) Method, device, equipment and computer storage medium for establishing semantic representation model
CN114691918B (en) Radar image retrieval method and device based on artificial intelligence and electronic equipment
CN115329132A (en) Method, device and equipment for generating video label and storage medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN112818972A (en) Method and device for detecting interest point image, electronic equipment and storage medium
CN115482436B (en) Training method and device for image screening model and image screening method
CN112541496B (en) Method, device, equipment and computer storage medium for extracting POI (point of interest) names

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21932468; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022576469; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20227044369; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21932468; Country of ref document: EP; Kind code of ref document: A1)