WO2022198854A1 - Method and apparatus for extracting multimodal POI features - Google Patents

Method and apparatus for extracting multimodal POI features

Info

Publication number
WO2022198854A1
Authority
WO
WIPO (PCT)
Prior art keywords
poi
feature representation
feature
sample
image
Prior art date
2021-03-24
Application number
PCT/CN2021/107383
Other languages
English (en)
Chinese (zh)
Inventor
范淼
黄际洲
王海峰
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2021-07-20
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to JP2022576469A (published as JP2023529939A)
Priority to KR1020227044369A (published as KR20230005408A)
Publication of WO2022198854A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 - Geographical information databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/907 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/908 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/907 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9537 - Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • The present disclosure relates to the field of computer application technology, and in particular to big data technology in the field of artificial intelligence.
  • In a map information system, the number of POIs (Points of Interest) represents the value of the entire system to a certain extent, and comprehensive POI information is necessary to enrich the system.
  • Each POI includes information in at least multiple modalities, such as a name, coordinates, and an image, and the digital medium and presentation of this information vary: names are generally text in a certain language, coordinates are generally numbers in at least two dimensions, and images take the form of pictures. A multimodal POI therefore refers to a physical entity described by multiple digital media.
  • POI information is usually stored in a relational database, and in application scenarios it needs to be queried from that database. This requires the ability to quickly calculate the similarity between multimodal POIs; since similarity calculation is based on POI features, how to extract POI features becomes the key.
  • In view of this, the present disclosure provides a method and apparatus for extracting multimodal POI features.
  • A method for extracting multimodal POI features includes: extracting a visual feature representation of a POI from an image of the POI using an image feature extraction model; extracting a semantic feature representation from text information of the POI using a text feature extraction model; extracting a spatial feature representation from spatial location information of the POI using a spatial feature extraction model; and fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a multimodal feature representation of the POI.
  • A device for extracting multimodal POI features includes:
  • a visual feature extraction module for extracting the visual feature representation of a POI from the image of the POI using an image feature extraction model;
  • a semantic feature extraction module for extracting a semantic feature representation from the text information of the POI using a text feature extraction model;
  • a spatial feature extraction module for extracting a spatial feature representation from the spatial location information of the POI using a spatial feature extraction model; and
  • a feature fusion module for fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multimodal feature representation of the POI.
  • An electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
  • A non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described above.
  • A computer program product includes a computer program that, when executed by a processor, implements the method described above.
  • The embodiments of the present disclosure provide a way to extract, for each POI, a feature vector representation that fuses multiple modalities, thereby providing a basis for subsequent similarity calculation between POIs.
  • FIG. 1 is a flowchart of a method for extracting multimodal POI features provided by an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of training an image feature extraction model provided by an embodiment of the present disclosure;
  • FIG. 3 is a flowchart of training a fully connected network provided by an embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of an apparatus for extracting multimodal POI features provided by an embodiment of the present disclosure;
  • FIG. 5 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
  • In existing approaches, similarity calculation is performed separately on the images of two POIs, on their names, and on their coordinates. That is, the similarity of features of different modalities must be calculated separately, which is computationally complex and time-consuming.
  • The core idea of the present disclosure is to extract, for each POI, a feature representation fused from multiple modalities, so as to provide a basis for subsequent similarity calculation between POIs. The method provided by the present disclosure is described in detail below with reference to embodiments.
  • FIG. 1 is a flowchart of a method for extracting multimodal POI features according to an embodiment of the present disclosure; the method is executed by an apparatus for extracting multimodal POI features.
  • The apparatus can be embodied as an application on the server side, or as a plug-in, software development kit (SDK) or other functional unit within such an application, or it can be located in a terminal with strong computing capability, which is not particularly limited in the embodiments of the present disclosure.
  • The method may include the following steps:
  • In step 101, a visual feature representation of the POI is extracted from the image of the POI using an image feature extraction model.
  • In step 102, a semantic feature representation is extracted from the text information of the POI using a text feature extraction model.
  • In step 103, a spatial feature representation is extracted from the spatial location information of the POI using a spatial feature extraction model.
  • In step 104, the visual feature representation, semantic feature representation and spatial feature representation of the POI are fused to obtain a multimodal feature representation of the POI.
  • The order of steps 101 to 103 shown above is only one possible implementation; they may be executed sequentially in other orders or in parallel.
  • Step 101, i.e. "using an image feature extraction model to extract a visual feature representation of the POI from an image containing a POI signboard", is described in detail below.
  • The image in POI information is usually an image containing the POI's signboard. For example, a real-scene picture of a store includes the store's signboard, which usually shows the store name and sometimes also its slogan. Similarly, a real-scene picture of a building contains the building's signboard, usually the building name, and a real-scene picture of a school contains the school's signboard, usually the school name.
  • In addition to signboards, a visual feature representation can also be extracted from an image containing the shape of the main body of a building. Images of these POIs can be obtained from a POI database.
  • This step may specifically include the following steps S11 and S12.
  • In step S11, the signboard area is extracted from the image containing the POI signboard using object detection technology.
  • For example, a pre-trained signboard discrimination model can be used. The real-scene image is first divided into regions; since a signboard generally forms a closed area in the image, closed areas can be identified and segmented out. Each determined closed area is then input into the signboard discrimination model, which outputs a judgment of whether that closed area is a signboard area.
  • The signboard discrimination model is in fact a classification model. Some real-scene images can be collected in advance, with signboard areas and non-signboard areas annotated as positive and negative samples respectively, and the classification model can then be trained to obtain the signboard discrimination model.
  • In step S12, the visual feature representation of the POI is extracted from the signboard area using a pre-trained image feature extraction model.
  • The image feature extraction model can be pre-trained based on a deep neural network; after the signboard area is input into the model, it extracts the visual feature representation of the POI from that area.
  • Training samples can be obtained first.
  • In the present disclosure, the training samples used for training the image feature extraction model are referred to as first training samples.
  • Expressions such as "first" and "second" in the present disclosure do not limit quantity, order or size; they are used only to distinguish names.
  • The first training samples include image samples and category labels of the image samples.
  • A category label can denote the object depicted in the image; for example, an image containing a cat is labeled "cat", and an image containing a dog is labeled "dog".
  • A category label can also denote the category of the depicted object; for example, an image of a specific hospital is labeled "hospital", and an image of a specific school is labeled "school".
  • The image samples are then used as the input of a deep neural network, as shown in FIG. 2, and the category labels of the image samples are used as the target output of a classification network.
  • Two networks are thus involved: a deep neural network and a classification network.
  • The deep neural network extracts a visual feature representation from an image sample and inputs it into the classification network, which outputs a classification result for the image sample according to that visual feature representation.
  • The training objective is to minimize the difference between the classification results output by the classification network and the corresponding category labels.
  • After training, the image feature extraction model is obtained from the trained deep neural network. That is to say, both a deep neural network and a classification network are used during training, but the final image feature extraction model uses only the deep neural network; the classification network merely assists the training of the deep neural network.
  • The deep neural network used in the above training process can be, but is not limited to, ResNet-50 (ResNet: Residual Network), ResNet-101, EfficientNet, and the like.
  • The loss function adopted by the classification network can be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, etc.
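  • As a concrete illustration, the following is a minimal sketch of the two-network training setup described above. It assumes PyTorch and torchvision (the disclosure names neither), uses ResNet-50 as the deep neural network, and uses plain cross-entropy as a stand-in for the margin-based losses listed above; the dimensions and class counts are illustrative.

```python
# Sketch: train a classification head on top of a ResNet-50 backbone; after
# training, only the backbone is kept as the image feature extraction model.
import torch
import torch.nn as nn
from torchvision import models

class SignboardFeatureModel(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=None)  # optionally load pretrained weights
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.backbone = backbone                            # kept after training
        self.classifier = nn.Linear(feat_dim, num_classes)  # training-only head

    def forward(self, images):
        feats = self.backbone(images)    # visual feature representation
        logits = self.classifier(feats)  # used only to compute the training loss
        return feats, logits

model = SignboardFeatureModel(num_classes=100)
criterion = nn.CrossEntropyLoss()  # stand-in for Large-Softmax/ArcFace-style losses
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(8, 3, 224, 224)  # a batch of signboard-area crops
labels = torch.randint(0, 100, (8,))  # category annotations of the image samples

optimizer.zero_grad()
feats, logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
# After training, model.backbone alone serves as the image feature extraction model.
```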
  • Step 102, i.e. "using a text feature extraction model to extract a semantic feature representation from the text information of the POI", is described in detail below.
  • The text information of the POI involved in this step may be obtained from the POI database, such as the POI name, description information and review information. It may also be recognized from the image containing the POI signboard using text recognition technology; that is, after the signboard area is identified from the image, OCR (Optical Character Recognition) is used to recognize text from the signboard area, such as the POI name and advertising slogans, as the text information of the POI.
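  • As an illustration of this step, the following is a minimal sketch assuming pytesseract as the OCR engine (the disclosure mentions OCR only generically and names no library); the file path and language pack are illustrative.

```python
# Sketch: recognize POI text from a signboard-area crop with OCR.
from PIL import Image
import pytesseract

signboard_crop = Image.open("signboard_region.png")  # hypothetical crop from step S11
# lang="chi_sim" targets simplified Chinese signboards and requires the
# corresponding Tesseract language pack to be installed.
poi_text = pytesseract.image_to_string(signboard_crop, lang="chi_sim")
print(poi_text.strip())  # e.g. the store name and slogan shown on the signboard
```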
  • The text feature extraction model utilized in this step can adopt, but is not limited to, the following:
  • First, a word embedding model, such as Word2Vec or GloVe.
  • Second, a pre-trained language model, such as BERT (Bidirectional Encoder Representations from Transformers) or ERNIE (Enhanced Representation through kNowledge IntEgration, which uses entity information to enhance language representation).
  • Third, a model obtained by fine-tuning a pre-trained language model with existing POI text data.
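  • As an illustration of the second option, the following is a minimal sketch that obtains a semantic feature representation from a pre-trained language model; it assumes the Hugging Face transformers library and a Chinese BERT checkpoint, neither of which is specified by the disclosure, and mean pooling is one common but illustrative choice.

```python
# Sketch: encode POI text information into a semantic feature vector with BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("某某咖啡馆", return_tensors="pt")  # e.g. a POI name
with torch.no_grad():
    outputs = model(**inputs)
# Mean-pool the token embeddings into a single semantic feature representation.
semantic_feat = outputs.last_hidden_state.mean(dim=1)  # shape: (1, 768)
```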
  • Step 103, i.e. "using a spatial feature extraction model to extract a spatial feature representation from the spatial location information of the POI", is described in detail below.
  • The spatial location information of the POI involved in this step mainly refers to information that marks the spatial position of the POI in a certain form, such as coordinate information.
  • The spatial feature representation can be extracted directly from the spatial location information of the POI using the spatial feature extraction model.
  • Preferably, however, the present disclosure provides an embodiment that may specifically include the following steps S21 and S22.
  • In step S21, hash coding is performed on the spatial location information of the POI to obtain a hash code.
  • For example, geohash (a latitude-longitude address encoding) can be used. Geohash represents the two coordinates, longitude and latitude, as a single string; after geohash encoding, the hash codes of two coordinates located in the same block share their first few characters and differ only in the last few.
  • In step S22, the hash code is converted into a spatial feature representation using the spatial feature extraction model.
  • The spatial feature extraction model used in this step can be a word embedding model; that is, the hash code is converted into a quantifiable spatial feature representation by this embedding method.
  • A similarity task can be used to further train the model.
  • The training target is: the closer the positions of two POIs, the higher the similarity between the spatial feature representations output by the word embedding model.
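  • As an illustration of steps S21 and S22, the following is a minimal sketch assuming the pygeohash package for the geohash step and a simple character-level embedding (the disclosure describes geohash and word embedding but names no library; the precision and embedding size are illustrative). Because nearby POIs share the leading characters of their hash codes, their pooled embeddings start out similar, and the similarity-based training target described above can pull them closer still.

```python
# Sketch: geohash-encode coordinates, then embed the hash code characters.
import torch
import torch.nn as nn
import pygeohash as pgh

geohash = pgh.encode(39.9087, 116.3975, precision=8)  # e.g. "wx4g09np"

# The 32-character base-32 alphabet used by geohash (no a, i, l, o).
ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"
char_ids = torch.tensor([ALPHABET.index(c) for c in geohash])

embedding = nn.Embedding(num_embeddings=32, embedding_dim=16)
# Pool the per-character embeddings into one spatial feature representation;
# the embedding would then be trained so that closer POIs yield more similar vectors.
spatial_feat = embedding(char_ids).mean(dim=0)  # shape: (16,)
```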
  • Step 104, namely "fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a multimodal feature representation of the POI", is described in detail below.
  • The visual feature representation, semantic feature representation and spatial feature representation of the POI could simply be spliced, with the spliced feature used directly as the multimodal feature representation of the POI.
  • However, this method is relatively rigid, lacks learning ability, and is therefore limited in expressive accuracy.
  • The present disclosure therefore provides a preferred fusion method, which may specifically include the following steps S31 and S32.
  • In step S31, the visual feature representation, semantic feature representation and spatial feature representation of the POI are spliced to obtain a spliced feature.
  • The visual feature representation, semantic feature representation and spatial feature representation can be spliced end to end in a preset order.
  • Where the dimensions of the feature vectors differ, a preset value such as 0 can be used for padding.
  • In step S32, the spliced feature is input into a pre-trained fully connected network, and the multimodal feature representation of the POI output by the fully connected network is obtained.
  • The fully connected network is obtained by pre-training; as shown in FIG. 3, the training process may include the following steps.
  • In step 301, second training samples are obtained, each including a POI sample and a category label for the POI sample.
  • Some POIs having image, text and spatial location information can be obtained in advance as POI samples, and their categories annotated, for example as hospital, building, school, bus stop, shop, and so on. These POI samples and their category labels are used as the second training samples to train the fully connected network used in feature fusion.
  • In step 302, a visual feature representation of the POI sample is extracted from the image of the POI sample using the image feature extraction model.
  • In step 303, a semantic feature representation is extracted from the text information of the POI sample using the text feature extraction model.
  • In step 304, a spatial feature representation is extracted from the spatial location information of the POI sample using the spatial feature extraction model.
  • The order of steps 302 to 304 shown here is only one possible implementation; they may be executed sequentially in other orders or in parallel.
  • In step 305, the visual feature representation, semantic feature representation and spatial feature representation of the POI sample are spliced to obtain the spliced feature of the POI sample.
  • In step 306, the spliced feature of the POI sample is input into the fully connected network to obtain the multimodal feature representation of the POI sample output by the network; this multimodal feature representation is then input into a classification network, with the category label of the POI sample used as the target output of the classification network, so as to train the fully connected network and the classification network.
  • The loss function used by the classification network can be, but is not limited to, Large-Softmax, A-Softmax, AM-Softmax, CosFace, ArcFace, etc.
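  • As a concrete illustration of steps S31/S32 and 305/306, the following is a minimal sketch of the fusion network and one training step, again assuming PyTorch; the per-modality dimensions, padded sizes, category set and cross-entropy loss (standing in for the margin-based losses above) are illustrative.

```python
# Sketch: zero-pad, splice, and fuse the three feature representations, training
# the fully connected network against POI category labels via an auxiliary head.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pad_to(feat: torch.Tensor, dim: int) -> torch.Tensor:
    # Pad a 1-D feature vector with the preset value 0 up to `dim`.
    return F.pad(feat, (0, dim - feat.numel()))

visual_feat = torch.randn(256)    # from the image feature extraction model
semantic_feat = torch.randn(768)  # from the text feature extraction model
spatial_feat = torch.randn(16)    # from the spatial feature extraction model

# Splice end to end in a preset order, padding each modality to a fixed size.
spliced = torch.cat([pad_to(visual_feat, 256),
                     pad_to(semantic_feat, 768),
                     pad_to(spatial_feat, 64)])

fusion_net = nn.Sequential(nn.Linear(256 + 768 + 64, 512), nn.ReLU(),
                           nn.Linear(512, 128))  # outputs the multimodal feature
classifier = nn.Linear(128, 6)  # training-only head: hospital, school, shop, ...

multimodal_feat = fusion_net(spliced)
label = torch.tensor([2])  # e.g. the "school" category of this POI sample
loss = F.cross_entropy(classifier(multimodal_feat).unsqueeze(0), label)
loss.backward()
# At inference time, fusion_net(spliced) alone yields the multimodal
# feature representation of a POI.
```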
  • In the above training process, the model parameters of the image feature extraction model, the text feature extraction model and the spatial feature extraction model can remain unchanged, or they can also be updated.
  • After the multimodal feature representation of each POI is obtained in the manner of the above method embodiment, it can be stored in a database.
  • The multimodal feature representations of POIs can be used to calculate the similarity between POIs. Specific application scenarios include, but are not limited to, automated POI production, intelligent retrieval and recommendation.
  • For example, in automated POI production, a collector or a collection device shoots an image containing a POI signboard and saves the POI's image, name, coordinates and other information.
  • Multimodal feature representations are extracted from the massive POI data collected historically, using the method in the above embodiments of the present disclosure, and stored in a database.
  • For example, a distributed Redis can be used as the feature library for the multimodal feature representations.
  • The storage structure can take the form of key-value pairs.
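  • As an illustration, the following is a minimal sketch of such a key-value feature store, assuming redis-py and float32 vectors; the key scheme and serialization format are illustrative choices, not specified by the disclosure.

```python
# Sketch: store and load a multimodal feature vector in Redis as key-value pairs.
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)

poi_id = "poi:123456"  # hypothetical POI key
feat = np.random.rand(128).astype(np.float32)  # the POI's multimodal feature
r.set(poi_id, feat.tobytes())                  # key -> raw vector bytes

restored = np.frombuffer(r.get(poi_id), dtype=np.float32)  # round-trip check
```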
  • For newly collected POI data, the method in the above embodiments of the present disclosure is likewise used to extract its multimodal feature representation, which is then used for retrieval and matching in the feature library, for example with NN (nearest neighbor) retrieval, ANN (approximate nearest neighbor) retrieval and other retrieval methods.
  • The retrieval process is based on calculating the similarity between the multimodal feature representation of the newly collected POI and the multimodal feature representations of the existing POIs in the database, so as to judge whether the newly collected POI data belongs to an existing POI. POI data that cannot be retrieved and matched, or that cannot be processed automatically because the text is unrecognizable, the image is not clear enough, the coordinates are wrong, and so on, is submitted to a manual operation platform.
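  • As an illustration of the retrieval-and-matching step, the following is a minimal sketch using Faiss as one possible ANN library (the disclosure mentions NN/ANN retrieval only generically); cosine similarity is realized as inner product over L2-normalized vectors, and the matching threshold is illustrative.

```python
# Sketch: match a newly collected POI against the feature library by similarity.
import faiss
import numpy as np

dim = 128
library = np.random.rand(10000, dim).astype(np.float32)  # existing POI features
faiss.normalize_L2(library)

index = faiss.IndexFlatIP(dim)  # exact inner-product search (NN); ANN indexes
index.add(library)              # such as IVF or HNSW can be swapped in at scale

query = np.random.rand(1, dim).astype(np.float32)  # newly collected POI feature
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 most similar existing POIs

# If the best score clears a threshold, treat the new data as an existing POI;
# otherwise hand it to the manual operation platform.
is_existing = scores[0, 0] > 0.9
```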
  • FIG. 4 is a schematic diagram of an apparatus for extracting multi-modal POI features provided by an embodiment of the present disclosure.
  • The apparatus may include a visual feature extraction module 401, a semantic feature extraction module 402, a spatial feature extraction module 403 and a feature fusion module 404, and may further include a first model training unit 405, a text acquisition unit 406, a second model training unit 407 and a similarity calculation unit 408.
  • the main functions of each unit are as follows:
  • The visual feature extraction module 401 is used to extract the visual feature representation of the POI from the image of the POI using the image feature extraction model.
  • The semantic feature extraction module 402 is used to extract the semantic feature representation from the text information of the POI using the text feature extraction model.
  • The spatial feature extraction module 403 is used to extract the spatial feature representation from the spatial location information of the POI using the spatial feature extraction model.
  • The feature fusion module 404 is used to fuse the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain the multimodal feature representation of the POI.
  • Specifically, the visual feature extraction module 401 can use object detection technology to extract the signboard area from the image containing the POI signboard, and then use the pre-trained image feature extraction model to extract the visual feature representation of the POI from the signboard area.
  • The first model training unit 405 is used to pre-train the image feature extraction model in the following manner: obtain first training samples, each including an image sample and a category label for the image sample; use the image samples as the input of a deep neural network and the category labels of the image samples as the target output of a classification network to train the deep neural network and the classification network, where the deep neural network extracts a visual feature representation from the image sample and inputs it into the classification network, and the classification network outputs the classification result of the image sample according to the visual feature representation; and, after training, obtain the image feature extraction model from the trained deep neural network.
  • The text acquisition unit 406 is configured to obtain the text information of the POI from the POI database, and/or to recognize the text information of the POI from the image containing the POI signboard using text recognition technology.
  • The text feature extraction model may include, but is not limited to, a word embedding model, a pre-trained language model, or a model obtained by fine-tuning a pre-trained language model with existing POI text data.
  • The spatial feature extraction module 403 is specifically configured to perform hash coding on the spatial location information of the POI to obtain a hash code, and to convert the hash code into a spatial feature representation using the spatial feature extraction model.
  • The spatial feature extraction model may include a word embedding model.
  • The feature fusion module 404 can be specifically used to splice the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a spliced feature, and to input the spliced feature into the pre-trained fully connected network to obtain the multimodal feature representation of the POI output by the fully connected network.
  • The second model training unit 407 is used to pre-train the fully connected network in the following manner:
  • obtain second training samples, each including a POI sample and a category label for the POI sample; extract the visual feature representation of the POI sample from the image of the POI sample using the image feature extraction model; extract the semantic feature representation from the text information of the POI sample using the text feature extraction model; extract the spatial feature representation from the spatial location information of the POI sample using the spatial feature extraction model; splice the visual feature representation, semantic feature representation and spatial feature representation of the POI sample to obtain the spliced feature of the POI sample; input the spliced feature of the POI sample into the fully connected network to obtain the multimodal feature representation of the POI sample output by the network; and input the multimodal feature representation into a classification network, with the category label of the POI sample used as the target output of the classification network, so as to train the fully connected network and the classification network.
  • The similarity calculation unit 408 is configured to calculate the similarity between POIs based on the multimodal feature representations of the POIs.
  • According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
  • FIG. 5 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 500 includes a computing unit 501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500.
  • The computing unit 501, the ROM 502 and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or optical disk; and a communication unit 509, such as a network card, modem or wireless communication transceiver.
  • The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, etc.
  • The computing unit 501 performs the various methods and processes described above, such as the method for extracting multimodal POI features.
  • In some embodiments, the method for extracting multimodal POI features may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508.
  • In some embodiments, part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509.
  • When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for extracting multimodal POI features described above may be performed.
  • Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for extracting multimodal POI features by any other suitable means (for example, by means of firmware).
  • Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, such that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on the remote machine or server.
  • In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
  • The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory or tactile feedback), and input from the user can be received in any form (including acoustic, voice or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or middleware components (e.g., an application server), or front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or any combination of such back-end, middleware or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs) and the Internet.
  • a computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a method and apparatus for extracting multimodal POI features, relating to big data technology in the field of artificial intelligence. The method comprises: extracting a visual feature representation of a POI from an image of the POI using an image feature extraction model; extracting a semantic feature representation from text information of the POI using a text feature extraction model; extracting a spatial feature representation from spatial location information of the POI using a spatial feature extraction model; and fusing the visual feature representation, semantic feature representation and spatial feature representation of the POI to obtain a multimodal feature representation of the POI. By means of the method, a feature vector representation fusing multiple modalities is extracted for each POI, thereby providing a basis for subsequent calculation of the similarity between POIs.
PCT/CN2021/107383 2021-03-24 2021-07-20 Method and apparatus for extracting multimodal POI features WO2022198854A1

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022576469A (JP2023529939A) 2021-03-24 2021-07-20 Method and apparatus for extracting multimodal POI features
KR1020227044369A (KR20230005408A) 2021-03-24 2021-07-20 Method and apparatus for extracting multimodal POI features

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110312700.4A 2021-03-24 2021-03-24 Method and apparatus for extracting multimodal POI features
CN202110312700.4 2021-03-24

Publications (1)

Publication Number Publication Date
WO2022198854A1 (fr)

Family

ID=76473210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107383 WO2022198854A1 (fr) Method and apparatus for extracting multimodal POI features

Country Status (4)

Country Link
JP (1) JP2023529939A (fr)
KR (1) KR20230005408A (fr)
CN (1) CN113032672A (fr)
WO (1) WO2022198854A1 (fr)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032672A (zh) 2021-03-24 2021-06-25 北京百度网讯科技有限公司 Method and apparatus for extracting multimodal POI features
CN113657274B (zh) * 2021-08-17 2022-09-20 北京百度网讯科技有限公司 Table generation method and apparatus, electronic device and storage medium
CN113807102B (zh) * 2021-08-20 2022-11-01 北京百度网讯科技有限公司 Method, apparatus, device and computer storage medium for building a semantic representation model
CN113807218B (zh) * 2021-09-03 2024-02-20 科大讯飞股份有限公司 Layout analysis method and apparatus, computer device and storage medium
CN114821622B (zh) * 2022-03-10 2023-07-21 北京百度网讯科技有限公司 Text extraction method, text extraction model training method, apparatus and device
CN114911787B (zh) * 2022-05-31 2023-10-27 南京大学 Multi-source POI data cleaning method fusing location and semantic constraints
CN114861889B (zh) * 2022-07-04 2022-09-27 北京百度网讯科技有限公司 Training method for deep learning models, target object detection method and apparatus
CN115455129B (zh) * 2022-10-14 2023-08-25 阿里巴巴(中国)有限公司 POI processing method and apparatus, electronic device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104166982A (zh) * 2014-06-30 2014-11-26 复旦大学 Image-optimized clustering method based on canonical correlation analysis
KR102092392B1 (ko) * 2018-06-15 2020-03-23 네이버랩스 주식회사 Method and system for automatically collecting and updating point-of-interest-related information in real space
CN111460077B (zh) * 2019-01-22 2021-03-26 大连理工大学 Cross-modal hash retrieval method based on class semantic guidance
CN110377686B (zh) * 2019-07-04 2021-09-17 浙江大学 Address information feature extraction method based on a deep neural network model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472232A (zh) * 2018-10-31 2019-03-15 山东师范大学 Video semantic representation method, system and medium based on a multimodal fusion mechanism
CN112101165A (zh) * 2020-09-07 2020-12-18 腾讯科技(深圳)有限公司 Point-of-interest recognition method and apparatus, computer device and storage medium
CN112200317A (zh) * 2020-09-28 2021-01-08 西南电子技术研究所(中国电子科技集团公司第十研究所) Multimodal knowledge graph construction method
CN113032672A (zh) * 2021-03-24 2021-06-25 北京百度网讯科技有限公司 Method and apparatus for extracting multimodal POI features

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115966061A (zh) * 2022-12-28 2023-04-14 上海帜讯信息技术股份有限公司 Disaster early warning processing method, system and device based on 5G messages
CN115966061B (zh) * 2022-12-28 2023-10-24 上海帜讯信息技术股份有限公司 Disaster early warning processing method, system and device based on 5G messages
CN116665228A (zh) * 2023-07-31 2023-08-29 恒生电子股份有限公司 Image processing method and apparatus
CN116665228B (zh) * 2023-07-31 2023-10-13 恒生电子股份有限公司 Image processing method and apparatus
CN116805531A (zh) * 2023-08-24 2023-09-26 安徽通灵仿生科技有限公司 Pediatric telemedicine system
CN116805531B (zh) * 2023-08-24 2023-12-05 安徽通灵仿生科技有限公司 Pediatric telemedicine system

Also Published As

Publication number Publication date
KR20230005408A (ko) 2023-01-09
JP2023529939A (ja) 2023-07-12
CN113032672A (zh) 2021-06-25

Similar Documents

Publication Publication Date Title
WO2022198854A1 (fr) Method and apparatus for extracting multimodal POI features
CN112949415B (zh) Image processing method, apparatus, device and medium
CN112084790A (zh) Relation extraction method and system based on a pre-trained convolutional neural network
WO2021093308A1 (fr) Method and apparatus for extracting point-of-interest (POI) names, device, and computer storage medium
WO2021208696A1 (fr) User intention analysis method and apparatus, electronic device, and computer storage medium
WO2022174552A1 (fr) Method and apparatus for obtaining POI state information
CN113705716B (zh) Image recognition model training method and device, cloud control platform, and autonomous driving vehicle
CN115359383A (zh) Cross-modal feature extraction and retrieval method, model training method, apparatus and medium
CN112989097A (zh) Model training and image retrieval method and apparatus
CN114764566B (zh) Knowledge element extraction method for the aviation field
CN114092948B (zh) Bill recognition method, apparatus, device and storage medium
CN114495113A (zh) Text classification method, and training method and apparatus for a text classification model
CN113139110A (zh) Regional feature processing method, apparatus, device, storage medium and program product
CN113255501A (zh) Method, device, medium and program product for generating a table recognition model
CN115482436B (zh) Image screening model training method and apparatus, and image screening method
CN114764874B (zh) Deep learning model training method, object recognition method and apparatus
CN113807102B (zh) Method, apparatus, device and computer storage medium for building a semantic representation model
CN112818972B (zh) Point-of-interest image detection method and apparatus, electronic device and storage medium
CN113360791B (zh) Point-of-interest query method and apparatus for electronic maps, roadside device and vehicle
CN113806541A (zh) Sentiment classification method, and training method and apparatus for a sentiment classification model
CN112541496B (zh) Method, apparatus, device and computer storage medium for extracting POI names
CN113221564B (zh) Method and apparatus for training an entity recognition model, electronic device and storage medium
CN114861062B (zh) Information filtering method and apparatus
CN115168599B (zh) Multi-triple extraction method, apparatus, device, medium and product
CN113343670B (zh) Address text element extraction method based on coupling hidden Markov models with classification algorithms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21932468; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022576469; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 20227044369; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21932468; Country of ref document: EP; Kind code of ref document: A1)