CN111651674B - Bidirectional searching method and device and electronic equipment - Google Patents

Bidirectional searching method and device and electronic equipment

Info

Publication number
CN111651674B
Authority
CN
China
Prior art keywords
data
training
text
searched
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010497516.7A
Other languages
Chinese (zh)
Other versions
CN111651674A (en)
Inventor
王海
孔飞
刘邦长
谷书锋
赵红文
王燕华
赵进
庄博然
袁晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Miaoyijia Health Technology Group Co ltd
Original Assignee
Beijing Miaoyijia Health Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Miaoyijia Health Technology Group Co ltd filed Critical Beijing Miaoyijia Health Technology Group Co ltd
Priority to CN202010497516.7A priority Critical patent/CN111651674B/en
Publication of CN111651674A publication Critical patent/CN111651674A/en
Application granted granted Critical
Publication of CN111651674B publication Critical patent/CN111651674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a bidirectional searching method and device and electronic equipment. The method comprises: acquiring data to be searched; performing feature extraction on the data to be searched through a multimodal model to obtain data features of the data to be searched, wherein the multimodal model comprises a visual feature extractor and a text feature extractor; and searching for target data corresponding to the data to be searched based on the data features, wherein the data types of the target data and the data to be searched are the same or different, and the data types include a picture type and a text type. The invention can effectively improve the utilization rate of dish data and can also significantly improve the user experience.

Description

Bidirectional searching method and device and electronic equipment
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a bidirectional searching method, a bidirectional searching device, and an electronic device.
Background
With the development of internet technology, many vertical food-sharing websites have come into public view. Such websites generally accumulate a large amount of menu information matched with corresponding menu pictures. At present, a user can search for the corresponding menu information based on a picture of a dish; however, this unidirectional way of searching menu information cannot make full use of the data in a food-sharing website, that is, the utilization rate of the data in the website is low. At the same time, the ways in which a user can search are limited, which affects the user experience to a certain extent.
Disclosure of Invention
Accordingly, the present invention aims to provide a bidirectional searching method, a bidirectional searching device and an electronic device, which can effectively improve the utilization rate of dish data and can also remarkably improve the user experience.
In a first aspect, an embodiment of the present invention provides a bidirectional searching method, including: acquiring data to be searched; performing feature extraction on the data to be searched through a multimodal model to obtain data features of the data to be searched, wherein the multimodal model includes a visual feature extractor and a text feature extractor; and searching for target data corresponding to the data to be searched based on the data features, wherein the data types of the target data and the data to be searched are the same or different, and the data types include a picture type and a text type.
In one embodiment, the step of obtaining the data to be searched includes: receiving diet data input by a user, wherein the data type of the diet data is a picture type or a text type; if the data type of the diet data is the picture type, performing color conversion processing and/or size normalization processing on the picture-type diet data to obtain picture-type data to be searched; and if the data type of the diet data is the text type, performing word segmentation and/or stop-word removal processing on the text-type diet data to obtain text-type data to be searched.
In one embodiment, the step of extracting features of the data to be searched through the multimodal model to obtain the data features of the data to be searched includes: for picture-type data to be searched, extracting the data features of the picture-type data to be searched through the visual feature extractor in the multimodal model; and for text-type data to be searched, extracting the data features of the text-type data to be searched through the text feature extractor in the multimodal model.
In one embodiment, the step of searching for the target data corresponding to the data to be searched based on the data features includes: calculating a first similarity degree between the data characteristics and candidate picture data stored in a preset database, and determining target data corresponding to the data to be searched from each candidate picture data based on the first similarity degree; and/or calculating a second similarity between the data characteristics and candidate text data stored in the preset database, and determining target data corresponding to the data to be searched from the candidate text data based on the second similarity.
In one embodiment, the training step of the multimodal model includes: acquiring training data; the training data comprises training picture data, training text data and training labels, wherein the training labels are used for representing association relations between the training picture data and the training text data; inputting the training data into the multi-modal model to obtain a visual feature vector corresponding to the training picture data and a text feature vector corresponding to the training text data; calculating a loss value based on the visual feature vector, the text feature vector, and the training tag; training the multimodal model based on the loss value.
In one embodiment, the step of acquiring training data includes: collecting dish images and menu characters in a specified website by adopting a crawler technology; performing color conversion processing and/or size normalization processing on the dish image to obtain training picture data; performing word segmentation and/or stop word removal processing on the menu words to obtain training word data; if the training picture data is associated with the training text data, determining that the training label is a first numerical value; and if the training picture data is not associated with the training text data, determining that the training label is a second numerical value.
In one embodiment, the step of inputting the training data into the multimodal model to obtain a visual feature vector corresponding to the training picture data and a text feature vector corresponding to the training text data includes: extracting data features of the training picture data by a visual feature extractor in the multimodal model; extracting data features of the training text data through a text feature extractor in the multimodal model; mapping the data characteristics of the training picture data to obtain visual characteristic vectors; mapping the data characteristics of the training text data to obtain text feature vectors; wherein, the visual feature vector and the text feature vector have an association relationship.
In a second aspect, an embodiment of the present invention further provides a bidirectional searching apparatus, including: a data acquisition module, used for acquiring data to be searched; a feature extraction module, used for performing feature extraction on the data to be searched through a multimodal model to obtain data features of the data to be searched, wherein the multimodal model includes a visual feature extractor and a text feature extractor; and a searching module, used for searching for target data corresponding to the data to be searched based on the data features, wherein the data types of the target data and the data to be searched are the same or different, and the data types include a picture type and a text type.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory has stored thereon a computer program which, when executed by the processor, performs the method according to any of the first aspects provided.
In a fourth aspect, embodiments of the present invention also provide a computer storage medium storing computer software instructions for use with any of the methods provided in the first aspect.
According to the bidirectional searching method and device and the electronic equipment, the data to be searched is first obtained, the data features of the data to be searched are obtained through feature extraction by the multimodal model, and the target data corresponding to the data to be searched is then searched based on the data features. The multimodal model includes a visual feature extractor and a text feature extractor; the target data has the same or a different data type from the data to be searched, and the data types include a picture type and a text type. Because the data type of the data to be searched includes both the picture type and the text type, the embodiment of the invention not only realizes bidirectional searching but also makes full use of both the text data and the image data. Compared with the prior-art mode of obtaining corresponding data only through unidirectional searching, the embodiment of the invention effectively improves the data utilization rate and also significantly improves the user experience.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a bidirectional searching method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another bidirectional searching method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-modal model according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a bidirectional searching device according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described in conjunction with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, the existing way of searching menu information is generally to search a pre-built image library with a dish image, rank the library images by similarity to obtain the N most similar images, and finally return the menu information corresponding to those N images. This approach suffers from two drawbacks: (1) the search is unidirectional, so menu information can only be found from a picture, and a menu picture cannot itself be queried; (2) the search uses information from a single image modality, so the menu text information rich in semantic information cannot be well utilized, the data utilization rate is low, and the user experience is affected. To solve these problems, the invention provides a bidirectional searching method and device and electronic equipment, which can effectively improve the utilization rate of dish data and significantly improve the user experience.
For the sake of understanding the present embodiment, first, a detailed description will be given of a bidirectional searching method disclosed in the present embodiment, referring to a schematic flow chart of a bidirectional searching method shown in fig. 1, the method mainly includes the following steps S102 to S106:
step S102, obtaining data to be searched. The data types of the data to be searched can include a picture type and a text type. For example, a dish picture may be used as the data to be searched, and a dish text (such as "cook sugar thick" or the like) may be used as the data to be searched. In one embodiment, an uploading channel of the data to be searched can be provided for the user, if the data type of the data to be searched is a picture type, the user can select a required menu picture according to the requirement, and the menu picture is uploaded through the uploading channel, so that the data to be searched uploaded by the user is obtained; if the data type of the data to be searched is a text type, the data to be searched can be obtained through text input or voice-to-text mode.
Step S104, performing feature extraction on the data to be searched through the multimodal model to obtain the data features of the data to be searched. The multimodal model includes a visual feature extractor and a text feature extractor: the visual feature extractor is used to extract the data features of picture-type data to be searched, and the text feature extractor is used to extract the data features of text-type data to be searched. In addition, the multimodal model provided by the embodiment of the invention may be equipped with different feature extractors for different modalities; for example, if the data to be searched is of an audio type, the multimodal model may also include a feature extractor for extracting audio features.
Step S106, searching for the target data corresponding to the data to be searched based on the data features. The target data and the data to be searched have the same or different data types, and the data types include a picture type and a text type. For example, the data features of picture-type data to be searched may be used to find target data of the picture type and/or the text type, and likewise for the data features of text-type data to be searched. In one embodiment, a database storing dish data may be preconfigured; the dish data may include menu text describing how a dish is made, and may also include dish images showing the finished dish. The degree of similarity between the data features and each item of dish data in the database is then calculated, and the target data corresponding to the data to be searched is determined based on that degree of similarity.
According to the bidirectional searching method provided by the embodiment of the invention, the multimodal model is used to perform feature extraction on the data to be searched, and the corresponding target data is searched based on the extracted data features. Since the data to be searched may be of either the picture type or the text type, bidirectional search is realized, the data utilization rate is effectively improved, and the user experience is significantly improved.
Considering that the acquired data may contain interference factors, for example text-type data may include characters or words with no practical meaning, and picture-type data may have inconsistent picture sizes, and that such interference factors affect the accuracy of the data search, the embodiment of the present invention may perform the step of acquiring the data to be searched according to the following steps a to c in order to improve search accuracy:
Step a, receiving the diet data input by a user. The diet data is the dish data, and its data type is a picture type or a text type.
Step b, if the data type of the diet data is the picture type, performing color conversion processing and/or size normalization processing on the picture-type diet data to obtain picture-type data to be searched. In one embodiment, if the picture-type diet data input by the user is a grayscale image, it needs to undergo color conversion processing to be converted into a color image, so as to facilitate the extraction of the data features in the diet data. In addition, since the picture-type diet data input by users may vary in size, size normalization processing can normalize the input diet data from its current size to a specified size.
Step c, if the data type of the diet data is the text type, performing word segmentation and/or stop-word removal processing on the text-type diet data to obtain text-type data to be searched. Suppose the preset stop words include function words with no practical meaning. For example, if the diet data is "blanch the lamb chops, remove them and set them aside for later use", word segmentation yields tokens such as "lamb chops", "blanch", "remove" and "set aside"; any token in the segmentation result that appears in the stop-word list is then removed, giving the text-type data to be searched.
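As an illustration of steps b and c, the following is a minimal preprocessing sketch in Python, assuming PIL for images and the jieba segmenter for Chinese text; the stop-word list and target size are illustrative placeholders, not values specified by the invention.

```python
from PIL import Image
import jieba  # assumed Chinese word segmenter; any tokenizer would do

STOP_WORDS = {"了", "的", "一下"}   # illustrative stop-word list
TARGET_SIZE = (224, 224)            # assumed input size of the visual feature extractor

def preprocess_picture(path: str) -> Image.Image:
    """Color conversion + size normalization for picture-type diet data."""
    img = Image.open(path).convert("RGB")  # e.g. grayscale -> color image
    return img.resize(TARGET_SIZE)         # normalize to the specified size

def preprocess_text(text: str) -> list[str]:
    """Word segmentation + stop-word removal for text-type diet data."""
    tokens = jieba.lcut(text)
    return [t for t in tokens if t not in STOP_WORDS]
```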
To facilitate understanding of the above step S104, the embodiment of the present invention provides a specific implementation of extracting features from the data to be searched through the multimodal model to obtain the data features of the data to be searched: (1) for picture-type data to be searched, the data features are extracted by the visual feature extractor in the multimodal model; (2) for text-type data to be searched, the data features are extracted by the text feature extractor in the multimodal model. As for the structures the two extractors may adopt, the visual feature extractor may use a network architecture such as VGG (Visual Geometry Group) or ResNet (Residual Neural Network), and the text feature extractor may use word2vec combined with an RNN (Recurrent Neural Network), a Transformer network architecture, or the like. In practical applications, the required network architecture can be selected as the visual extractor or the text extractor based on actual requirements, and the embodiment of the invention is not limited in this respect.
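A sketch of the two extractor branches under the architectures named above, here assuming PyTorch with a torchvision ResNet-18 visual branch and a small Transformer text branch; the layer sizes and depths are illustrative assumptions, not values fixed by the invention.

```python
import torch
import torch.nn as nn
from torchvision import models

class VisualFeatureExtractor(nn.Module):
    """ResNet-based visual branch (one of the backbones named above)."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)
        # Keep everything up to global average pooling; drop the classifier head.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.backbone(image).flatten(1)  # (batch, 512) visual features

class TextFeatureExtractor(nn.Module):
    """Transformer-based text branch (one of the options named above)."""
    def __init__(self, vocab_size: int, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Mean-pool the token representations into one text feature vector.
        return self.encoder(self.embed(tokens)).mean(dim=1)  # (batch, 512)
```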
In one implementation manner, when searching the target data corresponding to the data to be searched based on the data features, the embodiment of the invention can calculate the first similarity degree between the data features and the candidate picture data in the database and/or the second similarity degree between the data features and the candidate text data in the database based on the user requirements respectively. In specific implementation, (1) if the user expects to output target data of a picture type, a first similarity degree between data features and candidate picture data stored in a preset database can be calculated, and target data corresponding to data to be searched is determined from the candidate picture data based on the first similarity degree; (2) If the user expects to output the target data of the text type, a second similarity degree between the data characteristics and the candidate text data stored in the preset database can be calculated, and the target data corresponding to the data to be searched is determined from the candidate text data based on the second similarity degree.
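A hedged sketch of this retrieval step: the candidate embeddings are assumed to be precomputed with the same model, and Euclidean distance is used as the (inverse) similarity measure, matching the distance used in the loss function below.

```python
import torch

def search_top_k(query_vec: torch.Tensor, candidates: torch.Tensor, k: int = 5):
    """Return indices of the k candidates most similar to the query.

    query_vec:  (dim,) embedding of the data to be searched
    candidates: (n, dim) embeddings of candidate picture or text data
    """
    dists = torch.cdist(query_vec.unsqueeze(0), candidates).squeeze(0)  # (n,)
    return torch.topk(dists, k=min(k, len(candidates)), largest=False).indices
```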
In order to enable the multimodal model provided by the embodiment of the present invention to output the target data corresponding to the data to be searched more accurately, the multimodal model may be trained. The invention provides an embodiment of training the multimodal model, described in the following steps 1 to 4:
Step 1, acquiring training data. The training data includes training picture data, training text data and a training label, where the training label is used to characterize the association relationship between the training picture data and the training text data. For example, the training data includes a data pair <R, I>, where R represents the training text data (i.e., the menu text R) and I represents the training picture data (i.e., the menu image I), together with a training label L used to characterize whether the menu text R in the data pair corresponds to the menu image I in the data pair: when L takes the value 1, the pair <R, I> is a mutually corresponding menu text and menu image; when L takes the value 0, it is not.
Further, the embodiment of the present invention provides an implementation manner of acquiring training data, see the following steps 1.1 to 1.4:
Step 1.1, collecting dish images and menu text from a designated website using crawler technology. The designated website may be a vertical food-sharing website or a menu-sharing website.
Step 1.2, performing color conversion processing and/or size normalization processing on the dish image to obtain the training picture data; and performing word segmentation and/or stop-word removal processing on the menu text to obtain the training text data. Because words with no practical meaning in text-type training data and inconsistent picture sizes in picture-type training data affect the accuracy of feature extraction, the embodiment of the invention processes the menu image and the menu text separately to obtain the corresponding training picture data and training text data. For the specific processing procedure, refer to the foregoing step b and step c; details are not repeated here.
Step 1.3, if the training picture data is associated with the training text data, determining that the training label is a first numerical value. The first value may be 1, that is, when the training picture data and the training text data correspond to each other, the training label of that training data set is 1.
Step 1.4, if the training picture data is not associated with the training text data, determining that the training label is a second numerical value. The second value may be 0, that is, when the training picture data and the training text data do not correspond to each other, the training label of that training data set is 0.
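One way to realize steps 1.3 and 1.4 is to pair each menu text with its own image (label 1) and with a randomly drawn non-matching image (label 0); this negative-sampling scheme is an assumption for illustration, not prescribed by the invention.

```python
import random

def build_training_pairs(recipes):
    """recipes: list of (menu_text, dish_image) pairs crawled from the website.
    Returns <R, I, L> triples: L=1 for associated pairs, L=0 otherwise."""
    triples = []
    for i, (text, image) in enumerate(recipes):
        triples.append((text, image, 1))              # associated: first value 1
        j = random.randrange(len(recipes))
        if j != i:
            triples.append((text, recipes[j][1], 0))  # not associated: second value 0
    return triples
```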
Step 2, inputting the training data into the multimodal model to obtain the visual feature vector corresponding to the training picture data and the text feature vector corresponding to the training text data. In a specific implementation, this step may be performed according to the following steps 2.1 to 2.2:
Step 2.1, extracting the data features of the training picture data through the visual feature extractor in the multimodal model, and extracting the data features of the training text data through the text feature extractor in the multimodal model. The dish image I has its visual features (i.e., the data features of the training picture data) extracted by the visual feature extractor, which may use a network architecture such as VGG or ResNet; the menu text R has its text features (i.e., the data features of the training text data) extracted by the text feature extractor, which may use a word2vec-RNN network architecture or a Transformer network architecture.
Step 2.2, mapping the data features of the training picture data to obtain the visual feature vector, and mapping the data features of the training text data to obtain the text feature vector, where the visual feature vector and the text feature vector have an association relationship. In one embodiment, the text features and visual features generated in step 2.1 may undergo feature transformation through a fully connected layer or convolutional layer with shared parameters to generate the corresponding visual feature vector and text feature vector, thereby establishing the association relationship between them. Taking a fully connected layer as an example: the text features and the visual features are input together into the fully connected layer with shared parameters, that is, the layer receives both at the same time and is forced to consider them jointly; feature mapping then yields the text feature vector corresponding to the text features and the visual feature vector corresponding to the visual features, so that mutually corresponding menu text and menu image features are mapped to nearby positions in the same space, establishing the association relationship between the visual feature vector and the text feature vector.
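A minimal sketch of the parameter-sharing fully connected layer described in step 2.2, assuming both extractor branches output features of the same dimension (an assumption of this sketch).

```python
import torch.nn as nn

class SharedProjection(nn.Module):
    """One fully connected layer whose weights are shared by both modalities,
    mapping visual and text features into the same embedding space."""
    def __init__(self, in_dim: int = 512, embed_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(in_dim, embed_dim)  # the shared parameters

    def forward(self, visual_feat, text_feat):
        v = self.fc(visual_feat)  # visual feature vector (visual embedding)
        r = self.fc(text_feat)    # text feature vector (text embedding)
        return v, r
```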
Step 3, calculating a loss value based on the visual feature vector, the text feature vector and the training label. The visual feature vector may also be referred to as the visual embedding vector, and the text feature vector as the text embedding vector. In a specific implementation, the loss value may be calculated from the visual feature vector v and the text feature vector r generated in step 2, the training label L, and a loss function. The embodiment of the present invention provides a loss function, as follows:
LOSS(r, v, L) = L × d² + (1 − L) × max(m − d, 0)²; where LOSS(r, v, L) represents the loss value, L represents the training label, d represents the Euclidean distance between the visual feature vector v and the text feature vector r, and m is a constant.
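The loss function above translates directly into code; the following sketch assumes PyTorch tensors and a margin m = 1.0 chosen for illustration.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(r: torch.Tensor, v: torch.Tensor, label: torch.Tensor,
                     m: float = 1.0) -> torch.Tensor:
    """LOSS(r, v, L) = L * d^2 + (1 - L) * max(m - d, 0)^2, averaged over the batch."""
    d = F.pairwise_distance(r, v)  # Euclidean distance between embedding pairs
    loss = label * d.pow(2) + (1 - label) * torch.clamp(m - d, min=0).pow(2)
    return loss.mean()
```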
Step 4, training the multimodal model based on the loss value. In a specific implementation, the menu text R, the menu image I and the training label L are input, the loss value LOSS is computed according to the flow provided in the above embodiment and optimized for end-to-end model training, and after training the model is saved as the multimodal model.
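Putting the pieces together, a hedged end-to-end training sketch; the data loader, optimizer settings, vocabulary size and epoch count are illustrative assumptions, and the module classes are the sketches given earlier.

```python
import torch

visual_net = VisualFeatureExtractor()
text_net = TextFeatureExtractor(vocab_size=30000)  # assumed vocabulary size
proj = SharedProjection()
params = (list(visual_net.parameters()) + list(text_net.parameters())
          + list(proj.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)

for epoch in range(10):
    # train_loader is an assumed DataLoader yielding <R, I, L> batches
    for image, tokens, label in train_loader:
        v, r = proj(visual_net(image), text_net(tokens))
        loss = contrastive_loss(r, v, label.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save({"visual": visual_net.state_dict(), "text": text_net.state_dict(),
            "proj": proj.state_dict()}, "multimodal_model.pt")
```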
In order to facilitate understanding of the bidirectional searching method provided in the foregoing embodiment, another bidirectional searching method is provided according to the foregoing embodiment, and referring to a schematic flow chart of another bidirectional searching method shown in fig. 2, the method mainly includes the following steps S202 to S210:
Step S202, acquiring a menu image and menu characters. In one embodiment, crawler technology may be utilized to collect menu images and menu text in a vertical-type food sharing website or a vertical-type menu sharing website.
Step S204, performing text preprocessing on the menu text. For details, refer to the preprocessing of text-type data in step c above, which is not repeated here.
Step S206, performing image preprocessing on the dish image. For details, refer to the preprocessing of picture-type data in step b above, which is not repeated here. In addition, the embodiment of the invention does not limit the processing order of the menu image and the menu text.
Step S208, training the multimodal model based on the preprocessed menu text and menu images. In one embodiment, the feature extractors may be used to extract the text features of the menu text and the visual features of the menu image; the fully connected layer may then be used to generate a text embedding vector corresponding to the text features and a visual embedding vector corresponding to the visual features, establishing the association relationship between the text embedding vector and the visual embedding vector; finally, the multimodal model is trained based on the text embedding vector, the visual embedding vector and the training label. For the specific implementation process, refer to the foregoing steps 1 to 4.
Step S210, searching for text from a picture, or for a picture from text, with the trained multimodal model. In the searching stage, features are extracted from the menu text or menu image to be searched that the user inputs, according to the trained multimodal model; the features are then used to search the database, and the most similar menu images or menu texts are retrieved and returned to the user.
To facilitate understanding of the above step S210, the embodiment of the present invention provides an exemplary schematic structural diagram of the multimodal model, as shown in fig. 3. Fig. 3 shows that the multimodal model includes a text feature extractor, a visual feature extractor and a fully connected layer. The text feature extractor extracts the text features of the input menu text, for example the text features of "blanch the lamb with a few slices of ginger"; the visual feature extractor extracts the visual features of the input menu image. The text features and the visual features are input to the fully connected layer at the same time, and the layer performs feature mapping on them to obtain a text embedding feature (i.e., the aforementioned text embedding vector) and a visual embedding feature (i.e., the aforementioned visual embedding vector) that have an association relationship. A loss value is then calculated based on the text embedding feature and the visual embedding feature, and the multimodal model is trained with this loss value.
In summary, the embodiment of the invention proposes a bidirectional search scheme over menu text and menu images, realizing bidirectional search. By modeling menus and menu images with multimodal information, that is, with menu text and menu images at the same time, text embedding features and visual embedding features are generated and placed in the same space, so that cross-modal search is performed by measuring the cross-modal distance between vectors; in addition, the relationship between menu text and menu images is learned through the shared fully connected layer, so the model fuses bimodal image and text information and makes more comprehensive use of the data. The method provided by the embodiment of the invention is also easy to implement and can be applied on a mobile terminal or a PC (personal computer). Furthermore, the bidirectional searching method has a clear structure, is easy to maintain and upgrade, improves the user experience, is engaging to use, and has high practicability.
For the bidirectional searching method provided in the above embodiment, the embodiment of the present invention provides a bidirectional searching device, referring to a schematic structural diagram of a bidirectional searching device shown in fig. 4, the device mainly includes the following parts:
The data acquisition module 402 is configured to acquire data to be searched.
The feature extraction module 404 is configured to perform feature extraction on data to be searched through the multimodal model, so as to obtain data features of the data to be searched; wherein the multimodal model includes a visual feature extractor and a text feature extractor.
A searching module 406, configured to search target data corresponding to data to be searched based on the data characteristics; the data types of the target data and the data to be searched are the same or different; the data types include a picture type and a text type.
According to the bidirectional searching device provided by the embodiment of the invention, the multimodal model is used to perform feature extraction on the data to be searched, and the corresponding target data is searched based on the extracted data features, thereby realizing bidirectional search, effectively improving the data utilization rate, and significantly improving the user experience.
In one embodiment, the data acquisition module 402 is further configured to: receive diet data input by a user, where the data type of the diet data is a picture type or a text type; if the data type of the diet data is the picture type, perform color conversion processing and/or size normalization processing on the picture-type diet data to obtain picture-type data to be searched; and if the data type of the diet data is the text type, perform word segmentation and/or stop-word removal processing on the text-type diet data to obtain text-type data to be searched.
In one embodiment, the feature extraction module 404 is further configured to: for picture-type data to be searched, extract the data features of the picture-type data to be searched through the visual feature extractor in the multimodal model; and for text-type data to be searched, extract the data features of the text-type data to be searched through the text feature extractor in the multimodal model.
In one embodiment, the searching module 406 is further configured to: calculating a first similarity degree between the data characteristics and candidate picture data stored in a preset database, and determining target data corresponding to the data to be searched from each candidate picture data based on the first similarity degree; and/or calculating a second similarity degree between the data characteristics and the candidate text data stored in the preset database, and determining target data corresponding to the data to be searched from the candidate text data based on the second similarity degree.
In one embodiment, the apparatus further includes a training module configured to: acquiring training data; the training data comprises training picture data, training text data and training labels, wherein the training labels are used for representing association relations between the training picture data and the training text data; inputting training data into a multi-modal model to obtain a visual feature vector corresponding to training picture data and a text feature vector corresponding to training text data; calculating a loss value based on the visual feature vector, the text feature vector and the training tag; the multimodal model is trained based on the loss values.
In one embodiment, the training module is further configured to: collecting dish images and menu characters in a specified website by adopting a crawler technology; performing color conversion processing and/or size normalization processing on the dish image to obtain training picture data; performing word segmentation and/or stop word removal processing on menu words to obtain training word data; if the training picture data is associated with the training text data, determining that the training label is a first numerical value; and if the training picture data is not associated with the training text data, determining the training label as a second numerical value.
In one embodiment, the training module is further configured to: extracting data features of training picture data by a visual feature extractor in the multimodal model; extracting data features of training text data through a text feature extractor in the multimodal model; mapping the data characteristics of the training picture data to obtain visual characteristic vectors; mapping the data characteristics of the training text data to obtain text feature vectors; wherein, the visual feature vector and the text feature vector have an association relationship.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned.
The embodiment of the invention provides electronic equipment, which comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the embodiments described above.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 100 includes: a processor 50, a memory 51, a bus 52 and a communication interface 53, the processor 50, the communication interface 53 and the memory 51 being connected by the bus 52; the processor 50 is arranged to execute executable modules, such as computer programs, stored in the memory 51.
The memory 51 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 53 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
Bus 52 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bidirectional arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory 51 is configured to store a program, and the processor 50 executes the program after receiving an execution instruction. The method executed by the apparatus defined by the flows disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 50 or implemented by the processor 50.
The processor 50 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware in the processor 50 or by instructions in the form of software. The processor 50 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as execution by a hardware decoding processor, or as execution by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 51, and the processor 50 reads the information in the memory 51 and, in combination with its hardware, performs the steps of the above method.
The computer program product of the readable storage medium provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where the program code includes instructions for executing the method described in the foregoing method embodiment, and the specific implementation may refer to the foregoing method embodiment and will not be described herein.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, essentially or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disk, or other media capable of storing program code.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention, but it should be understood by those skilled in the art that the present invention is not limited thereto, and that the present invention is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A bi-directional search method, comprising:
acquiring data to be searched;
performing feature extraction on the data to be searched through a multimodal model to obtain data features of the data to be searched; wherein the multimodal model comprises a visual feature extractor and a text feature extractor;
Searching target data corresponding to the data to be searched based on the data characteristics; the data types of the target data and the data to be searched are the same or different; the data types comprise picture types and text types;
the training step of the multi-mode model comprises the following steps:
acquiring training data; the training data comprises training picture data, training text data and training labels, wherein the training labels are used for representing association relations between the training picture data and the training text data;
inputting the training data into the multi-modal model to obtain a visual feature vector corresponding to the training picture data and a text feature vector corresponding to the training text data;
calculating a loss value based on the visual feature vector, the text feature vector, and the training tag;
training the multimodal model based on the loss value;
the step of obtaining training data includes:
collecting dish images and menu characters in a specified website by adopting a crawler technology;
performing color conversion processing and/or size normalization processing on the dish image to obtain training picture data; performing word segmentation and/or stop word removal processing on the menu words to obtain training word data;
If the training picture data is associated with the training text data, determining that the training label is a first numerical value;
if the training picture data is not associated with the training text data, determining that the training label is a second numerical value;
calculating a loss value based on the visual feature vector, the text feature vector, and the training label comprises: calculating the loss value according to the visual feature vector, the text feature vector, the training label and a loss function, wherein the loss function is as follows:
LOSS(r, v, L) = L × d² + (1 − L) × max(m − d, 0)²; where LOSS(r, v, L) represents the loss value, L represents the training label, d represents the Euclidean distance between the visual feature vector and the text feature vector, and m is a constant.
2. The method of claim 1, wherein the step of obtaining the data to be searched comprises:
receiving diet data input by a user; wherein the data type of the diet data comprises a picture type or a text type;
if the data type of the diet data comprises the picture type, performing color conversion processing and/or size normalization processing on the diet data of the picture type to obtain data to be searched of the picture type;
and if the data type of the diet data comprises the text type, performing word segmentation and/or stop-word removal processing on the text-type diet data to obtain the text-type data to be searched.
3. The method according to claim 1, wherein the step of extracting features of the data to be searched by using a multi-modal model to obtain data features of the data to be searched includes:
extracting, for picture-type data to be searched, the data features of the picture-type data to be searched through the visual feature extractor in the multimodal model;
and extracting, for text-type data to be searched, the data features of the text-type data to be searched through the text feature extractor in the multimodal model.
4. The method according to claim 1, wherein the step of searching for target data corresponding to the data to be searched based on the data characteristics comprises:
calculating a first similarity degree between the data characteristics and candidate picture data stored in a preset database, and determining target data corresponding to the data to be searched from each candidate picture data based on the first similarity degree;
and/or calculating a second similarity between the data characteristics and candidate text data stored in the preset database, and determining target data corresponding to the data to be searched from the candidate text data based on the second similarity.
5. The method of claim 1, wherein the step of inputting the training data into the multimodal model to obtain a visual feature vector corresponding to the training picture data and a text feature vector corresponding to the training text data comprises:
extracting data features of the training picture data by a visual feature extractor in the multimodal model; extracting data features of the training text data through a text feature extractor in the multimodal model;
mapping the data characteristics of the training picture data to obtain visual characteristic vectors; mapping the data characteristics of the training text data to obtain text feature vectors; wherein, the visual feature vector and the text feature vector have an association relationship.
6. A bi-directional searching apparatus, comprising:
the data acquisition module is used for acquiring data to be searched;
the feature extraction module is used for performing feature extraction on the data to be searched through a multimodal model to obtain data features of the data to be searched; wherein the multimodal model comprises a visual feature extractor and a text feature extractor;
The searching module is used for searching target data corresponding to the data to be searched based on the data characteristics; the data types of the target data and the data to be searched are the same or different; the data types comprise picture types and text types;
the training module is used for:
acquiring training data; the training data comprises training picture data, training text data and training labels, wherein the training labels are used for representing association relations between the training picture data and the training text data;
inputting the training data into the multi-modal model to obtain a visual feature vector corresponding to the training picture data and a text feature vector corresponding to the training text data;
calculating a loss value based on the visual feature vector, the text feature vector, and the training tag;
training the multimodal model based on the loss value;
the training module is also used for:
collecting dish images and recipe text from a specified website by using a crawler;
performing color conversion processing and/or size normalization processing on the dish images to obtain the training picture data; performing word segmentation and/or stop-word removal processing on the recipe text to obtain the training text data;
if the training picture data is associated with the training text data, determining that the training label is a first numerical value;
if the training picture data is not associated with the training text data, determining that the training label is a second numerical value;
the training module is also used for: calculating the loss value according to the visual feature vector, the text feature vector, the training label and a loss function, wherein the loss function is:
LOSS(r, v, L) = L × d² + (1 − L) × max(m − d, 0)²;
wherein LOSS(r, v, L) represents the loss value, L represents the training label, d represents the Euclidean distance between the visual feature vector and the text feature vector, and m is a constant.
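The formula above is the standard contrastive loss; a direct PyTorch transcription follows, assuming the first numerical value is 1 (associated picture–text pair) and the second is 0, the convention under which the formula pulls associated pairs together and pushes unassociated pairs at least m apart:

    import torch

    def contrastive_loss(r, v, label, margin=1.0):
        """r, v: text and visual feature vectors, shape (B, D).
        label: tensor of 1.0 (associated pair) or 0.0 (not associated)."""
        d = torch.norm(r - v, p=2, dim=-1)  # Euclidean distance per pair
        loss = label * d.pow(2) + (1 - label) * torch.clamp(margin - d, min=0).pow(2)
        return loss.mean()

As a quick check of the arithmetic: an associated pair (L = 1) at distance d = 0.5 incurs a loss of 0.25; a non-associated pair (L = 0) at the same distance with m = 1 incurs max(1 − 0.5, 0)² = 0.25, which falls to zero once d ≥ m.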
7. An electronic device comprising a processor and a memory;
the memory stores a computer program which, when executed by the processor, performs the method of any one of claims 1 to 5.
8. A computer storage medium storing computer software instructions for use with the method of any one of claims 1 to 5.
CN202010497516.7A 2020-06-03 2020-06-03 Bidirectional searching method and device and electronic equipment Active CN111651674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497516.7A CN111651674B (en) 2020-06-03 2020-06-03 Bidirectional searching method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111651674A (en) 2020-09-11
CN111651674B (en) 2023-08-25

Family

ID=72350358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497516.7A Active CN111651674B (en) 2020-06-03 2020-06-03 Bidirectional searching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111651674B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488301B (en) * 2020-12-09 2024-04-16 孙成林 Food inversion method based on multitask learning and attention mechanism
CN112613891B (en) * 2020-12-24 2023-10-03 支付宝(杭州)信息技术有限公司 Shop registration information verification method, device and equipment

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003065246A1 (en) * 2002-01-31 2003-08-07 Silverbrook Research Pty Ltd An electronic filing system searchable by a handwritten search query
CN101122953A (en) * 2007-09-21 2008-02-13 北京大学 Picture words segmentation method
CN101477692A (en) * 2009-02-13 2009-07-08 阿里巴巴集团控股有限公司 Method and apparatus for image characteristic extraction
CN101571875A (en) * 2009-05-05 2009-11-04 程治永 Realization method of image searching system based on image recognition
CN102262624A (en) * 2011-08-08 2011-11-30 中国科学院自动化研究所 System and method for realizing cross-language communication based on multi-mode assistance
CN102591890A (en) * 2011-01-17 2012-07-18 腾讯科技(深圳)有限公司 Method for displaying search information and search information display device
CN103310391A (en) * 2012-03-09 2013-09-18 朱曼平 Remote control digital menu and human-computer interaction method for same
CN103793498A (en) * 2014-01-22 2014-05-14 百度在线网络技术(北京)有限公司 Picture searching method and device and searching engine
CN104462873A (en) * 2013-09-13 2015-03-25 北大方正集团有限公司 Picture processing method and picture processing device
CN104615640A (en) * 2014-11-28 2015-05-13 百度在线网络技术(北京)有限公司 Method and device for providing searching keywords and carrying out searching
CN106683091A (en) * 2017-01-06 2017-05-17 北京理工大学 Target classification and attitude detection method based on depth convolution neural network
CN106980686A (en) * 2017-03-31 2017-07-25 努比亚技术有限公司 The segmenting method and terminal of a kind of search term
CN107273106A (en) * 2016-04-08 2017-10-20 北京三星通信技术研究有限公司 Object information is translated and derivation information acquisition methods and device
CN108228757A (en) * 2017-12-21 2018-06-29 北京市商汤科技开发有限公司 Image search method and device, electronic equipment, storage medium, program
CN109255640A (en) * 2017-07-13 2019-01-22 阿里健康信息技术有限公司 A kind of method, apparatus and system of determining user grouping
CN109815355A (en) * 2019-01-28 2019-05-28 网易(杭州)网络有限公司 Image search method and device, storage medium, electronic equipment
CN110532414A (en) * 2019-08-29 2019-12-03 深圳市商汤科技有限公司 A kind of picture retrieval method and device
CN110866140A (en) * 2019-11-26 2020-03-06 腾讯科技(深圳)有限公司 Image feature extraction model training method, image searching method and computer equipment
CN111090763A (en) * 2019-11-22 2020-05-01 北京视觉大象科技有限公司 Automatic picture labeling method and device
CN111159361A (en) * 2019-12-30 2020-05-15 北京阿尔山区块链联盟科技有限公司 Method and device for acquiring article and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101191223B1 (en) * 2011-11-16 2012-10-15 (주)올라웍스 Method, apparatus and computer-readable recording medium for retrieving image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Achieving semantic security without keys through coding and all-or-nothing transforms over wireless channels; Marco Baldi; IEEE Xplore; full text *

Also Published As

Publication number Publication date
CN111651674A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
TW201915787A (en) Search method and processing device
CN109034069B (en) Method and apparatus for generating information
CN110232340B (en) Method and device for establishing video classification model and video classification
CN110110577B (en) Method and device for identifying dish name, storage medium and electronic device
CN111858843B (en) Text classification method and device
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
WO2019028990A1 (en) Code element naming method, device, electronic equipment and medium
CN107862058B (en) Method and apparatus for generating information
CN112507153B (en) Method, computing device, and computer storage medium for image retrieval
CN111651674B (en) Bidirectional searching method and device and electronic equipment
CN114429566A (en) Image semantic understanding method, device, equipment and storage medium
CN112364664A (en) Method and device for training intention recognition model and intention recognition and storage medium
CN113435499B (en) Label classification method, device, electronic equipment and storage medium
CN113076720B (en) Long text segmentation method and device, storage medium and electronic device
CN114429635A (en) Book management method
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN115909357A (en) Target identification method based on artificial intelligence, model training method and device
CN111291561A (en) Text recognition method, device and system
CN113221718A (en) Formula identification method and device, storage medium and electronic equipment
CN113836297A (en) Training method and device for text emotion analysis model
CN111881681A (en) Entity sample obtaining method and device and electronic equipment
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN115294227A (en) Multimedia interface generation method, device, equipment and medium
CN115455968A (en) Named entity identification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant