WO2024087202A1 - Search method, model training method, device, and storage medium - Google Patents

Search method, model training method, device, and storage medium

Info

Publication number: WO2024087202A1
Authority: WIPO (PCT)
Prior art keywords: private entity, search, private, search request, user
Application number: PCT/CN2022/128375
Other languages: English (en), French (fr)
Inventors: Huang Zhendong (黄振东), Jiang Hao (蒋昊), Yang Guang (杨光), Li Jizhong (李继忠)
Original Assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2022/128375
Publication of WO2024087202A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions

Definitions

  • the present application relates to the field of computer search technology, and in particular to a search method, a model training method, a device and a storage medium.
  • an embodiment of the present application provides a search method, comprising: receiving a first search request from a user, the first search request including a private entity; wherein the private entity represents an entity having an associated relationship with the user; determining, among the features corresponding to at least one private entity, the features corresponding to the private entity in the first search request; wherein the features corresponding to the at least one private entity indicate data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality from the first search request; obtaining search results according to the first search request and the features corresponding to the private entity in the first search request; the search results correspond to a different modality from the first search request; and displaying the search results.
  • the features corresponding to the private entity in the user's search request are determined, and the search results are obtained according to the features corresponding to the private entity and the first search request. Since the features corresponding to at least one private entity indicate the data corresponding to at least one private entity, and that data can correspond to the same modality as the search results, the private entity in the search request can be mapped to its specific image in reality (that is, the specific image of the private entity in the search results), thereby realizing concrete search. This solves the problem that the terminal device cannot perform concrete search during the search process, improves the search effect and search efficiency, and enhances the user's search experience.
  • the model can be used to infer on the first search request and the private entity in the first search request to obtain public-private fused features, and the search results are then determined in the database through the public-private fused features.
  • obtaining search results based on the first search request and features corresponding to private entities in the first search request includes: processing the first search request and features corresponding to private entities in the first search request through a model to generate a fused feature; the fused feature indicates the first search request; and using data corresponding to features in a database that match the fused feature as the search result.
  • the model is used to infer the first search request to generate fused features, which can simultaneously represent public information and concrete private information; as an example, the first search request can be processed by the model to generate public features, and then the generated public features and private features (i.e., the features corresponding to the private entities in the first search request) are fused to generate fused features; thereby, for different users, the search results that the users want can be searched, thereby realizing concrete search based on the fusion of private features and public features.
  • the method also includes: obtaining data corresponding to a first private entity; the first private entity is any private entity among the at least one private entity; processing the data corresponding to the first private entity to generate features corresponding to the first private entity.
  • the data corresponding to the first private entity can be obtained by explicit collection or implicit collection, so as to collect concrete private information within the scope permitted by privacy security; and the data corresponding to the first private entity is processed to generate features corresponding to the first private entity, so as to obtain the features corresponding to the first private entity; exemplarily, the features corresponding to the first private entity can be stored in the terminal device; thereby, in the subsequent process of the terminal device performing multimodal search, support is provided for quickly determining the features corresponding to the target private entity.
  • obtaining data corresponding to the first private entity includes: sending a first prompt message to a user, the first prompt message being used to prompt the user to select data containing the first private entity; and obtaining data corresponding to the first private entity in response to the user's selection operation.
  • the data corresponding to the first private entity is obtained through an explicit collection method that the user can perceive. Since the private information is subjectively selected by the user, the private information collected in this way has high confidence. Thereby, high-confidence concrete private information is collected through explicit guidance within the scope of user privacy security.
  • obtaining data corresponding to the first private entity includes: obtaining initial data containing the first private entity; displaying at least one search result in response to a second search request of the user; the second search request contains the first private entity, and the at least one search result corresponds to a different modality from the second search request; the at least one search result includes the initial data; based on the user's operation of selecting the initial data in the at least one search result, using the initial data as the data corresponding to the first private entity.
  • low-confidence concrete private information (i.e., initial data) containing the first private entity is obtained first; then at least one search result is displayed, and based on the user's operation of selecting the initial data among the at least one search result, the initial data is used as the data corresponding to the first private entity. This improves the confidence of the concrete private information, thereby acquiring the data corresponding to the first private entity in an implicit collection manner of which the user is unaware. A minimal sketch of this bookkeeping follows.
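  • The following is a minimal, hypothetical sketch of the implicit-collection flow above; the type and method names (Candidate, PrivateKnowledgeStore, on_result_selected) are illustrative assumptions, not names from the patent:

```python
# Hypothetical sketch of the implicit-collection flow; all names here are
# illustrative assumptions, not APIs from the patent.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    entity: str        # e.g. "wife"
    data: bytes        # e.g. an image presumed to contain the entity
    confidence: float  # low until the user's selection confirms it

@dataclass
class PrivateKnowledgeStore:
    confirmed: dict = field(default_factory=dict)  # entity -> list of data

    def on_result_selected(self, query_entity: str, selected_data: bytes,
                           candidates: list) -> None:
        # If the user selects a result that matches a low-confidence
        # candidate for the queried private entity, promote the candidate's
        # initial data to confirmed data for that entity.
        for c in candidates:
            if c.entity == query_entity and c.data == selected_data:
                c.confidence = 1.0
                self.confirmed.setdefault(query_entity, []).append(c.data)
```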
  • the method further includes: obtaining data containing the first private entity and the second private entity; wherein the second private entity is any private entity among the at least one private entity except the first private entity; and obtaining features corresponding to the second private entity based on the data containing the first private entity and the second private entity, and features corresponding to the first private entity.
  • the features corresponding to the second private entity are obtained according to the acquired data containing multiple private entities and the already generated private knowledge (i.e., the features corresponding to the first private entity); as an example, common sense and models can be used to obtain the features corresponding to the second private entity, so that the private knowledge set is automatically expanded (see the sketch below). Therefore, within the scope permitted by privacy security, the concrete private knowledge can be enriched with as little collected data and as little generated private knowledge as possible, achieving effective use of data in situations such as when there is little user feedback, and providing support for concrete search when terminal devices perform multimodal search.
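  • One plausible reading of this expansion, sketched below under the assumption that per-region features (e.g., detected faces) can be extracted from the shared data and that exactly two private entities are present; the function and its inputs are illustrative assumptions:

```python
# Illustrative sketch only: infer the second entity's feature from data
# containing both entities, given the first entity's known feature.
import numpy as np

def infer_second_entity_feature(region_features, first_entity_feature):
    """Attribute the region most similar to the first entity's known
    feature to the first entity; return the remaining region's feature
    as the second entity's feature."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = [cosine(f, first_entity_feature) for f in region_features]
    first_idx = int(np.argmax(sims))
    remaining = [f for i, f in enumerate(region_features) if i != first_idx]
    return remaining[0]  # the leftover region is taken as the second entity
```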
  • the private entity includes: a title, a name, or a nickname associated with the user.
  • an embodiment of the present application provides a search method, comprising: displaying a first display interface; the first display interface comprising a search portal; in response to a user inputting a keyword in the search portal, displaying a second display interface, the second display interface comprising a first search result corresponding to the keyword; wherein the keyword includes a private entity; the first search result is obtained based on features corresponding to the keyword and the private entity; the first search result and the keyword correspond to different modalities.
  • the second display interface is displayed; since the first search result is obtained based on the keyword and the features corresponding to the private entity in the keyword, the private entity in the keyword can be mapped to its specific image in reality (that is, the specific image of the private entity in the search result), thereby realizing concrete search. This solves the problem that the terminal device cannot perform concrete search during the search process, improves the search effect and search efficiency, and enhances the user's search experience.
  • a third display interface is displayed, the third display interface including first prompt information and at least one piece of data; the first prompt information is used to prompt the user to select data containing a first private entity; in response to the user's selection operation on the at least one piece of data, a fourth display interface is displayed, the fourth display interface including first data and a first identifier, the first identifier being used to indicate that the first data is the data selected by the user.
  • the data corresponding to the first private entity is obtained through an explicit collection method that the user can perceive. Since the private information is subjectively selected by the user, the private information collected in this way has high confidence. Thereby, high-confidence concrete private information is collected through explicit guidance within the scope of user privacy security.
  • the second display interface also includes: second prompt information, the second prompt information is used to prompt the user to confirm whether the first search result is the search result expected by the user; the method also includes: in response to the user's confirmation operation, displaying a fifth display interface, the fifth display interface including the first search result and a second identifier, the second identifier is used to indicate that the first search result is the search result confirmed by the user.
  • an embodiment of the present application provides a model training method, the method comprising: obtaining a multimodal sample set and a first model; wherein the multimodal sample set comprises: a multimodal sample corresponding to a first private entity, a multimodal sample corresponding to a first search request, and a multimodal sample corresponding to a second search request; the first private entity represents an entity having an association with a user, the first search request includes the first private entity, and the second search request does not include the first private entity; the first model is trained using the multimodal sample set to obtain a second model.
  • the trained model (i.e., the second model) can generate similar features for the same private entity in samples of different modalities, and has the ability to infer public information and to concretize the features of private information; the second model can then be used for concrete search, greatly improving the search effect and efficiency.
  • a multimodal model can be trained using a public data set to obtain a first model for generating similar features for the same non-private entity in samples of different modalities; the first model trained with a public data set has the ability to recognize common entities.
  • the using the multimodal sample set to train the first model to obtain the second model includes: processing the multimodal samples corresponding to the first private entity and the multimodal samples corresponding to the first search request through the first model to obtain fused features; training the first model according to the fused features, the features corresponding to the first search request, and the features corresponding to the second search request to obtain the second model; wherein the features corresponding to the first search request are obtained by processing the multimodal samples corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the multimodal samples corresponding to the second search request by the first model.
  • the multimodal model and private data are used to fuse private features with public features, and then aligned with the corresponding multimodal features to train a public-private fusion multimodal model.
  • the first model is fine-tuned by jointly training public and private features, which enables the trained model (i.e., the second model) to have the ability to infer concrete private information. This solves the problem that the multimodal model trained only with public data sets cannot carry concrete private information.
  • the multimodal samples corresponding to the first private entity and the multimodal samples corresponding to the first search request are processed by the first model to obtain a fused feature, including: inputting the multimodal samples corresponding to the first search request into the first model to generate a first feature; inputting the multimodal samples corresponding to the first private entity into the first model to generate a second feature; and fusing the first feature and the second feature to obtain the fused feature.
  • in the process of training the first model, the first model is used to generate private features (i.e., the second features) and public features (i.e., the first features), and the generated public and private features are fused; the fused features can simultaneously represent public information and concrete private information, so that the first model takes into account the learning of both public information and private information.
  • the multimodal samples include samples of the first modality and samples of the second modality; the features corresponding to the first search request are obtained by processing the samples of the first modality corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the samples of the first modality corresponding to the second search request by the first model; the processing of the multimodal samples corresponding to the first private entity and the multimodal samples corresponding to the first search request by the first model to obtain fused features includes: inputting the samples of the first modality corresponding to the first private entity and the samples of the second modality corresponding to the first search request into the first model to obtain the fused features.
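  • A minimal, hypothetical training step consistent with the description above, assuming the first model exposes encode_text/encode_image sub-models and some fusion operator fuse; the triplet loss is one plausible choice for pulling the fused feature toward the first request's first-modality sample and away from the second request's sample, not the patent's stated objective:

```python
# Hypothetical fine-tuning step; encode_text/encode_image and fuse are
# assumed interfaces, and the triplet objective is one plausible loss.
import torch.nn.functional as F

def train_step(first_model, fuse, optimizer,
               private_img,      # first-modality sample of the first private entity
               request_text,     # second-modality sample of the first search request
               request_img,      # first-modality sample of the first search request
               neg_request_img,  # first-modality sample of the second search request
               margin=0.2):
    public_feat = first_model.encode_text(request_text)    # first feature
    private_feat = first_model.encode_image(private_img)   # second feature
    fused = fuse(public_feat, private_feat)                # fused feature

    pos = first_model.encode_image(request_img)
    neg = first_model.encode_image(neg_request_img)

    # Pull the fused feature toward the matching sample's feature and
    # push it away from the sample without the private entity.
    loss = F.triplet_margin_loss(fused, pos, neg, margin=margin)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```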
  • an embodiment of the present application provides a search device, comprising: a receiving module, used to receive a first search request of a user, wherein the first search request includes a private entity; wherein the private entity represents an entity that has an associated relationship with the user; a determination module, used to determine, among the features corresponding to at least one private entity, a feature corresponding to the private entity in the first search request; wherein the feature corresponding to the at least one private entity indicates data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality from the first search request; a search module, used to obtain search results based on the first search request and the features corresponding to the private entity in the first search request; the search results correspond to a different modality from the first search request; and a display module, used to display the search results.
  • the search module is further used to: process the first search request and features corresponding to private entities in the first search request through a model to generate a fused feature; the fused feature indicates the first search request; and use data corresponding to features in the database that match the fused feature as the search result.
  • the device also includes: a generation module, used to obtain data corresponding to a first private entity; the first private entity is any private entity among the at least one private entity; the data corresponding to the first private entity is processed to generate features corresponding to the first private entity.
  • the generation module is further used to: send a first prompt message to the user, wherein the first prompt message is used to prompt the user to select data containing the first private entity; and in response to the user's selection operation, obtain data corresponding to the first private entity.
  • the generation module is further used to: obtain initial data containing the first private entity; display at least one search result in response to a second search request of a user; the second search request contains the first private entity, and the at least one search result corresponds to a different modality from the second search request; the at least one search result includes the initial data; based on the user's operation of selecting the initial data in the at least one search result, use the initial data as data corresponding to the first private entity.
  • the generation module is further used to: obtain data containing the first private entity and the second private entity; wherein the second private entity is any private entity among the at least one private entity except the first private entity; and obtain features corresponding to the second private entity based on the data containing the first private entity and the second private entity, and features corresponding to the first private entity.
  • the private entity includes: a title, a name, or a nickname associated with the user.
  • an embodiment of the present application provides a search device, comprising: a first display module, used to display a first display interface; the first display interface includes a search portal; a second display module, used to display a second display interface in response to a user inputting a keyword in the search portal, the second display interface including search results corresponding to the keyword; wherein the keyword includes a private entity; the search results are obtained based on the features corresponding to the keyword and the private entity; the search results and the keyword correspond to different modalities.
  • the device also includes: a third display module, used to display a third display interface, the third display interface including first prompt information and at least one piece of data; the first prompt information is used to prompt the user to select data containing a first private entity; a fourth display module, used to display a fourth display interface in response to the user's selection operation on the at least one piece of data, the fourth display interface including first data and a first identifier, the first identifier being used to indicate that the first data is the data selected by the user.
  • the second display interface also includes: second prompt information, the second prompt information is used to prompt the user to confirm whether the first search result is the search result expected by the user;
  • the device also includes: a fifth display module, used to display a fifth display interface in response to the user's confirmation operation, the fifth display interface including the first search result and a second identifier, the second identifier is used to indicate that the first search result is the search result confirmed by the user.
  • an embodiment of the present application provides a model training device, comprising: an acquisition module for acquiring a multimodal sample set and a first model; wherein the multimodal sample set comprises: a multimodal sample corresponding to a first private entity, a multimodal sample corresponding to a first search request, and a multimodal sample corresponding to a second search request; the first private entity represents an entity having an association with a user, the first search request includes the first private entity, and the second search request does not include the first private entity; a training module for training the first model using the multimodal sample set to obtain a second model.
  • the training module is further used to: process the multimodal samples corresponding to the first private entity and the multimodal samples corresponding to the first search request through the first model to obtain fused features; train the first model according to the fused features, the features corresponding to the first search request, and the features corresponding to the second search request to obtain the second model; wherein the features corresponding to the first search request are obtained by processing the multimodal samples corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the multimodal samples corresponding to the second search request by the first model.
  • the training module is further used to: input the multimodal sample corresponding to the first search request into the first model to generate a first feature; input the multimodal sample corresponding to the first private entity into the first model to generate a second feature; and fuse the first feature and the second feature to obtain the fused feature.
  • the multimodal samples include samples of the first modality and samples of the second modality; the features corresponding to the first search request are obtained by processing the samples of the first modality corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the samples of the first modality corresponding to the second search request by the first model; the training module is also used to: input the samples of the first modality corresponding to the first private entity and the samples of the second modality corresponding to the first search request into the first model to obtain the fused features.
  • an embodiment of the present application provides an electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to implement, when executing the instructions, the search method of the first aspect or any one of its possible implementations, the search method of the second aspect, or the model training method of the third aspect or any one of its possible implementations.
  • an embodiment of the present application provides a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the search method of the first aspect or any one of its possible implementations, the search method of the second aspect, or the model training method of the third aspect or any one of its possible implementations.
  • an embodiment of the present application provides a computer program product, which, when running on a computer, enables the computer to execute the search method of the first aspect or any one of its possible implementations, the search method of the second aspect, or the model training method of the third aspect or any one of its possible implementations.
  • FIGS. 1(a)-(c) are schematic diagrams showing various application scenarios of a search method according to an embodiment of the present application.
  • FIG. 2 shows a flow chart of a search method according to an embodiment of the present application.
  • FIG. 3 shows a flow chart of a method for obtaining search results according to an embodiment of the present application.
  • FIGS. 4(a)-(c) are schematic diagrams showing a search method according to an embodiment of the present application.
  • FIG. 5 shows a flow chart of a method for constructing a private knowledge set according to an embodiment of the present application.
  • FIG. 6 shows a flow chart of an explicit collection method according to an embodiment of the present application.
  • FIGS. 7(a)-(c) are schematic diagrams showing an explicit collection method according to an embodiment of the present application.
  • FIGS. 8(a)-(c) are schematic diagrams showing an explicit collection method according to an embodiment of the present application.
  • FIG. 9 shows a flow chart of an implicit collection method according to an embodiment of the present application.
  • FIGS. 10(a)-(c) are schematic diagrams showing an implicit collection method according to an embodiment of the present application.
  • FIG. 11 shows a flow chart of a model training method according to an embodiment of the present application.
  • FIG. 12 shows a flow chart of a model training method according to an embodiment of the present application.
  • FIG. 13 shows a flow chart of a model training method according to an embodiment of the present application.
  • FIG. 14 shows a structural diagram of a search device according to an embodiment of the present application.
  • FIG. 15 shows a structural diagram of a model training device according to an embodiment of the present application.
  • FIG. 16 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • FIG. 17 shows a software structure block diagram of the electronic device 100 according to an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
  • References to "one embodiment" or "some embodiments" etc. described in this specification mean that a particular feature, structure, or characteristic described in conjunction with the embodiment is included in one or more embodiments of the present application.
  • The phrases "in one embodiment", "in some embodiments", "in some other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized.
  • The terms "including", "comprising", "having" and their variations all mean "including but not limited to", unless otherwise specifically emphasized.
  • "At least one" means one or more, and "plural" means two or more.
  • "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B can mean: A alone, both A and B, or B alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • Multimodal data refers to data obtained from different fields or perspectives for the same description object, and each field, perspective, existence form or information source describing these data is called a modality.
  • Data composed of two or more modalities is called multimodal data.
  • Modal data may include text, image, video, audio, and other data.
  • Multimodal search, also called multimodal retrieval, is a technology that searches using query statements (such as keywords) whose modality differs from that of the data to be searched, for example, searching for images with text.
  • On-device multimodal search is a technology that enables multimodal search within terminals such as mobile phones, tablets, and personal computers (PCs).
  • Figures 1(a)-(c) are schematic diagrams showing various application scenarios of the search method according to an embodiment of the present application.
  • Scenario 1: In-terminal multimodal search, in which a user can search the local gallery, perform a global search, etc. on a terminal device, where the terminal device stores the user's private photos;
  • the terminal device can be a personal computer, a laptop, a smart phone, a tablet computer, an IoT device, or a portable wearable device, etc.
  • the IoT device can be a smart speaker, a smart TV, a smart air conditioner, a smart car device, etc.
  • the portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, etc.
  • Taking in-terminal text-to-picture search as an example, as shown in FIG. 1(a), the mobile phone 101 can display a user search interface, which includes an entry for a picture search request; the user can enter a keyword through this entry, thereby triggering the mobile phone 101 to search the local gallery for the pictures the user wants, and the searched pictures can be displayed on the mobile phone 101.
  • Scenario 2: Cross-device multimodal search, in which a user can use a terminal device to search the gallery on another terminal device, perform a global search, etc., where a connection is established between the terminal device and the other terminal device, and the other terminal device stores the user's private photos.
  • As shown in FIG. 1(b), a connection is established between the mobile phone 101 and the personal computer 102, and the mobile phone 101 can display a user search interface, which includes an entry for picture search requests; the user can enter keywords through this entry, and the mobile phone 101 transmits the keywords to the personal computer 102, thereby triggering the personal computer 102 to search its gallery for the user's desired pictures, and the searched pictures can be displayed on the mobile phone 101.
  • Scenario 3: Device-to-cloud multimodal search, in which a user can use a terminal device to perform gallery search, global search, etc. on the gallery of a cloud server, where a connection is established between the terminal device and the cloud server, and the cloud server stores the user's private photos;
  • the cloud server can be an independent server or a server cluster composed of multiple servers.
  • As shown in FIG. 1(c), the mobile phone 101 can establish a connection with the cloud server 103 through the network, and the mobile phone 101 can display a user search interface, which includes an entry for picture search requests; the user can enter keywords through this entry, and the mobile phone 101 transmits the keywords to the cloud server 103, thereby triggering the cloud server 103 to search the cloud gallery for the user's desired pictures, and the searched pictures can be displayed on the mobile phone 101.
  • Method 1: Use fixed tags for on-device multimodal search.
  • In this method, a closed tag set is formed by combining inherent tags (such as time and location) with tags inferred by models such as object detection, image classification, and optical character recognition (OCR); an inverted index is then established based on the closed tag set. At query time, tag calculation is performed: through text segmentation, the keywords entered by the user are decomposed to obtain multiple tags; finally, based on these tags, an inverted tag search is performed to find the images that meet the conditions, and those images are returned to the user. A sketch of this scheme follows.
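  • A minimal sketch of the fixed-tag scheme, assuming per-image tag annotations already produced by the inference models; the tag values and function names are illustrative, and real systems would use proper text segmentation rather than a whitespace split:

```python
# Illustrative sketch of the fixed-tag scheme; the closed tag set, the
# per-image tags, and the whitespace "segmentation" are all assumptions.
from collections import defaultdict

CLOSED_TAG_SET = {"beach", "2021", "Shanghai", "dog", "receipt"}

def build_inverted_index(image_tags):
    """image_tags: dict mapping image id -> set of inferred tags."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags & CLOSED_TAG_SET:  # keep only closed-set tags
            index[tag].add(image_id)
    return index

def search(index, keywords):
    """Decompose the keywords into tags and intersect their posting lists."""
    tags = [t for t in keywords.split() if t in CLOSED_TAG_SET]
    if not tags:
        return set()  # nothing matches outside the closed tag set
    result = set(index.get(tags[0], set()))
    for t in tags[1:]:
        result &= index.get(t, set())
    return result
```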
  • This method can realize the image search function, but it still has the following shortcomings: (1) it only supports searches based on the closed tag set; users must input keywords strictly according to the tags, which are strictly matched, so it is inflexible, struggles with matching scenarios such as ambiguity, fuzziness, and synonyms, and cannot achieve semantic matching; (2) the tags in the closed tag set are very limited and cannot effectively represent the semantics of an image, so in most cases users' diversified search requests cannot be met; even if traditional natural language understanding (NLU) is used to produce keywords, it still needs to be customized for different search businesses, lacking versatility and scalability; (3) if the closed tag set is expanded, the inference models must be replaced accordingly, which makes maintenance difficult.
  • Method 2: Use an open method for on-device multimodal search.
  • In this method, first, publicly available and abundant image-text pair data are used to train a multimodal model, yielding a model in which matching image-text pairs share the same semantic representation in a high-dimensional space; then the pictures in the gallery are inferred by the multimodal model to obtain the corresponding high-dimensional feature vectors, forming the picture base library; next, the high-dimensional feature vector of the user's keywords is inferred by the multimodal model; finally, the similarity between this feature vector and the picture feature vectors in the picture semantic base library is calculated to obtain the pictures with the highest similarity, completing the gallery search.
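  • The pretraining stage described above is commonly implemented with a symmetric contrastive (CLIP-style) objective; the sketch below is an assumption about one plausible implementation, with encode_image/encode_text standing in for the model's two encoders:

```python
# Hypothetical CLIP-style pretraining step for the open method.
import torch
import torch.nn.functional as F

def contrastive_step(model, images, texts, temperature=0.07):
    img = F.normalize(model.encode_image(images), dim=-1)  # (B, D)
    txt = F.normalize(model.encode_text(texts), dim=-1)    # (B, D)
    logits = img @ txt.T / temperature                     # (B, B)
    # Matching image-text pairs lie on the diagonal.
    labels = torch.arange(logits.size(0), device=logits.device)
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2
    return loss
```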
  • This method redefines image search technology from a semantic perspective and can represent images more comprehensively, meeting users' daily open search needs (i.e., keywords can be entered freely, with no need for specific tags). However, it still has the following shortcoming: it lacks concrete search capability, that is, the entities in the query text cannot be mapped to specific images in reality, making it difficult to achieve true semantic understanding of the content and the user's request. For example, when a user enters the keyword "my wife and I play badminton", the search will display photos of two people playing badminton, but there is no guarantee that these two people are the user and the user's wife.
  • The search results may include photos of any two people playing badminton, such as "my wife and I play badminton" and "my daughter and I play badminton" in the gallery.
  • The user still needs to further select among the many search results to obtain the desired photos of "my wife and I play badminton", resulting in low search efficiency, low accuracy, and poor user experience.
  • the embodiment of the present application provides a search method (see below for detailed description).
  • This can effectively solve the problem that users cannot perform concrete searches when searching through terminal devices, and improve the user's search experience.
  • this search method does not require a closed tag set. Users can enter keywords arbitrarily when searching, which is highly flexible; it can meet users' diversified search requests and has higher scalability; this search method goes beyond the scope of tags, and there is no need to continuously enrich and exhaust tags. For different terminal devices, there is no need to change the model, and maintenance is easier.
  • this search method can correspond the entities in the query text to the specific images in reality, realize concrete search, and has high search efficiency, high accuracy, and better user experience.
  • FIG. 2 shows a flow chart of a search method according to an embodiment of the present application.
  • The method may be executed on a terminal device, for example, the mobile phone 101 in FIG. 1 above; as shown in FIG. 2, the method may include the following steps:
  • Step S201: Receive a first search request from a user, where the first search request includes a private entity.
  • the first search request may be data in a text modality (such as keywords such as text, words, sentences, paragraphs consisting of sentences, etc.), an image modality (such as an image containing text), a video modality (such as a video containing text), or an audio modality (such as voice).
  • the first search request may include one or more private entities and one or more non-private entities; wherein the private entity indicates an entity associated with the user, for example, it may be "I", "wife", "husband", "uncle", "aunt", "son", "mother" and other titles associated with the user; for another example, it may be "Zhang San", "Dian Dian" and other names or nicknames associated with the user. It can be understood that for different users, the same private entity corresponds to different images in reality, such as different people or objects. For example, for user A, the private entity "I" corresponds to user A in reality, and for user B, the private entity "I" corresponds to user B in reality.
  • Non-private entities indicate entities that are not directly associated with the user, for example, "shoes", "badminton", "swimming", "Shanghai" and other general objects, places, etc.
  • the same non-private entity corresponds to the same image in reality, such as the same person or object.
  • a user can issue a first search request in text mode to a terminal device, and the first search request in text mode can include private entities; for example, as shown in Figure 1(a), a user can enter a keyword through a search bar in the gallery of mobile phone 101, and the keyword can include private entities.
  • the user may also send a first search request in audio mode to the terminal device, and the terminal device may perform a modal transformation on the first search request in audio mode to obtain a first search request in text mode, wherein the first search request in text mode includes private entities.
  • the user may input a search voice through the search entry on the minus-one screen (the leftmost home-screen page) of the mobile phone 101, and the mobile phone 101 may process the search voice input by the user and convert it into a keyword, where the keyword includes a private entity.
  • the terminal device may parse the first search request to obtain at least one entity, and then may match the at least one entity with at least one private entity in the terminal device, and the one or more entities that are successfully matched are the private entities in the first search request.
  • At least one private entity in the terminal device may include: "I", "wife", "son", "uncle", "mom", "Zhang San", etc.
  • For example, the first search request may be the keyword "my photos".
  • The terminal device parses the keyword "my photos" input by the user and extracts the two entities "I" and "photos"; it then matches "I" and "photos" against the above-mentioned private entities respectively, among which "I" is matched successfully; that is, the private entity in the keyword "my photos" is "I".
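  • A simplified sketch of this matching step, assuming the request has already been segmented into entities by an upstream parsing step; the entity set and function names are illustrative:

```python
# Simplified sketch of the entity matching in step S201; the private
# entity set and the pre-segmented entities are illustrative assumptions.
PRIVATE_ENTITIES = {"I", "wife", "son", "uncle", "mom", "Zhang San"}

def match_private_entities(entities):
    """Return the parsed entities that match the device's private entity set."""
    return [e for e in entities if e in PRIVATE_ENTITIES]

# Parsing "my photos" might yield ["I", "photos"]; only "I" matches,
# so the private entity of the request is "I".
assert match_private_entities(["I", "photos"]) == ["I"]
```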
  • Step S202: Determine, among the features corresponding to at least one private entity, the features corresponding to the private entity in the first search request; wherein the features corresponding to the at least one private entity indicate data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality than the first search request.
  • a private knowledge set may be stored in the terminal device, which may include at least one private entity and features (also called feature vectors) corresponding to each private entity in the at least one private entity; the terminal device selects the features corresponding to the private entity in the private knowledge set based on the private entity in the first search request received from the user.
  • the feature corresponding to at least one private entity in the private knowledge set is obtained by processing the data corresponding to the at least one private entity with the model in the terminal device; for example, the first search request may be data in the text modality, and the data corresponding to the at least one private entity may be data in the image modality; for another example, the first search request may be data in the audio modality, and the data corresponding to the at least one private entity may be data in the image modality. It can be understood that, for the data corresponding to a private entity in a certain modality, the feature corresponding to the private entity obtained by the model reflects the characteristics of the private entity in that modality.
  • the feature corresponding to the private entity "I" obtained by the model reflects the visual characteristics of "I" in the image, such as color, shape, position, or size.
  • the training process of the model in the terminal device can refer to the relevant description below; exemplarily, the trained model can be deployed on the terminal device. Taking the first search request as data in the text modality as an example, the data in the image modality containing the private entity in the terminal device can be input into the model to obtain the feature corresponding to the private entity, and an association relationship is established between the private entity and its corresponding feature, so as to obtain the private knowledge set.
  • photos 1 to 10 in the local gallery on the mobile phone can be input into the model, where each of photos 1 to 10 contains at least one private entity; it is understandable that a photo can contain one or more private entities, for example, a photo of me and my wife includes the two private entities "me" and "wife".
  • the features corresponding to each private entity in photos 1 to 10 can be obtained; and the corresponding relationship between each private entity and the features corresponding to each private entity can be determined respectively, so as to establish a private knowledge set.
  • At least one private entity in the private knowledge set may include "I", "wife", "son", "uncle", "mother", and "Zhang San"; accordingly, the private knowledge set may include the features of "I", "wife", "son", "uncle", "mother", and "Zhang San". If the private entity in the first search request received by the terminal device in the above steps is "I", the features of "I" can be selected from the private knowledge set.
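  • A minimal sketch of building such a private knowledge set; encode_image stands in for the (assumed) image sub-model, the per-photo entity annotations are taken as given, and assigning the whole-photo feature to each entity plus mean aggregation are simplifying assumptions:

```python
# Minimal sketch of private-knowledge-set construction; encode_image,
# the annotations, and the mean aggregation are illustrative assumptions.
import numpy as np

def build_private_knowledge_set(photos, annotations, encode_image):
    """photos: list of images; annotations: per-photo lists of private
    entities, e.g. [["me", "wife"], ["son"], ...]."""
    knowledge = {}
    for photo, entities in zip(photos, annotations):
        feat = encode_image(photo)  # high-dimensional feature vector
        for entity in entities:
            knowledge.setdefault(entity, []).append(feat)
    # Aggregate multiple observations of the same entity into one feature.
    return {e: np.mean(fs, axis=0) for e, fs in knowledge.items()}
```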
  • Step S203: Obtain search results according to the first search request and the features corresponding to the private entities in the first search request.
  • the terminal device may process the first search request and the features corresponding to the private entities in the first search request respectively through the above-mentioned model, and fuse the processing results, thereby obtaining search results according to the fused processing results.
  • the search results and the first search request correspond to different modalities.
  • the search results may correspond to the same modality as the data corresponding to the at least one private entity.
  • the first search request may be data in a textual modality
  • the search results may be data in an image modality.
  • the first search request may be the keyword "my photos", and the search results may be one or more photos.
  • FIG. 3 shows a flow chart of a method for obtaining search results according to an embodiment of the present application.
  • step S203 may include the following steps:
  • Step S20301: Process the first search request and the features corresponding to the private entities in the first search request through a model to generate a fused feature; the fused feature indicates the first search request.
  • the model may be pre-configured in the terminal device, and the model may generate similar features for the same private entity in data of different modalities.
  • the terminal device may input a first search request into the model to generate a first feature, and then fuse the first feature with the feature corresponding to the private entity in the first search request to generate a fused feature.
  • the model may include: a sub-model corresponding to the first modality and a sub-model corresponding to the second modality; wherein the first modality is the modality corresponding to the data corresponding to the private entity in the first search request, such as an image modality, and the second modality is the modality corresponding to the first search request, such as a text modality.
  • the terminal device may input the feature corresponding to the first search request into the sub-model corresponding to the second modality to obtain the first feature, and then fuse the first feature with the feature corresponding to the private entity in the first search request to generate a fused feature.
  • For different users, the features corresponding to the same private entity may be different; that is, the features corresponding to the private entity can represent concrete private information, which is related to the specific user, and can also be called private features.
  • The first feature reflects the semantics of the text of the first search request itself; it is understandable that when different users issue the same first search request, the semantics of its text is the same, so the generated first features are the same. Therefore, the first feature can represent public information, which is not related to a specific user, and can also be called a public feature; further, the private feature and the public feature can be fused to obtain the fused feature, which can also be called a public-private fused feature.
  • existing fusion methods such as adapter, sum, and attention can be used to fuse the first feature and the feature corresponding to the private entity in the first search request to obtain a fused feature.
  • the fused feature is a feature of a high-dimensional space, which can represent both private features and public features, thereby achieving the complementarity of private features and public features.
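  • Two of the fusion methods named above, sketched minimally; the exact on-device operator (and any learned adapter) is not specified here, so these are illustrative assumptions:

```python
# Sketches of two fusion methods (sum and a simple attention); the
# operator actually used on-device is an open choice here.
import numpy as np

def fuse_sum(public_feat, private_feat):
    return public_feat + private_feat

def fuse_attention(public_feat, private_feat):
    # Weight each feature by a softmax over its similarity to the public
    # (query) feature, then take the weighted combination.
    feats = np.stack([public_feat, private_feat])  # (2, D)
    scores = feats @ public_feat                   # (2,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ feats                         # (D,)
```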
  • Step S20302: Take the data corresponding to the features matching the fusion feature in the database as the search result.
  • the database may include a plurality of data and features corresponding to each of the plurality of data; wherein the plurality of data and the first search request correspond to different modalities.
  • the terminal device can calculate the similarity between the above fusion feature and the feature corresponding to each data in the database, and use the feature with the highest similarity to the fusion feature or the top K features in similarity in the database as the feature matching the fusion feature.
  • the features can be sorted from high to low by similarity, so that the features corresponding to the top K similarities are used as the features matching the fusion feature; the data corresponding to the features matching the fusion feature is the search result.
  • the number of features matching the fusion feature can be one or more, wherein different features matching the fusion feature correspond to different search results, that is, the number of search results can be one or more.
  • the database can be a picture gallery in a terminal device, including multiple pictures and features corresponding to each of the multiple pictures.
  • the above-mentioned fused features can be used to calculate the similarity with the features in the database, and the features with the highest similarity in the database or the top K features in similarity ranking are used as features that match the fused features, and then the pictures corresponding to the features that match the fused features are used as searched pictures.
  • the above-mentioned model in the terminal device can be used in advance to infer the data in the terminal device whose modality differs from that of the first search request, obtaining the feature corresponding to each piece of data (which can be a high-dimensional space feature), and a correspondence is established between each piece of data and its feature; based on this correspondence, each piece of data and its feature are stored in the database of the terminal device.
  • the terminal device can input the data of the image modality into the sub-model corresponding to the image modality in the model, thereby generating the features corresponding to the data of the image modality.
  • the terminal device can trigger operations such as inferring the features corresponding to each data according to upper-layer scheduling at an appropriate time, for example, when the computing power of the terminal device permits or when the user does not use the terminal device, thereby updating the database.
  • the database can be a gallery in a mobile phone. It is understandable that there are usually new pictures in the gallery every day.
  • the above model can be used to infer the features corresponding to the new pictures, thereby reducing the impact on the user's normal use of the mobile phone.
  • the sub-model corresponding to the image modality in the above model can be used to infer the pictures in the gallery or the new pictures in turn, obtain the features corresponding to each picture, and store them in the terminal device.
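  • A compact sketch covering both the offline indexing just described and the query-time matching of step S20302; encode_image stands in for the image-modality sub-model, and cosine similarity with top-K selection is one plausible matching rule:

```python
# Sketch of offline gallery indexing plus query-time top-K matching;
# encode_image and the cosine/top-K rule are illustrative assumptions.
import numpy as np

def build_gallery_index(pictures, encode_image):
    feats = np.stack([encode_image(p) for p in pictures])        # (N, D)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)  # row-normalized

def top_k_matches(fused_feat, index, k=5):
    """Return indices of the K gallery features most similar to the
    fused feature, best first."""
    q = fused_feat / np.linalg.norm(fused_feat)
    sims = index @ q                    # cosine similarity to every picture
    k = min(k, len(sims))
    top = np.argpartition(-sims, k - 1)[:k]
    return top[np.argsort(-sims[top])]
```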
  • the terminal device uses the model to infer the first search request and generate a fused feature, which can simultaneously represent public information and concrete private information;
  • the first search request can be processed by the model to generate a public feature (i.e., the first feature), and then the generated public feature and private feature (i.e., the feature corresponding to the private entity in the first search request) are fused to generate a fused feature; thereby, for different users, the search results that the user wants can be searched, thereby realizing a concrete search based on the fusion of private features and public features.
  • In other implementations, the first search request may be processed directly by the model to obtain search results.
  • Step S204: Display the search results.
  • the terminal device may display the search results.
  • If the search results are image-modality data, that is, pictures, the pictures may be displayed to the user through the display screen of the terminal device; the pictures may be displayed directly, or as thumbnails, etc., without limitation.
  • the features corresponding to the private entity in the user's search request are determined among the features corresponding to at least one private entity, and the search results are obtained according to the features corresponding to the private entity and the first search request. Since the features corresponding to at least one private entity indicate the data corresponding to at least one private entity, and that data can correspond to the same modality as the search results, the private entity in the search request can be mapped to its specific image in reality (that is, the specific image of the private entity in the search results), thereby realizing concrete search. This solves the problem that the terminal device cannot perform concrete search during the search process, improves the search effect and search efficiency, and enhances the user's search experience.
  • the model can be used to infer on the first search request and the private entity in the first search request to obtain public-private fused features, and the search results are then determined in the database through the public-private fused features.
  • the embodiment of the present application also provides another search method, which may include: the terminal device may display a first display interface; the first display interface includes a search entry; further, the terminal device displays a second display interface in response to the user inputting a keyword in the search entry, and the second display interface includes a first search result corresponding to the keyword; wherein the keyword includes a private entity; the first search result is obtained based on the keyword and the features corresponding to the private entity in the keyword; the first search result corresponds to a different modality than the keyword.
  • the number of first search results may be one or more; illustratively, the process of obtaining the first search result may refer to the relevant description in the above step S203.
  • Figures 4(a)-(c) show schematic diagrams of a search method according to an embodiment of the present application.
  • the user opens the local gallery in the mobile phone 101 and displays the first display interface, wherein a search box (i.e., a search entry) is provided above the first display interface; the first display interface can also display part or all of the images in the local gallery;
  • the user can enter the keyword "I play badminton” in the search box in the local gallery, and click the search icon in the search box to trigger the mobile phone 101 to perform a search operation;
  • the mobile phone 101 parses the keyword "I play badminton", identifies the private entity "I" among the multiple private entities, and then retrieves the features corresponding to the private entity "I" from the features corresponding to the multiple private entities.
  • the mobile phone 101 can fuse the public features obtained by inputting the keyword "I play badminton” into the model with the private features corresponding to the private entity "I” to obtain the fused features, and then determine the picture corresponding to the features matching the fused features in the gallery as the picture of "I play badminton” that the user wants; as shown in Figure 4(c), the mobile phone 101 can then display a second display interface, and the searched picture of "I play badminton" can be displayed in the second display interface, thereby completing the concrete search.
  • the number of private entities included in the private knowledge set can be set as needed.
  • based on the data-minimization principle, the types of private entities contained in the gallery images of most users can be determined statistically (for example, "I", "wife", "husband", "uncle", "aunt", "son", "mother" and other persons commonly related to the user), so as to determine the private entities in the minimum private knowledge set, and then the features corresponding to each private entity are generated to complete the construction of the minimum private knowledge set.
  • This minimum private knowledge set can solve most of the requirements of concrete search in the field of image search, thereby meeting the search needs of different users.
  • the private entities included in the private knowledge set can also be added or removed as needed; there is no limitation on this.
  • FIG5 shows a flow chart of a method for constructing a private knowledge set according to an embodiment of the present application.
  • the method can be executed on a terminal device, for example, on the mobile phone 101 in FIG1 above; as shown in FIG5, the method can include the following steps:
  • Step S501 Acquire data corresponding to a first private entity.
  • the first private entity is any private entity among the at least one private entity; the data corresponding to the first private entity corresponds to a different modality than the first search request.
  • the data corresponding to the first private entity may include only the first private entity or may include multiple private entities including the first private entity, without limitation.
  • for different users, the data corresponding to the same private entity is different; that is, for a given user, the data corresponding to a private entity can serve as concrete private information related to that user.
  • the data corresponding to "son” obtained by user A's terminal device can be a photo of user A's son; for user B, the data corresponding to "son” obtained by user B's terminal device can be a photo of user B's son.
  • the data corresponding to the first private entity may be acquired in the following collection manner.
  • the data corresponding to the first private entity can be obtained by explicit collection.
  • the terminal device can guide the user to perform a selection operation by sending a prompt message to the user or other methods that the user can perceive, and obtain the data corresponding to the first private entity according to the user's selection operation; thereby obtaining the data corresponding to the first private entity within the scope permitted by privacy security.
  • the data corresponding to the first private entity can be obtained by implicit collection.
  • as an example, the terminal device can obtain the data corresponding to the first private entity by inference over data containing common sense information (for example, wedding photos, parent-child photos, selfies, etc.), thereby obtaining the data without the user's perception. Considering that the confidence of data obtained this way may be low, the terminal device can further respond to a search request actively triggered by the user, display the search results to the user, and determine from the user's selection operation whether the low-confidence data obtained above is confirmed by the user; in this way, data corresponding to the first private entity with high confidence can be obtained within the scope permitted by privacy security.
  • Step S502 Process data corresponding to the first private entity to generate features corresponding to the first private entity.
  • the terminal device may process the data corresponding to the first private entity through the model in the terminal device, thereby generating features corresponding to the first private entity.
  • the features corresponding to the first private entity generated by the model reflect the characteristics of the first private entity in the modality. For example, for the photo "My Selfie", the photo is input into the model, and the features corresponding to "I” generated reflect the features of "I” embodied in the image.
  • the model may include sub-models corresponding to multiple modalities, and the terminal device may input data corresponding to the first private entity of a certain modality into the sub-model corresponding to the modality; for example, the photo "My Selfie” may be input into the sub-model corresponding to the image modality, thereby generating features corresponding to the image modality "I".
  • the features corresponding to the generated first private entity can be used as private knowledge related to the user.
  • the private knowledge can be used as known information to realize concrete search.
  • the private knowledge and the search request can be input into the model to realize concrete search.
  • the terminal device may further split the data into multiple data containing only one private entity, and then input each data containing only one private entity into the model to obtain features corresponding to each private entity. For example, the terminal device may identify the age of the person in the photo, thereby splitting "a photo of me and my son" into a photo containing only me and a photo containing only my son, and inputting these two photos into the model respectively to obtain features corresponding to the private entity "me” and features corresponding to the private entity "son” respectively; or, the terminal device may further input the data corresponding to the first private entity into the model to obtain features corresponding to the multiple private entities. For example, the terminal device may input "a photo of me and my son” into the model to obtain features corresponding to the two private entities "me” and "son”.
  • in the embodiment of the present application, the data corresponding to the first private entity is obtained; exemplarily, it can be obtained by explicit collection or implicit collection, so that concrete private information is collected within the scope permitted by privacy security. The data corresponding to the first private entity is then processed to generate the features corresponding to the first private entity; exemplarily, these features can be stored in the terminal device, thereby providing support for quickly determining the features corresponding to the private entity when the terminal device subsequently performs a multimodal search.
  • the terminal device can determine the features corresponding to other private entities in the private knowledge set according to the features corresponding to the generated private entities, thereby automatically expanding the private knowledge set and completing the construction of the private knowledge set.
  • the private knowledge set can be stored in the terminal device.
  • the terminal device can repeatedly execute the above steps S501-S502, obtain the data corresponding to each private entity in turn, and use the model to generate the features corresponding to each private entity, thereby traversing each private entity, expanding private knowledge, and completing the construction of the private knowledge set.
  • the terminal device can also, based on the generated features corresponding to the first private entity, further obtain features corresponding to other private entities in the private knowledge set by executing the following steps S503 and S504, thereby expanding the private knowledge set, and finally obtaining features corresponding to each private entity to complete the construction of the private knowledge set.
  • Step S503 Acquire data including the first private entity and the second private entity.
  • the second private entity is any private entity among the at least one private entity except the first private entity.
  • the terminal device can automatically obtain data containing both the first private entity and the second private entity within the scope of privacy security authorized by the user. It can be understood that, for a certain user, compared with data containing only the first private entity or data containing only the second private entity, the data containing both the first private entity and the second private entity contains more concrete private information related to the user.
  • the data containing the first private entity and the second private entity can be photos containing multiple private entities such as "wedding photos", "family photos” or "group photos” in the gallery.
  • the terminal device can automatically determine the "family photo” in the gallery in combination with common sense information.
  • the photo can include multiple private entities such as "me", "wife”, “son” or "daughter".
  • Step S504 Obtain features corresponding to the second private entity according to the data including the first private entity and the second private entity and features corresponding to the first private entity.
  • the terminal device can split the data containing the first private entity and the second private entity into two independent data, namely, data containing only the first private entity and data containing only the second private entity, and input the two independent data into the model respectively to obtain features corresponding to the two private entities, namely, features corresponding to the first private entity and features corresponding to the second private entity; and then, based on the known features corresponding to the first private entity (that is, the known private knowledge), the features corresponding to the second private entity can be determined from the features corresponding to the two private entities, thereby realizing automatic expansion of the private knowledge set.
  • the terminal device can obtain the "selfie” in the gallery, that is, the image corresponding to the private entity "I”, and input the “selfie” into the model to generate the features corresponding to the image modality "I". Furthermore, the terminal device can also obtain the "wedding photo” in the gallery, that is, the image containing the private entities "I” and “wife”, and extract the features corresponding to the two private entities in the "wedding photo", that is, the features corresponding to the image modalities "I” and "wife”; finally, the terminal device can combine the features corresponding to the image modality "I” generated by the "selfie” to obtain the features corresponding to the image modality "wife”.
  • in this way, through the above steps S501-S502, the terminal device can collect data corresponding to private entities (i.e., concrete private information) within the scope permitted by privacy security and generate private knowledge from it, for example by processing the private information with a model. The terminal device can then automatically expand the generated private knowledge set: as an example, through the above steps S503-S504, it can use common sense and the model, together with acquired data containing multiple private entities and the already generated private knowledge, to grow the set. Concrete private knowledge is thus enriched with as little collection as possible and from a small amount of generated private knowledge, data is used effectively even when there is little user feedback, and support is provided for concrete search when the terminal device performs a multimodal search.
  • the terminal device can further determine whether every private entity in the minimum private knowledge set has been traversed. If all private entities have been traversed, that is, the features corresponding to all private entities have been generated, the construction of the minimum private knowledge set is completed.
  • the following specifically describes the process of acquiring data corresponding to the first private entity in an explicit collection manner when constructing the private knowledge set.
  • FIG6 shows a flow chart of an explicit collection method according to an embodiment of the present application.
  • the method can be executed on a terminal device, for example, on the mobile phone 101 in FIG1 above; as shown in FIG6, the method can include the following steps:
  • Step S601 Send a first prompt message to the user.
  • the first prompt information is used to prompt the user to select data containing the first private entity.
  • the first prompt information can be in the form of voice, text, vibration or video, etc., which is not limited. It is understandable that the first prompt information needs to prompt the user within the scope of privacy security authorized by the user.
  • the terminal device may send a first prompt message to the user when the user searches for pictures in the gallery for the first time, so as to prompt the user to select data containing the first private entity in the gallery; or, the terminal device may also send a first prompt message to the user when it detects that the time spent by the user in searching for pictures in the gallery is usually long, so as to prompt the user to select data containing the first private entity; or, the terminal device may also send a first prompt message to the user when the number of pictures in the gallery exceeds a certain number.
  • the terminal device may select one or more pieces of prompt information in the prompt information set as the first prompt information based on the prompt information set.
  • Step S602 In response to a selection operation by the user, data corresponding to the first private entity is obtained.
  • after the terminal device sends the first prompt information to the user, the user can select the corresponding data or the corresponding prompt option according to the first prompt information, and the terminal device uses the data selected by the user as the data corresponding to the first private entity.
  • the terminal device displays a third display interface, the third display interface includes first prompt information and at least one data; the first prompt information is used to prompt the user to select data containing the first private entity; in response to the user's selection operation in the at least one data, a fourth display interface is displayed, the fourth display interface includes the first data and a first identifier, the first identifier is used to indicate that the first data is the data selected by the user.
  • the identifier can be text, color, brightness, graphics, size, etc. For example, if the first data is a picture, it can be highlighted as an identifier, or a border can be added around the picture as an identifier.
  • the terminal device may send a prompt message such as "Please select your photo”, “Please select a photo of your son”, “My wife and I play badminton”, "Please select a wedding photo” or "Please select a family photo” to the user, so as to prompt the user to select the corresponding picture in the gallery. If the user selects one or more pictures, the terminal device uses the one or more pictures as the data of the image modality corresponding to the private entity contained in the prompt message.
  • Figures 7(a)-(c) show a schematic diagram of an explicit collection method according to an embodiment of the present application;
  • the user enters the local gallery of the mobile phone 101 and displays a first display interface, and a search box (i.e., a search entry) is provided above the first display interface;
  • the first display interface can also display some or all images in the local gallery, and the search box in the first display interface can be clicked to enter keywords in the search box;
  • for example, when the mobile phone 101 detects that the user opens the local gallery for the first time, a third display interface is displayed, which shows some or all images in the local gallery and can also display a prompt message "Please select your photo" to guide the user to select his own photo from the displayed images; the user can select his own photo according to the prompt message, and in response to the user's selection operation, a fourth display interface is displayed. As shown in Figure 7(c), the user selects picture 2; the mobile phone 101 can then determine that picture 2 is a photo of the user himself and add a border around picture 2 to indicate that picture 2 is the picture selected by the user, thereby obtaining the image modality data corresponding to the private entity "I".
  • the terminal device may display a first display interface; the first display interface includes a search entry; further, the terminal device displays a second display interface in response to a user inputting a keyword in the search entry, the second display interface including a first search result corresponding to the keyword and a second prompt message; wherein the keyword includes a private entity; the first search result is obtained based on the keyword and the features corresponding to the private entity in the keyword; the first search result and the keyword correspond to different modalities; the second prompt message is used to prompt the user to confirm whether the first search result is the search result expected by the user.
  • in response to the user's confirmation operation, the terminal device displays a fifth display interface, the fifth display interface including the first search result and a second identifier, the second identifier being used to indicate that the first search result is the search result confirmed by the user.
  • the process of obtaining the first search result may refer to the relevant description in the above step S203.
  • the terminal device can detect that the user actively searches for pictures in the gallery and issue a prompt message. For example, the user enters keywords such as "daughter”, “photo of son” or “my wife and I playing badminton”. The terminal device responds to the user's query operation, searches in the gallery, displays the searched photos, and issues a prompt message "Is this the photo you want to search for?" and displays the prompt options "Yes” and "No". If the user selects "Yes", the terminal device uses the photo as data of the image modality corresponding to the private entity "son”.
  • Figures 8(a)-(c) show schematic diagrams of an explicit collection method according to an embodiment of the present application
  • the user enters the local gallery of the mobile phone 101 and displays a first display interface, a search box (i.e., a search entry) is provided above the first display interface, and the keyword "son” can be entered in the search box of the first display interface
  • the mobile phone 101 performs a search operation and displays the searched picture 1, a prompt statement "Is this the photo you want to search for?" and the prompt options "Yes" and "No" on the screen; in response to the user clicking the option "Yes", a fifth display interface is displayed. As shown in Figure 8(c), the mobile phone 101 can then determine that picture 1 is a photo of the user's son and add a border around picture 1 to indicate that picture 1 is a photo confirmed by the user, thereby obtaining the image modality data corresponding to the private entity "son".
  • a button or entrance can be set in the gallery of the terminal device. If the user finds that the image search effect is not good, the button or entrance can be used to trigger the terminal device to send a first prompt message, such as "Please select photos of Zhang San studying"; if the user selects one or more pictures, the terminal device will use the one or more photos as data of the image modality corresponding to the private entity "Zhang San".
  • in this way, the data corresponding to the first private entity is obtained in an explicit collection manner that the user can perceive. Since the private information is subjectively selected by the user, the private information collected in this way has high confidence; collection of high-confidence concrete private information is thus achieved through explicit guidance within the scope of user privacy security.
  • the terminal device may also execute step S603, input data corresponding to the first private entity into the model, and generate features corresponding to the first private entity; thereby generating private knowledge from the concrete private information obtained by the above explicit collection method.
  • step S603 may refer to the relevant description in the above step S502.
  • the terminal device may also obtain features corresponding to other private entities in the private knowledge set by executing the following steps S604-S605, and automatically expand the private knowledge set, thereby completing the construction of the private knowledge set.
  • Step S604 Acquire data including the first private entity and the second private entity.
  • Step S605 Obtain features corresponding to the second private entity according to the data including the first private entity and the second private entity and features corresponding to the first private entity.
  • steps S604-S605 may refer to the relevant descriptions in steps S503-S504 in FIG. 5 .
  • the terminal device can expand the private knowledge set according to the collected data containing multiple private entities and the generated private knowledge by executing the above steps S604 and S605, so as to generate more high-confidence concrete private knowledge within the scope permitted by privacy security, with as little explicit collection as possible and from a small amount of generated high-confidence private knowledge.
  • the terminal device can further determine whether every private entity in the minimum private knowledge set has been traversed. If all private entities have been traversed, that is, the features corresponding to all private entities have been generated, the construction of the minimum private knowledge set is completed; otherwise, the above steps S604-S605 are repeated until all private entities have been traversed, and the constructed minimum private knowledge set is stored in the terminal device.
  • the following specifically describes the process of acquiring data corresponding to the first private entity in an implicit collection manner when constructing the private knowledge set.
  • FIG9 shows a flow chart of an implicit collection method according to an embodiment of the present application.
  • the method can be executed on a terminal device, for example, on the mobile phone 101 in FIG1 above; as shown in FIG9 , the method can include the following steps:
  • Step S901 Acquire initial data including a first private entity.
  • the initial data may be data containing common sense information, such as wedding photos, parent-child photos, selfies, etc.
  • the terminal device may infer the data in the terminal device in combination with common sense to obtain initial data containing the first private entity; for example, if the user is a male user, the wedding photos usually contain two private entities, "I" and "wife", and the terminal device may extract the features of the pictures in the local gallery through a model (such as a sub-model corresponding to the image modality), thereby filtering out the wedding photos in the local gallery; for another example, selfies usually contain the private entity "I", and the terminal device may extract the features of the pictures in the local gallery through a model, thereby filtering out the selfies in the local gallery.
  • the initial data acquired in this step includes concrete private information, so that the initial data including the first private entity can be used as data corresponding to the first private entity; thereby, the data corresponding to the first private entity is acquired without the user's awareness.
  • the confidence of the initial data obtained in this step is usually low because the user's confirmation is not obtained.
  • the selfies filtered out in this way contain data corresponding to the private entity "I" (i.e., the user's own selfies), but may also contain data corresponding to other private entities (i.e., selfies of people other than the user).
  • the terminal device can further filter and optimize the obtained initial data by executing the following steps S902-S903, so as to obtain data corresponding to the first private entity with high confidence.
  • Step S902 In response to a second search request of the user, display at least one search result, wherein the second search request includes the first private entity, and the at least one search result corresponds to a different modality than the second search request.
  • the terminal device may process the second search request according to the model in the terminal device to obtain at least one search result, and may present the at least one search result to the user.
  • the at least one search result may include the initial data containing the first private entity.
  • Step S903 based on a selection operation of the user in at least one search result, obtain data corresponding to the first private entity.
  • the user can select the result he wants to search from at least one search result. Since the second search request includes the first private entity, the search result selected by the user includes the first private entity; thus, the terminal device can obtain the data corresponding to the first private entity, that is, the concrete private information with high confidence, based on the search result selected by the user.
  • the terminal device may update the confidence of the initial data based on the search result selected by the user; as an example, the terminal device may use the initial data as the data corresponding to the first private entity based on the operation of the user selecting the initial data in at least one search result. For example, if the user selects the initial data, it means that the confidence of the initial data is high, and the terminal device may use the initial data as the data corresponding to the first private entity; if the user does not select the initial data, it means that the confidence of the initial data is still low, and the terminal device may not use the initial data as the data corresponding to the first private entity.
  • the terminal device can determine the search result selected by the user as the data corresponding to the first private entity.
  • the terminal device is the mobile phone 101 in FIG. 1(a) above, and the first private entity can be "I".
  • FIG. 10(a)-(c) shows a schematic diagram of an implicit collection method according to an embodiment of the present application; the mobile phone 101 can filter out selfies in the local gallery; as shown in FIG. 10(a), the second search request can be the keyword "my photos", and the user can enter the keyword "my photos" in the search box of the local gallery of the mobile phone 101.
  • the mobile phone 101 processes the keyword using the model, searches for the user's photos in the gallery, and can display the searched user photos on the screen; as shown in FIG. 10, the user can select his own photos from the photos displayed on the screen of the mobile phone 101, for example, the user selects picture 2. If the selfies filtered out above include picture 2, the mobile phone 101 can use picture 2 as the data corresponding to the private entity "I". In this way, through the user's selection operation, the confidence of picture 2 is improved, thereby obtaining concrete private information with high confidence.
  • in this way, low-confidence concrete private information (i.e., the initial data) is first obtained; at least one search result is then displayed, and based on the user's operation of selecting the initial data among the at least one search result, the initial data is used as the data corresponding to the first private entity. This improves the confidence of the concrete private information, so that the data corresponding to the first private entity is acquired in an implicit collection manner that the user is unaware of.
  • the terminal device may also execute step S904, input the data corresponding to the first private entity into the model, and generate features corresponding to the first private entity; thereby generating private knowledge from the highly confident concrete private information obtained by the above implicit collection method.
  • step S904 may refer to the relevant description in the above step S502.
  • the terminal device may also obtain features corresponding to other private entities in the private knowledge set by executing the following steps S905-S906, and automatically expand the private knowledge set, thereby completing the construction of the private knowledge set.
  • Step S905 Acquire data including the first private entity and the second private entity.
  • Step S906 Obtain features corresponding to the second private entity according to the data including the first private entity and the second private entity and features corresponding to the first private entity.
  • steps S905-S906 may refer to the relevant descriptions in steps S503-S504 in FIG. 5 .
  • the terminal device can expand the private knowledge set according to the collected data containing multiple private entities and the generated private knowledge by executing the above steps S905 and S906, so as to generate more high-confidence concrete private knowledge within the scope permitted by privacy security, with as little collection as possible and from a small amount of generated high-confidence private knowledge.
  • the terminal device can further determine whether every private entity in the minimum private knowledge set has been traversed. If all private entities have been traversed, that is, the features corresponding to all private entities have been generated, the construction of the minimum private knowledge set is completed.
  • otherwise, steps S905-S906 are repeatedly executed until all private entities have been traversed, and the constructed minimum private knowledge set is stored in the terminal device; illustratively, a triple storage method can be adopted, for example, <1#, 2#, husband and wife relationship>, <1#, 3#, father and child relationship>, etc.
  • the model training method provided by the embodiment of the present application is described in detail below from the training side. It is understandable that the application side and the training side can correspond to the same device, that is, the training of the model and the search using the trained model can be performed on the same device; the application side and the training side can also correspond to different devices, that is, the model can be trained on one device, and the trained model can be configured on another device for concrete search.
  • the model can be pre-trained on the server, so that the model can be transplanted to the terminal device to realize the concrete search on the terminal device.
  • FIG11 is a flow chart of a model training method according to an embodiment of the present application.
  • the model training method can be applied to a server, such as a cloud server.
  • the method can include the following steps:
  • Step S1101 Acquire a multimodal sample set and a first model; wherein the multimodal sample set may include: multimodal samples corresponding to the first private entity, multimodal samples corresponding to the first search request, and multimodal samples corresponding to the second search request;
  • the first private entity represents an entity that has an associated relationship with the user, the first search request includes the first private entity, and the second search request does not include the first private entity.
  • the acquired multimodal sample set can be divided into a training set, a validation set and a test set; the number of samples contained in each of the training set, the validation set and the test set can be set as needed, without limitation.
  • the multimodal sample may include a sample of the first modality and a sample of the second modality, for example, the multimodal sample may include a sample of the text modality and a sample of the image modality.
  • the multimodal sample set may include a picture-text pair data set, the picture-text pair data set includes a plurality of picture-text pair data, wherein the data of the text modality in each picture-text pair data corresponds to the data of the image modality, for example, the "I" of the text modality (i.e., the text-I describing the user's photo) and the "I" of the image modality (i.e., the user's photo) may be used as a picture-text pair data.
  • the multimodal sample corresponding to the first private entity may be the image-text pair data corresponding to the first private entity.
  • the first private entity may be "I”
  • the multimodal sample corresponding to the first private entity may be "I” in text mode and "I” in image mode.
  • the multimodal sample corresponding to the first search request may be the image-text pair data corresponding to the first search request.
  • the multimodal sample corresponding to the first search request may be "I play badminton" in text mode (i.e., the keyword describing the photo of the user playing badminton - I play badminton) and "I play badminton” in image mode (i.e., the photo of the user playing badminton).
  • the multimodal sample corresponding to the second search request may be public image-text pair data; for example, the multimodal sample corresponding to the second search request may be "other people play badminton" in text mode (i.e., the keyword describing the photo of other people other than the user playing badminton - other people play badminton) and "other people play badminton” in image mode (i.e., the photo of other people other than the user playing badminton).
  • the first model may be a trained multimodal model, for example, it may be a multimodal model trained using a public data set, which can be used to generate similar features for the same non-private entity in samples of different modalities; wherein, the training process of the first model may refer to the prior art and will not be repeated here.
  • the first model may include: a sub-model corresponding to the first modality and a sub-model corresponding to the second modality.
  • the first modality may be an image modality
  • the second modality may include a text modality
  • the sub-model corresponding to the text modality is used to encode the data of the text modality and generate corresponding features
  • the sub-model corresponding to the picture modality is used to encode the data of the image modality and generate corresponding features.
  • a public image-text pair dataset can be divided into a training set, a validation set, and a test set to train the multimodal model.
  • when the accuracy on the test set reaches a preset value, that is, when the two sides of a public image-text pair are mapped to the same representation in a high-dimensional space (the same high-dimensional features), the training is stopped and the model is saved to obtain a trained multimodal model, that is, the first model.
  • since the first model is obtained by training with a public data set, the first model has the ability to identify general entities; for example, it can identify the entity "badminton". For private entities such as "I", "wife" or "Zhang San", however, since the public data set does not distinguish private entities for different users, the first model trained with the public data set can only identify these entities as "people" and cannot accurately map them to specific images in reality.
  • by further training the first model with a multimodal sample set containing private information (i.e., multimodal samples corresponding to the first private entity), the obtained second model itself has the ability to carry concrete private information and is used to generate similar features for the same private entity in samples of different modalities, so that the private entity can be accurately mapped to its specific image in reality.
  • the second model can be used as the model in Figure 2 above for concrete search, greatly improving the search effect and efficiency.
  • the first model can be trained through comparative learning using multimodal samples corresponding to the first private entity, multimodal samples corresponding to the first search request, and multimodal samples corresponding to the second search request, thereby obtaining the second model; wherein the multimodal samples corresponding to the first search request can be used as positive examples, and the multimodal samples corresponding to the second search request can be used as (difficult) negative examples.
  • in this way, fine-tuning is performed on a certain "private" data set, so that the trained model (i.e., the second model) can generate similar features for the same private entity in samples of different modalities, and has the ability to infer features of both public information and concrete private information; the second model can then be used for concrete search, greatly improving the search effect and efficiency.
  • the training process in the above step S1102 is described in detail below.
  • FIG12 is a flow chart of a model training method according to an embodiment of the present application. As shown in FIG12 , the above step S1102 may include the following steps:
  • the server may obtain a multimodal sample set corresponding to a private entity, which may be used as a "private" data set.
  • a picture-text pair data set corresponding to a private entity may be obtained, and each picture-text pair data may include a picture and a label of the private entity in the picture.
  • the server may parse the first search request, obtain the private entity contained therein (i.e., the first private entity), and acquire the multimodal sample corresponding to the first private entity.
  • the server may input a sample of the first modality corresponding to the first private entity and a sample of the second modality corresponding to the first search request into the first model to obtain a fused feature.
  • for example, the sample of the second modality corresponding to the first search request may be the keyword "I play badminton"; the private entity contained in the first search request is "I", that is, the first private entity is "I"; the "I" of the image modality (that is, the sample of the first modality corresponding to the first private entity) and the keyword "I play badminton" may then be input into the first model to obtain the fused feature.
  • where the sample of the second modality corresponding to the first search request is the keyword "I play badminton", the server can determine that the private entity contained in the first search request is "I", that is, the first private entity is "I"; the "I" of the image modality can then be input into the sub-model corresponding to the image modality, which infers the features corresponding to the image-modality "I", that is, the features of the private entity "I" in the image.
  • the server may input the multimodal sample corresponding to the first search request into the first model to generate a first feature; input the multimodal sample corresponding to the first private entity into the first model to generate a second feature; and fuse the first feature and the second feature to obtain a fused feature.
  • the specific description of this implementation can refer to the relevant description in step S20301 in Figure 3 above, which will not be repeated here.
  • the server may input the sample of the second modality corresponding to the first search request into the sub-model corresponding to the second modality to obtain the first feature; input the sample of the first modality corresponding to the first private entity into the sub-model corresponding to the first modality to obtain the second feature; and then fuse the first feature and the second feature to obtain a fused feature.
  • the keyword "I play badminton” can be input into the sub-model corresponding to the text modality, and the sub-model corresponding to the text modality infers the feature corresponding to the keyword "I play badminton” (i.e., the first feature);
  • the sample of the image modality corresponding to the first search request is a photo of "me”
  • the photo of "me” is input into the sub-model corresponding to the image modality
  • the sub-model corresponding to the image modality infers the feature corresponding to the photo of "me” (i.e., the second feature)
  • the two features are fused to obtain a fused feature.
  • the first model is used to obtain fused features; as an example, private features (i.e., the second features) and public features (i.e., the first features) are generated, and the generated public features and private features are fused.
  • the fused features can simultaneously represent public information and concrete private information, so that the first model can take into account the learning of both public information and private information.
  • the features corresponding to the first search request are obtained by processing the multimodal samples corresponding to the first search request by the first model
  • the features corresponding to the second search request are obtained by processing the multimodal samples corresponding to the second search request by the first model.
  • the features corresponding to the first search request are obtained by processing the samples of the first modality corresponding to the first search request by the first model
  • the features corresponding to the second search request are obtained by processing the samples of the first modality corresponding to the second search request by the first model.
  • the samples of the first modality corresponding to the first search request may be photos of "I play badminton”
  • the samples of the first modality corresponding to the second search request may be photos of "other people playing badminton”.
  • the server may input the photos of "I play badminton” into the sub-model corresponding to the image modality to obtain the features corresponding to "I play badminton” in the image modality, and input the photos of "other people playing badminton” into the sub-model corresponding to the image modality to obtain the features corresponding to "other people playing badminton” in the image modality.
  • the first model can be trained by contrastive learning using the fusion feature, the feature corresponding to the first search request, and the feature corresponding to the second search request to obtain the second model; wherein the feature corresponding to the first search request is used as the feature of the positive example, and the feature corresponding to the second search request is used as the feature of the negative example.
  • through training, the distance between the feature of the positive example and the fused feature gradually decreases, and the distance between the feature of the negative example and the fused feature gradually increases, until the feature of the positive example is aligned with the fused feature; that is, the distance between the positive-example feature and the fused feature (for example, the Euclidean distance or cosine distance) is small, while the distances to other features are large. It can be understood that the greater the Euclidean distance between two features, the greater the difference between them, that is, the smaller their similarity. When the accuracy on the test set reaches the preset value, the training can be stopped and the model saved, thereby obtaining a trained public-private fusion multimodal model, that is, the second model.
  • the second model can be used as the model in Figure 2 above. In the search process, the second model combines private knowledge, so that a fused feature characterizing both public information and private information can be inferred, thereby realizing a concrete search.
  • the loss function value can be determined according to the fused feature, the feature corresponding to the first search request, and the feature corresponding to the second search request.
  • the loss function value of the positive example can be calculated by the difference between the fused feature and the feature corresponding to the first search request
  • the loss function value of the negative example can be calculated by the difference between the fused feature and the feature corresponding to the second search request.
  • the loss function value is obtained according to the loss function value of the positive example and the loss function value of the negative example.
  • the loss function value can be back-propagated, and the parameter value in the first model can be updated using the gradient descent algorithm.
  • in this way, through iterative updates, the fused feature obtained by the first model becomes more similar to the feature of the positive example (for example, in Euclidean distance), while its similarity to the feature of the negative example is gradually reduced.
  • FIG13 shows a flow chart of a model training method according to an embodiment of the present application.
  • the first private entity may be "I”
  • the multimodal sample corresponding to the first private entity is a photo of "I” and a text label "I” corresponding to the photo
  • the multimodal sample corresponding to the first search request is a photo of "I” playing badminton and a text label "I play badminton” corresponding to the photo
  • the multimodal sample corresponding to the second search request is a photo of other people playing badminton and a text label "other people play badminton” corresponding to the photo
  • the photo of "I” playing badminton is used as a positive example
  • the photo of other people playing badminton is used as a negative example
  • the first model fuses the first feature and the second feature to obtain a fused feature; the first model is then fine-tuned in combination with feature B (the feature of the positive example), feature C (the feature of the negative example) and the fused feature. Through continuous fine-tuning, feature B and the fused feature are gradually brought closer, and feature C and the fused feature are gradually pushed apart, until feature B is aligned with the fused feature, that is, the "I" of the text modality is aligned with the "I" of the image modality.
  • the training can be stopped, the model can be saved, and the second model can be obtained. In this way, the model training process can take into account the learning of the public information of the photos and the concrete private information.
  • the public information and private information of the pictures in the gallery can be extracted, and the fusion features can be inferred and generated. Then, combined with private knowledge, pictures containing private information such as "I” and "wife” can be accurately searched.
  • a multimodal model and private data are used to fuse private features with public features, and then aligned with the corresponding multimodal features to train a public-private fusion multimodal model; in this way, the first model is fine-tuned by jointly training public features and private features, which enables the trained model (i.e., the second model) to have the ability to infer concrete private information; this solves the problem that a multimodal model trained only with a public data set cannot carry concrete private information.
  • the embodiment of the present application further provides a search device, which can be used to execute the technical solution described in the above method embodiment.
  • the steps of the method shown in Figures 2, 3, 5, 6 or 9 can be executed.
  • Figure 14 shows a structural diagram of a search device according to an embodiment of the present application.
  • a receiving module 1401 is used to receive a first search request from a user, wherein the first search request includes a private entity; wherein the private entity represents an entity associated with the user;
  • a determining module 1402 is used to determine, among the features corresponding to at least one private entity, the features corresponding to the private entity in the first search request; wherein the features corresponding to the at least one private entity indicate the data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality than the first search request;
  • a searching module 1403 is used to obtain search results according to the first search request and the features corresponding to the private entity in the first search request; the search results correspond to a different modality than the first search request; and
  • a display module 1404 is used to display the search results.
  • the features corresponding to the private entity in the user's search request are determined, and the search results are obtained according to those features and the first search request. Since the features corresponding to at least one private entity indicate the data corresponding to that private entity, and that data can correspond to the same modality as the search results, the private entity in the search request can be mapped to its specific image in reality (that is, the specific image of the private entity in the search results), thereby realizing a concrete search; this solves the problem that the terminal device cannot perform a concrete search, improving the search effect and search efficiency and enhancing the user's search experience.
  • the model can be used to infer the private entity in the first search request and the first search request to obtain the public-private fusion features, and then the search results are determined in the database through the public-private fusion features.
  • the search module 1403 is also used to: process the first search request and features corresponding to private entities in the first search request through a model to generate a fused feature; the fused feature indicates the first search request; and use the data corresponding to the features in the database that match the fused feature as the search result.
  • the device also includes: a generation module, used to obtain data corresponding to a first private entity; the first private entity is any private entity among the at least one private entity; the data corresponding to the first private entity is processed to generate features corresponding to the first private entity.
  • the generation module is further used to: send a first prompt message to the user, where the first prompt message is used to prompt the user to select data containing the first private entity; and obtain data corresponding to the first private entity in response to the user's selection operation.
  • the generation module is further used to: obtain initial data including the first private entity; display at least one search result in response to a second search request from a user; the second search request includes the first private entity, and the at least one search result corresponds to a different modality from the second search request; the at least one search result includes the initial data; and based on the user's operation of selecting the initial data from the at least one search result, use the initial data as data corresponding to the first private entity.
  • the generation module is further used to: obtain data containing the first private entity and the second private entity; wherein the second private entity is any private entity among the at least one private entity except the first private entity; and obtain features corresponding to the second private entity based on the data containing the first private entity and the second private entity, and features corresponding to the first private entity.
  • the private entity includes: a title, a name, or a nickname associated with the user.
  • an embodiment of the present application also provides another search device, which includes: a first display module, used to display a first display interface, the first display interface including a search entry; a second display module, used to display a second display interface in response to a user inputting a keyword in the search entry, the second display interface including search results corresponding to the keyword; wherein the keyword includes a private entity; the search results are obtained based on the keyword and the features corresponding to the private entity; the search results and the keyword correspond to different modalities.
  • in this way, in response to the user inputting a keyword in the search entry of the first display interface, a second display interface is displayed; since the search results are obtained based on the keyword and the features corresponding to the private entity in the keyword, the private entity in the keyword can be mapped to its specific image in reality (that is, the specific image of the private entity in the search results), thereby realizing a concrete search; this solves the problem that the terminal device cannot perform a concrete search, improves the search effect and search efficiency, and enhances the user's search experience.
  • the device also includes: a third display module, used to display a third display interface, the third display interface including a first prompt message and at least one data; the first prompt message is used to prompt the user to select data containing a first private entity; a fourth display module, used to display a fourth display interface in response to the user's selection operation in the at least one data, the fourth display interface including first data and a first identifier, the first identifier is used to indicate that the first data is the data selected by the user.
  • the second display interface also includes: a second prompt message, the second prompt message is used to prompt the user to confirm whether the first search result is the search result expected by the user;
  • the device also includes: a fifth display module, used to display a fifth display interface in response to the user's confirmation operation, the fifth display interface includes the first search result and a second identifier, the second identifier is used to indicate that the first search result is the search result confirmed by the user.
  • the embodiment of the present application further provides a model training device, which can be used to execute the technical solution described in the above method embodiment. For example, each step of the method shown in Figure 11 or Figure 12 can be executed.
  • Figure 15 shows a structural diagram of a model training device according to an embodiment of the present application.
  • the device includes: an acquisition module 1501, used to acquire a multimodal sample set and a first model; wherein the multimodal sample set includes: a multimodal sample corresponding to a first private entity, a multimodal sample corresponding to a first search request, and a multimodal sample corresponding to a second search request; the first private entity represents an entity associated with a user, the first search request includes the first private entity, and the second search request does not include the first private entity; a training module 1502, used to train the first model using the multimodal sample set to obtain a second model.
  • during training of the first model, fine-tuning relies on a certain "private" data set, so that the trained model (i.e., the second model) can generate similar features for the same private entity in samples of different modalities, and has the ability to infer the features of public information and of concretized private information; the second model can then be used for concrete search, greatly improving the search effect and efficiency.
  • the training module 1502 is also used to: process the multimodal samples corresponding to the first private entity and the multimodal samples corresponding to the first search request through the first model to obtain fused features; train the first model according to the fused features, the features corresponding to the first search request and the features corresponding to the second search request to obtain the second model; wherein the features corresponding to the first search request are obtained by processing the multimodal samples corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the multimodal samples corresponding to the second search request by the first model.
  • the training module 1502 is also used to: input the multimodal sample corresponding to the first search request into the first model to generate a first feature; input the multimodal sample corresponding to the first private entity into the first model to generate a second feature; and fuse the first feature and the second feature to obtain the fused feature.
  • the multimodal samples include samples of the first modality and samples of the second modality; the features corresponding to the first search request are obtained by processing the samples of the first modality corresponding to the first search request by the first model, and the features corresponding to the second search request are obtained by processing the samples of the first modality corresponding to the second search request by the first model; the training module 1502 is also used to: input the samples of the first modality corresponding to the first private entity and the samples of the second modality corresponding to the first search request into the first model to obtain the fused features
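A minimal PyTorch-flavored sketch of the training objective the items above describe: the fused feature (private-entity sample plus first-request sample) is pulled toward the feature of the first search request and pushed away from the feature of the second search request. The linear encoder, the additive fusion, the InfoNCE-style loss, and the temperature are all assumptions for illustration; the patent does not specify the architecture or the loss.

```python
import torch
import torch.nn.functional as F

def info_nce(fused: torch.Tensor, pos: torch.Tensor, neg: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    # Treat the positive as class 0 among {positive, negative} similarities.
    fused, pos, neg = (F.normalize(t, dim=-1) for t in (fused, pos, neg))
    logits = torch.stack([(fused * pos).sum(-1), (fused * neg).sum(-1)], dim=-1) / temperature
    labels = torch.zeros(fused.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

# Toy "first model": a shared linear encoder over 16-dimensional sample features.
encoder = torch.nn.Linear(16, 8)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

private_sample = torch.randn(4, 16)  # multimodal sample of the first private entity
req1_sample = torch.randn(4, 16)     # sample of the first search request (contains the entity)
req2_sample = torch.randn(4, 16)     # sample of the second search request (no entity)

fused = encoder(private_sample) + encoder(req1_sample)  # simple additive fusion
# Targets are detached so the gradient flows only through the fused branch here.
loss = info_nce(fused, encoder(req1_sample).detach(), encoder(req2_sample).detach())
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```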
  • the technical effects and specific descriptions of the model training device shown in FIG. 15 and its various possible implementations can be found in the above-mentioned model training method, and are not repeated here.
  • the division of the modules in the above search device and model training device is only a division of logical functions. In actual implementation, they can be fully or partially integrated into one physical entity, or they can be physically separated.
  • the modules in the device can be implemented in the form of a processor calling software; for example, the device includes a processor, the processor is connected to a memory, and instructions are stored in the memory.
  • the processor calls the instructions stored in the memory to implement any of the above methods or realize the functions of the modules of the device, wherein the processor is, for example, a general-purpose processor, such as a central processing unit (CPU) or a microprocessor, and the memory is a memory inside the device or a memory outside the device.
  • the modules in the device may be implemented in the form of hardware circuits, and the functions of some or all of the modules may be realized by the design of the hardware circuits, which may be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above modules are realized by designing the logical relationships of the components within the circuit; for another example, in another implementation, the hardware circuit may be implemented by a programmable logic device (PLD), taking a field programmable gate array (FPGA) as an example, which may include a large number of logic gate circuits whose connection relationships are configured through a configuration file, thereby realizing the functions of some or all of the above modules. All modules of the above device may be implemented entirely in the form of a processor calling software, entirely in the form of hardware circuits, or partly by a processor calling software with the remainder implemented by hardware circuits.
  • the processor is a circuit with the ability to process signals.
  • the processor may be a circuit with the ability to read and run instructions, such as a CPU, a microprocessor, a graphics processing unit (GPU), a digital signal processor (DSP), a neural-network processing unit (NPU), a tensor processing unit (TPU), etc.; in another implementation, the processor may implement certain functions through the logical relationship of a hardware circuit, and the logical relationship of the hardware circuit is fixed or reconfigurable, such as a hardware circuit implemented by an ASIC or PLD, such as an FPGA.
  • the process of the processor loading a configuration file to implement the hardware circuit configuration can be understood as the process of the processor loading instructions to implement the functions of some or all of the above modules.
  • each module in the above device can be one or more processors (or processing circuits) configured to implement the above embodiment method, such as: CPU, GPU, NPU, TPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
  • each module in the above device can be fully or partially integrated together, or can be implemented independently, which is not limited.
  • the embodiment of the present application also provides an electronic device, comprising: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to implement the method of the above embodiment when executing the instructions.
  • FIG. 16 shows a schematic diagram of the structure of an electronic device 100 according to an embodiment of the present application.
  • the electronic device 100 may include a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or at least one type of terminal device in a smart city.
  • the embodiment of the present application does not impose any special restrictions on the specific type of the electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • Different processing units may be independent devices or integrated in one or more processors.
  • the processor can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
  • the processor can execute the steps of the method shown in Figures 2, 3, 5, 6 or 9 above.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 may be a cache memory.
  • the memory may store instructions or data that have been used or are frequently used by the processor 110. If the processor 110 needs to use the instruction or data, it may be directly called from the memory. This avoids repeated access, reduces the waiting time of the processor 110, and thus improves the efficiency of the system.
  • exemplarily, the memory in the processor 110 may also store a model or a private knowledge set.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the processor 110 may be connected to a touch sensor, an audio module, a wireless communication module, a display, a camera, and other modules through at least one of the above interfaces.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100.
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the USB connector 130 is an interface that complies with USB standard specifications and can be used to connect the electronic device 100 and peripheral devices. Specifically, it can be a Mini USB connector, a Micro USB connector, a USB Type C connector, etc.
  • the USB connector 130 can be used to connect a charger to charge the electronic device 100, and can also be used to connect other electronic devices to transmit data between the electronic device 100 and other electronic devices. It can also be used to connect headphones to output audio stored in the electronic device through the headphones.
  • the connector can also be used to connect other electronic devices, such as VR devices.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle number, and battery health status (leakage, impedance).
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve the utilization of antennas.
  • antenna 1 can be reused as a diversity antenna for a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide solutions for wireless communications including 2G/3G/4G/5G, etc., applied to the electronic device 100.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
  • the mobile communication module 150 may receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit the result to the modem processor for demodulation.
  • the mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation through the antenna 1.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the processor 110.
  • at least some of the functional modules of the mobile communication module 150 may be arranged in the same device as at least some of the modules of the processor 110.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 170A, a receiver 170B, etc.), or displays an image or video through a display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), bluetooth low energy (BLE), ultra wide band (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like applied to the electronic device 100.
  • the wireless communication module 160 can be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signal and performs filtering, and sends the processed signal to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, modulate the frequency, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with the network and other electronic devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or a satellite based augmentation system (SBAS).
  • the electronic device 100 can realize the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include one or more display screens 194.
  • the display screen 194 can be used to display a display interface, a search request, a search result, or prompt information, etc.
  • the electronic device 100 can realize the camera function through the camera module 193, ISP, video codec, GPU, display screen 194, application processor AP, neural network processor NPU, etc.
  • the camera module 193 can be used to collect color image data and depth data of the photographed object.
  • the ISP can be used to process the color image data collected by the camera module 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image.
  • the ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP can be set in the camera module 193.
  • the camera module 193 may be composed of a color camera module and a 3D sensing module.
  • the 3D sensing module may be a time-of-flight (TOF) 3D sensing module or a structured-light 3D sensing module.
  • the camera module 193 may also be composed of two or more cameras.
  • the electronic device 100 may include one or more camera modules 193. Specifically, the electronic device 100 may include one front camera module 193 and one rear camera module 193.
  • the front camera module 193 can generally be used to collect the color image data and depth data of the photographer himself facing the display screen 194, and the rear camera module can be used to collect the color image data and depth data of the shooting object (such as people, scenery, etc.) facing the photographer.
  • the CPU, GPU or NPU in the processor 110 can, based on Huawei's PoissonEngine_VS vector engine or on open-source vector engines such as Faiss, Annoy, Milvus or Vearch, perform private knowledge inference, build a private knowledge set, and store the private knowledge set in the database; it can also perform inference on the keywords in a user's search request to obtain the fused features corresponding to the keywords.
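As a concrete illustration of backing the private knowledge set and the search database with one of the vector engines named above, here is a short Faiss sketch; the index type, dimensionality, and the pre-computed fused query are illustrative assumptions.

```python
import faiss
import numpy as np

dim = 512
index = faiss.IndexFlatIP(dim)        # exact inner-product index

gallery = np.random.rand(1000, dim).astype("float32")
faiss.normalize_L2(gallery)           # unit vectors: inner product == cosine similarity
index.add(gallery)                    # features of on-device data (e.g. photos)

fused_query = np.random.rand(1, dim).astype("float32")  # fused feature for the keywords
faiss.normalize_L2(fused_query)
scores, ids = index.search(fused_query, 5)  # top-5 nearest neighbours
print(ids[0], scores[0])
```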
  • the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals.
  • Video codecs are used to compress or decompress digital videos.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • NPU is a neural network (NN) computing processor.
  • through the NPU, applications such as intelligent cognition of the electronic device 100 can be realized, for example: image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, music, video and other files are saved in the external memory card. Or music, video and other files are transferred from the electronic device to the external memory card.
  • the internal memory 121 can be used to store computer executable program codes, which include instructions.
  • the internal memory 121 can include a program storage area and a data storage area.
  • the program storage area can store an operating system, an application required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the data storage area can store data created during the use of the electronic device 100 (such as audio data, a phone book, etc.), etc.
  • a private knowledge set and a database can be stored.
  • the internal memory 121 can include a high-speed random access memory, and can also include a non-volatile memory, such as at least one disk storage device, a flash memory device, a universal flash storage (UFS), etc.
  • the processor 110 executes various functional methods or data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor. Exemplarily, the method shown in Figures 2, 3, 5, 6 or 9 above can be executed.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 can be arranged in the processor 110, or some functional modules of the audio module 170 can be arranged in the processor 110.
  • the speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal.
  • the electronic device 100 can listen to music or output an audio signal for a hands-free call through the speaker 170A.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be received by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal.
  • the electronic device 100 can be provided with at least one microphone 170C. In other embodiments, the electronic device 100 can be provided with two microphones 170C, which can not only collect sound signals but also realize noise reduction function. In other embodiments, the electronic device 100 can also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify the sound source, realize directional recording function, etc.
  • the earphone interface 170D is used to connect a wired earphone.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 180A can be set on the display screen 194.
  • a capacitive pressure sensor can be a parallel plate including at least two conductive materials.
  • the gyro sensor 180B can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B. The gyro sensor 180B can be used for anti-shake shooting.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip leather case.
  • the magnetic sensor 180D can be used to detect the folding or unfolding of the electronic device, or the folding angle.
  • the acceleration sensor 180E can detect the magnitude of acceleration in various directions (generally three axes) of the electronic device 100. When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected.
  • the distance sensor 180F is used to measure the distance.
  • the electronic device 100 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • the ambient light sensor 180L can be used to sense the brightness of the ambient light.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is blocked, for example, the electronic device is in a pocket.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to achieve fingerprint unlocking, access application locks, fingerprint photography, fingerprint call answering, etc.
  • the temperature sensor 180J is used to detect temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature detected by the temperature sensor 180J exceeds a threshold, the electronic device 100 executes to reduce the performance of the processor so as to reduce the power consumption of the electronic device to implement thermal protection.
  • the touch sensor 180K is also called a "touch control device”.
  • the touch sensor 180K can be set on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a "touch control screen”.
  • the touch sensor 180K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194.
  • the touch sensor 180K can also be set on the surface of the electronic device 100, which is different from the position of the display screen 194.
  • the bone conduction sensor 180M can obtain a vibration signal. In some embodiments, the bone conduction sensor 180M can obtain a vibration signal of a vibrating bone block of a human vocal part.
  • the key 190 may include a power key, a volume key, etc.
  • the key 190 may be a mechanical key or a touch key.
  • the electronic device 100 may receive key input and generate key signal input related to user settings and function control of the electronic device 100.
  • Motor 191 can generate vibration prompts.
  • Motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects.
  • different application scenarios (for example: time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 may be an indicator light, which may be used to indicate the charging status, power changes, messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or disconnected from the electronic device 100 by inserting it into or pulling it out of the SIM card interface 195.
  • the electronic device 100 can support one or more SIM card interfaces.
  • the SIM card interface 195 can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 can also be compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application takes the Android system of the layered architecture as an example to exemplify the software structure of the electronic device 100.
  • FIG. 17 shows a software structure block diagram of the electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, each with clear roles and division of labor.
  • the layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as phone, camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application framework layer provides application programming interface (API) and programming framework for the applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • the window manager is used to manage window programs.
  • the window manager can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • the data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, private knowledge sets, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying images, controls for displaying gallery user interfaces (UI), etc.
  • the view system can be used to build applications.
  • a display interface can be composed of one or more views.
  • a display interface including a text notification icon can include a view for displaying text and a view for displaying images. As an example, it can be used to provide a display interface for image viewing, search entry, and search results.
  • the phone manager is used to provide communication functions for terminal devices, such as the management of call status (including answering, hanging up, etc.).
  • the resource manager provides various resources for applications, such as localized strings, icons, images, layout files, video files, and so on.
  • the notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages and can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also be a notification that appears in the system top status bar in the form of a chart or scroll bar text, such as notifications of applications running in the background, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in the status bar, a prompt sound is emitted, the terminal device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and virtual machines. Android Runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the function that needs to be called by the Java language, and the other part is the Android core library.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules, such as surface manager, media library, 3D graphics processing library (such as OpenGL ES), 2D graphics engine (such as SGL), etc.
  • the surface manager is used to manage the display subsystem and provide the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • a 2D graphics engine is a drawing engine for 2D drawings.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • Figure 18 shows a structural schematic diagram of another electronic device according to an embodiment of the present application.
  • the electronic device may be a server; as shown in Figure 18, the electronic device may include: at least one processor 1601, a communication line 1602, a memory 1603 and at least one communication interface 1604.
  • Processor 1601 can be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit, or one or more integrated circuits for controlling the execution of the program of the present application; processor 1601 can also be a heterogeneous computing architecture of multiple general-purpose processors, for example a combination of at least two of a CPU, GPU, microprocessor, DSP, ASIC or FPGA; as an example, processor 1601 can be CPU+GPU, CPU+ASIC or CPU+FPGA.
  • the communication link 1602 may include a pathway for transmitting information between the above-mentioned components.
  • the communication interface 1604 uses any transceiver-type device for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the memory 1603 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory may be independent and connected to the processor via a communication line 1602.
  • the memory may also be integrated with the processor.
  • the memory provided in the embodiment of the present application may generally have non-volatility.
  • the memory 1603 is used to store computer-executable instructions for executing the solution of the present application, and the execution is controlled by the processor 1601.
  • the processor 1601 is used to execute the computer-executable instructions stored in the memory 1603, thereby implementing the method provided in the above embodiment of the present application; illustratively, each step of the method shown in Figure 11 or Figure 12 can be implemented.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application code, which is not specifically limited in the embodiments of the present application.
  • the processor 1601 may include one or more CPUs, for example CPU0 in FIG. 18; the processor 1601 may also include a CPU plus any one of a GPU, an ASIC or an FPGA, for example CPU0+GPU0, CPU0+ASIC0 or CPU0+FPGA0 in FIG. 18.
  • an NPU may also be included, and the GPU and the NPU may provide the computing power required for machine learning and neural network operators.
  • the electronic device may include multiple processors, such as processor 1601 and processor 1607 in FIG. 18.
  • processors may be a single-core (single-CPU) processor, a multi-core (multi-CPU) processor, or a heterogeneous computing architecture including multiple general-purpose processors.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
  • the electronic device may further include an output device 1605 and an input device 1606.
  • the output device 1605 communicates with the processor 1601 and may display information in a variety of ways.
  • the output device 1605 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector, etc.
  • it may be a display device such as a vehicle-mounted HUD, an AR-HUD, a display, etc.
  • the input device 1606 communicates with the processor 1601 and may receive user input in a variety of ways.
  • the input device 1606 may be a mouse, a keyboard, a touch screen device, or a sensor device, etc.
  • embodiments of the present application provide a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the method in the above embodiments is implemented. Exemplarily, each step of the method shown in Figure 2, Figure 3, Figure 5, Figure 6 or Figure 9 can be implemented, or each step of the method shown in Figure 11 or Figure 12 can be executed.
  • the embodiments of the present application provide a computer program product, which may include, for example, a computer-readable code or a non-volatile computer-readable storage medium carrying the computer-readable code; when the computer program product is run on a computer, the computer executes the method in the above embodiment. Exemplarily, each step of the method shown in FIG. 2, FIG. 3, FIG. 5, FIG. 6 or FIG. 9 may be executed, or each step of the method shown in FIG. 11 or FIG. 12 may be executed.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above.
  • more specific examples of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above.
  • a computer-readable storage medium is not to be interpreted as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network can include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions for performing the operation of the present application can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages, such as Smalltalk, C++, etc., and conventional procedural programming languages, such as "C" language or similar programming languages.
  • Computer-readable program instructions can be executed completely on a user's computer, partially on a user's computer, executed as an independent software package, partially on a user's computer, partially on a remote computer, or completely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, using an Internet service provider to connect through the Internet).
  • the electronic circuit can execute computer-readable program instructions, thereby realizing various aspects of the present application.
  • These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine, so that when these instructions are executed by the processor of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more boxes in the flowchart and/or block diagram is generated.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other equipment to work in a specific manner, so that the computer-readable medium storing the instructions includes a manufactured product, which includes instructions for implementing various aspects of the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • each square box in the flow chart or block diagram can represent a part of a module, program segment or instruction, and a part of the module, program segment or instruction includes one or more executable instructions for realizing the specified logical function.
  • the function marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two continuous square boxes can actually be executed substantially in parallel, and they can sometimes be executed in reverse order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs the specified function or action, or can be implemented with a combination of special-purpose hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a search method, a model training method, a device, and a storage medium. The search method may include: receiving a first search request from a user, the first search request including a private entity; determining, among features corresponding to at least one private entity, the feature corresponding to the private entity in the first search request, wherein the features corresponding to the at least one private entity indicate data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality from the first search request; obtaining a search result according to the first search request and the feature corresponding to the private entity in the first search request, the search result corresponding to a different modality from the first search request; and displaying the search result. The present application solves the problem that a terminal device cannot perform a concrete search during the search process, and improves the search effect and search efficiency, thereby enhancing the user's search experience.

Description

Search method, model training method, device and storage medium. Technical Field
The present application relates to the field of computer search technology, and in particular to a search method, a model training method, a device and a storage medium.
Background
Current mainstream search engines in the industry are cloud-oriented, and cloud-side search capabilities have developed considerably. In index construction there are both traditional word-based inverted indexes and the currently popular approximate nearest neighbor (ANN) indexes built on model-inferred vectors. On terminal devices such as mobile phones and tablets, however, various resource constraints (e.g., memory, storage, power consumption) mean that cloud-side search practice cannot be migrated to the device side, or requires extensive trimming whose final effect is hard to guarantee. In addition, for security and privacy reasons, cloud-side capabilities cannot be used to search device-side data.
Existing device-side search technology cannot perform a concrete search, that is, it cannot map an entity in the query text to its specific real-world image; the search effect is poor and the user's search experience is unsatisfactory.
Summary
In view of this, a search method, a model training method, a device, an electronic device and a storage medium are proposed.
In a first aspect, an embodiment of the present application provides a search method, the method including: receiving a first search request from a user, the first search request including a private entity, wherein the private entity represents an entity associated with the user; determining, among features corresponding to at least one private entity, the feature corresponding to the private entity in the first search request, wherein the features corresponding to the at least one private entity indicate data corresponding to the at least one private entity, and the data corresponding to the at least one private entity corresponds to a different modality from the first search request; obtaining a search result according to the first search request and the feature corresponding to the private entity in the first search request, the search result corresponding to a different modality from the first search request; and displaying the search result.
Based on the above technical solution, the feature corresponding to the private entity in the user's search request is determined among the features corresponding to at least one private entity, and the search result is obtained according to that feature and the first search request. Since the features corresponding to the at least one private entity indicate the data corresponding to the at least one private entity, and that data can correspond to the same modality as the search result, the private entity in the search request can be mapped to its specific real-world image (that is, the specific image of the private entity in the search result), thereby realizing a concrete search; this solves the problem that the terminal device cannot perform a concrete search during the search process, and improves the search effect, the search efficiency and the user's search experience. As an example, a model can be used to perform inference on the private entity in the first search request and on the first search request itself to obtain a public-private fused feature, and the search result can then be determined in a database through the fused feature. In addition, no closed label set is required, which gives strong flexibility; the user's diverse search requests can be satisfied, which gives higher extensibility; and no model replacement is needed for different terminal devices, which makes maintenance easier.
According to the first aspect, in a first possible implementation of the first aspect, obtaining the search result according to the first search request and the feature corresponding to the private entity in the first search request includes: processing the first search request and the feature corresponding to the private entity in the first search request through a model to generate a fused feature, the fused feature indicating the first search request; and taking the data corresponding to the feature in a database that matches the fused feature as the search result.
Based on the above technical solution, the model performs inference on the first search request to generate a fused feature that can simultaneously represent public information and concretized private information. As an example, the model can process the first search request to generate a public feature, and the generated public feature and the private feature (i.e., the feature corresponding to the private entity in the first search request) can then be fused to generate the fused feature; in this way, for different users, the search results the user wants can be found, realizing a concrete search based on the fusion of private and public features.
According to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the method further includes: obtaining data corresponding to a first private entity, the first private entity being any private entity among the at least one private entity; and processing the data corresponding to the first private entity to generate the feature corresponding to the first private entity.
Based on the above technical solution, the data corresponding to the first private entity is obtained; as an example, it can be obtained by explicit collection or implicit collection, so that concretized private information is collected within the scope permitted by privacy and security. The data corresponding to the first private entity is then processed to generate the feature corresponding to the first private entity; illustratively, this feature can be stored in the terminal device, providing support for quickly determining the feature corresponding to a target private entity when the terminal device subsequently performs multimodal search.
According to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, obtaining the data corresponding to the first private entity includes: sending first prompt information to the user, the first prompt information being used to prompt the user to select data containing the first private entity; and obtaining the data corresponding to the first private entity in response to the user's selection operation.
Based on the above technical solution, the data corresponding to the first private entity is obtained through explicit collection that the user can perceive; since this private information is selected by the user, the collected private information has high confidence. High-confidence concretized private information is thus collected through explicit guidance within the scope of the user's privacy and security.
According to the second possible implementation of the first aspect, in a fourth possible implementation of the first aspect, obtaining the data corresponding to the first private entity includes: obtaining initial data containing the first private entity; displaying at least one search result in response to a second search request from the user, the second search request containing the first private entity, the at least one search result corresponding to a different modality from the second search request, and the at least one search result including the initial data; and, based on the user's operation of selecting the initial data from the at least one search result, taking the initial data as the data corresponding to the first private entity.
Based on the above technical solution, low-confidence concretized private information (i.e., the initial data) is obtained automatically; then, in response to the second search request actively triggered by the user, at least one search result is displayed, and based on the user's operation of selecting the initial data from the at least one search result, the initial data is taken as the data corresponding to the first private entity, which raises the confidence of the concretized private information. The data corresponding to the first private entity is thus obtained through implicit collection that the user does not perceive.
According to the second, third or fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the method further includes: obtaining data containing the first private entity and a second private entity, wherein the second private entity is any private entity among the at least one private entity other than the first private entity; and obtaining the feature corresponding to the second private entity according to the data containing the first private entity and the second private entity, and the feature corresponding to the first private entity.
Based on the above technical solution, the feature corresponding to the second private entity is obtained from the acquired data containing multiple private entities and the already generated private knowledge (i.e., the feature corresponding to the first private entity). As an example, common sense and the model can be used to obtain the feature corresponding to the second private entity and to automatically expand the private knowledge set, so that, within the scope permitted by privacy and security, concretized private knowledge is completed with as little collection as possible, using a small amount of already generated private knowledge; this makes effective use of data in situations such as scarce user feedback, and supports a concrete search when the terminal device performs multimodal search.
According to the above possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the private entity includes: a title, a name or a nickname associated with the user.
In a second aspect, an embodiment of the present application provides a search method, the method including: displaying a first display interface, the first display interface including a search portal; and, in response to the user's operation of entering a keyword at the search portal, displaying a second display interface, the second display interface including a first search result corresponding to the keyword; wherein the keyword includes a private entity, the first search result is obtained according to the keyword and the feature corresponding to the private entity, and the first search result corresponds to a different modality from the keyword.
Based on the above technical solution, the second display interface is displayed in response to the user entering a keyword at the search portal of the first display interface. Since the first search result is obtained according to the keyword and the feature corresponding to the private entity in the search request, the private entity in the keyword can be mapped to its specific real-world image (that is, the specific image of the private entity in the search result), thereby realizing a concrete search; this solves the problem that the terminal device cannot perform a concrete search during the search process, and improves the search effect, the search efficiency and the user's search experience.
According to the second aspect, in a first possible implementation of the second aspect, a third display interface is displayed, the third display interface including first prompt information and at least one piece of data, the first prompt information being used to prompt the user to select data containing a first private entity; and, in response to the user's selection operation among the at least one piece of data, a fourth display interface is displayed, the fourth display interface including first data and a first identifier, the first identifier being used to indicate that the first data is the data selected by the user.
Based on the above technical solution, the data corresponding to the first private entity is obtained through explicit collection that the user can perceive; since this private information is selected by the user, the collected private information has high confidence. High-confidence concretized private information is thus collected through explicit guidance within the scope of the user's privacy and security.
According to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second display interface further includes second prompt information, the second prompt information being used to prompt the user to confirm whether the first search result is the search result expected by the user; the method further includes: in response to the user's confirmation operation, displaying a fifth display interface, the fifth display interface including the first search result and a second identifier, the second identifier being used to indicate that the first search result is the search result confirmed by the user.
Based on the above technical solution, data containing the private entity is obtained through explicit collection that the user can perceive; since this private information has been confirmed by the user, the collected private information has high confidence. High-confidence concretized private information is thus collected through explicit guidance within the scope of the user's privacy and security.
第三方面,本申请的实施例提供了一种模型训练方法,所述方法包括:获取多模态样本集及第一模型;其中,所述多模态样本集包括:第一私有实体对应的多模态样本、第一 搜索请求对应的多模态样本、第二搜索请求对应的多模态样本;所述第一私有实体表示与用户具有关联关系的实体,所述第一搜索请求包含所述第一私有实体,所述第二搜索请求不包含所述第一私有实体;利用所述多模态样本集对所述第一模型进行训练,得到第二模型。
基于上述技术方案,在对第一模型进行训练过程中,依赖于一定的“私有”数据集进行微调,这使得训练好的模型(即第二模型)可以为不同模态的样本中同一私有实体生成相似的特征,具备推理公有信息及具象化私有信息的特征的能力;进而可以利用第二模型进行具象化搜索,大大提升搜索效果和效率。作为一个示例,可以使用公开的数据集对多模态模型进行训练,从而得到用于为不同模态的样本中同一非私有实体生成相似的特征的第一模型;通过公开数据集训练得到的第一模型具有能力识别通用的实体。
根据第三方面,在所述第三方面的第一种可能的实现方式中,所述利用所述多模态样本集对所述第一模型进行训练,得到第二模型,包括:通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征;根据所述融合特征、所述第一搜索请求对应的特征及所述第二搜索请求对应的特征,对所述第一模型进行训练,得到所述第二模型;其中,所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的多模态样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的多模态样本进行处理得到。
基于上述技术方案，利用多模态模型及私有数据，将私有特征与公有特征融合后，再与对应的多模态的特征对齐，从而训练得到公私融合多模态模型；这样，利用公有特征和私有特征联合训练，对第一模型进行微调，这使得训练后的模型（即第二模型）具备推理具象化私有信息的能力；解决了仅通过公开数据集训练的多模态模型无法承载具象化私有信息的问题。
根据第三方面的第一种可能的实现方式,在所述第三方面的第二种可能的实现方式中,所述通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征,包括:将所述第一搜索请求对应的多模态样本输入到所述第一模型中,生成第一特征;将所述第一私有实体对应的多模态样本输入到所述第一模型中,生成第二特征;对所述第一特征及所述第二特征进行融合,得到所述融合特征。
基于上述技术方案,在对第一模型的训练过程中,使用第一模型生成私有特征(即第二特征)及公有特征(即第一特征),并将生成的公有特征和私有特征进行融合,融合后的特征能够同时表征公有信息及具象化的私有信息,从而使得第一模型可以兼顾学习公有信息和私有信息。
根据第三方面的第一种可能的实现方式或第三方面的第二种可能的实现方式,在所述第三方面的第三种可能的实现方式中,所述多模态样本包括第一模态的样本及第二模态的样本;所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的第一模态的样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的第一模态的样本进行处理得到;所述通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征,包括:将所述第一私有实体对应的第一模态的样本及所述第一搜索请求对应的第二模态的样本输入 到所述第一模型中,得到所述融合特征。
第四方面,本申请的实施例提供了一种搜索装置,所述装置包括:接收模块,用于接收用户的第一搜索请求,所述第一搜索请求中包括私有实体;其中,私有实体表示与所述用户具有关联关系的实体;确定模块,用于在至少一个私有实体对应的特征中,确定所述第一搜索请求中私有实体对应的特征;其中,所述至少一个私有实体对应的特征指示所述至少一个私有实体对应的数据,所述至少一个私有实体对应的数据与所述第一搜索请求对应于不同的模态;搜索模块,用于根据所述第一搜索请求及所述第一搜索请求中私有实体对应的特征,得到搜索结果;所述搜索结果与所述第一搜索请求对应于不同的模态;显示模块,用于显示所述搜索结果。
根据第四方面,在所述第四方面的第一种可能的实现方式中,所述搜索模块,还用于:通过模型对所述第一搜索请求及所述第一搜索请求中私有实体对应的特征进行处理,生成融合特征;所述融合特征指示所述第一搜索请求;将数据库中与所述融合特征相匹配的特征对应的数据作为所述搜索结果。
根据第四方面或第四方面的第一种可能的实现方式,在所述第四方面的第二种可能的实现方式中,所述装置还包括:生成模块,用于获取第一私有实体对应的数据;所述第一私有实体为所述至少一个私有实体中的任一私有实体;对所述第一私有实体对应的数据进行处理,生成所述第一私有实体对应的特征。
根据第四方面的第二种可能的实现方式,在所述第四方面的第三种可能的实现方式中,所述生成模块,还用于:向用户发出第一提示信息,所述第一提示信息用于提示用户选取包含所述第一私有实体的数据;响应于用户的选取操作,获取所述第一私有实体对应的数据。
根据第四方面的第二种可能的实现方式,在所述第四方面的第四种可能的实现方式中,所述生成模块,还用于:获取包含所述第一私有实体的初始数据;响应于用户的第二搜索请求,显示至少一个搜索结果;所述第二搜索请求包含所述第一私有实体,所述至少一个搜索结果与所述第二搜索请求对应于不同的模态;所述至少一个搜索结果包括所述初始数据;基于用户在所述至少一个搜索结果中选取所述初始数据的操作,将所述初始数据作为所述第一私有实体对应的数据。
根据第四方面的第二种或第三种或第四种可能的实现方式,在所述第四方面的第五种可能的实现方式中,所述生成模块,还用于:获取包含所述第一私有实体及第二私有实体的数据;其中,所述第二私有实体为所述至少一个私有实体中除所述第一私有实体之外的任一私有实体;根据所述包含所述第一私有实体及第二私有实体的数据,及所述第一私有实体对应的特征,得到所述第二私有实体对应的特征。
根据第四方面的上述各种可能的实现方式,在所述第四方面的第六种可能的实现方式中,所述私有实体包括:与所述用户相关联的称谓、人名或昵称。
第五方面,本申请的实施例提供了一种搜索装置,所述装置包括:第一显示模块,用于显示第一显示界面;所述第一显示界面包括搜索入口;第二显示模块,用于响应于用户在所述搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于所述关键词的搜索结果;其中,所述关键词中包括私有实体;所述搜索结果根据所述关键词及所述私有实体对应的特征得到;所述搜索结果与所述关键词对应于不同的模态。
根据第五方面，在所述第五方面的第一种可能的实现方式中，所述装置还包括：第三显示模块，用于显示第三显示界面，所述第三显示界面包括第一提示信息及至少一个数据；所述第一提示信息用于提示用户选取包含第一私有实体的数据；第四显示模块，用于响应于用户在所述至少一个数据中的选取操作，显示第四显示界面，所述第四显示界面包括第一数据及第一标识，所述第一标识用于指示所述第一数据为用户所选取的数据。
根据第五方面或第五方面的第一种可能的实现方式，在所述第五方面的第二种可能的实现方式中，所述第二显示界面还包括：第二提示信息，所述第二提示信息用于提示用户确认所述第一搜索结果是否为用户期望的搜索结果；所述装置还包括：第五显示模块，用于响应于用户的确认操作，显示第五显示界面，所述第五显示界面包括第一搜索结果及第二标识，所述第二标识用于指示所述第一搜索结果为用户所确认的搜索结果。
第六方面，本申请的实施例提供了一种模型训练装置，所述装置包括：获取模块，用于获取多模态样本集及第一模型；其中，所述多模态样本集包括：第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本；所述第一私有实体表示与用户具有关联关系的实体，所述第一搜索请求包含所述第一私有实体，所述第二搜索请求不包含所述第一私有实体；训练模块，用于利用所述多模态样本集对所述第一模型进行训练，得到第二模型。
根据第六方面,在所述第六方面的第一种可能的实现方式中,所述训练模块,还用于:通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征;根据所述融合特征、所述第一搜索请求对应的特征及所述第二搜索请求对应的特征,对所述第一模型进行训练,得到所述第二模型;其中,所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的多模态样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的多模态样本进行处理得到。
根据第六方面的第一种可能的实现方式,在所述第六方面的第二种可能的实现方式中,所述训练模块,还用于:将所述第一搜索请求对应的多模态样本输入到所述第一模型中,生成第一特征;将所述第一私有实体对应的多模态样本输入到所述第一模型中,生成第二特征;对所述第一特征及所述第二特征进行融合,得到所述融合特征。
根据第六方面的第一种可能的实现方式或第六方面的第二种可能的实现方式,在所述第六方面的第三种可能的实现方式中,所述多模态样本包括第一模态的样本及第二模态的样本;所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的第一模态的样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的第一模态的样本进行处理得到;所述训练模块,还用于:将所述第一私有实体对应的第一模态的样本及所述第一搜索请求对应的第二模态的样本输入到所述第一模型中,得到所述融合特征。
第七方面,本申请的实施例提供了一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现第一方面或第一方面的一种或几种的搜索方法、第二方面的搜索方法、或者实现第三方面或第三方面的一种或几种的模型训练方法。
第八方面，本申请的实施例提供了一种计算机可读存储介质，其上存储有计算机程序指令，所述计算机程序指令被处理器执行时实现第一方面或第一方面的一种或几种的搜索方法、第二方面的搜索方法、或者实现第三方面或第三方面的一种或几种的模型训练方法。
第九方面，本申请的实施例提供了一种计算机程序产品，当所述计算机程序产品在计算机上运行时，使得所述计算机执行上述第一方面或第一方面的一种或几种的搜索方法、第二方面的搜索方法、或者执行上述第三方面或第三方面的一种或几种的模型训练方法。
上述第四方面至第九方面的技术效果,可参见上述第一方面或第二方面或第三方面。
附图说明
图1(a)-(c)示出根据本申请一实施例的搜索方法的多种应用场景的示意图。
图2示出根据本申请一实施例的一种搜索方法的流程图。
图3示出根据本申请一实施例的一种得到搜索结果的方法流程图。
图4(a)-(c)示出根据本申请一实施例的一种搜索方法的示意图。
图5示出根据本申请一实施例的一种构建私有知识集的方法流程图。
图6示出根据本申请一实施例的一种显式采集方式的流程图。
图7(a)-(c)示出根据本申请一实施例的一种显式采集方式的示意图。
图8(a)-(c)示出根据本申请一实施例的一种显式采集方式的示意图。
图9示出根据本申请一实施例的一种隐式采集方式的流程图。
图10(a)-(c)示出根据本申请一实施例的一种隐式采集方式的示意图。
图11示出根据本申请一实施例的模型训练方法的流程图。
图12示出根据本申请一实施例的模型训练方法的流程图。
图13示出根据本申请一实施例的模型训练方法的流程示意图。
图14示出根据本申请一实施例的一种搜索装置的结构图。
图15示出根据本申请一实施例的一种模型训练装置的结构图。
图16示出根据本申请一实施例的一种电子设备100的结构示意图。
图17示出根据本申请一实施例的电子设备100的软件结构框图。
图18示出根据本申请一实施例的另一种电子设备的结构示意图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在本说明书中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:包括单 独存在A,同时存在A和B,以及单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。
为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的相关术语和概念进行介绍。
1、多模态数据
多模态数据是指对于同一个描述对象,通过不同领域或视角获取到的数据,并且把描述这些数据的每一个领域、视角、存在形式或信息来源叫做一个模态。由两种或两种以上模态组成的数据称之为多模态数据。示例性地,模态数据可以包括:文本、图像、视频、音频等等数据。
2、多模搜索
也称多模态搜索,或多模态检索;是一种通过使用与所要搜索的数据的模态不一致的查询语句(如关键词)进行搜索的技术,例如,以文搜图。
3、端内多模态搜索
通过手机、平板、个人电脑(Personal Computer,PC)等终端,在终端内进行多模态搜索的技术。
4、跨端多模态搜索
一种通过某种数据协议,能够分发搜索请求到多个终端,实现跨终端的多模态搜索和结果汇聚的技术。
5、端云多模态搜索
一种通过某种数据协议,将终端与云端服务器等远程服务器结合,从而通过终端与云端服务器实现多模态联合搜索的技术。
下面首先对本申请实施例中搜索方法可以适用的应用场景进行示例性说明。图1(a)-(c)示出根据本申请一实施例的搜索方法的多种应用场景的示意图。
场景一、端内多模态搜索场景,该场景中,用户可以在终端设备中进行本地图库搜索、全局搜索等,该终端设备上存储有用户的私人照片;示例性地,终端设备可以为个人计算机、笔记本电脑、智能手机、平板电脑、物联网设备或便携式可穿戴设备等,其中,物联网设备可为智能音箱、智能电视、智能空调、智能车载设备等,便携式可穿戴设备可为智 能手表、智能手环、头戴设备等。作为一个示例,以端内以文搜图为例,如图1(a)所示,手机101可以显示用户搜索界面,用户搜索界面中包括图片搜索请求的入口;用户可以通过该图片搜索请求的入口,输入关键词,从而触发手机101在本地图库中搜索用户想要的图片,并可以在手机101上显示搜索到的图片。
场景二、跨端多模态搜索场景,该场景中,用户可以通过一个终端设备对其他终端设备上的图库进行搜索、全局搜索等,其中,该终端设备与该其他终端设备之间建立有连接关系,该其他终端设备中存储有用户的私人照片。作为一个示例,以跨端以文搜图为例,如图1(b)所示,手机101与个人电脑102之间建立连接,手机101可以显示用户搜索界面,用户搜索界面中包括图片搜索请求的入口;用户可以通过该图片搜索请求的入口,输入关键词,手机101将该关键词传递至个人电脑102,从而触发个人电脑102在个人电脑图库中搜索用户想要的图片,并可以在手机101上显示搜索到的图片。
场景三、端云多模态搜索场景,该场景中,用户可以通过一个终端设备对云端服务器上的图库中进行图库搜索、全局搜索等,其中,该终端设备与云端服务器之间建立有连接关系,云端服务器中存储有用户的私人照片;示例性地,云端服务器可以为独立的服务器或者是多个服务器组成的服务器集群。作为一个示例,以端云以文搜图为例,如图1(c)所示,手机101可以通过网络与云端服务器103之间建立连接,手机101可以显示用户搜索界面,用户搜索界面中包括图片搜索请求的入口;用户可以通过该图片搜索请求的入口,输入关键词,手机101将该关键词传递至云端服务器103,从而触发云端服务器103在云端图库中搜索用户想要的图片,并可以在手机101上显示搜索到的图片。
以上述图1(a)所示的端内以文搜图搜索场景为例,相关技术中,端内以文搜图方式主要有两种。
方式一,采用固定标签进行端内多模态搜索的方式。该方式中,首先针对图库中的图片,通过固有标签(例如时间、地点等)和目标检测、图像分类、光学字符识别(Optical Character Recognition,OCR)等模型推理的标签结合,形成封闭标签集;然后,根据该封闭标签集建立倒排索引;然后进行查询标签计算:进而通过文本分词技术,将用户输入的关键词进行文本分解,得到多个标签;最后,根据上述得到的多个标签,进行标签倒排查找,搜索出符合条件的图片,并将该图片返回给用户。
该方式能够实现图片搜索功能,但是其仍然存在以下不足:(1)仅支持基于封闭标签集的搜索,用户搜索时关键词需要严格按照标签输入,进行严格匹配,灵活性差,很难解决歧义、模糊、同义词等匹配场景,无法做到语义匹配;(2)封闭标签集内的标签非常有限,无法对图片的语义进行有效的表征,无法满足大多数情况下用户多元化的搜索请求,即使使用传统的自然语言理解(Natural Language Understanding,NLU)做关键词仍需要针对不同的搜索业务做定制,缺乏通用性,扩展性差;(3)若扩展封闭标签集,则需要相应的更换推理模型,维护难度大。
方式二,采用开放式进行端内多模态搜索的方式。该方式中,首先,使用公开丰富的图文对数据训练多模态模型,得到高维空间下图文对相同语义表征的多模态模型;然后,将图库中的图片,经过多模态模型推理得到相应高维空间特征向量,形成图片底库数据;进而,通过多模态模型,推理用户关键词的高维空间特征向量;最后,对上述生成的高维空间特征向量和图片语义底库中的图片特征向量进行相似度计算,得到最高相似度对应的 图片,从而完成图库搜索。
该方式能够从语义的角度重新定义图片搜索技术,能够对图片有一个较为全面的表征,满足用户日常的开放式搜索(即可以随意输入,不需要特定的标签)需求,但仍然存在以下不足:缺乏具象化搜索能力,即无法将查询文本中的实体与实际中具体的形象相对应,从而难以对内容和用户请求做到真正的语义理解。例如当某一用户输入的关键词是“我和老婆打羽毛球”时,进行搜索后会显示两个人打羽毛球的照片,但无法保证这两个人是该用户及该用户的老婆,如搜索结果中可能包括图库中“我和老婆打羽毛球”与“我和女儿打羽毛球”等任意两个人打羽毛球的照片,该用户仍需在众多搜索结果中进行进一步的挑选照片,从而获取用户想要的“我和老婆打羽毛球”的照片,搜索效率低、准确率低、用户体验差。
为了解决相关技术中的上述问题,本申请实施例提供了一种搜索方法(详细描述参见下文)。从而可以有效解决用户通过终端设备进行搜索的过程中无法进行具象化搜索的问题,提升用户的搜索体验。相对于固定标签进行端侧搜索的方式,该搜索方法无需封闭式标签集,用户搜索时可以任意输入关键词,灵活性强;可以满足用户多元化的搜索请求,扩展性更高;该搜索方法超出了标签范畴,无需不断丰富、穷举标签,针对不同的终端设备,无需更换模型,维护更加容易。相对于开放式端内多模态搜索的方式,该搜索方法可以将查询文本中的实体与实际中具体的形象相对应,实现具象化搜索,搜索效率高、准确率高、具有更佳的用户体验。
需要说明的是,本申请实施例描述的上述应用场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,针对其他相似的或新的场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用,例如,可以应用于在终端设备中对用户授权的私有数据中进行搜索。
以下将从应用侧和训练侧两个方面对本申请提供的技术方案进行介绍,首先从应用侧对本申请实施例提供的搜索方法进行详细说明。
图2示出根据本申请一实施例的一种搜索方法的流程图。示例性地,该方法可以在终端设备中执行,例如,可以在上述图1中的手机101上执行;如图2所示,该方法可以包括以下步骤:
步骤S201、接收用户的第一搜索请求,所述第一搜索请求中包括私有实体。
示例性地,第一搜索请求可以是文本模态(如文字、词语、句子、由句子组成的段落等关键词)、图像模态(如包含文本的图像)、视频模态(如包含文本的视频)或音频模态(如语音)的数据。
示例性地,第一搜索请求中可以包括一个或多个私有实体,还可以包括一个或多个非私有实体;其中,私有实体表示与该用户具有关联关系的实体,例如,可以是“我”、“老婆”、“老公”、“叔叔”、“阿姨”、“儿子”、“妈妈”等与该用户相关联的称谓;再例如,可以是“张三”、“点点”等与该用户相关联的人名或昵称。可以理解的是,针对于不同的用户,同一私有实体对应于实际中不同的形象,如不同的人物或物品,例如,针对用户A,私有实体“我”对应于实际中的用户A,而针对用户B,私有实体“我”对应于实际中的用户B。非私有实体表示与该用户不具有直接关联关系的实体,例如,“鞋子”、“羽毛球”、“游泳”、“上海”等通用物体、地点等,针对不同的用户,同一非私有实体对应于实际中 相同的形象,如相同的人物或物体。
作为一个示例,用户可以向终端设备发出文本模态的第一搜索请求,该文本模态的第一搜索请求中可以包括私有实体;举例来说,如图1(a)所示,用户可以通过手机101的图库中的搜索栏输入关键词,该关键词中可以包括私有实体。
作为另一个示例,用户还可以向终端设备发出音频模态的第一搜索请求,终端设备可以对该音频模态的第一搜索请求进行模态变换,得到文本模态的第一搜索请求,该文本模态的第一搜索请求中包括私有实体。例如,用户可以通过手机101的负一屏的搜索入口输入搜索语音,手机101可以对用户输入的搜索语音进行处理转换为关键词,该关键词中包括私有实体。
示例性地,在用户向终端设备发出第一搜索请求后,终端设备可以对第一搜索请求进行解析,得到至少一个实体,进而可以将该至少一个实体分别与终端设备中的至少一个私有实体进行匹配,匹配成功的一个或多个实体即为第一搜索请求中的私有实体。
举例来说,以以文搜图为例,终端设备中的至少一个私有实体可以包括:“我”、“老婆”、“儿子”、“叔叔”、“妈妈”、“张三”等,第一搜索请求可以是关键词“我的照片”,终端设备通过对用户输入的关键词“我的照片”进行解析,可以提取出关键词“我的照片”中的两个实体“我”和“照片”,进而将“我”和“照片”分别与上述多个私有实体进行匹配,其中,“我”匹配成功,即关键词“我的照片”中的私有实体为“我”。
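为便于理解上述解析与匹配过程，下面给出一段示意性的Python代码草图（仅作说明，并非本申请限定的实现；其中PRIVATE_ENTITIES、tokenize等均为为便于说明而假设的名称，实际实现可替换为任意中文分词器）。

```python
# 示意性草图：解析搜索请求并匹配私有实体
PRIVATE_ENTITIES = {"我", "老婆", "儿子", "叔叔", "妈妈", "张三"}

def tokenize(query: str) -> list[str]:
    # 假设的极简分词：按私有实体词表做最长匹配，其余字符按原样保留
    tokens, i = [], 0
    while i < len(query):
        for ent in sorted(PRIVATE_ENTITIES, key=len, reverse=True):
            if query.startswith(ent, i):
                tokens.append(ent)
                i += len(ent)
                break
        else:
            tokens.append(query[i])
            i += 1
    return tokens

def extract_private_entities(query: str) -> list[str]:
    # 返回搜索请求中命中私有实体词表的实体
    return [t for t in tokenize(query) if t in PRIVATE_ENTITIES]

print(extract_private_entities("我的照片"))  # 期望输出: ['我']
```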
步骤S202、在至少一个私有实体对应的特征中,确定第一搜索请求中私有实体对应的特征;其中,该至少一个私有实体对应的特征指示所述至少一个私有实体对应的数据,所述至少一个私有实体对应的数据与所述第一搜索请求对应于不同的模态。
示例性地,终端设备中可以存储有私有知识集,该私有知识集中可以包括至少一个私有实体,及该至少一个私有实体中各私有实体对应的特征(也称特征向量);终端设备根据上述所接收的用户的第一搜索请求中私有实体,在私有知识集中选取该私有实体对应的特征。
示例性地，私有知识集中至少一个私有实体对应的特征由终端设备中的模型对该至少一个私有实体对应的数据处理得到；例如，第一搜索请求可以是文本模态的数据，至少一个私有实体对应的数据可以是图像模态的数据；再例如，第一搜索请求可以是音频模态的数据，至少一个私有实体对应的数据可以是图像模态的数据。可以理解的是，对于某一模态的私有实体对应的数据，通过该模型所得到的私有实体对应的特征反映的为私有实体在该模态中所体现的特性。例如，对于一张我的自拍照，通过模型所得到的私有实体“我”对应的特征反映的为“我”在图像中的视觉特性，如颜色、形状、位置或尺寸等。其中，终端设备中的模型的训练过程可参见下文相关表述；示例性地，可以将训练好的该模型移植到终端设备上，以第一搜索请求为文本模态的数据为例，可以将终端设备中包含私有实体的图像模态的数据输入到该模型中，从而得到私有实体对应的特征，并建立私有实体和私有实体对应的特征之间的关联关系，从而得到私有知识集。例如，可以将手机上本地图库中的照片1-照片10输入到该模型中，其中，照片1-照片10中各照片均包含至少一个私有实体；可以理解的是，一张照片可以包含一个或多个私有实体，例如，一张我和老婆合影可以包括“我”与“老婆”两个私有实体。从而可以得到照片1-照片10中各私有实体所对应的特征；并分别确定各私有实体与各私有实体所对应的特征之间的对应关系，从而建立私有知识集。
作为一个示例,私有知识集中至少一个私有实体可以包括“我”、“老婆”、“儿子”、“叔叔”、“妈妈”、“张三”;私有知识集中可以包括“我”的特征、“老婆”的特征、“儿子”的特征、“叔叔”的特征、“妈妈”的特征、“张三”的特征,若上述步骤中终端设备所接收的第一搜索请求中私有实体为“我”,则可以在私有知识集中选择出“我”的特征。
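作为一个示意性的草图（仅作说明，特征维度512及随机向量均为占位假设，实际特征应由模型推理得到），私有知识集可以实现为“私有实体→特征向量”的键值映射，步骤S202即为一次键值查找：

```python
import numpy as np

# 示意性草图：私有知识集的键值映射形式
private_knowledge = {
    "我":   np.random.rand(512).astype(np.float32),
    "老婆": np.random.rand(512).astype(np.float32),
    "儿子": np.random.rand(512).astype(np.float32),
}

def lookup_private_feature(entity: str):
    # 对应步骤S202：命中则返回私有特征；未命中返回 None，
    # 后续可退化为仅用公有特征的开放式搜索
    return private_knowledge.get(entity)
```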
步骤S203、根据第一搜索请求及第一搜索请求中私有实体对应的特征,得到搜索结果。
示例性地,终端设备可以通过上述模型分别对第一搜索请求及第一搜索请求中私有实体对应的特征进行处理,并将处理结果进行融合,从而根据融合后的处理结果,得到搜索结果。
其中,搜索结果与第一搜索请求对应于不同的模态。示例性地,搜索结果可以与上述至少一个私有实体对应的数据对应于同一模态。作为一个示例,第一搜索请求可以是文本模态的数据,搜索结果可以是图像模态的数据。例如,第一搜索请求可以是关键词“我的照片”,搜索结果可以是一张或多张照片。
图3示出根据本申请一实施例的一种得到搜索结果的方法流程图。如图3所示,该步骤S203,可以包括以下步骤:
步骤S20301、通过模型对所述第一搜索请求及所述第一搜索请求中私有实体对应的特征进行处理,生成融合特征;所述融合特征指示所述第一搜索请求。
示例性地，模型可以预先配置在终端设备中，该模型可以为不同模态的数据中同一私有实体生成相似的特征。示例性地，终端设备可以将第一搜索请求输入到该模型中，生成第一特征，进而将该第一特征与第一搜索请求中私有实体对应的特征进行融合处理，生成融合特征。作为一个示例，该模型可以包括：第一模态对应的子模型及第二模态对应的子模型；其中，第一模态为第一搜索请求中私有实体对应的数据所对应的模态，如图像模态，第二模态为第一搜索请求所对应的模态，如文本模态。终端设备可以将第一搜索请求输入到第二模态对应的子模型中，从而得到第一特征，进而将第一特征与第一搜索请求中私有实体对应的特征进行融合，生成融合特征。
可以理解的是,针对不同的用户,由于同一私有实体可能对应于实际中不同的形象,即同一私有实体对应的特征可能不同,即私有实体对应的特征可以表征具象化的私有信息,与具体用户有关,也可以称为私有特征。第一特征反映第一搜索请求中文字本身的语义;可以理解的是,针对不同的用户,第一搜索请求相同,则第一搜索请求中文字本身的语义相同,将该第一搜索请求输入到上述模型中,所生成的第一特征相同,因此,第一特征可以表征公有信息,与具体用户无关,也可以称为公有特征;进而,可以对私有特征及公有特征进行融合,得到融合特征,也可以称为公私融合特征。
示例性地,可以使用适配器(Adapter)、加和、注意力(Attention)等现有的融合方法对第一特征及第一搜索请求中私有实体对应的特征进行融合,得到融合特征。该融合特征为高维空间的特征,既可以表征私有特征,又可以表征公有特征,从而实现私有特征与公有特征的互补。
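作为一个示意性的Python代码草图（仅作说明，并非本申请限定的融合实现），下面给出加和式融合与注意力式融合两种常见做法，其中权重alpha等取值均为假设：

```python
import numpy as np

def fuse_by_sum(public_feat, private_feat, alpha=0.5):
    # 加和式融合：对公有特征与私有特征做加权和后归一化
    fused = alpha * public_feat + (1 - alpha) * private_feat
    return fused / (np.linalg.norm(fused) + 1e-8)

def fuse_by_attention(public_feat, private_feats):
    # 注意力式融合：以公有特征为 query，对若干私有特征做缩放点积注意力
    q = public_feat[None, :]                 # (1, d)
    k = v = np.stack(private_feats)          # (n, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (1, n)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()        # softmax 归一化
    fused = public_feat + (weights @ v)[0]   # 残差式相加
    return fused / (np.linalg.norm(fused) + 1e-8)
```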
步骤S20302、将数据库中与所述融合特征相匹配的特征对应的数据作为搜索结果。
示例性地,数据库中可以包括多个数据及该多个数据中各数据对应的特征;其中,该多个数据与第一搜索请求对应于不同的模态。
示例性地，终端设备可以计算上述融合特征与数据库中各数据对应的特征的相似度，并将数据库中与融合特征相似度最高的特征或相似度排行前K的特征作为与融合特征相匹配的特征，例如，可以按相似度从高到低进行排序，从而将排在前K的相似度对应的特征作为与融合特征相匹配的特征；与融合特征相匹配的特征对应的数据即为搜索结果。可以理解的是，与融合特征相匹配的特征的数量可以为一个或多个，其中，不同的与融合特征相匹配的特征对应于不同的搜索结果，即搜索结果的数量可以为一个或多个。
作为一个示例,数据库可以是终端设备中的图库,包括多张图片及该多张图片中各图片对应的特征,可以使用上述融合特征与数据库中的特征进行相似度计算,将数据库中相似度最高的特征或者相似度排行前K的特征作为与融合特征相匹配的特征,进而将与融合特征相匹配的特征对应的图片,作为搜索到的图片。
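下面给出一段示意性的Python代码草图，说明上述基于余弦相似度的前K检索过程（仅作说明，db_feats、db_items等均为假设的名称）：

```python
import numpy as np

def search_top_k(fused_feat, db_feats, db_items, k=5):
    # db_feats: (N, d) 底库特征矩阵；db_items: 与之一一对应的 N 条数据（如图片路径）
    # 先做 L2 归一化，再用内积等价地计算余弦相似度
    f = fused_feat / (np.linalg.norm(fused_feat) + 1e-8)
    db = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-8)
    sims = db @ f                        # (N,)
    top_idx = np.argsort(-sims)[:k]      # 按相似度从高到低取前K
    return [(db_items[i], float(sims[i])) for i in top_idx]
```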
示例性地,可以预先使用终端设备中上述模型推理终端设备中与第一搜索请求对应于不同模态的数据,得到各数据对应的特征,各数据对应的特征可以为高维空间特征,并建立各数据与各数据对应的特征之间的对应关系,基于该对应关系,将各数据与各数据对应的特征存储在终端设备的数据库中。示例性地,终端设备可以将图像模态的数据输入到模型中的图像模态对应的子模型中,从而生成该图像模态的数据对应的特征。
示例性地,终端设备可以根据上层调度,在合适的时机,例如,终端设备算力允许的情况下或用户不使用终端设备的时间段,触发推理各数据对应的特征等操作,从而更新数据库。
作为一个示例,数据库可以是手机中的图库,可以理解的是,图库中通常每天会有新增图片,可以在手机算力允许,或者用户不使用手机的时段,使用上述模型推理新增图片对应的特征,从而减少对用户正常使用手机的影响;例如,可以在凌晨用户不使用手机时,使用上述模型中的图像模态对应的子模型依次推理图库中的图片或新增图片,得到各图片对应的特征,并存储在终端设备中。
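作为一个示意性的草图（仅作说明，其中is_device_idle、encode_image、gallery、db均为为便于说明而假设的接口），上述“合适时机增量更新底库”的逻辑可以表示为：

```python
# 示意性草图：在合适时机增量更新底库特征
def update_index_when_idle(gallery, db, model):
    if not is_device_idle():          # 假设的接口：例如凌晨且电量充足时返回 True
        return
    for img in gallery.new_images():  # 仅处理新增图片，避免重复推理
        feat = model.encode_image(img)
        db.put(img.path, feat)        # 以“图片 -> 特征”键值对形式入库
```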
这样,通过上述步骤S20301-S20302,终端设备使用模型对第一搜索请求进行推理,生成融合特征,该融合特征能够同时表征公有信息及具象化的私有信息;作为一个示例,可以通过模型对第一搜索请求进行处理,生成公有特征(即第一特征),进而将生成的公有特征和私有特征(即第一搜索请求中私有实体对应的特征)进行融合,生成融合特征;从而针对不同用户,搜索到用户想要的搜索结果,实现基于私有特征与公有特征融合的具象化搜索。
示例性地,在步骤S202中至少一个私有实体对应的特征未包含第一搜索请求中私有实体对应的特征的情况下,则可以通过模型对第一搜索请求进行处理,以得到搜索结果。
步骤S204、显示搜索结果。
示例性地,终端设备可以显示搜索结果。例如,搜索结果是图像模态的数据,即搜索结果为图片,则可以通过终端设备上的显示屏向用户显示该图片。示例性地,可以直接显示该图片,也可以显示该图片的缩略图等,对此不作限定。
这样,通过上述步骤S201-步骤S204,在至少一个私有实体对应的特征中,确定用户的搜索请求中的私有实体对应的特征,并根据该私有实体对应的特征及第一搜索请求,得到搜索结果。由于至少一个私有实体对应的特征指示至少一个私有实体对应的数据,至少一个私有实体对应的数据可以与搜索结果对应于同一模态,从而可以将搜索请求中私有实 体与实际中具体的形象(即搜索结果中私有实体的具体形象)相对应,从而实现具象化搜索;解决了终端设备在搜索过程中无法进行具象化搜索的问题,提高了搜索效果和搜索效率,提升了用户的搜索体验。作为一个示例,可以通过模型对第一搜索请求中私有实体及第一搜索请求进行推理,得到公私融合特征,进而通过公私融合特征,在数据库中确定搜索结果。此外,无需封闭式标签集,灵活性强;可以满足用户多元化的搜索请求,扩展性更高;针对不同的终端设备,无需更换模型,维护更加容易。
本申请实施例还提供了另一种搜索方法,该方法可以包括:终端设备可以显示第一显示界面;所述第一显示界面包括搜索入口;进而,终端设备响应于用户在该搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于该关键词的第一搜索结果;其中,该关键词中包括私有实体;该第一搜索结果根据该关键词及该关键词中私有实体对应的特征得到;该第一搜索结果与该关键词对应于不同的模态。其中,第一搜索结果的数量可以为一个或多个;示例性地,得到第一搜索结果的过程可参照前文步骤S203中相关表述。
举例来说,以终端设备为上述图1(a)中手机101为例,图4(a)-(c)示出根据本申请一实施例的一种搜索方法的示意图。如图4(a)所示,用户打开手机101中的本地图库,显示第一显示界面,其中,该第一显示界面的上方设置有搜索框(即搜索入口);该第一显示界面还可以显示本地图库中的部分或全部图像;如图4(b)所示,用户可以在本地图库中的搜索框内输入关键词“我打羽毛球”,并点击搜索框中的搜索图标,触发手机101执行搜索操作;示例性地,手机101通过对关键词“我打羽毛球”进行解析,在多个私有实体中,得到关键词“我打羽毛球”中的私有实体“我”,然后,手机101可以在多个私有实体对应的特征中,确定私有实体“我”对应的特征,并可以根据关键词“我打羽毛球”和私有实体“我”对应的特征,在图库中搜索到“我打羽毛球”的图片,例如,手机101可以通过将关键词“我打羽毛球”输入到模型得到的公有特征与私有实体“我”对应的私有特征进行融合,得到融合特征,进而将图库中与该融合特征相匹配的特征所对应的图片,确定为用户想要的“我打羽毛球”的图片;如图4(c)所示,进而手机101可以显示第二显示界面,该第二显示界面中可以显示搜索到的“我打羽毛球”的图片,从而完成具象化搜索。
下面对上述建立私有知识集的可能实现方式进行具体说明。
示例性地，可以根据需要设定私有知识集中所包含的私有实体数量，例如，鉴于终端设备中图库中图像的有限性，为提高数据采集有效性和效率，可以基于最小化原则，统计得到多数用户图库中图像所包含的私有实体的类型（例如，“我”、“老婆”、“老公”、“叔叔”、“阿姨”、“儿子”、“妈妈”等常见与用户相关的人物），从而确定最小私有知识集中私有实体的数量，进而生成各私有实体对应的特征，完成最小私有知识集的构建，该最小私有知识集可以解决图像搜索领域绝大部分的具象化搜索的诉求，从而满足不同用户的搜索需求。示例性地，还可以根据需求新增或减少私有知识集中所包含的私有实体，对此不作限定。
图5示出根据本申请一实施例的一种构建私有知识集的方法流程图。该方法可以在终端设备上执行,例如,可以在上述图1中的手机101上执行;如图5所示,该方法可以包括以下步骤:
步骤S501、获取第一私有实体对应的数据。
其中,第一私有实体为上述至少一个私有实体中的任一私有实体;第一私有实体对应的数据与第一搜索请求对应于不同的模态。第一私有实体对应的数据可以仅包括第一私有实体,也可以包括第一私有实体在内的多个私有实体,对此不作限定。
可以理解的是,对于不同的用户,同一私有实体对应的数据不同,即针对某一用户,私有实体对应的数据可以作为与该用户相关的具象化的私有信息。例如,以第一私有实体为“儿子”为例,对于用户A,用户A的终端设备所获取的“儿子”对应的数据可以为用户A的儿子的照片;对于用户B,用户B的终端设备所获取的“儿子”对应的数据可以为用户B的儿子的照片。
示例性地,可以采用以下采集方式获取第一私有实体对应的数据。
方式一、可以采用显式采集方式获取第一私有实体对应的数据。例如,终端设备可以通过向用户发出提示信息等用户可以感知到的方式,引导用户进行选取操作,并根据用户的选取操作获取第一私有实体对应的数据;从而在隐私安全允许的范围内获取第一私有实体对应的数据。
方式二、可以采用隐式采集方式获取第一私有实体对应的数据。例如,终端设备可以通过推理包含常识信息的数据(例如,结婚照、亲子照、自拍照等),获取第一私有实体对应的数据,从而在用户无感知的情况下获取第一私有实体对应的数据;考虑到所获取的第一私有实体对应的数据的置信度可能较低,则可以进一步响应于用户主动触发的搜索请求,向用户展示搜索结果,并根据用户在搜索结果中的选取操作,判别上述得到的较低置信度的第一私有实体对应的数据是否为经过用户确认的数据,则可以得到置信度高的第一私有实体对应的数据;从而在隐私安全允许的范围内获取第一私有实体对应的数据。
步骤S502、对第一私有实体对应的数据进行处理,生成第一私有实体对应的特征。
示例性地,终端设备可以通过终端设备中模型对第一私有实体对应的数据进行处理,从而生成第一私有实体对应的特征。可以理解的是,对于某一模态的第一私有实体对应的数据,通过该模型所生成的第一私有实体对应的特征反映的为第一私有实体在该模态中所体现的特性。例如,对于照片“我的自拍照”,将该照片输入到该模型中,所生成的“我”对应的特征反映的为“我”在图像中所体现的特征。
示例性地,该模型可以包括多个模态对应的子模型,终端设备可以将某一模态的第一私有实体对应的数据输入到该模态对应的子模型中;例如,可以将“我的自拍照”这一照片输入到图像模态对应的子模型中,从而生成图像模态“我”对应的特征。
可以理解的是,由于同一私有实体对应的数据包含与用户相关的具象化的私有信息;所生成的第一私有实体对应的特征可以作为用户相关的私有知识,在多模态搜索过程中,该私有知识作为已知信息可以实现具象化搜索,例如,可以将该私有知识及搜索请求输入到该模型中,以实现具象化搜索。
在一些示例中,在第一私有实体对应的数据包括第一私有实体在内的多个私有实体的情况下,终端设备还可以将该数据拆分为多个仅包含一个私有实体的数据,进而将各仅包含一个私有实体的数据输入到模型中,得到各私有实体对应的特征,例如,终端设备可以识别照片中人物的年龄,从而将“我和儿子的合照”拆分为仅包含我的照片及仅包含儿子的照片,将这两张照片分别输入到模型中,分别得到私有实体“我”对应的特征及私有实 体“儿子”对应的特征;或者,终端设备还可以将第一私有实体对应的数据输入到模型中,从而得到该多个私有实体对应的特征,例如,终端设备可以将“我和儿子的合照”输入到模型中,得到两个私有实体“我”及“儿子”对应的特征。
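对于上述将合照拆分后分别得到各私有实体对应的特征的做法，可以参考如下示意性的Python草图（仅作说明，其中detect_persons、crop、estimate_age、encode_image均为假设的接口，按年龄区分“我”与“儿子”也仅是一种假设的常识规则）：

```python
# 示意性草图：将包含多个私有实体的合照拆分后分别编码
def features_from_group_photo(photo, model):
    feats = {}
    for box in detect_persons(photo):      # 假设的接口：检测照片中的每个人物区域
        patch = crop(photo, box)           # 假设的接口：按区域裁剪出单人图像
        age = estimate_age(patch)          # 假设的接口：结合年龄等常识区分实体
        entity = "儿子" if age < 18 else "我"
        feats[entity] = model.encode_image(patch)
    return feats                           # {私有实体: 对应的图像模态特征}
```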
这样,通过上述步骤S501-步骤S502,获取第一私有实体对应的数据,作为一个示例,可以采用显示采集或隐式采集的方式获取第一私有实体对应的数据,从而在隐私安全允许的范围内采集具象化的私有信息;并对第一私有实体对应的数据进行处理,生成第一私有实体对应的特征,从而得到第一私有实体对应的特征;示例性地,该第一私有实体对应的特征可以存储在终端设备中;从而在后续终端设备执行多模态搜索的过程中,为快速确定私有实体对应的特征提供支持。
进一步地,终端设备可以根据已经生成的私有实体对应的特征,确定私有知识集中其他私有实体对应的特征,从而实现自动扩展私有知识集,完成私有知识集构建。示例性地,该私有知识集可以存储在终端设备中。
作为一个示例,针对私有知识集中任一私有实体,终端设备可以重复执行上述步骤S501-步骤S502,依次获取各私有实体对应的数据,并可以利用模型,生成各私有实体对应的特征,从而遍历各私有实体,扩展私有知识,完成私有知识集构建。
作为另一个示例,终端设备还可以在生成第一私有实体对应的特征的基础上,进一步通过执行下述步骤S503及步骤S504,得到私有知识集中其他私有实体对应的特征,从而扩展私有知识集,最终得到各私有实体对应的特征,完成私有知识集构建。
步骤S503、获取包含第一私有实体及第二私有实体的数据。
其中,第二私有实体为至少一个私有实体中除第一私有实体之外的任一私有实体。
该步骤中,终端设备可以在用户所授权的隐私安全的范围内,自动获取同时包含第一私有实体及第二私有实体的数据,可以理解的是,针对某一用户,相对于仅包含第一私有实体的数据或仅包含第二私有实体的数据,该同时包含第一私有实体及第二私有实体的数据中具有更多与该用户相关的具象化的私有信息。例如,包含第一私有实体及第二私有实体的数据可以为图库中的“结婚照”、“全家福”或“大合照”等包含多个私有实体的照片。示例性地,终端设备可以结合常识信息,自动在图库中确定“全家福”,该照片中可以包括“我”、“老婆”、“儿子”或“女儿”等多个私有实体。
步骤S504、根据包含第一私有实体及第二私有实体的数据,及第一私有实体对应的特征,得到第二私有实体对应的特征。
示例性地,终端设备可以将包含第一私有实体及第二私有实体的数据拆分为两个独立的数据,即仅包含第一私有实体的数据及仅包含第二私有实体的数据,将这两个独立的数据分别输入到模型中,得到两个私有实体对应的特征,即第一私有实体对应的特征及第二私有实体对应的特征;进而可以根据已知的第一私有实体对应的特征(即已知的私有知识),在该两个私有实体对应的特征确定第二私有实体对应的特征,从而实现自动扩展私有知识集。
作为一个示例，终端设备可以获取图库中“自拍照”，即私有实体“我”对应的图像，将“自拍照”输入到模型中，从而生成图像模态“我”对应的特征。进而，终端设备还可以获取图库中的“结婚照”，即包含私有实体“我”和“老婆”的图像，并在该“结婚照”中提取出两个私有实体对应的特征，即图像模态“我”和“老婆”对应的特征；最后，终端设备可以结合上述通过“自拍照”生成图像模态“我”对应的特征，得到图像模态“老婆”对应的特征。
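上述利用已知私有知识辨认“结婚照”中另一人物的过程，可以用如下示意性的Python草图表示（仅作说明，假设已从结婚照中提取出两个人物特征，known_me为已生成的“我”的特征）：

```python
import numpy as np

def derive_spouse_feature(person_feats, known_me):
    # person_feats: 从结婚照中提取的两个人物特征 [f0, f1]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    f0, f1 = person_feats
    # 与“我”的已知特征更相似的那一个视为“我”，另一个即作为“老婆”的特征
    return f1 if cos(f0, known_me) >= cos(f1, known_me) else f0
```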
这样,终端设备可以通过上述步骤S501-步骤S502,在隐私安全允许的范围内采集私有实体对应的数据(即具象化的私有信息),进而生成私有知识;例如,可以经过模型对私有信息进行处理生成私有知识;进而终端设备还可以自动扩展生成私有知识集;作为一个示例,终端设备可以通过上述步骤S503-步骤S504,根据获取到的包含多个私有实体的数据和已生成的私有知识,利用常识和模型自动扩展私有知识集,从而在隐私安全允许的范围内,以尽可能少的采集,利用少量的已生成的私有知识,完善具象化的私有知识,实现了在用户反馈数据少等情况对数据的有效利用,为终端设备执行多模态搜索的过程中实现具象化搜索提供支持。在一些示例中,在执行完上述步骤后,终端设备可以进一步地判断是否遍历最小私有知识集中的各私有实体,若遍历所有的私有实体,即生成了所有私有实体对应的特征,则完成最小私有知识集的构建,否则,重复执行上述步骤S503-步骤S504,直到遍历所有的私有实体,并将构建的最小私有知识集存储在终端设备中;示例性地,可以采用三元组<1#,2#,人物关系>存储方式,其中,1#表示最小私有知识集中的私有实体1对应的特征,2#表示最小私有知识集中私有实体2对应的特征,人物关系表示私有实体1与2之间的关系,1#与2#以键值对的形式存储在数据库中。
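作为一个示意性的草图（仅作说明，feat_me、feat_wife、feat_son均为假设已由模型生成的特征向量，实际可落入任意键值数据库），上述三元组存储方式可以表示为：

```python
# 示意性草图：以三元组 <1#, 2#, 人物关系> 组织最小私有知识集，
# 特征本身以键值对形式存储
feature_store = {      # 键：私有实体编号；值：特征向量（假设已生成）
    "1#": feat_me,
    "2#": feat_wife,
    "3#": feat_son,
}
relations = [          # 三元组描述实体之间的人物关系
    ("1#", "2#", "夫妻关系"),
    ("1#", "3#", "父子关系"),
]
```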
下面对上述构建私有知识集时,采取显式采集方式获取第一私有实体对应的数据的过程进行具体说明。
图6示出根据本申请一实施例的一种显式采集方式的流程图。该方法可以在终端设备上执行,例如,可以在上述图1中的手机101上执行;如图6所示,该方法可以包括以下步骤:
步骤S601、向用户发出第一提示信息。
该第一提示信息用于提示用户选取包含第一私有实体的数据。示例性地,第一提示信息可以为语音、文字、振动或视频等形式,对此不作限定。可以理解的是,第一提示信息需要在用户所授权的隐私安全的范围内对用户进行提示。
示例性地,终端设备可以在用户第一次在图库中搜索图片时,向用户发出第一提示信息,以提示用户在图库中选择包含第一私有实体的数据;或者,终端设备还可以在检测到用户在图库中搜索图片所花费的时间通常较长的情况下,向用户发出第一提示信息,以提示用户选择包含第一私有实体的数据;或者,终端设备还可以在图库中图片的数量超过一定数量时,向用户发出第一提示信息。
示例性地,终端设备可以基于提示信息集,选择该提示信息集中的一条或多条提示信息作为第一提示信息。
步骤S602、响应于用户的选取操作,获取第一私有实体对应的数据。
示例性地,在终端设备向用户发出第一提示信息后,用户可以根据第一提示信息,选择相应的数据或相应的提示选项,终端设备将用户所选取的数据,作为第一私有实体对应的数据。
在一种可能的实现方式中，终端设备显示第三显示界面，第三显示界面包括第一提示信息及至少一个数据；该第一提示信息用于提示用户选取包含第一私有实体的数据；响应于用户在至少一个数据中的选取操作，显示第四显示界面，第四显示界面包括第一数据及第一标识，该第一标识用于指示第一数据为用户所选取的数据。示例性地，标识可以为文字、颜色、亮度、图形、大小等标识，例如，第一数据为图片，则可以以高亮显示作为标识，也可以在该图片周围添加边框作为标识。
作为一个示例,终端设备可以向用户发出“请选择您的照片”、“请选择您儿子的照片”、“我和老婆打羽毛球”、“请选择结婚照”或“请选择全家福照片”等提示信息,以提示用户在图库中选择相应的图片。若用户选择了一张或多张图片,则终端设备将该一张或多张照片作为提示信息中所包含的私有实体对应的图像模态的数据。举例来说,以终端设备为上述图1(a)中手机101,第一私有实体是“我”为例,图7(a)-(c)示出根据本申请一实施例的一种显式采集方式的示意图;如图7(a)所示,用户进入手机101的本地图库,显示第一显示界面,该第一显示界面的上方设置有搜索框(即搜索入口);该第一显示界面还可以显示本地图库中的部分或全部图像,可以点击第一显示界面中的搜索框,从而可以在搜索框中输入关键词;如图7(b)所示,手机101可以在检测到用户首次点击本地图库中搜索框时,显示第三显示界面,该第三显示界面显示本地图库中的部分或全部图像,还可以显示“请选取您的照片”这一提示信息,用于引导用户在所显示图像中选取自己的照片;用户可以根据该提示信息,选取自己的照片,响应于用户的选取操作,显示第四显示界面,如图7(c)所示,用户选择图片2;则手机101可以确定图片2为用户本人的照片,并在图片2周围添加边框,以指示图片2为用户所选取的图片,从而获取“我”这一私有实体对应的图像模态的数据。
在一种可能的实现方式中,终端设备可以显示第一显示界面;所述第一显示界面包括搜索入口;进而,终端设备响应于用户在该搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于该关键词的第一搜索结果及第二提示信息;其中,该关键词中包括私有实体;该第一搜索结果根据该关键词及该关键词中私有实体对应的特征得到;该第一搜索结果与该关键词对应于不同的模态;该第二提示信息用于提示用户确认第一搜索结果是否为用户期望的搜索结果。终端设备响应于用户的确认操作,显示第五显示界面,第五显示界面包括第一搜索结果及第二标识,该第二标识用于指示第一搜索结果为用户所确认的搜索结果。示例性地,得到第一搜索结果的过程可参照前文步骤S203中相关表述。
作为一个示例,终端设备可以检测到用户主动在图库中搜索图片时发出提示信息,例如用户输入“女儿”、“儿子的照片”或“我和老婆打羽毛球”等关键词,终端设备响应于用户查询操作,在图库中进行搜索,显示搜索到的照片,并发出提示信息“这是不是您想搜索的照片?”并显示提示选项“是”及“否”,若用户选择“是”,则终端设备将该照片作为“儿子”这一私有实体对应的图像模态的数据。举例来说,以终端设备为上述图1(a)中手机101,第一私有实体是“儿子”为例,图8(a)-(c)示出根据本申请一实施例的一种显式采集方式的示意图;如图8(a)所示,用户进入手机101的本地图库,显示第一显示界面,该第一显示界面的上方设置有搜索框(即搜索入口),可以在第一显示界面的搜索框中输入关键词“儿子”;如图8(b)所示,手机101执行搜索操作,并在屏幕中显示搜索到的图片1、提示语句“这是不是您想搜索的照片?”以及提示选项“是”及“否”;响应于用户点击选项“是”的操作,显示第五显示界面,如图8(c)所示,则手机101可以确定图片1为用户儿子的照片,并在图片1周围添加边框,以指示图片1为用户所确认 的图片,从而获取“儿子”这一私有实体对应的图像模态的数据。
作为另一个示例,终端设备的图库中可以设置按钮或入口,如果用户发觉搜索图片的效果不佳,则可以通过该按钮或入口触发终端设备发出第一提示信息,如“请选择张三学习的照片”;若用户选择了一张或多张图片,则终端设备将该一张或多张照片作为“张三”这一私有实体对应的图像模态的数据。
这样，通过上述步骤S601及S602，以用户可以感知到的显式采集方式获取第一私有实体对应的数据，由于该私有信息为用户主观选取，这样采集到的私有信息具备高置信度的特点；从而实现在用户隐私安全范围内，通过显式引导，采集高置信度的具象化的私有信息。
进一步地,终端设备还可以执行步骤S603、将第一私有实体对应的数据输入到模型中,生成第一私有实体对应的特征;从而在上述显式采集方式所获取的具象化的私有信息中生成私有知识。
该步骤S603的可能实现方式可参照上述步骤S502中相关表述。
进一步地,终端设备还可以通过执行下述步骤S604-S605,得到私有知识集中其他私有实体对应的特征,自动扩展私有知识集,从而完成私有知识集构建。
步骤S604、获取包含第一私有实体及第二私有实体的数据。
步骤S605、根据包含第一私有实体及第二私有实体的数据,及第一私有实体对应的特征,得到第二私有实体对应的特征。
该步骤S604-S605的具体实现过程可参照上述图5中步骤S503-S504中相关表述。
这样，通过上述步骤S601-步骤S603，以显式采集的方式获取具象化的私有信息，并生成私有知识；进而，终端设备可以通过执行上述步骤S604与S605，根据采集到的包含多个私有实体的数据和已生成的私有知识，扩展私有知识集，从而在隐私安全允许的范围内，以尽可能少的显式采集，利用少量的已生成的高置信度的私有知识，生成更多高置信度的具象化的私有知识。在一些示例中，在执行完上述步骤后，终端设备可以进一步地判断是否遍历最小私有知识集中的各私有实体，若遍历所有的私有实体，即生成了所有私有实体对应的特征，则完成最小私有知识集的构建，否则，重复执行上述步骤S604-步骤S605，直到遍历所有的私有实体，并将构建的最小私有知识集存储在终端设备中。
下面对上述构建私有知识集时,采取隐式采集方式获取第一私有实体对应的数据的过程进行具体说明。
图9示出根据本申请一实施例的一种隐式采集方式的流程图。该方法可以在终端设备上执行,例如,可以在上述图1中的手机101上执行;如图9所示,该方法可以包括以下步骤:
步骤S901、获取包含第一私有实体的初始数据。
示例性地,初始数据可以为包含常识信息的数据,例如,结婚照、亲子照、自拍照等。终端设备可以结合常识对终端设备中的数据进行推理,得到包含第一私有实体的初始数据;例如,若用户为男性用户,则结婚照中通常包含“我”和“老婆”这两个私有实体,终端设备可以通过模型(如图像模态对应的子模型)提取出本地图库中图片的特征,从而在本地图库中筛选出结婚照;再例如,自拍照通常包含“我”这一私有实体,终端设备可以通过模型提取出本地图库中图片的特征,从而在本地图库中筛选出自拍照。
示例性地,该步骤中所获取的初始数据包含具象化的私有信息,从而可以将包含第一私有实体的初始数据作为第一私有实体对应的数据;从而在用户无感知的情况下获取第一私有实体对应的数据。
可以理解的是,由于未得到用户的确认,该步骤所获取的初始数据的置信度通常较低。例如,终端设备中可能存在多张不同人物的自拍照,这些自拍照中包含了私有实体“我”对应的数据(即用户的自拍照),还可能包含其他私有实体对应数据(即非用户的自拍照)。终端设备可以通过执行下述步骤S902-S903,对所获取的初始数据进行进一步地筛选优化,从而得到高置信度的第一私有实体对应的数据。
步骤S902、响应于用户的第二搜索请求,显示至少一个搜索结果。其中,第二搜索请求包含第一私有实体,该至少一个搜索结果与第二搜索请求对应于不同的模态。
示例性地,终端设备接收到用户的第二搜索请求后,可以根据终端设备中模型对第二搜索请求进行处理,以得到至少一个搜索结果,并可以将该至少一个搜索结果呈现给用户。作为一个示例,该至少一个搜索结果可以包括上述包含第一私有实体的初始数据。
步骤S903、基于用户在至少一个搜索结果中的选取操作,获取第一私有实体对应的数据。
示例性地,用户可以在至少一个搜索结果中选取自己想要搜索的结果,由于第二搜索请求包含第一私有实体,则用户所选取的搜索结果中包含第一私有实体;从而终端设备可以根据用户选取的搜索结果,获取第一私有实体对应的数据,即高置信度的具象化的私有信息。
示例性地,终端设备可以基于用户所选取的搜索结果,更新上述初始数据的置信度;作为一个示例,终端设备可以基于用户在至少一个搜索结果中选取上述初始数据的操作,将该初始数据作为第一私有实体对应的数据。例如,若用户选取了该初始数据,则说明该初始数据的置信度较高,则终端设备可以将该初始数据作为第一私有实体对应的数据;若用户未选取该初始数据,则说明该初始数据的置信度仍旧较低,终端设备可以不将该初始数据作为第一私有实体对应的数据。
作为另一个示例,在该至少一个搜索结果中不包括上述包含第一私有实体的初始数据的情况下,或者在该至少一个搜索结果中包括该初始数据且用户未选取该初始数据的情况下;用户所选取的搜索结果的置信度较高,则终端设备可以将用户所选取的搜索结果确定为第一私有实体对应的数据。
举例来说,以终端设备为上述图1(a)中手机101,第一私有实体可以是“我”为例,图10(a)-(c)示出根据本申请一实施例的一种隐式采集方式的示意图;手机101可以在本地图库中筛选出自拍照;如图10(a)所示,第二搜索请求可以是关键词“我的照片”,用户可以在手机101本地图库的搜索框内输入关键词“我的照片”,如图10(b)所示,手机101利用图像模态对应的子模型对关键词进行处理,在图库中搜索用户的照片,并可以将搜索到的用户照片显示在屏幕上;如图10(c)所示,用户可以在手机101屏幕显示的照片中选取自己的照片,例如,用户选择图片2;若上述筛选出自拍照包括图片2,则手机101可以将图片2作为“我”这一私有实体对应的数据。这样,通过用户的选取操作,提高了图片2的置信度,从而得到高置信度的具象化的私有信息。
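上述基于用户选取操作提升初始数据置信度的逻辑，可以用如下示意性的Python草图表示（仅作说明，其中置信度增量0.5与阈值0.9均为假设的取值）：

```python
# 示意性草图：隐式采集中基于用户选取操作更新初始数据的置信度
CONFIRM_THRESHOLD = 0.9   # 假设的置信度阈值

def update_confidence(candidates, selected):
    # candidates: {初始数据: 置信度}；selected: 用户在搜索结果中选取的数据集合
    confirmed = {}
    for data, conf in candidates.items():
        if data in selected:
            conf = min(1.0, conf + 0.5)   # 被用户选中，置信度显著提升
        if conf >= CONFIRM_THRESHOLD:
            confirmed[data] = conf        # 达到阈值才作为私有实体对应的数据
    return confirmed
```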
这样，通过上述步骤S901-S903，自动获取低置信度的具象化的私有信息（即初始数据），进一步响应于用户主动触发的第二搜索请求，显示至少一个搜索结果，并基于用户在至少一个搜索结果中选取上述初始数据的操作，将该初始数据作为第一私有实体对应的数据，提升具象化的私有信息的置信度，从而以用户无感知的隐式采集方式获取第一私有实体对应的数据。
作为一个示例,通过上述隐式采集的方式,可以完成构建私有知识集所需的全部或大部分具象化的私有信息的采集,从而可以作为上述显式采集方式的备份,在即使不进行显式采集的情况下,也能完成高置信度的具象化的私有信息的采集。
进一步地,终端设备还可以执行步骤S904、将第一私有实体对应的数据输入到模型中,生成第一私有实体对应的特征;从而在上述隐式采集方式所获取的高置信度的具象化的私有信息中生成私有知识。
该步骤S904的可能实现方式可参照上述步骤S502中相关表述。
进一步地,终端设备还可以通过执行下述步骤S905-S906,得到私有知识集中其他私有实体对应的特征,自动扩展私有知识集,从而完成私有知识集构建。
步骤S905、获取包含第一私有实体及第二私有实体的数据。
步骤S906、根据包含第一私有实体及第二私有实体的数据,及第一私有实体对应的特征,得到第二私有实体对应的特征。
该步骤S905-S906的具体实现过程可参照上述图5中步骤S503-S504中相关表述。
这样,通过上述步骤S901-步骤S904,以隐式采集的方式获取具象化的私有信息,并生成私有知识;进而,终端设备可以通过执行上述步骤S905与S906,根据采集到的包含多个私有实体的数据和已生成的私有知识,扩展私有知识集,从而在隐私安全允许的范围内,以尽可能少的采集,利用少量已生成的高置信度的私有知识,生成更多高置信度的具象化的私有知识。在一些示例中,在执行完上述步骤后,终端设备可以进一步地判断是否遍历最小私有知识集中的各私有实体,若遍历所有的私有实体,即生成了所有私有实体对应的特征,则完成最小私有知识集的构建,否则,重复执行上述步骤S905-步骤S906,直到遍历所有的私有实体,并将构建的最小私有知识集存储在终端设备中;示例性地,可以采用三元组存储方式,例如,<1#,2#,夫妻关系>、<1#,3#,父子关系>等等。
下面从训练侧对本申请实施例提供的模型训练方法进行详细说明。可以理解的是,应用侧与训练侧可以对应于同一设备,即模型的训练与采用训练好的模型进行搜索可以在同一设备上进行;应用侧与训练侧也可以对应于不同的设备,即可以在一个设备上对模型进行训练,并将该训练好的模型配置在另一个设备中进行具象化搜索。例如,可以在服务器上预先训练好模型,从而将该模型移植到终端设备上,实现终端设备上的具象化搜索。
图11示出根据本申请一实施例的模型训练方法的流程图。示例性地,该模型训练方法可以应用于服务器,例如,云端服务器。如图11所示,该方法可以包括以下步骤:
S1101、获取多模态样本集及第一模型。
其中,多模态样本集可以包括:第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本;第一私有实体表示与用户具有关联关系的实体,第一搜索请求包括第一私有实体,第二搜索请求不包含第一私有实体。
示例性地,可以将获取的多模态样本集划分为训练集、验证集和测试集;其中,训练集、验证集和测试集中各自包含的样本数量可以根据需求进行设定,对此不作限定。
示例性地,多模态样本可以包括第一模态的样本及第二模态的样本,例如,多模态样本可以包括文本模态的样本和图像模态的样本。作为一个示例,多模态样本集可以包括图文对数据集,图文对数据集中包括多个图文对数据,其中,每个图文对数据中文本模态的数据与图像模态的数据相对应,例如,文本模态的“我”(即描述用户照片的文本-我)和图像模态的“我”(即该用户的照片)可以作为一个图文对数据。
作为一个示例,第一私有实体对应的多模态样本可以是第一私有实体对应的图文对数据。例如,第一私有实体可以是“我”,第一私有实体对应的多模态样本可以是文本模态的“我”和图像模态的“我”。第一搜索请求对应的多模态样本可以是第一搜索请求对应的图文对数据。例如,第一搜索请求对应的多模态样本可以是文本模态的“我打羽毛球”(即描述用户打羽毛球的照片的关键词-我打羽毛球)和图像模态的“我打羽毛球”(即该用户打羽毛球的照片)。第二搜索请求对应的多模态样本可以是公开的图文对数据;例如,第二搜索请求对应的多模态样本可以是文本模态的“其他人打羽毛球”(即描述该用户以外的其他人打羽毛球的照片的关键词-其他人打羽毛球)和图像模态的“其他人打羽毛球”(即该用户以外的其他人打羽毛球的照片)。
示例性地,第一模型可以为训练好的多模态模型,例如,可以为采用公开数据集训练得到的多模态模型,可以用于为不同模态的样本中同一非私有实体生成相似的特征;其中,第一模型的训练过程可参照现有技术,在此不再赘述。示例性地,第一模型可以包括:第一模态对应的子模型及第二模态对应的子模型。例如,第一模态可以为图像模态,第二模态可以包括文本模态;其中,文本模态对应的子模型用于编码文本模态的数据,生成对应的特征;图片模态对应的子模型用于编码图像模态的数据,生成对应的特征。
示例性地,可以使用公开的图文对数据集,将数据集分成训练集、验证集和测试集,对多模态模型进行训练,当测试集准确率达到预设值时,可以得到公开的图文对在高维空间下的相同表征,即相同的高维空间特征,此时停止训练,保存模型,从而得到训练好的多模态模型,即第一模型。
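作为一个示意性的Python代码草图（仅作说明，此处以常见的批内对比学习方式训练图文双塔模型为例，image_encoder、text_encoder及温度系数0.07均为假设，并非本申请限定的训练实现），单步训练的损失计算可以表示为：

```python
import torch
import torch.nn.functional as F

def pretrain_step(image_encoder, text_encoder, images, texts, temperature=0.07):
    # 一个批次的图文对：对角线位置为匹配的正例，其余为批内负例
    img_emb = F.normalize(image_encoder(images), dim=-1)   # (B, d)
    txt_emb = F.normalize(text_encoder(texts), dim=-1)     # (B, d)
    logits = img_emb @ txt_emb.t() / temperature           # (B, B) 相似度矩阵
    labels = torch.arange(logits.size(0), device=logits.device)
    # 图到文、文到图两个方向的对称交叉熵
    loss = (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
    return loss
```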
S1102、利用多模态样本集对第一模型进行训练,得到第二模型。
可以理解的是,由于第一模型是通过公开数据集训练得到的,因此,第一模型具有能力识别通用的实体,例如,可以识别实体羽毛球;而对于“我”、“老婆”或者“张三”等私有实体等,由于公开数据集中没有针对不同用户对私有实体进行区分,因此,采用公开数据集训练的第一模型仅有能力识别出这些实体为“人”,并无法将这些实体与实际中具体的形象准确对应。该步骤中利用包含私有信息(即第一私有实体对应的多模态样本)的多模态样本集,对上述第一模型进行微调Fine-tune,提升了模型的表达力,所得到的第二模型本身具备具象化信息承载能力,用于为不同模态的样本中同一私有实体生成相似的特征,从而可以将私有实体与实际中具体的形象准确对应,该第二模型可以作为上述图2中的模型,用于具象化搜索,大大提升搜索效果和效率。
示例性地,可以利用第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本,通过对比学习,对第一模型进行训练,从而得到第二模型;其中,第一搜索请求对应的多模态样本可以作为正例、第二搜索请求对应的多模态样本可以作为(难)负例。
本申请实施例中，在对第一模型进行训练过程中，依赖于一定的“私有”数据集进行微调，这使得训练好的模型（即第二模型）可以为不同模态的样本中同一私有实体生成相似的特征，具备推理公有信息及具象化私有信息的特征的能力；进而可以利用第二模型进行具象化搜索，大大提升搜索效果和效率。
下面对上述步骤S1102中的训练过程进行具体说明。
图12示出根据本申请一实施例的模型训练方法的流程图。如图12所示,上述步骤S1102可以包括以下步骤:
S11021、通过第一模型对第一私有实体对应的多模态样本及第一搜索请求对应的多模态样本进行处理,得到融合特征。
示例性地,服务器可以获取私有实体对应的多模态样本集,该私有实体对应的多模态样本集可以作为“私有”数据集。例如,可以获取私有实体对应的图文对数据集,每一图文对数据中可以包括图片及该图片中私有实体的标注。
示例性地，服务器可以对第一搜索请求进行解析，得到其所包含的私有实体（即第一私有实体），获取第一私有实体对应的多模态样本。
作为一个示例,服务器可以将第一私有实体对应的第一模态的样本及第一搜索请求对应的第二模态的样本输入到第一模型中,得到融合特征。
例如,第一搜索请求对应的第二模态的样本可以是关键词“我打羽毛球”,可以得到第一搜索请求中包含的私有实体为“我”,即第一私有实体为“我”,从而可以将图像模态的“我”(即第一私有实体对应的第一模态的样本)及关键词“我打羽毛球”输入到第一模型中,得到融合特征。
作为一个示例,第一搜索请求对应的第二模态的样本可以是关键词“我打羽毛球”,服务器通过对该关键词进行解析,可以确定第一搜索请求中包含的私有实体为“我”,即第一私有实体为“我”,从而可以将图像模态的“我”输入到图像模态对应的子模型中,该图像模态对应的子模型推理得到图像模态的“我”对应的特征,即私有实体“我”在图像中的特征。
在一种可能的实现方式中,服务器可以将第一搜索请求对应的多模态样本输入到第一模型中,生成第一特征;将第一私有实体对应的多模态样本输入到第一模型中,生成第二特征;对第一特征及第二特征进行融合,得到融合特征。该实现方式具体描述可参照前文图3中步骤S20301中相关表述,在此不再赘述。
示例性地，服务器可以将第一搜索请求对应的第二模态的样本输入到第二模态对应的子模型中，得到第一特征；将第一私有实体对应的第一模态的样本输入到第一模态对应的子模型中，得到第二特征；进而对第一特征及第二特征进行融合，得到融合特征。例如，第一搜索请求对应的文本模态的样本为关键词“我打羽毛球”，则可以将关键词“我打羽毛球”输入到文本模态对应的子模型中，该文本模态对应的子模型推理得到关键词“我打羽毛球”对应的特征（即第一特征），第一私有实体对应的图像模态的样本为“我”的照片，将“我”的照片输入到图像模态对应的子模型中，该图像模态对应的子模型推理得到“我”的照片对应的特征（即第二特征），进而将这两个特征融合，得到融合特征。
这样，在对第一模型的训练过程中，使用第一模型得到融合特征；作为一个示例，生成私有特征（即第二特征）及公有特征（即第一特征），并将生成的公有特征和私有特征进行融合，融合后的特征能够同时表征公有信息及具象化的私有信息，从而使得第一模型可以兼顾学习公有信息和私有信息。
S11022、根据融合特征、第一搜索请求对应的特征及第二搜索请求对应的特征,对第一模型进行训练,得到第二模型。
其中,第一搜索请求对应的特征由第一模型对第一搜索请求对应的多模态样本进行处理得到,第二搜索请求对应的特征由第一模型对第二搜索请求对应的多模态样本进行处理得到。
示例性地,第一搜索请求对应的特征由第一模型对第一搜索请求对应的第一模态的样本进行处理得到,第二搜索请求对应的特征由第一模型对第二搜索请求对应的第一模态的样本进行处理得到。作为一个示例,第一搜索请求对应的第一模态的样本可以“我打羽毛球”的照片,第二搜索请求对应的第一模态的样本可以是“其他人打羽毛球”的照片,则服务器可以将“我打羽毛球”的照片输入到图像模态对应的子模型中,得到图像模态的“我打羽毛球”对应的特征,将“其他人打羽毛球”的照片输入到图像模态对应的子模型中,得到图像模态的“其他人打羽毛球”对应的特征。
示例性地，可以利用融合特征、第一搜索请求对应的特征及第二搜索请求对应的特征，通过对比学习对第一模型进行训练，得到第二模型；其中，第一搜索请求对应的特征作为正例的特征，第二搜索请求对应的特征作为负例的特征。在对比学习的过程中，正例的特征与融合特征之间的距离逐渐靠近，负例的特征与融合特征之间的距离逐渐远离，直到达到正例的特征与融合特征对齐的效果；即正例的特征与融合特征之间的距离相近，例如欧氏距离、余弦距离等相近，而同其它的特征之间的距离较远；可以理解的是，两个特征之间的欧氏距离越大，表示两个特征之间的差异越大，即相似度越小；当测试集准确率达到预设值时，可以停止训练，保存模型，从而得到训练好的公私融合多模态模型，即第二模型。该第二模型可以作为上述图2中的模型，在搜索过程中，第二模型结合私有知识，从而可以推理出可以表征公有信息及私有信息的融合特征，从而实现具象化搜索，大大提升搜索效果和效率。
示例性地，在对比学习的过程中，可以根据融合特征、第一搜索请求对应的特征及第二搜索请求对应的特征，确定损失函数值，例如，可以通过融合特征与第一搜索请求对应的特征的差异计算正例的损失函数值，通过融合特征与第二搜索请求对应的特征的差异计算负例的损失函数值，根据该正例的损失函数值及该负例的损失函数值得到损失函数值。进而在得到损失函数值后，可以对该损失函数值进行反向传播，采用梯度下降算法更新第一模型中参数值；通过不断训练，第一模型得到的融合特征与正例的特征逐渐靠近（如欧氏距离不断减小），同时与负例的特征逐渐远离。
举例来说,图13示出根据本申请一实施例的模型训练方法的流程示意图,如图13所示,第一私有实体可以是“我”,第一私有实体对应的多模态样本是“我”的照片及该照片对应的文本标注“我”;第一搜索请求对应的多模态样本是“我”打羽毛球的照片及该照片对应的文本标注“我打羽毛球”;第二搜索请求对应的多模态样本是其他人打羽毛球的照片及该照片对应的文本标注“其他人打羽毛球”;其中,“我”打羽毛球的照片作为正例,其他人打羽毛球的照片作为负例;在对第一模型进行微调的过程中,可以由第一模型(如文本模态对应的子模型)将文本标注“我打羽毛球”进行处理,从而得到第一特征;可以由第一模型(如图像模态的子模型)对“我”的照片进行处理得到特征A,即第二特 征;对“我”打羽毛球的照片进行处理得到特征B;对其他人打羽毛球的照片进行处理得到特征C。第一模型对第一特征及第二特征进行融合,得到融合特征。进而,结合特征B、特征C、融合特征,对第一模型进行微调;通过不断的微调,特征B与融合特征的特征逐渐靠近,特征C与融合特征的特征逐渐远离,直到达到特征B与融合特征的特征对齐的效果,即将文本模态的“我”和图片模态的“我”对齐。当测试集准确率达到预设值时,可以停止训练,保存模型,从而得到第二模型。这样,模型训练的过程中可以兼顾学习照片公有信息和具象化的私有信息,利用训练好的第二模型在图库中进行搜索的过程中,可以提取图库中图片的公有信息和私有信息,推理生成融合特征,进而结合私有知识,可以准确搜索包含“我”、“老婆”等私有信息的图片。
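上述微调过程的单步训练可以用如下示意性的Python草图表示（仅作说明，其中model.encode_text、model.encode_image、fuse均为假设的接口，三元组损失也仅是实现“正例靠近、负例远离”的一种常见选择，优化器更新步骤在此省略）：

```python
import torch
import torch.nn.functional as F

def finetune_step(model, fuse, anchor_text, private_img, pos_img, neg_img, margin=0.2):
    # anchor_text: 文本标注“我打羽毛球”；private_img: “我”的照片
    # pos_img: “我”打羽毛球的照片（正例）；neg_img: 其他人打羽毛球的照片（负例）
    first_feat  = model.encode_text(anchor_text)    # 第一特征（公有特征）
    second_feat = model.encode_image(private_img)   # 第二特征（私有特征，即特征A）
    fused = fuse(first_feat, second_feat)           # 融合特征
    feat_b = model.encode_image(pos_img)            # 特征B
    feat_c = model.encode_image(neg_img)            # 特征C
    # 三元组损失：拉近融合特征与特征B，推远融合特征与特征C
    loss = F.triplet_margin_loss(fused.unsqueeze(0),
                                 feat_b.unsqueeze(0),
                                 feat_c.unsqueeze(0),
                                 margin=margin)
    loss.backward()   # 反向传播，配合梯度下降算法更新第一模型中参数值
    return loss
```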
本申请实施例中，利用多模态模型及私有数据，将私有特征与公有特征融合后，再与对应的多模态的特征对齐，从而训练得到公私融合多模态模型；这样，利用公有特征和私有特征联合训练，对第一模型进行微调，这使得训练后的模型（即第二模型）具备推理具象化私有信息的能力；解决了仅通过公开数据集训练的多模态模型无法承载具象化私有信息的问题。
基于上述方法实施例的同一发明构思,本申请的实施例还提供了一种搜索装置,该搜索装置可以用于执行上述方法实施例所描述的技术方案。例如,可以执行上述图2、图3、图5、图6或图9中所示方法的各步骤。
图14示出根据本申请一实施例的一种搜索装置的结构图。如图14所示,接收模块1401,用于接收用户的第一搜索请求,所述第一搜索请求中包括私有实体;其中,私有实体表示与所述用户具有关联关系的实体;确定模块1402,用于在至少一个私有实体对应的特征中,确定所述第一搜索请求中私有实体对应的特征;其中,所述至少一个私有实体对应的特征指示所述至少一个私有实体对应的数据,所述至少一个私有实体对应的数据与所述第一搜索请求对应于不同的模态;搜索模块1403,用于根据所述第一搜索请求及所述第一搜索请求中私有实体对应的特征,得到搜索结果;所述搜索结果与所述第一搜索请求对应于不同的模态;显示模块1404,用于显示所述搜索结果。
本申请实施例中,在至少一个私有实体对应的特征中,确定用户的搜索请求中的私有实体对应的特征,并根据该私有实体对应的特征及第一搜索请求,得到搜索结果。由于至少一个私有实体对应的特征指示至少一个私有实体对应的数据,至少一个私有实体对应的数据可以与搜索结果对应于同一模态,从而可以将搜索请求中私有实体与实际中具体的形象(即搜索结果中私有实体的具体形象)相对应,从而实现具象化搜索;解决了终端设备在搜索过程中无法进行具象化搜索的问题,提高了搜索效果和搜索效率,提升了用户的搜索体验。作为一个示例,可以通过模型对第一搜索请求中私有实体及第一搜索请求进行推理,得到公私融合特征,进而通过公私融合特征,在数据库中确定搜索结果。此外,无需封闭式标签集,灵活性强;可以满足用户多元化的搜索请求,扩展性更高;针对不同的终端设备,无需更换模型,维护更加容易。
在一种可能的实现方式中,所述搜索模块1403,还用于:通过模型对所述第一搜索请求及所述第一搜索请求中私有实体对应的特征进行处理,生成融合特征;所述融合特征指示所述第一搜索请求;将数据库中与所述融合特征相匹配的特征对应的数据作为所述搜索结果。
在一种可能的实现方式中,所述装置还包括:生成模块,用于获取第一私有实体对应的数据;所述第一私有实体为所述至少一个私有实体中的任一私有实体;对所述第一私有实体对应的数据进行处理,生成所述第一私有实体对应的特征。
在一种可能的实现方式中,所述生成模块,还用于:向用户发出第一提示信息,所述第一提示信息用于提示用户选取包含所述第一私有实体的数据;响应于用户的选取操作,获取所述第一私有实体对应的数据。
在一种可能的实现方式中,所述生成模块,还用于:获取包含所述第一私有实体的初始数据;响应于用户的第二搜索请求,显示至少一个搜索结果;所述第二搜索请求包含所述第一私有实体,所述至少一个搜索结果与所述第二搜索请求对应于不同的模态;所述至少一个搜索结果包括所述初始数据;基于用户在所述至少一个搜索结果中选取所述初始数据的操作,将所述初始数据作为所述第一私有实体对应的数据。
在一种可能的实现方式中,所述生成模块,还用于:获取包含所述第一私有实体及第二私有实体的数据;其中,所述第二私有实体为所述至少一个私有实体中除所述第一私有实体之外的任一私有实体;根据所述包含所述第一私有实体及第二私有实体的数据,及所述第一私有实体对应的特征,得到所述第二私有实体对应的特征。
在一种可能的实现方式中,所述私有实体包括:与用户相关联的称谓、人名或昵称。
上述图14所示的搜索装置及其各种可能的实现方式的技术效果及具体描述可参见上述搜索方法,此处不再赘述。
基于上述方法实施例的同一发明构思,本申请的实施例还提供了另一种搜索装置,该搜索装置包括:第一显示模块,用于显示第一显示界面;所述第一显示界面包括搜索入口;第二显示模块,用于响应于用户在所述搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于所述关键词的搜索结果;其中,所述关键词中包括私有实体;所述搜索结果根据所述关键词及所述私有实体对应的特征得到;所述搜索结果与所述关键词对应于不同的模态。
本申请实施例中,响应于用户在第一显示界面中搜索入口输入关键词的操作,显示第二显示界面;由于搜索结果根据该关键词及搜索请求中私有实体对应的特征得到,从而可以将该关键词中私有实体与实际中具体的形象(即搜索结果中私有实体的具体形象)相对应,从而实现具象化搜索;解决了终端设备在搜索过程中无法进行具象化搜索的问题,提高了搜索效果和搜索效率,提升了用户的搜索体验。
在一种可能的实现方式中,所述装置还包括:第三显示模块,用于显示第三显示界面,所述第三显示界面包括第一提示信息及至少一个数据;所述第一提示信息用于提示用户选取包含第一私有实体的数据;第四显示模块,用于响应于用户在所述至少一个数据中的选取操作,显示第四显示界面,所述第四显示界面包括第一数据及第一标识,所述第一标识用于指示所述第一数据为用户所选取的数据。
在一种可能的实现方式中,所述第二显示界面还包括:第二提示信息,所述第二提示信息用于提示用户确认所述第一搜索结果是否为用户期望的搜索结果;所述装置还包括:第五显示模块,用于响应于用户的确认操作,显示第五显示界面,所述第五显示界面包括第一搜索结果及第二标识,所述第二标识用于指示所述第一搜索结果为用户所确认的搜索结果。
基于上述方法实施例的同一发明构思,本申请的实施例还提供了一种模型训练装置,该模型训练装置可以用于执行上述方法实施例所描述的技术方案。例如,可以执行上述图11或图12所示方法的各步骤。
图15示出根据本申请一实施例的一种模型训练装置的结构图。如图15所示,所述装置包括:获取模块1501,用于获取多模态样本集及第一模型;其中,所述多模态样本集包括:第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本;所述第一私有实体表示与用户具有关联关系的实体,所述第一搜索请求包含所述第一私有实体,所述第二搜索请求不包含所述第一私有实体;训练模块1502,用于利用所述多模态样本集对所述第一模型进行训练,得到第二模型。
本申请实施例中,在对第一模型进行训练过程中,依赖于一定的“私有”数据集进行微调,这使得训练好的模型(即第二模型)可以为不同模态的样本中同一私有实体生成相似的特征,具备推理公有信息及具象化私有信息的特征的能力;进而可以利用第二模型进行具象化搜索,大大提升搜索效果和效率。
在一种可能的实现方式中,所述训练模块1502,还用于:通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征;根据所述融合特征、所述第一搜索请求对应的特征及所述第二搜索请求对应的特征,对所述第一模型进行训练,得到所述第二模型;其中,所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的多模态样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的多模态样本进行处理得到。
在一种可能的实现方式中,所述训练模块1502,还用于:将所述第一搜索请求对应的多模态样本输入到所述第一模型中,生成第一特征;将所述第一私有实体对应的多模态样本输入到所述第一模型中,生成第二特征;对所述第一特征及所述第二特征进行融合,得到所述融合特征。
在一种可能的实现方式中，所述多模态样本包括第一模态的样本及第二模态的样本；所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的第一模态的样本进行处理得到，所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的第一模态的样本进行处理得到；所述训练模块1502，还用于：将所述第一私有实体对应的第一模态的样本及所述第一搜索请求对应的第二模态的样本输入到所述第一模型中，得到所述融合特征。
上述图15所示的模型训练装置及其各种可能的实现方式的技术效果及具体描述可参见上述模型训练方法,此处不再赘述。
应理解以上搜索装置及模型训练装置中各模块的划分仅是一种逻辑功能的划分,实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。此外,装置中的模块可以以处理器调用软件的形式实现;例如装置包括处理器,处理器与存储器连接,存储器中存储有指令,处理器调用存储器中存储的指令,以实现以上任一种方法或实现该装置各模块的功能,其中处理器例如为通用处理器,例如中央处理单元(Central Processing Unit,CPU)或微处理器,存储器为装置内的存储器或装置外的存储器。或者,装置中的模块可以以硬件电路的形式实现,可以通过对硬件电路的设计实现部分或全部模块的功能,该硬件电路可以理解为一个或多个处理器;例如,在一种实现中,该硬件电路为专用集成电路 (application-specific integrated circuit,ASIC),通过对电路内元件逻辑关系的设计,实现以上部分或全部模块的功能;再如,在另一种实现中,该硬件电路为可以通过可编程逻辑器件(programmablelogic device,PLD)实现,以现场可编程门阵列(Field Programmable Gate Array,FPGA)为例,其可以包括大量逻辑门电路,通过配置文件来配置逻辑门电路之间的连接关系,从而实现以上部分或全部模块的功能。以上装置的所有模块可以全部通过处理器调用软件的形式实现,或全部通过硬件电路的形式实现,或部分通过处理器调用软件的形式实现,剩余部分通过硬件电路的形式实现。
在本申请实施例中,处理器是一种具有信号的处理能力的电路,在一种实现中,处理器可以是具有指令读取与运行能力的电路,例如CPU、微处理器、图形处理器(graphics processing unit,GPU)、数字信号处理器(digital signal processor,DSP)、神经网络处理器(neural-network processing unit,NPU)、张量处理器(tensor processing unit,TPU)等;在另一种实现中,处理器可以通过硬件电路的逻辑关系实现一定功能,该硬件电路的逻辑关系是固定的或可以重构的,例如处理器为ASIC或PLD实现的硬件电路,例如FPGA。在可重构的硬件电路中,处理器加载配置文档,实现硬件电路配置的过程,可以理解为处理器加载指令,以实现以上部分或全部模块的功能的过程。
可见,以上装置中的各模块可以是被配置成实施以上实施例方法的一个或多个处理器(或处理电路),例如:CPU、GPU、NPU、TPU、微处理器、DSP、ASIC、FPGA,或这些处理器形式中至少两种的组合。此外,以上装置中的各模块可以全部或部分可以集成在一起,或者可以独立实现,对此不作限定。
本申请的实施例还提供了一种电子设备，包括：处理器；用于存储处理器可执行指令的存储器；其中，所述处理器被配置为执行所述指令时实现上述实施例的方法。示例性地，可以执行上述图2、图3、图5、图6或图9所示方法的各步骤或者执行上述图11或图12所示方法的各步骤。
作为一个示例,图16示出根据本申请一实施例的一种电子设备100的结构示意图。该电子设备100可以包括手机、可折叠电子设备、平板电脑、桌面型计算机、膝上型计算机、手持计算机、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、蜂窝电话、个人数字助理(personal digital assistant,PDA)、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、人工智能(artificial intelligence,AI)设备、可穿戴式设备、车载设备、智能家居设备、或智慧城市设备中的至少一种终端设备。本申请实施例对该电子设备100的具体类型不作特殊限制。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接头130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L, 骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
处理器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。示例性地,处理器可以执行上述图2、图3、图5、图6或图9所示方法的各步骤。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器可以为高速缓冲存储器。该存储器可以保存处理器110用过或使用频率较高的指令或数据。如果处理器110需要使用该指令或数据,可从该存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。作为一个示例,可以存储有模型或私有知识集等。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。处理器110可以通过以上至少一种接口连接触摸传感器、音频模块、无线通信模块、显示器、摄像头等模块。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
USB接头130是一种符合USB标准规范的接口,可以用于连接电子设备100和外围设备,具体可以是Mini USB接头,Micro USB接头,USB Type C接头等。USB接头130可以用于连接充电器,实现充电器为该电子设备100充电,也可以用于连接其他电子设备,实现电子设备100与其他电子设备之间传输数据。也可以用于连接耳机,通过耳机输出电子设备中存储的音频。该接头还可以用于连接其他电子设备,例如VR设备等。
充电管理模块140用于接收充电器的充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。
电源管理模块141用于连接电池142，充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入，为处理器110，内部存储器121，显示屏194，摄像头193，和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量，电池循环次数，电池健康状态（漏电，阻抗）等参数。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),蓝牙低功耗(bluetooth low energy,BLE),超宽带(ultra wide band,UWB),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络和其他电子设备通信。该无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。该GNSS可以包括全球卫星定位系统(global positioning system,GPS),全 球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
电子设备100可以通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或多个显示屏194。示例性地,显示屏194可以用于显示显示界面、搜索请求、搜索结果或提示信息等。
电子设备100可以通过摄像模组193,ISP,视频编解码器,GPU,显示屏194以及应用处理器AP、神经网络处理器NPU等实现摄像功能。
摄像模组193可用于采集拍摄对象的彩色图像数据以及深度数据。ISP可用于处理摄像模组193采集的彩色图像数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将该电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像模组193中。
在一些实施例中,摄像模组193可以由彩色摄像模组和3D感测模组组成。在一些实施例中,3D感测模组可以是(time of flight,TOF)3D感测模块或结构光(structured light)3D感测模块。在另一些实施例中,摄像模组193还可以由两个或更多个摄像头构成。在一些实施例中,电子设备100可以包括1个或多个摄像模组193。具体的,电子设备100可以包括1个前置摄像模组193以及1个后置摄像模组193。其中,前置摄像模组193通常可用于采集面对显示屏194的拍摄者自己的彩色图像数据以及深度数据,后置摄像模组可用于采集拍摄者所面对的拍摄对象(如人物、风景等)的彩色图像数据以及深度数据。
在一些实施例中，处理器110中的CPU或GPU或NPU可以基于华为PoissonEngine_VS向量引擎，或者开源向量引擎Faiss、Annoy、Milvus、Vearch等，进行私有知识推理，构建私有知识集，并将私有知识集存储在数据库中；还可以基于用户的搜索请求，推理关键词，得到关键词对应的融合特征。
数字信号处理器用于处理数字信号,还可以处理其他数字信号。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络（neural-network，NN）计算处理器，通过借鉴生物神经网络结构，例如借鉴人脑神经元之间传递模式，对输入信息快速处理，还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用，例如：图像识别，人脸识别，语音识别，文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。或将音乐,视频等文件从电子设备传输至外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,该可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等,作为一个示例,可以存储私有知识集及数据库。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,执行电子设备100的各种功能方法或数据处理。示例性地,可以执行上述图2、图3、图5、图6或图9所示方法。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或输出免提通话的音频信号。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100 根据压力传感器180A检测该触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。
气压传感器180C用于测量气压。在一些实施例中,电子设备100根据气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。当电子设备为可折叠电子设备,磁传感器180D可以用于检测电子设备的折叠或展开,或折叠角度。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。
环境光传感器180L可以用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否被遮挡,例如电子设备在口袋里。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当通过温度传感器180J检测的温度超过阈值,电子设备100执行降低处理器的性能,以便降低电子设备的功耗以实施热保护。
触摸传感器180K,也称“触控器件”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。
按键190可以包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动 反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或多个SIM卡接口。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图17示出根据本申请一实施例的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图17所示,应用程序包可以包括电话、相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图17所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿,私有知识集等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件,显示图库用户界面(User Interface,UI)的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。作为一个示例,可以用于提供图片查看、搜索入口和搜索结果的显示界面。
电话管理器用于提供终端设备的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例 如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,终端设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
作为另一个示例,图18示出根据本申请一实施例的另一种电子设备的结构示意图,示例性地,该电子设备可以为服务器;如图18所示,该电子设备可以包括:至少一个处理器1601,通信线路1602,存储器1603以及至少一个通信接口1604。
处理器1601可以是一个通用中央处理器,微处理器,特定应用集成电路,或一个或多个用于控制本申请方案程序执行的集成电路;处理器1601也可以包括多个通用处理器的异构运算架构,例如,可以是CPU、GPU、微处理器、DSP、ASIC、FPGA中至少两种的组合;作为一个示例,处理器1601可以是CPU+GPU或者CPU+ASIC或者CPU+FPGA。
通信线路1602可包括一通路,在上述组件之间传送信息。
通信接口1604,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网,RAN,无线局域网(wireless local area networks,WLAN)等。
存储器1603可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路1602与处理器相连接。存储器也可以和处理器集成在一起。本申请实施例提供的存储器通常可以具有非易失性。其中,存储器1603用于存储执行本申请方案的计算机执行指令,并由处理器1601来控制执行。处理器1601用于执行存储器1603中存储的计算机执行指令,从而实现本申请上述实施例中提供的方 法;示例性地,可以实现上述图11或图12所示方法的各步骤。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
示例性地，处理器1601可以包括一个或多个CPU，例如，图18中的CPU0；处理器1601也可以包括一个CPU，及GPU、ASIC、FPGA中任一个，例如，图18中的CPU0+GPU0或者CPU0+ASIC0或者CPU0+FPGA0。在一些示例中，还可以包括NPU，GPU及NPU可以提供机器学习、神经网络算子所需的算力。
示例性地,电子设备可以包括多个处理器,例如图18中的处理器1601和处理器1607。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器,或者是包括多个通用处理器的异构运算架构。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,电子设备还可以包括输出设备1605和输入设备1606。输出设备1605和处理器1601通信,可以以多种方式来显示信息。例如,输出设备1605可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等,例如,可以为车载HUD、AR-HUD、显示器等显示设备。输入设备1606和处理器1601通信,可以以多种方式接收用户的输入。例如,输入设备1606可以是鼠标、键盘、触摸屏设备或传感设备等。
本申请的实施例提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述实施例中的方法。示例性地,可以实现上述图2、图3、图5、图6或图9中所示方法的各步骤或者执行上述图11或图12所示方法的各步骤。
本申请的实施例提供了一种计算机程序产品,例如,可以包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质;当所述计算机程序产品在计算机上运行时,使得所述计算机执行上述实施例中的方法。示例性地,可以执行上述图2、图3、图5、图6或图9中所示方法的各步骤或者执行上述图11或图12所示方法的各步骤。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、静态随机存取存储器(SRAM)、便携式压缩盘只读存储器(CD-ROM)、数字多功能盘(DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备，或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令，并转发该计算机可读程序指令，以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本申请操作的计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(FPGA)或可编程逻辑阵列(PLA),该电子电路可以执行计算机可读程序指令,从而实现本申请的各个方面。
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本申请的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本申请的各实施例，上述说明是示例性的，并非穷尽性的，并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下，对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择，旨在最好地解释各实施例的原理、实际应用或对市场中的技术改进，或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (30)

  1. 一种搜索方法,其特征在于,所述方法包括:
    接收用户的第一搜索请求,所述第一搜索请求中包括私有实体;其中,私有实体表示与所述用户具有关联关系的实体;
    在至少一个私有实体对应的特征中,确定所述第一搜索请求中私有实体对应的特征;其中,所述至少一个私有实体对应的特征指示所述至少一个私有实体对应的数据,所述至少一个私有实体对应的数据与所述第一搜索请求对应于不同的模态;
    根据所述第一搜索请求及所述第一搜索请求中私有实体对应的特征,得到搜索结果;所述搜索结果与所述第一搜索请求对应于不同的模态;
    显示所述搜索结果。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述第一搜索请求及所述第一搜索请求中私有实体对应的特征,得到搜索结果,包括:
    通过模型对所述第一搜索请求及所述第一搜索请求中私有实体对应的特征进行处理,生成融合特征;所述融合特征指示所述第一搜索请求;
    将数据库中与所述融合特征相匹配的特征对应的数据作为所述搜索结果。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    获取第一私有实体对应的数据;所述第一私有实体为所述至少一个私有实体中的任一私有实体;
    对所述第一私有实体对应的数据进行处理,生成所述第一私有实体对应的特征。
  4. 根据权利要求3所述的方法,其特征在于,所述获取第一私有实体对应的数据,包括:
    向用户发出第一提示信息,所述第一提示信息用于提示用户选取包含所述第一私有实体的数据;
    响应于用户的选取操作,获取所述第一私有实体对应的数据。
  5. 根据权利要求3所述的方法,其特征在于,所述获取第一私有实体对应的数据,包括:
    获取包含所述第一私有实体的初始数据;
    响应于用户的第二搜索请求,显示至少一个搜索结果;所述第二搜索请求包含所述第一私有实体,所述至少一个搜索结果与所述第二搜索请求对应于不同的模态;所述至少一个搜索结果包括所述初始数据;
    基于用户在所述至少一个搜索结果中选取所述初始数据的操作,将所述初始数据作为所述第一私有实体对应的数据。
  6. 根据权利要求3-5中任一项所述的方法,其特征在于,所述方法还包括:
    获取包含所述第一私有实体及第二私有实体的数据；其中，所述第二私有实体为所述至少一个私有实体中除所述第一私有实体之外的任一私有实体；
    根据所述包含所述第一私有实体及第二私有实体的数据,及所述第一私有实体对应的特征,得到所述第二私有实体对应的特征。
  7. 根据权利要求1-6中任一项所述的方法,其特征在于,所述私有实体包括:与所述用户相关联的称谓、人名或昵称。
  8. 一种搜索方法,其特征在于,所述方法包括:
    显示第一显示界面;所述第一显示界面包括搜索入口;
    响应于用户在所述搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于所述关键词的第一搜索结果;其中,所述关键词中包括私有实体;所述第一搜索结果根据所述关键词及所述私有实体对应的特征得到;所述第一搜索结果与所述关键词对应于不同的模态。
  9. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    显示第三显示界面,所述第三显示界面包括第一提示信息及至少一个数据;所述第一提示信息用于提示用户选取包含第一私有实体的数据;
    响应于用户在所述至少一个数据中的选取操作,显示第四显示界面,所述第四显示界面包括第一数据及第一标识,所述第一标识用于指示所述第一数据为用户所选取的数据。
  10. 根据权利要求8或9所述的方法,其特征在于,所述第二显示界面还包括:第二提示信息,所述第二提示信息用于提示用户确认所述第一搜索结果是否为用户期望的搜索结果;
    所述方法还包括:响应于用户的确认操作,显示第五显示界面,所述第五显示界面包括第一搜索结果及第二标识,所述第二标识用于指示所述第一搜索结果为用户所确认的搜索结果。
  11. 一种模型训练方法,其特征在于,所述方法包括:
    获取多模态样本集及第一模型;其中,所述多模态样本集包括:第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本;所述第一私有实体表示与用户具有关联关系的实体,所述第一搜索请求包含所述第一私有实体,所述第二搜索请求不包含所述第一私有实体;
    利用所述多模态样本集对所述第一模型进行训练,得到第二模型。
  12. 根据权利要求11所述的方法,其特征在于,所述利用所述多模态样本集对所述第一模型进行训练,得到第二模型,包括:
    通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征;
    根据所述融合特征、所述第一搜索请求对应的特征及所述第二搜索请求对应的特征，对所述第一模型进行训练，得到所述第二模型；其中，所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的多模态样本进行处理得到，所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的多模态样本进行处理得到。
  13. 根据权利要求12所述的方法,其特征在于,所述通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征,包括:
    将所述第一搜索请求对应的多模态样本输入到所述第一模型中,生成第一特征;
    将所述第一私有实体对应的多模态样本输入到所述第一模型中,生成第二特征;
    对所述第一特征及所述第二特征进行融合,得到所述融合特征。
  14. 根据权利要求12或13所述的方法,其特征在于,所述多模态样本包括第一模态的样本及第二模态的样本;所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的第一模态的样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的第一模态的样本进行处理得到;
    所述通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征,包括:
    将所述第一私有实体对应的第一模态的样本及所述第一搜索请求对应的第二模态的样本输入到所述第一模型中,得到所述融合特征。
  15. 一种搜索装置,其特征在于,所述装置包括:
    接收模块,用于接收用户的第一搜索请求,所述第一搜索请求中包括私有实体;其中,私有实体表示与所述用户具有关联关系的实体;
    确定模块,用于在至少一个私有实体对应的特征中,确定所述第一搜索请求中私有实体对应的特征;其中,所述至少一个私有实体对应的特征指示所述至少一个私有实体对应的数据,所述至少一个私有实体对应的数据与所述第一搜索请求对应于不同的模态;
    搜索模块,用于根据所述第一搜索请求及所述第一搜索请求中私有实体对应的特征,得到搜索结果;所述搜索结果与所述第一搜索请求对应于不同的模态;
    显示模块,用于显示所述搜索结果。
  16. 根据权利要求15所述的装置,其特征在于,所述搜索模块,还用于:通过模型对所述第一搜索请求及所述第一搜索请求中私有实体对应的特征进行处理,生成融合特征;所述融合特征指示所述第一搜索请求;将数据库中与所述融合特征相匹配的特征对应的数据作为所述搜索结果。
  17. 根据权利要求15或16所述的装置,其特征在于,所述装置还包括:生成模块,用于获取第一私有实体对应的数据;所述第一私有实体为所述至少一个私有实体中的任一私有实体;对所述第一私有实体对应的数据进行处理,生成所述第一私有实体对应的特征。
  18. 根据权利要求17所述的装置,其特征在于,所述生成模块,还用于:向用户发出第一提示信息,所述第一提示信息用于提示用户选取包含所述第一私有实体的数据;响应于用户的选取操作,获取所述第一私有实体对应的数据。
  19. 根据权利要求17所述的装置,其特征在于,所述生成模块,还用于:获取包含所述第一私有实体的初始数据;响应于用户的第二搜索请求,显示至少一个搜索结果;所述第二搜索请求包含所述第一私有实体,所述至少一个搜索结果与所述第二搜索请求对应于不同的模态;所述至少一个搜索结果包括所述初始数据;基于用户在所述至少一个搜索结果中选取所述初始数据的操作,将所述初始数据作为所述第一私有实体对应的数据。
  20. 根据权利要求17-19中任一项所述的装置,其特征在于,所述生成模块,还用于:获取包含所述第一私有实体及第二私有实体的数据;其中,所述第二私有实体为所述至少一个私有实体中除所述第一私有实体之外的任一私有实体;根据所述包含所述第一私有实体及第二私有实体的数据,及所述第一私有实体对应的特征,得到所述第二私有实体对应的特征。
  21. 根据权利要求15-20中任一项所述的装置,其特征在于,所述私有实体包括:与用户相关联的称谓、人名或昵称。
  22. 一种搜索装置,其特征在于,所述装置包括:
    第一显示模块,用于显示第一显示界面;所述第一显示界面包括搜索入口;
    第二显示模块,用于响应于用户在所述搜索入口输入关键词的操作,显示第二显示界面,所述第二显示界面包括对应于所述关键词的搜索结果;其中,所述关键词中包括私有实体;所述搜索结果根据所述关键词及所述私有实体对应的特征得到;所述搜索结果与所述关键词对应于不同的模态。
  23. 根据权利要求22所述的装置,其特征在于,所述装置还包括:第三显示模块,用于显示第三显示界面,所述第三显示界面包括第一提示信息及至少一个数据;所述第一提示信息用于提示用户选取包含第一私有实体的数据;第四显示模块,用于响应于用户在所述至少一个数据中的选取操作,显示第四显示界面,所述第四显示界面包括第一数据及第一标识,所述第一标识用于指示所述第一数据为用户所选取的数据。
  24. 根据权利要求22或23所述的装置,其特征在于,所述第二显示界面还包括:第二提示信息,所述第二提示信息用于提示用户确认所述第一搜索结果是否为用户期望的搜索结果;所述装置还包括:第五显示模块,用于响应于用户的确认操作,显示第五显示界面,所述第五显示界面包括第一搜索结果及第二标识,所述第二标识用于指示所述第一搜索结果为用户所确认的搜索结果。
  25. 一种模型训练装置，其特征在于，所述装置包括：获取模块，用于获取多模态样本集及第一模型；其中，所述多模态样本集包括：第一私有实体对应的多模态样本、第一搜索请求对应的多模态样本、第二搜索请求对应的多模态样本；所述第一私有实体表示与用户具有关联关系的实体，所述第一搜索请求包含所述第一私有实体，所述第二搜索请求不包含所述第一私有实体；训练模块，用于利用所述多模态样本集对所述第一模型进行训练，得到第二模型。
  26. 根据权利要求25所述的装置,其特征在于,所述训练模块,还用于:通过所述第一模型对所述第一私有实体对应的多模态样本及所述第一搜索请求对应的多模态样本进行处理,得到融合特征;根据所述融合特征、所述第一搜索请求对应的特征及所述第二搜索请求对应的特征,对所述第一模型进行训练,得到所述第二模型;其中,所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的多模态样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的多模态样本进行处理得到。
  27. 根据权利要求26所述的装置,其特征在于,所述训练模块,还用于:将所述第一搜索请求对应的多模态样本输入到所述第一模型中,生成第一特征;将所述第一私有实体对应的多模态样本输入到所述第一模型中,生成第二特征;对所述第一特征及所述第二特征进行融合,得到所述融合特征。
  28. 根据权利要求26或27所述的装置,其特征在于,所述多模态样本包括第一模态的样本及第二模态的样本;所述第一搜索请求对应的特征由所述第一模型对所述第一搜索请求对应的第一模态的样本进行处理得到,所述第二搜索请求对应的特征由所述第一模型对所述第二搜索请求对应的第一模态的样本进行处理得到;所述训练模块,还用于:将所述第一私有实体对应的第一模态的样本及所述第一搜索请求对应的第二模态的样本输入到所述第一模型中,得到所述融合特征。
  29. 一种电子设备,其特征在于,包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令时实现权利要求1-7任意一项所述的方法、实现权利要求8-10所述的方法、或者实现权利要求11-14任意一项所述的方法。
  30. 一种非易失性计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1-7任意一项所述的方法、实现权利要求8-10所述的方法、或者实现权利要求11-14任意一项所述的方法。