WO2019169872A1 - Method and device for searching for a content resource, and server - Google Patents

Method and device for searching for a content resource, and server

Info

Publication number
WO2019169872A1
WO2019169872A1 PCT/CN2018/111433
Authority
WO
WIPO (PCT)
Prior art keywords
similarity
visual
content resource
text
preset
Prior art date
Application number
PCT/CN2018/111433
Other languages
English (en)
Chinese (zh)
Inventor
董维山
王园
毛妤
袁洁
陈曼仪
杨茗名
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2019169872A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866: Retrieval using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/53: Querying
    • G06F16/532: Query formulation, e.g. graphical querying
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Definitions

  • the present application relates to the field of computer network technologies, and in particular, to a method, an apparatus, and a server for searching for content resources.
  • TF-IDF: term frequency-inverse document frequency (a word frequency weighting)
  • word2vec: word vector model
  • the embodiment of the present application provides a method, an apparatus, and a server for searching for a content resource, to solve or alleviate one or more technical problems in the prior art, and at least provide a beneficial choice.
  • the embodiment of the present application provides a method for searching for a content resource, including:
  • the target content resource is determined from the at least one candidate content resource based on the determined text similarity and visual similarity.
  • the present application determines, according to the determined text similarity and visual similarity, the target content resource from the at least one candidate content resource, including:
  • the target content resource is determined from the at least one candidate content resource based on the obtained overall similarity.
  • the obtaining of the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity includes:
  • the text feature of the preset picture is extracted, including:
  • with reference to the first, second, or third embodiment of the first aspect, in a fourth embodiment of the first aspect of the present application, determining the text similarity between the text feature of the preset picture and the text feature of the candidate content resource includes:
  • the content repository includes a plurality of content resources and their corresponding text labels.
  • in a fifth embodiment of the first aspect of the present application, determining the visual similarity between the visual feature of the preset picture and the visual feature of the candidate content resource includes:
  • the sampling of the candidate content resource includes:
  • the candidate content resources are subjected to perspective sampling in a preset observation manner and a sampling manner;
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, and a combination thereof.
  • an embodiment of the present application provides an apparatus for searching for a content resource, including:
  • An acquiring module configured to acquire a preset picture and at least one candidate content resource, and separately extract text features and visual features of the preset picture and text features and visual features of the candidate content resource;
  • a first determining module configured to determine a text similarity between a text feature of the preset picture and a text feature of the candidate content resource
  • a second determining module configured to determine a visual similarity between a visual feature of the preset picture and a visual feature of the candidate content resource
  • the target determining module is configured to determine the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  • the target determining module includes:
  • a first calculation submodule configured to obtain an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity
  • the target determining submodule is configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
  • the first computing submodule is further configured to:
  • the acquiring module includes:
  • An identifier module configured to identify a picture content included in the preset picture by using a preset picture classification model to extract a text feature from the preset picture;
  • the extracting sub-module is configured to obtain a corresponding webpage content according to the uniform resource locator of the preset image, and extract a text feature of the preset image from the webpage content.
  • the first determining module includes:
  • a first determining submodule configured to obtain a content resource in the preset content resource library as the at least one candidate content resource, and determine the text similarity between the text feature of the preset image and the text label of the candidate content resource
  • the preset content resource library includes a plurality of content resources and corresponding text labels.
  • the second determining module includes:
  • sampling sub-module configured to sample the candidate content resource to obtain at least one sample picture corresponding to the content resource
  • a second determining submodule configured to determine a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture for each sampled picture
  • a third determining submodule configured to determine the visual similarity between the visual feature of the candidate content resource and the visual feature of the preset picture, according to the visual similarity between the visual feature of each sampled picture corresponding to the candidate content resource and the visual feature of the preset picture.
  • the sampling submodule is further configured to:
  • the candidate content resources are subjected to perspective sampling in a preset observation manner and a sampling manner;
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, and a combination thereof.
  • an embodiment of the present application provides a server, where the server includes:
  • one or more processors;
  • a storage device for storing one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described above.
  • an embodiment of the present application provides a computer readable storage medium configured to store computer software instructions for the device for searching for a content resource, the instructions including the program involved in performing the method for searching for a content resource in the foregoing first aspect.
  • based on the text feature and the visual feature of the preset picture, the text similarity between the text feature of the preset picture and the text feature of each content resource may be determined, as well as the visual similarity between the visual feature of the preset picture and the visual features of each content resource; the target content resource is then determined from the content resources according to the determined text similarity and visual similarity. Since text similarity and visual similarity are combined in the search process, the desired content resources can be searched for accurately.
  • FIG. 1 is a flowchart of a method for searching for content resources according to Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a method for searching for content resources according to Embodiment 2 of the present application
  • FIG. 3 is a schematic diagram of performing perspective sampling on content resources in a method for searching for content resources according to Embodiment 2 of the present application;
  • FIG. 4 is a schematic diagram of a comparison of visual features of a preset picture and a content resource in a method for searching for a content resource according to Embodiment 2 of the present application;
  • FIG. 5 is a flowchart of a method for searching for content resources according to Embodiment 3 of the present application.
  • FIG. 6 is a schematic diagram of an apparatus for searching for content resources according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic diagram of an apparatus for searching for content resources according to Embodiment 5 of the present application.
  • FIG. 8 is a schematic diagram of a server according to Embodiment 6 of the present application.
  • FIG. 1 is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes the following steps:
  • S101 Acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource.
  • the preset picture may include, but is not limited to, a network picture, a picture stored in an album, a picture taken by a camera, or a hand-drawn sketch.
  • the embodiments of the present application may receive a picture sent by a client as the preset picture through various API (Application Programming Interface) interfaces based on protocols such as HTTP (HyperText Transfer Protocol) and HTTPS.
  • alternatively, the preset image may be obtained from the webpage address of the image input by the user.
  • the method for obtaining the text feature of the preset picture may be: analyzing the preset picture to obtain one or more short texts capable of describing or representing the picture content, thereby converting the query condition in the form of the picture into the query condition in the text form.
  • the specific analysis method may include: constructing a picture classifier by using a machine learning algorithm, and then inputting a preset picture into a picture classifier for analysis.
  • the picture classifier can be used to analyze the preset picture to obtain the picture content and output the description text of the preset picture. For example, if a picture of a Tyrannosaurus is input into the picture classifier, the text "T-Rex" can be output.
  • Content resources related to embodiments of the present application include, but are not limited to, normal video, panoramic picture, panoramic video, three-dimensional (3D) model, three-dimensional animation, and their presentation in virtual reality (VR) and augmented reality (AR) scenarios.
  • VR virtual reality
  • AR augmented reality
  • a panoramic photo (panorama) is a photo covering the normal effective viewing angle of the human eyes (approximately 90 degrees horizontally, 70 degrees vertically), the binocular peripheral-vision angle (approximately 180 degrees horizontally, 90 degrees vertically), or even the complete 360-degree scene range.
  • the content resource may be obtained by a web crawler from the Internet or by a content producer.
  • web crawler technology can be used to obtain content resources.
  • Content resource producers can also create content resources and build them into content repositories.
  • Content resources can be tagged with text to facilitate classification, management, and retrieval.
  • the content repository can be updated at preset intervals. In this way, when searching for content resources, you can search in the content resource library to improve search efficiency.
  • the text features of the preset picture and the text features of each content resource in the content resource library may be compared one by one to determine the text similarity between the preset picture and each content resource.
  • suppose the text feature of the preset picture is "Tiananmen". If a content resource is a panorama whose text feature is "Tiananmen Square", comparing the "Tiananmen" of the preset picture with the "Tiananmen Square" of the content resource shows that the text similarity is high. If the text feature of another content resource is "Nanjing", comparing "Tiananmen" with "Nanjing" shows that the text similarity between the two is low.
  • the visual feature may be attribute data that represents semantics of the image, such as color, texture, and the like of the image.
  • in step S101 of the embodiment of the present application, the text feature and the visual feature of the preset picture may be extracted simultaneously or separately.
  • the order of extracting the text features and the visual features of the preset picture is not limited.
  • step S102 may be performed first to determine the text similarity and then step S103 to determine the visual similarity; alternatively, step S103 may be performed first and then step S102.
  • the extraction of these two features and the corresponding process of determining the similarity can also be performed in parallel.
  • the technical solution provided by the embodiment of the present application may determine, from the text feature and the visual feature of the preset image, the text similarity between the text feature of the preset image and the text feature of each content resource, and the visual similarity between the visual feature of the preset image and the visual features of each content resource, and then determine the target content resource from the content resources according to the determined text similarity and visual similarity. Since text similarity and visual similarity are combined in the search process, the required content resources can be searched for accurately; the solution is suitable for searching various content resources such as panoramic pictures, panoramic videos, three-dimensional models, three-dimensional animations, and their presentation in virtual reality and augmented reality scenarios.
  • FIG. 2 is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes the following steps:
  • S201 Acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource.
  • the preset picture classification model may be used to identify the picture content of the preset picture, and the text feature is extracted from the preset picture.
  • the image classification model can be trained for a vertical domain based on a convolutional neural network algorithm; for example, an image of a Tyrannosaurus can be input to the image classification model, and the model outputs the text classification label "Tyrannosaurus Rex".
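A minimal sketch of the interface such a classification model exposes. The nearest-centroid rule, the label names, and the two-dimensional feature vectors below are invented for illustration; the application itself uses a convolutional neural network, not this stand-in:

```python
import math

def classify_picture(feature_vec, label_centroids):
    """Toy stand-in for the picture classification model: return the text
    label whose (hypothetical) centroid feature is closest to the input
    picture feature vector."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(label_centroids, key=lambda label: dist(feature_vec, label_centroids[label]))

# Hypothetical centroids for two vertical-domain labels.
centroids = {
    "Tyrannosaurus Rex": [0.9, 0.1],
    "Tiananmen": [0.1, 0.8],
}
```

In practice the mapping from picture to label text would come from the trained network's softmax output rather than a distance rule.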
  • the text feature of the preset picture may also be extracted by acquiring the corresponding webpage content according to the Uniform Resource Locator (URL) of the preset image, and extracting the text feature of the preset image from that webpage content. For example, when the preset picture comes from the Internet, or when a web page contains the same picture as the preset picture, the URL of the preset picture (or of the identical picture) can be obtained, and the content of the webpage indicated by the URL is processed to extract the text feature of the preset image. For example, given a preset picture of Tiananmen and its web address, the webpage content is processed to generate and output the short text "Tiananmen".
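A sketch of the URL-based path using only the standard library's `html.parser`. It extracts just the page `<title>` as one candidate short text; the application describes a fuller aggregation over possibly many source pages, and the HTTP fetch of the page is omitted here:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the page's <title> tag."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def text_feature_from_page(html):
    """Return the page title as a short text feature for the picture."""
    p = TitleExtractor()
    p.feed(html)
    return p.title.strip()
```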
  • the content resource in the preset content resource library is obtained as the at least one candidate content resource, and the text similarity between the text feature of the preset image and the text label of the candidate content resource is determined.
  • a text label for a content resource can be set as it is generated.
  • it can be implemented using a "keyword text similarity calculation" module based on natural language processing technology commonly used by web search engines.
  • the following example shows the keyword text similarity calculation process: given two keywords (both short texts), a text similarity calculation model is adopted; the model is constructed from user keyword data and click log data and has been pre-trained offline (for example, based on a neural network or a bag-of-words model). The semantic similarity of the two keywords is scored by this model: the higher the score, the closer the two keywords are semantically, and vice versa.
  • the output score value ranges from [-1, 1].
  • if the similarity score of "Tiananmen" and "Tiananmen Square" is set to s, the value of s should be close to 1, and the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
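The pre-trained model itself is not specified; as a crude stand-in, a bag-of-words cosine similarity illustrates the contract of the scoring step (note its range is [0, 1] for word-count vectors, whereas the text's model scores in [-1, 1]):

```python
import math
from collections import Counter

def keyword_similarity(a, b):
    """Bag-of-words cosine similarity between two short keyword texts,
    a toy substitute for the offline pre-trained similarity model."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```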
  • the content resources may be sampled in the visual space in a preset observation mode and a sampling manner.
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, and a combination thereof.
  • the sampling manner may include equal interval sampling, random sampling, sampling based on user interaction history record distribution, and the like.
  • the user's observation mode can be simulated to sample perspectives over the entire visible space: the content resource is planarly projected at a simulated observation point to obtain the corresponding sampled picture, and the simulated observation point is then adjusted, i.e., the viewing position is changed.
  • the perspective sampling method is common to all types of content resources.
  • the sampling interval also needs to trade off factors such as computation amount, storage space, precision, and recall.
  • suppose the content resource is a 3D model of a Tyrannosaurus Rex; using different observation modes, for example rotating by a certain plane angle and sampling the 3D model once per step, sampled pictures 1, 2, 3, ..., n are obtained.
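Equal-interval perspective sampling can be sketched as enumerating simulated observation directions; the yaw/pitch parameterization and step sizes below are assumptions, and each returned direction would drive one planar projection of the panorama or 3D model into a sampled picture:

```python
def sample_viewpoints(yaw_step_deg, pitch_step_deg):
    """Enumerate simulated observation directions over the full visible
    space: yaw sweeps 0..360 degrees, pitch sweeps -90..90 degrees,
    both at equal intervals."""
    views = []
    for yaw in range(0, 360, yaw_step_deg):
        for pitch in range(-90, 91, pitch_step_deg):
            views.append((yaw, pitch))
    return views
```

Finer steps raise accuracy and recall at the cost of computation and storage, which is the trade-off the text describes.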
  • S204 Determine, for each sample picture, a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture.
  • a picture feature extractor is used to extract a visual feature of a preset picture.
  • This step can be implemented by a visual feature-based "similar map retrieval" module commonly used by image search engines in the conventional technology.
  • the process of similarity map retrieval includes: taking a preset picture, and adopting a pre-trained or pre-trained picture feature extractor (for example, based on a convolutional neural network or the like) to perform visual feature extraction on the preset picture.
  • the above method can also be used to extract the visual features of the sampled picture.
  • the visual features of the preset picture are compared with the visual features of each sampled picture to obtain the visual similarity between the preset picture and each sampled picture.
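A toy sketch of this comparison. The real feature extractor is CNN-based; here a per-channel color histogram stands in for it, and cosine similarity stands in for the feature comparison, both assumptions for illustration:

```python
import math

def color_histogram(pixels, bins=4):
    """Toy visual feature: per-channel intensity histogram of a list of
    RGB pixels (values 0-255), flattened into one vector."""
    hist = [0] * (3 * bins)
    for r, g, b in pixels:
        for ch, v in enumerate((r, g, b)):
            hist[ch * bins + min(v * bins // 256, bins - 1)] += 1
    return hist

def visual_similarity(feat_a, feat_b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    na = math.sqrt(sum(x * x for x in feat_a))
    nb = math.sqrt(sum(x * x for x in feat_b))
    return dot / (na * nb) if na and nb else 0.0
```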
  • S205 Determine, according to a visual similarity between a visual feature of each sample picture corresponding to the candidate content resource and a visual feature of the preset picture, a visual feature of the candidate content resource and a visual of the preset picture. Visual similarity between features.
  • in step S205, multiple sampled pictures are obtained for the candidate content resource. The preset picture is compared with each sampled picture of the candidate content resource, the visual similarity between the preset picture and each sampled picture is calculated, and the visual similarities for all sampled pictures are output.
  • the visual similarity between the visual feature of the preset picture and the visual feature of each sample picture of the candidate content resource is determined.
  • a higher visual similarity indicates that the preset picture is closer to the sampled picture in visual semantics, and vice versa.
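The per-sampled-picture scores must then be folded into one score for the whole candidate resource. The text does not mandate an aggregation rule; taking the maximum (the best-matching view) is one plausible choice, sketched here:

```python
def resource_visual_similarity(per_sample_scores):
    """Fold the visual similarities of a resource's sampled pictures into
    a single resource-level score. Using max (best-matching view) is an
    assumption, not mandated by the application."""
    return max(per_sample_scores) if per_sample_scores else 0.0
```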
  • Step S206 includes: A, filtering the determined text similarity and visual similarity based on preset thresholds; B, obtaining the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
  • methods for calculating the overall similarity include, but are not limited to, linear weighting, product, value domain normalization, and the like.
  • taking the linear weighting method as an example: suppose a content resource corresponds to one text feature, so the preset image has one text similarity value with respect to that content resource; the content resource may be sampled to obtain a plurality of sampled pictures, yielding a corresponding plurality of visual similarities.
  • the text similarity is taken as one term of the formula and each visual similarity as another term; each term is multiplied by its corresponding weight, and the weighted terms are summed to obtain the overall similarity:
  • Q = a·S0 + b·S1 + c·S2 + ... + n·Sn
  • where Q represents the overall similarity; a, b, c, ..., n are the weights; S0 represents the text similarity; and S1, S2, ..., Sn represent the visual similarities of the sampled pictures.
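The linear weighting can be sketched directly; the function name and argument layout are illustrative:

```python
def overall_similarity(text_sim, visual_sims, text_weight, visual_weights):
    """Linear weighting of similarities: Q = a*S0 + b*S1 + c*S2 + ...,
    where S0 is the text similarity and S1..Sn are the visual
    similarities of the sampled pictures, each with its own weight."""
    assert len(visual_sims) == len(visual_weights)
    return text_weight * text_sim + sum(
        w * s for w, s in zip(visual_weights, visual_sims))
```

The text notes that product and value-range normalization are equally valid combination rules; only the weighted sum is shown here.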
  • Determining the target content resource can be done in the following ways:
  • a content resource whose overall similarity is greater than the first preset threshold is taken as the target content resource. For example, if the first preset threshold is 80%, then among all searched content resources, those whose overall similarity is greater than 80% are output.
  • the searched content resources may be sorted according to the overall similarity size, and the content resources ranked in the first few bits are determined as the target content resources.
  • for example, the content resources are sorted in descending order of overall similarity using a ranking function rank, and the first five content resources are retained.
  • the text similarity and/or the visual similarity may be separately filtered according to the value.
  • the content resource whose text similarity is smaller than the first preset threshold is filtered; and the content resource whose visual similarity is less than the second preset threshold is filtered. This can reduce the amount of content resources that need to calculate the overall similarity, thereby reducing the amount of calculation and improving the calculation efficiency.
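The threshold pre-filtering followed by top-n ranking described above can be sketched as follows; the candidate tuple layout and names are assumptions made for illustration:

```python
def pick_targets(candidates, text_thresh, visual_thresh, top_n):
    """Pre-filter candidates by text and visual similarity thresholds,
    then rank the survivors by overall similarity and keep the top n.
    Each candidate is (resource_id, text_sim, visual_sim, overall_sim)."""
    kept = [c for c in candidates
            if c[1] >= text_thresh and c[2] >= visual_thresh]
    kept.sort(key=lambda c: c[3], reverse=True)
    return [c[0] for c in kept[:top_n]]
```

Filtering first means the overall similarity only needs to be trusted for candidates that already pass both per-modality thresholds, which is the computation saving the text points out.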
  • the preset picture is a certain scenic spot picture
  • the candidate content resource is a panoramic picture.
  • the panoramic picture is sampled to obtain a plurality of sampled pictures, and the scenic-spot picture is then compared with each sampled picture. If the two match, the panoramic picture corresponding to the sampled picture is a content resource matching the scenic-spot picture, and that panoramic picture is output.
  • each content resource in the content resource library may be used as a candidate content resource; sometimes, to reduce the amount of calculation, only content resources of a certain category are used as candidates. The above steps are then repeated to obtain the overall similarity between the preset image and each candidate content resource, and the candidates are sorted by overall similarity: the higher the overall similarity value, the more similar the preset image and the content resource.
  • each content resource has its corresponding identifier.
  • when the searched content resource is output, the content resource itself may not be output directly; instead, the identifier (ID) of the corresponding content resource is output.
  • the content resources are sorted according to the overall similarity size, and then the IDs of the first n content resources are output.
  • the content resource storage address can be obtained from the ID; the user then selects the multimedia file to be displayed from the candidate multimedia file list through some interaction mode (for example, selecting it in a browser interface or a remote player), and that file is the target content resource.
  • the technical solution of the embodiment of the present application samples the content resources using different observation modes and sampling intervals, so that the visual features of the preset image can be comprehensively matched against the visual features of the content resources, making the search for content resources more accurate.
  • the embodiment of the present application provides a method for searching for content resources.
  • FIG. 5 it is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes:
  • 1 Preset picture: the initial input of the system, generated by the browser client, containing the picture content and the picture's URL.
  • the form of the picture is not limited, and can be a picture file uploaded by the user, a picture taken by the camera or a hand-drawn sketch.
  • 2 Network interface: receives and parses the preset picture sent by the client, and returns the search results for content resources to the client.
  • Possible implementations include, but are not limited to, various types of API interface definitions based on protocols such as HTTP and HTTPS.
  • 3 Picture guess word: the input is the preset picture passed in by the network interface, and the output is one or more short text segments that can describe or represent the content of the picture.
  • the role of the picture guess word is to convert the preset picture in picture form into keywords in text form.
  • mapping functions include:
  • this module matches identical pictures, or uses the URL information, to aggregate the text information on the picture's source pages on the Internet (where a picture has been reposted, there may be multiple source pages), and extracts and generates text output from it.
  • the module outputs the short text "Tiananmen".
  • the picture content is identified, and the classification label text is output. For example, given a picture of a Tyrannosaurus Rex as input, the module outputs the short text "Tyrannosaurus Rex".
  • 4 Text similarity calculation: the input is the output of 3 (the picture guess word result) together with the set of text labels carried by each resource in the content resource library. This step performs pairwise matching between the guess word result text and all content resource text labels to obtain multiple text matching pairs, calculates the text similarity within each matching pair, and outputs the text similarity scores.
  • This step can be implemented by a "query text similarity calculation” module based on natural language processing technology commonly used by web search engines.
  • the text similarity calculation works as follows: given two keywords (both short texts), a text similarity calculation model pre-trained offline on user query data and click logs (e.g. based on a neural network or a bag-of-words model) scores the semantic similarity of the two keywords. The higher the score, the closer the two keywords are semantically, and vice versa.
  • the output score value ranges from [-1, 1].
  • if the similarity score of "Tiananmen" and "Tiananmen Square" is set to s, the value of s should be close to 1, and the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
  • 5 Content resource library: a collection of various resources, provided by search engine crawlers or by content producers; each resource carries a text label for classification, management, and retrieval.
  • 6 Perspective sampling: the input is any resource in the content library, and the output is several sampled pictures.
  • by simulating and varying the user's observation mode (including but not limited to observation position, angle, and visible range), perspectives can be sampled over the entire visible space to obtain multiple pictures, each of which is a planar projection of the content at the simulated observation point. Perspective sampling can be used for panoramic/3D/AR/VR content.
  • the sampling intervals of observation position, angle, and visible range can be traded off against computation amount, storage space, precision, and recall; for panoramic video and 3D animation containing animated content, frame sampling on the time axis is additionally performed to generate the output sampled pictures, and the sampling time interval is likewise a trade-off between computation amount, storage space, precision, and recall.
  • Typical sampling techniques include, but are not limited to, equally spaced sampling, random sampling, sampling based on user interaction history distribution, and the like.
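A minimal sketch of equally spaced sampling, the first technique listed above, applied to viewing angles of a panorama. The (yaw, pitch) parameterization and the step sizes are illustrative assumptions; each returned pair would drive one plane projection of the content toward the simulated observation point.

```python
# Illustrative equally spaced viewing-angle sampling for panoramic content:
# enumerate (yaw, pitch) observation angles at fixed intervals, covering the
# full 360-degree yaw circle and a limited pitch range.
def sample_view_angles(yaw_step_deg: int = 60, pitch_step_deg: int = 30,
                       pitch_range_deg: int = 60):
    """Return (yaw, pitch) pairs covering the visible space at fixed steps."""
    angles = []
    for yaw in range(0, 360, yaw_step_deg):
        for pitch in range(-pitch_range_deg, pitch_range_deg + 1,
                           pitch_step_deg):
            angles.append((yaw, pitch))
    return angles

views = sample_view_angles()
# 6 yaw positions x 5 pitch positions = 30 sampled viewing angles
```

The step sizes embody the trade-off the passage mentions: smaller steps raise accuracy and recall at the cost of computation and storage.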
  • This step can be implemented by a visual-feature-based "similar picture retrieval" module commonly used by image search engines.
  • The function of similar-picture retrieval is: given a preset picture, use a pre-defined or offline pre-trained picture feature extractor (e.g., based on a convolutional neural network) to extract visual features from the preset picture, compare the extracted features with the features of each picture in the picture library, and score the visual-feature similarity. The higher the score, the smaller the visual difference between the preset picture and the library picture in question, and vice versa.
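The comparison loop of such a module might look like the sketch below, with cosine similarity standing in for the feature comparison and plain Python lists standing in for the output of the pre-trained extractor. All names are hypothetical.

```python
# Sketch of the "similar picture retrieval" scoring step: compare a query
# picture's feature vector against each library picture's features and rank
# by cosine similarity. The feature vectors here are placeholders for the
# output of a feature extractor (e.g., a convolutional network).
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_library(query_feat, library):
    """library: {picture_id: feature_vector}; returns (id, score) pairs
    sorted by descending visual similarity."""
    scored = [(pid, cosine(query_feat, feat)) for pid, feat in library.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

ranked = rank_library([1.0, 0.0], {"p1": [0.9, 0.1], "p2": [0.0, 1.0]})
# "p1" ranks first: its feature vector points in nearly the same direction
```

Real image search systems typically use approximate nearest-neighbor indexes rather than this exhaustive scan; the sketch only shows the scoring semantics.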
  • Overall similarity calculation: the input is the outputs of steps 4 and 6, that is, the text and visual similarity scores of the matching pairs between the preset picture and the resources in the content library; the output is the overall similarities and the corresponding candidate content resource IDs.
  • The overall similarity calculation combines the text and visual similarity scores; possible implementations include, but are not limited to, linear weighting, products, and range normalization. Additional factors may also be considered, including but not limited to content quality assessment indices (high/low quality, resolution, model sophistication, etc.), user history click records, and laws and regulations.
  • The text similarity scores and visual similarity scores entering the calculation may first be filtered separately; for example, similarity scores below a certain threshold can be discarded so that they do not enter the overall similarity calculation, reducing the amount of computation.
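A hedged sketch of this fusion step, using linear weighting (one of the options listed above) with per-channel threshold filtering. The weights and thresholds are illustrative assumptions, not values from the patent.

```python
# Overall similarity as a linear weighting of text and visual scores, with
# low scores filtered out before fusion to reduce computation, as the
# passage suggests. Returns None when a score is filtered.
def overall_similarity(text_score, visual_score,
                       w_text=0.5, w_visual=0.5,
                       text_min=0.2, visual_min=0.2):
    """Return the fused score, or None if either input score is filtered."""
    if text_score < text_min or visual_score < visual_min:
        return None  # candidate never enters the overall calculation
    return w_text * text_score + w_visual * visual_score

score = overall_similarity(0.8, 0.6)
assert score is not None and abs(score - 0.7) < 1e-9
assert overall_similarity(0.1, 0.9) is None  # filtered by the text threshold
```

A product or range-normalized combination would replace the final line of the function; the filtering logic stays the same.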
  • Sorting: the input is the output of step 7, namely the overall similarities and the corresponding candidate content resource IDs; the output is the first k candidate content resource IDs arranged in descending order of overall similarity score.
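The ranking step above can be sketched in a few lines; the names are hypothetical.

```python
# Sort candidate resources by overall similarity in descending order and
# keep the first k IDs, as described in the sorting step.
def top_k_resources(scores, k):
    """scores: {resource_id: overall_similarity}; returns the top-k IDs."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

ids = top_k_resources({"r1": 0.4, "r2": 0.9, "r3": 0.7}, k=2)
# → ["r2", "r3"]
```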
  • The user selects the content resource to be displayed from the candidate content resource list through some interaction in the browser interface, and the browser client displays it.
  • Steps 3-9 can be pre-computed offline, thereby accelerating the online search process.
  • The picture library of the whole web can be processed in advance offline: the similarity score calculation and sorting are performed offline, a static lookup table structure is established, and the pictures in any webpage are associated with content resources.
  • This lookup table can be updated incrementally.
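An illustrative sketch of such a static lookup table with incremental updates: a mapping from a web-page image identifier to its pre-computed, pre-sorted candidate resources. The class structure and names are assumptions for illustration.

```python
# Offline lookup table from image ID to pre-sorted (resource_id, score)
# candidates, supporting the incremental updates the passage mentions.
class LookupTable:
    def __init__(self):
        self._table = {}  # image_id -> [(resource_id, score), ...] desc

    def update(self, image_id, new_scores):
        """Incrementally merge new {resource_id: score} entries for an image,
        keeping the candidate list sorted by descending score."""
        merged = dict(self._table.get(image_id, []))
        merged.update(new_scores)
        self._table[image_id] = sorted(merged.items(),
                                       key=lambda x: x[1], reverse=True)

    def lookup(self, image_id, k):
        """Online path: a constant-time table read followed by a slice."""
        return [rid for rid, _ in self._table.get(image_id, [])[:k]]

t = LookupTable()
t.update("img1", {"r1": 0.3, "r2": 0.8})
t.update("img1", {"r3": 0.5})
# t.lookup("img1", 2) → ["r2", "r3"]
```

Pushing the sorting into the offline `update` path is what makes the online `lookup` cheap, which is the point of the pre-computation described above.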
  • Alternatively, steps 3-9 can be computed online.
  • the above online and offline calculation processes can be accelerated by techniques such as parallel computing.
  • A typical matching result is shown in the embodiment of FIG. 3. It can be seen that matching can be accurate down to a specific viewing angle, the matching precision is high, and the user experience is good.
  • By sampling content resources with different viewing manners and sampling intervals, the technical solution of the embodiment of the present application allows the visual features of the preset picture to be matched comprehensively against the visual features of the content resources, so that the accuracy of searching content resources is higher.
  • FIG. 6 is a schematic diagram of an apparatus for searching for content resources according to an embodiment of the present application.
  • the device for searching for content resources in the embodiment of the present application includes:
  • the obtaining module 61 is configured to acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource;
  • the first determining module 62 is configured to determine a text similarity between the text feature of the preset picture and the text feature of the candidate content resource;
  • a second determining module 63 configured to determine a visual similarity between a visual feature of the preset picture and a visual feature of the candidate content resource
  • the target determining module 64 is configured to determine the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  • The technical solution of the embodiment of the present application combines the text features and visual features of the preset picture and the content resources, so the accuracy of searching content resources is high.
  • The technical effect is the same as that of the first embodiment and will not be repeated here.
  • FIG. 7 is a schematic diagram of an apparatus for searching for content resources according to an embodiment of the present application.
  • the device for searching content resources in the embodiment of the present application includes:
  • the target determination module 64 includes:
  • the first calculation sub-module 641 is configured to obtain an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity;
  • the target determining sub-module 642 is configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
  • the first calculation submodule is further configured to:
  • the obtaining module 61 includes:
  • the identification sub-module 611 is configured to identify a picture content included in the preset picture by using a preset picture classification model to extract a text feature from the preset picture; or
  • the extraction sub-module 612 is configured to obtain a corresponding webpage content according to the uniform resource locator of the preset image, and extract a text feature of the preset image from the webpage content.
  • the first determining module 62 includes:
  • the first determining sub-module 621 is configured to acquire content resources in the preset content resource library as the at least one candidate content resource, and determine the text similarity between the text feature of the preset picture and the text label of the candidate content resource, wherein the preset content resource library includes a plurality of content resources and their corresponding text labels.
  • the second determining module 63 includes:
  • the sampling sub-module 631 is configured to sample the candidate content resource to obtain at least one sample picture corresponding to the content resource;
  • a second determining sub-module 632 configured to determine a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture for each sampled picture
  • the third determining sub-module 633 is configured to determine, according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture, the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture.
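One plausible reading of the third determining sub-module is sketched below: reduce the per-sample visual similarities of a candidate resource to a single resource-level score, here by taking the maximum (the best-matching viewing angle). The max aggregation is an assumption; other reductions (mean, top-n average) would also fit the description.

```python
# Aggregate the visual similarity of each sampled picture (vs. the preset
# picture) into one visual similarity score for the candidate resource.
def resource_visual_similarity(sample_scores):
    """sample_scores: per-sampled-picture visual similarities; the resource
    score is the best single viewing angle (assumed aggregation)."""
    return max(sample_scores) if sample_scores else 0.0

assert resource_visual_similarity([0.2, 0.9, 0.5]) == 0.9
```

Taking the maximum matches the observation in the embodiment that matching "can be accurate to a specific viewing angle": one well-matching sample suffices.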
  • the sampling sub-module is specifically configured to sample the candidate content resource in the visible space according to a preset observation manner and a preset sampling manner;
  • the preset observation manner is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  • The technical solution of the embodiment of the present application can sample content resources with multiple observation manners and sampling manners, so the accuracy of searching content resources is high; the technical effect is the same as that of the second embodiment and will not be repeated here.
  • the embodiment of the present application provides an information classification device.
  • the device includes a memory 81 and a processor 82.
  • the memory 81 stores a computer program executable on the processor 82.
  • When the computer program is executed, the processor 82 implements the information classification method of the above embodiment.
  • the number of memories 81 and processors 82 may be one or more.
  • the device also includes:
  • the communication interface 83 is used for communication between the memory 81 and the processor 82 and an external device.
  • the memory 81 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • If the memory 81, the processor 82, and the communication interface 83 are integrated on one chip, the memory 81, the processor 82, and the communication interface 83 can communicate with each other through an internal interface.
  • a computer readable storage medium storing a computer program that, when executed by a processor, implements the method of any of the embodiments of FIGS. 1-5.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • Features defined with "first" and "second" may explicitly or implicitly include at least one such feature.
  • The meaning of "a plurality" is two or more, unless specifically defined otherwise.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with such an instruction execution system, apparatus, or device.
  • The computer-readable medium described in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as separate products, may also be stored in a computer readable storage medium.
  • the storage medium may be a read only memory, a magnetic disk or an optical disk, or the like.


Abstract

The present invention relates to a method and device for searching for a content resource, and a server. The method comprises: acquiring a preset picture and at least one candidate content resource, and separately extracting text features and visual features of the preset picture and text features and visual features of the candidate content resource; determining text similarities between the text features of the preset picture and the text features of the candidate content resource; determining visual similarities between the visual features of the preset picture and the visual features of the candidate content resource; and determining, according to the determined text similarities and visual similarities, a target content resource from the at least one candidate content resource. The technical solution provided in the embodiments of the present invention combines text similarities and visual similarities in the content search process, enabling accurate searches for the required content resources.
PCT/CN2018/111433 2018-03-09 2018-10-23 Method and device for searching for a content resource, and server WO2019169872A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810195551.6 2018-03-09
CN201810195551.6A CN108416028B (zh) 2018-03-09 2018-03-09 一种搜索内容资源的方法、装置及服务器

Publications (1)

Publication Number Publication Date
WO2019169872A1 true WO2019169872A1 (fr) 2019-09-12

Family

ID=63130764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111433 WO2019169872A1 (fr) 2018-03-09 2018-10-23 Procédé et dispositif de recherche de ressource de contenu, et serveur

Country Status (2)

Country Link
CN (1) CN108416028B (fr)
WO (1) WO2019169872A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001451A (zh) * 2020-08-27 2020-11-27 上海擎感智能科技有限公司 数据冗余处理方法、系统、介质及装置
CN113761252A (zh) * 2020-06-03 2021-12-07 华为技术有限公司 文本配图的方法、装置及电子设备
CN115150297A (zh) * 2022-08-15 2022-10-04 北京百润洪科技有限公司 一种基于移动互联网的数据过滤及内容评价方法和系统
CN115243081A (zh) * 2022-09-23 2022-10-25 北京润尼尔网络科技有限公司 一种基于vr的镜像分发方法

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416028B (zh) * 2018-03-09 2021-09-21 北京百度网讯科技有限公司 一种搜索内容资源的方法、装置及服务器
CN109271552B (zh) * 2018-08-22 2021-08-20 北京达佳互联信息技术有限公司 通过图片检索视频的方法、装置、电子设备及存储介质
CN109558523B (zh) * 2018-11-06 2021-09-21 广东美的制冷设备有限公司 搜索处理方法、装置及终端设备
CN111475603B (zh) * 2019-01-23 2023-07-04 百度在线网络技术(北京)有限公司 企业标识识别方法、装置、计算机设备及存储介质
CN111866609B (zh) * 2019-04-08 2022-12-13 百度(美国)有限责任公司 用于生成视频的方法和装置
CN111782982A (zh) * 2019-05-20 2020-10-16 北京京东尚科信息技术有限公司 搜索结果的排序方法、装置和计算机可读存储介质
CN111782841A (zh) * 2019-11-27 2020-10-16 北京沃东天骏信息技术有限公司 图像搜索方法、装置、设备和计算机可读介质
CN113536026B (zh) * 2020-04-13 2024-01-23 阿里巴巴集团控股有限公司 音频搜索方法、装置及设备
CN111694978B (zh) * 2020-05-20 2023-04-28 Oppo(重庆)智能科技有限公司 图像相似度检测方法、装置、存储介质与电子设备
CN113590854B (zh) * 2021-09-29 2021-12-31 腾讯科技(深圳)有限公司 一种数据处理方法、设备以及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388022A (zh) * 2008-08-12 2009-03-18 北京交通大学 一种融合文本语义和视觉内容的Web人像检索方法
CN101634996A (zh) * 2009-08-13 2010-01-27 浙江大学 基于综合考量的个性化视频排序方法
CN104298749A (zh) * 2014-10-14 2015-01-21 杭州淘淘搜科技有限公司 一种图像视觉和文本语义融合商品检索方法
CN108416028A (zh) * 2018-03-09 2018-08-17 北京百度网讯科技有限公司 一种搜索内容资源的方法、装置及服务器

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100456300C (zh) * 2006-10-27 2009-01-28 北京航空航天大学 基于二维草图的三维模型检索方法
CN103793434A (zh) * 2012-11-02 2014-05-14 北京百度网讯科技有限公司 一种基于内容的图片搜索方法和装置
TWI536186B (zh) * 2013-12-12 2016-06-01 三緯國際立體列印科技股份有限公司 三維圖檔搜尋方法與三維圖檔搜尋系統
CN106202256B (zh) * 2016-06-29 2019-12-17 西安电子科技大学 基于语义传播及混合多示例学习的Web图像检索方法



Also Published As

Publication number Publication date
CN108416028B (zh) 2021-09-21
CN108416028A (zh) 2018-08-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18908971

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/01/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18908971

Country of ref document: EP

Kind code of ref document: A1