WO2019169872A1 - Method and device for searching for content resource, and server - Google Patents

Info

Publication number: WO2019169872A1
Authority: WO, WIPO (PCT)
Prior art keywords: similarity, visual, content resource, text, preset
Application number: PCT/CN2018/111433
Other languages: French (fr), Chinese (zh)
Inventor: 董维山, 王园, 毛妤, 袁洁, 陈曼仪, 杨茗名
Original Assignee: 北京百度网讯科技有限公司
Application filed by: 北京百度网讯科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present application relates to the field of computer network technologies, and in particular, to a method, an apparatus, and a server for searching for content resources.
  • TF-IDF: term frequency-inverse document frequency (word frequency)
  • word2vec: word vector
  • the embodiment of the present application provides a method, an apparatus, and a server for searching for a content resource, to solve or alleviate one or more technical problems in the prior art, and at least provide a beneficial choice.
  • the embodiment of the present application provides a method for searching for a content resource, including:
  • the target content resource is determined from the at least one candidate content resource based on the determined text similarity and visual similarity.
  • in the present application, determining the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity includes:
  • the target content resource is determined from the at least one candidate content resource based on the obtained overall similarity.
  • obtaining the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity includes:
  • extracting the text feature of the preset picture includes:
  • with reference to the first aspect, or the first, second, or third embodiment of the first aspect, in a fourth embodiment of the first aspect of the present application, determining the text similarity between the text feature of the preset picture and the text feature of the candidate content resource includes:
  • the content repository includes a plurality of content resources and their corresponding text labels.
  • in a fifth embodiment of the first aspect of the present application, determining the visual similarity between the visual feature of the preset picture and the visual feature of the candidate content resource includes:
  • sampling the candidate content resource includes:
  • performing perspective sampling on the candidate content resource using a preset observation mode and a sampling mode;
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  • an embodiment of the present application provides an apparatus for searching for a content resource, including:
  • An acquiring module configured to acquire a preset picture and at least one candidate content resource, and separately extract text features and visual features of the preset picture and text features and visual features of the candidate content resource;
  • a first determining module configured to determine a text similarity between a text feature of the preset picture and a text feature of the candidate content resource
  • a second determining module configured to determine a visual similarity between a visual feature of the preset picture and a visual feature of the candidate content resource
  • the target determining module is configured to determine the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  • the target determining module includes:
  • a first calculation submodule configured to obtain an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity
  • the target determining submodule is configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
  • the first computing submodule is further configured to:
  • the acquiring module includes:
  • an identifying submodule configured to identify the picture content included in the preset picture by using a preset picture classification model, so as to extract a text feature from the preset picture;
  • the extracting sub-module is configured to obtain a corresponding webpage content according to the uniform resource locator of the preset image, and extract a text feature of the preset image from the webpage content.
  • the first determining module includes:
  • a first determining submodule configured to obtain a content resource in the preset content resource library as the at least one candidate content resource, and determine the text similarity between the text feature of the preset picture and the text label of the candidate content resource
  • the preset content resource library includes a plurality of content resources and corresponding text labels.
  • the second determining module includes:
  • a sampling submodule configured to sample the candidate content resource to obtain at least one sampled picture corresponding to the candidate content resource
  • a second determining submodule configured to determine, for each sampled picture, a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture
  • a third determining submodule configured to determine the visual similarity between the visual feature of the candidate content resource and the visual feature of the preset picture according to the visual similarity between the visual feature of each sampled picture corresponding to the candidate content resource and the visual feature of the preset picture.
  • the sampling submodule is further configured to:
  • perform perspective sampling on the candidate content resource using a preset observation mode and a sampling mode;
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  • an embodiment of the present application provides a server, where the server includes:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described above.
  • an embodiment of the present application provides a computer readable storage medium configured to store computer software instructions used by the device for searching for a content resource, including the program involved in executing, on that device, the method for searching for a content resource of the foregoing first aspect.
  • based on the text feature and the visual feature of the preset picture, the text similarity between the text feature of the preset picture and the text feature of each content resource may be determined, and the visual similarity between the visual feature of the preset picture and the visual feature of each content resource may be determined; the target content resource is then determined from the content resources according to the determined text similarity and visual similarity.
  • since text similarity and visual similarity are combined in the search process, the desired content resources can be searched for accurately.
  • FIG. 1 is a flowchart of a method for searching for content resources according to Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of a method for searching for content resources according to Embodiment 2 of the present application
  • FIG. 3 is a schematic diagram of performing perspective sampling on content resources in a method for searching for content resources according to Embodiment 2 of the present application;
  • FIG. 4 is a schematic diagram of a comparison of visual features of a preset picture and a content resource in a method for searching for a content resource according to Embodiment 2 of the present application;
  • FIG. 5 is a flowchart of a method for searching for content resources according to Embodiment 3 of the present application.
  • FIG. 6 is a schematic diagram of an apparatus for searching for content resources according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic diagram of an apparatus for searching for content resources according to Embodiment 5 of the present application.
  • FIG. 8 is a schematic diagram of a server according to Embodiment 6 of the present application.
  • FIG. 1 is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes the following steps:
  • S101 Acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource.
  • the preset picture may include, but is not limited to, a network picture, a picture stored in an album, a picture taken by a camera, or a hand-drawn sketch.
  • the embodiments of the present application may receive a picture sent by a client as the preset picture through various API (Application Programming Interface) definitions based on protocols such as HTTP (HyperText Transfer Protocol) and HTTPS.
  • alternatively, the preset picture may be obtained from the webpage address of the picture entered by the user.
  • the method for obtaining the text feature of the preset picture may be: analyzing the preset picture to obtain one or more short texts capable of describing or representing the picture content, thereby converting the query condition in the form of the picture into the query condition in the text form.
  • the specific analysis method may include: constructing a picture classifier by using a machine learning algorithm, and then inputting a preset picture into a picture classifier for analysis.
  • the picture classifier can be used to analyze the preset picture to obtain the picture content and to output a description text of the preset picture. For example, if a picture of a Tyrannosaurus is input into the picture classifier, the text "T-Rex" may be output.
  • Content resources related to embodiments of the present application include, but are not limited to, normal video, panoramic picture, panoramic video, three-dimensional (3D) model, three-dimensional animation, and their presentation in virtual reality (VR) and augmented reality (AR) scenarios.
  • a panoramic photo (panorama) is a photo taken over a scene range that covers the normal effective viewing angle of the human eyes (approximately 90 degrees horizontally and 70 degrees vertically), or the range including peripheral vision (approximately 180 degrees horizontally and 90 degrees vertically), or even a complete 360-degree scene.
  • the content resource may be obtained by a web crawler from the Internet or by a content producer.
  • web crawler technology can be used to obtain content resources.
  • Content resource producers can also create content resources and build them into content repositories.
  • Content resources can be tagged with text to facilitate classification, management, and retrieval.
  • the content repository can be updated at preset intervals. In this way, when searching for content resources, you can search in the content resource library to improve search efficiency.
  • the text features of the preset picture and the text features of each content resource in the content resource library may be compared one by one to determine the text similarity between the preset picture and each content resource.
  • for example, suppose the text feature of the preset picture is "Tiananmen". If a content resource is a panorama whose text feature is "Tiananmen Square", comparing the "Tiananmen" of the preset picture with the "Tiananmen Square" of the content resource shows that the text similarity is high. If the text feature of another content resource is "Nanjing", comparing the "Tiananmen" of the preset picture with the "Nanjing" of that content resource shows that the text similarity between the two is low.
  • the visual feature may be attribute data that represents semantics of the image, such as color, texture, and the like of the image.
  • in step S101 of the embodiment of the present application, the text feature and the visual feature of the preset picture may be extracted simultaneously or separately.
  • the order of extracting the text features and the visual features of the preset picture is not limited.
  • for example, step S102 may be performed first to determine the text similarity, and then step S103 is performed to determine the visual similarity;
  • alternatively, step S103 may be performed first to determine the visual similarity, and then step S102 is performed to determine the text similarity.
  • the extraction of these two features and the corresponding process of determining the similarity can also be performed in parallel.
  • according to the search solution provided by the embodiment of the present application, the text similarity between the text feature of the preset picture and the text feature of each content resource, and the visual similarity between the visual feature of the preset picture and the visual feature of each content resource, can be determined based on the text feature and the visual feature of the preset picture; the target content resource is then determined from the content resources according to the determined text similarity and visual similarity.
  • because text similarity and visual similarity are combined in the search process, the required content resources can be searched for accurately. The solution is suitable for searching various content resources such as panoramic pictures, panoramic videos, three-dimensional models, three-dimensional animations, and their presentation in virtual reality and augmented reality scenes.
  • FIG. 2 is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes the following steps:
  • S201 Acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource.
  • the preset picture classification model may be used to identify the picture content of the preset picture, and the text feature is extracted from the preset picture.
  • for example, the picture classification model can be trained with a convolutional neural network algorithm on a per-vertical-category basis; when a picture of a Tyrannosaurus Rex is input to the picture classification model, the model can output the text classification label "Tyrannosaurus Rex".
  • alternatively, the text feature of the preset picture may be extracted by obtaining the corresponding webpage content according to the Uniform Resource Locator (URL) of the preset picture and extracting the text feature of the preset picture from that webpage content.
  • for example, when the preset picture comes from the Internet, or when a web page contains the same picture as the preset picture, the URL of the preset picture or of the identical picture can be obtained.
  • the content of the webpage indicated by the URL is then processed to extract the text feature of the preset picture. For example, given a preset picture of the Tiananmen Gate and the web address of that picture, the content of the webpage is processed to generate and output the short text "Tiananmen".
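  • A minimal sketch of the URL-based extraction described above, assuming the source page's HTML has already been downloaded. The class and function names (`PageTextExtractor`, `extract_text_features`) are hypothetical, and using the page title and the picture's `alt` text as the short texts is an illustrative assumption, not the method claimed in the application:

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects the page <title> and the alt text of <img> tags pointing at the picture."""

    def __init__(self, picture_url):
        super().__init__()
        self.picture_url = picture_url
        self.in_title = False
        self.title = ""
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and attributes.get("src") == self.picture_url:
            alt = attributes.get("alt")
            if alt:
                self.alt_texts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_text_features(html, picture_url):
    """Return de-duplicated short texts (alt texts first, then the title)."""
    parser = PageTextExtractor(picture_url)
    parser.feed(html)
    parser.close()
    features = []
    for text in parser.alt_texts + [parser.title.strip()]:
        if text and text not in features:
            features.append(text)
    return features
```

  • A production system would presumably aggregate text across several source pages; this sketch handles a single page only.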
  • the content resource in the preset content resource library is obtained as the at least one candidate content resource, and the text similarity between the text feature of the preset image and the text label of the candidate content resource is determined.
  • a text label for a content resource can be set as it is generated.
  • it can be implemented using a "keyword text similarity calculation" module based on natural language processing technology commonly used by web search engines.
  • the keyword text similarity calculation proceeds as follows: given two keywords (both short texts), a text similarity calculation model is used; the model is constructed from user keyword data and click log data and has been pre-trained offline (for example, based on a neural network or a bag-of-words model). The model scores the semantic similarity of the two keywords: the higher the score, the closer the two keywords are semantically, and vice versa.
  • the output score lies in the range [-1, 1].
  • for example, if the similarity score of "Tiananmen" and "Tiananmen Square" is s, then s should be close to 1, and the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
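  • As a rough illustration of scoring two short texts, the sketch below uses cosine similarity over untrained bag-of-words count vectors. This is a stand-in, not the pre-trained model described above: it captures only shared words, not semantics, and its scores fall in [0, 1] rather than [-1, 1]. The function names are hypothetical:

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def text_similarity(first, second):
    """Cosine similarity between the bag-of-words vectors of two short texts."""
    a, b = bag_of_words(first), bag_of_words(second)
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0
```

  • With this stand-in, "Tiananmen" scores higher against "Tiananmen Square" than against "Nanjing Road", which matches the ordering the text expects.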
  • the content resources may be sampled in the visual space in a preset observation mode and a sampling manner.
  • the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  • the sampling manner may include equal interval sampling, random sampling, sampling based on user interaction history record distribution, and the like.
  • for example, the user's observation mode can be simulated to sample viewing angles over the entire visible space: the content resource is planarly projected onto the simulated observation point to obtain the corresponding sampled picture, and the simulated observation point is then adjusted, that is, the viewing position is changed.
  • the perspective sampling method is common to all types of content resources.
  • the choice of sampling interval also needs to weigh factors such as the amount of calculation, storage space, accuracy, and recall rate.
  • for example, the content resource is a 3D model of a Tyrannosaurus Rex. Different observation modes are used, for example rotating by a certain planar angle each time, and the 3D model is sampled once per observation mode to obtain sampled picture 1, sampled picture 2, sampled picture 3, ..., sampled picture n.
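  • The sampling modes named above (equal-interval and random) can be sketched as generators of viewing directions. Representing a view as a (yaw, pitch) pair in degrees, and the function names, are assumptions made for illustration; rendering the planar projection for each view is outside the sketch:

```python
import random


def equal_interval_views(yaw_step_deg, pitch_angles_deg=(0,)):
    """Viewing directions at equal yaw intervals; yaw_step_deg is an integer divisor of 360."""
    if yaw_step_deg <= 0 or 360 % yaw_step_deg != 0:
        raise ValueError("yaw_step_deg must be a positive divisor of 360")
    return [(yaw, pitch)
            for pitch in pitch_angles_deg
            for yaw in range(0, 360, yaw_step_deg)]


def random_views(count, seed=None):
    """Randomly drawn viewing directions; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [(rng.uniform(0.0, 360.0), rng.uniform(-90.0, 90.0))
            for _ in range(count)]
```

  • A smaller yaw step yields more sampled pictures and therefore more computation and storage, which is the trade-off the text describes.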
  • S204 Determine, for each sample picture, a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture.
  • a picture feature extractor is used to extract a visual feature of a preset picture.
  • This step can be implemented by a visual feature-based "similar-picture retrieval" module commonly used by image search engines in the conventional technology.
  • the similar-picture retrieval process includes: taking a preset picture and using a predefined or offline pre-trained picture feature extractor (for example, one based on a convolutional neural network) to extract visual features from the preset picture.
  • the above method can also be used to extract the visual features of the sampled picture.
  • the visual features of the preset picture are compared with the visual features of each sampled picture to obtain the visual similarity between the preset picture and each sampled picture.
  • S205 Determine the visual similarity between the visual feature of the candidate content resource and the visual feature of the preset picture according to the visual similarity between the visual feature of each sampled picture corresponding to the candidate content resource and the visual feature of the preset picture.
  • in step S205, multiple sampled pictures are obtained for the candidate content resource.
  • the preset picture is compared with each sampled picture of the candidate content resource, the visual similarity between the preset picture and each sampled picture is calculated, and the visual similarities for all sampled pictures are output.
  • the visual similarity between the visual feature of the preset picture and the visual feature of each sample picture of the candidate content resource is determined.
  • a higher visual similarity indicates that the preset picture is closer to the sampled picture in visual semantics, and vice versa.
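  • A minimal sketch of steps S204 and S205, assuming the visual features are plain numeric vectors already produced by some feature extractor. Cosine similarity as the per-picture score and the maximum as the resource-level score are illustrative assumptions; the text does not state which measure or aggregation is used:

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def resource_visual_similarity(preset_feature, sample_features):
    """Score every sampled picture, then aggregate (assumed here: take the best match)."""
    per_sample = [cosine_similarity(preset_feature, f) for f in sample_features]
    return per_sample, max(per_sample, default=0.0)
```

  • Taking the maximum reflects the idea that a resource matches if any one of its views matches the preset picture; an average or a weighted sum would be an equally plausible choice.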
  • Step S206 includes: A, filtering the determined text similarities and visual similarities based on preset thresholds; B, obtaining the overall similarity between the preset picture and the candidate content resource according to the retained text similarity and visual similarity.
  • methods for calculating the overall similarity include, but are not limited to, linear weighting, product, value-range normalization, and the like.
  • taking the linear weighting method as an example: suppose a content resource corresponds to one text feature; correspondingly, there is one text similarity value between the preset picture and that content resource.
  • the content resource may be sampled to obtain a plurality of sampled pictures, and corresponding multiple visual similarities are obtained.
  • the text similarity is taken as one term in the formula, and each visual similarity is taken as a further term; each term is multiplied by its corresponding weight, and the terms are summed to obtain the overall similarity, as follows:
  • $Q = a S_0 + b S_1 + c S_2 + \dots + n S_k$
  • $Q$ represents the overall similarity;
  • $a, b, c, \dots, n$ are the weights;
  • $S_0$ represents the text similarity;
  • $S_1, S_2, \dots, S_k$ represent the visual similarities of the $k$ sampled pictures.
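  • The linear weighting described above can be sketched as a weighted sum of one text similarity and several visual similarities. The function name and the example weights are hypothetical; the text does not specify how the weights are chosen:

```python
def overall_similarity(text_sim, visual_sims, text_weight, visual_weights):
    """Weighted sum of the text similarity and each sampled picture's visual similarity."""
    if len(visual_sims) != len(visual_weights):
        raise ValueError("one weight is required per visual similarity")
    weighted_visual = sum(w * s for w, s in zip(visual_weights, visual_sims))
    return text_weight * text_sim + weighted_visual


# Example: one text similarity and two sampled pictures (weights are arbitrary).
score = overall_similarity(0.9, [0.8, 0.4], 0.5, [0.3, 0.2])
```

  • If the weights sum to 1 and every input lies in [0, 1], the result also lies in [0, 1], which makes a percentage threshold straightforward to apply.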
  • Determining the target content resource can be done in the following ways:
  • a content resource whose overall similarity is greater than a first preset threshold is taken as the target content resource. For example, if the first preset threshold is 80%, the content resources whose overall similarity is greater than 80% are output from among all the content resources searched.
  • the searched content resources may be sorted by overall similarity, and the top-ranked content resources are determined as the target content resources.
  • for example, the content resources are sorted in descending order of overall similarity by using a ranking function rank, and the first five content resources are retained.
  • before the overall similarity is calculated, the text similarity and/or the visual similarity may be separately filtered by value.
  • for example, content resources whose text similarity is less than the first preset threshold are filtered out, and content resources whose visual similarity is less than the second preset threshold are filtered out. This reduces the number of content resources for which the overall similarity must be calculated, which reduces the amount of calculation and improves efficiency.
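  • The filtering and ranking steps can be sketched together as follows. Candidates are assumed to be `(resource_id, text_similarity, visual_similarity)` tuples with a single visual similarity per resource, and the overall score is a two-term weighted sum; these shapes, the function name, and the default weight are illustrative assumptions:

```python
def select_targets(candidates, text_threshold, visual_threshold,
                   top_k=5, text_weight=0.5):
    """Drop low-scoring candidates, rank the rest, and return the top resource IDs."""
    # Pre-filter so the overall score is computed only for plausible candidates.
    kept = [c for c in candidates
            if c[1] >= text_threshold and c[2] >= visual_threshold]
    scored = [(rid, text_weight * t + (1.0 - text_weight) * v)
              for rid, t, v in kept]
    # Sort in descending order of overall similarity.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [rid for rid, _ in scored[:top_k]]
```

  • Returning identifiers rather than the resources themselves keeps the result small; the caller can resolve each ID to a storage address afterwards.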
  • for example, the preset picture is a picture of a certain scenic spot, and the candidate content resource is a panoramic picture.
  • the panoramic picture is sampled to obtain a plurality of sampled pictures, and the scenic spot picture is then compared with each sampled picture. If the two match, the panoramic picture corresponding to the sampled picture is a content resource matching the scenic spot picture, and the panoramic picture is output.
  • by default, each content resource in the content resource library is used as a candidate content resource; to reduce the amount of calculation, however, only the content resources of a certain category are sometimes used as candidate content resources. The above steps are then repeated to obtain the overall similarity between the preset picture and each candidate content resource, and the candidate content resources are sorted by overall similarity. The higher the overall similarity value, the more similar the preset picture and the content resource.
  • each content resource has its corresponding identifier.
  • when a searched content resource is output, the content resource itself is not output directly; instead, the identifier (ID) of the content resource is output.
  • for example, the content resources are sorted by overall similarity, and the IDs of the first n content resources are output.
  • the storage address of a content resource can be obtained according to its ID; the user then selects the multimedia file to be displayed, that is, the target content resource, from the candidate multimedia file list through some interaction mode, for example, by making a selection in the browser interface, selecting a remote player, etc.
  • the technical solution of the embodiment of the present application samples the content resources using different observation modes and sampling intervals, so that the visual features of the preset picture can be comprehensively matched against the visual features of the content resources, which makes the search for content resources more accurate.
  • the embodiment of the present application provides a method for searching for content resources.
  • FIG. 5 is a flowchart of a method for searching for content resources according to an embodiment of the present application.
  • the method for searching for content resources in the embodiment of the present application includes:
  • 1. Preset picture: the initial input of the system, generated by the browser client, containing the picture content and the URL of the picture.
  • the form of the picture is not limited, and can be a picture file uploaded by the user, a picture taken by the camera or a hand-drawn sketch.
  • 2. Network interface: receives and parses the preset picture sent by the client, and returns the search result for the content resource to the client.
  • Possible implementations include, but are not limited to, various types of API interface definitions based on protocols such as HTTP and HTTPS.
  • 3. Picture word-guessing: the input is the preset picture passed from the network interface, and the output is one or more short text segments that can describe or represent the content of the picture.
  • the role of the picture guessing word is to convert the preset picture of the picture form into a keyword of the text form.
  • possible mapping methods include:
  • using same-picture matching or the URL information, the module aggregates and extracts the text information on the picture's source pages on the Internet (where the picture has been reposted, there may be multiple source pages) and generates the text output.
  • for example, given a Tiananmen picture and its URL, the module outputs the short text "Tiananmen".
  • the picture content is identified, and the classification label text is output. For example, given a picture of a Tyrannosaurus Rex as input, the module outputs the short text "Tyrannosaurus Rex".
  • 4. Text similarity calculation: the input is the output of 3 (the picture word-guessing result) and the set of text labels carried by the resources in the content resource library. This step pairs the word-guessing result text with every content resource text label to obtain multiple text matching pairs, calculates the text similarity of each text matching pair, and outputs the text similarity scores.
  • This step can be implemented by a "query text similarity calculation” module based on natural language processing technology commonly used by web search engines.
  • the text similarity calculation works as follows: given two keywords (both short texts), a text similarity calculation model pre-trained offline on user query data and click logs (for example, based on a neural network or a bag-of-words model) scores the semantic similarity of the two keywords. The higher the score, the closer the two keywords are semantically, and vice versa.
  • the output score lies in the range [-1, 1].
  • for example, if the similarity score of "Tiananmen" and "Tiananmen Square" is s, then s should be close to 1, and the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
  • 5. Content resource library: a collection of various resources, obtained by search engine crawlers or provided by content producers; each resource carries a text label for classification, management, and retrieval.
  • 6. View-angle sampling: the input is any resource in the content resource library, and the output is several sampled pictures.
  • view angles can be sampled over the entire visible space by simulating and changing the user's observation mode, including but not limited to the viewing position, angle, and visual range, to obtain multiple pictures, each of which is a planar projection of the content onto the simulated observation point. View-angle sampling can be used for panoramic, 3D, AR, and VR content.
  • the sampling intervals of the viewing position, angle, and visual range can be traded off among the amount of calculation, storage space, accuracy, and recall rate; for panoramic videos and 3D animations that contain animated content, frame sampling on the time axis is additionally applied to generate the output sampled pictures, and the sampling time interval is likewise a trade-off among the amount of calculation, storage space, accuracy, and recall rate.
  • Typical sampling techniques include, but are not limited to, equally spaced sampling, random sampling, sampling based on user interaction history distribution, and the like.
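
As a rough illustration of equally spaced sampling, the sketch below enumerates viewing angles, visible ranges, and frame times on a regular grid. The `View` type, the `sample_views` function, and all default step sizes are assumptions made for this example; rendering each view into a planar projection is left to whatever rendering engine is used.

```python
from itertools import product
from typing import NamedTuple


class View(NamedTuple):
    yaw_deg: int      # horizontal viewing angle
    pitch_deg: int    # vertical viewing angle
    fov_deg: int      # visible range (field of view)
    time_s: float     # position on the time axis (0.0 for still content)


def sample_views(yaw_step=90, pitch_step=45, fovs=(60, 90),
                 duration_s=0.0, frame_interval_s=1.0):
    """Enumerate equally spaced views; smaller steps mean more pictures to render."""
    yaws = range(0, 360, yaw_step)
    pitches = range(-90 + pitch_step, 90, pitch_step)   # skip the two poles
    frame_count = int(duration_s // frame_interval_s) + 1
    times = [i * frame_interval_s for i in range(frame_count)]
    return [View(y, p, f, t) for y, p, f, t in product(yaws, pitches, fovs, times)]
```

Halving `yaw_step` doubles the number of sampled pictures, which is the calculation and storage side of the trade-off described above; the accuracy and recall side depends on how finely the views cover the content.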
  • This step can be implemented by a visual-feature-based "similar image retrieval" module of the kind commonly used by image search engines.
  • Similar image retrieval works as follows: given a preset picture, a predefined or offline pre-trained picture feature extractor (e.g. based on a convolutional neural network) extracts the visual features of the preset picture. The extracted features are compared with the features of each picture in the picture library, and the similarity of the visual features is scored. The higher the score, the smaller the visual difference between the preset picture and the picture in the library, and vice versa.
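
The comparison step can be sketched as below, assuming the feature vectors have already been produced by some extractor. Cosine similarity is used here as one common choice of scoring function, not necessarily the one the authors intend, and `cosine` and `rank_similar_pictures` are hypothetical names.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar_pictures(query_features, library):
    """Score the query against every library picture.

    library maps a picture ID to its precomputed feature vector.
    Returns (picture_id, score) pairs sorted from most to least similar.
    """
    scored = [(pid, cosine(query_features, feats)) for pid, feats in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For a large picture library, an exhaustive scan like this would normally be replaced by an approximate nearest-neighbour index.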
  • Overall similarity calculation: the input is the output of steps 4 and 6, that is, the text similarity scores of the text matching pairs and the visual similarity scores of the picture matching pairs between the preset picture and the resources in the content resource library; the output is the overall similarity and the corresponding candidate content resource ID.
  • the overall similarity calculation is based on a combination of text and visual similarity scores, and possible implementations include, but are not limited to, linear weighting, product, range normalization, and the like. At the same time, additional factors may be considered, including but not limited to content quality assessment indices (high quality, low quality, resolution, model sophistication, etc.), user history click records, laws and regulations, and the like.
  • The text similarity scores and visual similarity scores entering the calculation may each be filtered beforehand; for example, similarity scores below a certain threshold are filtered out directly and do not enter the overall similarity calculation, which reduces the amount of calculation.
  • Ranking: the input is the output of step 7, namely the overall similarity and the corresponding candidate content resource IDs; the output is the top k candidate content resource IDs arranged in descending order of overall similarity score.
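
Threshold filtering, score combination, and top-k ranking can be put together in a short sketch. Linear weighting is only one of the combinations the text lists; the weight, the threshold defaults, and the function names are assumptions chosen for illustration.

```python
import heapq


def overall_similarity(text_score, visual_score, text_weight=0.5,
                       text_threshold=-1.0, visual_threshold=-1.0):
    """Linear weighting of the two scores.

    Returns None when either score falls below its threshold, so the pair
    never enters the overall calculation. The default thresholds of -1.0
    filter nothing, because scores are assumed to lie in [-1, 1].
    """
    if text_score < text_threshold or visual_score < visual_threshold:
        return None
    return text_weight * text_score + (1.0 - text_weight) * visual_score


def top_k_candidates(scores, k, **options):
    """Return the IDs of the k best resources in descending order of overall score.

    scores maps a resource ID to a (text_score, visual_score) pair;
    options are forwarded to overall_similarity.
    """
    combined = []
    for rid, (text_score, visual_score) in scores.items():
        total = overall_similarity(text_score, visual_score, **options)
        if total is not None:
            combined.append((total, rid))
    best = heapq.nlargest(k, combined, key=lambda pair: pair[0])
    return [rid for _, rid in best]
```

Additional factors such as content quality or click history could be folded in as extra weighted terms inside `overall_similarity`.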
  • The user selects the content resource to be displayed from the candidate content resource list through some interaction in the browser interface, and the browser client displays it.
  • Steps 3 to 9 can be precomputed offline, thereby accelerating the online search process.
  • The picture library of the entire web may be processed in advance offline: the similarity scores are calculated and sorted offline, and a static lookup table is built that associates the picture in any webpage with content resources.
  • This lookup table can be updated by incremental computation.
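
One possible shape for such a lookup table is sketched below. The `LookupTable` class and the `rank_fn` callback (which stands in for the whole scoring and sorting pipeline) are hypothetical; the sketch only shows the split between the offline build, the incremental update, and the cheap online read.

```python
class LookupTable:
    """Maps a web picture ID to its pre-ranked candidate content resource IDs."""

    def __init__(self, rank_fn):
        # rank_fn is a hypothetical callable: picture_id -> ranked resource IDs.
        self._rank_fn = rank_fn
        self._table = {}

    def build(self, picture_ids):
        """Offline pass over the whole picture library."""
        self._table = {pid: self._rank_fn(pid) for pid in picture_ids}

    def update(self, new_or_changed_ids):
        """Incremental pass: recompute only the given entries."""
        for pid in new_or_changed_ids:
            self._table[pid] = self._rank_fn(pid)

    def lookup(self, picture_id):
        """Online path: a dictionary read; empty list if the picture was never processed."""
        return self._table.get(picture_id, [])
```

This sketch only handles new or changed pictures. When the content resource library itself changes, the affected entries would also need to be recomputed.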
  • Steps 3 to 9 can also be calculated online.
  • the above online and offline calculation processes can be accelerated by techniques such as parallel computing.
  • the typical matching result is shown in the embodiment shown in FIG. 3. It can be seen that the matching can be accurate to a specific viewing angle, the matching precision is high, and the user experience is good.
  • The technical solution of this embodiment samples the content resources using different observation manners and sampling intervals, so that the visual features of the preset picture can be comprehensively matched against the visual features of the content resources, making the search for content resources more accurate.
  • FIG. 6 is a schematic diagram of an apparatus for searching for content resources according to an embodiment of the present application.
  • the device for searching for content resources in the embodiment of the present application includes:
  • the obtaining module 61 is configured to acquire a preset picture and at least one candidate content resource, and respectively extract text features and visual features of the preset picture and text features and visual features of the candidate content resource;
  • the first determining module 62 is configured to determine a text similarity between the text feature of the preset picture and the text feature of the candidate content resource;
  • a second determining module 63 configured to determine a visual similarity between a visual feature of the preset picture and a visual feature of the candidate content resource
  • the target determining module 64 is configured to determine the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  • The technical solution of this embodiment combines the text features and visual features of the preset picture and the content resource, so the accuracy of searching for content resources is high. The technical effect is the same as that of the first embodiment and is not repeated here.
  • FIG. 7 is a schematic diagram of an apparatus for searching for content resources according to an embodiment of the present application.
  • In the device for searching for content resources of this embodiment:
  • the target determination module 64 includes:
  • the first calculation sub-module 641 is configured to obtain an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity;
  • the target determining sub-module 642 is configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
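  • the first calculation sub-module is further configured to: select among the determined text similarities and visual similarities based on a preset threshold; and obtain the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.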
  • the obtaining module 61 includes:
  • the identification sub-module 611 is configured to identify a picture content included in the preset picture by using a preset picture classification model to extract a text feature from the preset picture; or
  • the extraction sub-module 612 is configured to obtain a corresponding webpage content according to the uniform resource locator of the preset image, and extract a text feature of the preset image from the webpage content.
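
For the webpage-based alternative, the sketch below shows one way text could be pulled from page content that has already been fetched. The heuristic of using the page title and the `alt`/`title` attributes of the matching `<img>` element is an assumption, as are the names `PictureTextExtractor` and `extract_text_features`; fetching the page from the URL is omitted.

```python
from html.parser import HTMLParser


class PictureTextExtractor(HTMLParser):
    """Collects the page title and the alt/title text of one <img> element."""

    def __init__(self, picture_url: str):
        super().__init__()
        self.picture_url = picture_url
        self.texts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attributes.get("src") == self.picture_url:
            # Keep only the descriptive attributes of the matching picture.
            for name in ("alt", "title"):
                if attributes.get(name):
                    self.texts.append(attributes[name])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.texts.append(data.strip())


def extract_text_features(html: str, picture_url: str) -> list:
    """Return candidate text features for the picture, in document order."""
    parser = PictureTextExtractor(picture_url)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Surrounding captions or nearby paragraph text would be natural further sources, at the cost of more layout-specific parsing.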
  • the first determining module 62 includes:
  • the first determining sub-module 621 is configured to acquire content resources in the preset content resource library as the at least one candidate content resource, and determine the text similarity between the text feature of the preset picture and the text label of the candidate content resource, wherein the preset content resource library includes a plurality of content resources and their corresponding text labels.
  • the second determining module 63 includes:
  • the sampling sub-module 631 is configured to sample the candidate content resource to obtain at least one sample picture corresponding to the content resource;
  • a second determining sub-module 632 configured to determine a visual similarity between a visual feature of the sampled picture and a visual feature of the preset picture for each sampled picture
  • the third determining sub-module 633 is configured to determine the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture.
  • the sampling sub-module is specifically configured to: perform viewing angle sampling on the candidate content resource within the visible space, using a preset observation manner and sampling manner;
  • wherein the preset observation manner is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  • The technical solution of this embodiment can sample the content resources using multiple observation manners and sampling manners, so that the accuracy of searching for content resources is high. The technical effect is the same as that of the second embodiment, and details are not described here again.
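
The text does not say how the per-picture similarities are reduced to a single resource-level similarity. The sketch below assumes the maximum is taken, which also identifies the best-matching viewing angle; both the aggregation rule and the name `resource_visual_similarity` are assumptions.

```python
def resource_visual_similarity(sample_scores):
    """Reduce per-sample similarities to one resource-level similarity.

    sample_scores maps a view label to the visual similarity between that
    sampled picture and the preset picture. Returns (best_view, best_score).
    Taking the maximum is an assumption; a mean or top-n average would
    also fit the description.
    """
    if not sample_scores:
        raise ValueError("a candidate resource needs at least one sampled picture")
    best_view = max(sample_scores, key=sample_scores.get)
    return best_view, sample_scores[best_view]
```

Returning the view label alongside the score lets the client open the content resource at the viewing angle that matched.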
  • the embodiment of the present application provides an information classification device.
  • the device includes a memory 81 and a processor 82.
  • the memory 81 stores a computer program executable on the processor 82.
  • the processor 82 implements the information classification method in the above embodiment when the computer program is executed.
  • the number of memories 81 and processors 82 may be one or more.
  • the device also includes:
  • the communication interface 83 is used for communication between the memory 81 and the processor 82 and an external device.
  • the memory 81 may include a high speed RAM memory and may also include a non-volatile memory such as at least one disk memory.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 8, but it does not mean that there is only one bus or one type of bus.
  • If the memory 81, the processor 82, and the communication interface 83 are integrated on one chip, they can communicate with each other through an internal interface.
  • a computer readable storage medium storing a computer program that, when executed by a processor, implements the method of any of the embodiments of FIGS. 1-5.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features referred to.
  • Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature.
  • "A plurality" means two or more, unless specifically defined otherwise.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with such an instruction execution system, apparatus, or device.
  • The computer readable medium described in the embodiments of the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. More specific examples (a non-exhaustive list) of computer readable storage media include at least the following: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CD-ROM).
  • The computer readable storage medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in computer memory.
  • a computer readable signal medium may include a data signal that is propagated in a baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, the steps or methods can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and so on.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as separate products, may also be stored in a computer readable storage medium.
  • the storage medium may be a read only memory, a magnetic disk or an optical disk, or the like.

Abstract

The present application discloses a method and a device for searching for a content resource, and a server. The method comprises: acquiring a preset image and at least one candidate content resource, and separately extracting text features and visual features of the preset image and text features and visual features of the candidate content resource; determining text similarities between the text features of the preset image and the text features of the candidate content resource; determining visual similarities between the visual features of the preset image and the visual feature of the candidate content resource; and determining, according to the determined text similarities and visual similarities, a target content resource from the at least one candidate content resource. The technical solution provided in embodiments of the present application combines text similarities and visual similarities in a content searching process, thereby enabling accurate searches for required content resources.

Description

Method, device and server for searching for content resources
The present application claims priority to Chinese Patent Application No. 201810195551.6, entitled "Method, Device and Server for Searching for Content Resources", filed with the Chinese Patent Office on March 9, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer network technologies, and in particular to a method, a device, and a server for searching for content resources.
Background
With the development of computer technology, many new types of content resources have appeared, such as panoramic pictures, panoramic videos, three-dimensional (3D) models, 3D animations, and their presentation in virtual reality (VR) and augmented reality (AR) scenes. At the same time, photographic techniques (such as fisheye lenses), modeling techniques, and programming tools are constantly evolving, making it increasingly easy to produce such content resources. These new types of content resources are also appearing more and more on the Internet. Compared with traditional text, two-dimensional pictures, ordinary video, and audio, these content resources offer coherence, multi-linearity, multiple viewing angles, a sense of presence, large space, high interactivity, information immediacy, and online-offline linkage.
Traditional Internet search technology mainly uses text information to index massive amounts of webpage content. Typically, TF-IDF (term frequency-inverse document frequency) and word vector (word2vec) techniques are used to build a text index over a webpage library and to search for webpages whose content matches the user's text query. With the emergence of large amounts of picture and video content and the development of deep neural network technology, picture search, voice search, and music search have also appeared.
However, because the form of the new types of content resources mentioned above goes beyond the expressive space of text, ordinary two-dimensional pictures, video, and music, it is difficult for users to search for these content resources conveniently and quickly with current search engine technology.
Summary of the invention
The embodiments of the present application provide a method, a device, and a server for searching for content resources, to solve or alleviate one or more of the technical problems in the background art, or at least to provide a beneficial choice.
In a first aspect, an embodiment of the present application provides a method for searching for content resources, including:
acquiring a preset picture and at least one candidate content resource, and separately extracting text features and visual features of the preset picture and text features and visual features of the candidate content resource;
determining a text similarity between the text features of the preset picture and the text features of the candidate content resource;
determining a visual similarity between the visual features of the preset picture and the visual features of the candidate content resource; and
determining a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
With reference to the first aspect, in a first implementation of the first aspect, determining the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity includes:
obtaining an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity; and
determining the target content resource from the at least one candidate content resource according to the obtained overall similarity.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, obtaining the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity includes:
selecting among the determined text similarities and visual similarities based on a preset threshold; and
obtaining the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
With reference to the first aspect, in a third implementation of the first aspect, extracting the text features of the preset picture includes:
identifying the picture content contained in the preset picture by using a preset picture classification model, so as to extract text features from the preset picture; or
obtaining the corresponding webpage content according to the uniform resource locator of the preset picture, and extracting the text features of the preset picture from the webpage content.
With reference to the first aspect, or the first, second, or third implementation of the first aspect, in a fourth implementation of the first aspect, determining the text similarity between the text features of the preset picture and the text features of the candidate content resource includes:
acquiring content resources in a preset content resource library as the at least one candidate content resource, and determining the text similarity between the text features of the preset picture and the text labels of the candidate content resource, wherein the preset content resource library includes a plurality of content resources and their corresponding text labels.
With reference to the first aspect, or the first, second, or third implementation of the first aspect, in a fifth implementation of the first aspect, determining the visual similarity between the visual features of the preset picture and the visual features of the candidate content resource includes:
sampling the candidate content resource to obtain at least one sampled picture corresponding to the content resource;
for each sampled picture, determining the visual similarity between the visual features of the sampled picture and the visual features of the preset picture; and
determining the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture.
With reference to the fifth implementation of the first aspect, in a sixth implementation of the first aspect, sampling the candidate content resource includes:
performing viewing angle sampling on the candidate content resource within the visible space, using a preset observation manner and sampling manner;
wherein the preset observation manner is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
In a second aspect, an embodiment of the present application provides a device for searching for content resources, including:
an obtaining module, configured to acquire a preset picture and at least one candidate content resource, and separately extract text features and visual features of the preset picture and text features and visual features of the candidate content resource;
a first determining module, configured to determine a text similarity between the text features of the preset picture and the text features of the candidate content resource;
a second determining module, configured to determine a visual similarity between the visual features of the preset picture and the visual features of the candidate content resource; and
a target determining module, configured to determine a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
With reference to the second aspect, in a first implementation of the second aspect, the target determining module includes:
a first calculation sub-module, configured to obtain an overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity; and
a target determining sub-module, configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
With reference to the second implementation of the second aspect, in a third implementation of the second aspect, the first calculation sub-module is further configured to:
select among the determined text similarities and visual similarities based on a preset threshold; and
obtain the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
With reference to the second aspect, in a third implementation of the second aspect, the obtaining module includes:
an identification sub-module, configured to identify the picture content contained in the preset picture by using a preset picture classification model, so as to extract text features from the preset picture; or
an extraction sub-module, configured to obtain the corresponding webpage content according to the uniform resource locator of the preset picture, and extract the text features of the preset picture from the webpage content.
With reference to the second aspect, or the first, second, or third implementation of the second aspect, in a fourth implementation of the second aspect, the first determining module includes:
a first determining sub-module, configured to acquire content resources in a preset content resource library as the at least one candidate content resource, and determine the text similarity between the text features of the preset picture and the text labels of the candidate content resource, wherein the preset content resource library includes a plurality of content resources and their corresponding text labels.
With reference to the second aspect, or the first, second, or third implementation of the second aspect, in a fifth implementation of the second aspect, the second determining module includes:
a sampling sub-module, configured to sample the candidate content resource to obtain at least one sampled picture corresponding to the content resource;
a second determining sub-module, configured to determine, for each sampled picture, the visual similarity between the visual features of the sampled picture and the visual features of the preset picture; and
a third determining sub-module, configured to determine the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture.
With reference to the fifth implementation of the second aspect, in a sixth implementation of the present application, the sampling sub-module is further configured to:
perform viewing angle sampling on the candidate content resource within the visible space, using a preset observation manner and sampling manner;
wherein the preset observation manner is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
In a third aspect, an embodiment of the present application provides a server, the server including:
one or more processors; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described above.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium for storing computer software instructions used by the device for searching for content resources, the instructions including a program, involved in the device for searching for content resources, for executing the method for searching for content resources of the first aspect.
One of the above technical solutions has the following advantages or beneficial effects: in the technical solution of the embodiments of the present application, the text similarity between the text features of the preset picture and the text features of each content resource, and the visual similarity between the visual features of the preset picture and the visual features of each content resource, can be determined based on the text features and visual features of the preset picture; the target content resource is then determined from the content resources according to the determined text similarity and visual similarity. Because text similarity and visual similarity are combined during the search, the required content resources can be found accurately.
The above summary is for the purpose of the description only and is not intended to be limiting in any way. In addition to the illustrative aspects, implementations, and features described above, further aspects, implementations, and features of the present application will be readily apparent by reference to the drawings and the following detailed description.
Brief Description of the Drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed in the present application and should not be regarded as limiting its scope.
FIG. 1 is a flowchart of the method for searching for a content resource according to Embodiment 1 of the present application;
FIG. 2 is a flowchart of the method for searching for a content resource according to Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of view sampling of a content resource in the method according to Embodiment 2 of the present application;
FIG. 4 is a schematic diagram of comparing the visual features of a preset picture and a content resource in the method according to Embodiment 2 of the present application;
FIG. 5 is a flowchart of the method for searching for a content resource according to Embodiment 3 of the present application;
FIG. 6 is a schematic diagram of the device for searching for a content resource according to Embodiment 4 of the present application;
FIG. 7 is a schematic diagram of the device for searching for a content resource according to Embodiment 5 of the present application;
FIG. 8 is a schematic diagram of the server according to Embodiment 6 of the present application.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive.
Embodiment 1
This embodiment of the present application provides a method for searching for a content resource. FIG. 1 is a flowchart of the method, which includes the following steps:
S101: acquire a preset picture and at least one candidate content resource, and extract the text feature and visual feature of the preset picture and the text feature and visual feature of the candidate content resource.
In this embodiment, the preset picture may include, but is not limited to, a picture from the network, a picture stored in an album, a picture taken by a camera, or a hand-drawn sketch.
The picture sent by a client may be received as the preset picture through various API (Application Programming Interface) interfaces based on protocols such as HTTP (HyperText Transfer Protocol) and HTTPS; alternatively, the preset picture may be obtained from a web address of the picture entered by the user.
The text feature of the preset picture may be obtained by analyzing the preset picture to produce one or more short texts that describe or represent the picture content, thereby converting a query condition in picture form into a query condition in text form. One specific analysis method is to build a picture classifier with a machine learning algorithm and then feed the preset picture into it. The picture classifier analyzes the preset picture to obtain the picture content and outputs a description text for the preset picture. For example, given a picture of a Tyrannosaurus rex, the classifier may output the text "Tyrannosaurus rex".
S102: determine the text similarity between the text feature of the preset picture and the text feature of the candidate content resource.
The content resources involved in the embodiments of the present application include, but are not limited to, ordinary videos, panoramic pictures, panoramic videos, three-dimensional (3D) models, 3D animations, and their presentation in virtual reality (VR) and augmented reality (AR) scenarios.
For example, a panoramic photo (panorama) is a photo whose field of view reaches or exceeds the normal effective viewing angle of the human eyes (approximately 90 degrees horizontally and 70 degrees vertically), or the viewing angle including peripheral vision (approximately 180 degrees horizontally and 90 degrees vertically), up to a complete 360-degree scene.
The content resources may be obtained from the Internet by a web crawler or produced by content producers. For example, to make content resources easy to search, they may be collected with web crawler technology, and content producers may also create content resources; together these are built into a content resource library. Content resources may carry text labels to facilitate classification, management, and retrieval. The content resource library may also be updated at preset intervals. Searching within the content resource library in this way improves search efficiency.
In this embodiment, the text feature of the preset picture may be compared one by one with the text feature of each content resource in the content resource library to determine the text similarity between the preset picture and each content resource. For example, suppose the text feature of the preset picture is "Tiananmen". If one content resource is a panorama whose text feature is "Tiananmen Square", comparing "Tiananmen" with "Tiananmen Square" shows that the two have a high text similarity. If another content resource has the text feature "Nanjing", comparing "Tiananmen" with "Nanjing" shows that the two have a low text similarity.
S103: determine the visual similarity between the visual feature of the preset picture and the visual feature of the candidate content resource.
The visual feature may be attribute data that characterizes the semantics contained in a picture, for example the color and texture of the picture.
S104: determine the target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
In step S101, the text feature and the visual feature of the preset picture may be extracted simultaneously or separately; this embodiment does not limit the order in which they are extracted. For example, the text feature of the preset picture may be extracted first and step S102 performed to determine the text similarity, after which the visual feature of the preset picture is extracted and step S103 performed to determine the visual similarity. Conversely, the visual feature may be extracted first and step S103 performed, after which the text feature is extracted and step S102 performed. The two kinds of feature extraction and the corresponding similarity determinations may also be performed in parallel.
In the search solution provided by this embodiment, the text similarity between the text feature of the preset picture and the text feature of each content resource, and the visual similarity between the visual feature of the preset picture and the visual feature of each content resource, are determined from the text feature and visual feature of the preset picture, and the target content resource is then determined from the content resources according to these similarities. Because the search combines text similarity with visual similarity, the desired content resource can be found accurately. The solution is suitable for searching content resources such as panoramic pictures, panoramic videos, 3D models, 3D animations, and presentations in VR and AR scenarios.
Embodiment 2
On the basis of Embodiment 1, this embodiment of the present application provides a method for searching for a content resource. FIG. 2 is a flowchart of the method, which includes the following steps:
S201: acquire a preset picture and at least one candidate content resource, and extract the text feature and visual feature of the preset picture and the text feature and visual feature of the candidate content resource.
In one embodiment, a preset picture classification model may be used to recognize the content of the preset picture and extract the text feature from it. For example, a picture classification model may be trained per vertical category using a convolutional neural network algorithm; given a picture of a Tyrannosaurus rex, the model may output the text classification label "Tyrannosaurus rex".
In another embodiment, the text feature of the preset picture may be extracted as follows: obtain the corresponding web page content according to the Uniform Resource Locator (URL) of the preset picture, and extract the text feature of the preset picture from that web page content. First, when the preset picture comes from the Internet, or when a web page contains a picture identical to the preset picture, the URL of the preset picture or of the identical picture can be obtained. Second, the content of the web page indicated by the URL is processed to extract the text feature of the preset picture. For example, given a preset picture of the Tiananmen gate tower and the web address of that picture, processing the web page content may produce and output the short text "Tiananmen".
S202: take the content resources in a preset content resource library as the at least one candidate content resource, and determine the text similarity between the text feature of the preset picture and the text label of the candidate content resource.
For ease of management, a text label may be set for a content resource when the resource is created. The text similarity in this step may be computed, for example, with the "keyword text similarity calculation" module, based on natural language processing, that web search engines commonly use.
The keyword text similarity calculation works as follows. Given two keywords (both short texts), a text similarity model is used; the model is built from user keyword data and click log data and has been pre-trained offline (for example, based on a neural network or a bag-of-words model). The model scores the semantic similarity of the two keywords: the higher the score, the closer the two keywords are semantically, and vice versa. Taking a cosine-similarity-based calculation as an example, the output score lies in the range [-1, 1]. For example, if the similarity score of "Tiananmen" and "Tiananmen Square" is s, then s should be close to 1, while the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
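The cosine-based scoring described above can be sketched as follows. This is a minimal illustration, not the pre-trained model the text refers to: the three-dimensional keyword vectors are made-up stand-ins for whatever embedding such a model would produce, and `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors (an assumption)
    return dot / (norm_u * norm_v)

# Hypothetical keyword embeddings, invented for illustration only.
tiananmen = [0.9, 0.1, 0.0]
tiananmen_square = [0.8, 0.2, 0.1]
nanjing_road = [-0.1, 0.2, 0.9]

s = cosine_similarity(tiananmen, tiananmen_square)   # expected to be near 1
t = cosine_similarity(tiananmen, nanjing_road)       # expected to be well below s
```

With real embeddings the absolute values would differ; only the ordering (s close to 1, t significantly lower) mirrors the example in the text.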
S203: sample the candidate content resource to obtain at least one sampled picture corresponding to the content resource.
In step S203, view sampling may be performed on each content resource within the visible space, using a preset observation mode and a preset sampling mode.
The preset observation mode is based on at least one of an observation position, an angle, and a visible range, or any combination thereof.
The sampling mode may include equal-interval sampling, random sampling, sampling based on the distribution of user interaction history, and so on.
Taking equal-interval sampling as an example: when sampling a content resource, the user's way of observing can be simulated so that the whole visible space is sampled by view. That is, the content resource is projected onto a plane at a simulated observation point to obtain the sampled picture for that point, and the simulated observation point is then adjusted, i.e. the observation position is changed. The view sampling method applies to all types of content resources. When sampling with a predetermined observation mode, the choice of sampling interval must weigh computation and storage cost against accuracy and recall. For panoramic videos and 3D animations that contain animated content, frame sampling is additionally required, i.e. output pictures are also sampled along the time axis, and the sampling time interval must likewise weigh computation and storage cost against accuracy and recall.
For example, as shown in FIG. 3, the content resource is a 3D model of a Tyrannosaurus rex. Different observation modes are used: for instance, the model is sampled once each time it is rotated by a certain planar angle, yielding sampled picture 1, sampled picture 2, sampled picture 3, ..., sampled picture n.
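As a rough sketch of equal-interval view sampling, the viewpoints for a rotation like the one in FIG. 3 could be enumerated as below. The function name, the yaw/pitch parameterization, and the requirement that the step divide 360 evenly are all assumptions made for illustration; rendering each viewpoint to a picture is outside the sketch.

```python
def equal_interval_views(yaw_step_deg, pitch_angles_deg=(0,)):
    """Enumerate (yaw, pitch) viewpoints, in degrees, at equal yaw intervals.

    yaw_step_deg must be a positive integer that divides 360 evenly
    (a simplification assumed here so that the views are evenly spaced).
    """
    if yaw_step_deg <= 0 or 360 % yaw_step_deg != 0:
        raise ValueError("yaw_step_deg must be a positive divisor of 360")
    return [
        (yaw, pitch)
        for pitch in pitch_angles_deg
        for yaw in range(0, 360, yaw_step_deg)
    ]

# One sampled picture would be rendered per viewpoint.
views = equal_interval_views(45)                    # 8 viewpoints around the model
views_two_rings = equal_interval_views(90, (-30, 30))
```

A smaller step yields more sampled pictures, which raises computation and storage cost in exchange for better coverage, matching the trade-off the text describes.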
S204: for each sampled picture, determine the visual similarity between the visual feature of the sampled picture and the visual feature of the preset picture.
In a specific implementation, a picture feature extractor is first used to extract the visual feature of the preset picture.
This step may be implemented with the "similar image retrieval" module, based on visual features, that image search engines commonly use. Similar image retrieval proceeds as follows: given a preset picture, a predefined picture feature extractor, or one pre-trained offline (for example, based on a convolutional neural network), extracts the visual feature of the preset picture.
The same method may be used to extract the visual feature of each sampled picture.
The visual feature of the preset picture is then compared with the visual feature of each sampled picture to obtain the visual similarity between the preset picture and each sampled picture.
S205: determine the visual similarity between the visual feature of the candidate content resource and the visual feature of the preset picture according to the visual similarity between the visual feature of each sampled picture corresponding to the candidate content resource and the visual feature of the preset picture.
After step S203, multiple sampled pictures are available for the candidate content resource. The preset picture is compared with each sampled picture of the candidate content resource, the visual similarity between the preset picture and each sampled picture is calculated, and the visual similarities for all sampled pictures are output.
Based on the visual features extracted in step S204, the visual similarity between the visual feature of the preset picture and the visual feature of each sampled picture of the candidate content resource is determined. The higher the visual similarity, the closer the preset picture and the sampled picture are in visual semantics, and vice versa.
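The per-picture comparison can be sketched as below, assuming the feature extractor has already turned the preset picture and every sampled picture into fixed-length vectors. The vectors are invented for illustration, cosine similarity is one possible similarity measure rather than one the text prescribes for this step, and picking the best-matching view at the end is an illustrative choice only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def per_view_similarities(query_feature, view_features):
    """Visual similarity between the preset picture and each sampled picture."""
    return [cosine(query_feature, f) for f in view_features]

# Hypothetical feature vectors: one for the preset picture, one per sampled picture.
query = [1.0, 0.0]
views = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]

scores = per_view_similarities(query, views)
# Index of the sampled picture that looks most like the preset picture.
best_view = max(range(len(scores)), key=scores.__getitem__)
```

All per-view scores are kept because the later overall-similarity step can use each of them as a separate term.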
S206: obtain the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity.
Step S206 includes: A, selecting among the determined text similarities and visual similarities based on preset thresholds; and B, obtaining the overall similarity between the preset picture and the candidate content resource from the selected text similarity and visual similarity.
Methods for calculating the overall similarity include, but are not limited to, linear weighting, product, and range normalization. Taking linear weighting as an example: suppose a content resource has one text feature, so that the text similarity between the preset picture and that content resource is a single value. When calculating visual similarity, the content resource may be sampled into multiple sampled pictures, giving multiple visual similarities. The text similarity is one term of the formula and each visual similarity is another term; each term is multiplied by its weight and the products are summed to give the overall similarity:
$$Q = aS_0 + bS_1 + cS_2 + \cdots + nS_n$$
where Q denotes the overall similarity; a, b, c, ..., n denote the weights; S_0 denotes the text similarity; and S_1, S_2, ..., S_n denote the visual similarities.
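A minimal sketch of the linear weighting above, with the text similarity as one term and each per-view visual similarity as a further term. The function name and the example weights are assumptions; the text does not say how the weights are chosen.

```python
def overall_similarity(text_sim, visual_sims, text_weight, visual_weights):
    """Q = text_weight * S_0 + sum of visual_weights[i] * S_(i+1)."""
    if len(visual_sims) != len(visual_weights):
        raise ValueError("one weight is required per visual similarity")
    return text_weight * text_sim + sum(
        w * s for w, s in zip(visual_weights, visual_sims)
    )

# Illustrative numbers only: one text similarity and two sampled pictures.
q = overall_similarity(
    text_sim=0.9,
    visual_sims=[0.5, 0.7],
    text_weight=0.5,
    visual_weights=[0.25, 0.25],
)
```

Choosing weights that sum to 1, as in this example, keeps Q on the same scale as the individual similarities, though nothing in the text requires it.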
In addition, the effect of other factors on the overall similarity sometimes needs to be considered, for example a content quality index (high quality, low quality, resolution, model fineness, etc.), the user's click history, and prohibitions under laws and regulations.
The target content resource may be determined in the following ways:
1. Take the content resources whose overall similarity is greater than a first preset threshold as the target content resources. For example, if the first preset threshold is 80%, the content resources whose overall similarity exceeds 80% are output from among all content resources found.
2. Sort the content resources found by overall similarity and take the top-ranked ones as the target content resources. For example, a ranking function rank sorts the content resources by overall similarity, and the top 5 are kept.
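The two selection strategies just listed can be sketched as follows. The resource IDs and scores are invented, and `select_by_threshold` and `select_top_k` are hypothetical helper names.

```python
def select_by_threshold(scores, threshold):
    """Keep the resource IDs whose overall similarity exceeds the threshold."""
    return [rid for rid, q in scores.items() if q > threshold]

def select_top_k(scores, k):
    """Sort resource IDs by overall similarity, highest first, and keep k of them."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical overall similarities keyed by content resource ID.
scores = {"pano_1": 0.92, "model_7": 0.85, "video_3": 0.40}

above_80 = select_by_threshold(scores, 0.80)
top_2 = select_top_k(scores, 2)
```

The threshold variant returns a variable number of results, whereas the top-k variant always returns at most k, so which one fits depends on how the results are displayed.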
To improve computational efficiency and avoid processing too many content resources, once the text similarity and visual similarity have been calculated, content resources may be filtered by the value of the text similarity and/or the visual similarity. In one implementation, content resources whose text similarity is below a first preset threshold are filtered out, and content resources whose visual similarity is below a second preset threshold are filtered out. This reduces the number of content resources for which the overall similarity must be calculated, which lowers the amount of computation and improves efficiency.
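The pre-filtering could look like the sketch below. The dictionary layout, the threshold values, and the use of a resource's best per-view score as its visual similarity are all assumptions made for illustration; the text only says that resources below the respective thresholds are filtered out.

```python
def prefilter(candidates, text_threshold, visual_threshold):
    """Drop candidates that fall below either threshold before computing Q."""
    kept = []
    for c in candidates:
        if c["text_sim"] < text_threshold:
            continue  # text similarity too low
        if max(c["visual_sims"]) < visual_threshold:
            continue  # even the best-matching view is too dissimilar
        kept.append(c)
    return kept

# Hypothetical candidates with already-computed similarities.
candidates = [
    {"id": "pano_1", "text_sim": 0.95, "visual_sims": [0.2, 0.8]},
    {"id": "model_7", "text_sim": 0.10, "visual_sims": [0.9, 0.9]},
    {"id": "video_3", "text_sim": 0.90, "visual_sims": [0.1, 0.2]},
]

kept_ids = [c["id"] for c in prefilter(candidates, 0.5, 0.5)]
```

Only the surviving candidates then go on to the overall-similarity calculation.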
S207: determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
As shown in FIG. 4, the preset picture is a picture of a scenic spot and the candidate content resource is a panorama. View sampling is performed on the panorama to obtain multiple sampled pictures, and the scenic spot picture is then compared with each sampled picture. If a sampled picture matches, the panorama from which it was sampled is a content resource matching the scenic spot picture, and that panorama is output.
In a specific implementation, every content resource in the content resource library is treated as a candidate content resource, although to reduce computation sometimes only the content resources of a certain category are used. The above steps are then repeated to obtain the overall similarity between the preset picture and each candidate content resource, and the candidates are sorted by overall similarity. The higher the overall similarity, the more similar the preset picture and the content resource.
In addition, each content resource has a corresponding identifier. For ease of implementation, when the content resources found are output, it is the identifier (ID) of each content resource that is output rather than the content resource itself. For example, the overall similarities and the IDs of the corresponding content resources are output, the content resources are sorted by overall similarity, and the IDs of the top n content resources are output.
When a content resource is to be displayed on the client, its storage address can be obtained from the ID; the user then selects, through some interaction in the browser interface (for example, choosing a remote player), the multimedia file to display from the list of candidate multimedia files, i.e. the target content resource.
In the technical solution of this embodiment, view sampling is performed on the content resources with different observation modes and sampling intervals, so that the visual feature of the preset picture can be matched against the visual features of a content resource from all directions, giving high accuracy when searching for content resources.
Embodiment 3
On the basis of Embodiment 2, this embodiment of the present application provides a method for searching for a content resource. FIG. 5 is a flowchart of the method, which includes:
① Preset picture (query): the initial input to the system, produced by the browser client, containing the picture content and the URL of the picture.
The form of the picture is not limited; it may be a picture file uploaded by the user, a picture taken with a camera, a hand-drawn sketch, and so on.
② Network interface: receives and parses the preset picture sent by the client and returns the content resource search results to the client. Possible implementations include, but are not limited to, API interface definitions based on protocols such as HTTP and HTTPS.
③ Picture keyword guessing: the input is the preset picture passed on by the network interface, and the output is one or more short text fragments that describe or represent the picture content. Its role is to convert the preset picture from picture form into keywords in text form.
This step may be implemented with the "image recognition" module that image search engines commonly use. Typically, the image recognition function includes:
a) Matching on identical pictures or using the URL information, then aggregating and extracting the text on the picture's source web pages on the Internet (where the picture has been reposted there may be several source pages) to produce a text output. For example, given a picture of the Tiananmen gate tower as input, the module outputs the short text "Tiananmen".
b) Using a picture classifier pre-trained for a vertical category (for example, one based on a convolutional neural network algorithm) to recognize the picture content and output a classification label text. For example, given a picture of a Tyrannosaurus rex as input, the module outputs the short text "Tyrannosaurus rex".
④ Text similarity calculation: the input is the output of ③ (the keyword guessing result) together with the set of text labels carried by the resources in the content resource library. This step pairs the guessed keyword text with the text label of every content resource to form multiple text matching pairs, calculates the text similarity of each pair, and outputs the text similarity scores.
This step may be implemented with the "query text similarity calculation" module, based on natural language processing, that web search engines commonly use. Typically, the function works as follows: given two keywords (both short texts), a text similarity model pre-trained offline on user query data and click logs (for example, based on a neural network or a bag-of-words model) scores the semantic similarity of the two keywords; the higher the score, the closer the two keywords are semantically, and vice versa. Taking a cosine-similarity-based calculation as an example, the output score lies in the range [-1, 1]. For example, if the similarity score of "Tiananmen" and "Tiananmen Square" is s, then s should be close to 1, while the similarity score of "Tiananmen" and "Nanjing Road" should be significantly lower than s.
⑤ Content resource library: a collection of resources of various types, crawled by search engine crawlers or provided by content producers; each resource carries text labels to facilitate classification, management, and retrieval.
⑥ View sampling: the input is any resource in the content library, and the output is a number of sampled pictures. As in the embodiment shown in FIG. 3, for a given content resource the user's way of observing, including but not limited to observation position, angle, and visible range, can be simulated and varied so that views are sampled across the whole visible space and multiple pictures are obtained, each being a planar projection of the content at a simulated observation point. View sampling applies generally to panoramic, 3D, AR, and VR content; the sampling intervals for observation position, angle, and visible range can be chosen by weighing computation and storage cost against accuracy and recall. For panoramic videos and 3D animations that contain animated content, frame sampling along the time axis is additionally used to generate the output sampled pictures, and the sampling time interval is chosen with the same trade-off. Typical sampling techniques include, but are not limited to, equal-interval sampling, random sampling, and sampling based on the distribution of user interaction history.
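For animated content, combining frame sampling with view sampling might look like the sketch below, which pairs equally spaced timestamps with a fixed set of viewing angles. The function names, the units, and the sample values are assumptions for illustration; decoding frames and rendering projections are outside the sketch.

```python
from itertools import product

def frame_timestamps(duration_s, interval_s):
    """Equally spaced sample times, in seconds, from 0 up to the duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    count = int(duration_s // interval_s)
    return [i * interval_s for i in range(count + 1)]

def sampling_plan(duration_s, interval_s, yaw_angles_deg):
    """Every (timestamp, yaw) pair at which one sampled picture would be rendered."""
    return list(product(frame_timestamps(duration_s, interval_s), yaw_angles_deg))

# A hypothetical 10-second panoramic video, sampled every 2.5 s at four yaw angles.
times = frame_timestamps(10, 2.5)
plan = sampling_plan(10, 2.5, [0, 90, 180, 270])
```

The number of sampled pictures grows as the product of the two sampling densities, which is why both intervals are subject to the cost versus accuracy trade-off described above.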
⑦ Visual similarity calculation: the input is the preset picture and the output of ⑥ (the set of sampled pictures obtained by applying the view sampling step to each content resource in the content resource library). This module pairs the preset picture with every sampled picture of every content resource to obtain multiple picture matching pairs, calculates the visual similarity of the two pictures in each matching pair, and outputs the visual similarity scores.
This step can be implemented with the visual-feature-based "similar picture retrieval" module commonly used by picture search engines. Typically, similar picture retrieval works as follows: given a preset picture, a predefined or offline pre-trained picture feature extractor (for example, one based on a convolutional neural network) extracts the visual features of the preset picture; the extracted features are compared with the features of each picture in the picture library, and the visual feature similarities are scored and ranked. The higher the score, the closer the preset picture and a given library picture are in visual semantics, and vice versa.
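As a non-limiting illustration, scoring and ranking the picture matching pairs might be sketched as follows. The sketch assumes the feature extractor has already produced a numeric feature vector for the preset picture and for every sampled picture; the `(resource_id, view_index)` key layout and the function names are hypothetical, and cosine similarity is used only as one possible similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_matching_pairs(preset_feature, sampled_features):
    """Score every (preset picture, sampled picture) pair, best match first.

    sampled_features maps (resource_id, view_index) to a feature vector.
    """
    scores = {
        key: cosine(preset_feature, feature)
        for key, feature in sampled_features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


sampled = {
    ("r1", 0): [1.0, 0.0],
    ("r1", 1): [0.0, 1.0],
    ("r2", 0): [-1.0, 0.0],
}
ranked = score_matching_pairs([1.0, 0.0], sampled)
print(ranked[0][0])  # the pair whose sampled view is closest to the preset picture
```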
⑧ Overall similarity calculation: the input is the outputs of ④ and ⑦, that is, the text similarity scores and visual similarity scores of the text matching pairs and picture matching pairs formed between the preset picture and the resources in the content library. The output is the overall similarity and the corresponding candidate content resource ID. The higher the overall similarity score, the more relevant the corresponding candidate content resource is to the preset picture.
The overall similarity is calculated from a combination of the text and visual similarity scores. Possible implementations include, but are not limited to, combinations of linear weighting, multiplication, and value-range normalization. Additional factors may also be taken into account, including but not limited to content quality indicators (high or low quality, resolution, model fineness, and the like), the user's historical click records, and laws and regulations.
To speed up this module and avoid processing too many matching pairs, the text similarity scores and visual similarity scores can each be filtered before they enter the calculation. For example, similarity scores below a certain threshold are filtered out directly and do not enter the overall similarity calculation, which reduces the amount of computation.
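As a non-limiting illustration, threshold filtering followed by normalization and linear weighting might be sketched as follows. The weight, the threshold values, and the function name `overall_similarity` are assumptions chosen for the sketch, and dropping a pair when either score falls below its threshold is only one possible reading of the filtering step.

```python
def overall_similarity(text_score, visual_score,
                       text_weight=0.5,
                       text_threshold=0.2,
                       visual_threshold=0.2):
    """Combine a text score and a visual score, both assumed to lie in [-1, 1].

    Returns None when either score is below its threshold, so that the pair
    never enters the weighted combination.
    """
    if text_score < text_threshold or visual_score < visual_threshold:
        return None
    # Value-range normalization from [-1, 1] to [0, 1].
    text_norm = (text_score + 1) / 2
    visual_norm = (visual_score + 1) / 2
    # Linear weighting of the two normalized scores.
    return text_weight * text_norm + (1 - text_weight) * visual_norm


print(overall_similarity(1.0, 1.0))   # both scores at their maximum
print(overall_similarity(0.1, 0.9))   # filtered out by the text threshold
```

A product of the normalized scores, or an extra multiplicative factor for content quality, could be substituted for the weighted sum without changing the surrounding flow.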
⑨ Top-k sorting: the input is the output of ⑧, that is, the overall similarity and the corresponding candidate content resource IDs. The output is the top k candidate content resource IDs in descending order of overall similarity score.
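As a non-limiting illustration, the top-k selection might be sketched with Python's standard `heapq.nlargest`. The function name `top_k` and the dictionary input format (resource ID mapped to overall similarity score) are assumptions made for this sketch.

```python
import heapq


def top_k(overall_scores, k):
    """Return the k resource IDs with the highest overall similarity, best first."""
    ranked = heapq.nlargest(k, overall_scores.items(), key=lambda item: item[1])
    return [resource_id for resource_id, _score in ranked]


print(top_k({"a": 0.2, "b": 0.9, "c": 0.5}, 2))  # ['b', 'c']
```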
⑩ Client display: based on the output of ⑨, the user selects, through some form of interaction in the browser interface, the content resource to be displayed from the list of candidate content resources, and the browser client displays it.
Among the above modules, ③ to ⑨ can be precomputed offline to speed up the online search. For example, the picture library of web pages across the entire web can be processed picture by picture in advance, with similarity scoring and sorting done offline, to build a static lookup table that associates the pictures in any web page with content resources. During an online search, matching content can then be obtained quickly by table lookup. The lookup table can be updated incrementally. If the user's preset picture is not in the web-wide library, ③ to ⑨ can be computed online. Both the online and the offline calculations can be accelerated by techniques such as parallel computing. A typical matching result is shown in the embodiment of FIG. 3: the match can be accurate down to a specific observation angle, so the matching precision is high and the user experience is good.
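As a non-limiting illustration, the offline lookup table with incremental updates and an online fallback might be sketched as follows. The class `MatchLookupTable` and the `compute_matches` callback, which stands in for running modules ③ to ⑨ on one picture, are hypothetical names introduced for this sketch.

```python
class MatchLookupTable:
    """Static table from picture ID to ranked content resource IDs."""

    def __init__(self, compute_matches):
        # compute_matches(picture_id) -> list of resource IDs (assumed callback).
        self._compute_matches = compute_matches
        self._table = {}

    def build_offline(self, picture_ids):
        """Precompute matches for every known picture."""
        for picture_id in picture_ids:
            self._table[picture_id] = self._compute_matches(picture_id)

    def update_incrementally(self, picture_ids):
        """Compute matches only for pictures not yet in the table."""
        added = 0
        for picture_id in picture_ids:
            if picture_id not in self._table:
                self._table[picture_id] = self._compute_matches(picture_id)
                added += 1
        return added

    def search(self, picture_id):
        """Look the picture up; fall back to online computation on a miss."""
        if picture_id in self._table:
            return self._table[picture_id]
        return self._compute_matches(picture_id)


table = MatchLookupTable(lambda picture_id: [picture_id + "-resource"])
table.build_offline(["p1", "p2"])
print(table.search("p1"))
```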
The technical solution of this embodiment performs view sampling on content resources using different observation modes and sampling intervals, so that the visual features of the preset picture can be matched against the visual features of a content resource from all directions, which makes the content resource search more accurate.
Embodiment 4
An embodiment of the present application provides a device for searching for content resources. FIG. 6 is a schematic diagram of the device for searching for content resources according to this embodiment. The device includes:
an obtaining module 61, configured to obtain a preset picture and at least one candidate content resource, and to extract the text features and visual features of the preset picture and the text features and visual features of the candidate content resource, respectively;
a first determining module 62, configured to determine the text similarity between the text features of the preset picture and the text features of the candidate content resource;
a second determining module 63, configured to determine the visual similarity between the visual features of the preset picture and the visual features of the candidate content resource; and
a target determining module 64, configured to determine a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
The technical solution of this embodiment searches for content resources by combining the text features of the preset picture and of the content resources with their visual features, so the search is more accurate. This technical effect is the same as the beneficial effect of Embodiment 1 and is not repeated here.
Embodiment 5
On the basis of Embodiment 4, an embodiment of the present application provides a device for searching for content resources. FIG. 7 is a schematic diagram of the device for searching for content resources according to this embodiment. In this device:
The target determining module 64 includes:
a first calculation sub-module 641, configured to obtain the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity; and
a target determining sub-module 642, configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
The first calculation sub-module is further configured to:
select among the determined text similarities and visual similarities based on a preset threshold; and
obtain the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
Further, the obtaining module 61 includes:
an identification sub-module 611, configured to identify the picture content contained in the preset picture using a preset picture classification model, so as to extract text features from the preset picture; or
an extraction sub-module 612, configured to obtain the corresponding web page content according to the uniform resource locator of the preset picture, and to extract the text features of the preset picture from the web page content.
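As a non-limiting illustration, extracting text features for a preset picture from web page content might be sketched as follows. The sketch assumes the page HTML has already been retrieved, and it treats the page title and the `alt`/`title` attributes of the matching `<img>` tag as the text features; that choice, the class `PictureTextExtractor`, and the example URLs are assumptions of the sketch rather than details given in the application.

```python
from html.parser import HTMLParser


class PictureTextExtractor(HTMLParser):
    """Collect the page title and the alt/title text of one specific picture."""

    def __init__(self, picture_url):
        super().__init__()
        self.picture_url = picture_url
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attributes = dict(attrs)
            if attributes.get("src") == self.picture_url:
                for name in ("alt", "title"):
                    if attributes.get(name):
                        self.keywords.append(attributes[name])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.keywords.append(data.strip())


def extract_text_features(page_html, picture_url):
    """Return keywords found for picture_url in the given page HTML."""
    parser = PictureTextExtractor(picture_url)
    parser.feed(page_html)
    parser.close()
    return parser.keywords


page = (
    "<html><head><title>Tiananmen Square</title></head><body>"
    '<img src="http://example.com/a.jpg" alt="Tiananmen">'
    '<img src="http://example.com/b.jpg" alt="Nanjing Road">'
    "</body></html>"
)
print(extract_text_features(page, "http://example.com/a.jpg"))
```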
Further, the first determining module 62 includes:
a first determining sub-module 621, configured to obtain content resources in a preset content resource library as the at least one candidate content resource, and to determine the text similarity between the text features of the preset picture and the text labels of the candidate content resource, wherein the preset content resource library includes a plurality of content resources and their corresponding text labels.
Further, the second determining module 63 includes:
a sampling sub-module 631, configured to sample the candidate content resource to obtain at least one sampled picture corresponding to the content resource;
a second determining sub-module 632, configured to determine, for each sampled picture, the visual similarity between the visual features of that sampled picture and the visual features of the preset picture; and
a third determining sub-module 633, configured to determine the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture.
The sampling sub-module is specifically configured to perform view sampling on the candidate content resource in the visible space, in a preset observation mode and a preset sampling mode;
wherein the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
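As a non-limiting illustration, the third determining sub-module's step of deriving a resource-level visual similarity from the per-sample similarities might be sketched as follows. The application does not state which aggregation is used; taking the maximum (best-matching view) or the mean are assumptions of this sketch, as is the function name `aggregate_visual_similarity`.

```python
def aggregate_visual_similarity(sample_scores, mode="max"):
    """Reduce per-sampled-picture similarity scores to one score per resource.

    mode="max" keeps the best-matching view; mode="mean" averages all views.
    """
    if not sample_scores:
        raise ValueError("at least one sampled picture score is required")
    if mode == "max":
        return max(sample_scores)
    if mode == "mean":
        return sum(sample_scores) / len(sample_scores)
    raise ValueError("unknown aggregation mode: " + mode)


print(aggregate_visual_similarity([0.1, 0.8, 0.4]))  # 0.8
```

Taking the maximum favors resources that match the preset picture from one particular observation angle, which fits a match that is precise to a specific view; averaging would instead favor resources that resemble the picture from many angles.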
The technical solution of this embodiment can sample content resources using multiple observation modes and sampling modes, so that the content resource search is more accurate. This technical effect is the same as the beneficial effect of Embodiment 2 and is not repeated here.
Embodiment 6
An embodiment of the present application provides an information classification device. As shown in FIG. 8, the device includes a memory 81 and a processor 82, and the memory 81 stores a computer program that can run on the processor 82. When the processor 82 executes the computer program, the information classification method of the above embodiments is implemented. There may be one or more memories 81 and one or more processors 82.
The device further includes:
a communication interface 83, used for communication between the memory 81 and the processor 82 and external devices.
The memory 81 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
If the memory 81, the processor 82, and the communication interface 83 are implemented independently, they can be connected to one another through a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 81, the processor 82, and the communication interface 83 are integrated on a single chip, they can communicate with one another through an internal interface.
Embodiment 7
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the method shown in any of the embodiments of FIGS. 1 to 5.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present application. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The computer-readable medium described in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. More specific examples of a computer-readable storage medium include at least the following (a non-exhaustive list): an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable storage medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
In the embodiments of the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the above.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and when executed it performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited to them. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Claims (16)

  1. A method for searching for content resources, characterized in that the method comprises:
    obtaining a preset picture and at least one candidate content resource, and extracting the text features and visual features of the preset picture and the text features and visual features of the candidate content resource, respectively;
    determining the text similarity between the text features of the preset picture and the text features of the candidate content resource;
    determining the visual similarity between the visual features of the preset picture and the visual features of the candidate content resource; and
    determining a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  2. The method according to claim 1, characterized in that determining a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity comprises:
    obtaining the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity; and
    determining the target content resource from the at least one candidate content resource according to the obtained overall similarity.
  3. The method according to claim 2, characterized in that obtaining the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity comprises:
    selecting among the determined text similarities and visual similarities based on a preset threshold; and
    obtaining the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
  4. The method according to claim 1, characterized in that extracting the text features of the preset picture comprises:
    identifying the picture content contained in the preset picture using a preset picture classification model, so as to extract text features from the preset picture; or
    obtaining the corresponding web page content according to the uniform resource locator of the preset picture, and extracting the text features of the preset picture from the web page content.
  5. The method according to any one of claims 1 to 4, characterized in that determining the text similarity between the text features of the preset picture and the text features of the candidate content resource comprises:
    obtaining content resources in a preset content resource library as the at least one candidate content resource, and determining the text similarity between the text features of the preset picture and the text labels of the candidate content resource, wherein the preset content resource library comprises a plurality of content resources and their corresponding text labels.
  6. The method according to any one of claims 1 to 4, characterized in that determining the visual similarity between the visual features of the preset picture and the visual features of the candidate content resource comprises:
    sampling the candidate content resource to obtain at least one sampled picture corresponding to the content resource;
    for each sampled picture, determining the visual similarity between the visual features of that sampled picture and the visual features of the preset picture; and
    determining the visual similarity between the visual features of the candidate content resource and the visual features of the preset picture according to the visual similarity between the visual features of each sampled picture corresponding to the candidate content resource and the visual features of the preset picture.
  7. The method according to claim 6, characterized in that sampling the candidate content resource comprises:
    performing view sampling on the candidate content resource in the visible space, in a preset observation mode and a preset sampling mode;
    wherein the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
  8. A device for searching for content resources, characterized in that the device comprises:
    an obtaining module, configured to obtain a preset picture and at least one candidate content resource, and to extract the text features and visual features of the preset picture and the text features and visual features of the candidate content resource, respectively;
    a first determining module, configured to determine the text similarity between the text features of the preset picture and the text features of the candidate content resource;
    a second determining module, configured to determine the visual similarity between the visual features of the preset picture and the visual features of the candidate content resource; and
    a target determining module, configured to determine a target content resource from the at least one candidate content resource according to the determined text similarity and visual similarity.
  9. The device according to claim 8, characterized in that the target determining module comprises:
    a first calculation sub-module, configured to obtain the overall similarity between the preset picture and the candidate content resource according to the determined text similarity and visual similarity; and
    a target determining sub-module, configured to determine the target content resource from the at least one candidate content resource according to the obtained overall similarity.
  10. The device according to claim 9, characterized in that the first calculation sub-module is further configured to:
    select among the determined text similarities and visual similarities based on a preset threshold; and
    obtain the overall similarity between the preset picture and the candidate content resource according to the selected text similarity and visual similarity.
  11. The device according to claim 8, characterized in that the obtaining module comprises:
    an identification sub-module, configured to identify the picture content contained in the preset picture using a preset picture classification model, so as to extract text features from the preset picture; or
    an extraction sub-module, configured to obtain the corresponding web page content according to the uniform resource locator of the preset picture, and to extract the text features of the preset picture from the web page content.
  12. The device according to any one of claims 8 to 11, characterized in that the first determining module comprises:
    a first determining sub-module, configured to obtain content resources in a preset content resource library as the at least one candidate content resource, and to determine the text similarity between the text features of the preset picture and the text labels of the candidate content resource, wherein the preset content resource library comprises a plurality of content resources and their corresponding text labels.
  13. The device according to any one of claims 8 to 12, wherein the second determining module comprises:
    a sampling submodule configured to sample the candidate content resource to obtain at least one sampled picture corresponding to the content resource;
    a second determining submodule configured to determine, for each sampled picture, the visual similarity between the visual feature of the sampled picture and the visual feature of the preset picture; and
    a third determining submodule configured to determine the visual similarity between the visual feature of the candidate content resource and the visual feature of the preset picture, according to the visual similarity between the visual feature of each sampled picture corresponding to the candidate content resource and the visual feature of the preset picture.
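The two-stage structure of this claim (one similarity per sampled picture, then one similarity per resource) can be sketched as follows. The claim does not name a similarity measure or an aggregation rule, so cosine similarity over feature vectors and `max` aggregation are assumptions here, as are the function names.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def resource_visual_similarity(
    preset_features: Sequence[float],
    sample_features: Sequence[Sequence[float]],
    aggregate: Callable[[Sequence[float]], float] = max,
) -> float:
    """Aggregate per-sample similarities into one score for the resource."""
    if not sample_features:
        raise ValueError("at least one sampled picture is required")
    # Step 1: one visual similarity per sampled picture.
    per_sample = [cosine_similarity(s, preset_features) for s in sample_features]
    # Step 2: reduce to a single similarity for the whole resource.
    return aggregate(per_sample)
```

Taking the maximum rewards a resource if any single view matches the preset picture well; passing `statistics.mean` as `aggregate` would instead favor resources that match across most views.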
  14. The device according to claim 13, wherein the sampling submodule is further configured to:
    perform viewing-angle sampling on the candidate content resource in a visible space, using a preset observation mode and a preset sampling mode;
    wherein the preset observation mode is based on at least one of an observation position, an angle, and a visible range, or a combination thereof.
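To make the observation parameters concrete, the sketch below enumerates viewpoints from a fixed observation position over a grid of yaw and pitch angles with a fixed field of view. The `View` type, `sample_views`, the uniform yaw grid, and the use of a field-of-view angle to model the visible range are all illustrative assumptions; the claim only says the observation mode is based on position, angle, and visible range.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class View:
    position: Tuple[float, float, float]  # observation position
    yaw_deg: float                        # horizontal viewing angle
    pitch_deg: float                      # vertical viewing angle
    fov_deg: float                        # visible range, modeled as field of view


def sample_views(
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0),
    num_yaw: int = 4,
    pitch_angles_deg: Sequence[float] = (0.0,),
    fov_deg: float = 90.0,
) -> List[View]:
    """Enumerate viewpoints on a uniform yaw grid for each pitch angle."""
    if num_yaw <= 0:
        raise ValueError("num_yaw must be positive")
    views = []
    for i in range(num_yaw):
        yaw = i * 360.0 / num_yaw
        for pitch in pitch_angles_deg:
            views.append(View(position, yaw, pitch, fov_deg))
    return views
```

Each `View` would then be rendered to one sampled picture by whatever renderer the content resource supports; that rendering step is outside this sketch.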
  15. A server, wherein the server comprises:
    one or more processors; and
    a storage device for storing one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
  16. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2018/111433 2018-03-09 2018-10-23 Method and device for searching for content resource, and server WO2019169872A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810195551.6 2018-03-09
CN201810195551.6A CN108416028B (en) 2018-03-09 2018-03-09 Method, device and server for searching content resources

Publications (1)

Publication Number Publication Date
WO2019169872A1 (en) 2019-09-12

Family

ID=63130764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111433 WO2019169872A1 (en) 2018-03-09 2018-10-23 Method and device for searching for content resource, and server

Country Status (2)

Country Link
CN (1) CN108416028B (en)
WO (1) WO2019169872A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001451A (en) * 2020-08-27 2020-11-27 上海擎感智能科技有限公司 Data redundancy processing method, system, medium and device
CN113761252A (en) * 2020-06-03 2021-12-07 华为技术有限公司 Text matching method and device and electronic equipment
CN115150297A (en) * 2022-08-15 2022-10-04 北京百润洪科技有限公司 Data filtering and content evaluation method and system based on mobile internet
CN115243081A (en) * 2022-09-23 2022-10-25 北京润尼尔网络科技有限公司 Mirror image distribution method based on VR

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416028B (en) * 2018-03-09 2021-09-21 北京百度网讯科技有限公司 Method, device and server for searching content resources
CN109271552B (en) * 2018-08-22 2021-08-20 北京达佳互联信息技术有限公司 Method and device for retrieving video through picture, electronic equipment and storage medium
CN109558523B (en) * 2018-11-06 2021-09-21 广东美的制冷设备有限公司 Search processing method and device and terminal equipment
CN111475603B (en) * 2019-01-23 2023-07-04 百度在线网络技术(北京)有限公司 Enterprise identification recognition method, enterprise identification recognition device, computer equipment and storage medium
CN111866609B (en) * 2019-04-08 2022-12-13 百度(美国)有限责任公司 Method and apparatus for generating video
CN111782982A (en) * 2019-05-20 2020-10-16 北京京东尚科信息技术有限公司 Method and device for sorting search results and computer-readable storage medium
CN111782841A (en) * 2019-11-27 2020-10-16 北京沃东天骏信息技术有限公司 Image searching method, device, equipment and computer readable medium
CN113536026B (en) * 2020-04-13 2024-01-23 阿里巴巴集团控股有限公司 Audio searching method, device and equipment
CN111694978B (en) * 2020-05-20 2023-04-28 Oppo(重庆)智能科技有限公司 Image similarity detection method and device, storage medium and electronic equipment
CN114329013A (en) * 2021-09-29 2022-04-12 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101388022A (en) * 2008-08-12 2009-03-18 北京交通大学 Web portrait search method for fusing text semantic and vision content
CN101634996A (en) * 2009-08-13 2010-01-27 浙江大学 Individualized video sequencing method based on comprehensive consideration
CN104298749A (en) * 2014-10-14 2015-01-21 杭州淘淘搜科技有限公司 Commodity retrieval method based on image visual and textual semantic integration
CN108416028A (en) * 2018-03-09 2018-08-17 北京百度网讯科技有限公司 A kind of method, apparatus and server of search content resource

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100456300C (en) * 2006-10-27 2009-01-28 北京航空航天大学 Method for searching 3D model based on 2D sketch
CN103793434A (en) * 2012-11-02 2014-05-14 北京百度网讯科技有限公司 Content-based image search method and device
TWI536186B (en) * 2013-12-12 2016-06-01 三緯國際立體列印科技股份有限公司 Three-dimension image file serching method and three-dimension image file serching system
CN106202256B (en) * 2016-06-29 2019-12-17 西安电子科技大学 Web image retrieval method based on semantic propagation and mixed multi-instance learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761252A (en) * 2020-06-03 2021-12-07 华为技术有限公司 Text matching method and device and electronic equipment
CN112001451A (en) * 2020-08-27 2020-11-27 上海擎感智能科技有限公司 Data redundancy processing method, system, medium and device
CN115150297A (en) * 2022-08-15 2022-10-04 北京百润洪科技有限公司 Data filtering and content evaluation method and system based on mobile internet
CN115150297B (en) * 2022-08-15 2023-05-19 雁展科技(深圳)有限公司 Data filtering and content evaluating method and system based on mobile internet
CN115243081A (en) * 2022-09-23 2022-10-25 北京润尼尔网络科技有限公司 Mirror image distribution method based on VR

Also Published As

Publication number Publication date
CN108416028A (en) 2018-08-17
CN108416028B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
WO2019169872A1 (en) Method and device for searching for content resource, and server
US11409791B2 (en) Joint heterogeneous language-vision embeddings for video tagging and search
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
CN108334627B (en) Method and device for searching new media content and computer equipment
WO2020155423A1 (en) Cross-modal information retrieval method and apparatus, and storage medium
US9372920B2 (en) Identifying textual terms in response to a visual query
US8788434B2 (en) Search with joint image-audio queries
CN102549603B (en) Relevance-based image selection
JP6047550B2 (en) Search method, client and server
US20150178321A1 (en) Image-based 3d model search and retrieval
CN110516096A (en) Synthesis perception digital picture search
WO2019062044A1 (en) Method for interaction between electronic book and electronic book topic computing device, and storage medium
CN107533638B (en) Annotating video with tag correctness probabilities
US10685236B2 (en) Multi-model techniques to generate video metadata
CN108763244B (en) Searching and annotating within images
JP2015162244A (en) Methods, programs and computation processing systems for ranking spoken words
US9507805B1 (en) Drawing based search queries
WO2016107125A1 (en) Information searching method and apparatus
WO2023108980A1 (en) Information push method and device based on text adversarial sample
WO2024001057A1 (en) Video retrieval method based on attention segment prompt
TW201931163A (en) Image search and index building
CN116578738B (en) Graph-text retrieval method and device based on graph attention and generating countermeasure network
Panda et al. Heritage app: annotating images on mobile phones
Kuhn et al. Large Scale Video Analytics: On-demand, iterative inquiry for moving image research
CN116977701A (en) Video classification model training method, video classification method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18908971; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/01/2021))
122 EP: PCT application non-entry in European phase (Ref document number: 18908971; Country of ref document: EP; Kind code of ref document: A1)