WO2015188719A1 - Method and apparatus for associating structured data with pictures - Google Patents

Method and apparatus for associating structured data with pictures

Info

Publication number
WO2015188719A1
WO2015188719A1 PCT/CN2015/080712 CN2015080712W WO2015188719A1 WO 2015188719 A1 WO2015188719 A1 WO 2015188719A1 CN 2015080712 W CN2015080712 W CN 2015080712W WO 2015188719 A1 WO2015188719 A1 WO 2015188719A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
structured data
description information
extended
text
Prior art date
Application number
PCT/CN2015/080712
Other languages
English (en)
French (fr)
Inventor
陶哲
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京奇虎科技有限公司, 奇智软件(北京)有限公司
Publication of WO2015188719A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Definitions

  • the present invention relates to the field of Internet applications, and in particular, to a method and an apparatus for associating structured data and pictures, and a method, device and system for generating structured data search result items.
  • Online resource libraries have replaced the earlier paper resource libraries thanks to their comprehensive and complete coverage.
  • Although the existing description text of a picture conveys some information about the picture content, it tends to be uninformative, irregular, and limited in content.
  • As a result, a general search engine cannot accurately classify and identify the picture, so it cannot give users more accurate results or extended information about the picture content.
  • The present invention has been made in view of the above problems, in order to provide a method of associating structured data with pictures, and a corresponding association apparatus, that overcome the above problems or at least partially solve them.
  • A method for associating structured data with a picture, including: acquiring text description information of the picture and performing semantic extension on it to obtain extended description information; matching the extended description information against an encyclopedia database storing structured data, and determining the topic that matches the extended description information; selecting, among the plurality of pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and associating the selected at least one piece of structured data with the picture.
  • A method for generating a structured data search result item, including: acquiring a picture matching a search query word; and generating a search result item according to the picture and the structured data associated with the picture.
  • An apparatus for associating structured data with a picture, including: an extension module configured to acquire text description information of the picture and perform semantic extension on it to obtain extended description information; a matching module configured to match the extended description information against the encyclopedia database storing the structured data and determine the topic that matches the extended description information; a selection module configured to select, among the plurality of pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and an association module configured to associate the selected at least one piece of structured data with the picture.
  • An apparatus for generating a structured data search result item, including: an obtaining module configured to acquire a picture corresponding to a search query word; and a generating module configured to generate a search result page according to the picture and the structured data associated with the picture.
  • A system for generating a structured data search result item, comprising: an encyclopedia database configured to include a plurality of topics, each topic comprising a plurality of pieces of structured data; a picture database configured to store a plurality of pictures, to perform semantic extension on the text description information of each picture to obtain extended description information, and to match the extended description information against the encyclopedia database so as to associate the picture with at least one matching piece of structured data; a user terminal configured to input a search query word for a picture; and a search engine configured to search the picture database for a picture corresponding to the search query word, to search the encyclopedia database for the structured data associated with the picture, and to combine the acquired picture and its associated structured data to generate a search result page.
  • a computer program comprising computer readable code that, when executed on a computing device, causes the computing device to perform the method of the present invention.
  • a computer readable medium storing the computer program of the present invention is provided.
  • the text description information of the picture is first obtained, and the text description information of the picture is semantically extended to obtain extended description information.
  • the extended description information covers the content of the text description information and can expand the description by semantic extension.
  • In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data.
  • The extended description information is matched against the encyclopedia database to determine the matching topic, and pieces of structured data matching the extended description information are then selected under that topic.
  • Because the extended description information is derived from the text description information, determining the topic that matches it is equivalent to determining the topic to which the picture's structured data belongs. Determining the topic first ensures the accuracy of the structured data finally obtained.
  • When the text description information of different pictures is the same or similar, the topic distinguishes them and avoids associating a picture with the wrong structured data.
  • By setting a specified relevance and associating with the picture at least one piece of structured data whose relevance to the extended description information exceeds it, the degree of matching between the structured data and the picture is ensured, while the picture is associated with as much structured data as possible.
  • The association method in the embodiment of the present invention thus achieves the association between structured data and pictures.
  • Its purpose is to identify the picture accurately based on the associated structured data, thereby giving users more accurate search results and extended information about the picture content.
  • FIG. 1 shows a process flow diagram of a method of associating structured data with a picture in accordance with one embodiment of the present invention;
  • FIG. 2 shows a process flow diagram of a method of generating a structured data search result item in accordance with one embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of an apparatus for associating structured data with a picture according to an embodiment of the present invention;
  • FIG. 4 is a block diagram showing an apparatus for generating a structured data search result item according to an embodiment of the present invention;
  • FIG. 5 is a block diagram showing the structure of a system for generating structured data search result items according to an embodiment of the present invention;
  • FIG. 6 schematically shows a block diagram of a computing device for performing the method according to the invention; and
  • FIG. 7 schematically shows a storage unit for holding or carrying program code implementing the method according to the invention.
  • the embodiment of the present invention provides a new inventive concept for associating the picture with the structured data.
  • Structured data is data composed of fields in a prescribed format. It can generally be obtained by extracting and processing the data information corresponding to the entries stored in the encyclopedia database, which makes full use of the large volume and high reliability of the information in that database.
  • The resulting structured data is stored in the encyclopedia database.
  • Individual pieces of structured data can also be produced by manual editing, but manual editing alone cannot meet the need to generate massive amounts of structured data and keep its fields updated.
  • an embodiment of the present invention provides a method for associating structured data with a picture.
  • FIG. 1 shows a process flow diagram of a method of associating structured data with a picture in accordance with one embodiment of the present invention.
  • The method for associating structured data with a picture includes at least steps S102 to S108.
  • Step S102: Acquire text description information of the picture and perform semantic extension on it to obtain extended description information.
  • Step S104: Match the extended description information against the encyclopedia database storing the structured data, and determine the topic that matches the extended description information.
  • Step S106: Select, among the plurality of pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance.
  • Step S108: Associate the selected at least one piece of structured data with the picture.
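Steps S102 to S108 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `associate_picture`, `expand`, `topics`, and `relevance` are hypothetical, texts are modelled as sets of words, and the topic is chosen by simple word overlap because the source does not specify how topic matching is scored.

```python
from typing import Callable, Dict, List, Set, Tuple

Words = Set[str]


def associate_picture(
    description: Words,
    expand: Callable[[Words], Words],
    topics: Dict[str, List[Words]],
    relevance: Callable[[Words, Words], float],
    threshold: float,
) -> Tuple[str, List[Words]]:
    """Return the matching topic and the entries to associate with the picture."""
    # S102: semantically extend the text description (the extension is supplied by the caller).
    extended = expand(description)

    # S104: choose the topic whose entries share the most words with the
    # extended description (an assumed scoring rule, for illustration only).
    def topic_score(name: str) -> int:
        return sum(len(extended & entry) for entry in topics[name])

    topic = max(topics, key=topic_score)

    # S106: keep the entries whose relevance exceeds the specified relevance.
    selected = [entry for entry in topics[topic] if relevance(extended, entry) > threshold]

    # S108: the caller stores (topic, selected) as the picture's association.
    return topic, selected
```

Because the topic is fixed before any entry is scored, an identically named entry under a different topic is never considered for the association.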
  • In this example, the purpose of applying the embodiment of the present invention is to associate the picture with the structured data of the star Liu Xiaoming in the encyclopedia database.
  • The specific association process is as follows. First, the text description information of the picture is obtained.
  • The text description information is “Liu Xiaoming and Zhang Daliang present awards together at the Asian Film Festival”.
  • The structured data about the star Liu Xiaoming includes a combination of one or more of: his age, his place of birth, recent news and activities, and his film, television, and music works.
  • The structured data about Professor Liu Xiaoming includes a combination of one or more of: his age, the school where he works, his teaching experience, and his honors. The encyclopedia database thus contains several people named Liu Xiaoming with different identities, each with a large amount of structured data, so the text description information of the picture alone cannot determine which Liu Xiaoming's structured data the picture should be associated with.
  • the structured data is data composed of fields in a prescribed format, and can generally be obtained by extracting and processing data information corresponding to the terms stored in the encyclopedia database.
  • The extraction and processing can be done in various ways. For example, the data information corresponding to all entries in the encyclopedia database is extracted first; the data information for each entry is usually a name followed by a piece of description information. A weight is then calculated for each extracted piece of data information based on a TF-IDF (term frequency-inverse document frequency) algorithm.
  • The weight of a word can be calculated by dividing the total number of words in all the data information by the number of occurrences of that word. With the total number of words fixed, words that occur very often, such as function words with no substantive meaning, therefore receive relatively small weights.
  • Through the weight calculation, words that have no substantive meaning and relatively small weights can be excluded, leaving the words with substantive meaning in the data information.
  • The weighted data information is then processed by a series of preset rules, for example formatting each field. At this point the data information has been extracted and processed, and structured data is obtained.
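The weight calculation described above (total word count divided by a word's occurrence count) can be sketched as follows. Note that this is the simplified inverse-frequency weight stated in the text, not a full TF-IDF score; the function names, the whitespace tokenization, and the `min_weight` cut-off are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def inverse_frequency_weights(documents: List[str]) -> Dict[str, float]:
    """Weight of a word = total number of words / number of occurrences of the word."""
    words = [word for doc in documents for word in doc.lower().split()]
    counts = Counter(words)
    total = len(words)
    return {word: total / count for word, count in counts.items()}


def substantive_words(documents: List[str], min_weight: float) -> Set[str]:
    """Keep only words whose weight reaches min_weight (a hypothetical cut-off)."""
    weights = inverse_frequency_weights(documents)
    return {word for word, weight in weights.items() if weight >= min_weight}
```

With two short entries such as "the actor shot the film" and "the professor taught the class", the word "the" occurs four times out of ten and receives a much smaller weight than any content word, so a cut-off between the two removes it.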
  • the Encyclopedia database divides the structured data into topics, and each topic contains a large number of structured data with the same theme.
  • the determination of the subject in the encyclopedia database is based on word co-occurrence.
  • Word co-occurrence refers to several words appearing together. If several words often appear together, whether in one sentence or in one natural paragraph, their meanings are considered to be related. For example, the words "360", "security guard", and "computer check" often appear together in one sentence, so the three words are considered semantically related.
  • the word co-occurrence rate refers to the probability that several words appear together. The higher the co-occurrence rate of several words, the closer the semantic association between each other.
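A co-occurrence rate can be estimated from a corpus along the following lines. The source does not give a formula, so this sketch assumes one plausible definition: the fraction of sentences that contain all of the given words. The function name and the representation of sentences as word sets are likewise assumptions.

```python
from typing import Iterable, List, Set


def co_occurrence_rate(sentences: List[Set[str]], words: Iterable[str]) -> float:
    """Fraction of sentences in which all of the given words appear together."""
    target = set(words)
    if not sentences:
        return 0.0
    together = sum(1 for sentence in sentences if target <= sentence)
    return together / len(sentences)
```

The same function works at paragraph granularity if each set holds the words of a paragraph rather than of a sentence.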
  • The encyclopedia database stores the structured data about the star Liu Xiaoming under the entertainment topic and the structured data about Professor Liu Xiaoming under the education topic.
  • The topic to which the picture's structured data belongs can therefore be determined first, and the required structured data then selected under that topic, so that identically named data does not cause the picture to be associated with incorrect structured data.
  • To this end, the text description information of the picture may be semantically extended to obtain extended description information that covers the content of the text description information and is broader than it. The extended description information is then matched against the encyclopedia database to determine the matching topic, which is the topic to which the picture's structured data belongs.
  • To calculate the relevance, the intersection and the union of the extended description information and the structured data are taken first. The intersection is the set of words contained in both the extended description information and the structured data; the union is the set of all words that appear in either.
  • The ratio of the number of words in the intersection to the number of words in the union is the relevance between the extended description information and the structured data. The greater this ratio, the higher the relevance.
  • Written as a formula, $S_{ij} = \dfrac{|D_i \cap D_j|}{|D_i \cup D_j|}$, where $S_{ij}$ represents the relevance of text $i$ and text $j$; $D_i$ represents the set of words contained in text $i$; $D_j$ represents the set of words contained in text $j$; $D_i \cap D_j$ represents the intersection of text $i$ and text $j$, i.e. the set of words contained in both; and $D_i \cup D_j$ represents the union of text $i$ and text $j$, i.e. the set of all words that appear in text $i$ or text $j$.
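The ratio of intersection size to union size described above is the Jaccard index of the two word sets. A minimal sketch, assuming each text has already been split into a set of words (the function name `relevance` is chosen for illustration; returning 0.0 for two empty texts is an assumed convention):

```python
from typing import Set


def relevance(words_i: Set[str], words_j: Set[str]) -> float:
    """Relevance S_ij = |D_i ∩ D_j| / |D_i ∪ D_j| between two word sets."""
    union = words_i | words_j
    if not union:
        return 0.0  # assumed convention for two empty texts
    return len(words_i & words_j) / len(union)
```

For example, the sets {"a", "b", "c"} and {"b", "c", "d"} share two of four distinct words, giving a relevance of 0.5.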
  • the specified relevance is set to 70%.
  • For the structured data of other stars under the entertainment topic, which is unrelated to Liu Xiaoming, it can quickly be determined that the relevance to the extended description information is below 70%.
  • Among the structured data whose relevance exceeds 70%, at least one piece of Liu Xiaoming's structured data is selected and associated with the picture, thereby providing the user with information about Liu Xiaoming.
  • The structured data selected in this example can be “Liu Xiaoming has been a guest of the Asian Film Festival for five consecutive years. He once wanted to be a director.”
  • When the method for associating structured data with a picture in the embodiment of the present invention is applied, the extended description information is first used to determine the matching topic, that is, the entertainment topic corresponding to Liu Xiaoming, and structured data is then selected under the entertainment topic.
  • This guarantees the accuracy of the structured data finally obtained and avoids associating the picture with the structured data of Professor Liu Xiaoming. Setting the specified relevance and associating with the picture at least one piece of structured data whose relevance to the extended description information exceeds it ensures that the user is given accurate extended information about the picture content.
  • a picture about the star Liu Xiaoming is also provided.
  • Its text description information is "Liu Xiaoming filming in Hengdian", which is semantically extended to "movie, entertainment, star, shooting" and the like.
  • This extended description information matches the entertainment topic in the encyclopedia database, and many pieces of structured data about Liu Xiaoming are retrieved under that topic, such as "Liu Xiaoming, famous movie actor, good at shooting martial arts films" and "Liu Xiaoming has shot more than 50 movies, and his new drama has just started filming at Hengdian".
  • These pieces of structured data have a relevance of 80% to the extended description information. If the number of such pieces is small enough for all of them to be associated with the picture at the same time, all of them can be associated. If the number of highly relevant pieces is large, for example in the thousands, and they cannot all be associated with the picture at the same time, the latest and most comprehensive pieces can be selected and associated with the picture.
  • The structured data can also be sorted by relevance.
  • Suppose the specified relevance is set to 90%. The search may then find no structured data matching the extended description information, or very little, while finding a relatively large amount of structured data whose relevance to the extended description information is 80%. In that case the specified relevance can be automatically lowered to 80%, so that the user is given as much extended information about the picture content as possible.
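The selection behaviour described in this example (sort by relevance, lower the specified relevance when too little is found, cap the number of associations) might be sketched as below. The function name, the `thresholds`, `min_results`, and `max_results` parameters, and the use of `>=` so that entries at exactly 80% are kept after the fallback are all assumptions, not details given in the source.

```python
from typing import List, Optional, Sequence, Tuple

Scored = Tuple[str, float]  # (structured data entry, relevance to the extended description)


def select_entries(
    scored: Sequence[Scored],
    thresholds: Sequence[float] = (0.9, 0.8),
    min_results: int = 1,
    max_results: Optional[int] = None,
) -> List[Scored]:
    """Try each threshold in turn until enough entries qualify, then cap the result."""
    # Sort once so that the most relevant entries come first.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    chosen: List[Scored] = []
    for threshold in thresholds:
        chosen = [pair for pair in ranked if pair[1] >= threshold]
        if len(chosen) >= min_results:
            break  # enough entries found at this threshold
    # Slicing with None keeps every qualifying entry.
    return chosen[:max_results]
```

The source suggests preferring the latest and most comprehensive entries when there are too many; this sketch caps by relevance only, since it has no recency or completeness information to work with.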
  • Another preferred embodiment of the present invention also provides a picture showing white clouds and earth.
  • the text description information of the picture is "earth, white clouds".
  • The purpose of using the association method shown in FIG. 1 is to associate the structured data of the earth with the picture.
  • The encyclopedia database may contain several kinds of structured data about the earth, for example structured data about the earth as a natural landscape, or structured data about a song named "Earth". If the text description information is matched directly against the encyclopedia database, it cannot be determined which structured data should be associated, and the association is prone to error. Instead, the text description information "earth, white clouds" of the picture is semantically extended.
  • The matching topic is then determined to be natural landscape, and structured data is selected under the natural landscape topic.
  • In this way the structured data of the earth as a natural landscape is associated with the picture, which ensures that the picture is associated with accurate structured data.
  • The association method in the embodiment of the present invention achieves accurate association of structured data with a picture by matching the picture against the corresponding structured data. Based on the associated structured data, the picture can be identified accurately, which gives users more accurate search results and extended information about the picture content.
  • the text description information of the picture includes at least one of a title of the page where the picture is located, a text surrounding the picture, an anchor text of the picture (link anchor text), and a name of the picture.
  • When the text description information is extended, some words turn out to appear very frequently without carrying any substantive meaning. These words are called stop words, and they do not help the semantic extension. The text description information is therefore analyzed first and the stop words are deleted, so that only words meaningful for the matching operation remain. Stop words are typically function words, including but not limited to particles and a large number of other words with no semantic content. The more frequently a word appears in the encyclopedia database, the less it contributes to the extension, so such words are treated as stop words and deleted. Performing the semantic extension on what remains increases its accuracy. Stop words that are meaningless for the extension can be excluded by performing a weight calculation based on the TF-IDF algorithm on the text description information.
  • In the text description "Liu Xiaoming and Zhang Daliang present awards together at the Asian Film Festival" from the above example, the function words are stop words.
  • The text description information should also be analyzed so that the nouns that can serve as search keywords are extracted and then extended. In this example the useful nouns are "Zhang Daliang" and "Asian Film Festival"; extending these two makes it easy to match the entertainment topic in the encyclopedia database.
  • Deleting the stop words from the text description information, extracting the keywords, and only then performing the semantic extension ensures that the extension is effective and that the appropriate topic in the encyclopedia database is matched accurately.
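The order of operations described above (delete stop words, keep the keywords, then extend) can be sketched as follows. The stop-word list and the `related` lookup table are hypothetical stand-ins: the source derives stop words from word frequency in the encyclopedia database and does not say how related words are obtained.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical English stop-word list, for illustration only.
STOP_WORDS: Set[str] = {"and", "at", "the", "together", "of", "in"}


def extract_keywords(description: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Split the description on whitespace and drop stop words."""
    return [word for word in description.split() if word.lower() not in stop_words]


def extend(keywords: Iterable[str], related: Dict[str, List[str]]) -> Set[str]:
    """Add the related words of each keyword (looked up in a hypothetical table)."""
    keywords = list(keywords)
    extended = set(keywords)
    for keyword in keywords:
        extended.update(related.get(keyword, []))
    return extended
```

This sketch treats every non-stop word as a keyword; the source additionally restricts keywords to nouns, which would require a part-of-speech tagger and is left out here.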
  • an embodiment of the present invention further provides a method for generating a structured data search result item.
  • FIG. 2 shows a process flow diagram of a method of generating a structured data search result item in accordance with one embodiment of the present invention. As shown in FIG. 2, the method of generating a structured data search result item includes at least steps S202 to S204.
  • The picture matching the query word is acquired automatically; the picture has already been associated with the corresponding structured data through the association method shown in FIG. 1.
  • The generated search result item may be a picture matching the query word that is linked to the corresponding structured data, so that clicking the picture jumps to the corresponding structured data page, or it may include both the picture and the corresponding structured data.
  • The method for generating a structured data search result item in the embodiment of the present invention generates a search result item according to the query word, which gives the user more accurate search results as well as extended information about the picture content.
  • For example, after obtaining the keyword "Liu Xiaoming", the method automatically obtains a picture matching "Liu Xiaoming". The picture has already been associated with the corresponding structured data through the association method shown in FIG. 1, and the search result item is generated according to the picture and the associated structured data.
  • The search result item can be a webpage containing Liu Xiaoming's picture that links to Liu Xiaoming's structured data, so that clicking the picture jumps to the page containing that structured data. The search result item can also be a webpage that contains Liu Xiaoming's picture and presents the corresponding structured data alongside it.
  • search result items may also be presented to the user in other forms.
  • the user when searching for “Liu Xiaoming”, the user can not only obtain the pictures of Liu Xiaoming, but also obtain the structured data of Liu Xiaoming.
  • The way the user obtains Liu Xiaoming's structured data in this example, such as clicking on the picture, is merely illustrative and does not limit the invention. In a specific implementation, the method depends on the specific situation.
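Generating a search result item from a query word might look like the sketch below. It assumes that each picture already carries the structured data associated with it by the method of FIG. 1; the `Picture` class, its field names, the substring match on the description, and the dictionary layout of a result item are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Picture:
    url: str
    description: str
    # Structured data already associated with the picture (assumed to be precomputed).
    structured_data: List[str] = field(default_factory=list)


def generate_result_items(query: str, pictures: List[Picture]) -> List[Dict[str, object]]:
    """Acquire the pictures matching the query word and build one result item per picture."""
    items: List[Dict[str, object]] = []
    for picture in pictures:
        # Naive matching: the query word appears in the picture's description.
        if query.lower() in picture.description.lower():
            items.append({
                "picture": picture.url,
                "structured_data": list(picture.structured_data),
            })
    return items
```

Whether the structured data is shown next to the picture or behind a link on the picture is a presentation choice that this sketch leaves to the caller.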
  • FIG. 3 provides a schematic structural diagram of an apparatus for associating structured data with a picture.
  • an embodiment of the present invention provides a device for associating structured data with a picture, including:
  • The extension module 310 is configured to obtain text description information of the picture and perform semantic extension on it to obtain extended description information.
  • The matching module 320 is configured to match the extended description information against the encyclopedia database storing the structured data and determine the topic that matches the extended description information.
  • The selection module 330 is configured to select, among the plurality of pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds the specified relevance.
  • the association module 340 is configured to associate the selected at least one piece of structured data with the picture.
  • To match the picture with structured data, the extension module 310 obtains the text description information of the picture and semantically extends it to obtain extended description information, which includes the content of the text description information and broadens the scope of the description through related words.
  • In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data.
  • The extended description information is matched against the encyclopedia database, the topic matching the extended description information is determined, and pieces of structured data matching the extended description information are then selected under that topic.
  • Using the matching module 320 to determine the topic first ensures the accuracy of the structured data finally obtained.
  • When the text description information of different pictures is similar, the topic distinguishes them and avoids associating a picture with the wrong structured data.
  • Setting the specified relevance in the selection module 330, and using the selection module 330 to select under the matching topic at least one piece of structured data that exceeds the specified relevance for association with the picture, ensures the degree of matching between the structured data and the extended description information while associating as much structured data as possible with the picture. The prior art, by comparison, cannot give users more accurate search results or extended information about the picture content.
  • The association apparatus in the embodiment of the present invention achieves the association of structured data with a picture by matching the picture against the corresponding structured data, and can identify the picture accurately based on the associated structured data, thereby giving users more accurate search results and extended information about the picture content.
  • The extension module 310 also needs to analyze the text description information and delete stop words, which occur frequently and carry no meaning; stop words include, but are not limited to, particles and a large number of other words with no semantic content.
  • the text description information of the picture includes at least one of a title of the page where the picture is located, a text surrounding the picture, an anchor text of the picture (link anchor text), and a name of the picture.
  • the name of the picture is obtained from at least one of the above texts.
  • FIG. 4 is a block diagram showing the structure of an apparatus for generating a structured data search result item according to an embodiment of the present invention.
  • an embodiment of the present invention provides an apparatus for generating a structured data search result item, including:
  • the obtaining module 410 is configured to obtain a picture corresponding to the search query word.
  • the generating module 420 is configured to generate a search result page according to the picture and the structured data associated with the picture.
  • In this embodiment, once a query word for a picture search is received, the acquiring module 410 automatically acquires the picture matching the query word; the picture has already been associated with its structured data by the method shown in FIG. 1.
  • The generating module 420 then generates a search result page from the picture and its associated structured data.
  • The search result page may be a web page that contains the retrieved picture and links to a web page holding the corresponding structured data, reachable with a click, or it may be a single web page containing both the picture and the corresponding structured data.
  • The specific form of the search result item depends on the circumstances.
  • The device for generating a structured data search result item in this embodiment generates a search result item according to the query word, which gives the user accurate search results as well as extended information about the picture content.
  • In another preferred embodiment, when the user searches for "Liu Xiaoming" with the device for generating a structured data search result item, the user not only obtains pictures of Liu Xiaoming but can also obtain Liu Xiaoming's structured data by clicking a picture.
  • In this example, clicking the picture is only one illustrative way for the user to obtain Liu Xiaoming's structured data and does not limit the invention. In a specific implementation, the method depends on the circumstances.
  • FIG. 5 shows a schematic structural diagram of a system for generating structured data search result items according to an embodiment of the present invention.
  • the system for generating structured data search result items includes:
  • the encyclopedia database 510 is configured to include a plurality of topics, each of which includes a plurality of structured data.
  • the picture database 520 is configured to store a plurality of pictures; perform semantic extension on the text description information of each picture to obtain extended description information; and match the extended description information against the encyclopedia database to associate the picture with at least one piece of structured data obtained by the matching.
  • the user terminal 530 is configured to input a search query word of a picture.
  • the search engine 540 is configured to search the picture database for the picture corresponding to the search query word, search the encyclopedia database for the structured data associated with that picture, and combine the picture with its associated information to generate a search result page.
  • With the system for generating a structured data search result item, the text description information of each picture in the picture database 520 is semantically extended to obtain extended description information, which is matched against the encyclopedia database 510 to determine the matching topic; at least one piece of structured data whose relevance to the extended description information exceeds the specified threshold is then selected from that topic and associated with the picture.
  • The encyclopedia database 510 includes a plurality of topics, each of which includes a plurality of pieces of structured data. This completes the process of associating pictures with structured data.
  • the system in the embodiment of the present invention has a user terminal 530, and the user can input a search query word of the picture by using the user terminal 530.
  • the system in the embodiment of the present invention further has a search engine 540.
  • the search engine 540 can search for and obtain the corresponding image in the image database 520 according to the image query word, and can also obtain the structured data corresponding to the image in the encyclopedia database 510. Based on the image and associated structured data, a search results page can be generated.
  • With this system, a picture can be associated with its corresponding structured data, so that a user who searches for the picture is given accurate search results together with extended information about the picture content.
  • a method for associating structured data with a picture is provided.
  • To associate the picture with structured data, the text description information of the picture is first obtained and semantically extended to obtain extended description information.
  • The extended description information covers the content of the text description information and broadens the scope of the description through semantic extension.
  • In the encyclopedia database, a large amount of structured data is classified and stored by topic. Each topic in the encyclopedia database contains multiple pieces of structured data.
  • The extended description information is matched against the encyclopedia database to determine the topic that matches it, and several pieces of structured data matching the extended description information are then selected under that topic.
  • Because the extended description information is derived from the text description information, determining the topic that matches the extended description information is equivalent to determining the topic to which the structured data corresponding to the picture belongs. Determining the topic first helps ensure the accuracy of the structured data finally obtained: when the text description information of different pictures is similar, the pictures can be distinguished by topic, which avoids associating a picture with the wrong structured data.
  • By setting a specified relevance and associating with the picture at least one piece of structured data whose relevance to the extended description information exceeds the specified relevance, the degree of match between the structured data and the picture is ensured and the picture is associated with as much structured data as possible.
  • The association method of this embodiment thus associates structured data with pictures. Based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results and extended information about picture content.
  • Another embodiment of the present invention provides a device for associating structured data with a picture. In contrast to the prior art, which cannot provide users with more accurate search results or extended information about picture content, the device associates structured data with pictures by matching each picture with its corresponding structured data. Based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results and extended information about picture content.
  • Another embodiment of the present invention provides a method for generating a structured data search result item, which generates a search result item according to the query word and so gives the user more accurate search results as well as extended information about the picture content.
  • Another embodiment provides a device for generating a structured data search result item, which generates a search result item according to the query word and so gives the user more accurate search results as well as extended information about the picture content.
  • With the system for generating a structured data search result item provided in another embodiment, a picture can be associated with its corresponding structured data, so that a user who searches for the picture is given accurate search results together with extended information about the picture content.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination.
  • Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for associating structured data with a picture in accordance with embodiments of the present invention.
  • the invention can also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals.
  • Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 6 illustrates a computing device in which the method in accordance with the present invention can be implemented.
  • the computing device conventionally includes a processor 610 and a computer program product or computer readable medium in the form of a memory 620.
  • the memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • Memory 620 has a memory space 630 for program code 631 for performing any of the method steps described above.
  • storage space 630 for program code may include various program code 631 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • Such computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7.
  • the storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 620 in the computing device of FIG. 6.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit typically includes computer readable code 631', that is, code that can be read by a processor such as 610. When executed by a computing device, this code causes the computing device to perform the steps of the methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Software Systems (AREA)

Abstract

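A method and device for associating structured data with a picture. The association method includes: obtaining text description information of a picture and performing semantic extension on the text description information to obtain extended description information; matching the extended description information against an encyclopedia database storing structured data to determine a topic that matches the extended description information; selecting, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and associating the selected at least one piece of structured data with the picture. The method and device associate structured data with pictures; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.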

Description

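Method and Device for Associating Structured Data with a Picture
Technical Field
The present invention relates to the field of Internet applications, and in particular to a method and device for associating structured data with a picture, and to a method, device and system for generating structured data search result items.
Background
With the development of technology, the Internet has become people's largest source of information. When users need specific information, the most common approach is to look it up online; with its volume and breadth of content, the online resource base has replaced the paper resource base of the past.
After a user finds a desired picture through a web search, the existing description text of the picture conveys a certain amount of information about its content. However, such description text usually carries little information, is not standardized, and is limited in content, so a general search engine cannot accurately classify or identify the picture. As a result it cannot give users more, and more accurate, results, nor can extended information about the picture content be obtained.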
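Summary of the Invention
In view of the above problems, the present invention is proposed to provide a method for associating structured data with a picture, and a corresponding association device, that overcome or at least partially solve the above problems.
According to one aspect of the embodiments of the present invention, a method for associating structured data with a picture is provided, including: obtaining text description information of a picture and performing semantic extension on the text description information to obtain extended description information; matching the extended description information against an encyclopedia database storing structured data to determine a topic that matches the extended description information; selecting, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and associating the selected at least one piece of structured data with the picture.
According to another aspect of the embodiments of the present invention, a method for generating a structured data search result item is also provided, including: obtaining a picture that matches a search query word; and generating a search result item according to the structured data assigned to the picture.
According to another aspect of the embodiments of the present invention, a device for associating structured data with a picture is also provided, including: an extension module configured to obtain text description information of a picture and perform semantic extension on the text description information to obtain extended description information; a matching module configured to match the extended description information against an encyclopedia database storing structured data to determine a topic that matches the extended description information; a selection module configured to select, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and an association module configured to associate the selected at least one piece of structured data with the picture.
According to another aspect of the embodiments of the present invention, a device for generating a structured data search result item is also provided, including: an acquisition module configured to obtain the picture corresponding to a search query word; and a generation module configured to generate a search result page according to the picture and the structured data associated with the picture.
According to another aspect of the embodiments of the present invention, a system for generating structured data search result items is also provided, including: an encyclopedia database configured to include multiple topics, each topic including multiple pieces of structured data; a picture database configured to store multiple pictures, to perform semantic extension on the text description information of each picture to obtain extended description information, and to match the extended description information against the encyclopedia database and associate the picture with at least one piece of structured data obtained by the matching; a user terminal configured to input a search query word for a picture; and a search engine configured to search the picture database for the picture corresponding to the search query word, to search the encyclopedia database for the structured data associated with the picture, and to combine the obtained picture with the associated information of the picture to generate a search result page.
According to yet another aspect of the present invention, a computer program is provided that includes computer readable code which, when run on a computing device, causes the computing device to perform the method of the present invention.
According to a further aspect of the present invention, a computer readable medium is provided in which the computer program of the present invention is stored.
The beneficial effects of the present invention are as follows:
In the embodiments of the present invention, to associate a picture with structured data, the text description information of the picture is first obtained and semantically extended to obtain extended description information. The extended description information covers the content of the text description information and broadens the scope of the description through semantic extension. In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data. The extended description information is matched against the encyclopedia database to determine the topic that matches it, and several pieces of structured data matching the extended description information are then selected under that topic. Because the extended description information is derived from the text description information, determining the topic that matches the extended description information is equivalent to determining the topic to which the structured data corresponding to the picture belongs. Determining the topic first helps ensure the accuracy of the structured data finally obtained: when the text description information of different pictures is similar, the pictures can be distinguished by topic, which avoids associating a picture with the wrong structured data. By setting a specified relevance and associating with the picture at least one piece of structured data, under the matching topic, whose relevance to the extended description information exceeds the specified relevance, the degree of match between the structured data and the picture is ensured and the picture is associated with as much structured data as possible. Thus, in contrast to the prior art, which cannot provide users with more accurate search results or with extended information about picture content, the association method of the embodiments of the present invention associates structured data with pictures; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.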
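Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a processing flowchart of a method for associating structured data with a picture according to one embodiment of the present invention;
FIG. 2 shows a processing flowchart of a method for generating a structured data search result item according to one embodiment of the present invention;
FIG. 3 shows a schematic structural diagram of a device for associating structured data with a picture according to one embodiment of the present invention;
FIG. 4 shows a schematic structural diagram of a device for generating a structured data search result item according to one embodiment of the present invention;
FIG. 5 shows a schematic structural diagram of a system for generating structured data search result items according to one embodiment of the present invention;
FIG. 6 schematically shows a block diagram of a computing device for performing the method according to the present invention; and
FIG. 7 schematically shows a storage unit for holding or carrying program code that implements the method according to the present invention.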
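Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
To solve the above problems and achieve the goal of giving users more accurate picture search results together with extended information about picture content, the embodiments of the present invention provide a new inventive concept that associates pictures with structured data. Structured data is data composed of fields in a prescribed format. It can generally be obtained by extracting and processing the data information corresponding to the entries stored in an encyclopedia database, taking advantage of the large volume and relatively high reliability of that information, and the resulting structured data is stored in the encyclopedia database. Individual pieces of structured data can of course also be obtained by manual editing, but manual editing alone cannot meet the need to generate massive amounts of structured data and keep their fields up to date.
Based on this inventive concept, an embodiment of the present invention provides a method for associating structured data with a picture. FIG. 1 shows a processing flowchart of the method according to one embodiment of the present invention. Referring to FIG. 1, the method includes at least steps S102 to S108.
Step S102: obtain text description information of the picture and perform semantic extension on the text description information to obtain extended description information.
Step S104: match the extended description information against the encyclopedia database storing structured data to determine the topic that matches the extended description information.
Step S106: from the multiple pieces of structured data included in the matching topic, select at least one piece of structured data whose relevance to the extended description information exceeds the specified relevance.
Step S108: associate the selected at least one piece of structured data with the picture.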
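Steps S102 to S108 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Record` shape, the `related_words` lookup used for semantic extension, the overlap-count topic match, and all function names are hypothetical, and words are assumed to be already segmented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One piece of structured data filed under a topic (hypothetical shape)."""
    topic: str
    words: frozenset


def expand(description_words, related_words):
    """S102: add related words to the picture's description words."""
    expanded = set(description_words)
    for word in description_words:
        expanded |= related_words.get(word, set())
    return expanded


def match_topic(expanded, topic_vocab):
    """S104: pick the topic sharing the most words; None if nothing overlaps."""
    best_topic, best_overlap = None, 0
    for topic, vocab in topic_vocab.items():
        overlap = len(expanded & vocab)
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    return best_topic


def select_records(expanded, records, topic, threshold):
    """S106: keep records of the topic whose relevance exceeds the threshold."""
    selected = []
    for record in records:
        if record.topic != topic:
            continue
        union = expanded | record.words
        score = len(expanded & record.words) / len(union) if union else 0.0
        if score > threshold:
            selected.append(record)
    return selected


def associate(picture_id, description_words, related_words,
              topic_vocab, records, threshold=0.7):
    """S102 to S108: return {picture_id: [associated records]}."""
    expanded = expand(description_words, related_words)
    topic = match_topic(expanded, topic_vocab)
    if topic is None:
        return {picture_id: []}
    return {picture_id: select_records(expanded, records, topic, threshold)}
```

When no topic overlaps the extended description, the sketch returns an empty association rather than guessing, which mirrors the case described later in which no association can be established.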
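In the embodiments of the present invention, to associate a picture with structured data, the text description information of the picture is first obtained and semantically extended to obtain extended description information. The extended description information covers the content of the text description information and broadens the scope of the description through semantic extension. In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data. The extended description information is matched against the encyclopedia database to determine the topic that matches it, and several pieces of structured data matching the extended description information are then selected under that topic. Because the extended description information is derived from the text description information, determining the topic that matches the extended description information is equivalent to determining the topic to which the structured data corresponding to the picture belongs. Determining the topic first helps ensure the accuracy of the structured data finally obtained: when the text description information of different pictures is similar, the pictures can be distinguished by topic, which avoids associating a picture with the wrong structured data. By setting a specified relevance and associating with the picture at least one piece of structured data, under the matching topic, whose relevance to the extended description information exceeds the specified relevance, the degree of match between the structured data and the picture is ensured and the picture is associated with as much structured data as possible. Thus, in contrast to the prior art, which cannot provide users with more accurate search results or with extended information about picture content, the association method of the embodiments of the present invention associates structured data with pictures; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.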
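To make the embodiment easier to understand, a concrete example follows. Take a picture of the celebrity Liu Xiaoming, showing Liu Xiaoming and Zhang Daliang attending the Asian Film Festival awards ceremony together. So that a user searching for pictures of the celebrity Liu Xiaoming can accurately retrieve this picture and also obtain information about him, the embodiment aims to associate the picture with the structured data of the celebrity Liu Xiaoming in the encyclopedia database. The association proceeds as follows. First the text description information of the picture is obtained: "Liu Xiaoming and Zhang Daliang present awards together at the Asian Film Festival". The encyclopedia database contains many different well-known people named Liu Xiaoming, such as the renowned professor Liu Xiaoming and the celebrity Liu Xiaoming, and there is a large amount of structured data about each of them. For example, the structured data about the celebrity Liu Xiaoming includes one or a combination of: his age, his birthplace, his recent news and activities, and his film, television and music works. The structured data about Professor Liu Xiaoming includes one or a combination of: his age, the school where he works, his teaching experience and the honors he has received. Because the encyclopedia database contains several people with different identities but the same name, each with a large amount of structured data, the text description information of the picture alone cannot determine which Liu Xiaoming's structured data the picture should be associated with.
To pick out the structured data of the celebrity Liu Xiaoming from the structured data of several differently identified people named Liu Xiaoming and associate it with the picture, the structured data in the encyclopedia database must be analyzed. Structured data is data composed of fields in a prescribed format, and can generally be obtained by extracting and processing the data information corresponding to the entries stored in the encyclopedia database.
The data information can be extracted and processed in many ways. For example, first extract all the data information corresponding to all entries in the encyclopedia database; the data information for each entry is usually a name followed by a passage of descriptive text. Then compute a weight for every word in the extracted data information using the TF-IDF (term frequency-inverse document frequency) algorithm. In this embodiment, the TF-IDF-based weight of a word can be computed by dividing the total number of words in all the data information by the number of times the word occurs. With the total number of words fixed, words that naturally occur very often, such as 的 and 了, which carry no substantive meaning, therefore receive relatively small weights. The weight computation thus makes it possible to exclude frequently occurring, low-weight words with no substantive meaning and to retain the words in the data information that do carry substantive meaning. Finally, a series of preset rules is applied to the weighted data information, for example format processing of each field. This completes the extraction and processing of the data information and yields the structured data.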
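The weight computation described above (total word count divided by the occurrences of each word) can be sketched as follows. Note that this follows the simplified description in the text rather than the standard TF-IDF definition; the function names and the `min_weight` cut-off are assumptions for illustration, and the input is assumed to be already segmented into words.

```python
from collections import Counter


def inverse_frequency_weights(tokens):
    """Weight of a word = total number of tokens / occurrences of that word."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: total / count for word, count in counts.items()}


def drop_low_weight(tokens, min_weight):
    """Keep only tokens whose weight reaches min_weight (hypothetical cut-off)."""
    weights = inverse_frequency_weights(tokens)
    return [token for token in tokens if weights[token] >= min_weight]
```

With six tokens in which 的 occurs three times, 的 receives weight 2.0 and every word that occurs once receives weight 6.0, so a cut-off between the two removes only the frequent function word.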
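The encyclopedia database divides the structured data into topics, and each topic contains a large amount of structured data on the same topic. Topics in the encyclopedia database are determined on the basis of word co-occurrence, which means that several words appear together. If several words often appear together, whether in the same sentence or in the same paragraph, their meanings are considered to be related. For example, the three words "360", "Safeguard" (安全卫士) and "PC checkup" (电脑体检) often appear together in one sentence, so their meanings are considered related. The word co-occurrence rate is the probability that several words appear together. The higher the co-occurrence rate of several words, the closer their semantic relationship, so words with a high co-occurrence rate can be grouped into one topic. A PLSA (Probabilistic Latent Semantic Analysis) topic model can be used to analyze all the words in the structured data through word co-occurrence computation and matrix equations, grouping words with a relatively high co-occurrence rate into one topic and at the same time yielding the probability of each word within the topic; a word may appear in more than one topic. For example, the PLSA topic model may place "360", "Safeguard" and "PC checkup" in one topic, in which the probability of "360" is 0.6777, and also place "360" and "Internet company" in another topic, in which the probability of "360" is 0.553. The words under a topic can be ordered by some rule, for example from highest to lowest probability, or by other rules.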
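The co-occurrence rate that feeds the topic model can be illustrated with a small sketch. It only counts how often pairs of words share a sentence; it is not PLSA itself, and the function name and the sentence-level window are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rates(sentences):
    """Fraction of sentences in which each pair of distinct words appears together.

    sentences: list of word lists, assumed already segmented.
    Keys are word pairs in sorted order.
    """
    pair_counts = Counter()
    for words in sentences:
        for pair in combinations(sorted(set(words)), 2):
            pair_counts[pair] += 1
    total = len(sentences)
    return {pair: count / total for pair, count in pair_counts.items()}
```

Pairs with a high rate are candidates for the same topic; a full topic model such as PLSA would additionally estimate per-topic word probabilities like the 0.6777 and 0.553 quoted above.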
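With structured data obtained and stored in this way, the encyclopedia database stores the structured data about the celebrity Liu Xiaoming under the entertainment topic and the structured data about Professor Liu Xiaoming under the education topic. Given this topic-based storage, the structured data of the celebrity Liu Xiaoming can be picked out by first determining the topic to which the structured data corresponding to the picture belongs and then selecting the required structured data under that topic, which avoids associating the picture with the wrong structured data because of duplicate names or similar causes. To determine that topic, the text description information of the picture is semantically extended to obtain extended description information that covers the content of the text description information and is broader in scope. The extended description information is then matched against the encyclopedia database to determine the matching topic, and hence the topic to which the structured data corresponding to the picture belongs.
In this example the text description information "Liu Xiaoming and Zhang Daliang present awards together at the Asian Film Festival" is semantically extended. Because Zhang Daliang is a well-known film and television director, extending the name "Zhang Daliang" yields a series of related words such as "director, film and television, entertainment". Extending "Asian Film Festival" yields a series of words such as "film, event, entertainment, film and television". The words obtained by extending "Zhang Daliang" and "Asian Film Festival" together make up the extended description information. Matching this extended description information against the topic classification of the encyclopedia database shows that it belongs to the entertainment topic, so the structured data corresponding to the "Liu Xiaoming" in this picture is stored under the entertainment topic.
In this example, the entertainment topic holds a great deal of entertainment-related structured data. To retrieve, from this large volume, the data most relevant to the extended description information and so give the picture accurate data, a specified relevance is set as the criterion for matching structured data. To compute the relevance between the extended description information and a given piece of structured data, first take their intersection and union: the intersection is the set of words contained in both the extended description information and the piece of structured data, and the union is the set of all words appearing in either. The relevance is the ratio of the number of words in the intersection to the number of words in the union. The larger this ratio, the more relevant the piece of structured data is to the extended description information.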
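The relevance is computed as

$$S_{ij} = \frac{|D_i \cap D_j|}{|D_i \cup D_j|}$$

where $S_{ij}$ denotes the relevance of text $i$ and text $j$, $D_i$ denotes the words contained in text $i$, $D_j$ denotes the words contained in text $j$, $D_i \cap D_j$ denotes the intersection of text $i$ and text $j$, that is, the set of words contained in both, and $D_i \cup D_j$ denotes the union of text $i$ and text $j$, that is, the set of all words appearing in either.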
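The ratio of intersection size to union size described here is the Jaccard similarity of the two word sets. A minimal sketch, assuming both texts are already reduced to collections of words (the function name is hypothetical):

```python
def relevance(words_i, words_j):
    """Relevance S_ij = |D_i ∩ D_j| / |D_i ∪ D_j| for two collections of words."""
    d_i, d_j = set(words_i), set(words_j)
    union = d_i | d_j
    if not union:
        # Two empty texts share nothing; treat them as unrelated.
        return 0.0
    return len(d_i & d_j) / len(union)
```

For instance, the word sets {film, entertainment, director, event} and {film, entertainment, director, star} share three words out of five distinct words, giving a relevance of 3/5, which would fall below a specified relevance of 70%.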
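In this embodiment, to obtain structured data about Liu Xiaoming from the entertainment topic, the specified relevance is set to 70%. Structured data under the entertainment topic that concerns other celebrities and is unrelated to Liu Xiaoming can quickly be judged to have a relevance to the extended description information below 70%. Among the structured data whose relevance exceeds 70%, at least one piece about Liu Xiaoming is selected and associated with the picture, giving the user information about Liu Xiaoming. Given the extended description information in this example, the selected structured data might be: "Liu Xiaoming has been an award presenter at the Asian Film Festival for five consecutive years and once revealed that he wanted to switch to directing."
In this example, the association method first uses the extended description information to determine the matching topic, that is, the entertainment topic corresponding to Liu Xiaoming, and only then selects structured data from that topic. This helps ensure the accuracy of the structured data finally obtained and prevents the picture from being associated with the structured data of Professor Liu Xiaoming. Setting a specified relevance and associating with the picture at least one piece of structured data whose relevance to the extended description information exceeds it ensures that the user is given accurate extended information about the picture content.
When associating structured data with a picture, it is sometimes impossible to establish an association. Consider a picture whose title is "Liu Xiaoming and Wang Dachuan attend a banquet together". This text description contains the words "Liu Xiaoming", "Wang Dachuan" and "attend a banquet". Extending "Wang Dachuan" yields extended information such as "well-known entrepreneur, economist", which points to the economics field, while extending "attend a banquet" does not identify any topic field. If none of the people named Liu Xiaoming in the database matches the economics field or type, then even with extension the topic of the text description cannot be determined, and no association can be established between this picture and the structured data of any particular Liu Xiaoming in the database.
When selecting structured data under a topic, there may be several pieces whose relevance exceeds the specified relevance and which all have the same relevance. Another embodiment of the present invention also involves a picture of the celebrity Liu Xiaoming, with the text description "Liu Xiaoming filming at Hengdian". Extending the text description yields extended description information such as "film, entertainment, celebrity, shooting", which matches the entertainment topic in the encyclopedia database. Under that topic, many pieces of structured data about Liu Xiaoming are retrieved, such as "Liu Xiaoming, a well-known film actor who excels at martial arts scenes" and "Liu Xiaoming has made more than 50 films, and his new production has just started shooting at Hengdian", all with a relevance of 80% to the extended description information. If there are only a few such pieces, within the number that can be associated with the picture at once, they can all be selected and matched to the picture. If there are very many highly relevant pieces, for example thousands, so that they cannot all be associated with the picture, the pieces with the most recent content and the most complete information can be selected and matched to the picture. When a picture is matched to multiple pieces of structured data, the structured data can be ordered by relevance level.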
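The tie-breaking described above (equal relevance, prefer the newest and most complete records, cap the number associated) could be sketched like this. The `updated` and `n_fields` attributes are hypothetical stand-ins for "most recent content" and "most complete information"; the text does not say how either is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    relevance: float  # relevance to the extended description, between 0 and 1
    updated: int      # hypothetical freshness marker, e.g. a year
    n_fields: int     # hypothetical completeness measure


def pick_for_picture(candidates, limit):
    """Order by relevance, then freshness, then completeness; keep the top `limit`."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.relevance, c.updated, c.n_fields),
        reverse=True,
    )
    return ranked[:limit]
```

Sorting on a tuple keeps relevance as the primary ordering, so the returned list is also ordered by relevance level as the paragraph suggests.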
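In some cases, when searching under a topic, the relevance set in advance may be too high, so that no suitable structured data is retrieved. For example, with the specified relevance set to 90% under some topic, the search may find no structured data matching the extended description information, or only very little, while revealing many information-rich pieces of structured data whose relevance to the extended description information is 80%. In that case the specified relevance can be lowered automatically to 80%, so as to give the user as much extended information about the picture content as possible.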
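One way to lower the threshold automatically is sketched below. The step size, the floor, and the `min_hits` parameter are assumptions; the text only gives the 90% to 80% example. Relevance is expressed in whole percent to keep the arithmetic exact, and records at exactly the lowered threshold are accepted, as the 80% example implies.

```python
def select_with_fallback(scored, threshold=90, step=10, floor=50, min_hits=1):
    """Lower the threshold stepwise until enough records qualify.

    scored: list of (record, relevance_percent) pairs.
    Returns (selected records, threshold actually used).
    """
    current = threshold
    while True:
        hits = [record for record, score in scored if score >= current]
        # Stop once enough records qualify or the next step would pass the floor.
        if len(hits) >= min_hits or current - step < floor:
            return hits, current
        current -= step
```

The floor prevents the loop from lowering the bar indefinitely and associating a picture with weakly related data.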
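The relevance values in the above examples, and the number of pieces of structured data selected for a picture, are given only by way of example and do not limit the present invention. In a specific implementation, both depend on the circumstances.
Another preferred embodiment of the present invention involves a picture showing white clouds and the earth, with the text description "earth, white clouds". The association method shown in FIG. 1 is used to associate the structured data about the earth with the picture. The encyclopedia database may contain several pieces of structured data about "earth" (大地): some about the earth as a natural landscape, others about a song titled "Earth". If the text description were matched directly against the encyclopedia database, it would be impossible to decide which piece to associate, and associating with all of them would certainly produce errors. Instead, the text description "earth, white clouds" is extended; the extended description information shows that the matching topic is natural landscape, so matching structured data is selected under the natural landscape topic, and the structured data about the earth as a natural landscape is finally associated with the picture, ensuring an accurate association.
As the above embodiments show, the association method of the embodiments of the present invention accurately associates structured data with pictures by matching each picture with its corresponding structured data. Based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.
In a preferred embodiment of the present invention, the text description information of a picture includes at least one or more of: the title of the page containing the picture, the text surrounding the picture, the anchor text of the picture (link anchor text), and the name of the picture.
After the picture is obtained, its name can be derived from one or more of the title of the page containing the picture, the text surrounding the picture, the anchor text of the picture (link anchor text), and the name of the picture. Obtaining as much picture description information as possible makes it easier, after extension, to match a topic in the encyclopedia database.
When extending the text description information, it becomes apparent that some words occur very frequently yet carry no substantive meaning. Such words are called stop words. Stop words do not help semantic extension, so before the text description information is extended it is first analyzed and these meaningless stop words are deleted, leaving words that are useful for matching. Common stop words include 的, 得 and 地, but stop words are not limited to these and include a large number of other semantically empty words. In general, the more often a word occurs in the encyclopedia database, the closer to meaningless it is considered during extension, so it is treated as a stop word and deleted. Performing semantic extension on what remains after deletion improves the accuracy of the extension. Computing TF-IDF-based weights for the text description information makes it possible to exclude stop words that are useless for extension. Stop words (停止词) are also known as 停用词.
In another preferred embodiment of the present invention, consider the text description from the earlier example, "刘小明和张大亮一同为亚洲电影节颁奖" ("Liu Xiaoming and Zhang Daliang present awards together at the Asian Film Festival"). Here 和 ("and") and 为 ("for") are stop words and are deleted first. Analyzing the remaining sentence, "刘小明张大亮一同亚洲电影节颁奖", shows that 一同 ("together") and 颁奖 ("present awards"), although not meaningless and not stop words, do not help in matching a topic either. Therefore, after the stop words are deleted, the text description information should be analyzed further to extract the nouns that can serve as search keywords, and only then extended. In this example the useful nouns are "Zhang Daliang" and "Asian Film Festival", and extending these two words makes it easy to match the entertainment topic in the encyclopedia database.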
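The two-stage cleanup in this example, deleting stop words and then keeping only the nouns useful as keywords, might look like the sketch below. It assumes the description has already been segmented into words, and it replaces real noun extraction with a lookup in a hypothetical `keyword_vocab` set; the stop-word list is likewise only illustrative.

```python
# Illustrative stop-word list; a real list would be far larger.
STOP_WORDS = frozenset({"的", "得", "地", "和", "为"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that are stop words, preserving order."""
    return [token for token in tokens if token not in stop_words]


def extract_keywords(tokens, keyword_vocab, stop_words=STOP_WORDS):
    """Keep tokens that survive stop-word removal and are known keyword nouns.

    keyword_vocab is a hypothetical set standing in for noun extraction.
    """
    return [t for t in remove_stop_words(tokens, stop_words) if t in keyword_vocab]
```

Applied to the segmented sentence from the example, the first function leaves five words and the second leaves only 张大亮 and 亚洲电影节, the two words that are then extended.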
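In the embodiments of the present invention, the text description information is extended by deleting stop words, extracting keywords, and then performing semantic extension. This keeps the extension effective and allows a suitable topic in the encyclopedia database to be matched accurately.
Based on the same inventive concept, an embodiment of the present invention also provides a method for generating a structured data search result item. FIG. 2 shows a processing flowchart of the method according to one embodiment of the present invention. As shown in FIG. 2, the method includes at least steps S202 to S204.
S202: obtain the picture that matches the search query word.
S204: generate a search result item according to the structured data assigned to the picture.
In this embodiment, once the query word is received, the picture matching it is obtained automatically. The picture has already been associated with its structured data by the association method shown in FIG. 1, and a search result item is generated from the picture and the associated structured data. The generated search result item may be a picture matching the query word that is linked to the corresponding structured data, so that clicking the picture jumps to the structured data page, or it may be a web page containing both the picture and the corresponding structured data. The specific form of the search result item depends on the circumstances.
With the method for generating a structured data search result item in this embodiment, a search result item is generated according to the query word, which gives the user more accurate search results as well as extended information about the picture content.
Take a search for pictures of Liu Xiaoming as an example. After the keyword "Liu Xiaoming" is received, the pictures matching "Liu Xiaoming" are obtained automatically. These pictures have already been associated with the corresponding structured data by the association method shown in FIG. 1, and a search result item is generated from each picture and its associated structured data. The search result item may be a web page containing a picture of Liu Xiaoming that automatically links to the structured data about him, so that clicking the retrieved picture jumps to a page containing his structured data; or it may be a web page that contains the picture and also presents the corresponding structured data. This is only an illustrative example, and the search result item may be presented to the user in other forms.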
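Steps S202 and S204 can be sketched as a lookup followed by assembly of result items. The `picture_index` and `associations` mappings and the dictionary shape of a result item are assumptions for illustration; the text leaves the concrete form of a search result item open.

```python
def build_result_items(query, picture_index, associations):
    """Return one search result item per picture matching the query word.

    picture_index: hypothetical mapping from query word to picture ids (S202).
    associations: mapping from picture id to the structured data already
    associated with it, for example by the method of FIG. 1 (S204).
    """
    items = []
    for picture_id in picture_index.get(query, []):
        items.append({
            "picture": picture_id,
            "structured_data": list(associations.get(picture_id, [])),
        })
    return items
```

A front end could render each item either as a picture linking to a structured data page or as a single page showing both, matching the two presentations described above.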
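Accordingly, when searching for "Liu Xiaoming" the user obtains not only pictures of Liu Xiaoming but also his structured data. In this example, clicking the picture is only one illustrative way for the user to obtain Liu Xiaoming's structured data and does not limit the invention. In a specific implementation, the method depends on the circumstances.
Corresponding to the association method shown in FIG. 1, FIG. 3 provides a schematic structural diagram of a device for associating structured data with a picture. As shown in FIG. 3, an embodiment of the present invention provides a device for associating structured data with a picture, including:
an extension module 310, configured to obtain text description information of the picture and perform semantic extension on the text description information to obtain extended description information;
a matching module 320, configured to match the extended description information against the encyclopedia database storing structured data to determine the topic that matches the extended description information;
a selection module 330, configured to select, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds the specified relevance; and
an association module 340, configured to associate the selected at least one piece of structured data with the picture.
In this embodiment, to match the picture with structured data, the extension module 310 obtains the text description information of the picture and semantically extends it to obtain extended description information, which covers the content of the text description information and broadens the scope of the description through related words. In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data. The extended description information is matched against the encyclopedia database to determine the topic associated with it, and several pieces of structured data matching the extended description information are then selected under that topic. Using the matching module 320 to determine the topic first helps ensure the accuracy of the structured data finally obtained: when the text description information of different pictures is similar, the pictures can be distinguished by topic, which avoids associating a picture with the wrong structured data. A specified relevance is set in the selection module 330, which selects at least one piece of structured data under the matching topic whose relevance exceeds the specified relevance and associates it with the picture. This ensures the degree of match between the structured data and the extended description information and associates as much structured data as possible with the picture. Thus, in contrast to the prior art, which cannot provide users with more accurate search results or with extended information about picture content, the association device of this embodiment associates structured data with pictures by matching each picture with its corresponding structured data; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.
In another embodiment of the present invention, the extension module 310 also analyzes the text description information and deletes stop words, that is, words that occur frequently but carry no meaning. Stop words include but are not limited to 的, 得 and 地, as well as a large number of other semantically empty words.
In a preferred embodiment of the present invention, the text description information of a picture includes at least one or more of: the title of the page containing the picture, the text surrounding the picture, the anchor text of the picture (link anchor text), and the name of the picture. The name of the picture is obtained from at least one of the above texts.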
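Corresponding to the method for generating a structured data search result item shown in FIG. 2, FIG. 4 shows a schematic structural diagram of a device for generating a structured data search result item according to one embodiment of the present invention. As shown in FIG. 4, an embodiment of the present invention provides a device for generating a structured data search result item, including:
an acquisition module 410, configured to obtain the picture corresponding to the search query word; and
a generation module 420, configured to generate a search result page according to the picture and the structured data associated with the picture.
In this embodiment, once a query word for a picture search is received, the acquisition module 410 automatically obtains the picture matching the query word; the picture has already been associated with its structured data by the method shown in FIG. 1. The generation module 420 then generates a search result page from the picture and its associated structured data. The search result page may be a web page that contains the retrieved picture and links to a web page holding the corresponding structured data, reachable with a click, or it may be a single web page containing both the picture and the corresponding structured data. The specific form of the search result item depends on the circumstances.
With the device for generating a structured data search result item in this embodiment, a search result item is generated according to the query word, which gives the user accurate search results as well as extended information about the picture content.
In another preferred embodiment, when the user searches for "Liu Xiaoming" with the device for generating a structured data search result item, the user not only obtains pictures of Liu Xiaoming but can also obtain Liu Xiaoming's structured data by clicking a picture. In this example, clicking the picture is only one illustrative way for the user to obtain Liu Xiaoming's structured data and does not limit the invention. In a specific implementation, the method depends on the circumstances.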
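Further, bringing the above together, FIG. 5 shows a schematic structural diagram of a system for generating structured data search result items according to one embodiment of the present invention. The system includes:
an encyclopedia database 510, configured to include multiple topics, each topic including multiple pieces of structured data;
a picture database 520, configured to store multiple pictures, to perform semantic extension on the text description information of each picture to obtain extended description information, and to match the extended description information against the encyclopedia database and associate the picture with at least one piece of structured data obtained by the matching;
a user terminal 530, configured to input a search query word for a picture; and
a search engine 540, configured to search the picture database for the picture corresponding to the search query word, to search the encyclopedia database for the structured data associated with the picture, and to combine the obtained picture with the associated information of the picture to generate a search result page.
With the system provided in this embodiment, the text description information of each picture in the picture database 520 is semantically extended to obtain extended description information, which is matched against the encyclopedia database 510 to determine the matching topic; at least one piece of structured data whose relevance to the extended description information exceeds the specified threshold is then selected from that topic and associated with the picture. The encyclopedia database 510 includes multiple topics, each including multiple pieces of structured data. This completes the process of associating pictures with structured data.
The system in this embodiment has a user terminal 530, through which the user can input a search query word for a picture. The system also has a search engine 540, which can search the picture database 520 for the picture corresponding to the query word and obtain from the encyclopedia database 510 the structured data corresponding to the picture. From the picture and the associated structured data, a search result page can be generated.
With the system provided in this embodiment, a picture can be associated with its corresponding structured data, so that a user who searches for the picture is given accurate search results together with extended information about the picture content.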
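As can be seen from the above, one embodiment of the present invention provides a method for associating structured data with a picture. To associate a picture with structured data, the text description information of the picture is first obtained and semantically extended to obtain extended description information. The extended description information covers the content of the text description information and broadens the scope of the description through semantic extension. In the encyclopedia database, a large amount of structured data is classified and stored by topic, and each topic contains multiple pieces of structured data. The extended description information is matched against the encyclopedia database to determine the topic that matches it, and several pieces of structured data matching the extended description information are then selected under that topic. Because the extended description information is derived from the text description information, determining the topic that matches the extended description information is equivalent to determining the topic to which the structured data corresponding to the picture belongs. Determining the topic first helps ensure the accuracy of the structured data finally obtained: when the text description information of different pictures is similar, the pictures can be distinguished by topic, which avoids associating a picture with the wrong structured data. By setting a specified relevance and associating with the picture at least one piece of structured data, under the matching topic, whose relevance to the extended description information exceeds the specified relevance, the degree of match between the structured data and the picture is ensured and the picture is associated with as much structured data as possible. Thus, in contrast to the prior art, which cannot provide users with more accurate search results or with extended information about picture content, the association method of the embodiments of the present invention associates structured data with pictures; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.
Corresponding to the above association method, another embodiment of the present invention provides a device for associating structured data with a picture. In contrast to the prior art, which cannot provide users with more accurate search results or with extended information about picture content, the association device associates structured data with pictures by matching each picture with its corresponding structured data; based on the associated structured data, pictures can be accurately identified, so that users can be given more accurate search results as well as extended information about picture content.
Another embodiment of the present invention provides a method for generating a structured data search result item, which generates a search result item according to the query word and so gives the user more accurate search results as well as extended information about the picture content.
Corresponding to the above method, another embodiment provides a device for generating a structured data search result item, which generates a search result item according to the query word and so gives the user more accurate search results as well as extended information about the picture content.
With the system for generating structured data search result items provided in another embodiment of the present invention, a picture can be associated with its corresponding structured data, so that a user who searches for the picture is given accurate search results together with extended information about the picture content.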
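Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This manner of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and placed in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and they may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of these. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for associating structured data with a picture according to embodiments of the present invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described here. Such a program implementing the invention may be stored on a computer readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 6 shows a computing device that can implement the method according to the present invention. The computing device conventionally includes a processor 610 and a computer program product or computer readable medium in the form of a memory 620. The memory 620 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk or ROM. The memory 620 has storage space 630 for program code 631 for performing any of the method steps described above. For example, the storage space 630 for program code may include individual pieces of program code 631, each implementing one of the steps of the above methods. The program code can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 7. The storage unit may have storage segments, storage spaces and the like arranged similarly to the memory 620 in the computing device of FIG. 6. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer readable code 631', that is, code that can be read by a processor such as 610; when run by a computing device, this code causes the computing device to perform the steps of the methods described above.
References here to "one embodiment", "an embodiment" or "one or more embodiments" mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Note also that instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
In addition, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, not to explain or delimit the subject matter of the invention. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. As to the scope of the invention, the disclosure made here is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.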

Claims (13)

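  1. A method for associating structured data with a picture, comprising:
    obtaining text description information of a picture and performing semantic extension on the text description information to obtain extended description information;
    matching the extended description information against an encyclopedia database storing structured data to determine a topic that matches the extended description information;
    selecting, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and
    associating the selected at least one piece of structured data with the picture.
  2. The method according to claim 1, wherein performing semantic extension on the text description information comprises:
    analyzing the text description information and deleting the stop words in it, wherein a stop word is a word that occurs many times but has no substantive meaning; and
    performing semantic extension on what remains after the deletion.
  3. The method according to any one of claims 1 to 2, wherein the stop words include at least one of the following: 的, 得, 地.
  4. The method according to any one of claims 1 to 3, wherein the text description information of the picture includes at least one of the following:
    the title of the page containing the picture;
    the text surrounding the picture;
    the anchor text of the picture;
    the name of the picture, wherein the name of the picture is obtained from at least one of the above texts.
  5. A method for generating a structured data search result item, comprising:
    obtaining a picture that matches a search query word; and
    generating a search result item according to the structured data assigned to the picture.
  6. A device for associating structured data with a picture, comprising:
    an extension module configured to obtain text description information of a picture and perform semantic extension on the text description information to obtain extended description information;
    a matching module configured to match the extended description information against an encyclopedia database storing structured data to determine a topic that matches the extended description information;
    a selection module configured to select, from the multiple pieces of structured data included in the matching topic, at least one piece of structured data whose relevance to the extended description information exceeds a specified relevance; and
    an association module configured to associate the selected at least one piece of structured data with the picture.
  7. The device according to claim 6, wherein the extension module is further configured to:
    analyze the text description information and delete the stop words in it, wherein a stop word is a word that occurs many times but has no substantive meaning; and
    perform semantic extension on what remains after the deletion.
  8. The device according to claim 7, wherein the stop words include at least one of the following: 的, 得, 地.
  9. The device according to any one of claims 6 to 8, wherein the text description information of the picture includes at least one of the following:
    the title of the page containing the picture;
    the text surrounding the picture;
    the anchor text of the picture;
    the name of the picture, wherein the name of the picture is obtained from at least one of the above texts.
  10. A device for generating a structured data search result item, comprising:
    an acquisition module configured to obtain the picture corresponding to a search query word; and
    a generation module configured to generate a search result page according to the picture and the structured data associated with the picture.
  11. A system for generating structured data search result items, comprising:
    an encyclopedia database configured to include multiple topics, each topic including multiple pieces of structured data;
    a picture database configured to store multiple pictures, to perform semantic extension on the text description information of each picture to obtain extended description information, and to match the extended description information against the encyclopedia database and associate the picture with at least one piece of structured data obtained by the matching;
    a user terminal configured to input a search query word for a picture; and
    a search engine configured to search the picture database for the picture corresponding to the search query word, to search the encyclopedia database for the structured data associated with the picture, and to combine the obtained picture with the associated information of the picture to generate a search result page.
  12. A computer program comprising computer readable code which, when run on a computing device, causes the computing device to perform the method according to any one of claims 1 to 5.
  13. A computer readable medium in which the computer program according to claim 12 is stored.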
PCT/CN2015/080712 2014-06-09 2015-06-03 Method and device for associating structured data with a picture WO2015188719A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410253722.8A CN104008180B (zh) 2014-06-09 2014-06-09 Method and apparatus for associating structured data with pictures
CN201410253722.8 2014-06-09

Publications (1)

Publication Number Publication Date
WO2015188719A1 (zh) 2015-12-17

Family

ID=51368837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/080712 WO2015188719A1 (zh) 2014-06-09 2015-06-03 Method and apparatus for associating structured data with pictures

Country Status (2)

Country Link
CN (1) CN104008180B (zh)
WO (1) WO2015188719A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
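CN104008180B (zh) * 2014-06-09 2017-04-12 北京奇虎科技有限公司 Method and apparatus for associating structured data with pictures
CN105488160A (zh) * 2015-11-30 2016-04-13 北大方正集团有限公司 Picture linking method and apparatus, and method for producing a knowledge graph
WO2018119593A1 (zh) * 2016-12-26 2018-07-05 华为技术有限公司 Sentence recommendation method and apparatus
CN108197239B (zh) * 2017-12-29 2021-08-24 北京奇元科技有限公司 Method and apparatus for generating a point-of-interest network topology graph
US11631497B2 (en) * 2018-05-30 2023-04-18 International Business Machines Corporation Personalized device recommendations for proactive health monitoring and management
CN108984740B (zh) * 2018-07-16 2021-03-26 百度在线网络技术(北京)有限公司 Page interaction method, apparatus, device, and computer-readable medium
CN111462478B (zh) * 2019-01-22 2021-07-27 北京中合云通科技发展有限公司 Method and apparatus for dividing urban road network signal control sub-areas
CN113743438B (zh) * 2020-08-20 2024-06-18 北京沃东天骏信息技术有限公司 Method, apparatus, and system for generating datasets for text detection
CN113255349A (zh) * 2021-05-28 2021-08-13 北京字节跳动网络技术有限公司 Information processing method, apparatus, and computer storage medium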

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120106853A1 (en) * 2010-11-01 2012-05-03 Microsoft Corporation Image search
CN103226601A (zh) * 2013-04-25 2013-07-31 百度在线网络技术(北京)有限公司 Picture search method and apparatus
CN103559220A (zh) * 2013-10-18 2014-02-05 北京奇虎科技有限公司 Picture search device, method, and system
CN104008180A (zh) * 2014-06-09 2014-08-27 北京奇虎科技有限公司 Method and apparatus for associating structured data with pictures

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200966A (zh) * 2011-06-01 2011-09-28 潍坊北大青鸟华光照排有限公司 Method for extracting and processing layout information
CN102902771A (zh) * 2012-09-27 2013-01-30 百度国际科技(深圳)有限公司 Picture search method, apparatus, and server
CN103425780B (zh) * 2013-08-19 2016-08-17 曙光信息产业股份有限公司 Data query method and apparatus
CN103793498B (zh) * 2014-01-22 2017-08-25 百度在线网络技术(北京)有限公司 Picture search method, apparatus, and search engine

Also Published As

Publication number Publication date
CN104008180A (zh) 2014-08-27
CN104008180B (zh) 2017-04-12

Similar Documents

Publication Publication Date Title
WO2015188719A1 (zh) Method and apparatus for associating structured data with pictures
CN108694223B (zh) Method and apparatus for constructing a user profile library
US9305084B1 (en) Tag selection, clustering, and recommendation for content hosting services
WO2017020451A1 (zh) Information push method and apparatus
KR101659097B1 (ko) Method and apparatus for searching a plurality of stored digital images
US8909617B2 (en) Semantic matching by content analysis
JP2022065108A (ja) Systems and methods for contextual retrieval of electronic records
US20180101614A1 (en) Machine Learning-Based Data Aggregation Using Social Media Content
US10592571B1 (en) Query modification based on non-textual resource context
JP2017220203A (ja) Method and system for evaluating the matching between content items and images based on similarity scores
JP2017508214A (ja) Providing search recommendations
US20130226559A1 (en) Apparatus and method for providing internet documents based on subject of interest to user
WO2017113592A1 (zh) Model generation method, word weighting method, apparatus, device, and computer storage medium
Brenner et al. Social event detection and retrieval in collaborative photo collections
KR101651780B1 (ko) Method and system for extracting associated words using big data processing technology
US20230086735A1 (en) Systems and methods for retrieving videos using natural language description
KR101696499B1 (ko) Apparatus and method for interpreting Korean keyword search queries
JP7395377B2 (ja) Content search method, apparatus, device, and storage medium
JP2008268985A (ja) Method for assigning tags
EP3144825A1 (en) Enhanced digital media indexing and retrieval
US20170075999A1 (en) Enhanced digital media indexing and retrieval
JP2018005633A (ja) Related content extraction device, related content extraction method, and related content extraction program
US20210342393A1 (en) Artificial intelligence for content discovery
US8195458B2 (en) Open class noun classification
JP2015036892A (ja) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15806852; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15806852; Country of ref document: EP; Kind code of ref document: A1)