CN103838769A - Search system and method - Google Patents

Info

Publication number
CN103838769A
Authority
CN
China
Prior art keywords
picture
noun
weight
search
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210486931.8A
Other languages
Chinese (zh)
Inventor
李忠一
叶建发
卢俊锜
柳岳岑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hongfujin Precision Industry Shenzhen Co Ltd
Hon Hai Precision Industry Co Ltd
Original Assignee
Hongfujin Precision Industry Shenzhen Co Ltd
Hon Hai Precision Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hongfujin Precision Industry Shenzhen Co Ltd and Hon Hai Precision Industry Co Ltd
Priority to CN201210486931.8A
Publication of CN103838769A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Abstract

The invention provides a search method. The method includes the following steps: receiving a picture; analyzing the graphic features of the picture, calculating the similarity between those features and every picture in a picture library, and obtaining the N pictures with the highest similarity values; finding the files that contain the N similar pictures, locating the position of each similar picture within its file, and obtaining the text information near that position; calculating weights for the nouns or noun phrases in the obtained text information; and taking the n nouns or noun phrases with the highest weights as keywords and inputting them into a search engine to perform a full-text search. The invention further provides a search system. With the system and method, a picture can be used to search for the text information a user needs.

Description

Search system and method
Technical field
The present invention relates to the field of Internet technology, and in particular to a search system and method.
Background
The development of computer network technology has greatly improved the convenience with which people obtain information. Computer networks store massive amounts of information, and various search engines are widely used to help people find the information they need.
Traditional search engines depend to a large extent on keywords entered by the user and return search results related to those keywords. However, with the spread of mass storage and digital imaging devices such as video cameras and still cameras, large numbers of pictures of many different types, for example scientific, medical, geographic, and everyday-life pictures, are produced every day. How to use these pictures to retrieve the text information a user needs is an important problem.
Summary of the invention
In view of the above, it is necessary to provide a search system and method that can use a picture to search for the text information a user needs.
The search system comprises: a receiver module, configured to receive a picture that a user inputs into a search engine through a browser of a terminal device; an analysis module, configured to analyze the graphic features of the received picture, calculate the similarity between those graphic features and every picture in a picture library, and obtain from the picture library, according to the calculated similarity values, the N pictures with the highest similarity values; a locating module, configured to find one or more files that contain the N obtained similar pictures, locate the position of each of the N similar pictures within its file, and obtain the text information near that position; a weight computation module, configured to perform weight calculation on the nouns or noun phrases in all of the obtained text information and generate a weight for each noun or noun phrase; and a retrieval module, configured to obtain, according to the weights, the n nouns or noun phrases with the highest weights, input the n obtained nouns or noun phrases into the search engine as keywords to perform a full-text search, and return the retrieval results to the user.
The searching method comprises: receiving a picture that a user inputs into a search engine through a browser of a terminal device; analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library, according to the calculated similarity values, the N pictures with the highest similarity values; finding one or more files that contain the N obtained similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text information near that position; performing weight calculation on the nouns or noun phrases in all of the obtained text information and generating a weight for each noun or noun phrase; and obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting the n obtained nouns or noun phrases into the search engine as keywords to perform a full-text search, and returning the retrieval results to the user.
With the search system and method provided by the present invention, a user can quickly find the web pages he or she needs by way of the search history of other users with the same search goal.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of a preferred embodiment of the search system of the present invention.
Fig. 2 is a functional block diagram of a preferred embodiment of the search system of the present invention.
Fig. 3 is a flowchart of a preferred embodiment of the searching method of the present invention.
Fig. 4 is a schematic diagram of a web page that contains a picture.
Description of main reference numerals
Application server 1
Terminal device 2
Web page server 3
Picture library 4
Search system 10
Receiver module 100
Analysis module 101
Locating module 102
Weight computation module 103
Retrieval module 104
Storage unit 20
Control module 30
The following detailed description further illustrates the present invention with reference to the above drawings.
Detailed description of the embodiments
Fig. 1 shows the application environment of a preferred embodiment of the search system of the present invention. The search system 10 runs on an application server 1. The application server 1 communicates over a network with multiple terminal devices 2 and a web page server 3. The network may be the Internet, an intranet, or the like. Each terminal device 2 may be an electronic terminal such as a personal computer, a tablet computer, a PDA (personal digital assistant), or a smartphone.
The web page server 3 provides a web page browsing service and has a built-in or external picture library 4. The picture library 4 stores the pictures that appear in the web pages. Each picture includes associated information that records the address of the web page containing the picture and the position of the picture within that web page. After receiving a web page request forwarded by the application server 1, the web page server 3 obtains the pictures belonging to the requested web page from the picture library 4, combines them with the text information of the requested web page to form the complete web page, and sends it to the corresponding terminal device 2 through the application server 1. In other embodiments of the present invention, the web page server 3 and the application server 1 may be combined into a single server that acts as both web server and application server.
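The associated information kept for each picture can be pictured as a small record. The following is a minimal sketch, assuming a hypothetical in-memory layout; the names `PictureRecord`, `picture_id`, `page_url`, `position`, and `pages_for`, and the use of a line index as the position, are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureRecord:
    """One entry in the picture library (hypothetical layout)."""
    picture_id: str  # assumed identifier for the stored picture
    page_url: str    # address of the web page that contains the picture
    position: int    # position within that page; a line index is assumed here


def pages_for(records, picture_ids):
    """Map each requested picture id to its (page_url, position) pair."""
    wanted = set(picture_ids)
    return {
        r.picture_id: (r.page_url, r.position)
        for r in records
        if r.picture_id in wanted
    }


library = [
    PictureRecord("p1", "http://example.com/earth", 1),
    PictureRecord("p2", "http://example.com/moon", 4),
]
print(pages_for(library, ["p1"]))
```

Keeping the page address and position next to each picture is what later lets the locating module go from a similar picture straight to the surrounding text.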
Fig. 2 is a functional block diagram of a preferred embodiment of the search system 10 of the present invention.
The search system 10 comprises multiple functional modules made up of program code (described below) and provides the following functions: receiving a picture input by a user; analyzing the graphic features of the picture; finding a number of similar pictures in the picture library 4 according to the analyzed graphic features; obtaining the files, such as web pages, that contain the similar pictures; locating the position of each similar picture within its file; obtaining the text information near that position; calculating the weights of the nouns or noun phrases in that text information; and selecting one or more nouns or noun phrases with the highest weights as search keywords. A noun is a word that denotes a person or thing, such as "computer", "user", or "network". A noun phrase is a phrase made up of several nouns, or of a noun and its modifiers, such as "computer network" or "authorized user".
The program code of the search system 10 is stored in the storage unit 20 of the application server 1 and is executed by the control module 30 of the application server 1 to realize its functions. The storage unit 20 may be a storage device such as a smart media card, a secure digital card, or a flash card. The control module 30 may be a central processing unit or the like.
In this embodiment, the functional modules of the search system 10 comprise a receiver module 100, an analysis module 101, a locating module 102, a weight computation module 103, and a retrieval module 104. The functions of modules 100 to 104 are described below with reference to Fig. 3.
Fig. 3 is a flowchart of a preferred embodiment of the searching method of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
Step S01: the receiver module 100 receives a picture that a user inputs into a search engine through the browser of a terminal device 2.
Step S02: the analysis module 101 analyzes the graphic features of the received picture and calculates the similarity between those graphic features and every picture in the picture library 4. The graphic features of a picture include color tone, contour, shape, and the like. In this preferred embodiment, the feature analysis and the similarity calculation may use the SIFT (Scale-Invariant Feature Transform) algorithm.
Step S03: according to the calculated similarity values, the analysis module 101 obtains from the picture library 4 the N pictures with the highest similarity values.
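Steps S02 and S03 amount to scoring every library picture against the query and keeping the N best. The sketch below illustrates only that selection. It does not implement SIFT: cosine similarity over precomputed feature vectors is swapped in as a stand-in, and the names `cosine_similarity`, `top_n_similar`, and the sample feature vectors are assumptions made for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_n_similar(query_features, library_features, n):
    """Return the ids of the n library pictures most similar to the query."""
    scored = (
        (cosine_similarity(query_features, feats), pid)
        for pid, feats in library_features.items()
    )
    return [pid for _, pid in heapq.nlargest(n, scored)]


# Hypothetical feature vectors standing in for real picture features.
library_features = {
    "earth": [0.9, 0.1, 0.0],
    "moon": [0.5, 0.5, 0.0],
    "tree": [0.0, 0.1, 0.9],
}
print(top_n_similar([1.0, 0.0, 0.0], library_features, 2))  # ['earth', 'moon']
```

`heapq.nlargest` avoids sorting the whole library when N is small compared with the number of stored pictures.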
Step S04: using the associated information of the pictures stored in the picture library 4, the locating module 102 finds one or more files, such as web pages, that contain the N obtained similar pictures. As described above, each picture in the picture library 4 includes associated information that records the address of the web page containing the picture and the position of the picture within that web page.
Step S05: according to the associated information, the locating module 102 locates the position of each of the N similar pictures within its file and obtains the text information near that position. In the description below, the text information obtained from one file is treated as one passage of text information. "Near" may refer to the m lines of text above and below the picture or to its left and right, or to the text that surrounds the picture. For example, in the web page shown in Fig. 4, the nearby text may be the two lines above and below the picture, or equivalently the text surrounding the picture, namely "Earth" and "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft". Surrounding text is text whose line spacing from the picture is relatively small; in Fig. 4, for example, the distance between the picture and the text "Earth" and "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft" is clearly smaller than the distance between the picture and any other text.
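The "m lines above and below" rule of step S05 can be sketched as follows, assuming a page has already been flattened into a list of text lines with the picture occupying one slot. The function name `nearby_text`, the `<picture>` placeholder, and the flattened page model are assumptions for illustration; the patent does not prescribe a page representation.

```python
def nearby_text(lines, picture_index, m):
    """Return up to m non-empty lines above and below the picture's slot."""
    start = max(0, picture_index - m)
    above = lines[start:picture_index]
    below = lines[picture_index + 1 : picture_index + 1 + m]
    return [text for text in above + below if text.strip()]


# A simplified stand-in for the web page of Fig. 4.
page = [
    "Earth",
    "<picture>",
    "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft",
    "Unrelated text further down the page",
]
print(nearby_text(page, picture_index=1, m=1))
```

With `m=1` this returns the caption above and the description below the picture and leaves out the unrelated line further down.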
Step S06: the weight computation module 103 performs weight calculation on the nouns or noun phrases in all of the obtained text information and generates a weight for each noun or noun phrase. The preferred embodiment of the present invention uses the TF-IDF (term frequency-inverse document frequency) weighting algorithm to calculate the weight of each noun or noun phrase. TF-IDF is a weighting technique commonly used in information retrieval and text mining. Here it is used to evaluate how important a noun or noun phrase is to one passage among all of the obtained text information. The importance of a noun or noun phrase increases in proportion to the number of times it appears in a given passage, but decreases as the number of passages in which it appears grows. For example, if a passage of text information contains 100 nouns or noun phrases in total and the noun "computer" appears 3 times, the term frequency (TF) of "computer" in that passage is 3/100 = 0.03. If "computer" appears in 1,000 passages and the total number of passages is 10,000,000, its inverse document frequency (IDF), using a base-10 logarithm, is log(10,000,000/1,000) = 4. The weight of "computer" is therefore 0.03 × 4 = 0.12.
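The arithmetic of the "computer" example can be checked with a few lines of code. This is a minimal sketch of the TF-IDF product with a base-10 logarithm, which is the base that reproduces the value 4 in the example; the function names `tf` and `idf` are illustrative.

```python
import math


def tf(term_count, total_terms):
    """Term frequency: share of the passage's terms taken up by this term."""
    return term_count / total_terms


def idf(total_passages, passages_with_term):
    """Inverse document frequency with a base-10 logarithm."""
    return math.log10(total_passages / passages_with_term)


# Figures from the example: 3 occurrences among 100 terms,
# present in 1,000 of 10,000,000 passages.
weight = tf(3, 100) * idf(10_000_000, 1_000)
print(round(weight, 2))  # 0.12
```

A term that appears in nearly every passage gets an IDF close to 0, so common words contribute little weight however often they occur in one passage.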
Other embodiments of the present invention may use the TF (term frequency) weighting algorithm alone, without considering how often a noun or noun phrase appears across the multiple passages of text information. Other embodiments may also use a Boolean weighting algorithm. In this description, the Boolean weighting algorithm randomly selects several nouns or noun phrases from a passage of text information and calculates how often each of them occurs in that passage.
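The TF-only variant can be sketched as follows, assuming the nouns and noun phrases of a passage have already been extracted into a list. Noun extraction is outside the scope of the sketch, the name `tf_weights` is illustrative, and the random-selection weighting described above is not shown.

```python
from collections import Counter


def tf_weights(nouns):
    """Weight each noun by its share of all nouns in one passage."""
    counts = Counter(nouns)
    total = len(nouns)
    return {term: count / total for term, count in counts.items()}


# Hypothetical nouns extracted from one passage.
passage_nouns = ["computer", "network", "computer", "user"]
print(tf_weights(passage_nouns))
```

Because no cross-passage statistics are needed, this variant can weight a passage in isolation, at the cost of favoring nouns that are common everywhere.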
Step S07: the weight computation module 103 may also adjust the weights of the nouns or noun phrases according to the position of each of the N similar pictures within its file. A picture on the first page of a file is generally considered relatively important to that file. Therefore, if the similar picture associated with one or more nouns or noun phrases is on the first page of its file, the weight computation module 103 may multiply the weights of those nouns or noun phrases by a coefficient, such as 1.1.
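The adjustment in step S07 is a single multiplication per affected term. The sketch below assumes the weights are held in a dictionary and that the set of terms whose picture sits on the first page is already known; the names `adjust_weights` and `first_page_terms` are illustrative, while the coefficient 1.1 is the example value given in the text.

```python
def adjust_weights(weights, first_page_terms, factor=1.1):
    """Boost the weight of terms whose similar picture is on the first page."""
    return {
        term: weight * factor if term in first_page_terms else weight
        for term, weight in weights.items()
    }


# Hypothetical weights produced by the previous step.
weights = {"computer": 0.12, "network": 0.10}
adjusted = adjust_weights(weights, first_page_terms={"computer"})
print(adjusted)
```

Returning a new dictionary leaves the unadjusted weights available for comparison or for embodiments that skip this step.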
Step S08: according to the weights, the retrieval module 104 obtains the n nouns or noun phrases with the highest weights.
Step S09: the retrieval module 104 inputs the n obtained nouns or noun phrases into the search engine as keywords to perform a full-text search and returns the retrieval results to the user.
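Steps S08 and S09 can be sketched as picking the n heaviest terms and turning them into a query string. The names `top_keywords` and `build_query`, the sample weights, and the space-separated query format are assumptions; the actual call to the search engine is left out because the patent does not specify its interface.

```python
import heapq


def top_keywords(weights, n):
    """Return the n terms with the highest weights, heaviest first."""
    return heapq.nlargest(n, weights, key=weights.get)


def build_query(keywords):
    """Join keywords into one query string (space-separated by assumption)."""
    return " ".join(keywords)


# Hypothetical weights for terms found near the similar pictures.
term_weights = {"Earth": 0.30, "Apollo 17": 0.25, "photograph": 0.10}
keywords = top_keywords(term_weights, 2)
print(build_query(keywords))  # Earth Apollo 17
```

The resulting string is what would be handed to the full-text search in place of keywords typed by the user.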
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from its spirit and scope.

Claims (10)

1. A search system, characterized in that the system comprises:
a receiver module, configured to receive a picture that a user inputs into a search engine through a browser of a terminal device;
an analysis module, configured to analyze the graphic features of the received picture, calculate the similarity between those graphic features and every picture in a picture library, and obtain from the picture library, according to the calculated similarity values, the N pictures with the highest similarity values;
a locating module, configured to find one or more files that contain the N obtained similar pictures, locate the position of each of the N similar pictures within its file, and obtain the text information near that position;
a weight computation module, configured to perform weight calculation on the nouns or noun phrases in all of the obtained text information and generate a weight for each noun or noun phrase; and
a retrieval module, configured to obtain, according to the weights, the n nouns or noun phrases with the highest weights, input the n obtained nouns or noun phrases into the search engine as keywords to perform a full-text search, and return the retrieval results to the user.
2. The search system as claimed in claim 1, characterized in that the graphic features of the picture include color tone, contour, and shape.
3. The search system as claimed in claim 1, characterized in that each picture in the picture library includes associated information that records the address of the web page containing the picture and the position of the picture within that web page.
4. The search system as claimed in claim 1, characterized in that the weight calculation uses the TF-IDF weighting algorithm.
5. The search system as claimed in claim 1, characterized in that the weight computation module is further configured to adjust the weights of the nouns or noun phrases according to the position of each similar picture within its file.
6. A searching method, characterized in that the method comprises:
a receiving step: receiving a picture that a user inputs into a search engine through a browser of a terminal device;
an analysis step: analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library, according to the calculated similarity values, the N pictures with the highest similarity values;
a locating step: finding one or more files that contain the N obtained similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text information near that position;
a weight calculation step: performing weight calculation on the nouns or noun phrases in all of the obtained text information and generating a weight for each noun or noun phrase; and
a search step: obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting the n obtained nouns or noun phrases into the search engine as keywords to perform a full-text search, and returning the retrieval results to the user.
7. The searching method as claimed in claim 6, characterized in that the graphic features of the picture include color tone, contour, and shape.
8. The searching method as claimed in claim 6, characterized in that each picture in the picture library includes associated information that records the address of the web page containing the picture and the position of the picture within that web page.
9. The searching method as claimed in claim 6, characterized in that the weight calculation uses the TF-IDF weighting algorithm.
10. The searching method as claimed in claim 6, characterized in that the weight calculation step further comprises:
adjusting the weights of the nouns or noun phrases according to the position of each similar picture within its file.
CN201210486931.8A 2012-11-26 2012-11-26 Search system and method Pending CN103838769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210486931.8A CN103838769A (en) 2012-11-26 2012-11-26 Search system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210486931.8A CN103838769A (en) 2012-11-26 2012-11-26 Search system and method

Publications (1)

Publication Number Publication Date
CN103838769A true CN103838769A (en) 2014-06-04

Family

ID=50802279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210486931.8A Pending CN103838769A (en) 2012-11-26 2012-11-26 Search system and method

Country Status (1)

Country Link
CN (1) CN103838769A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1378159A (en) * 2001-03-26 2002-11-06 Lg电子株式会社 Picture searching method and device
CN102012934A (en) * 2010-11-30 2011-04-13 百度在线网络技术(北京)有限公司 Method and system for searching picture
CN102033925A (en) * 2010-12-15 2011-04-27 闫迎瑞 Method for searching related word information by uploading a picture

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘金松等: "基于网页上下文分析的图片检索", 《语言计算与基于内容的文本处理——全国第七届计算语言学联合学术会议论文集》 *
热依玛依·买买提等: "基于文本的图片检索中图片相关文本提取技术的研究", 《少数民族青年自然语言处理技术研究与进展——第三届全国少数民族青年自然语言信息处理、第二届全国多语言知识库建设联合学术研讨会论文集》 *
焦隽等: "一种在无标注图像库中进行的基于关键词的检索方法", 《模式识别与人工智能》 *


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140604