CN103838769A - Search system and method - Google Patents
Search system and method
- Publication number: CN103838769A
- Application number: CN201210486931.8A
- Authority: CN (China)
- Prior art keywords
- picture
- noun
- weight
- search
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Abstract
The invention provides a search method comprising the following steps: receiving a picture; analyzing the graphic features of the picture, calculating the similarity between those features and every picture in a picture library, and selecting the top N pictures with the highest similarity values; finding the files that contain the N similar pictures, locating the position of each similar picture within its file, and obtaining the text near that position; calculating the weights of the nouns or noun phrases in the obtained text and taking the n nouns or noun phrases with the highest weights as keywords; and inputting the keywords into a search engine to perform a full-text search. The invention further provides a search system. With the system and the method, a picture can be used to search for the text information a user needs.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a search system and method.
Background art
The development of computer network technology has greatly improved the convenience with which people obtain information. Computer networks store massive amounts of information, and search engines of various kinds are widely used to help people find the information they need.
Traditional search engines depend largely on keywords entered by the user and return search results related to those keywords. However, with the spread of mass storage and digital imaging devices such as video cameras and still cameras, large numbers of pictures of many kinds (scientific, medical, geographic, everyday life, and so on) are produced every day. How to use such pictures to retrieve the text information a user needs is an important problem.
Summary of the invention
In view of the above, it is necessary to provide a search system and method that can use a picture to search for the text information a user needs.
The search system comprises: a receiver module, for receiving a picture that a user inputs into a search engine through the browser of a terminal device; an analysis module, for analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library the top N pictures with the highest similarity values; a locating module, for finding one or more files that contain the N similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text near that position; a weight computation module, for calculating weights for the nouns or noun phrases in all of the obtained text, generating a weight for each noun or noun phrase; and a retrieval module, for obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting them as keywords into the search engine to perform a full-text search, and returning the search results to the user.
The search method comprises: receiving a picture that a user inputs into a search engine through the browser of a terminal device; analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library the top N pictures with the highest similarity values; finding one or more files that contain the N similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text near that position; calculating weights for the nouns or noun phrases in all of the obtained text, generating a weight for each noun or noun phrase; and obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting them as keywords into the search engine to perform a full-text search, and returning the search results to the user.
With the search system and method provided by the present invention, a user can quickly find the web pages he or she needs by way of the search history of other users who have the same search goal.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of a preferred embodiment of the search system of the present invention.
Fig. 2 is a functional block diagram of a preferred embodiment of the search system of the present invention.
Fig. 3 is a flowchart of a preferred embodiment of the search method of the present invention.
Fig. 4 is a schematic diagram of a web page that contains a picture.
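Description of main element symbols
Element | Reference numeral |
---|---|
Application server | 1 |
Terminal device | 2 |
Web page server | 3 |
Picture library | 4 |
Search system | 10 |
Receiver module | 100 |
Analysis module | 101 |
Locating module | 102 |
Weight computation module | 103 |
Retrieval module | 104 |
Storage unit | 20 |
Control module | 30 |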
The following detailed description further explains the present invention with reference to the accompanying drawings.
Detailed description of the embodiments
Fig. 1 is a diagram of the application environment of a preferred embodiment of the search system of the present invention. The search system 10 runs on an application server 1. The application server 1 communicates over a network with multiple terminal devices 2 and a web page server 3. The network may be the Internet, an intranet, or the like. Each terminal device 2 may be an electronic terminal such as a personal computer, a tablet computer, a PDA (personal digital assistant), or a smartphone.
The web page server 3 provides a web page browsing service and has a built-in or external picture library 4. The picture library 4 stores the pictures that appear in the web pages. Each picture carries related information that records the address of the web page containing the picture and the position of the picture within that web page. After receiving a web page request forwarded by the application server 1, the web page server 3 obtains the pictures belonging to the requested web page from the picture library 4, combines them with the text of the requested web page to form the complete web page, and sends it to the corresponding terminal device 2 through the application server 1. In other embodiments of the present invention, the web page server 3 and the application server 1 may be combined into a single server that provides both web and application services.
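The picture library entry described above (a picture plus the address of its web page and its position in that page) might be modeled as in the following sketch. The class name `PictureRecord`, its field names, the helper `pages_containing`, and the example URLs are all hypothetical; the patent does not specify a storage format, and representing the position as a line number is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureRecord:
    """One picture library entry (hypothetical field names)."""
    picture_id: str   # identifier of the stored picture
    page_url: str     # address of the web page that contains the picture
    line_index: int   # assumed position format: line number within the page


def pages_containing(records, picture_ids):
    """Return the distinct page addresses that contain any of the given pictures."""
    wanted = set(picture_ids)
    pages = []
    for rec in records:
        if rec.picture_id in wanted and rec.page_url not in pages:
            pages.append(rec.page_url)
    return pages


# Illustrative data only.
library = [
    PictureRecord("img-earth", "http://example.com/earth.html", 3),
    PictureRecord("img-moon", "http://example.com/moon.html", 7),
]
earth_pages = pages_containing(library, ["img-earth"])
```

A lookup of this kind is what the locating module would need in order to go from a similar picture to the file that contains it.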
Fig. 2 is a functional block diagram of a preferred embodiment of the search system 10 of the present invention.
The program code of the search system 10 is stored in a storage unit 20 of the application server 1 and is executed by a control module 30 of the application server 1 to implement its functions. The storage unit 20 may be a storage device such as a smart media card, a secure digital card, or a flash memory card. The control module 30 may be a central processing unit or the like.
In this embodiment, the functional modules of the search system 10, each made up of program code, comprise a receiver module 100, an analysis module 101, a locating module 102, a weight computation module 103, and a retrieval module 104. The functions of modules 100 to 104 are described below with reference to Fig. 3.
Fig. 3 is a flowchart of a preferred embodiment of the search method of the present invention. Depending on requirements, the order of the steps in this flowchart may be changed, and some steps may be omitted.
Step S01: the receiver module 100 receives a picture that a user inputs into a search engine through the browser of a terminal device 2.
Step S02: the analysis module 101 analyzes the graphic features of the received picture and calculates the similarity between those graphic features and every picture in the picture library 4. The graphic features of a picture include tone, contour, shape, and so on. In this preferred embodiment, the feature analysis and the similarity calculation may use the SIFT (Scale Invariant Feature Transform) algorithm.
Step S03: according to the calculated similarity values, the analysis module 101 obtains from the picture library 4 the top N pictures with the highest similarity values.
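The ranking in steps S02 and S03 can be sketched as follows. This is not SIFT: to keep the example self-contained, it assumes each picture has already been reduced to a fixed-length numeric feature vector and uses cosine similarity as a stand-in measure. The function names and the sample vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query_vec, library, n):
    """Score every picture in the library and keep the n most similar.

    library maps a picture id to its feature vector.
    Returns a list of (picture_id, similarity), highest similarity first.
    """
    scored = ((pid, cosine_similarity(query_vec, vec)) for pid, vec in library.items())
    return heapq.nlargest(n, scored, key=lambda item: item[1])


# Illustrative feature vectors only.
features = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
best = top_n_similar([1.0, 0.0, 0.0], features, 2)
```

With a real feature extractor in place of the toy vectors, only `cosine_similarity` would need to change; the top-N selection stays the same.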
Step S04: using the related information stored with each picture in the picture library 4, the locating module 102 finds one or more files, such as web pages, that contain the N similar pictures. As described above, each picture in the picture library 4 carries related information that records the address of the web page containing the picture and the position of the picture within that web page.
Step S05: using the related information, the locating module 102 locates the position of each of the N similar pictures within its file and obtains the text near that position. In the description below, the text obtained from one file is treated as one passage of text. "Near" may mean the text within m lines above and below the picture or m lines to its left and right, or it may mean the text that surrounds the picture. For example, in the web page shown in Fig. 4, the nearby text may be the two lines above and below the picture, or the text surrounding it, namely "Earth" and "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft". Surrounding text is text whose line distance from the picture is relatively small; in Fig. 4, for example, the distance between the picture and the text "Earth" and "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft" is clearly smaller than the distance to any other text.
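One simple reading of "the m lines above and below the picture" can be sketched as follows. It assumes the file has already been split into a list of lines and that the picture's position is known as an index into that list; the function name `nearby_text`, the `<picture>` placeholder, and the sample page are hypothetical.

```python
def nearby_text(lines, picture_index, m=1):
    """Return the non-empty text lines within m lines above and below the picture."""
    start = max(0, picture_index - m)
    end = min(len(lines), picture_index + m + 1)
    return [
        lines[i].strip()
        for i in range(start, end)
        if i != picture_index and lines[i].strip()
    ]


# Illustrative page content, loosely modeled on the Fig. 4 example.
page = [
    "Solar system overview",
    "Earth",
    "<picture>",
    "The famous Blue Marble photograph was taken in 1972 by the Apollo 17 spacecraft",
    "Other unrelated text",
]
context = nearby_text(page, picture_index=2, m=1)
```

Here `context` holds the caption line above the picture and the description line below it, which together form the passage passed on to the weight calculation.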
Step S06: the weight computation module 103 calculates weights for the nouns or noun phrases in all of the obtained text, generating a weight for each noun or noun phrase. The preferred embodiment of the present invention uses the TF-IDF (term frequency-inverse document frequency) weighting algorithm to calculate these weights. TF-IDF is a weighting technique commonly used in information retrieval and text mining; here it assesses how important a noun or noun phrase is to one passage among all of the obtained passages. The importance of a noun or noun phrase increases in proportion to the number of times it appears in a given passage, but decreases in inverse relation to the frequency with which it appears across the passages. For example, if a passage contains 100 nouns or noun phrases in total and the noun "computer" appears 3 times, the term frequency (TF) of "computer" in that passage is 3/100 = 0.03. If "computer" appears in 1,000 passages and the total number of passages is 10,000,000, its inverse document frequency (IDF) is log10(10,000,000/1,000) = 4. The weight of "computer" is therefore 0.03 × 4 = 0.12.
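The arithmetic of the "computer" example can be reproduced with a few lines of code. This is a minimal sketch that assumes a base-10 logarithm, which is what the figures in the example imply; the function names are hypothetical.

```python
import math


def term_frequency(term_count, total_terms):
    """Share of a passage's nouns and noun phrases taken up by one term."""
    return term_count / total_terms


def inverse_document_frequency(total_passages, passages_with_term):
    """Base-10 log of total passages over passages that contain the term."""
    return math.log10(total_passages / passages_with_term)


def tf_idf(term_count, total_terms, total_passages, passages_with_term):
    """Weight of a term in one passage: TF multiplied by IDF."""
    return term_frequency(term_count, total_terms) * inverse_document_frequency(
        total_passages, passages_with_term
    )


# "computer": 3 occurrences among 100 terms; found in 1,000 of 10,000,000 passages.
weight = tf_idf(3, 100, 10_000_000, 1_000)  # approximately 0.12
```

A term that appears in almost every passage gets an IDF near zero, so common words contribute little even when their term frequency is high.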
Other embodiments of the present invention may instead use the TF (term frequency) weighting algorithm alone, without considering how often a noun or noun phrase appears across the multiple passages. Other embodiments may also use a Boolean weighting algorithm, which here means randomly selecting several nouns or noun phrases from a passage and calculating the frequency with which each appears in that passage.
Step S07: the weight computation module 103 may also adjust the weight of a noun or noun phrase according to the position, within its file, of each of the N similar pictures. A picture on the first page of a file is generally considered to be relatively important to that file. Therefore, if the similar picture associated with one or more nouns or noun phrases is on the first page of its file, the weight computation module 103 may multiply the weights of those nouns or noun phrases by a coefficient, such as 1.1.
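The position-based adjustment in step S07 can be sketched as follows. The coefficient 1.1 comes from the text; the function name, the dictionary layout, and the sample weights are hypothetical, and the set of terms whose picture sits on a first page is assumed to have been determined beforehand.

```python
FIRST_PAGE_COEFFICIENT = 1.1  # example coefficient given in the description


def adjust_weights(weights, first_page_terms, coefficient=FIRST_PAGE_COEFFICIENT):
    """Scale the weight of every term whose similar picture is on a first page.

    weights maps a noun or noun phrase to its weight.
    first_page_terms is the set of terms taken from text near a first-page picture.
    """
    return {
        term: weight * coefficient if term in first_page_terms else weight
        for term, weight in weights.items()
    }


# Illustrative weights only.
adjusted = adjust_weights({"earth": 0.12, "photograph": 0.10}, {"earth"})
```

Because the adjustment is a simple multiplier, it changes the ranking only when two terms already have similar weights.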
Step S08: according to the weights, the retrieval module 104 obtains the n nouns or noun phrases with the highest weights.
Step S09: the retrieval module 104 inputs the n nouns or noun phrases as keywords into the search engine to perform a full-text search and returns the search results to the user.
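Steps S08 and S09 amount to picking the n highest-weighted terms and handing them to the search engine as a query. The sketch below assumes the query is a single space-separated string; the patent does not say how the keywords are passed to the search engine, and the function names and sample weights are hypothetical.

```python
def top_keywords(weights, n):
    """Return the n terms with the highest weights, highest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def build_query(keywords):
    """Join the keywords into one query string (assumed query format)."""
    return " ".join(keywords)


# Illustrative weights only.
weights = {"earth": 0.132, "apollo 17": 0.09, "photograph": 0.05, "year": 0.01}
query = build_query(top_keywords(weights, 2))
```

The resulting string would then be submitted to whatever full-text search interface the deployment provides.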
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or replaced with equivalents without departing from its spirit and scope.
Claims (10)
1. A search system, characterized in that the system comprises:
a receiver module, for receiving a picture that a user inputs into a search engine through the browser of a terminal device;
an analysis module, for analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library, according to the calculated similarity values, the top N pictures with the highest similarity values;
a locating module, for finding one or more files that contain the N similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text near that position;
a weight computation module, for calculating weights for the nouns or noun phrases in all of the obtained text, generating a weight for each noun or noun phrase; and
a retrieval module, for obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting the n nouns or noun phrases as keywords into the search engine to perform a full-text search, and returning the search results to the user.
2. The search system as claimed in claim 1, characterized in that the graphic features of the picture comprise tone, contour, and shape.
3. The search system as claimed in claim 1, characterized in that each picture in the picture library carries related information that records the address of the web page containing the picture and the position of the picture within that web page.
4. The search system as claimed in claim 1, characterized in that the weight calculation uses the TF-IDF weighting algorithm.
5. The search system as claimed in claim 1, characterized in that the weight computation module is further configured to adjust the weights of the nouns or noun phrases according to the position of each similar picture within its file.
6. A search method, characterized in that the method comprises:
a receiving step: receiving a picture that a user inputs into a search engine through the browser of a terminal device;
an analysis step: analyzing the graphic features of the received picture, calculating the similarity between those graphic features and every picture in a picture library, and obtaining from the picture library, according to the calculated similarity values, the top N pictures with the highest similarity values;
a locating step: finding one or more files that contain the N similar pictures, locating the position of each of the N similar pictures within its file, and obtaining the text near that position;
a weight calculation step: calculating weights for the nouns or noun phrases in all of the obtained text, generating a weight for each noun or noun phrase; and
a search step: obtaining, according to the weights, the n nouns or noun phrases with the highest weights, inputting the n nouns or noun phrases as keywords into the search engine to perform a full-text search, and returning the search results to the user.
7. The search method as claimed in claim 6, characterized in that the graphic features of the picture comprise tone, contour, and shape.
8. The search method as claimed in claim 6, characterized in that each picture in the picture library carries related information that records the address of the web page containing the picture and the position of the picture within that web page.
9. The search method as claimed in claim 6, characterized in that the weight calculation uses the TF-IDF weighting algorithm.
10. The search method as claimed in claim 6, characterized in that the weight calculation step further comprises:
adjusting the weights of the nouns or noun phrases according to the position of each similar picture within its file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210486931.8A CN103838769A (en) | 2012-11-26 | 2012-11-26 | Search system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210486931.8A CN103838769A (en) | 2012-11-26 | 2012-11-26 | Search system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103838769A true CN103838769A (en) | 2014-06-04 |
Family
ID=50802279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210486931.8A Pending CN103838769A (en) | 2012-11-26 | 2012-11-26 | Search system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103838769A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1378159A (en) * | 2001-03-26 | 2002-11-06 | Lg电子株式会社 | Picture searching method and device |
CN102012934A (en) * | 2010-11-30 | 2011-04-13 | 百度在线网络技术(北京)有限公司 | Method and system for searching picture |
CN102033925A (en) * | 2010-12-15 | 2011-04-27 | 闫迎瑞 | Method for searching related word information by uploading a picture |
Non-Patent Citations (3)
Title |
---|
刘金松等: "基于网页上下文分析的图片检索", 《语言计算与基于内容的文本处理——全国第七届计算语言学联合学术会议论文集》 * |
热依玛依·买买提等: "基于文本的图片检索中图片相关文本提取技术的研究", 《少数民族青年自然语言处理技术研究与进展——第三届全国少数民族青年自然语言信息处理、第二届全国多语言知识库建设联合学术研讨会论文集》 * |
焦隽等: "一种在无标注图像库中进行的基于关键词的检索方法", 《模式识别与人工智能》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20140604 |