WO2008028395A1 - A method for providing and searching information to the public using internet - Google Patents

Publication number
WO2008028395A1
Authority
WO
WIPO (PCT)
Prior art keywords
public
information
query
search
server
Application number
PCT/CN2007/002259
Other languages
French (fr)
Chinese (zh)
Inventor
Vincent Tsang
Michael Liu
Original Assignee
Vincent Tsang
Michael Liu
Application filed by Vincent Tsang, Michael Liu

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to a method of using the Internet to provide information, store information, and query information for the public, and more particularly to a method for querying information using the search engine on the Internet.
  • the search engine is currently the most commonly used web application tool on the Internet.
  • Commonly used search engines include Google, Baidu, Yahoo, Yisou, Zhongsou and Alltheweb. People use search engines to obtain all kinds of information on the Internet.
  • Most commonly used search engines currently return links whose associated articles contain the query keywords entered by the public.
  • The biggest drawback of such search engines is that the correlation between the search results and the query words is not high enough, so the query results are not accurate enough.
  • Such search engines do not really search by the content of the link, and their result pages are filled with information unrelated to the searcher's target. For example, an encyclopedia page contains almost all query keywords.
  • the object of the present invention is to provide a method for providing and querying information for the public by using the Internet.
  • the technical problem to be solved is to improve the accuracy and relevance between the public query words and the query results, and to make the public more efficient in querying information.
  • The present invention adopts the following technical solution: a method for providing and querying information for the public by using the Internet, having a browser and server structure. The browser side includes a publishing device for transmitting information content to a storage device of the server, and a search device for sending query content to a computing device of the server.
  • The server includes: a storage device for recording the information content transmitted by the publishing device in the master record database; a master record database for storing the data; a computing device for extracting records from the master record database according to the query content sent by the search device, summarizing and sorting them, and transmitting the summary results; and a feedback device for receiving the record set transmitted by the computing device and transmitting it to the browser interface.
  • The server side of the present invention further includes a word segmentation device, which normalizes the information content and the query content transmitted from the publishing device and the search device respectively, and then transmits the normalized word segmentation result to the storage device and the computing device of the server side.
  • the word segmentation device of the present invention is connected with a vocabulary database, and the vocabulary database is in the form of a path table, including a path sequence number, a path name, a click number, and a mask path.
  • the information content of the present invention includes links, rating levels, and rating basis; the query content is a query word.
  • The storage device of the present invention takes the link and rating level transmitted by the publishing device, together with the rating basis transmitted by the word segmentation device, as one record and writes it to the master record database.
  • The computing device of the present invention uses the word segmentation result sent by the word segmentation device to extract all records containing the query words from the master record database, cumulatively adds the scores of records whose link and rating basis fields are both identical, and sorts the summary results by score value.
  • the feedback device of the present invention receives the record set transmitted by the computing device and distributes it to the browser interface of the queryer.
  • the master record database of the present invention stores links of information, query paths, and score results in the form of a list, including: a resource table, a resource score record table, and a score path table.
  • The resource table of the present invention includes: a resource name, a resource serial number, a link address, a resource classification number, a description, a release time, and a user serial number.
  • The resource score record table includes: a score serial number, a resource serial number, and a rating level.
  • The score path table includes: a resource score record serial number and a path serial number.
  • A method for providing and querying information for the public by using the Internet, having a client and server structure, characterized in that: the client comprises a publishing device for transmitting information content to a storage device of the server, and a search device for sending query content to a computing device of the server; the server includes a storage device for recording the information content transmitted by the publishing device in the master record database, a master record database for storing the data, a computing device for extracting records from the master record database according to the query content sent by the search device, summarizing and sorting them, and transmitting the summary results, and a feedback device for receiving the record set transmitted by the computing device and transmitting the record set to the client interface.
  • The present invention uses a search engine network composed of a publishing device, a search device, a storage device, a master record database, a computing device, and a feedback device to search for information content. The search results are more accurate, and the correlation between the query words and the search results is higher.
  • The public classifies information from its own experience, so that user experience is organically combined with information storage, classification and retrieval. Publishers do not need to understand the rigorous scientific classification of the information; they classify it according to their own understanding, and other members of the public search with the same understanding. Information classification is thereby matched to the human framework of understanding, searches run against keywords filled in by the public, and the retrieval performance of the network system is greatly improved.
  • FIG. 1 is a network topology diagram of an embodiment of the present invention.
  • FIG. 2 is a diagram showing the internal structure of a search engine according to an embodiment of the present invention.
  • Figure 3 is a flow chart of an embodiment of the present invention.
  • FIG. 4 is a diagram of a scoring operation interface according to an embodiment of the present invention.
  • FIG. 5 is a diagram of a query result interface according to an embodiment of the present invention.
  • FIG. 6 is a diagram of a direct release interface of an embodiment of the present invention.
  • FIG. 7 is a diagram of an embedded code publishing interface according to an embodiment of the present invention.
  • FIG. 8 is a diagram of a plug-in publishing interface according to an embodiment of the present invention.
  • FIG. 9 is a structural diagram of a vocabulary database according to an embodiment of the present invention.
  • Figure 10 is a block diagram showing the structure of a master record database in accordance with an embodiment of the present invention.
  • the method for providing, storing, and querying information for the public by utilizing the Internet uses the steps of scoring, summarizing, and retrieving to provide an efficient method for information classification and retrieval.
  • The ideal method for a search engine would be to let knowledge workers who are familiar with both the target information and the architecture of social knowledge view the content behind each new link on the Internet and put it into the appropriate category.
  • The public would then also need to understand that knowledge architecture, expand the classification step by step, reach the target category, and obtain the target information.
  • This method consumes a great deal of manpower. At present, no government, institution or organization is willing to spend such a huge amount of resources on such a public welfare undertaking.
  • Moreover, requiring the public to step through a static classification level by level is inefficient. For this reason, many search engine developers want to use artificial intelligence to do this work, but the natural language semantic analysis of prior art artificial intelligence is not good enough to achieve this goal.
  • The method of the present invention adopts a search engine network with a browser and server (B/S) structure. It uses the information fed back by the public in the browser interface, transmits it to the server through the Internet to complete the classification of the information, and stores it in a database for other members of the public to search.
  • the method for providing, storing, and querying information for the public by using the Internet is composed of a distribution device, a search device, a word segmentation device, a vocabulary database, a storage device, a master record database, a computing device, and a feedback device.
  • the publishing device at one end of the browser is used by the public to transmit a link, a rating level, and a rating basis to a word segmentation device on the server side through a browser.
  • This process is referred to as publishing in the present invention. For example, in "www.bnb88.com; free movie download; 5", "www.bnb88.com" is the published link, "free movie download" is the rating basis (or query path), and "5" is the rating level. The storage device takes the link and rating level transmitted by the publishing device, and the normalized rating basis transmitted by the word segmentation device, as one record.
  • the main record database stores the link of the information, the query path, and the scoring result in the form of a list.
  • The resource table (resource) of the main record database includes: the resource name res_name, which saves the name of the resource, such as "BNB88 free movie"; the resource serial number id, which saves the serial number of the link, such as "1806"; the link address linked_address, which saves the link value; and the user serial number user_id, which saves the serial number of the publishing user.
  • The resource score record table (res_graded_record) of the main record database includes: the score serial number id, which saves the serial number of the score record; the resource serial number res_id, which references the main serial number of the associated resource table; and the rating level score, which saves the rating level.
  • The scoring path table (graded_path) of the main record database includes: the resource score record serial number res_graded_record_id, which references the main serial number of the resource score record table; and the path serial number path_id, which saves the scoring path.
  • In summary, the main record database saves the link in the resource table (resource), the rating level in the resource score record table (res_graded_record), and the query path in the scoring path table (graded_path).
  • the storage device may be a program written in the Java language, and the link, the rating basis, and the scoring level are written to the corresponding table of the main record database by using a database stored procedure or directly calling the database SQL language.
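  • The table layout and the write step of the storage device can be sketched as follows. This is only an illustrative Python sketch: the patent names Java and MySQL, whereas here the standard-library sqlite3 module stands in for the database, the column types are assumptions, only some of the resource columns are included, and the helper name store_record is hypothetical.

```python
import sqlite3

# Assumed column types; table and column names follow the description above.
SCHEMA = """
CREATE TABLE resource (
    id INTEGER PRIMARY KEY,
    res_name TEXT,
    linked_address TEXT NOT NULL,
    user_id INTEGER
);
CREATE TABLE path (
    id INTEGER PRIMARY KEY,
    path_name TEXT UNIQUE NOT NULL,
    clicked_count INTEGER DEFAULT 0,
    is_shield_path TEXT DEFAULT 'N'
);
CREATE TABLE res_graded_record (
    id INTEGER PRIMARY KEY,
    res_id INTEGER NOT NULL REFERENCES resource(id),
    score INTEGER NOT NULL
);
CREATE TABLE graded_path (
    res_graded_record_id INTEGER NOT NULL REFERENCES res_graded_record(id),
    path_id INTEGER NOT NULL REFERENCES path(id)
);
"""


def store_record(conn, link, paths, score):
    """Write one published record: link, normalized paths, rating level."""
    cur = conn.cursor()
    row = cur.execute(
        "SELECT id FROM resource WHERE linked_address = ?", (link,)
    ).fetchone()
    if row is None:
        cur.execute("INSERT INTO resource (linked_address) VALUES (?)", (link,))
        res_id = cur.lastrowid
    else:
        res_id = row[0]
    cur.execute(
        "INSERT INTO res_graded_record (res_id, score) VALUES (?, ?)",
        (res_id, score),
    )
    record_id = cur.lastrowid
    for name in paths:
        # Create the path on first use, then link it to the score record.
        cur.execute("INSERT OR IGNORE INTO path (path_name) VALUES (?)", (name,))
        path_id = cur.execute(
            "SELECT id FROM path WHERE path_name = ?", (name,)
        ).fetchone()[0]
        cur.execute(
            "INSERT INTO graded_path (res_graded_record_id, path_id) VALUES (?, ?)",
            (record_id, path_id),
        )
    conn.commit()
    return record_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_record(conn, "www.bnb88.com", ["free", "movie", "download"], 5)
```

  • Keeping each rating as its own row in res_graded_record, rather than updating a running total, matches the description that scores are added up only at query time.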
  • the search device at one end of the browser is used by the public to send a query word to the word segmentation device of the server through the browser, such as: "free movie download"; the word segmentation device normalizes the query word transmitted by the search device, and then the standardized word segmentation The result is passed to the computing device on the server side, such as: "Free / Movie / Download”.
  • The computing device takes the word segmentation result sent by the word segmentation device, extracts all records containing the query words from the master record database, and then summarizes the scores of the obtained records: for records whose link and rating basis fields are both identical, the score values are cumulatively added.
  • Finally, the summary results are sorted by score value and the sorted results are transmitted to the feedback device. For example, if the final word segmentation result transmitted by the word segmentation device is "free/movie/download", the computing device first searches the main record database for all records that include the three paths "free", "movie", and "download". The links, valid paths and ratings of these records are presented together as the results: "ww. bnb88.
  • In the sorting process, the first sort key is the total score and the second is the alphabetical order of the link. The sorted result is: "www.bnb88.com; free/movie/download; 11", "www.bnb88.com; free/movie/download/animation; 5". Finally, the computing device transmits the result of the calculation to the feedback device. A program written in the Java language passes the results to the feedback device through inter-program calls, and the records can be retrieved with an SQL statement such as: SELECT a.res_name, a.linked_address, MAX(b.score) FROM resource_temp a, res_graded_record b WHERE a.id = b.res_id GROUP BY a.res_name
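  • The summarizing and sorting rule can be illustrated with a short in-memory sketch in Python. The function name summarize and the three individual scores 5, 4 and 2 are hypothetical; they are chosen only so that the totals match the 11 and 5 of the example.

```python
from collections import defaultdict


def summarize(records):
    """Sum scores of records with identical (link, path), then sort.

    Sort keys: total score descending, then link in alphabetical order.
    """
    totals = defaultdict(int)
    for link, path, score in records:
        totals[(link, path)] += score
    rows = [(link, path, total) for (link, path), total in totals.items()]
    return sorted(rows, key=lambda r: (-r[2], r[0]))


# Hypothetical individual ratings for the two paths of the example.
records = [
    ("www.bnb88.com", "free/movie/download", 5),
    ("www.bnb88.com", "free/movie/download", 4),
    ("www.bnb88.com", "free/movie/download", 2),
    ("www.bnb88.com", "free/movie/download/animation", 5),
]
results = summarize(records)
```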
  • The feedback device receives the record set transmitted by the computing device and renders it as a page sent to the querier's browser interface. For example, the feedback device view.jsp, written in the JSP language, displays the results transmitted by the computing device, such as "www.bnb88.com; free/movie/download; 11" and "www.bnb88.com; free/movie/download/animation; 5", to the public on the page. If the number of records in the result set is greater than the page display threshold of the feedback device, the query results are displayed in pages, and the querier can click the "1 2 3 4 5 6 7 8 9 10 Next Page Last Page" links to view more of the query results.
  • The word segmentation device is used to normalize the rating basis transmitted from the publishing device and the query word transmitted from the search device.
  • Words in the rating basis or the query word that are not in the vocabulary database are masked, and the extracted valid paths are sent to the storage device or the computing device respectively.
  • The vocabulary database is structured as a path table (path), including: the path serial number id, which stores the serial number of the path, such as "1925"; the path name path_name, which saves the content of the path, such as "free"; the click count clicked_count, which saves the number of queries on the path, such as "25223"; and the mask path flag is_shield_path, which saves whether the path is a masked path, such as "Y".
  • In the method, the public and/or the staff of the search engine website can use the publishing device to publish information in four ways: publishing through the search result page when browsing a link, publishing directly in the scoring operation interface, publishing through code embedded in other sites, and publishing by clicking a plug-in button when browsing a link.
  • Publishing through the search result page: the public opens a link of interest on the search result page of a search engine created by the method. When the link is clicked, the form of the search result page transmits the link address to the publishing device located on the server side, and the user enters the scoring operation interface.
  • The content of the target link page is embedded in the center of this page. The publishing code is displayed in the remaining area (top, bottom, left, right, floating, or pop-up), and the public fills in the rating basis and rating level there.
  • The scoring operation interface uses an input box to receive the public's rating basis, which is one or more keywords related to the information content, and uses a drop-down box or radio buttons to receive the public's rating level: excellent, good, medium, poor, or bad, corresponding to the five levels 5, 4, 3, 2, and 1.
  • The form of the scoring operation interface transmits the link and the rating level to the server-side storage device, and transmits the rating basis to the server-side word segmentation device.
  • The transfer is implemented by the form statement of the publishing device's web page on the client, which passes parameters to the storage device and the word segmentation device written in the JSP language on the server side.
  • For example, the public browses the search result page of the search engine in the browser and opens a link of interest.
  • The scoring operation interface transmits the link and rating level, such as "www.bnb88.com" and "excellent", to the server-side storage device through the form, and simultaneously transmits the rating basis, such as "download movie free news mine", to the server-side word segmentation device.
  • the public must specify the rating basis and rating level when publishing the information.
  • Before submission the publishing device performs a check: the form determines whether the input box that receives the rating basis is empty. If it is not empty, the form is submitted; if it is empty, the public is prompted to enter a rating basis. This step can be implemented in a page scripting language such as JavaScript.
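  • The check performed before submission can be sketched as below. The patent places this check in a page script such as JavaScript; the Python function here only illustrates the logic. The function name validate_submission, the error messages, and the assumption that rating levels are the integers 1 to 5 are all hypothetical.

```python
VALID_LEVELS = {1, 2, 3, 4, 5}  # assumed numeric rating levels


def validate_submission(link, rating_basis, rating_level):
    """Return a list of problems; an empty list means the form may be submitted."""
    errors = []
    if not link.strip():
        errors.append("link is required")
    if not rating_basis.strip():
        # Corresponds to prompting the public to enter a rating basis.
        errors.append("rating basis is required")
    if rating_level not in VALID_LEVELS:
        errors.append("rating level must be between 1 and 5")
    return errors


ok = validate_submission("www.bnb88.com", "free movie download", 5)
missing = validate_submission("www.bnb88.com", "   ", 5)
```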
  • The search device transmits the query word entered by the public to the word segmentation device via the Internet. The word segmentation device receives the incoming query word, normalizes it, and transmits the normalized query word, such as "free/movie/download", to the computing device.
  • The computing device extracts from the resource score record table res_graded_record of the master record database all records that contain the normalized query word, cumulatively adds the scores of records whose link and rating basis fields are both identical, sorts the summary results by score value, and transmits the sorted results to the feedback device, which finally displays them as a page in the browser.
  • the information thus found is of interest to the inquirer and is considered by most netizens to be the best, improving the effectiveness of the search.
  • For example, the public accesses the search engine URL in the browser, enters the search device page, types a set of query keywords such as "free movie download" into the input box, and clicks the "search" button. The search device located on the browser side then passes the query keywords to the word segmentation device on the server side.
  • the execution language used is exemplified as "ww.
  • Publishing directly in the scoring operation interface: the public logs in to the search engine, clicks the link to enter the direct publishing interface, uses the keyboard or mouse to enter the target link, rating basis and rating level, and clicks the "submit" or "Go" button.
  • The form of the direct publishing interface transmits the link and the rating level to the server-side storage device, and transmits the rating basis to the server-side word segmentation device.
  • The transfer is implemented by the form statement of the publishing device's web page on the client, which passes parameters to the storage device and the word segmentation device written in the JSP language on the server side.
  • For example, the public logs in to the search website and clicks the "Publish" link, which points to the publishing device written in the JSP language on the server side; the corresponding link address is "www. tell7.
  • The form of the jsp file sends the link to the pj.jsp file of the server-side publishing device, written in the JSP language, and the pj.jsp page opens. The public enters the rating basis in the input box of the pj.jsp page, such as "download movie free cartoon classic foreign", uses the radio buttons to select the rating level, such as "5", and clicks the "Go" button. The pj.jsp file then transmits the link "www.bnb88.com" and the rating level "5" to the storage device, and transmits the rating basis "download movie free cartoon classic foreign" to the word segmentation device.
  • If the rating basis has not been entered, the pj.jsp page pops up a dialog box asking the public to fill it in.
  • Publishing through code embedded in other sites: the operators of other websites embed in the pages of their own sites the publishing code of the search engine created by the method of the present invention.
  • When the public accesses a page in which the publishing code is embedded, the public can fill in the rating basis in the input box on the page, use the drop-down box or radio buttons to select the rating level, and click the submit button. The form on the page with the embedded publishing code then transmits the link and rating level to the server-side storage device, and transmits the rating basis to the server-side word segmentation device.
  • For example, the public accesses in the browser a page in which the publishing code is embedded.
  • The access link of the page is "www.bnb88.com/movie/index.htm".
  • The embedded code calls a scripting language program on the server side, such as issue.js. When the public opens a web page with the embedded code in the browser, the embedded publishing code executes the script program issue.js, which obtains from the server-side publishing device the rating basis input box, the rating level radio buttons, and the link delivery code. The public fills in the rating basis and rating level there. After the submit button is clicked, the script program issue.js transmits the link and rating level to the storage device located on the server side.
  • Publishing by clicking the plug-in button when browsing a link: the public installs, in the browser of the local computer, the search engine plug-in provided according to the method of the present invention, and visits links in the browser in which the plug-in is installed.
  • When the plug-in button is clicked, the browser runs the plug-in and pops up the publishing window. The public fills in the rating basis in the input box of this window, uses the drop-down box or radio buttons to select the rating level, and clicks the submit button.
  • The window form then transmits the link and the rating level to the server-side storage device, and simultaneously transmits the rating basis to the server-side word segmentation device. For example, the public downloads the publishing plug-in from the search engine website and installs it on the local computer.
  • The window form sends the link and rating level to the server-side storage device.
  • The rating bases and query words entered by the public vary widely. To normalize them effectively, the input must be processed so that the valid paths in the public's rating bases and query words are extracted.
  • The specific steps of word segmentation are: (1) Creating the original vocabulary database. The original vocabulary database serves as the base database and is a temporary database.
  • Its structure is the same as that of the vocabulary database.
  • The thesaurus can be built by the search engine management company from current paper dictionaries, or from electronic vocabularies obtained from the Internet. Entries from a paper dictionary can be entered directly into the database using the database query language SQL, and an electronic vocabulary can be used directly.
  • The batch data import tool of the corresponding database is then used to import the data in the original vocabulary database into the vocabulary database.
  • For example, a bulk data import tool for the MySQL 5.0 database downloaded from the Internet, the text data import tool text2db 1.01, can be used.
  • Examples of vocabulary in the vocabulary database are "...free of charge; free of charge; free of charge; no crown; no open mouth; exemption; free of charge; ".
  • (2) Setting the mask word attribute. The mask attribute of certain words in the original vocabulary database is set to masked, that is, they are set as mask words.
  • Specifically, the mask path field is_shield_path of the path table (path) is set to masked, that is, its value is set to "Y".
  • Internal staff of the company operating the search engine can set the mask path field is_shield_path to "Y" manually (the default is "N") using a database management tool such as the MySQL Query Browser for MySQL 5.0, or set the field to "Y" using the database language SQL. Mask words include most pronouns, auxiliary words, adverbs, punctuation, and policy-restricted words, such as "I, us, this, the, the good, the good, the can, the sex" and so on.
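  • Setting the mask attribute through SQL could look like the sketch below. It is illustrative only: Python's sqlite3 module stands in for MySQL, the helper name set_mask is hypothetical, and the four sample words are taken from the examples in this description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE path ("
    " id INTEGER PRIMARY KEY,"
    " path_name TEXT UNIQUE,"
    " clicked_count INTEGER DEFAULT 0,"
    " is_shield_path TEXT DEFAULT 'N')"
)
conn.executemany(
    "INSERT INTO path (path_name) VALUES (?)",
    [("free",), ("movie",), ("I",), ("this",)],
)


def set_mask(conn, words, masked=True):
    """Set is_shield_path to 'Y' (masked) or 'N' for the given path names."""
    flag = "Y" if masked else "N"
    conn.executemany(
        "UPDATE path SET is_shield_path = ? WHERE path_name = ?",
        [(flag, word) for word in words],
    )
    conn.commit()


set_mask(conn, ["I", "this"])
masked_words = [
    row[0]
    for row in conn.execute(
        "SELECT path_name FROM path WHERE is_shield_path = 'Y' ORDER BY path_name"
    )
]
```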
  • (3) Masking words and segmenting. Starting from the front of the public's rating basis or query word, the word segmentation device takes characters one by one to form a candidate word and checks whether the candidate is in the vocabulary database. If it is not, the candidate is discarded and the device starts again from the next character. If it is, the device checks the mask attribute of the word in the vocabulary database: a masked word is blocked, and an unmasked word is taken out as a valid word. These steps are repeated until all valid words in the rating basis or query word have been taken out. For example, suppose the rating basis or query word is "I think this is the best free movie download site".
  • The word segmentation device reads the first character and gets "I", then checks whether it is in the vocabulary database. If not, the character is not a word in the vocabulary database and is masked. If so, the device takes the next character as well and gets "I recognize", then checks whether the vocabulary database contains this word. If the vocabulary database contains the previous candidate "I" but not the extended candidate "I recognize", the first word is segmented as "I", and the device checks whether it is a mask word, that is, whether the mask attribute is_shield_path of the word in the vocabulary database is "Y".
  • If the attribute is "Y", the word is a mask word and the word segmentation device blocks it. If the mask attribute is_shield_path is "N", the word is a valid word. The device then continues with the next character, and so on, repeating the above operation until the third
  • The above operation method can be implemented by a computer program in the Java programming language.
  • When the word segmentation device encounters a space while taking a new word, it treats the current word-taking operation as finished, checks whether the extracted word is a mask word, and then continues segmenting from the character after the space. After segmentation, the word segmentation device transmits the final result to the storage device or the computing device.
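  • A dictionary-based segmenter with mask words can be sketched as follows. Note that this sketch uses forward maximum matching (longest candidate first), a common dictionary technique that may differ in detail from the character-by-character procedure described above. It is written in Python rather than Java, and the function name segment, the max_len parameter, and the small sample vocabulary are assumptions for illustration.

```python
def segment(text, vocabulary, max_len=None):
    """Split text into valid paths using a vocabulary with mask flags.

    vocabulary maps each known word to True if it is a mask word.
    Unknown characters and mask words are dropped; a space always
    ends the current word.
    """
    if max_len is None:
        max_len = max(map(len, vocabulary), default=1)
    valid = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            # Try the longest candidate starting at position i first.
            for j in range(min(len(chunk), i + max_len), i, -1):
                candidate = chunk[i:j]
                if candidate in vocabulary:
                    if not vocabulary[candidate]:
                        valid.append(candidate)
                    i = j
                    break
            else:
                i += 1  # no vocabulary word starts here; skip one character
    return valid


# Hypothetical vocabulary: True marks a mask word (is_shield_path = "Y").
vocabulary = {
    "i": True, "this": True, "is": True, "the": True,
    "best": False, "free": False, "movie": False,
    "download": False, "site": False,
}
paths = segment("i think this is the best free movie download site", vocabulary)
query = "/".join(segment("free movie download", vocabulary))
```

  • Because unknown words such as "think" are not in the sample vocabulary, they are dropped in the same way as mask words, which mirrors the rule that only vocabulary words that are not masked become valid paths.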
  • the method for providing, storing, and querying information for the public by using the Internet includes the following specific steps:
  • the public enters the link address in the browser such as IE and Netscape, and browses the corresponding content.
  • the link points to the website or webpage or multimedia file.
  • the purpose is to let the public know the specific content of the target link.
  • If the public already knows the content of the information corresponding to the link, this step can be skipped and the information published directly.
  • the process we call “publishing” refers to the process in which the public submits the link, the scoring basis, and the rating level to the publishing device or the word segmentation device.
  • the scoring basis is composed of one or more keywords, and the middle can be separated by a separator.
  • The separator can be a space, a slash, and so on. Each keyword is called a "path" in the method of the present invention, and is used as a storage and query path for the corresponding link.
  • The rating level is the score that the public, knowing the content of the link, gives it on that path.
  • the word segmentation device standardizes the public rating basis, and sends the result to the storage device.
  • the storage device stores the score basis as a path in the master record database, and stores the corresponding link and rating level.
  • The master record database records the public's rating elements in detail, including user ID, level, IP address, release time, link, rating basis, and rating level.
  • If based on the browser/server (B/S) structure, the public enters the query path on the Web side; if based on the client/server (C/S) structure, the public enters the query path on the client side.
  • The search device is configured to receive the public's query path, which may be a set of keywords separated by separators, or a passage written in natural language.
  • The word segmentation device normalizes the public's query word and sends the result, such as "free/movie/download", to the computing device.
  • the computing device extracts all records containing the query word from the master record database according to the query path.
  • For example, the database contains the following records: "Free movie download; http://www.5see.com/; excellent", "Free movie download online watch; http://www.dguo.com/; excellent", "free movie download; http: //www. zzip. com.
  • The computing device adds up the scores of all records that satisfy the search expression derived from the query path. A link may have one or more such records, and their sum is the final score of the information on that path. For example, if the link "http://www.dguo.com/" has two records of "free movie download online viewing; excellent", its score on the "free movie download online viewing" path is 10 points; if it has only one "free movie download" record, its score on the "free movie download" path is 5 points. After summarizing, one or more new records are obtained, which give the total score of each link on the corresponding path. For example, "free movie.
  • the computing device sorts the summarized record set by total score from high to low and sends the sorted record set to the feedback device.
  • the feedback device displays the record set received from the computing device as text; for example, there are 1000 search results and each result page is set to show 10 of them.
  • the feedback results are displayed in the following order: "Free movie download online viewing; http://.dguo.com/; 3209", "Free movie download; http://.5see.com/; 2989", "Free Movie Download Online; http://www.cnvv.cn/; 1228", "Free Movie Download; http://Li.zzip.com.cn/; 892".
  • each page preferably shows 5 to 30 results.
  • for the method of using the Internet to provide, store and query information for the public, the computer hardware and software used by the public need only be able to access the Internet: the CPU is a P2 or better, the memory is 64M or more, and the free disk space is 50M or more.
  • the operating system is Microsoft Win 98 or later, Unix 7.0 or later, or Linux 2.2 or later; the browser is IE 5.0 or later, Netscape 4.0 or later, or Firefox 1.00 or later; the network card is 10M or faster; and the bandwidth is 56K or more.
  • the network communication protocol is HTTP 1.0.
  • the application development environment is Eclipse 3.1, JDK 1.5, and JDBC 3.
  • a search engine based on either the C/S or the B/S architecture can be constructed; this embodiment is based on the B/S architecture.
  • the technical environment used is: Tomcat, JSP, Javabean, MySQL, and the communication protocol used is HTTP.
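The example records above have the form "rating basis; link; rating level", and the worked example counts each "excellent" rating as 5 points. The following is a minimal JavaScript sketch of that reading, not the patent's implementation: the function name `parseRecord`, the `example.invalid` links, and the pairing of the other four labels with 4 to 1 points are assumptions made for illustration.

```javascript
// Label-to-points table. Only "excellent" = 5 is stated in the text; the
// pairing of the other four labels with 4..1 is an assumption.
const RATING_POINTS = { excellent: 5, good: 4, medium: 3, poor: 2, bad: 1 };

// Parse a record written as "rating basis; link; rating level".
function parseRecord(text) {
  const parts = text.split(";").map((part) => part.trim());
  if (parts.length !== 3) {
    throw new Error("expected three fields: " + text);
  }
  const [path, link, level] = parts;
  const points = RATING_POINTS[level.toLowerCase()];
  if (points === undefined) {
    throw new Error("unknown rating level: " + level);
  }
  return { path: path.toLowerCase(), link, points };
}

// Two "excellent" ratings of the same (hypothetical) link on the same path.
const sample = [
  "Free movie download online viewing; http://example.invalid/a; excellent",
  "Free movie download online viewing; http://example.invalid/a; excellent",
].map(parseRecord);

// Sum of the points of both records.
const total = sample.reduce((sum, record) => sum + record.points, 0);
console.log(total); // 10
```

Two "excellent" records for one link on one path give 10 points, which matches the arithmetic in the example above.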

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method for providing information to the public and searching it over the Internet, which addresses the problem of improving the accuracy and relevance of search results with respect to the search keywords. The method uses a browser and server structure, in which the browser includes a publishing device and a search device, and the server includes a storage device, a master record database, a computing device and a feedback device. Compared with the prior art, a search engine built by this method returns more accurate results, with higher relevance between the search keywords and the search results. Because the search is performed on keywords filled in by the public, the search engine combines user experience with information storage, classification and search, so the public does not need to know the precise scientific classification of the information, and search performance is greatly improved.

Description

Method of using the Internet to provide and query information for the public
Technical field
The present invention relates to a method of using the Internet to provide, store and query information for the public, and more particularly to a method of querying information with a search engine on the Internet.
Background art
Search engines are currently the most commonly used web application on the Internet. Commonly used search engines include Google, Baidu, Yahoo, Yisou, Zhongsou and Alltheweb, and people use them to obtain all kinds of information on the Internet. According to search engine research reports, most commonly used search engines match on whether the article behind a link contains the query keywords used by the public. The biggest drawback of such search engines is that the relevance between the content of the search results and the query words is not high enough, so the query results are not accurate enough. Such search engines do not truly search the linked content, and their result pages are full of information unrelated to what the searcher is looking for. For example, an encyclopedia page contains almost every query keyword, so most current search engines return that page whichever keyword the searcher uses, although in most cases people want that page only when they search for a term such as "encyclopedia". In addition, many web publishers exploit this drawback by deliberately placing large numbers of commonly used keywords in their pages to lure the public into visiting them for various undesirable purposes, such as inflating click counts or infecting the searcher's computer, which makes such search engines even harder to use.
Summary of the invention
The object of the present invention is to provide a method of using the Internet to provide and query information for the public. The technical problem to be solved is to improve the accuracy and relevance of query results with respect to the public's query words, so that the public can query information more efficiently.
The present invention adopts the following technical solution: a method of using the Internet to provide and query information for the public, having a browser and server structure. The browser includes a publishing device for sending information content to the storage device of the server, and a search device for sending query content to the computing device of the server. The server includes a storage device for recording the information content sent by the publishing device in the master record database; a master record database for storing data; a computing device for extracting records from the master record database according to the query content sent by the search device, summarizing them, sorting the summarized results and sending them on; and a feedback device for receiving the record set sent by the computing device and sending it to the browser interface.
The server side of the present invention further includes a word segmentation device, which normalizes the information content and the query content sent by the publishing device and the search device respectively, and then sends the normalized segmentation results to the storage device and the computing device on the server side.
The word segmentation device of the present invention is connected to a vocabulary database. The vocabulary database is structured as a path table comprising path serial number, path name, click count, and shielded-path flag.
The information content of the present invention includes a link, a rating level and a rating basis; the query content is the query words.
The storage device of the present invention records the link and rating level sent by the publishing device, together with the rating basis sent by the word segmentation device, as one record in the master record database.
The computing device of the present invention takes the segmentation result sent by the word segmentation device, extracts from the master record database all records containing the query words, adds up the rating values of records whose link and rating-basis fields are both identical, and sorts the summarized results by rating value.
The feedback device of the present invention receives the record set sent by the computing device and sends it, page by page, to the searcher's browser interface.
The master record database of the present invention stores the links, query paths and rating results of the information in tables, comprising a resource table, a resource rating record table and a rating path table.
The resource table of the present invention comprises: resource name, resource serial number, link address, resource category number, description, release time, and user serial number. The resource rating record table comprises: rating serial number, resource serial number, and rating level. The rating path table comprises: resource rating record serial number and path serial number.
A method of using the Internet to provide and query information for the public, having a client and server structure, characterized in that: the client includes a publishing device for sending information content to the storage device of the server, and a search device for sending query content to the computing device of the server; the server includes a storage device for recording the information content sent by the publishing device in the master record database; a master record database for storing data; a computing device for extracting records from the master record database according to the query content sent by the search device, summarizing them, sorting the summarized results and sending them on; and a feedback device for receiving the record set sent by the computing device and sending it to the client interface.
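To picture how the resource table, the resource rating record table and the rating path table relate to one another, the following is a small in-memory JavaScript sketch. It is an illustration under stated assumptions, not the patent's storage device: the function `storeRating` and all field names are hypothetical, and paths are stored as plain strings here where the patent stores path serial numbers.

```javascript
// In-memory stand-ins for the three tables. All field names are illustrative.
const resources = [];      // resource table: one row per link
const ratingRecords = [];  // resource rating record table: one row per rating
const ratingPaths = [];    // rating path table: one row per (rating, path)

// Store one published rating and return the id of its rating record.
function storeRating(link, paths, level) {
  // Reuse the resource row if this link has been published before.
  let resource = resources.find((row) => row.link === link);
  if (!resource) {
    resource = { id: resources.length + 1, link };
    resources.push(resource);
  }
  const rating = { id: ratingRecords.length + 1, resourceId: resource.id, level };
  ratingRecords.push(rating);
  // One path row per keyword of the normalized rating basis.
  for (const path of paths) {
    ratingPaths.push({ ratingRecordId: rating.id, path });
  }
  return rating.id;
}

storeRating("www.bnb88.com", ["free", "movie", "download"], 5);
storeRating("www.bnb88.com", ["free", "movie", "download", "animation"], 5);
console.log(resources.length, ratingRecords.length, ratingPaths.length); // 1 2 7
```

Two ratings of the same link share one resource row, while each rating gets its own record and its own set of path rows.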
Compared with the prior art, the present invention uses a search engine network composed of a publishing device, a search device, a storage device, a master record database, a computing device and a feedback device to search the information content itself, so search results are more accurate and the relevance between query words and search results is higher. Members of the public organize information after experiencing it, which organically combines user experience with the storage, classification and retrieval of information. They need not know the rigorous scientific classification of the information and can classify it according to their own understanding, while other members of the public search with the same kind of understanding. Information classification and storage are thus combined with the framework of human understanding, and retrieval is performed on keywords filled in by the public, which greatly improves the retrieval performance of the network system.
Brief description of the drawings
Figure 1 is a network topology diagram of an embodiment of the present invention.
Figure 2 is a diagram of the internal structure of the search engine of an embodiment of the present invention.
Figure 3 is a flow chart of an embodiment of the present invention.
Figure 4 shows the rating interface of an embodiment of the present invention.
Figure 5 shows the query result interface of an embodiment of the present invention.
Figure 6 shows the direct publishing interface of an embodiment of the present invention.
Figure 7 shows the embedded-code publishing interface of an embodiment of the present invention.
Figure 8 shows the plug-in publishing interface of an embodiment of the present invention.
Figure 9 shows the structure of the vocabulary database of an embodiment of the present invention.
Figure 10 shows the structure of the master record database of an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. The method of using the Internet to provide, store and query information for the public uses the steps of rating, summarizing and retrieving to give the public an efficient way to classify and retrieve information. Ideally, a search engine would have knowledge workers who are familiar with the target information and with the architecture of society's knowledge system examine the content behind every new link on the Internet and place it in the appropriate category; the public would likewise need to understand that knowledge architecture and expand the categories level by level to reach the target category and obtain the target information. This approach consumes a great deal of manpower, however, and so far no government, organization or institution has been willing to spend such resources on a public service of this kind. It also requires the public to expand static categories one level at a time, which is inefficient. For this reason, many search engine developers want to use artificial intelligence to do this work, but the natural-language semantic analysis of existing artificial intelligence is not yet good enough to achieve this goal.
As shown in Figure 1, the method of the present invention uses a search engine network with a browser and server (B/S) structure. Information that the public feeds back through the browser interface is passed over the Internet to the server, which classifies the information and stores it in a database for other members of the public to search.
As shown in Figure 2, the method of using the Internet to provide, store and query information for the public is implemented by a publishing device, a search device, a word segmentation device, a vocabulary database, a storage device, a master record database, a computing device and a feedback device.
The publishing device on the browser side lets the public send, through the browser, a link and a rating level to the storage device on the server side and a rating basis to the word segmentation device on the server side. In the present invention this process is called publishing. For example, in "www.bnb88.com; free movie download; 5", "www.bnb88.com" is the published link, "free movie download" is the rating basis (or query path), and "5" is the rating result. The storage device stores the link and rating level sent by the publishing device, together with the normalized rating basis sent by the word segmentation device, as one record in the master record database of the server, in the form "www.bnb88.com; free/movie/download; 5".
As shown in Figure 10, the master record database stores the links, query paths and rating results of the information in tables.
The resource table resource comprises: the resource name res_name, which holds the name of the resource, such as "BNB88 free movies"; the resource serial number id, which holds the serial number of the link, such as "1806"; the link address linked_address, which holds the link value, such as "www.bnb88.com"; the resource category number res_category_id, which holds the serial number of the resource type, whose value is one of "website, page, file"; the description description, which holds the description of the resource; the release time upload_time, which holds the time the resource was published; and the user serial number user_id, which holds the serial number of the publishing user.
The resource rating record table res_graded_record comprises: the rating serial number id, which holds the serial number of the rating record; the resource serial number res_id, which refers to the primary serial number of the resource table resource; and the rating level score, which holds the rating level.
The rating path table graded_path comprises: the resource rating record serial number res_graded_record_id, which refers to the primary serial number of the resource rating record table; and the path serial number path_id, which holds the rating path.
The master record database thus keeps links in the resource table resource, rating levels in the resource rating record table res_graded_record, and query paths in the rating path table graded_path. When several users rate the same link by the steps above, records such as the following may result: "www.bnb88.com; free/movie/download/animation; 5", "www.bnb88.com; free/movie/download; 3", "www.bnb88.com; free/movie/download; 3", "www.bnb88.com; free/movie/online/watch; 5". In this example the storage device may be a program written in Java that writes the link, rating basis and rating level into the corresponding tables of the master record database, either through database stored procedures or by issuing SQL statements directly.
The search device on the browser side lets the public send query words, such as "free movie download", through the browser to the word segmentation device of the server. The word segmentation device normalizes the query words sent by the search device and sends the normalized segmentation result, such as "free/movie/download", to the computing device on the server side.
The computing device takes the segmentation result sent by the word segmentation device and extracts from the master record database all records containing the query words. It then summarizes the ratings of these records by adding up the rating values of records whose link and rating-basis fields are both identical, sorts the summarized results by rating value, and sends the sorted results to the feedback device. For example, if the final segmentation result sent by the word segmentation device is "free/movie/download", the computing device first searches the master record database for all records containing the three paths "free", "movie" and "download", and retrieves the link, valid paths and rating of each matching record. The results are: "www.bnb88.com; free/movie/download; 5", "www.bnb88.com; free/movie/download/animation; 5", "www.bnb88.com; free/movie/download; 3", "www.bnb88.com; free/movie/download; 3". This can be done with a program such as the following:
  -- The procedure header is not shown in the source; the name and the
  -- parameter below are assumed.
  CREATE PROCEDURE find_resources(IN paths VARCHAR(255))
  BEGIN
    DECLARE a, b CHAR(20);
    DECLARE c INT DEFAULT 0;
    -- One row per rating record, with its path ids concatenated.
    DECLARE cur1 CURSOR FOR
      SELECT gp.res_graded_record_id, GROUP_CONCAT(gp.path_id) AS path_ids
      FROM graded_path gp
      GROUP BY gp.res_graded_record_id;
    -- This declaration is garbled in the source; a NOT FOUND handler is
    -- assumed here so that the loop ends after the last row.
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET c = 1;
    OPEN cur1;
    REPEAT
      FETCH cur1 INTO a, b;
      -- Skip the iteration in which the fetch found no more rows.
      IF c = 0 AND b LIKE paths THEN
        INSERT INTO resource_temp
        SELECT *
        FROM resource t
        WHERE t.id =
          (SELECT r.res_id FROM res_graded_record r WHERE r.id = a);
      END IF;
    UNTIL c = 1 END REPEAT;
    CLOSE cur1;
  END
  SELECT a.res_name, a.linked_address, b.score
  FROM resource_temp a, res_graded_record b
  WHERE a.id = b.res_id;
The summary step adds up the rating values of all records whose link is "www.bnb88.com" and whose rating basis is "free/movie/download", that is 5+3+3=11. The final result is: "www.bnb88.com; free/movie/download; 11", "www.bnb88.com; free/movie/download/animation; 5". In the sorting step, the first sort key is the total score and the second is the alphabetical order of the link. The sorted result is: "www.bnb88.com; free/movie/download; 11", "www.bnb88.com; free/movie/download/animation; 5". Finally, the computing device sends the result to the feedback device. A program written in Java passes the result to the feedback device through a call between programs, which can be done, for example, with the following statement:
  SELECT a.res_name, a.linked_address, MAX(b.score)
  FROM resource_temp a, res_graded_record b
  WHERE a.id = b.res_id
  GROUP BY a.res_name;
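The summary and sorting steps described above can also be sketched outside the database. The following JavaScript is a minimal illustration that reuses the four example records; the function name `summarize` and the object field names are assumptions, and the code is not the patent's computing device.

```javascript
// Each matching record: link, normalized rating basis (path), rating value.
const matches = [
  { link: "www.bnb88.com", path: "free/movie/download", score: 5 },
  { link: "www.bnb88.com", path: "free/movie/download/animation", score: 5 },
  { link: "www.bnb88.com", path: "free/movie/download", score: 3 },
  { link: "www.bnb88.com", path: "free/movie/download", score: 3 },
];

// Add up the scores of records whose link and path are both identical, then
// sort by total (high to low) and, for equal totals, by link (alphabetical).
function summarize(records) {
  const totals = new Map();
  for (const { link, path, score } of records) {
    const key = JSON.stringify([link, path]);
    const entry = totals.get(key) || { link, path, total: 0 };
    entry.total += score;
    totals.set(key, entry);
  }
  return [...totals.values()].sort(
    (a, b) => b.total - a.total || a.link.localeCompare(b.link)
  );
}

const ranked = summarize(matches);
console.log(ranked.map((r) => `${r.link}; ${r.path}; ${r.total}`));
```

With these records the "free/movie/download" entry totals 5+3+3=11 and is listed before the "free/movie/download/animation" entry with 5.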
The feedback device receives the record set sent by the computing device and sends it page by page to the searcher's browser interface. For example, a feedback device view.jsp written in JSP displays the results sent by the computing device, such as "www.bnb88.com; free/movie/download; 11" and "www.bnb88.com; free/movie/download/animation; 5", to the public on a page. If the number of records in the result set exceeds the page display threshold of the feedback device, the query results are shown on several pages. The searcher can click the "1 2 3 4 5 6 7 8 9 10 Next page Last page Page _" links to see more of the query results.
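The paging behavior of the feedback device can be sketched as follows. This is a minimal JavaScript illustration under stated assumptions: the function `paginate` and its return fields are hypothetical, and the default page size of 10 is taken from the document's example of 1000 results shown 10 per page.

```javascript
// Split a ranked result list into pages. pageSize plays the role of the
// page display threshold; 10 is the example value used in this document.
function paginate(results, pageNumber, pageSize = 10) {
  const pageCount = Math.max(1, Math.ceil(results.length / pageSize));
  // Clamp the requested page into the valid range 1..pageCount.
  const page = Math.min(Math.max(1, pageNumber), pageCount);
  const start = (page - 1) * pageSize;
  return {
    page,
    pageCount,
    items: results.slice(start, start + pageSize),
    hasNext: page < pageCount, // drives a "Next page" link
  };
}

const results = Array.from({ length: 1000 }, (_, i) => "result " + (i + 1));
const first = paginate(results, 1);
console.log(first.pageCount, first.items.length, first.hasNext); // 100 10 true
```

A result set no larger than the page size yields a single page, so no page links are needed in that case.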
The word segmentation device normalizes the rating basis sent by the publishing device and the query words sent by the search device. It compares the rating basis or query words against the normalized words stored in the vocabulary database, such as "... 免费; 免费生; 免官; 免冠; 免开尊口; 免礼; 免票; ..." (adjacent Chinese vocabulary entries, the first of which means "free"), masks out any words that are not in the vocabulary database, and sends the valid paths it extracts to the storage device or the computing device respectively.
As shown in Figure 9, the vocabulary database is structured as a path table, path, comprising: the path serial number id, which holds the serial number of the path, such as "1925"; the path name path_name, which holds the content of the path, such as "free"; the click count clicked_count, which holds the number of times the path has been queried, such as "25223"; and the shielded-path flag is_shield_path, which records whether the path is a shielded path, such as "Y".
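The document does not say which matching algorithm the word segmentation device uses. As one possible reading, the sketch below applies forward maximum matching against rows shaped like the path table, using the Chinese keywords of the running example ("免费 电影 下载", that is "free movie download"). The function `segment`, the sample rows, and the choice to drop paths flagged "Y" are assumptions made for illustration.

```javascript
// Rows shaped like the path table: path_name plus the shield flag.
// The rows themselves are illustrative, not taken from a real database.
const pathTable = [
  { path_name: "免费", is_shield_path: "N" },
  { path_name: "电影", is_shield_path: "N" },
  { path_name: "下载", is_shield_path: "N" },
  { path_name: "的", is_shield_path: "Y" },
];

// Forward maximum matching: at each position take the longest vocabulary
// entry that matches; characters that match nothing are skipped.
function segment(text, table) {
  const flags = new Map(table.map((row) => [row.path_name, row.is_shield_path]));
  const maxLength = Math.max(...table.map((row) => row.path_name.length));
  const paths = [];
  let i = 0;
  while (i < text.length) {
    let matched = "";
    for (let len = Math.min(maxLength, text.length - i); len > 0; len--) {
      const candidate = text.slice(i, i + len);
      if (flags.has(candidate)) {
        matched = candidate;
        break;
      }
    }
    if (matched === "") {
      i += 1; // not in the vocabulary: drop this character
    } else {
      if (flags.get(matched) !== "Y") paths.push(matched); // skip shielded paths
      i += matched.length;
    }
  }
  return paths.join("/");
}

console.log(segment("免费的 电影下载", pathTable)); // 免费/电影/下载
```

Words that are missing from the vocabulary never reach the storage device or the computing device, which is the masking behavior the text describes.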
In the method of using the Internet to provide, store and query information for the public, the ways in which the public and/or staff of the search engine website publish information with the publishing device include: publishing while browsing a link from the search results page, publishing directly in the rating interface, publishing through code embedded in other sites, and publishing by clicking a plug-in button while browsing a link.
As shown in Figure 4, when publishing while browsing a link from the search results page, a member of the public opens a link of interest in the results page of a search engine built by the method of the present invention. Opening the link causes the form of the results page to send the link address to the publishing device on the server side. The rating interface then opens: the center of the page embeds the content of the target link page, and the remaining areas (top, bottom, left, right, floating or pop-up) show the publishing code, where the public fills in the rating basis and rating level. The rating interface uses an input box to receive the rating basis, which is one or more keywords related to the information content, and a drop-down box or radio buttons to receive the rating level, which has five levels: excellent, good, medium, poor, bad, or 5, 4, 3, 2, 1. After the public chooses to submit, the form of the rating interface sends the link and rating level to the storage device on the server side and the rating basis to the word segmentation device on the server side. Specifically, the form statements of the publishing device's web page on the client send parameters to the storage device and the word segmentation device, written in JSP, on the server side.
For example, a member of the public browses the search results page in a browser and opens a link of interest whose address is, for example, "www.tell7.com/view?url=http://www.bnb88.com". The form of the results page sends the link address to the publishing device view.jsp, written in JSP, on the server side; the link sent is "www.bnb88.com". The rating interface then opens: the middle of the page shows the content of "www.bnb88.com" and the top of the page shows the publishing code, where the public fills in the rating basis and rating level. The rating interface receives the rating basis, such as "download movie free Xunlei", in an input box, and the rating level, such as "excellent", in a drop-down box. After the public clicks the "Go" button, the rating interface sends the link and rating level, such as "www.bnb88.com" and "excellent", through the form to the storage device on the server side, and sends the rating basis, such as "download movie free Xunlei", to the word segmentation device on the server side. Note that when publishing information the public must specify both the rating basis and the rating level. On submission the publishing device performs an empty-field check: the form checks whether the input box for the rating basis is empty, submits if it is not, and prompts the public for input if it is. This step can be implemented in a page scripting language such as JavaScript, for example with the following program:
  <SCRIPT LANGUAGE="JavaScript">
  <!--
  // Check that the rating basis field of the form named "issue" is filled in.
  function check()
  {
      if (document.issue.path.value.length != 0)
      {
          return true;   // not empty: allow the form to be submitted
      }
      else
      {
          alert("The rating basis must not be empty!");
          return false;  // empty: block the submission
      }
  }
  //-->
  </SCRIPT>
As shown in FIG. 5, when the public searches in the browser, they enter a query, for example "free movie download", and click send. The search device transmits the query over the Internet to the word segmentation device. The word segmentation device receives the query from the search device, normalizes it, and passes the result, for example "free/movie/download", to the computing device. The computing device extracts from the resource rating record table res_graded_recor of the master record database all records that contain the normalized query words, then cumulatively adds the rating values of records whose link and rating-basis fields are both identical, and finally sorts the aggregated results by rating value. The sorted result is passed to the feedback device, which displays it page by page in the browser. The information found this way is what the querier cares about and what most Internet users consider best, which improves the effectiveness of retrieval. For example, the public visits the search engine URL in the browser to reach the search device page, enters a set of query keywords in the input box, such as "免费 电影 下载" ("free movie download"), and clicks the "Search" button. The search device on the browser side passes these query keywords to the word segmentation device on the server side; an example of the request used is "www.tell7.com/find?hl=zh-CN&q=%E5%85%8D%E8%B4%B9+%E7%94%B5%E5%BD%B1+%E4%B8%8B%E8%BD%BD&lr=".
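The request quoted above carries the query keywords as UTF-8 percent-encoded text joined by "+". A minimal sketch of how a browser-side search device might build such a request is shown here. The host, path and parameter names are taken from the example URL; the function name buildSearchUrl is hypothetical, and the patent does not state how the URL is actually assembled.

```javascript
// Hypothetical helper (not from the patent): builds a search request in the
// shape of the example URL. Host, path and parameter names follow that URL.
function buildSearchUrl(query) {
  // Keywords are separated by whitespace in the input box.
  const terms = query.trim().split(/\s+/);
  // Each keyword is UTF-8 percent-encoded; keywords are joined with "+".
  const q = terms.map((t) => encodeURIComponent(t)).join("+");
  return "www.tell7.com/find?hl=zh-CN&q=" + q + "&lr=";
}

console.log(buildSearchUrl("免费 电影 下载"));
```

Encoding each keyword separately keeps the "+" separators literal, which matches the form of the example request.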
As shown in FIG. 6, for direct publishing through the rating interface, the public logs in to the search engine, clicks a link to enter the direct publishing interface, enters the target link, the rating basis and the rating level with the keyboard or mouse, and clicks the "Submit" or "Go" button. The form of the direct publishing interface sends the link and the rating level to the server-side storage device, and at the same time sends the rating basis to the server-side word segmentation device. The transfer is carried out by the form statements of the publishing device's web page on the client, which pass parameters to the storage device and the word segmentation device written in JSP on the server. For example, the public logs in to the search engine website and clicks the "Publish" link, which points to the publishing device written in JSP on the server; an example link address is "www.tell7.com/dirview.jsp". The browser opens this path, displays the dirview.jsp file, and enters the publishing page of the publishing device. The input box of the form in dirview.jsp receives the link the public wants to publish, such as "www.bnb88.com". When the public clicks the "Go" button, the form in dirview.jsp passes this link to the rating file pj.jsp, written in JSP in the server-side publishing device, and the pj.jsp page opens. On the pj.jsp page the public enters the rating basis in the input box, such as "download movie free cartoon classic foreign", selects the rating level with a radio button, such as "5", and clicks the "Go" button. The pj.jsp file sends the link "www.bnb88.com" and the rating level "5" to the storage device, and sends the rating basis "download movie free cartoon classic foreign" to the word segmentation device. When the rating basis is empty, the pj.jsp page pops up a dialog box asking the public to fill it in.

As shown in FIG. 7, publishing can also be done through code embedded in other sites. Typically, the operators of other websites, in order to raise the ranking of their site in the result pages of a search engine created by the method of the invention, place the publishing code of that search engine in their own site. When the public visits a page that embeds the publishing code, they can fill in the rating basis in the input box on that page and select the rating level with a drop-down box or radio button. After the submit button is clicked, the form of the page that embeds the publishing code sends the link and the rating level to the server-side storage device, and at the same time sends the rating basis to the server-side word segmentation device. For example, the public visits in the browser a page that embeds the publishing code, whose address is for example "www.bnb88.com/movie/index.htm". An example of the statement that embeds the publishing code in the page is "<script language="javascript" src="http://www.tell7.com/issue.js"></script>". This code calls a script program located on the server, such as issue.js. When the public opens the page that embeds the publishing code, the embedded code runs the script issue.js, which obtains from the server-side publishing device the rating-basis input box, the rating-level radio buttons and the link transfer code. The public fills in the rating basis and the rating level here. After the submit button is clicked, issue.js sends the link and the rating level to the storage device on the server, such as "www.bnb88.com; 5", and at the same time sends the rating basis to the word segmentation device, such as "download movie free cartoon classic foreign".
As shown in FIG. 8, publishing can also be done by clicking a plug-in button while browsing a link. The public installs in the browser of their local computer a search engine plug-in built according to the method of the invention. When a link is visited in a browser with the plug-in installed, the public can click the plug-in button; the browser runs the plug-in and a publishing window pops up. The public fills in the rating basis in the input box of this window and selects the rating level with a drop-down box or radio button. After the submit button is clicked, the form of the window sends the link and the rating level to the server-side storage device, and at the same time sends the rating basis to the server-side word segmentation device. For example, the public downloads the publishing plug-in from the search engine website and installs it on their local computer, visits any web page in the browser with the plug-in installed, and clicks the "Tell7" plug-in button on the standard toolbar. A publishing window pops up, in which the public fills in the rating basis in the input box, such as "download movie free cartoon classic foreign", and selects the rating level with a radio button, such as "5". After the submit button is clicked, the form of the window sends the link and the rating level, such as "www.bnb88.com; 5", to the server-side storage device, and at the same time sends the rating basis, such as "download movie free cartoon classic foreign", to the server-side word segmentation device.
Because the public is very large, the rating bases and queries they enter vary widely. To normalize them effectively, the rating basis and the query are put through word segmentation, which extracts the valid paths from them. The specific steps of word segmentation are as follows.

(1) Create the original vocabulary database. The original vocabulary database serves as the base database and is a temporary database; its structure is the same as that of the vocabulary database. It can come from several sources: an existing electronic lexicon can be obtained directly, staff of the company operating the search engine can build one from a current paper dictionary, or an electronic lexicon can be obtained from the Internet. The words are written into the database directly with the database query language SQL, or the data in the original vocabulary database is imported into the vocabulary database with the bulk import facility of the database in use, for example the bulk import tool text2db 1.01 for the MySQL 5.0 database, downloaded from the Internet. Example entries in the vocabulary database are "……免费; 免费生; 免官; 免冠; 免开尊口; 免礼; 免票; ……". Oracle, DB2, Sybase or SQL Server can also be used.

(2) Set the mask attribute. The mask attribute of some words in the original vocabulary database is set to masked, which makes them masked words. In this example, the masked-path field is_shield_path of the path table path is set to masked, that is, its value is set to "Y". Staff of the company operating the search engine set is_shield_path to "Y" manually through a database management tool such as MySQL Query Browser for MySQL 5.0; the default is "N". The field can also be set to "Y" with SQL. Masked words include most pronouns, particles, adverbs, punctuation marks and words restricted by policy, such as "我, 我们, 这, 的, 得, 好, 能, 可以, 性" ("I", "we", "this" and similar words).

(3) Mask and segment. The word segmentation device takes characters one at a time, from front to back, from the rating basis or query, and forms a candidate word. It checks whether the candidate is in the vocabulary database. If not, the candidate is discarded and the device starts again from the next character. If the candidate is in the vocabulary database, the device checks its mask attribute: a masked word is dropped, and an unmasked word is taken out as a valid word. These steps are repeated until all valid words in the rating basis or query have been taken out.

For example, suppose the rating basis or query is "我认为这是最好的免费电影下载网站" ("I think this is the best free movie download site"). The word segmentation device reads the first character, "我" ("I"), and checks whether the vocabulary database contains it. If not, the character is not a word in the vocabulary database and is discarded. If so, the device appends the second character to get "我认" and checks the vocabulary database again. If "我认" is not found, the vocabulary database contains the previously taken word "我" but not its extension with "认", so the first word, "我", is split off. The device then checks whether this word is a masked word, that is, whether its is_shield_path attribute in the vocabulary database is "Y"; if it is "Y", the word is a masked word and the device drops it. If instead the candidate is found and its is_shield_path is "N", the candidate is a valid word, and the device keeps appending characters in the same way until, at the N-th character, the candidate can no longer be found in the vocabulary, which means that the first N-1 characters form a valid word. Repeating these steps splits the rating basis into the words "我", "认为", "这是", "最好", "的", "免费", "电影", "下载" and "网站". Of these, "我", "认为", "这是", "最好", "的" and "网站" are masked, so the final segmentation result is "免费", "电影", "下载" ("free", "movie", "download").

This procedure can be implemented on a computer in the Java programming language. When the word segmentation device meets a space while forming a candidate, it treats the current word as finished, checks whether the word taken out is a masked word, and then continues segmenting after the space. When segmentation is finished, the word segmentation device passes the final result to the storage device or the computing device.
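The segmentation and masking steps above can be illustrated with a short sketch. Note that it uses forward maximum matching (try the longest candidate first, then shorter ones), a common variant that is swapped in here for simplicity and is not necessarily the exact character-by-character procedure of the patent. The toy vocabulary, the MAX_WORD_LEN bound and the function name segment are assumptions; only the "Y"/"N" values of is_shield_path and the example sentence come from the text.

```javascript
// Toy vocabulary (assumption): word -> is_shield_path flag,
// "Y" = masked word, "N" = valid word.
const VOCAB = new Map([
  ["我", "Y"], ["认为", "Y"], ["这是", "Y"], ["最好", "Y"], ["的", "Y"],
  ["网站", "Y"], ["免费", "N"], ["电影", "N"], ["下载", "N"],
]);
const MAX_WORD_LEN = 4; // assumed upper bound on the length of a word

// Forward maximum matching with masking (a variant, not the patent's code).
function segment(text, vocab = VOCAB) {
  const result = [];
  // A space ends the current word, so each chunk is segmented on its own.
  for (const chunk of text.split(/\s+/)) {
    const chars = Array.from(chunk);
    let i = 0;
    while (i < chars.length) {
      let matched = null;
      // Try the longest candidate starting at position i first.
      for (let len = Math.min(MAX_WORD_LEN, chars.length - i); len >= 1; len--) {
        const candidate = chars.slice(i, i + len).join("");
        if (vocab.has(candidate)) { matched = candidate; break; }
      }
      if (matched === null) { i += 1; continue; } // not in vocabulary: discard
      if (vocab.get(matched) === "N") result.push(matched); // keep unmasked words
      i += Array.from(matched).length;
    }
  }
  return result;
}

console.log(segment("我认为这是最好的免费电影下载网站").join("/"));
```

With this toy vocabulary the example sentence reduces to the three unmasked words, joined here with "/" in the form of a path.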
As shown in FIG. 3, the method of the invention for using the Internet to provide, store and query information for the public comprises the following specific steps:

One. The public enters a link address in a browser such as IE or Netscape and browses the corresponding content. The link points to a website, a web page or a multimedia file. The purpose is to let the public understand the specific content of the link to be rated. If the public already knows the content behind the link, this step can be skipped and publishing can be done directly.

Two. The public fills in the rating basis and then rates. This process, which we call "publishing", is the process in which the public submits the link, the rating basis and the rating level together to the publishing device or the word segmentation device. The rating basis consists of one or more keywords, which can be separated by separators such as spaces or slashes. In the method of the invention it is called a "path": this set of keywords serves as the storage and query path of the corresponding link. Rating means that the public, knowing the content behind the link, scores it with respect to the path.

Three. The word segmentation device normalizes the public's rating basis and sends the result to the storage device. The storage device stores the rating basis as a path in the master record database, together with the corresponding link and rating level. The master record database can record the public's rating elements in detail, including user ID, level, IP address, publication time, link, rating basis and rating level. In this embodiment, the main fields of a stored record in the master record database are the link, the path and the rating level, for example "www.bnb88.com, free/movie/download, 5".

Four. Under the B/S architecture, the public enters the query path on the Web side; under the client/server (C/S) structure, the public enters the query path in the client. The search device receives the public's query path, which can be a set of keywords separated by separators or a passage written in natural language.

Five. The word segmentation device normalizes the public's query and sends the result, for example "free/movie/download", to the computing device. According to the query path, the computing device extracts from the master record database all records that contain the query words. For example, suppose the database contains the following records: "free movie download; http://www.5see.com/; excellent", "free movie download online watch; http://www.dguo.com/; excellent", "free movie download; http://www.zzip.com.cn/; excellent", "free movie download online; http://www.cnvv.cn/; excellent". When the public's query path is "free/movie/download", all records containing this query path are extracted from the database, so all of the records above are returned. When the query path is "free/movie/download/online/watch", only one record is returned, namely "free movie download online watch; http://www.dguo.com/; excellent". Here "containing" means that a record contains every keyword in the query path. When no record satisfies the search expression, the search results page displays "The information you are looking for was not found; please adjust the path and search again".

Six. The computing device adds up the scores of all records that satisfy the search expression for the query path. A given link may have one or more such records, and their sum is the final score of that information for that path. For example, if the link "http://www.dguo.com/" has two records "free movie download online watch; excellent", its score for the path "free movie download online watch" is 10; if it has only one record "free movie download", its score for the path "free movie download" is 5. Aggregation yields one or more new records that give the total score of the path for each link, for example "free movie download; http://www.5see.com/; 2989", "free movie download online watch; http://www.dguo.com/; 3209", "free movie download; http://www.zzip.com.cn/; 892", "free movie download online; http://www.cnvv.cn/; 1228". The computing device sorts this record set by aggregated score from high to low and passes the sorted record set to the feedback device.

Seven. The feedback device displays the record set received from the computing device as text, page by page. For example, if there are 1000 search results and the result page is set to show 10 results, the results are displayed in this order: "free movie download online watch; http://www.dguo.com/; 3209", "free movie download; http://www.5see.com/; 2989", "free movie download online; http://www.cnvv.cn/; 1228", "free movie download; http://www.zzip.com.cn/; 892". At this point, having submitted a query path, the public has obtained all records that contain this path and can browse them from the highest score to the lowest, page by page. Showing 5 to 30 results per page is appropriate.
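Steps five to seven (extract the records that contain every query keyword, add up the scores per link and path, sort from high to low, and return one page) can be sketched as follows. The record layout, the numeric scores and the names RECORDS and search are assumptions made for illustration. A record rated "excellent" is given 5 points, following the text's example in which two such records total 10; the other values are invented sample data.

```javascript
// Sample master records (invented data). Each record: link, path keywords,
// and a numeric score; "excellent" is assumed to count as 5 points.
const RECORDS = [
  { link: "http://www.5see.com/", path: ["free", "movie", "download"], score: 5 },
  { link: "http://www.dguo.com/", path: ["free", "movie", "download", "online", "watch"], score: 5 },
  { link: "http://www.dguo.com/", path: ["free", "movie", "download", "online", "watch"], score: 5 },
  { link: "http://www.zzip.com.cn/", path: ["free", "movie", "download"], score: 3 },
  { link: "http://www.cnvv.cn/", path: ["free", "movie", "download", "online"], score: 4 },
];

// Hypothetical sketch of the computing and feedback devices.
function search(records, queryWords, pageSize, page) {
  // Step five: keep records whose path contains every query keyword.
  const hits = records.filter((r) => queryWords.every((w) => r.path.includes(w)));
  // Step six: sum the scores of records with the same link and the same path.
  const totals = new Map();
  for (const r of hits) {
    const key = r.link + "\u0000" + r.path.join("/");
    const entry = totals.get(key) || { link: r.link, path: r.path.join("/"), score: 0 };
    entry.score += r.score;
    totals.set(key, entry);
  }
  // Sort the aggregated records by total score, high to low.
  const ranked = [...totals.values()].sort((a, b) => b.score - a.score);
  // Step seven: return one page of results (pages are numbered from 1).
  const start = (page - 1) * pageSize;
  return { total: ranked.length, items: ranked.slice(start, start + pageSize) };
}

console.log(search(RECORDS, ["free", "movie", "download"], 2, 1));
```

Grouping on both link and path follows the rule that only records whose link and rating basis are both identical are added together, so the same link can appear once per distinct path.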
For the method of the invention for using the Internet to provide, store and query information for the public, the hardware and software of the public's computer only need to meet the requirements for Internet access: a CPU of P2 class or above, 64M of memory or more, and 50M of free disk space or more; Microsoft Windows 98 or later, Unix 7.0 or later, or Linux 2.2 or later as the operating system; IE 5.0 or later, Netscape 4.0 or later, or Firefox 1.0 or later as the browser; a network card of 10M or above; and bandwidth of 56K or above. The server's hardware and software parameters are: Red Hat Enterprise Linux 4U2 as the operating system, with Tomcat 5.5 installed; MySQL 5.0 as the database; a 6.0G dual-core CPU, 1G of memory, a 120G hard disk, a 100M network card and 2M of bandwidth. The network communication protocol is HTTP 1.0. The application development environment is Eclipse 3.1, JDK 1.5 and JDBC 3.

With the method of the invention, search engines based on the C/S or B/S architecture can be built. This embodiment is based on the B/S architecture. The technical environment used is Tomcat, JSP, JavaBean and MySQL, and the communication protocol used is HTTP.

Claims

1. A method of using the Internet to provide and query information for the public, having a browser and server structure, characterized in that: the browser comprises a publishing device for transmitting information content to a storage device of the server, and a search device for sending query content to a computing device of the server; the server comprises the storage device for recording the information content transmitted by the publishing device in a master record database, the master record database for storing data, the computing device for extracting records from the master record database according to the query content sent by the search device, aggregating them, sorting the aggregated results and sending them, and a feedback device for receiving the record set transmitted by the computing device and sending it to the browser interface.
2. The method of using the Internet to provide and query information for the public according to claim 1, characterized in that: the server further comprises a word segmentation device which normalizes the information content and the query content transmitted by the publishing device and the search device respectively, and then transmits the normalized segmentation results to the storage device and the computing device of the server.
3. The method of using the Internet to provide and query information for the public according to claim 2, characterized in that: the word segmentation device is connected to a vocabulary database, and the vocabulary database is structured as a path table comprising a path serial number, a path name, a click count and a masked path.
4. The method of using the Internet to provide and query information for the public according to claim 3, characterized in that: the information content comprises a link, a rating level and a rating basis; the query content is a query word.
5. The method of using the Internet to provide and query information for the public according to claim 4, characterized in that: the storage device records the link and rating level transmitted by the publishing device and the rating basis transmitted by the word segmentation device as one record in the master record database.
6. The method of using the Internet to provide and query information for the public according to claim 5, characterized in that: the computing device, according to the segmentation result sent by the word segmentation device, extracts from the master record database all records containing the query word, cumulatively adds the rating values of records whose link and rating-basis fields are both identical, and sorts the aggregated results by rating value.
7. The method of using the Internet to provide and query information for the public according to claim 6, characterized in that: the feedback device receives the record set transmitted by the computing device and sends it page by page to the querier's browser interface.
8. The method of using the Internet to provide and query information for the public according to claim 7, characterized in that: the master record database stores the links, query paths and rating results of the information in the form of tables, comprising: a resource table, a resource rating record table and a rating path table.
9. The method of using the Internet to provide and query information for the public according to claim 8, characterized in that: the resource table comprises: resource name, resource serial number, link address, resource category number, description, publication time and user serial number; the resource rating record table comprises: rating serial number, resource serial number and rating level; the rating path table comprises: resource rating record serial number and path serial number.
10. A method of using the Internet to provide and query information for the public, having a client and server structure, characterized in that: the client comprises a publishing device for transmitting information content to a storage device of the server, and a search device for sending query content to a computing device of the server; the server comprises the storage device for recording the information content transmitted by the publishing device in a master record database, the master record database for storing data, the computing device for extracting records from the master record database according to the query content sent by the search device, aggregating them, sorting the aggregated results and sending them, and a feedback device for receiving the record set transmitted by the computing device and sending it to the client interface.
PCT/CN2007/002259 2006-08-29 2007-07-25 A method for providing and searching information to the public using internet WO2008028395A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB2006100623756A CN100462969C (en) 2006-08-29 2006-08-29 Method for providing and inquiry information for public by interconnection network
CN200610062375.6 2006-08-29

Publications (1)

Publication Number Publication Date
WO2008028395A1 true WO2008028395A1 (en) 2008-03-13

Family

ID=38692586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/002259 WO2008028395A1 (en) 2006-08-29 2007-07-25 A method for providing and searching information to the public using internet

Country Status (2)

Country Link
CN (1) CN100462969C (en)
WO (1) WO2008028395A1 (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1462003A (en) * 2002-05-28 2003-12-17 百度在线网络技术(北京)有限公司 Method of issuring information and queuing by bid using searching engine
US20060074843A1 (en) * 2004-09-30 2006-04-06 Pereira Luis C World wide web directory for providing live links
CN1818908A (en) * 2006-03-16 2006-08-16 董崇军 Feedbakc information use of searcher in search engine

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10327189A (en) * 1997-05-27 1998-12-08 Nippon Telegr & Teleph Corp <Ntt> Evaluation service providing system
JPH11312177A (en) * 1998-04-28 1999-11-09 Victor Co Of Japan Ltd Device for evaluating home page preference
WO2002046961A1 (en) * 2000-12-06 2002-06-13 Sony Corporation Information processing device
KR20040006515A (en) * 2002-07-12 2004-01-24 주식회사 네오위즈 Method And System for Providing Information Service System and Searching Result by Using Log Analysis and Information Inputed by User

Also Published As

Publication number Publication date
CN100462969C (en) 2009-02-18
CN101000611A (en) 2007-07-18

Similar Documents

Publication Publication Date Title
WO2008028395A1 (en) A method for providing and searching information to the public using internet
US8856096B2 (en) Extending keyword searching to syntactically and semantically annotated data
US9507867B2 (en) Discovery engine
Kumar et al. Keyword query based focused Web crawler
US8751466B1 (en) Customizable answer engine implemented by user-defined plug-ins
US8099423B2 (en) Hierarchical metadata generator for retrieval systems
Cheng et al. Entity synonyms for structured web search
US8504567B2 (en) Automatically constructing titles
Liu et al. Identifying web spam with the wisdom of the crowds
US20140032529A1 (en) Information resource identification system
US7548912B2 (en) Simplified search interface for querying a relational database
US20070250501A1 (en) Search result delivery engine
US20060047649A1 (en) Internet and computer information retrieval and mining with intelligent conceptual filtering, visualization and automation
US20060106793A1 (en) Internet and computer information retrieval and mining with intelligent conceptual filtering, visualization and automation
EP1587009A2 (en) Content propagation for enhanced document retrieval
US20120054440A1 (en) Systems and methods for providing a hierarchy of cache layers of different types for intext advertising
US9754022B2 (en) System and method for language sensitive contextual searching
US20110082850A1 (en) Network resource interaction detection systems and methods
US8626757B1 (en) Systems and methods for detecting network resource interaction and improved search result reporting
Croft et al. Search engines
Shestakov Intelligent Web Crawling.
Dinesh Real world evaluation of approaches to research paper recommendation
Xu et al. Method of deep web collection for mobile application store based on category keyword searching
Srinivasan et al. Improving Search Results Through Reducing Replica in User Profile
Hold et al. ECIR-A Lightweight Approach for Entity-Centric Information Retrieval.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 07785174; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established (Free format text: COMMUNICATION UNDER RULE 112(1) EPC, EPO FORM 1205A DATED 21/07/09)
122 Ep: pct application non-entry in european phase (Ref document number: 07785174; Country of ref document: EP; Kind code of ref document: A1)