CN109960757A - Web search method and device - Google Patents

Publication number
CN109960757A
Authority
CN
China
Prior art keywords
document
query statement
feature
score
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910146274.4A
Other languages
Chinese (zh)
Inventor
杨东旭
谢远江
许静芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910146274.4A
Publication of CN109960757A
Legal status: Pending

Abstract

The invention discloses a web search method and device. The method comprises: receiving a query statement input by a user; obtaining a document set that matches the query statement; filtering the documents in the document set according to preset features to obtain preferred documents; calculating a ranking score for the preferred documents; and sorting the preferred documents from high to low by ranking score. The invention can substantially reduce the consumption of computing resources and improve retrieval performance.

Description

Web search method and device
Technical field
The present invention relates to the field of web search, and in particular to a web search method and device.
Background
For a web search system, the traditional retrieval approach is inverted-index intersection: an inverted list of the corresponding documents is built for each query word in the query statement, and qualifying documents are selected from these lists. For example, the documents corresponding to all query words in the query statement are intersected, that is, only documents in which every query word occurs are selected, and these documents are then scored for relevance and sorted. Although this approach removes completely irrelevant documents to some extent, the intersection can still contain many documents of very low relevance. Moreover, the relevance calculation usually has to be performed between the query statement and every document in the intersection. Relevance calculation consumes considerable computing resources, which in turn degrades retrieval performance.
Summary of the invention
Embodiments of the present invention provide a web search method and device that reduce resource consumption and improve retrieval performance.
To this end, the invention provides the following technical solution:
A web search method, the method comprising:
receiving a query statement input by a user;
obtaining a document set that matches the query statement;
filtering the documents in the document set according to preset features to obtain preferred documents;
calculating a ranking score for the preferred documents;
sorting the preferred documents from high to low by ranking score.
Optionally, filtering the documents in the document set according to preset features to obtain preferred documents comprises:
calculating, for each document in the document set, a preset feature score with respect to the query statement;
selecting documents whose preset feature score is greater than a set threshold as preferred documents, or selecting a set number of documents in descending order of preset feature score as preferred documents.
Optionally, the preset features include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key field feature.
Optionally, calculating the BM25 feature score of a document with respect to the query statement comprises:
segmenting the query statement to obtain the individual query words;
calculating the relevance score between each query word and the document;
calculating the BM25 feature score of the document with respect to the query statement from the weight and the relevance score of each query word in the query statement.
Optionally, calculating the window feature score of a document with respect to the query statement comprises:
sliding a window of set length over the characters of the document in sequence to obtain all matching segments that match the query statement, the length of the window being a preset multiple of the length of the query statement;
calculating the matching degree of each matching segment from the number of words of the query statement that the segment hits and the weights of the hit words;
taking the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Optionally, calculating the title feature score of a document with respect to the query statement comprises:
calculating the title feature score of the document with respect to the query statement from the number of words of the query statement that the document's title hits and the weights of the hit words.
Optionally, calculating the key field feature score of a document with respect to the query statement comprises:
calculating the key field feature score of the document with respect to the query statement from the number of words of the query statement that the document's key fields hit and the weights of the hit words.
A web search device, the device comprising:
a receiving module for receiving a query statement input by a user;
a query module for obtaining a document set that matches the query statement;
a filtering module for filtering the documents in the document set according to preset features to obtain preferred documents;
a calculation module for calculating a ranking score for the preferred documents;
a sorting module for sorting the preferred documents from high to low by ranking score.
Optionally, the filtering module comprises:
a feature score calculation unit for calculating, for each document in the document set, a preset feature score with respect to the query statement;
a screening unit for selecting documents whose preset feature score is greater than a set threshold as preferred documents, or for selecting a set number of documents in descending order of preset feature score as preferred documents.
Optionally, the preset features include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key field feature;
the feature score calculation unit comprises any one or more of the following units: a BM25 feature calculation unit, a window feature calculation unit, a title feature calculation unit, and a key field feature calculation unit.
Optionally, the BM25 feature calculation unit comprises:
a segmentation subunit for segmenting the query statement to obtain the individual query words;
a relevance calculation subunit for calculating the relevance score between each query word and the document;
a feature score calculation subunit for calculating the BM25 feature score of the document with respect to the query statement from the weight and the relevance score of each query word in the query statement.
Optionally, the window feature calculation unit comprises:
a matching segment determination subunit for sliding a window of set length over the characters of the document in sequence to obtain all matching segments that match the query statement, the length of the window being a preset multiple of the length of the query statement;
a matching degree calculation subunit for calculating the matching degree of each matching segment from the number of words of the query statement that the segment hits and the weights of the hit words;
a feature score determination subunit for taking the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Optionally, the title feature calculation unit is specifically configured to calculate the title feature score of the document with respect to the query statement from the number of words of the query statement that the document's title hits and the weights of the hit words.
Optionally, the key field feature calculation unit is specifically configured to calculate the key field feature score of the document with respect to the query statement from the number of words of the query statement that the document's key fields hit and the weights of the hit words.
An electronic device, comprising one or more processors and a memory;
the memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to implement the method described above.
A readable storage medium having instructions stored thereon, the instructions being executed to implement the method described above.
With the web search method and device provided by the embodiments of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; the ranking score of the preferred documents is then calculated, and the preferred documents are sorted from high to low by ranking score. The filtering removes documents in the document set that have low relevance to the query statement and retains those with higher relevance, and only the filtered documents, that is, the preferred documents, are scored and sorted. This greatly reduces the consumption of computing resources and improves retrieval performance.
Further, when the documents in the document set are filtered, the preset feature score of each document with respect to the query statement can be calculated for one or more preset features; documents with low preset feature scores are filtered out, and documents with higher relevance to the query statement are retained. Filtering the documents on several different preset features selects, from several different angles, as many as possible of the preferred documents with higher relevance to the query statement, which further reduces the consumption of computing resources in the subsequent ranking calculation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below are evidently only some of the embodiments recorded in the present invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flow chart of the web search method of the embodiment of the present invention;
Fig. 2 is a structural block diagram of the web search device of the embodiment of the present invention;
Fig. 3 is a structural block diagram of the BM25 feature calculation unit in the embodiment of the present invention;
Fig. 4 is a structural block diagram of the window feature calculation unit in the embodiment of the present invention;
Fig. 5 is a block diagram of a device for the web search method according to an exemplary embodiment;
Fig. 6 is a structural schematic diagram of a server in the embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the scheme of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
To address the problems of existing web search, the embodiment of the present invention provides a web search method and device. After the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; the ranking score of the preferred documents is then calculated, and the preferred documents are sorted from high to low by ranking score.
As shown in Fig. 1, the flow chart of the web search method of the embodiment of the present invention comprises the following steps:
Step 101: receive the query statement input by the user.
The user can enter the query statement into the search bar of a browser through any of the input methods provided by a smart device, for example voice input, text input or handwriting input. The query statement may contain one or more query words, for example "document" or "rainbow meeting room".
Step 102: obtain the document set that matches the query statement.
Specifically, the document set can be obtained through a back-end search engine service; the document set contains the documents corresponding to all search results for the current query statement.
For example, the search engine service can use inverted-index intersection: an inverted list of the corresponding documents is built for each query word in the query statement, the intersection of the documents corresponding to all query words is computed, and the documents in which every query word of the query statement occurs are selected to form the document set that matches the query statement.
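The inverted-index intersection used in step 102 can be illustrated with a minimal Python sketch. The function names, the whitespace tokenizer and the toy corpus are assumptions made for illustration only; the patent does not specify how the index is built or stored.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: words are separated by whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def intersect_postings(index, query_words):
    """Return the ids of documents in which every query word occurs."""
    postings = [index.get(word, set()) for word in query_words]
    if not postings:
        return set()
    # Start from the shortest posting set to keep intermediate results small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result

# Hypothetical toy corpus.
docs = {
    1: "rainbow meeting room is occupied",
    2: "rainbow after the rain",
    3: "meeting room booking for the rainbow team",
}
index = build_inverted_index(docs)
candidates = intersect_postings(index, ["rainbow", "meeting", "room"])
```

Document 2 is dropped because it contains only one of the three query words; documents 1 and 3 form the document set passed on to step 103.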
Step 103: filter the documents in the document set according to preset features to obtain preferred documents.
In the embodiment of the present invention, the preset features may include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key field feature.
When the documents in the document set are filtered according to a preset feature, the preset feature score of each document with respect to the query statement can be calculated; documents whose preset feature score is greater than a set threshold are then selected as preferred documents, or a set number of documents are selected in descending order of preset feature score as preferred documents.
When several preset features are configured, the documents in the document set can be filtered in the manner described above for each preset feature separately to obtain preferred documents. Note that the threshold can be set separately for each preset feature, so that each threshold suits its corresponding feature.
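The two selection rules in step 103 (score above a threshold, or the top set number of documents) can be sketched as follows. The function names and the example scores are hypothetical.

```python
import heapq

def filter_by_threshold(scores, threshold):
    """Keep documents whose feature score is strictly greater than the threshold."""
    return {doc_id for doc_id, score in scores.items() if score > threshold}

def filter_top_n(scores, n):
    """Keep the n documents with the highest feature scores."""
    return set(heapq.nlargest(n, scores, key=scores.get))

# Hypothetical feature scores for three documents.
feature_scores = {"d1": 25.0, "d2": 27.0, "d3": 3.0}
above_threshold = filter_by_threshold(feature_scores, 10.0)
top_two = filter_top_n(feature_scores, 2)
```

A threshold gives a result set whose size varies with the query, while a fixed count bounds the cost of the later ranking stage; which rule suits a given feature is a tuning decision the text leaves open.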
The process of calculating, for a document in the document set, the preset feature score with respect to the query statement is described in detail below for each of the preset features above.
(1) BM25 feature
BM25 is an algorithm for evaluating the relevance between query words and a document; it is based on the probabilistic retrieval model.
Filtering the documents in the document set on the BM25 feature means using the BM25 algorithm to calculate the BM25 feature score of each document in the document set with respect to the query statement. Specifically, the query statement is first segmented to obtain the individual query words; the relevance score between each query word and the document is then calculated; finally, the BM25 feature score of the document with respect to the query statement is obtained as a weighted sum, using the weight and the relevance score of each query word in the query statement.
The weight of each query word in the query statement can be calculated offline by collecting statistics over all document data to obtain the term frequency and document frequency of each query word, with due consideration of information such as part of speech and whether the word is an entity word.
Note that obtaining the query words requires some processing of the query statement, such as removing non-key words and applying suitable transformations to some words, so as to improve the accuracy and completeness of the recalled documents. This processing can use the prior art and is not described further here.
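As a rough illustration of the weighted sum described for the BM25 feature, the sketch below uses the commonly published Okapi BM25 form. It is an assumption, not the patent's exact formula: the query-word weight is taken to be a smoothed inverse document frequency (the patent also mentions part of speech and entity information, which are omitted here), and the parameters `k1` and `b` are the conventional defaults.

```python
import math

def idf(doc_freq, num_docs):
    """Smoothed inverse document frequency, used here as the query-word weight."""
    return math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def term_relevance(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency saturation with document-length normalisation."""
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)

def bm25_feature_score(query_words, doc_words, doc_freqs, num_docs, avg_doc_len):
    """Weighted sum of per-word relevance scores over the query words."""
    score = 0.0
    for word in query_words:
        tf = doc_words.count(word)
        if tf == 0:
            continue  # a query word absent from the document contributes nothing
        weight = idf(doc_freqs.get(word, 0), num_docs)
        score += weight * term_relevance(tf, len(doc_words), avg_doc_len)
    return score

# Hypothetical collection statistics: 10 documents, average length 4 words.
doc_freqs = {"rainbow": 2, "room": 5}
doc_a = ["rainbow", "meeting", "room", "occupied"]
doc_b = ["rainbow", "after", "the", "rain"]
score_a = bm25_feature_score(["rainbow", "room"], doc_a, doc_freqs, 10, 4.0)
score_b = bm25_feature_score(["rainbow", "room"], doc_b, doc_freqs, 10, 4.0)
```

Here `doc_a` contains both query words and therefore scores higher than `doc_b`, which contains only "rainbow".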
(2) Window feature
First, a window of set length is slid over the characters of the document in sequence to obtain all matching segments that match the query statement. The length of each matching segment, which equals the window length, is a preset multiple of the length of the query statement, for example 1 times or 3 times. Then the matching degree of each matching segment is calculated from the number of words of the query statement that the segment hits and the weights of the hit words. The maximum matching degree among the matching segments is taken as the window feature score of the document with respect to the query statement.
For example, the query statement is "rainbow meeting room". Its length is 5 (5 characters in the original Chinese), and it consists of the two words "rainbow" and "meeting room". Assume the weight of "rainbow" is 15 and the weight of "meeting room" is 10.
The document is: "soho network mansion 8 floor rainbow this meeting room occupied".
Assume the window length is 5. Sliding the window over the characters of the document yields two matching segments, "this meeting of rainbow" and "this meeting room" (each 5 characters long in the original Chinese).
The matching degree of each matching segment is then calculated. The segment "this meeting of rainbow" hits only "rainbow", so its matching degree is 15; the segment "this meeting room" hits only "meeting room", so its matching degree is 10.
The maximum matching degree is selected as the window feature score of the document with respect to the query statement, so the score is 15.
Note that in practical applications windows of different lengths can be configured, for example window lengths of 1 times and 3 times the length of the query statement. Correspondingly, the window feature score of the document with respect to the query statement can be calculated separately for the window of 1 times the length and the window of 3 times the length.
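The window feature can be sketched in Python as below. Two assumptions are made: the matching degree of a segment is the sum of the weights of the query words it fully contains (this agrees with the worked example, but the patent does not give a formula), and the English strings are a hypothetical stand-in for the character-based Chinese example.

```python
def window_feature_score(document, word_weights, window_len):
    """Slide a fixed-length character window and return the best matching degree."""
    best = 0
    last_start = max(len(document) - window_len, 0)
    for start in range(last_start + 1):
        segment = document[start:start + window_len]
        # Assumed matching degree: sum of weights of query words inside the segment.
        degree = sum(w for word, w in word_weights.items() if word in segment)
        best = max(best, degree)
    return best

# Hypothetical query with two words and their weights.
query = "rainbow room"
weights = {"rainbow": 15, "room": 10}
document = "rainbow big room"

# One score per preset multiple of the query length (1x and 3x).
scores = {m: window_feature_score(document, weights, m * len(query))
          for m in (1, 3)}
```

With the 1x window no single segment contains both words, so the score is 15; the 3x window covers the whole document and reaches 25. A document in which the query words lie far apart keeps a low score even for the wider window, which is what lets the filter discard it.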
(3) Title feature
As with the matching degree calculation for the window feature above, the title feature score of the document with respect to the query statement can be calculated from the number of words of the query statement that the document's title hits and the weights of the hit words.
Suppose the query statement is "whether the rainbow meeting room has been reserved". Among the query words it contains, the weight of "rainbow" is 15, the weight of "meeting room" is 10, the weight of "whether" is 1, the weight of "" is 2, and the weight of "reservation" is 12.
Document 1 is titled "whether afternoon reservation meeting room" and matches "whether", "", "reservation" and "meeting room", so the title feature score of document 1 with respect to the query statement is 1+2+12+10=25.
Document 2 is titled "having subscribed to rainbow" and matches "reservation" and "rainbow", so the title feature score of document 2 with respect to the query statement is 12+15=27.
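The title feature reduces to summing the weights of the query words that the title hits, which a short Python sketch makes concrete. The helper name `field_hit_score` is hypothetical, titles are assumed to be already segmented into words, and the query word whose text is missing from the example is left out of the weight table, so only document 2 and one made-up title are scored.

```python
def field_hit_score(field_words, word_weights):
    """Sum the weights of the query words that occur in the field."""
    hits = set(field_words) & set(word_weights)
    return sum(word_weights[word] for word in hits)

# Weights taken from the example; the query word with missing text is omitted.
weights = {"rainbow": 15, "meeting room": 10, "whether": 1, "reservation": 12}

# Document 2 from the example hits "reservation" and "rainbow".
title_2 = ["reservation", "rainbow"]
score_2 = field_hit_score(title_2, weights)

# Hypothetical extra title that hits "rainbow" and "meeting room".
title_x = ["rainbow", "meeting room", "floor"]
score_x = field_hit_score(title_x, weights)
```

The same function can be applied to any other field that has been segmented into words, since only the source of `field_words` changes.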
(4) Key field feature
Key fields may include, for example, the domain name, anchor text and meta tags.
Meta tags can be used to describe attributes of an HTML web page document, for example the author, date and time, page description, keywords and page refresh.
When the key field feature score is calculated, the key field feature score of the document with respect to the query statement can be calculated from the number of words of the query statement that the document's key fields hit and the weights of the hit words.
When there are several preset features, the documents in the document set can be filtered for each preset feature separately, and the preferred documents obtained for each preset feature are then merged by taking their union. That is, a preferred document obtained by filtering on any preset feature is a final preferred document.
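Taking the union over several preset features, each with its own threshold, can be sketched as follows. The feature names, scores and thresholds are hypothetical.

```python
def union_of_preferred(scores_by_feature, thresholds):
    """Union of the documents that pass the threshold of at least one feature."""
    preferred = set()
    for feature, scores in scores_by_feature.items():
        threshold = thresholds[feature]  # each feature has its own threshold
        preferred |= {doc for doc, score in scores.items() if score > threshold}
    return preferred

# Hypothetical per-feature scores for three documents.
scores_by_feature = {
    "bm25": {"d1": 2.1, "d2": 0.3, "d3": 0.2},
    "window": {"d1": 15, "d2": 25, "d3": 0},
}
thresholds = {"bm25": 1.0, "window": 20}
preferred = union_of_preferred(scores_by_feature, thresholds)
```

Here `d1` passes only the BM25 filter and `d2` only the window filter, yet both are kept; `d3` passes neither and is dropped before ranking.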
Step 104: calculate the ranking score of the preferred documents.
The ranking score can be calculated using the prior art, with a function chosen according to the application's requirements, such as MatchRank, AnchorRank or TitleRank; the embodiment of the present invention places no limitation on this.
Step 105: sort the preferred documents from high to low by ranking score.
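Steps 102 to 105 can be tied together in one short Python sketch. Everything here is a hypothetical stand-in: `retrieve`, `density` and `occurrences` are toy functions, not the retrieval, feature or ranking functions of the patent. The point is only that the expensive ranking function runs on the preferred documents alone.

```python
def search(query, retrieve, feature_score, ranking_score, threshold):
    """Retrieve, filter on a cheap feature, then rank only the survivors."""
    candidates = retrieve(query)                                  # step 102
    preferred = [d for d in candidates
                 if feature_score(query, d) > threshold]          # step 103
    scored = [(ranking_score(query, d), d) for d in preferred]    # step 104
    scored.sort(key=lambda pair: pair[0], reverse=True)           # step 105
    return [d for _, d in scored]

corpus = ["rainbow room", "rainbow and a very distant room", "room rainbow rainbow"]

def retrieve(query):
    # Toy recall: documents that contain every query word.
    return [d for d in corpus if all(w in d.split() for w in query.split())]

def density(query, doc):
    # Toy cheap feature: share of the document's words that are query words.
    words = doc.split()
    return sum(1 for w in set(query.split()) if w in words) / len(words)

ranked_calls = []

def occurrences(query, doc):
    # Toy "expensive" ranking function; records which documents it was run on.
    ranked_calls.append(doc)
    words = doc.split()
    return sum(words.count(w) for w in set(query.split()))

results = search("rainbow room", retrieve, density, occurrences, threshold=0.5)
```

All three documents are recalled, but the long, sparse one is filtered out, so `occurrences` is called on only two documents.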
With the web search method provided by the embodiment of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; the ranking score of the preferred documents is then calculated, and the preferred documents are sorted from high to low by ranking score. The filtering removes documents in the document set that have low relevance to the query statement and retains those with higher relevance, and only the filtered documents, that is, the preferred documents, are scored and sorted. This greatly reduces the consumption of computing resources and improves retrieval performance.
Take the query statement "rainbow meeting room" above as an example. In a retrieval system based on inverted-index intersection, suppose there is a long text in which one passage contains "rainbow" and another, distant passage contains "meeting room". Because this long text contains both query words, "rainbow" and "meeting room", it is recalled.
Under the prior art, this long text would enter the time-consuming relevance scoring and ranking stage. Under the scheme of the embodiment of the present invention, a relevance pruning step has been added, that is, the documents in the document set matching the query statement are filtered. During this filtering the long text receives a very low window feature score, so it may be filtered out. This keeps such low-relevance documents out of the relevance scoring and ranking stage and effectively improves retrieval performance.
Further, when the documents in the document set are filtered, the preset feature score of each document with respect to the query statement can be calculated for one or more preset features; documents with low preset feature scores are filtered out, and documents with higher relevance to the query statement are retained. Filtering the documents on several different preset features selects, from several different angles, as many as possible of the preferred documents with higher relevance to the query statement, which further reduces the consumption of computing resources in the subsequent ranking calculation.
Note that the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flow charts, in some cases the steps shown or described can be performed in a different order.
Correspondingly, the embodiment of the present invention also provides a web search device; Fig. 2 is a structural schematic diagram of the device.
In this embodiment, the device includes the following modules:
a receiving module 201 for receiving the query statement input by the user;
a query module 202 for obtaining the document set that matches the query statement;
a filtering module 203 for filtering the documents in the document set according to preset features to obtain preferred documents;
a calculation module 204 for calculating the ranking score of the preferred documents;
a sorting module 205 for sorting the preferred documents from high to low by ranking score.
The user can enter the query statement into the search bar of a browser through any of the input methods provided by a smart device, and the query statement may contain one or more query words, for example "document" or "rainbow meeting room". The query module 202 can obtain the document set through a back-end search engine service; the document set contains the documents corresponding to all search results for the current query statement.
The filtering module 203 may specifically include a feature score calculation unit and a screening unit. The feature score calculation unit calculates, for each document in the document set, the preset feature score with respect to the query statement. The screening unit selects documents whose preset feature score is greater than a set threshold as preferred documents, or selects a set number of documents in descending order of preset feature score as preferred documents.
When calculating the ranking score, the calculation module 204 can use the prior art, with a function chosen according to the application's requirements, such as MatchRank, AnchorRank or TitleRank; the embodiment of the present invention places no limitation on this.
With the web search device provided by the embodiment of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; the ranking score of the preferred documents is then calculated, and the preferred documents are sorted from high to low by ranking score. The filtering removes documents in the document set that have low relevance to the query statement and retains those with higher relevance, and only the filtered documents, that is, the preferred documents, are scored and sorted. This greatly reduces the consumption of computing resources and improves retrieval performance.
In practical applications, the preset features may include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key field feature.
Correspondingly, the feature score calculation unit may include any one or more of the following units: a BM25 feature calculation unit, a window feature calculation unit, a title feature calculation unit, and a key field feature calculation unit. Specifically:
The title feature calculation unit can be used to calculate the title feature score of the document with respect to the query statement from the number of words of the query statement that the document's title hits and the weights of the hit words.
The key field feature calculation unit is used to calculate the key field feature score of the document with respect to the query statement from the number of words of the query statement that the document's key fields hit and the weights of the hit words.
One specific structure of the BM25 feature calculation unit is shown in Fig. 3 and includes the following subunits:
a segmentation subunit 31 for segmenting the query statement to obtain the individual query words;
a relevance calculation subunit 32 for calculating the relevance score between each query word and the document;
a feature score calculation subunit 33 for calculating the BM25 feature score of the document with respect to the query statement from the weight and the relevance score of each query word in the query statement.
One specific structure of the window feature calculation unit is shown in Fig. 4 and includes the following subunits:
a matching segment determination subunit 41 for sliding a window of set length over the characters of the document in sequence to obtain all matching segments that match the query statement, the length of the window being a preset multiple of the length of the query statement;
a matching degree calculation subunit 42 for calculating the matching degree of each matching segment from the number of words of the query statement that the segment hits and the weights of the hit words;
a feature score determination subunit 43 for taking the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Note that in practical applications there can be one or more preset multiples. That is, windows of different lengths can be configured, for example window lengths of 1 times and 3 times the length of the query statement. Correspondingly, the window feature calculation unit can calculate the window feature score of the document with respect to the query statement separately for the window of 1 times the length and the window of 3 times the length.
With the web search device provided by the embodiment of the present invention, when the documents in the document set are filtered, the preset feature score of each document with respect to the query statement can be calculated for one or more preset features; documents with low preset feature scores are filtered out, and documents with higher relevance to the query statement are retained. Filtering the documents on several different preset features selects, from several different angles, as many as possible of the preferred documents with higher relevance to the query statement, which further reduces the consumption of computing resources in the subsequent ranking calculation.
Fig. 5 is a block diagram of a device 800 for the web search method according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 5, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support the operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 supplies power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a loudspeaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components, for example the display and keypad of the device 800; the sensor component 814 can also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 804 including instructions, where the instructions can be executed by the processor 820 of the device 800 to carry out the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The present invention also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a mobile terminal, the mobile terminal is enabled to perform all or part of the steps of the method embodiments of the present invention described above.
Fig. 6 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
Obviously, the embodiments described above are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional techniques in the art that are not disclosed herein. The description and the embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A web search method, characterized in that the method comprises:
receiving a query statement input by a user;
obtaining a document collection matching the query statement;
filtering the documents in the document collection according to a preset feature to obtain preferred documents;
calculating a ranking score of the preferred documents;
sorting the preferred documents from high to low according to the ranking score.
2. The method according to claim 1, characterized in that filtering the documents in the document collection according to the preset feature to obtain the preferred documents comprises:
calculating, for each document in the document collection, a preset feature score of the document with respect to the query statement;
selecting the documents whose preset feature score is greater than a set threshold as the preferred documents, or selecting a set number of documents in descending order of the preset feature score as the preferred documents.
3. The method according to claim 2, characterized in that the preset feature comprises any one or more of the following: a BM25 feature, a window feature, a title feature, and a key field feature.
4. The method according to claim 3, characterized in that calculating the BM25 feature score of the document with respect to the query statement comprises:
segmenting the query statement to obtain the query words;
calculating a relevance score between each query word and the document;
calculating the BM25 feature score of the document with respect to the query statement according to the weight and the relevance score of each query word in the query statement.
5. The method according to claim 3, characterized in that calculating the window feature score of the document with respect to the query statement comprises:
sliding a window of a set length over the characters of the document in sequence to obtain all matching segments that match the query statement, the length of the window being a preset multiple of the length of the query statement;
calculating the matching degree of each matching segment according to the number of words in the query statement that the matching segment hits and the weights of the hit words;
taking the maximum of the matching degrees of the matching segments as the window feature score of the document with respect to the query statement.
6. The method according to claim 3, characterized in that calculating the title feature score of the document with respect to the query statement comprises:
calculating the title feature score of the document with respect to the query statement according to the number of words in the query statement that the title of the document hits and the weights of the hit words.
7. The method according to claim 3, characterized in that calculating the key field feature score of the document with respect to the query statement comprises:
calculating the key field feature score of the document with respect to the query statement according to the number of words in the query statement that the key field of the document hits and the weights of the hit words.
8. A web page searching device, characterized in that the device comprises:
a receiving module, configured to receive a query statement input by a user;
a query module, configured to obtain a document collection matching the query statement;
a filtering module, configured to filter the documents in the document collection according to a preset feature to obtain preferred documents;
a computing module, configured to calculate a ranking score of the preferred documents;
a sorting module, configured to sort the preferred documents from high to low according to the ranking score.
9. An electronic device, characterized by comprising: one or more processors and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method according to any one of claims 1 to 7.
10. A readable storage medium having instructions stored thereon, wherein the instructions, when executed, implement the method according to any one of claims 1 to 7.
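The flow of claims 1, 2 and 4 can be sketched end to end as follows. This is an illustrative reading, not the patented implementation: the standard Okapi BM25 formula is assumed, with the inverse document frequency playing the role of the query word weight; the constants `K1` and `B` are common defaults rather than values from the claims; whitespace splitting stands in for query segmentation; and `rank_fn` is a hypothetical placeholder for whatever more expensive ranking model computes the ranking score.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

K1, B = 1.2, 0.75  # common BM25 defaults, assumed for this sketch


def bm25_scores(query_terms: Sequence[str], docs: Sequence[Sequence[str]]) -> List[float]:
    """Return one BM25 score per tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Query word weight (IDF) times the term's relevance to this document.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(d) / avgdl)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


def search(query: str, corpus: List[str],
           rank_fn: Callable[[str, str], float], keep: int = 2) -> List[str]:
    terms = query.lower().split()                        # segment the query
    tokenized = [doc.lower().split() for doc in corpus]
    # Document collection: every document containing at least one query word.
    matched = [i for i, d in enumerate(tokenized) if any(t in d for t in terms)]
    feature = bm25_scores(terms, tokenized)
    # Filter: keep a set number of documents with the highest feature score.
    preferred = sorted(matched, key=lambda i: feature[i], reverse=True)[:keep]
    # Rank only the preferred documents, from high to low.
    ranked = sorted(preferred, key=lambda i: rank_fn(query, corpus[i]), reverse=True)
    return [corpus[i] for i in ranked]


def toy_rank(query: str, doc: str) -> float:
    """Hypothetical ranking score: total occurrences of the query words."""
    tokens = doc.lower().split()
    return float(sum(tokens.count(t) for t in query.lower().split()))


corpus = [
    "web search engines rank web pages",
    "search the web",
    "pasta recipes",
    "how to search a library catalog",
]
results = search("web search", corpus, toy_rank, keep=2)
```

In this example three documents match the query, but only the two with the highest BM25 feature score reach `toy_rank`, which is where the saving in ranking computation would come from when the real ranking model is costly.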
CN201910146274.4A 2019-02-27 2019-02-27 Web search method and device Pending CN109960757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910146274.4A CN109960757A (en) 2019-02-27 2019-02-27 Web search method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910146274.4A CN109960757A (en) 2019-02-27 2019-02-27 Web search method and device

Publications (1)

Publication Number Publication Date
CN109960757A true CN109960757A (en) 2019-07-02

Family

ID=67023975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910146274.4A Pending CN109960757A (en) 2019-02-27 2019-02-27 Web search method and device

Country Status (1)

Country Link
CN (1) CN109960757A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495984A (en) * 2020-03-20 2021-10-12 华为技术有限公司 Statement retrieval method and related device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625680A (en) * 2008-07-09 2010-01-13 东北大学 Document retrieval method in patent field
CN101990670A (en) * 2008-04-11 2011-03-23 微软公司 Search results ranking using editing distance and document information
CN102364467A (en) * 2011-09-29 2012-02-29 北京亿赞普网络技术有限公司 Network search method and system
CN103064846A (en) * 2011-10-20 2013-04-24 北京中搜网络技术股份有限公司 Retrieval device and retrieval method
CN103092945A (en) * 2013-01-11 2013-05-08 北京百度网讯科技有限公司 Searching method and device based on interface returning
CN103294681A (en) * 2012-02-23 2013-09-11 北京百度网讯科技有限公司 Method and device for generating search result
CN104050235A (en) * 2014-03-27 2014-09-17 浙江大学 Distributed information retrieval method based on set selection
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
US20150286729A1 (en) * 2014-04-02 2015-10-08 Samsung Electronics Co., Ltd. Method and system for content searching
CN105956148A (en) * 2016-05-12 2016-09-21 北京奇艺世纪科技有限公司 Resource information recommendation method and apparatus
CN106951411A (en) * 2017-03-24 2017-07-14 福州大学 The quick multi-key word Semantic Ranking searching method of data-privacy is protected in a kind of cloud computing
CN108829819A (en) * 2018-06-12 2018-11-16 上海智臻智能网络科技股份有限公司 Personalized text recommended method and system, server, readable storage medium storing program for executing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘凤晨: "n- Gram/2L 索引结构的存储与时间优化算法", 《计算机工程与应用》 *
赵阳: "《基于中文信息处理的古籍整理研究评述》", 《图书情报工作》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190702

RJ01 Rejection of invention patent application after publication