CN109960757A - Web search method and device - Google Patents
- Publication number
- CN109960757A
- Application number
- CN201910146274.4A
- Authority
- CN
- China
- Prior art keywords
- document
- query statement
- feature
- score
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a web search method and device. The method comprises: receiving a query statement input by a user; obtaining a document set matching the query statement; filtering the documents in the document set according to preset features to obtain preferred documents; calculating ranking scores of the preferred documents; and sorting the preferred documents by ranking score from high to low. The invention can substantially reduce the consumption of computing resources and improve retrieval performance.
Description
Technical field
The present invention relates to the field of web search, and in particular to a web search method and device.
Background art
In web search systems, the traditional retrieval approach is inverted-index intersection: for each query word in the query statement, the inverted list of documents containing that word is retrieved, and qualifying documents are selected from these lists. For example, the document lists of all query words in the query statement are intersected, i.e. only the documents in which every query word occurs are kept, and these documents are then scored for relevance and sorted. Although this approach removes completely irrelevant documents to some extent, the intersection can still contain many documents of very low relevance, and the relevance between the query statement and every document in the intersection usually has to be computed. Relevance computation consumes considerable computing resources, which in turn degrades retrieval performance.
Summary of the invention
Embodiments of the present invention provide a web search method and device that reduce resource consumption and improve retrieval performance.
To this end, the invention provides the following technical solutions:
A kind of web search method, which comprises
Receive the query statement of user's input;
Obtain the collection of document to match with the query statement;
The document in the collection of document is filtered for default feature, obtains preferred documents;
Calculate the sequence score of the preferred documents;
The preferred documents are ranked up from high to low according to the sequence score.
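As a rough illustration only, the claimed steps can be sketched in Python as below. The retrieval function, the preferred-document test and the ranking function are hypothetical callables supplied by the caller; the patent does not define their implementations.

```python
from typing import Callable, List, Set


def web_search(
    query: str,
    retrieve: Callable[[str], Set[str]],
    is_preferred: Callable[[str, str], bool],
    rank_score: Callable[[str, str], float],
) -> List[str]:
    """Hypothetical end-to-end flow: match, filter, score, sort."""
    # Obtain the document set matching the query (e.g. via inverted-index intersection)
    candidates = retrieve(query)
    # Filter by preset features: keep only the preferred documents
    preferred = [doc for doc in candidates if is_preferred(query, doc)]
    # Compute the (typically expensive) ranking score only for preferred documents
    scored = [(rank_score(query, doc), doc) for doc in preferred]
    # Sort from high to low ranking score
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

The design point is the ordering: `rank_score` is evaluated only on documents that pass `is_preferred`, which is where the claimed saving in computing resources comes from.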
Optionally, filtering the documents in the document set according to preset features to obtain preferred documents comprises:
Calculating, for each document in the document set, a preset-feature score with respect to the query statement;
Selecting the documents whose preset-feature score is greater than a set threshold as preferred documents, or selecting a set number of documents in descending order of preset-feature score as preferred documents.
Optionally, the preset features include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key-field feature.
Optionally, calculating the BM25 feature score of the document with respect to the query statement comprises:
Segmenting the query statement into query words;
Calculating a relevance score between each query word and the document;
Calculating the BM25 feature score of the document with respect to the query statement from the weight and relevance score of each query word in the query statement.
Optionally, calculating the window feature score of the document with respect to the query statement comprises:
Sliding a window of set length over the characters of the document one character at a time to obtain all matching segments that match the query statement, the window length being a preset multiple of the query statement length;
Calculating the matching degree of each matching segment from the number of query words it hits and the weights of the hit words;
Taking the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Optionally, calculating the title feature score of the document with respect to the query statement comprises:
Calculating the title feature score of the document with respect to the query statement from the number of query words hit by the document's title and the weights of the hit words.
Optionally, calculating the key-field feature score of the document with respect to the query statement comprises:
Calculating the key-field feature score of the document with respect to the query statement from the number of query words hit by the document's key fields and the weights of the hit words.
A web search device, comprising:
A receiving module, configured to receive a query statement input by a user;
A query module, configured to obtain a document set matching the query statement;
A filtering module, configured to filter the documents in the document set according to preset features to obtain preferred documents;
A computing module, configured to calculate ranking scores of the preferred documents;
A sorting module, configured to sort the preferred documents by ranking score from high to low.
Optionally, the filtering module comprises:
A feature score computing unit, configured to calculate, for each document in the document set, a preset-feature score with respect to the query statement;
A screening unit, configured to select the documents whose preset-feature score is greater than a set threshold as preferred documents, or to select a set number of documents in descending order of preset-feature score as preferred documents.
Optionally, the preset features include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key-field feature;
The feature score computing unit includes any one or more of the following units: a BM25 feature computing unit, a window feature computing unit, a title feature computing unit, and a key-field feature computing unit.
Optionally, the BM25 feature computing unit comprises:
A segmentation subunit, configured to segment the query statement into query words;
A relevance computing subunit, configured to calculate a relevance score between each query word and the document;
A feature score computing subunit, configured to calculate the BM25 feature score of the document with respect to the query statement from the weight and relevance score of each query word in the query statement.
Optionally, the window feature computing unit comprises:
A matching segment determining subunit, configured to slide a window of set length over the characters of the document one character at a time to obtain all matching segments that match the query statement, the window length being a preset multiple of the query statement length;
A matching degree computing subunit, configured to calculate the matching degree of each matching segment from the number of query words it hits and the weights of the hit words;
A feature score determining subunit, configured to take the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Optionally, the title feature computing unit is specifically configured to calculate the title feature score of the document with respect to the query statement from the number of query words hit by the document's title and the weights of the hit words.
Optionally, the key-field feature computing unit is specifically configured to calculate the key-field feature score of the document with respect to the query statement from the number of query words hit by the document's key fields and the weights of the hit words.
An electronic device, comprising one or more processors and a memory;
The memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method described above.
A readable storage medium having instructions stored thereon, the instructions, when executed, implementing the method described above.
In the web search method and device provided by the embodiments of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; ranking scores are then calculated for the preferred documents, and the preferred documents are sorted by ranking score from high to low. The filtering removes the documents in the document set that have low relevance to the query statement and keeps those with higher relevance; only the filtered documents, i.e. the preferred documents, are then scored and sorted, which greatly reduces the consumption of computing resources and improves retrieval performance.
Further, when the documents in the document set are filtered, the preset-feature score of each document with respect to the query statement can be calculated for one or more preset features, and documents with low preset-feature scores are filtered out, keeping those with higher relevance to the query statement. Filtering the documents according to several different preset features selects, from several different angles and as completely as possible, the preferred documents that are highly relevant to the query statement, further reducing the computing resources consumed by the subsequent ranking calculation.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Evidently, the drawings described below show only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of the web search method according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of a web search device according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a BM25 feature computing unit according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a window feature computing unit according to an embodiment of the present invention;
Fig. 5 is a block diagram of a device for the web search method according to an exemplary embodiment;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description of embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
To address the problems of existing web search, embodiments of the present invention provide a web search method and device. After the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; the ranking scores of the preferred documents are then calculated, and the preferred documents are sorted by ranking score from high to low.
Fig. 1 shows the flowchart of the web search method according to an embodiment of the present invention, which comprises the following steps:
Step 101: receive the query statement input by the user.
The user can enter the query statement into the search bar of a browser through any input method provided by a smart device, for example voice input, text input, or handwriting input. The query statement may contain one or more query words, for example "document" or "rainbow meeting room".
Step 102: obtain the document set matching the query statement.
Specifically, the document set can be obtained from a back-end search engine service; it contains the documents corresponding to all search results for the current query statement.
For example, the search engine service can use inverted-index intersection: it retrieves the inverted list of documents for each query word in the query statement, intersects the document lists of all query words, and thereby selects the documents in which every query word of the query statement occurs, producing the document set that matches the query statement.
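The inverted-index intersection described above can be sketched as follows. This is a minimal in-memory version assuming pre-segmented documents and query words; it uses Python sets instead of the sorted, compressed posting lists a production search engine would normally use, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(docs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each word to the set of document ids containing it (its posting list)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return index


def intersect_postings(index: Dict[str, Set[str]], query_words: Iterable[str]) -> Set[str]:
    """Return the ids of documents in which every query word occurs."""
    postings = [index.get(word, set()) for word in query_words]
    if not postings:
        return set()
    # Start from the shortest posting list so the running intersection stays small
    postings.sort(key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= plist
        if not result:  # nothing can survive further intersections
            break
    return result
```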
Step 103: filter the documents in the document set according to preset features to obtain preferred documents.
In embodiments of the present invention, the preset features may include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key-field feature.
To filter the documents in the document set according to a preset feature, the preset-feature score of each document in the document set with respect to the query statement is calculated; then either the documents whose preset-feature score is greater than a set threshold are selected as preferred documents, or a set number of documents are selected as preferred documents in descending order of preset-feature score.
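A minimal sketch of the two selection rules (score strictly above a threshold, or a set number of top-scoring documents), assuming the preset-feature scores have already been computed into a dictionary keyed by document id; `select_preferred` is a hypothetical name.

```python
import heapq
from typing import Dict, Optional, Set


def select_preferred(scores: Dict[str, float],
                     threshold: Optional[float] = None,
                     top_n: Optional[int] = None) -> Set[str]:
    """Pick preferred documents either by threshold or by taking the top N scores."""
    if (threshold is None) == (top_n is None):
        raise ValueError("specify exactly one of threshold or top_n")
    if threshold is not None:
        # Keep documents whose score is greater than the set threshold
        return {doc for doc, score in scores.items() if score > threshold}
    # Keep the top_n highest-scoring documents; heapq.nlargest avoids a full sort
    return set(heapq.nlargest(top_n, scores, key=scores.__getitem__))
```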
When multiple preset features are configured, the documents in the document set can be filtered for each preset feature separately in the manner described above to obtain preferred documents. Note that the threshold for each preset feature can be set specifically for that feature, so that each threshold suits its corresponding feature.
The following describes in detail, for each of the above preset features, how the preset-feature score of a document in the document set with respect to the query statement is calculated.
(1) BM25 feature
BM25 is an algorithm for evaluating the relevance between query words and a document; it is derived from the probabilistic retrieval model.
Filtering the documents in the document set by the BM25 feature means using the BM25 algorithm to calculate the BM25 feature score of each document in the document set with respect to the query statement. Specifically, the query statement is first segmented into query words; the relevance score between each query word and the document is then calculated; finally, the BM25 feature score of the document with respect to the query statement is obtained as a weighted sum of these relevance scores, using the weight of each query word in the query statement.
The weight of each query word in the query statement can be calculated offline by collecting statistics over all document data to obtain the term frequency and document frequency of each query word, while also taking into account information such as its part of speech and whether it is an entity word.
Note that obtaining the query words requires some processing of the query statement, for example removing non-key words from it and applying appropriate transformations to some words, to improve the accuracy and coverage of the recalled documents. Existing techniques can be used for this processing, and the details are not repeated here.
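The following sketch assumes the standard Okapi BM25 term-relevance formula with the common default parameters k1 = 1.2 and b = 0.75, and uses a plain IDF value as the query-word weight. The patent itself only states that the weight is derived from term frequency, document frequency, part of speech and entity-word information, so the IDF weight here is an assumption, not the patented weighting. Function names are hypothetical, and the query words are assumed to be already segmented and processed.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weight(n_docs: int, doc_freq: int) -> float:
    """Assumed query-word weight: BM25-style IDF from document frequency only."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)


def bm25_feature_score(query_words: List[str], doc_words: List[str],
                       weights: Dict[str, float], avg_doc_len: float,
                       k1: float = 1.2, b: float = 0.75) -> float:
    """Weighted sum over query words of the BM25 term relevance R(q, d)."""
    tf = Counter(doc_words)
    # Document-length normalisation shared by all query words
    norm = k1 * (1.0 - b + b * len(doc_words) / avg_doc_len)
    score = 0.0
    for word in query_words:
        f = tf[word]
        if f == 0:
            continue  # a query word absent from the document contributes nothing
        relevance = f * (k1 + 1.0) / (f + norm)
        score += weights.get(word, 0.0) * relevance
    return score
```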
(2) Window feature
First, a window of set length is slid over the characters of the document one character at a time to obtain all matching segments that match the query statement; the window length, and hence the length of each matching segment, is a preset multiple of the query statement length, for example 1 or 3 times. Then the matching degree of each matching segment is calculated from the number of query words it hits and the weights of the hit words. Finally, the maximum matching degree among the matching segments is taken as the window feature score of the document with respect to the query statement.
For example, the query statement is "rainbow meeting room", which is five characters long in the original Chinese and consists of the two words "rainbow" and "meeting room"; assume that "rainbow" has weight 15 and "meeting room" has weight 10.
The document is: "soho network building 8th floor rainbow this meeting room occupied".
Assume the window length is 5. Sliding the window over the characters of the document yields two matching segments: one containing "rainbow" followed by "this" and only the first character of "meeting room", and one reading "this meeting room".
The matching degree of each segment is then 15 for the first segment, which hits only "rainbow", and 10 for "this meeting room", which hits only "meeting room".
The maximum matching degree, 15, is taken as the window feature score of the document with respect to the query statement.
Note that in practical applications, windows of different lengths can be configured, for example 1 times and 3 times the query statement length. Correspondingly, the window feature score of the document with respect to the query statement can be calculated separately for the 1×-length window and the 3×-length window.
(3) Title feature
Similarly to the matching-degree calculation for the window feature, the title feature score of the document with respect to the query statement can be calculated from the number of query words hit by the document's title and the weights of the hit words.
Suppose the query statement is "has the rainbow meeting room been reserved", and the query words it contains have the following weights: "rainbow" 15, "meeting room" 10, "whether" 1, " " 2, and "reserve" 12.
The title of document 1 is "whether to reserve the meeting room in the afternoon", which matches "whether", " ", "reserve" and "meeting room", so the title feature score of document 1 with respect to the query statement is 1 + 2 + 12 + 10 = 25;
The title of document 2 is "rainbow already reserved", which matches "reserve" and "rainbow", so the title feature score of document 2 with respect to the query statement is 12 + 15 = 27.
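In the example, the title feature score reduces to summing the weights of the distinct query words that the title hits. A minimal sketch, assuming titles are already segmented into words; `title_feature_score` is a hypothetical name, and the token `PARTICLE` is a hypothetical stand-in for the query word shown as " " in the example.

```python
from typing import Dict, Iterable


def title_feature_score(title_words: Iterable[str], query_weights: Dict[str, float]) -> float:
    """Sum the weights of the distinct query words that occur in the title."""
    hits = set(title_words) & set(query_weights)
    return sum(query_weights[word] for word in hits)


weights = {"rainbow": 15, "meeting room": 10, "whether": 1, "PARTICLE": 2, "reserve": 12}
print(title_feature_score(["afternoon", "whether", "reserve", "PARTICLE", "meeting room"], weights))
print(title_feature_score(["already", "reserve", "rainbow"], weights))
```

These two calls reproduce the scores of document 1 (25) and document 2 (27) above. The number of hit words, which the text also mentions, is available as `len(hits)` if an implementation wants to combine it with the weights.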
(4) Key-field feature
The key fields may include, for example, the domain name, anchor text, and meta tags.
A meta tag describes attributes of an HTML web page document, such as author, date and time, page description, keywords, and page refresh.
The key-field feature score of the document with respect to the query statement can be calculated from the number of query words hit by the document's key fields and the weights of the hit words.
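A minimal sketch of the key-field feature using only the Python standard library. It assumes that anchor texts are collected by the caller from links pointing to the page (the usual meaning of anchor text in web search), reads `<meta>` `content` values with `html.parser.HTMLParser`, takes the domain name from the URL with `urllib.parse.urlparse`, and scores hits by summing weights as in the title example. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List
from urllib.parse import urlparse


class MetaContentParser(HTMLParser):
    """Collect the content attribute of every <meta> tag (keywords, description, author, ...)."""

    def __init__(self) -> None:
        super().__init__()
        self.contents: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case
        if tag == "meta":
            content = dict(attrs).get("content")
            if content:
                self.contents.append(content)


def key_field_feature_score(url: str, html: str, anchor_texts: Iterable[str],
                            query_weights: Dict[str, float]) -> float:
    """Sum the weights of query words found in any key field (domain, anchor text, meta)."""
    parser = MetaContentParser()
    parser.feed(html)
    parser.close()
    fields = [urlparse(url).hostname or ""] + list(anchor_texts) + parser.contents
    fields = [field.lower() for field in fields]
    # A query word counts once if it appears in at least one key field (case-insensitive)
    return sum(w for word, w in query_weights.items()
               if any(word.lower() in field for field in fields))
```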
When there are multiple preset features, the documents in the document set can be filtered for each preset feature separately, and the union of the preferred documents obtained for the individual features is then taken. That is, a document selected as a preferred document by any one of the preset features becomes a final preferred document.
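The union rule can be sketched as follows, assuming each feature's scores are already computed and each feature has its own threshold (as noted for step 103); `union_of_preferred` is a hypothetical name.

```python
from typing import Dict, Set


def union_of_preferred(feature_scores: Dict[str, Dict[str, float]],
                       thresholds: Dict[str, float]) -> Set[str]:
    """feature_scores[feature][doc] is a score; keep docs preferred by any feature."""
    preferred: Set[str] = set()
    for feature, scores in feature_scores.items():
        limit = thresholds[feature]  # each preset feature has its own threshold
        preferred |= {doc for doc, score in scores.items() if score > limit}
    return preferred
```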
Step 104: calculate the ranking scores of the preferred documents.
The ranking scores can be calculated with existing techniques, choosing suitable functions according to application requirements, such as MatchRank, AnchorRank, or TitleRank; the embodiments of the present invention do not limit this.
Step 105: sort the preferred documents by ranking score from high to low.
In the web search method provided by the embodiments of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; ranking scores are then calculated for the preferred documents, and the preferred documents are sorted by ranking score from high to low. The filtering removes the documents in the document set that have low relevance to the query statement and keeps those with higher relevance; only the filtered documents, i.e. the preferred documents, are then scored and sorted, which greatly reduces the consumption of computing resources and improves retrieval performance.
Take the query statement "rainbow meeting room" above as an example. In a retrieval system based on inverted-index intersection, if a long article contains "rainbow" in one paragraph and "meeting room" in another paragraph far away, the article contains both query words and is therefore recalled.
Under the prior art, this long article would go through the time-consuming relevance scoring and ranking stage. Under the scheme of the embodiments of the present invention, a relevance pruning step is added, i.e. the documents in the document set matching the query statement are filtered. Because the window feature score of this long article is very low, it may be filtered out in this step, which keeps such low-relevance documents out of the relevance scoring and ranking stage and effectively improves retrieval performance.
Further, when the documents in the document set are filtered, the preset-feature score of each document with respect to the query statement can be calculated for one or more preset features, and documents with low preset-feature scores are filtered out, keeping those with higher relevance to the query statement. Filtering the documents according to several different preset features selects, from several different angles and as completely as possible, the preferred documents that are highly relevant to the query statement, further reducing the computing resources consumed by the subsequent ranking calculation.
Note that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example one executing a set of computer-executable instructions; and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described here.
Correspondingly, an embodiment of the present invention also provides a web search device; Fig. 2 is a schematic structural diagram of the device.
In this embodiment, the device includes the following modules:
Receiving module 201, configured to receive the query statement input by the user;
Query module 202, configured to obtain the document set matching the query statement;
Filtering module 203, configured to filter the documents in the document set according to preset features to obtain preferred documents;
Computing module 204, configured to calculate the ranking scores of the preferred documents;
Sorting module 205, configured to sort the preferred documents by ranking score from high to low.
The user can enter the query statement into the search bar of a browser through any input method provided by a smart device, and the query statement may contain one or more query words, for example "document" or "rainbow meeting room". The query module 202 can obtain the document set from a back-end search engine service; the document set contains the documents corresponding to all search results for the current query statement.
The filtering module 203 may specifically include a feature score computing unit and a screening unit. The feature score computing unit calculates, for each document in the document set, the preset-feature score with respect to the query statement; the screening unit selects the documents whose preset-feature score is greater than a set threshold as preferred documents, or selects a set number of documents in descending order of preset-feature score as preferred documents.
When calculating the ranking scores, the computing module 204 can use existing techniques, choosing suitable functions according to application requirements, such as MatchRank, AnchorRank, or TitleRank; the embodiments of the present invention do not limit this.
In the web search device provided by the embodiments of the present invention, after the document set matching the query statement input by the user is obtained, the documents in the document set are first filtered according to preset features to obtain preferred documents; ranking scores are then calculated for the preferred documents, and the preferred documents are sorted by ranking score from high to low. The filtering removes the documents in the document set that have low relevance to the query statement and keeps those with higher relevance; only the filtered documents, i.e. the preferred documents, are then scored and sorted, which greatly reduces the consumption of computing resources and improves retrieval performance.
In practical applications, the preset features may include any one or more of the following: a BM25 feature, a window feature, a title feature, and a key-field feature.
Correspondingly, the feature score computing unit may include any one or more of the following units: a BM25 feature computing unit, a window feature computing unit, a title feature computing unit, and a key-field feature computing unit, where:
The title feature computing unit can specifically be configured to calculate the title feature score of the document with respect to the query statement from the number of query words hit by the document's title and the weights of the hit words.
The key-field feature computing unit is specifically configured to calculate the key-field feature score of the document with respect to the query statement from the number of query words hit by the document's key fields and the weights of the hit words.
One specific structure of the BM25 feature computing unit is shown in Fig. 3 and includes the following subunits:
Segmentation subunit 31, configured to segment the query statement into query words;
Relevance computing subunit 32, configured to calculate the relevance score between each query word and the document;
Feature score computing subunit 33, configured to calculate the BM25 feature score of the document with respect to the query statement from the weight and relevance score of each query word in the query statement.
One specific structure of the window feature computing unit is shown in Fig. 4 and includes the following subunits:
Matching segment determining subunit 41, configured to slide a window of set length over the characters of the document one character at a time to obtain all matching segments that match the query statement, the window length being a preset multiple of the query statement length;
Matching degree computing subunit 42, configured to calculate the matching degree of each matching segment from the number of query words it hits and the weights of the hit words;
Feature score determining subunit 43, configured to take the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
Note that in practical applications there may be one or more preset multiples; that is, windows of different lengths can be configured, for example 1 times and 3 times the query statement length. Correspondingly, the window feature computing unit can calculate the window feature score of the document with respect to the query statement separately for the 1×-length window and the 3×-length window.
When filtering the documents in the document set, the web search device provided by the embodiments of the present invention can calculate, for one or more preset features, the preset-feature score of each document in the document set with respect to the query statement, filter out documents with low preset-feature scores, and keep those with higher relevance to the query statement. Filtering the documents according to several different preset features selects, from several different angles and as completely as possible, the preferred documents that are highly relevant to the query statement, further reducing the computing resources consumed by the subsequent ranking calculation.
Fig. 5 is a block diagram of a device 800 for the web search method according to an exemplary embodiment. For example, the device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, or the like.
Referring to Fig. 5, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or some of the steps of the method described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operated on the device 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 supplies power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors that provide status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor component 814 may also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in its temperature. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, which can be executed by the processor 820 of the device 800 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The present invention also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform all or some of the steps of the foregoing method embodiments of the present invention.
Fig. 6 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may further include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
Obviously, the embodiments described above are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, claims, and accompanying drawings of this specification are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
Other embodiments of the present invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A web search method, characterized in that the method comprises:
receiving a query statement input by a user;
obtaining a document set that matches the query statement;
filtering the documents in the document set based on preset features to obtain preferred documents;
calculating ranking scores of the preferred documents;
sorting the preferred documents in descending order of ranking score.
2. The method according to claim 1, characterized in that filtering the documents in the document set based on preset features to obtain preferred documents comprises:
calculating, for each document in the document set, a preset feature score with respect to the query statement;
selecting documents whose preset feature score is greater than a set threshold as the preferred documents, or selecting a set number of documents in descending order of preset feature score as the preferred documents.
3. The method according to claim 2, characterized in that the preset features comprise any one or more of the following:
a BM25 feature, a window feature, a title feature, and a key field feature.
4. The method according to claim 3, characterized in that calculating the BM25 feature score of the document with respect to the query statement comprises:
segmenting the query statement to obtain query words;
calculating a relevance score between each query word and the document;
calculating the BM25 feature score of the document with respect to the query statement according to the weight and relevance score of each query word in the query statement.
5. The method according to claim 3, characterized in that calculating the window feature score of the document with respect to the query statement comprises:
sliding a window of set length over the characters of the document in sequence to obtain all matching segments that match the query statement, the length of the window being a preset multiple of the length of the query statement;
calculating a matching degree for each matching segment according to the number of words of the query statement hit by the matching segment and the weights of the hit words;
taking the maximum matching degree among the matching segments as the window feature score of the document with respect to the query statement.
6. The method according to claim 3, characterized in that calculating the title feature score of the document with respect to the query statement comprises:
calculating the title feature score of the document with respect to the query statement according to the number of words of the query statement hit by the title of the document and the weights of the hit words.
7. The method according to claim 3, characterized in that calculating the key field feature score of the document with respect to the query statement comprises:
calculating the key field feature score of the document with respect to the query statement according to the number of words of the query statement hit by the key field of the document and the weights of the hit words.
8. A web search device, characterized in that the device comprises:
a receiving module, configured to receive a query statement input by a user;
a query module, configured to obtain a document set that matches the query statement;
a filtering module, configured to filter the documents in the document set based on preset features to obtain preferred documents;
a computing module, configured to calculate ranking scores of the preferred documents;
a sorting module, configured to sort the preferred documents in descending order of ranking score.
9. An electronic device, characterized by comprising one or more processors and a memory;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method according to any one of claims 1 to 7.
10. A readable storage medium having instructions stored thereon, the instructions, when executed, implementing the method according to any one of claims 1 to 7.
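The claims leave several choices open: how the query is segmented, what the query-word weights are, the exact matching-degree formula, how the matching document set is fetched, and which model produces the final ranking score. The following is a minimal Python sketch of claims 1 to 7 under explicitly hypothetical assumptions: whitespace splitting stands in for word segmentation, IDF serves as the query-word weight in the BM25 feature (with the common defaults `k1=1.2`, `b=0.75`), the matching degree is taken as (share of query words hit) × (share of query weight hit), the candidate set is every document containing at least one query word, and BM25 stands in for the unspecified ranking score. None of the function names come from the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def segment(text: str) -> List[str]:
    """Hypothetical tokenizer: whitespace split stands in for real word segmentation."""
    return text.split()


def bm25_feature(query: str, doc: str, corpus: Sequence[str],
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Claim 4: sum over query words of weight (IDF here) x relevance (BM25 term score)."""
    docs = [segment(d) for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    words = segment(doc)
    tf = Counter(words)
    score = 0.0
    for q in segment(query):
        df = sum(1 for d in docs if q in d)  # number of documents containing q
        idf = math.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[q]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(words) / avgdl))
    return score


def hit_degree(weights: Dict[str, float], text: str) -> float:
    """Assumed matching degree: (share of query words hit) x (share of query weight hit)."""
    total = sum(weights.values())
    if not weights or total <= 0:
        return 0.0
    hit = [w for w in weights if w in text]  # substring test, as for character-based text
    return (len(hit) / len(weights)) * (sum(weights[w] for w in hit) / total)


def window_feature(query: str, weights: Dict[str, float], doc: str,
                   multiple: float = 2.0) -> float:
    """Claim 5: slide a window of multiple x len(query) characters, keep the best segment."""
    size = max(1, int(len(query) * multiple))
    starts = range(max(1, len(doc) - size + 1))  # a short document is a single segment
    return max(hit_degree(weights, doc[i:i + size]) for i in starts)


def field_feature(weights: Dict[str, float], fields: Sequence[str]) -> float:
    """Claims 6-7: score a title or key field; with several fields, take the best (assumption)."""
    return max((hit_degree(weights, f) for f in fields), default=0.0)


def web_search(query: str, corpus: Dict[str, str], threshold: float = 0.0) -> List[str]:
    """Claim 1 end to end, with hypothetical choices for each unspecified step."""
    words = segment(query)
    weights = {w: 1.0 for w in words}  # hypothetical: uniform query-word weights
    # Step 2: candidate set = documents containing any query word (assumption)
    candidates = [i for i, d in corpus.items() if any(w in segment(d) for w in words)]
    # Step 3 / claim 2: keep documents whose window feature score exceeds the threshold
    preferred = [i for i in candidates
                 if window_feature(query, weights, corpus[i]) > threshold]
    # Steps 4-5: BM25 stands in for the unspecified ranking score; sort high to low
    texts = list(corpus.values())
    scores = {i: bm25_feature(query, corpus[i], texts) for i in preferred}
    return sorted(preferred, key=lambda i: (-scores[i], i))
```

For example, with `corpus = {"a": "web search method and device", "b": "web page design", "c": "cooking recipes", "d": "search engine web search ranking"}`, `web_search("web search", corpus, threshold=0.3)` drops `"b"` at the window-feature stage (it hits only one of the two query words) and returns `["d", "a"]`. The window feature needs only substring checks, so it is cheap compared with a full ranking model; this mirrors the description's rationale of filtering with inexpensive features before scoring the survivors.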
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910146274.4A CN109960757A (en) | 2019-02-27 | 2019-02-27 | Web search method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109960757A (en) | 2019-07-02 |
Family ID: 67023975
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910146274.4A Pending CN109960757A (en) | 2019-02-27 | 2019-02-27 | Web search method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109960757A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113495984A (en) * | 2020-03-20 | 2021-10-12 | 华为技术有限公司 | Statement retrieval method and related device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101625680A (en) * | 2008-07-09 | 2010-01-13 | 东北大学 | Document retrieval method in patent field |
CN101990670A (en) * | 2008-04-11 | 2011-03-23 | 微软公司 | Search results ranking using editing distance and document information |
CN102364467A (en) * | 2011-09-29 | 2012-02-29 | 北京亿赞普网络技术有限公司 | Network search method and system |
CN103064846A (en) * | 2011-10-20 | 2013-04-24 | 北京中搜网络技术股份有限公司 | Retrieval device and retrieval method |
CN103092945A (en) * | 2013-01-11 | 2013-05-08 | 北京百度网讯科技有限公司 | Searching method and device based on interface returning |
CN103294681A (en) * | 2012-02-23 | 2013-09-11 | 北京百度网讯科技有限公司 | Method and device for generating search result |
CN104050235A (en) * | 2014-03-27 | 2014-09-17 | 浙江大学 | Distributed information retrieval method based on set selection |
CN104573028A (en) * | 2015-01-14 | 2015-04-29 | 百度在线网络技术(北京)有限公司 | Intelligent question-answer implementing method and system |
US20150286729A1 (en) * | 2014-04-02 | 2015-10-08 | Samsung Electronics Co., Ltd. | Method and system for content searching |
CN105956148A (en) * | 2016-05-12 | 2016-09-21 | 北京奇艺世纪科技有限公司 | Resource information recommendation method and apparatus |
CN106951411A (en) * | 2017-03-24 | 2017-07-14 | 福州大学 | Fast multi-keyword semantic ranking search method protecting data privacy in cloud computing |
CN108829819A (en) * | 2018-06-12 | 2018-11-16 | 上海智臻智能网络科技股份有限公司 | Personalized text recommendation method and system, server, and readable storage medium |
- 2019-02-27: application CN201910146274.4A filed in CN (CN109960757A, status: active, pending)
Non-Patent Citations (2)
Title |
---|
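刘凤晨 (Liu Fengchen): "Storage and time optimization algorithms for the n-Gram/2L index structure" (n-Gram/2L 索引结构的存储与时间优化算法), Computer Engineering and Applications (《计算机工程与应用》) * |
赵阳 (Zhao Yang): "A review of research on ancient-book collation based on Chinese information processing" (《基于中文信息处理的古籍整理研究评述》), Library and Information Service (《图书情报工作》) * |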
Similar Documents
Publication | Title |
---|---|
CN106708282B (en) | Recommendation method and device, and device for recommendation |
CN105488112B (en) | Information pushing method and device |
CN104331503B (en) | Information pushing method and device |
CN110147467A (en) | Text description generation method and device, mobile terminal, and storage medium |
CN108038102A (en) | Facial expression image recommendation method, apparatus, terminal, and storage medium |
CN108874939A (en) | Information search method and device |
CN108073606A (en) | News recommendation method and apparatus, and device for news recommendation |
CN107346182A (en) | Method and device for building a user thesaurus |
CN107291772A (en) | Search access method and device, and electronic equipment |
CN110222256A (en) | Information recommendation method and device, and device for information recommendation |
CN108874827A (en) | Search method and related apparatus |
CN110069624A (en) | Text processing method and device |
CN106777016A (en) | Method and device for information recommendation based on instant messaging |
CN110502648A (en) | Method and device for obtaining a recommendation model for multimedia information |
CN110110204A (en) | Information recommendation method and device, and device for information recommendation |
CN110019885A (en) | Expression data recommendation method and device |
CN107045541A (en) | Data display method and device |
CN103970831B (en) | Method and apparatus for recommending icons |
CN110286775A (en) | Dictionary management method and device |
CN110110207A (en) | Information recommendation method and device, and electronic equipment |
CN107729439A (en) | Method, device, and system for obtaining multimedia data |
CN110309324A (en) | Search method and related apparatus |
CN107707759A (en) | Terminal control method, device, and system, and storage medium |
CN109960757A (en) | Web search method and device |
CN107436896A (en) | Input recommendation method, apparatus, and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190702 |