CN107220384A - A kind of search word treatment method, device and computing device based on correlation - Google Patents

A kind of search word treatment method, device and computing device based on correlation

Info

Publication number
CN107220384A
Authority
CN
China
Prior art keywords
search word
word
keyword
keyword sequence
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710515009.XA
Other languages
Chinese (zh)
Other versions
CN107220384B (en)
Inventor
方轲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201710515009.XA (CN107220384B)
Priority to CN201911033168.1A (CN110795628B)
Publication of CN107220384A
Application granted
Publication of CN107220384B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a correlation-based search word processing method, device and computing device. The method includes: obtaining the search log of each user and extracting available search words from it; performing word segmentation on each available search word to obtain its one or more feature words; converting the feature words into corresponding keywords and combining those keywords into the keyword sequence corresponding to the available search word; for each keyword sequence, selecting from its corresponding available search words the one with the highest frequency of occurrence as the predetermined search word of that keyword sequence; inputting each keyword sequence into a correlation calculation model for training, and outputting, in descending order of correlation, a first quantity of keyword sequences related to the input keyword sequence; and replacing each of the first quantity of output keyword sequences with its corresponding predetermined search word, thereby forming a correspondence between the keyword sequence and the first quantity of predetermined search words.

Description

A correlation-based search word processing method, device and computing device
Technical field
The present invention relates to the field of Internet technology, and in particular to a correlation-based search word processing method, device and computing device.
Background
With the rapid development of Internet technology, more and more people enjoy the conveniences that the Internet brings to work and life. For example, when information is needed, a user can type a search word into a browser and use a search engine to find information related to it. When searching for one keyword, a user often also intends to search for its associated keywords. For example, a user who enters "java" may be better served by keywords that match the intention more closely, such as "java web" or "java back end". Therefore, given a user's keyword, accurately providing related words based on the relationships between different keywords helps the user save input time and also improves the conversion rate.
The current mainstream approach combines follow-up words with collaborative filtering. The main idea is as follows: if a user enters "Three Kingdoms" and, within a few minutes of obtaining the search results, enters "True Three Kingdoms Unrivaled", the second entry is treated as a follow-up word of the first. Query entries that share the same follow-up words are considered to have a certain similarity, so if there is enough user input data, related search words for these entries can be provided by collaborative filtering. However, this approach still has considerable defects, and the problems are especially apparent in the site-internal search of recruitment-industry websites.
Compared with large websites, the volume of search data in the recruitment industry is small and users' query entries are highly homogeneous, so many entries may have no follow-up word at all. Moreover, a user who is a recruiter does not satisfy the precondition that "the search words of the same user are all related": the searches of such users are generally unrelated to one another, and follow-up words fail in this case. In addition, popular words such as "java" and "product manager" often become the follow-up words of other words, which disadvantages less popular related words. Penalizing popular words requires manually tuned weights, which increases the difficulty of the project and has proven hard to control in practice.
Summary of the invention
Therefore, the present invention provides a technical solution for correlation-based search word processing, in an effort to solve or at least alleviate the problems described above.
According to one aspect of the present invention, there is provided a correlation-based search word processing method, suitable for execution in a computing device. The method comprises the following steps: obtaining the search log of each of multiple users and extracting available search words from the search logs; performing word segmentation on each available search word to obtain its one or more feature words; converting the one or more feature words respectively to generate corresponding keywords, and combining the one or more corresponding keywords to form the keyword sequence corresponding to the available search word; from the available search words corresponding to each keyword sequence, selecting the one with the highest frequency of occurrence as the predetermined search word of that keyword sequence; inputting each keyword sequence into a correlation calculation model for training, and outputting, in descending order of correlation, a first quantity of keyword sequences related to the input keyword sequence; and replacing each of the first quantity of output keyword sequences with its corresponding predetermined search word, thereby forming a correspondence between the keyword sequence and the first quantity of predetermined search words.
Optionally, in the correlation-based search word processing method according to the present invention, the step of extracting available search words from the search logs includes: obtaining the initial search words from a search log and counting their quantity; if the quantity is greater than a first value, directly deleting the initial search words of the user to whom that quantity corresponds; counting the number of searches of each initial search word that has not been deleted; and filtering out the initial search words whose number of searches is less than a second value, the remaining initial search words being taken as the available search words.
Optionally, in the correlation-based search word processing method according to the present invention, the step of converting the one or more feature words respectively to generate corresponding keywords includes: rejecting, from the one or more feature words, those that are meaningless words or sensitive words; and performing synonym conversion on the feature words remaining after rejection to generate the corresponding keywords.
Optionally, in the correlation-based search word processing method according to the present invention, the step of combining the one or more corresponding keywords to form the keyword sequence corresponding to the available search word includes: arranging the one or more corresponding keywords in ascending text order; and connecting each pair of adjacent keywords in the arranged result with a first symbol to form the keyword sequence corresponding to the available search word.
Optionally, in the correlation-based search word processing method according to the present invention, the first symbol is an underscore.
Optionally, in the correlation-based search word processing method according to the present invention, after the step of forming the keyword sequence corresponding to the available search word, the method further includes: counting the number of times each keyword sequence is repeated; if the number of times is less than the first value, rejecting the corresponding keyword sequence; and if the number of times is not less than the first value, retaining the corresponding keyword sequence.
Optionally, in the correlation-based search word processing method according to the present invention, when a query search word typed by a user is received, the method further includes: processing the query search word to form the keyword sequence corresponding to it; obtaining the corresponding first quantity of predetermined search words according to that keyword sequence, and selecting the top second quantity of predetermined search words from them, the second quantity being not greater than the first quantity; and recommending the second quantity of predetermined search words to the user as words related to the query search word.
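The query-time lookup in the preceding paragraph can be illustrated with a minimal Python sketch. It assumes the correspondence built offline is held in a dictionary that maps each keyword sequence to its predetermined search words, already ordered from most to least correlated; the function name `recommend`, the dictionary layout and the default of 5 for the second quantity are illustrative assumptions, not details given in the source.

```python
def recommend(query_sequence, related, second_quantity=5):
    """Return the top `second_quantity` predetermined search words for a query.

    `related` is a hypothetical dict: keyword sequence -> list of predetermined
    search words, sorted by descending correlation (at most "first quantity" long).
    """
    # A sequence never seen during training simply yields no recommendations.
    candidates = related.get(query_sequence, [])
    # Slicing keeps the stored order, so the most correlated words come first.
    return candidates[:second_quantity]


related = {"java_engineer": ["java programmers", "java backstage", "java webpages"]}
top_two = recommend("java_engineer", related, second_quantity=2)
```

Because the slice can never return more entries than the stored list holds, the second quantity is effectively capped by the first quantity, as the paragraph requires.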
According to a further aspect of the present invention, there is provided a correlation-based search word processing device, suitable for residing in a computing device. The device includes an extraction module, a word segmentation module, a conversion module, a selection module, a training module and a replacement module. The extraction module is adapted to obtain the search log of each of multiple users and extract available search words from the search logs. The word segmentation module is adapted to perform word segmentation on each available search word to obtain its one or more feature words. The conversion module is adapted to convert the one or more feature words respectively to generate corresponding keywords, and to combine the one or more corresponding keywords to form the keyword sequence corresponding to the available search word. The selection module is adapted to select, from the available search words corresponding to each keyword sequence, the one with the highest frequency of occurrence as the predetermined search word of that keyword sequence. The training module is adapted to input each keyword sequence into a correlation calculation model for training and to output, in descending order of correlation, a first quantity of keyword sequences related to the input keyword sequence. The replacement module is adapted to replace each of the first quantity of output keyword sequences with its corresponding predetermined search word, thereby forming a correspondence between the keyword sequence and the first quantity of predetermined search words.
Optionally, in the correlation-based search word processing device according to the present invention, the extraction module is further adapted to: obtain the initial search words from a search log and count their quantity; when the quantity is greater than a first value, directly delete the initial search words of the user to whom that quantity corresponds; count the number of searches of each initial search word that has not been deleted; and filter out the initial search words whose number of searches is less than a second value, taking the remaining initial search words as the available search words.
Optionally, in the correlation-based search word processing device according to the present invention, the conversion module is further adapted to: reject, from the one or more feature words, those that are meaningless words or sensitive words; and perform synonym conversion on the feature words remaining after rejection to generate the corresponding keywords.
Optionally, in the correlation-based search word processing device according to the present invention, the conversion module is further adapted to: arrange the one or more corresponding keywords in ascending text order; and connect each pair of adjacent keywords in the arranged result with a first symbol to form the keyword sequence corresponding to the available search word.
Optionally, in the correlation-based search word processing device according to the present invention, the first symbol is an underscore.
Optionally, the correlation-based search word processing device according to the present invention further includes a processing module adapted to: count the number of times each keyword sequence is repeated; when the number of times is less than the first value, reject the corresponding keyword sequence; and when the number of times is not less than the first value, retain the corresponding keyword sequence.
Optionally, the correlation-based search word processing device according to the present invention further includes a recommending module adapted to: when a query search word typed by a user is received, process the query search word to form the keyword sequence corresponding to it; obtain the corresponding first quantity of predetermined search words according to that keyword sequence, and select the top second quantity of predetermined search words from them, the second quantity being not greater than the first quantity; and recommend the second quantity of predetermined search words to the user as words related to the query search word.
According to a further aspect of the present invention, there is provided a computing device including the correlation-based search word processing device according to the present invention.
According to a further aspect of the present invention, there is provided a computing device including one or more processors, a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the correlation-based search word processing method according to the present invention.
According to a further aspect of the present invention, there is also provided a computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform the correlation-based search word processing method according to the present invention.
According to the technical solution for correlation-based search word processing of the present invention, word segmentation is first performed on each available search word of the users to obtain the corresponding one or more feature words; each feature word is converted to generate a corresponding keyword; the keywords are combined to form the keyword sequence corresponding to the available search word; from the available search words corresponding to each keyword sequence, the one with the highest frequency of occurrence is selected as the predetermined search word of that keyword sequence; each keyword sequence is input into the correlation calculation model for training, and a first quantity of keyword sequences related to the input keyword sequence is output in descending order of correlation; and each of the first quantity of output keyword sequences is replaced with its corresponding predetermined search word, forming a correspondence between the keyword sequence and the first quantity of predetermined search words. In this solution, the correlation calculation model considers only the distance between search words. When the window is set to infinity, a user's irregular searches do not affect the correlation calculation, and the model also has a clear advantage in handling less popular words, with no need to manually adjust the weights of popular words. In addition, after the keyword sequences corresponding to the available search words are formed, the number of repetitions of each keyword sequence is counted and the keyword sequences whose count is less than the first value are rejected, so that not all keyword sequences need subsequent processing, which reduces computational complexity and time cost. Furthermore, the available search words of the users are extracted in advance from the search logs, and junk users and rarely searched data can be filtered out during extraction, which further increases processing speed while keeping the results valid and accurate.
Brief description of the drawings
To achieve the above and related objects, certain illustrative aspects are described herein in connection with the following description and the accompanying drawings. These aspects indicate various ways in which the principles disclosed herein can be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same parts or elements.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a flow chart of a correlation-based search word processing method 200 according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of a correlation-based search word processing device 300 according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of a correlation-based search word processing device 400 according to a further embodiment of the invention; and
Fig. 5 shows a schematic diagram of a correlation-based search word processing device 500 according to a further embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 can be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 can be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 can include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114 and registers 116. An example processor core 114 can include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 can be used together with the processor 104, or in some implementations the memory controller 118 can be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 can be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 can include an operating system 120, one or more applications 122 and program data 124. In some embodiments, the applications 122 can be arranged to operate with the program data 124 on the operating system.
The computing device 100 can also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144 and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which can be configured to communicate with various external devices such as a display or loudspeakers via one or more A/V ports 152. Example peripheral interfaces 144 can include a serial interface controller 154 and a parallel interface controller 156, which can be configured to communicate via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device or touch input device) or other peripherals (for example, a printer or scanner). An example communication device 146 can include a network controller 160, which can be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link is one example of a communication medium. Communication media are typically embodied as computer-readable instructions, data structures or program modules in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery medium. A "modulated data signal" is a signal in which one or more of its characteristics are set or changed in such a way as to encode information in the signal. As non-limiting examples, communication media can include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) or other wireless media. The term computer-readable medium as used herein can include both storage media and communication media.
The computing device 100 can be implemented as a server, such as a file server, database server, application server or web server, or as part of a small-sized portable (or mobile) electronic device, such as a cell phone, personal digital assistant (PDA), personal media player, wireless web-browsing device, personal headset, application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 can also be implemented as a personal computer, including desktop and notebook configurations. In some embodiments, the computing device 100 is implemented as a server configured to perform the correlation-based search word processing method 200 according to the present invention, and the applications 122 include the correlation-based search word processing device 300 according to the present invention.
Fig. 2 shows a flow chart of the correlation-based search word processing method 200 according to an embodiment of the invention. The method 200 is suitable for execution in a computing device implemented as a server (such as the computing device 100 shown in Fig. 1).
As shown in Fig. 2, the method 200 starts at step S210. In step S210, the search log of each of multiple users is obtained and available search words are extracted from the search logs. According to one embodiment of the invention, the available search words can be extracted as follows. First, the initial search words are obtained from a search log and their quantity is counted; if the quantity is greater than a first value, the initial search words of the corresponding user are directly deleted. Then the number of searches of each initial search word that has not been deleted is counted, the initial search words whose number of searches is less than a second value are filtered out, and the remaining initial search words are taken as the available search words. The first value is preferably 200 and the second value is preferably 3. In this embodiment, the search log of user A contains 150 initial search words in total, the search log of user B contains 237, and the search log of user C contains 89. Because the quantity of initial search words in user B's search log is greater than 200, user B's initial search words are directly deleted, which amounts to treating user B as a junk user and filtering that user out in advance. The 150 initial search words of user A and the 89 of user C are retained, and the number of searches of each initial search word is then counted. For user A, 115 initial search words have fewer than 3 searches; for user C, 56 have fewer than 3 searches. After these are filtered out, the remaining 68 initial search words (35 from user A and 33 from user C) are taken as the available search words. It should be noted that the search logs used cover one year of each user's activity: the full set is obtained first and is then updated incrementally every day.
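The two filters of step S210 can be sketched in Python as follows. This is only an illustration under stated assumptions: each log is modelled as a plain list with one entry per search, the per-user quantity is taken to be the length of that list, and search counts are pooled across all retained users. The source does not fix these details, and the names `extract_available_search_words`, `max_per_user` and `min_searches` are hypothetical.

```python
from collections import Counter


def extract_available_search_words(logs, max_per_user=200, min_searches=3):
    """Filter junk users, then rarely searched words.

    `logs` maps a user id to the list of initial search words in that
    user's log (one list entry per search; an assumed layout).
    """
    # Filter 1: delete all search words of users whose log exceeds the first value.
    kept = {user: words for user, words in logs.items()
            if len(words) <= max_per_user}
    # Count how many times each surviving initial search word was searched.
    counts = Counter(word for words in kept.values() for word in words)
    # Filter 2: drop words searched fewer times than the second value.
    return {word for word, n in counts.items() if n >= min_searches}


logs = {
    "A": ["java", "java", "java", "php"],
    "B": ["spam"] * 5,            # exceeds the per-user cap used below
    "C": ["php", "php"],
}
available = extract_available_search_words(logs, max_per_user=4, min_searches=3)
```

With the small thresholds used in the usage lines, user B is discarded as a junk user, while "java" and "php" each reach three searches among the retained users and are kept.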
Next, in step S220, word segmentation is performed on each available search word to obtain its one or more feature words. According to one embodiment of the invention, the available search words are segmented with the Jieba word segmentation tool. For example, when the available search word is "java software development engineer", the segmentation result is the three feature words "java", "software development" and "engineer". It should be noted that the invention does not restrict the tool or algorithm used for word segmentation, as long as it segments accurately. Such alternatives will readily occur to those skilled in the art who understand the solution of the invention, fall within its scope of protection, and are not described further here.
After the feature words are obtained, step S230 is performed: the one or more feature words are converted respectively to generate corresponding keywords, and the one or more corresponding keywords are combined to form the keyword sequence corresponding to the available search word. According to one embodiment of the invention, the feature words are converted as follows: the feature words that are meaningless words or sensitive words are first rejected, and synonym conversion is then performed on the remaining feature words to generate the corresponding keywords. In this embodiment, empty or whitespace characters among the feature words, such as " ", "\t" and "\n", as well as search words consisting entirely of digits, can be removed directly as meaningless words. Synonym conversion then typically uses a dictionary to convert certain search words unconditionally into a synonym. For example, "development", "development engineer", "software engineer", "software development" and "software development engineer" are all converted into "engineer", so "engineer" is the corresponding keyword; "supervisor", "head", "director", "chief inspector" and "tl" are all converted into "leader", so "leader" is the corresponding keyword. Accordingly, the three feature words "java", "software development" and "engineer" generated in step S220 correspond in turn to the keywords "java", "engineer" and "engineer"; because the last two are identical, the keywords finally obtained are "java" and "engineer". The conversion of feature words is not limited to the above; the conversion rules can be adjusted as appropriate for the application scenario. Such adjustments will readily occur to those skilled in the art who understand the solution of the invention, fall within its scope of protection, and are not described further here.
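The conversion of feature words into keywords can be sketched in Python as below. The synonym dictionary holds only two entries taken from the example in the text, the sensitive-word set is left empty, and all names (`SYNONYMS`, `MEANINGLESS`, `to_keywords`) are hypothetical. A real deployment would load much larger dictionaries.

```python
# Illustrative dictionaries; only a fragment of what a real system would need.
SYNONYMS = {"software development": "engineer", "tl": "leader"}
MEANINGLESS = {"", " ", "\t", "\n"}


def to_keywords(feature_words, synonyms=SYNONYMS,
                meaningless=MEANINGLESS, sensitive=frozenset()):
    """Reject meaningless/sensitive feature words, then map synonyms."""
    keywords = []
    for word in feature_words:
        # All-digit words are treated as meaningless, as in the text.
        if word in meaningless or word in sensitive or word.isdigit():
            continue
        keyword = synonyms.get(word, word)   # unconditional synonym conversion
        if keyword not in keywords:          # identical keywords are merged
            keywords.append(keyword)
    return keywords


keywords = to_keywords(["java", "software development", "engineer"])
```

For the feature words of the running example, the sketch yields the two keywords "java" and "engineer", because "software development" and "engineer" collapse into the same keyword.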
After the feature words have been converted into the corresponding keywords, the one or more keywords are combined to form the keyword sequence corresponding to the available search word. According to one embodiment of the invention, the keyword sequence can be formed as follows. First, the one or more corresponding keywords are arranged in ascending text order; then each pair of adjacent keywords in the arranged result is connected with a first symbol to form the keyword sequence corresponding to the available search word. The first symbol is an underscore. In this embodiment, the two keywords "java" and "engineer" are arranged in ascending text order, with "java" first and "engineer" second (the ordering applies to the original Chinese keyword for "engineer", which sorts after the Latin-script "java"). Connecting the two with an underscore gives "java_engineer" as the keyword sequence corresponding to the available search word "java software development engineer".
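Building a keyword sequence amounts to sorting and joining, as in the following Python sketch. It assumes that "ascending text order" can be approximated by Python's default string comparison, which orders by Unicode code point; the source does not name the collation actually used. The function name is hypothetical, and 工程师 is assumed here to be the original Chinese keyword rendered as "engineer" in this translation.

```python
def build_keyword_sequence(keywords, joiner="_"):
    """Sort keywords in ascending text order and join them with the first symbol."""
    # sorted() compares strings by Unicode code point, so Latin-script
    # keywords come before CJK ones; set() guards against repeated keywords.
    return joiner.join(sorted(set(keywords)))


sequence = build_keyword_sequence(["工程师", "java"])
same_sequence = build_keyword_sequence(["java", "工程师"])
```

Sorting before joining is what makes the sequence independent of word order in the query, so differently phrased searches with the same keywords map to one keyword sequence.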
To reduce computational complexity further, some keyword sequences can also be filtered out according to their number of repetitions once all keyword sequences have been obtained. According to a further embodiment of the invention, after the keyword sequences corresponding to the available search words are formed, the number of times each keyword sequence is repeated is counted; if the number of times is less than the first value, the corresponding keyword sequence is rejected, and if it is not less than the first value, the corresponding keyword sequence is retained. Here the first value is preferably 20. In this embodiment, the keyword sequence "java_engineer" is repeated 39 times, which is not less than 20, so the keyword sequence is retained. It should be noted that rejecting keyword sequences is a step that is normally performed only in application scenarios that require it. Whether rejection is needed should be decided in light of the current scenario, and the first value likewise needs to be chosen according to actual conditions.
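The optional frequency filter on keyword sequences can be sketched in Python as follows, assuming the input is a flat list with one keyword sequence per available-search-word occurrence. The function and parameter names are hypothetical; the default threshold of 20 is the preferred value stated in the text.

```python
from collections import Counter


def filter_keyword_sequences(sequences, min_repeats=20):
    """Keep only keyword sequences repeated at least `min_repeats` times."""
    counts = Counter(sequences)
    # "Not less than" the threshold means the sequence is retained.
    return {seq: n for seq, n in counts.items() if n >= min_repeats}


sequences = ["java_engineer"] * 39 + ["java_leader"] * 5
retained = filter_keyword_sequences(sequences)
```

In the usage lines, "java_engineer" is repeated 39 times and is retained, while the hypothetical "java_leader" with 5 repetitions is rejected.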
In step S240, from the available search words corresponding to each keyword sequence, the available search word with the highest frequency of occurrence is selected as the predetermined search word of that keyword sequence. To illustrate step S240 and the subsequent processing steps, according to another embodiment of the present invention, step S210 yields the following available search words: "java webpage", "java programmer" and "java background script" for user A; "java programmer" and "java backstage" for user B; and "java engineer" and "java backstage" for user C. Steps S220 and S230 then yield the keyword sequence corresponding to each available search word, as shown in Table 1:
Table 1
The frequency of occurrence of the available search words corresponding to each keyword sequence in Table 1 is then counted, as shown in Table 2:
Table 2
According to the frequency of occurrence of each available search word in Table 2, the available search word with the highest frequency of occurrence is selected as the predetermined search word of its keyword sequence. For the keyword sequence "java_engineer", the available search word "java programmer" occurs 2 times, more often than the available search word "java engineer", so the predetermined search word is "java programmer". Table 3 shows an example of keyword sequences and their corresponding predetermined search words according to an embodiment of the present invention, as follows:
Table 3
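The selection in step S240 can be sketched as a per-sequence frequency count. The pairs below restate the example of users A, B and C; the function name `select_predetermined` is hypothetical, and the tie-breaking rule (first word seen wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter, defaultdict


def select_predetermined(pairs):
    """pairs: iterable of (available_search_word, keyword_sequence)."""
    by_sequence = defaultdict(Counter)
    for word, sequence in pairs:
        by_sequence[sequence][word] += 1
    # most_common(1) yields the most frequent word; on a tie the word that
    # was counted first is returned (an assumption, not stated in the text).
    return {seq: c.most_common(1)[0][0] for seq, c in by_sequence.items()}


pairs = [
    ("java webpage", "java_webpage"),                     # user A
    ("java programmer", "java_engineer"),                 # user A
    ("java background script", "java_backstage_script"),  # user A
    ("java programmer", "java_engineer"),                 # user B
    ("java backstage", "java_backstage"),                 # user B
    ("java engineer", "java_engineer"),                   # user C
    ("java backstage", "java_backstage"),                 # user C
]
mode_reduction_table = select_predetermined(pairs)
print(mode_reduction_table["java_engineer"])  # java programmer
```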
Here, the correspondence between available search words and keyword sequences in Table 2 forms a new relation table, referred to as the mapping table, and the correspondence between keyword sequences and predetermined search words in Table 3 forms another new relation table, referred to as the mode reduction table; both are kept for use in subsequent steps. The method then proceeds to step S250, in which each keyword sequence is input into a correlation calculation model for training, and a first quantity of keyword sequences related to the input keyword sequence are output in descending order of correlation. The first quantity is preferably 20. Of course, when the number of related keyword sequences is less than the first quantity, for example less than 20, all keyword sequences related to the input keyword sequence are output directly in descending order of correlation.
According to one embodiment of the present invention, the correlation calculation model is an item2vec model, and the keyword sequences obtained in step S240, namely "java_webpage", "java_engineer", "java_backstage" and "java_backstage_script", are each input into the item2vec model for training. The item2vec model differs from the word2vec model: word2vec treats a sentence as an ordered sequence of words and uses only the words within the context window as the context of a word, whereas item2vec discards the positional information of words in the sentence and treats the sentence as a set of words, so that for any word all other words in the sample are its context. In other words, the context window of item2vec can be regarded as infinite. It follows that if the context window of a word2vec model is set to a very large positive integer, that word2vec model can be trained as an item2vec model.
In this embodiment, the Gensim tool is used to call a word2vec model to train on the keyword sequences, with the following parameter settings: model vector dimension vecSize=200, number of training iterations itemNum=200, and context window window=1000000. Because the context window is set to 1000000, which exceeds the number of initial search words, the context of each initial search word is the whole document, which realizes the item2vec model. After the correlation calculation model has been trained on the keyword sequences, for each keyword sequence the correlation coefficients of the other keyword sequences associated with it are taken as the correlation; the correlation coefficient here ranges from 0 to 1. Table 4 shows an example of the correlation between keyword sequences according to an embodiment of the present invention, before any sorting has been applied, as follows:
Table 4
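The training setup can be sketched with Gensim as follows. This is a hedged illustration, not the embodiment's actual code: it assumes gensim 4.x keyword names (`vector_size`, `epochs`), assumes one training "sentence" per user, and adds `min_count=1` and skip-gram (`sg=1`) as choices the text does not specify. The helper names `item2vec_params` and `train_and_rank` are hypothetical.

```python
def item2vec_params(vec_size=200, iterations=200, window=1_000_000):
    """Map the embodiment's settings onto gensim 4.x Word2Vec keyword names."""
    return {
        "vector_size": vec_size,  # vecSize = 200
        "epochs": iterations,     # itemNum = 200 training passes
        "window": window,         # far larger than any session: all items are context
        "min_count": 1,           # assumption: keep every keyword sequence
        "sg": 1,                  # assumption: skip-gram
    }


def train_and_rank(sessions, query, top_n=20):
    """Train on per-user sessions and return up to `top_n` (sequence, similarity) pairs."""
    from gensim.models import Word2Vec  # third-party, imported only when called

    model = Word2Vec(sentences=sessions, **item2vec_params())
    return model.wv.most_similar(query, topn=top_n)


# One "sentence" per user (assumption), using the keyword sequences of users A, B and C.
sessions = [
    ["java_webpage", "java_engineer", "java_backstage_script"],
    ["java_engineer", "java_backstage"],
    ["java_engineer", "java_backstage"],
]
# Example call, requires gensim: train_and_rank(sessions, "java_engineer")
```

`most_similar` reports cosine similarity, which can in principle be negative; how the embodiment arrives at coefficients in the 0 to 1 range is not described in the text.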
As shown in Table 4, coef1, coef2, coef3 and coef4 denote the values of the corresponding correlations, which are 0.75, 0.35, 0.86 and 0.61 respectively. The output keyword sequences are sorted according to these values, giving the correlation-ranked keyword sequence relationships shown in Table 5:
Table 5
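The ordering and truncation can be sketched as below. Only the four coefficient values are taken from the text; which keyword sequence each coefficient belongs to is not recoverable here, so the coefficient names stand in as keys, and `rank_related` is a hypothetical helper.

```python
def rank_related(correlations, first_quantity=20):
    """Sort by correlation, largest first, and keep at most `first_quantity` items."""
    ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
    # If fewer than `first_quantity` items exist, slicing simply returns them all.
    return ranked[:first_quantity]


coefficients = {"coef1": 0.75, "coef2": 0.35, "coef3": 0.86, "coef4": 0.61}
print([name for name, _ in rank_related(coefficients)])
# ['coef3', 'coef1', 'coef4', 'coef2']
```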
Finally, the method proceeds to step S260, in which the first quantity of output keyword sequences are replaced with their corresponding predetermined search words, thereby forming a correspondence between each keyword sequence and a first quantity of predetermined search words. According to one embodiment of the present invention, with reference to the mode reduction table, the output keyword sequences "java_engineer", "java_backstage", "java_webpage" and "java_backstage_script" are replaced in turn with their predetermined search words, namely "java programmer", "java backstage", "java webpage" and "java background script" respectively. Table 6 shows an example of the correspondence between keyword sequences and predetermined search words according to an embodiment of the present invention, as follows:
Table 6
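The replacement step amounts to a table lookup that preserves the ranked order. In the sketch below the table contents restate the example; the function name `restore` is hypothetical, and silently skipping sequences that are missing from the table is an assumption.

```python
def restore(ranked_sequences, mode_reduction_table):
    """Replace each ranked keyword sequence with its predetermined search word."""
    # Sequences absent from the table are skipped (assumption; the text does
    # not say what happens in that case).
    return [mode_reduction_table[s] for s in ranked_sequences if s in mode_reduction_table]


mode_reduction_table = {
    "java_engineer": "java programmer",
    "java_backstage": "java backstage",
    "java_webpage": "java webpage",
    "java_backstage_script": "java background script",
}
ranked = ["java_engineer", "java_backstage", "java_webpage", "java_backstage_script"]
print(restore(ranked, mode_reduction_table))
```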
Once the correspondence between keyword sequences and predetermined search words has been constructed, it is usually stored in a database so that it can be queried at any time; through this correspondence, related words for a query search word typed by a user can be recommended quickly and accurately. According to still another embodiment of the present invention, when a query search word typed by a user is received, the query search word is first processed to form its corresponding keyword sequence; the first quantity of predetermined search words corresponding to that keyword sequence are then obtained, and the top second quantity of predetermined search words are selected from them, the second quantity being not more than the first quantity; finally, the second quantity of predetermined search words are recommended to the user as related words of the query search word. The second quantity is preferably 10. Of course, if the number of predetermined search words obtained is less than the second quantity, all of them are recommended to the user as related words of the query search word. Furthermore, the correspondences between keyword sequences and popular predetermined search words, such as "product manager" and "java engineer", are placed in a hot cache to speed up the service.
In this embodiment, the query search word typed by the user is "java website". To improve processing efficiency, the mapping table is first searched for an available search word identical to the query search word. If one exists, the keyword sequence corresponding to that available search word is obtained directly, without processing the query search word; if none exists, the keyword sequence corresponding to the query search word is formed according to steps S220 and S230. Here, "java website" clearly does not appear among the available search words in the mapping table, so it is processed, giving the keyword sequence "java_webpage". Next, the hot cache is searched for the keyword sequence "java_webpage". If it is present, the first quantity of predetermined search words corresponding to the keyword sequence are obtained directly and the top second quantity of them are recommended to the user as related words; if it is absent, the database is queried for the keyword sequence "java_webpage", and if it is found there, the first quantity of predetermined search words corresponding to it are obtained and the top second quantity of them are recommended to the user as related words. In this case the keyword sequence "java_webpage" is found in the hot cache, and because the number of predetermined search words is less than the second quantity, the related words finally recommended to the user are, in order, "java programmer" and "java backstage".
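The lookup order described above (mapping table, then hot cache, then database) can be sketched with plain dictionaries. Everything here is illustrative: `recommend` and `to_sequence` are hypothetical names, `to_sequence` stands in for steps S220 and S230, and the dictionaries stand in for whatever cache and database an implementation would actually use.

```python
def recommend(query, mapping_table, hot_cache, database, to_sequence, second_quantity=10):
    """Return up to `second_quantity` related words for a typed query search word."""
    # 1. Reuse a known keyword sequence if the query is already an available search word.
    sequence = mapping_table.get(query)
    if sequence is None:
        sequence = to_sequence(query)  # stand-in for steps S220 and S230
    # 2. Try the hot cache first, then fall back to the database.
    related = hot_cache.get(sequence)
    if related is None:
        related = database.get(sequence, [])
    # 3. Fewer than `second_quantity` results means all of them are returned.
    return related[:second_quantity]


mapping_table = {"java programmer": "java_engineer"}
hot_cache = {"java_webpage": ["java programmer", "java backstage"]}
database = {}
stub_processing = {"java website": "java_webpage"}.get  # hypothetical stand-in

print(recommend("java website", mapping_table, hot_cache, database, stub_processing))
# ['java programmer', 'java backstage']
```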
Fig. 3 shows a schematic diagram of a correlation-based search word processing apparatus 300 according to one embodiment of the present invention. As shown in Fig. 3, the correlation-based search word processing apparatus 300 includes an extraction module 310, a word segmentation module 320, a conversion module 330, a selection module 340, a training module 350 and a replacement module 360.
The extraction module 310 is adapted to obtain the search log of each of a plurality of users and to extract available search words from the search log. The extraction module 310 is further adapted to: obtain initial search words from the search log and count their quantity; when the quantity is greater than a first numerical value, directly delete the initial search words of the user to whom that quantity corresponds; count the number of searches of each initial search word that has not been deleted; and filter out the initial search words whose number of searches is less than a second numerical value, taking the remaining initial search words as available search words. For details of how the extraction module 310 performs these operations, see step S210 of method 200, which is not repeated here.
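The two filters applied by the extraction module can be sketched as follows. This is a rough illustration under assumptions: the thresholds are hypothetical parameters (this passage gives no values), the search count of a word is taken to be its total number of occurrences across the remaining users, and `extract_available` is a hypothetical name.

```python
from collections import Counter


def extract_available(logs, first_value, second_value):
    """logs: dict mapping a user to that user's list of initial search words."""
    # Drop users whose number of initial search words exceeds `first_value`.
    kept = {user: words for user, words in logs.items() if len(words) <= first_value}
    # Count searches per word over the remaining users (assumption: global count).
    counts = Counter(word for words in kept.values() for word in words)
    # Drop words searched fewer than `second_value` times.
    return {
        user: [word for word in words if counts[word] >= second_value]
        for user, words in kept.items()
    }


# Hypothetical log: "bot" issues far too many searches and is removed entirely.
logs = {"A": ["java", "java", "python"], "B": ["java", "go"], "bot": ["x"] * 50}
print(extract_available(logs, first_value=10, second_value=2))
# {'A': ['java', 'java'], 'B': ['java']}
```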
The word segmentation module 320 is connected to the extraction module 310 and is adapted to perform word segmentation on each available search word to obtain its corresponding one or more feature words. For details of how the word segmentation module 320 performs these operations, see step S220 of method 200, which is not repeated here.
The conversion module 330 is connected to the word segmentation module 320 and is adapted to convert the one or more feature words respectively to generate corresponding keywords, and to combine the one or more corresponding keywords to form the keyword sequence corresponding to the available search word. The conversion module 330 is further adapted to reject, from the one or more feature words, those that are meaningless words or sensitive words, and to perform synonym conversion on the feature words remaining after the rejection to generate the corresponding keywords. The conversion module 330 is further adapted to arrange the one or more corresponding keywords in ascending text order and, in the arranged result, to join every two adjacent keywords with a first symbol to form the keyword sequence corresponding to the available search word. The first symbol is an underscore. For details of how the conversion module 330 performs these operations, see step S230 of method 200, which is not repeated here.
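The rejection and synonym-conversion part of the conversion module can be sketched with a set and a dictionary. The word lists below are hypothetical; the only mapping suggested by the text is that "programmer" and "engineer" end up as the same keyword, and `to_keywords` is a hypothetical name.

```python
def to_keywords(feature_words, rejected_words, synonyms):
    """Drop meaningless or sensitive feature words, then map synonyms to one keyword."""
    remaining = [word for word in feature_words if word not in rejected_words]
    # Words without a synonym entry are kept unchanged.
    return [synonyms.get(word, word) for word in remaining]


rejected_words = {"of", "the"}          # hypothetical meaningless words
synonyms = {"programmer": "engineer"}   # consistent with the "java_engineer" example

print(to_keywords(["java", "programmer"], rejected_words, synonyms))
# ['java', 'engineer']
```

Mapping synonyms onto one canonical keyword is what lets "java programmer" and "java engineer" share a keyword sequence.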
The selection module 340 is connected to the conversion module 330 and is adapted to select, from the available search words corresponding to each keyword sequence, the available search word with the highest frequency of occurrence as the predetermined search word of that keyword sequence. For details of how the selection module 340 performs these operations, see step S240 of method 200, which is not repeated here.
The training module 350 is connected to the conversion module 330 and is adapted to input each keyword sequence into a correlation calculation model for training, and to output, in descending order of correlation, a first quantity of keyword sequences related to the input keyword sequence. For details of how the training module 350 performs these operations, see step S250 of method 200, which is not repeated here.
The replacement module 360 is connected to the selection module 340 and the training module 350 respectively, and is adapted to replace the first quantity of output keyword sequences with their corresponding predetermined search words, thereby forming a correspondence between the keyword sequence and the first quantity of predetermined search words. For details of how the replacement module 360 performs these operations, see step S260 of method 200, which is not repeated here.
Fig. 4 shows a schematic diagram of a correlation-based search word processing apparatus 400 according to another embodiment of the present invention. As shown in Fig. 4, the extraction module 410, word segmentation module 420, conversion module 430, selection module 440, training module 450 and replacement module 460 of the correlation-based search word processing apparatus 400 correspond one-to-one to, and are identical to, the extraction module 310, word segmentation module 320, conversion module 330, selection module 340, training module 350 and replacement module 360 of the correlation-based search word processing apparatus 300 in Fig. 3; a processing module 470 is added.
The processing module 470 is connected to the conversion module 430 and is adapted to count the number of times each keyword sequence repeats; when that number is less than a first numerical value, to reject the corresponding keyword sequence; and when that number is not less than the first numerical value, to retain the corresponding keyword sequence. For details of how the processing module 470 performs these operations, see the processing in method 200, after step S230, in which keyword sequences are rejected or retained according to the number of times they repeat; it is not repeated here.
Fig. 5 shows a schematic diagram of a correlation-based search word processing apparatus 500 according to yet another embodiment of the present invention. As shown in Fig. 5, the extraction module 510, word segmentation module 520, conversion module 530, selection module 540, training module 550 and replacement module 560 of the correlation-based search word processing apparatus 500 correspond one-to-one to, and are identical to, the extraction module 310, word segmentation module 320, conversion module 330, selection module 340, training module 350 and replacement module 360 of the correlation-based search word processing apparatus 300 in Fig. 3; a recommendation module 580 is added.
The recommendation module 580 is connected to the replacement module 560 and is adapted, when a query search word typed by a user is received, to process the query search word to form its corresponding keyword sequence; to obtain the first quantity of predetermined search words corresponding to that keyword sequence and select the top second quantity of predetermined search words from them, the second quantity being not more than the first quantity; and to recommend the second quantity of predetermined search words to the user as related words of the query search word. For details of how the recommendation module 580 performs these operations, see the processing in method 200, after step S260, in which related words are recommended to the user when a query search word typed by the user is received; it is not repeated here.
The specific steps and embodiments of correlation-based search word processing have been disclosed in detail in the description based on Fig. 2 and are not repeated here.
Existing correlation-based search word processing methods assume that user query entries sharing the same subsequent word have a certain similarity; given enough user input data, related search words for these entries can be provided by collaborative filtering. However, when the volume of search data is small and user query entries are highly homogeneous, many entries may have no subsequent word, and if the successive search contents are unrelated to one another, the subsequent word is of no use, which is unfavorable for handling unpopular words.
According to the technical solution of correlation-based search word processing of the embodiments of the present invention, word segmentation is first performed on each available search word of a user to obtain the corresponding one or more feature words; each feature word is converted to generate a corresponding keyword; the keywords are combined to form the keyword sequence corresponding to the available search word; from the available search words corresponding to each keyword sequence, the one with the highest frequency of occurrence is selected as the predetermined search word of that keyword sequence; each keyword sequence is input into a correlation calculation model for training, and a first quantity of keyword sequences related to the input keyword sequence are output in descending order of correlation; and the first quantity of output keyword sequences are replaced with their corresponding predetermined search words, forming a correspondence between the keyword sequence and the first quantity of predetermined search words. In this technical solution, the correlation calculation model considers only the distance between search words; when the window is set to infinity, irregular searching by users does not affect the correlation calculation, the model has a clear advantage in handling unpopular words, and there is no need to manually adjust the weights of popular words. In addition, after the keyword sequences corresponding to the available search words have been formed, the number of times each keyword sequence repeats is counted and the keyword sequences whose count is less than the first numerical value are rejected, so that subsequent processing need not be applied to all keyword sequences, which reduces computational complexity and time cost. Furthermore, the user's available search words are extracted from the search log in advance, and spam users and search data with low search counts are filtered out during extraction, which further improves processing speed while keeping the results efficient and accurate.
A7. The method of any one of A1-6, wherein, when a query search word typed by a user is received, the method further comprises:
processing the query search word to form a keyword sequence corresponding to the query search word;
obtaining the first quantity of predetermined search words corresponding to the keyword sequence, and selecting the top second quantity of predetermined search words from the first quantity of predetermined search words, the second quantity being not more than the first quantity;
recommending the second quantity of predetermined search words to the user as related words of the query search word.
B9. The apparatus of B8, wherein the extraction module is further adapted to:
obtain initial search words from the search log and count their quantity;
when the quantity is greater than a first numerical value, directly delete the initial search words of the user to whom the quantity corresponds;
count the number of searches of each initial search word that has not been deleted;
filter out the initial search words whose number of searches is less than a second numerical value, and take the remaining initial search words as available search words.
B10. The apparatus of B8 or 9, wherein the conversion module is further adapted to:
reject, from the one or more feature words, the feature words that are meaningless words or sensitive words;
perform synonym conversion on the feature words remaining after the rejection to generate the corresponding keywords.
B11. The apparatus of any one of B8-10, wherein the conversion module is further adapted to:
arrange the one or more corresponding keywords in ascending text order;
in the arranged result, join every two adjacent keywords with a first symbol to form the keyword sequence corresponding to the available search word.
B12. The apparatus of B11, wherein the first symbol is an underscore.
B13. The apparatus of any one of B8-12, further comprising a processing module adapted to:
count the number of times each keyword sequence repeats;
when the number is less than a first numerical value, reject the keyword sequence to which the number corresponds;
when the number is not less than the first numerical value, retain the keyword sequence to which the number corresponds.
B14. The apparatus of any one of B8-13, further comprising a recommendation module adapted to:
when a query search word typed by a user is received, process the query search word to form a keyword sequence corresponding to the query search word;
obtain the first quantity of predetermined search words corresponding to the keyword sequence, and select the top second quantity of predetermined search words from the first quantity of predetermined search words, the second quantity being not more than the first quantity;
recommend the second quantity of predetermined search words to the user as related words of the query search word.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art should understand that the modules, units or components of the device in the examples disclosed herein may be arranged in a device as described in the embodiment, or alternatively may be located in one or more devices different from the device in the example. The modules in the foregoing examples may be combined into one module or may furthermore be divided into a plurality of sub-modules.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. The modules, units or components of the embodiments may be combined into one module, unit or component, and they may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as a method, or a combination of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or method element forms a means for carrying out the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of the two. Thus, the method and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (that is, instructions) embodied in tangible media, such as floppy disks, CD-ROMs, hard disk drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
Where the program code is executed on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to perform the correlation-based search word processing method of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third" and so on to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. It should also be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the present invention, the disclosure made herein is illustrative and not restrictive, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. a kind of search word treatment method based on correlation, suitable for being performed in computing device, methods described includes:
The search daily record of each user in multiple users is obtained, available search word is extracted from the search daily record;
Word segmentation processing is carried out to each available search word, to obtain its corresponding one or more Feature Words;
One or more Feature Words are changed respectively to generate corresponding keyword, and combine one or more Multiple corresponding keywords, to form keyword sequence corresponding with the available search word;
From the available search word corresponding to each keyword sequence, selection frequency of occurrence highest available search word is used as the key The predetermined search word of word sequence;
Each keyword sequence is separately input to be trained in correlation calculations model, according to the order of correlation from big to small Export the first quantity keyword sequence related to the keyword sequence inputted;
The keyword sequence that first quantity is exported replaces with its corresponding predetermined search word, so as to form keyword sequence With the corresponding relation between the first quantity predetermined search word.
2. the method as described in claim 1, described to include from the search daily record the step of extraction available search word:
Initial search word is obtained from the search daily record and counts its quantity;
If the quantity is more than the first numerical value, the initial search word of the corresponding user of the quantity is directly deleted;
Count the searching times of all not deleted each initial search words;
The initial search word that searching times are less than second value is filtered out, remaining initial search word is regard as available search word.
3. method as claimed in claim 1 or 2, described to be changed one or more Feature Words respectively to generate The step of corresponding keyword, includes:
Reject the Feature Words for belonging to meaningless word or sensitive word in one or more Feature Words;
Remaining Feature Words carry out synonym conversion after rejecting, to generate corresponding keyword.
4. the method as any one of claim 1-3, the combination is one or more a corresponding keyword, with The step of forming keyword sequence corresponding with the available search word includes:
Text ascending order arrangement is carried out to one or more corresponding keyword;
It to the keyword after arrangement, will be attached, used with being formed with described with the first symbol between two neighboring keyword The corresponding keyword sequence of search term.
5. method as claimed in claim 4, wherein, first symbol is underscore.
6. The method of any one of claims 1-5, further comprising, after the step of forming the keyword sequence corresponding to the available search word:
Counting the number of times each keyword sequence is repeated;
If the number of times is less than a first numerical value, rejecting the keyword sequence corresponding to that number of times;
If the number of times is not less than the first numerical value, retaining the keyword sequence corresponding to that number of times.
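The frequency filter of claim 6 can be sketched as follows. The function name `filter_keyword_sequences` and the threshold used in the example are illustrative assumptions.

```python
from collections import Counter


def filter_keyword_sequences(sequences, min_count):
    """Keep keyword sequences that repeat at least `min_count` times."""
    counts = Counter(sequences)
    # Reject sequences below the threshold and retain the rest with their counts.
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Hypothetical input: one keyword sequence per available search word.
sequences = ["developer_java", "developer_java", "engineer_python", "developer_java"]
retained = filter_keyword_sequences(sequences, min_count=2)
```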
7. A correlation-based search word processing apparatus, adapted to reside in a computing device, the apparatus comprising:
An extraction module, adapted to obtain the search log of each user among a plurality of users and to extract available search words from the search log;
A word segmentation module, adapted to perform word segmentation on each available search word to obtain its corresponding one or more feature words;
A conversion module, adapted to convert the one or more feature words respectively to generate corresponding keywords, and to combine the one or more corresponding keywords to form a keyword sequence corresponding to the available search word;
A selection module, adapted to select, from the available search words corresponding to each keyword sequence, the available search word with the highest frequency of occurrence as the predetermined search word of that keyword sequence;
A training module, adapted to input each keyword sequence into a correlation calculation model for training, and to output, in descending order of correlation, a first quantity of keyword sequences related to the input keyword sequence;
A replacement module, adapted to replace each of the first quantity of output keyword sequences with its corresponding predetermined search word, so as to form a correspondence between the keyword sequence and the first quantity of predetermined search words.
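The selection module of claim 7 picks one representative search word per keyword sequence. A minimal Python sketch, assuming the input is a list of (keyword sequence, available search word) pairs with one pair per logged search, is shown below; the function name `select_predetermined_search_words` is hypothetical.

```python
from collections import Counter, defaultdict


def select_predetermined_search_words(pairs):
    """pairs: iterable of (keyword_sequence, available_search_word) tuples."""
    per_sequence = defaultdict(Counter)
    for sequence, search_word in pairs:
        # Count how often each search word produced this keyword sequence.
        per_sequence[sequence][search_word] += 1
    # The most frequent search word becomes the predetermined search word.
    return {
        sequence: counter.most_common(1)[0][0]
        for sequence, counter in per_sequence.items()
    }


# Hypothetical observations for illustration only.
pairs = [
    ("developer_java", "java developer"),
    ("developer_java", "developer java"),
    ("developer_java", "java developer"),
    ("engineer_python", "python engineer"),
]
predetermined = select_predetermined_search_words(pairs)
```

How ties between equally frequent search words are resolved is not stated in the claim; this sketch simply keeps whichever `Counter.most_common` returns first.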
8. A computing device, comprising the correlation-based search word processing apparatus of claim 7.
9. A computing device, comprising:
One or more processors;
A memory; and
One or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods according to claims 1 to 6.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any one of the methods according to claims 1 to 6.
CN201710515009.XA 2017-06-29 2017-06-29 A kind of search word treatment method based on correlation, device and calculate equipment Active CN107220384B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710515009.XA CN107220384B (en) 2017-06-29 2017-06-29 A kind of search word treatment method based on correlation, device and calculate equipment
CN201911033168.1A CN110795628B (en) 2017-06-29 2017-06-29 Search term processing method and device based on correlation and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710515009.XA CN107220384B (en) 2017-06-29 2017-06-29 A kind of search word treatment method based on correlation, device and calculate equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201911033168.1A Division CN110795628B (en) 2017-06-29 2017-06-29 Search term processing method and device based on correlation and computing equipment

Publications (2)

Publication Number Publication Date
CN107220384A true CN107220384A (en) 2017-09-29
CN107220384B CN107220384B (en) 2019-11-15

Family

ID=59950626

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710515009.XA Active CN107220384B (en) 2017-06-29 2017-06-29 A kind of search word treatment method based on correlation, device and calculate equipment
CN201911033168.1A Active CN110795628B (en) 2017-06-29 2017-06-29 Search term processing method and device based on correlation and computing equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201911033168.1A Active CN110795628B (en) 2017-06-29 2017-06-29 Search term processing method and device based on correlation and computing equipment

Country Status (1)

Country Link
CN (2) CN107220384B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609192A (en) * 2017-10-12 2018-01-19 北京京东尚科信息技术有限公司 The supplement searching method and device of a kind of search engine
CN107798091A (en) * 2017-10-23 2018-03-13 金蝶软件(中国)有限公司 The method and its relevant device that a kind of data crawl
CN110457339A (en) * 2018-05-02 2019-11-15 北京京东尚科信息技术有限公司 Data search method and device, electronic equipment, storage medium
CN110750682A (en) * 2018-07-06 2020-02-04 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN110795612A (en) * 2019-10-28 2020-02-14 北京字节跳动网络技术有限公司 Search word recommendation method and device, electronic equipment and computer-readable storage medium
CN112685361A (en) * 2020-12-24 2021-04-20 北京浪潮数据技术有限公司 Information query method and device and computer readable storage medium
CN112883295A (en) * 2019-11-29 2021-06-01 北京搜狗科技发展有限公司 Data processing method, device and medium
CN113239183A (en) * 2021-05-28 2021-08-10 北京达佳互联信息技术有限公司 Training method and device of ranking model, electronic equipment and storage medium
CN116340469A (en) * 2023-05-29 2023-06-27 之江实验室 Synonym mining method and device, storage medium and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328752B (en) * 2021-01-04 2021-06-15 平安科技(深圳)有限公司 Course recommendation method and device based on search content, computer equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104143005A (en) * 2014-08-04 2014-11-12 五八同城信息技术有限公司 Related searching system and method
CN104199822A (en) * 2014-07-11 2014-12-10 五八同城信息技术有限公司 Method and system for identifying demand classification corresponding to searching
CN104239321A (en) * 2013-06-14 2014-12-24 高德软件有限公司 Data processing method and device for search engine
CN105335391A (en) * 2014-07-09 2016-02-17 阿里巴巴集团控股有限公司 Processing method and device of search request on the basis of search engine

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136213B (en) * 2011-11-23 2017-04-12 阿里巴巴集团控股有限公司 Method and device for providing related words
US8700621B1 (en) * 2012-03-20 2014-04-15 Google Inc. Generating query suggestions from user generated content
CN104598583B (en) * 2015-01-14 2018-01-09 百度在线网络技术(北京)有限公司 The generation method and device of query statement recommendation list

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239321A (en) * 2013-06-14 2014-12-24 高德软件有限公司 Data processing method and device for search engine
CN105335391A (en) * 2014-07-09 2016-02-17 阿里巴巴集团控股有限公司 Processing method and device of search request on the basis of search engine
CN104199822A (en) * 2014-07-11 2014-12-10 五八同城信息技术有限公司 Method and system for identifying demand classification corresponding to searching
CN104143005A (en) * 2014-08-04 2014-11-12 五八同城信息技术有限公司 Related searching system and method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609192A (en) * 2017-10-12 2018-01-19 北京京东尚科信息技术有限公司 The supplement searching method and device of a kind of search engine
CN107798091B (en) * 2017-10-23 2021-05-18 金蝶软件(中国)有限公司 Data crawling method and related equipment thereof
CN107798091A (en) * 2017-10-23 2018-03-13 金蝶软件(中国)有限公司 The method and its relevant device that a kind of data crawl
CN110457339A (en) * 2018-05-02 2019-11-15 北京京东尚科信息技术有限公司 Data search method and device, electronic equipment, storage medium
CN110750682A (en) * 2018-07-06 2020-02-04 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN110795612A (en) * 2019-10-28 2020-02-14 北京字节跳动网络技术有限公司 Search word recommendation method and device, electronic equipment and computer-readable storage medium
CN112883295A (en) * 2019-11-29 2021-06-01 北京搜狗科技发展有限公司 Data processing method, device and medium
CN112883295B (en) * 2019-11-29 2024-02-23 北京搜狗科技发展有限公司 Data processing method, device and medium
CN112685361A (en) * 2020-12-24 2021-04-20 北京浪潮数据技术有限公司 Information query method and device and computer readable storage medium
CN112685361B (en) * 2020-12-24 2024-09-10 北京浪潮数据技术有限公司 Information query method, device and computer readable storage medium
CN113239183A (en) * 2021-05-28 2021-08-10 北京达佳互联信息技术有限公司 Training method and device of ranking model, electronic equipment and storage medium
CN116340469A (en) * 2023-05-29 2023-06-27 之江实验室 Synonym mining method and device, storage medium and electronic equipment
CN116340469B (en) * 2023-05-29 2023-08-11 之江实验室 Synonym mining method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110795628B (en) 2023-04-11
CN107220384B (en) 2019-11-15
CN110795628A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
CN107220384B (en) A kind of search word treatment method based on correlation, device and calculate equipment
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
CN111310438B (en) Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN106202010B (en) Method and apparatus based on deep neural network building Law Text syntax tree
CN111190997B (en) Question-answering system implementation method using neural network and machine learning ordering algorithm
CN113672708B (en) Language model training method, question-answer pair generation method, device and equipment
CN112100326B (en) Anti-interference question and answer method and system integrating retrieval and machine reading understanding
CN111310439B (en) Intelligent semantic matching method and device based on depth feature dimension changing mechanism
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN107168954A (en) Text key word generation method and device and electronic equipment and readable storage medium storing program for executing
CN116244418B (en) Question answering method, device, electronic equipment and computer readable storage medium
CN105468719B (en) A kind of inquiry error correction method, device and calculate equipment
CN107977347A (en) A kind of topic De-weight method and computing device
CN107341233A (en) A kind of position recommends method and computing device
CN111898369A (en) Article title generation method, model training method and device and electronic equipment
WO2023109436A1 (en) Part of speech perception-based nested named entity recognition method and system, device and storage medium
CN112287656B (en) Text comparison method, device, equipment and storage medium
CN107688609A (en) A kind of position label recommendation method and computing device
CN117421393B (en) Generating type retrieval method and system for patent
CN117744652A (en) Domain feature word mining method and device based on large language model
WO2023240839A1 (en) Machine translation method and apparatus, and computer device and storage medium
US20130339003A1 (en) Assisted Free Form Decision Definition Using Rules Vocabulary
CN108491423A (en) A kind of sort method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant