CN104166651A - Data searching method and device based on integration of data objects in same classes - Google Patents

Publication number
CN104166651A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310182427.3A
Other languages
Chinese (zh)
Other versions
CN104166651B (en)
Inventor
郎皓
欧海峰
张丙奇
孙健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201711064889.XA (CN107844565B)
Priority to CN201310182427.3A (CN104166651B)
Publication of CN104166651A
Application granted
Publication of CN104166651B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data searching method and device based on the integration of same-class data objects. The data searching method comprises the following steps: a search request from a user is received, and one or more data objects matching the search request are searched for among the data objects to be searched; each data object found is analyzed to obtain its data label; the obtained data labels are matched; and the one or more data objects whose data labels match are integrated into a same-class data object set, which is returned to the user as the search result. With this method and device, massive data objects are classified and integrated to obtain same-class data objects, and only one of the multiple same-class data objects returned is displayed on the search results page, so that the accuracy and recall of data search are improved and the diversity of the search results is increased.

Description

Data search method and device based on integration of same-class data objects
Technical field
The present application relates to the field of data search, and in particular to a data search method and device based on the integration of same-class data objects.
Background
With the arrival of the cloud era, big data has attracted increasing attention. Big data technology is less about holding massive amounts of data or data objects than about collecting, processing and organizing them, within a reasonable time, into the data that users need. A large amount of data exists on the network, and making full use of it can bring great convenience to users' lives. A user can run a data search with a search engine to obtain the data he or she expects. Taking data search as an example, a search engine crawls web pages on the Internet in advance and can provide a retrieval service only after preprocessing the crawled pages. The most important preprocessing step is extracting keywords from the pages; other steps include removing duplicate pages, word segmentation, determining the page type, analyzing hyperlinks, and calculating the importance or richness of each page.
When a data search is performed, the search engine retrieves the entries most relevant to the keyword entered by the user. In this process, however, the number of search results matching the keyword is enormous and the results span every field of social life, which lowers the quality of the search results: they are inconvenient for the user and their accuracy is poor.
If information integration is used, the search engine can select, analyze and classify the content of the massive data objects it has crawled, which narrows the scope of the data search and makes the search results more targeted. However, ambiguity in the data (for example, the same keyword corresponding to different fields) lowers the accuracy of the search results, and alternative written forms of a keyword (for example, the Chinese term for Ethernet, 以太网, and its variant 乙太网) make the returned search results incomplete.
For example, when a data search is run on the keyword "Ethernet" (以太网), search results related to that form appear on the search results page. The variant form 乙太网 has the same meaning but is written differently, and because no association exists between the two keywords, search results related to the variant form do not appear on the search results page. Part of the search results therefore fail to be retrieved, which lowers the quality of the search results, for example their recall.
Moreover, because the search engine has selected, analyzed and classified the content of the massive data or data objects, several identical or similar data objects may be shown on the search results page when results are returned, which wastes result slots. For example, if each search results page can show only 20 search results and 10 of those 20 are identical or similar data objects, the user has to click through to the next page repeatedly to see different data objects.
Summary of the invention
The main purpose of the present application is to provide a data search method and device based on the integration of same-class data objects, so as to solve the problem that search results are of low quality when a prior-art search engine is used for data search, because the data volume is too large and no association exists between data objects.
To solve the above technical problem, the purpose of the present application is achieved through the following technical solutions:
The present application provides a data search method based on the integration of same-class data objects, comprising the following steps: receiving a search request from a user, and searching all data objects to be searched for one or more data objects matching the search request; analyzing each of the one or more data objects found, to obtain the data label of each data object; matching the obtained data labels; and integrating the one or more data objects whose data labels match into a same-class data object combination, which is returned to the user as the search result.
Preferably, in the method according to the present application, the data label comprises a first data label and a second data label, which respectively identify different attribute features of a data object.
Preferably, the method according to the present application may further comprise: pre-integrating all data objects to be searched, to determine the one or more same-class data objects corresponding to each data object to be searched and thereby obtain a data object mapping table.
Preferably, in the method according to the present application, pre-integrating all data objects to be searched comprises: mining the second data labels in each data object and the second data label category distribution table; performing second data label mining on the second data labels in each data object, to generate the set of second data label synonyms for all data objects; performing first data label mining on the first data labels in each data object, to generate the set of first data label synonyms for all data objects; and mining the first and second data labels in each data object, to generate the mapping from first data labels to second data labels.
Preferably, in the method according to the present application, the second data label synonyms are the different second data labels carried by multiple data objects that are in the same category and share the same first data label; and the first data label synonyms are multiple similar first data labels in the same data object.
Preferably, in the method according to the present application, mining the first and second data labels in each data object to generate the mapping from first data labels to second data labels comprises: if a data object has only one first data label, and that first data label co-occurs with only one unique second data label, establishing a mapping between that first data label and that second data label.
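The mapping rule in the preceding paragraph can be illustrated with a minimal Python sketch. The representation of a data object as a dict with `first_labels` and `second_label` keys, and the function name `build_first_to_second_mapping`, are assumptions made for illustration and do not come from the application. The sketch also assumes that co-occurrence is counted over all data objects, which is one possible reading of the rule.

```python
from collections import defaultdict


def build_first_to_second_mapping(objects):
    """Map a first label to a second label only when the pairing is unambiguous.

    `objects` is a list of dicts with hypothetical keys:
      "first_labels": list of first data labels of the object
      "second_label": the object's second data label, or None if missing
    """
    cooccurring = defaultdict(set)  # first label -> second labels seen with it
    candidates = set()              # first labels that are the only one in some object
    for obj in objects:
        firsts = obj.get("first_labels", [])
        second = obj.get("second_label")
        if second is None:
            continue  # nothing to co-occur with
        for first in firsts:
            cooccurring[first].add(second)
        if len(firsts) == 1:
            candidates.add(firsts[0])
    # Keep only candidates that co-occur with exactly one distinct second label.
    return {
        first: next(iter(cooccurring[first]))
        for first in candidates
        if len(cooccurring[first]) == 1
    }
```

A first label that appears with two different second labels is left unmapped in this sketch, so that a later completion step never guesses between them.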
Preferably, in the method according to the present application, pre-integrating all data objects to be searched comprises: extracting one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguating the extracted candidate second data labels; extracting, based on configured rules, the first data labels in multiple data objects, and normalizing the extracted first data labels; normalizing second data labels, or first data labels, that are synonyms of each other; and, according to the constructed mapping between first and second data labels, completing the second data label for data objects that lack one.
Preferably, in the method according to the present application, disambiguating the extracted candidate second data labels comprises: obtaining, from the second data label category distribution table, the number of times a candidate second data label occurs in the category, and, if that number exceeds a preset threshold, taking the candidate as the second data label of the data object; and/or, if multiple candidate second data labels appear in one data object, selecting the candidate with the most occurrences in the second data label category distribution table as the second data label of the data object.
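The two disambiguation rules above can be sketched as follows. This is a hedged illustration only: the function name `disambiguate_second_label`, the `category_counts` dict and the default threshold are hypothetical, and applying both rules in sequence is an assumption, since the text joins them with "and/or".

```python
def disambiguate_second_label(candidates, category_counts, threshold=0):
    """Choose one second data label for a data object, or None.

    candidates: candidate second labels extracted from the object
    category_counts: label -> occurrence count within the object's category
                     (one row of a category distribution table)
    threshold: a candidate must occur more often than this to be accepted
    """
    # Rule 1: drop candidates that do not exceed the preset threshold.
    kept = [c for c in candidates if category_counts.get(c, 0) > threshold]
    if not kept:
        return None
    # Rule 2: among the remaining candidates, take the most frequent one.
    return max(kept, key=lambda c: category_counts.get(c, 0))
```

Returning `None` when no candidate passes the threshold leaves the object without a second label, which is the case the label-completion step is meant to handle.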
Preferably, the method according to the present application may comprise: showing, on the search results page, one of the multiple data objects in the same-class data combination, wherein the same-class data combination comprises multiple data objects that are same-class data objects of one another.
Preferably, in the method according to the present application, the same-class data objects may comprise: multiple data objects that are in the same category, have identical or synonymous second data labels, and have identical or synonymous first data labels.
The present application also provides a data search device based on the integration of same-class data objects, comprising: a receiving and searching module, configured to receive a search request from a user and to search all data objects to be searched for one or more data objects matching the search request; an acquisition module, configured to analyze each of the one or more data objects found, to obtain the data label of each data object; a matching module, configured to match the obtained data labels; and an integrating and returning module, configured to integrate the one or more data objects whose data labels match into a same-class data object combination and to return it to the user as the search result.
Preferably, in the device according to the present application, the data label comprises a first data label and a second data label, which respectively identify different attribute features of a data object.
Preferably, the device according to the present application may further comprise: a preprocessing module, configured to pre-integrate all data objects to be searched, to determine the one or more same-class data objects corresponding to each data object to be searched and thereby obtain a data object mapping table.
Preferably, in the device according to the present application, the preprocessing module is further configured to: mine the second data labels in each data object and the second data label category distribution table; perform second data label mining on the second data labels in each data object, to generate the set of second data label synonyms for all data objects; perform first data label mining on the first data labels in each data object, to generate the set of first data label synonyms for all data objects; mine the first and second data labels in each data object, to generate the mapping from first data labels to second data labels; and, if a data object has only one first data label and that first data label co-occurs with only one unique second data label, establish a mapping between that first data label and that second data label.
Preferably, in the device according to the present application, the second data label synonyms are the different second data labels carried by multiple data objects that are in the same category and share the same first data label; and the first data label synonyms are multiple similar first data labels in the same data object.
Preferably, in the device according to the present application, the preprocessing module is further configured to: extract one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguate the extracted candidate second data labels; extract, based on configured rules, the first data labels in multiple data objects, and normalize the extracted first data labels; normalize second data labels, or first data labels, that are synonyms of each other; according to the constructed mapping between first and second data labels, complete the second data label for data objects that lack one; obtain, from the second data label category distribution table, the number of times a candidate second data label occurs in the category, and, if that number exceeds a preset threshold, take the candidate as the second data label of the data object; and/or, if multiple candidate second data labels appear in one data object, select the candidate with the most occurrences in the second data label category distribution table as the second data label of the data object.
Preferably, in the device according to the present application, the integrating and returning module is further configured to: show, on the search results page, one of the multiple data objects in the same-class data combination, wherein the same-class data combination comprises multiple data objects that are same-class data objects of one another, and the same-class data objects comprise multiple data objects that are in the same category, have identical or synonymous second data labels, and have identical or synonymous first data labels.
Compared with the prior art, the technical solution of the present application has the following beneficial effects:
The present application uses important labels or attributes of data objects, such as the first data label and the second data label, to classify and integrate massive data objects in advance and to establish associations between same-class data objects. This improves the accuracy and recall of data search and thereby raises the quality of the search results.
The present application integrates the multiple same-class data objects returned by the search engine and shows only one of them on the search results page. The search results page can therefore show a wider variety of data objects, which increases the diversity of the search results and improves the user experience.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their description are used to explain the present application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the data search method based on the integration of same-class data objects according to an embodiment of the present application;
Fig. 2 is a flowchart of the steps of pre-integrating same-class data objects according to an embodiment of the present application;
Fig. 3 is a flowchart of the offline data mining performed on all data objects to be searched according to an embodiment of the present application;
Fig. 4 is a flowchart of the corresponding normalization and mapping processing performed on all data objects to be searched according to an embodiment of the present application; and
Fig. 5 is a structural diagram of the data search device based on the integration of same-class data objects according to an embodiment of the present application.
Embodiments
The main idea of the present application is to use the data labels (attributes) contained in the searched data objects to distinguish same-class data objects from data objects of different classes, and to use the data labels contained in data objects (for example, the first data label and the second data label) to integrate the massive data objects in the database in advance. For example, mappings are established between second data labels that have the same meaning but different written forms, such as "Ethernet" (以太网) and its variant 乙太网; between different first data labels contained in the same data object; and between different data objects that have the same first data label. Based on these mappings between data objects and data labels, multiple data objects that are same-class data objects of one another are obtained. Then, during a data search that builds on this pre-integration, for a user's search request, such as a keyword (Key) request, the data objects in the database that match the keyword are obtained together with the same-class data objects of each matching data object, which improves the accuracy and recall of the data search. The multiple same-class data objects can also be integrated so that only one of them is shown on the search results page, which lets the page show a wider variety of data objects and increases the diversity of the search results.
To make the purpose, technical solution and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and specific embodiments.
According to an embodiment of the present application, a data search method based on the integration of same-class data objects is provided. Refer to Fig. 1, a flowchart of the method according to an embodiment of the present application.
At step S102, a search request from a user is received and the search is executed. The search request is used to find, among all data objects to be searched, the one or more data objects that match it.
The search request may contain a keyword, a network link or the like. The search is executed according to the search request to find the data objects that match the keyword, or the one or more data objects that the network link points to. By sending the search request, the user can obtain, from a large number of data objects, the data objects that match the keyword or the content represented by the network link; one or more data objects may match.
The one or more data objects may be stored in a database in the form of data files. Each of the data objects contains various data labels, such as a first data label and a second data label. The terms "first data label" and "second data label" denote two entirely different features or attributes; the wording serves only to distinguish the two features and is not a definition. The data files to be searched that are stored in the database need a corresponding data structure to organize and integrate them, so as to guarantee complete, high-quality and efficient search; this is described below in the processing that integrates same-class data objects.
At step S104, each of the one or more data objects found is analyzed to obtain its data label. The data label comprises a first data label and a second data label, which respectively identify different attribute features of a data object.
In other words, each of the one or more data objects found can be analyzed to obtain its first data label and/or second data label.
Each data object contains multiple data labels, such as the data object's title, storage location and index number. Each data object may contain a first data label and a second data label, which characterize different aspects of the data object. Together, the first data label and the second data label can determine a data object, so in the embodiments of the present application they serve as the basis for integrating same-class data. For example, in an employee information table, with the employee ID (12345) as the first data label and the name (Zhang San) as the second data label, the employee ID (12345) and the name (Zhang San) together determine the employee Zhang San in the table.
Among massive data objects, the first and second data labels, which together determine the characteristics of a data object, can be used to determine the same-class data objects of that data object. As a further example, a commodity can be treated as a data object, with its item number as the first data label and its brand term as the second data label. By analyzing a commodity that has been found, its item number and brand term are obtained; these can then be matched against the massive set of commodities to obtain the same-class commodities of that commodity. For example, the item number "1111" and the brand term "Nike" may determine the commodity "shell-toe sneakers", so matching the item number "1111" and the brand term "Nike" against the massive set of commodities yields multiple commodities that are the same model as "shell-toe sneakers". For the step of matching the obtained first and/or second data labels against the massive data, see step S106.
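The commodity example can be sketched in Python as a grouping by label pair. The field names `id`, `brand` and `item_number` and the function name `group_same_class` are hypothetical, chosen only for illustration. The sketch matches labels exactly and leaves out the category check and the synonym handling that the application describes elsewhere.

```python
from collections import defaultdict


def group_same_class(items):
    """Group commodity ids by (brand term, item number).

    Each item is a dict with hypothetical keys "id", "brand" (second data
    label) and "item_number" (first data label). Items sharing both labels
    are treated as same-class data objects.
    """
    groups = defaultdict(list)
    for item in items:
        key = (item["brand"], item["item_number"])
        groups[key].append(item["id"])
    return dict(groups)
```

With such a grouping, looking up the key `("Nike", "1111")` returns every commodity that is the same model as the one the user found.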
At step S106, the obtained data labels, such as the first data label and/or the second data label, are matched to obtain the one or more same-class data objects that match each data object.
In this way, more data objects matching the search request can be obtained, which makes the data search more comprehensive, raises the quality of the search results and gives the user a more convenient data search service.
At step S108, each data object and its corresponding one or more same-class data objects are integrated (aggregated) into one same-class data combination, which is returned to the user.
In other words, the one or more data objects whose data labels (the first data label and/or the second data label) match can be integrated into a same-class data object combination and returned to the user as the search result.
The same-class data combination contains multiple data objects that are same-class data objects of one another, and through it the user can view each of the same-class data objects it contains.
In one embodiment, only one of the multiple data objects in a same-class data combination is shown on the search results page and the other same-class data objects are hidden. When the hidden same-class data objects need to be shown, an operation for showing them can be triggered, for example by clicking a button.
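A minimal sketch of this display rule follows, assuming the search has already produced a ranked list of object ids and a lookup from id to same-class group. The names `collapse_results`, `ranked_ids` and `group_of` are hypothetical, and picking the highest-ranked member as the visible representative is an assumption; the application only says that one member is shown.

```python
def collapse_results(ranked_ids, group_of):
    """Collapse ranked results so each same-class group takes one slot.

    ranked_ids: object ids in ranking order
    group_of: object id -> group key; ids missing from it stand alone
    Returns a list of (shown_id, hidden_ids) in ranking order.
    """
    order = []     # group keys in order of first appearance
    members = {}   # group key -> ids in ranking order
    for obj_id in ranked_ids:
        # An ungrouped object gets a key that cannot clash with a real group.
        key = group_of.get(obj_id, ("ungrouped", obj_id))
        if key not in members:
            members[key] = []
            order.append(key)
        members[key].append(obj_id)
    # The first member of each group is shown; the rest stay hidden
    # until the user expands the group.
    return [(members[key][0], members[key][1:]) for key in order]
```

The hidden ids travel with each entry, so a page can reveal them when the user triggers the expand operation.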
Further, the search result returns to the user the data objects that match the search request together with the one or more same-class data objects corresponding to each matching data object. Returning the corresponding same-class data objects along with the matching data objects improves the accuracy and recall of the data search. Using the concept of a same-class data combination to aggregate data objects of the same class also lets the search results page show a wider variety of data objects, which increases the diversity of the search results.
Matching further same-class data objects according to data labels, such as the first and/or second data label, relies on the data structure produced by integrating same-class data objects. The integration process is described in detail below.
Fig. 2 shows the flowchart of the pre-integration processing of same-class data objects according to an embodiment of the present application.
Step S202: all data objects to be searched are pre-integrated to determine the one or more same-class data objects corresponding to each data object to be searched, which yields a data object mapping table.
The purpose of this pre-integration step is to obtain the one or more same-class data objects corresponding to each data object. Same-class data objects are multiple data objects that are in the same category, have identical or synonymous second data labels, and have identical or synonymous first data labels.
The pre-integration step first performs offline data mining on the massive data objects. Based on the mining results, the second and first data labels are then extracted online, with the corresponding normalization and mapping processing. This finally yields the mappings among data objects, first data labels and second data labels, so that data objects of the same class within the same category are integrated together and the associations among the integrated same-class data objects are obtained.
First, offline data mining is performed on the massive data objects (on the order of several hundred million), for example on all data objects to be searched. In a preferred manner, shown in Fig. 3, the following are mined: the second data label table, the second data label synonym table, the second data label category distribution table, the first data label synonym table, and the mapping table from first data labels to second data labels.
Step S302: the second data labels in each data object and the second data label category distribution table are mined.
The second data labels in each data object can be extracted to generate the set of second data labels of all data objects, for example as a second data label table. Based on this set, all data objects in the database are divided into categories to obtain the second data label category distribution of all data objects, for example as a second data label category distribution table. Examples of categories are geography and daily life in a paper database, and clothing and clocks and watches in a commodity database.
Preferably, for each category, all the different second data labels in the data files of the data objects in that category are collected; all second data labels of all data objects form the second data label set, and the number of times (or frequency) each second data label occurs is counted. All second data labels of all data objects form the second data label table. From the number of times (or frequency) the second data labels of each data object occur in the different categories, the second data label category distribution table is formed. This table contains multiple categories, the multiple second data labels in each category, and the count (or frequency) of each second data label.
For example, for the data object "city" in the geography category, three different second data labels, "school bus", "subway" and "car", are extracted from its file. All three appear in the file of the object "city" in the same geography category, so all three are put into the second data label set. In this way, all second data labels of all data objects, in the same category and in different categories, are extracted to form the second data label set, which can be stored as a table. The number of times (or frequency) with which the different second data labels "school bus", "subway" and "car" each occur in the object "city" of the geography category must also be counted, to form the category distribution of the second data labels. Here "school bus" occurs 15 times, "subway" 10 times and "car" 20 times, and the labels are sorted in descending order of count. In another category, such as the file of the data object "large household goods" in the "daily life" category, "school bus" occurs 0 times, "subway" 0 times and "car" 20 times. Thus "school bus" and "subway" belong to the "geography" category, while "car" can correspond to both the "geography" and "daily life" categories. The category, the second data labels in that category, and the counted number of occurrences (or frequency) of each second data label are stored in the second data label category distribution table.
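The counting in this example can be sketched with Python's `collections.Counter`. The dict layout with `category` and `second_labels` keys and the function names are assumptions for illustration; the application does not specify a storage format for the category distribution table.

```python
from collections import Counter, defaultdict


def build_category_distribution(objects):
    """Build a table: category -> Counter of second data label occurrences.

    Each object is a dict with hypothetical keys "category" and
    "second_labels" (every occurrence of a second label found in its file).
    """
    table = defaultdict(Counter)
    for obj in objects:
        table[obj["category"]].update(obj["second_labels"])
    return table


def categories_of(table, label):
    """Return, in sorted order, the categories in which a label occurs."""
    return sorted(cat for cat, counts in table.items() if counts[label] > 0)
```

Calling `table[category].most_common()` then lists the labels of a category in descending order of count, which corresponds to the sorting step described above.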
The number of times or frequency of a second data label can be counted from the second data label information that occurs in the data files containing the data objects, for example the second data labels contained in a data object's attribute information or in the word segmentation result of its title.
Step S304: second data label mining can be performed on the second data labels in each data object to generate the set of second data label synonyms of all data objects, for example in the form of a second data label synonym table.
Under the same class, synonym mining is performed on the second data labels in each data object. For example, the second data labels and first data labels of all data objects under the same class are extracted. Different second data labels contained in two data objects that have the same first data label are treated as one co-occurrence of a second data label synonym pair. For example, data object M1 has label a (second data label) and code B (first data label), and data object M2 has label A (second data label) and code B (first data label). Label A and label a can then be regarded as synonyms, and data objects M1 and M2 constitute one co-occurrence of that synonym pair. Afterwards, based on the second data label classification distribution table, the co-occurrence count of every second data label synonym pair can be tallied. The pairs are sorted from high to low by co-occurrence count (or frequency), and pairs with high counts can be given priority when the second data label synonym table is generated and saved.
This mining associates multiple data objects that, under the same class, have different second data labels but the same first data label, and thereby forms a set of second data label synonyms. In other words, if multiple data objects share the same first data label but have different second data labels, those different second data labels can be called second data label synonyms and can be saved in the form of a second data label synonym table.
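A minimal sketch of this synonym-pair mining is given below. It assumes each data object has already been reduced to a (class, first data label, second data label) triple; that shape and the function name mine_second_label_synonyms are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def mine_second_label_synonyms(objects):
    """Count co-occurrences of second data label synonym pairs.

    `objects` is an iterable of (class_name, first_label, second_label)
    triples (an assumed shape). Two objects of the same class that share a
    first data label but carry different second data labels contribute one
    co-occurrence of that pair.
    """
    by_key = defaultdict(list)
    for class_name, first_label, second_label in objects:
        by_key[(class_name, first_label)].append(second_label)

    pair_counts = Counter()
    for labels in by_key.values():
        for left, right in combinations(labels, 2):
            if left != right:
                pair_counts[frozenset((left, right))] += 1
    return pair_counts

# Data objects M1 and M2 from the example: same code B, labels a and A.
pairs = mine_second_label_synonyms([
    ("shoes", "B", "a"),   # M1
    ("shoes", "B", "A"),   # M2
])
ranked_pairs = pairs.most_common()  # highest co-occurrence count first
```

Sorting by count, as in ranked_pairs, mirrors the step of giving high-count pairs priority when the synonym table is built.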
Step S306: first data label mining can be performed on the first data labels in each data object to generate the first data label synonym set of all data objects, for example as a first data label synonym table.
For example, the multiple first data labels that appear in a single data object (for example in its data file), such as those contained in the title information of the file, can be extracted. If two of these first data labels have the same length and the same prefix, they are regarded as a first data label synonym pair. Finally, all first data label synonym pairs are aggregated into first data label synonym clusters (synonym sets), which can be saved as a first data label synonym table. In this way, multiple similar first data labels in the same data object form a set of first data label synonyms.
More specifically, one data object can contain multiple first data labels. For example, a data object A contains the first data labels "1110" and "1111". These two labels have the same length and the same prefix, so "1110" and "1111" become first data label synonyms. A first data label synonym table can be built in this way.
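The "same length and same prefix" rule can be illustrated with the sketch below. The text does not state how long the shared prefix must be, so the prefix_len parameter (3 here) is purely an assumption, as is the function name.

```python
from collections import defaultdict

def mine_first_label_synonyms(first_labels, prefix_len=3):
    """Cluster the first data labels of one data object into synonym sets.

    Labels with the same length and the same leading `prefix_len` characters
    fall into one cluster. `prefix_len` is an assumed parameter; labels no
    longer than the prefix are skipped.
    """
    clusters = defaultdict(set)
    for label in first_labels:
        if len(label) > prefix_len:
            clusters[(len(label), label[:prefix_len])].add(label)
    # Only clusters with at least two labels describe synonyms.
    return [sorted(group) for group in clusters.values() if len(group) > 1]

# Data object A from the example, plus one unrelated label.
synonym_clusters = mine_first_label_synonyms(["1110", "1111", "2200"])
```

Grouping by the pair (length, prefix) aggregates all pairwise synonyms into clusters in a single pass, because the rule as sketched here is an equivalence relation.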
Step S308: the first data labels and second data labels in each data object can be mined to generate the mapping relations from first data labels to second data labels.
For example, the first data labels and second data labels in the data files of all data objects are extracted and, according to the second data label classification distribution table, the number of times (or frequency) that the same first data label co-occurs with different second data labels in data objects is counted. If a data object has only one first data label, and that first data label has co-occurred with only one unique second data label, a mapping relation between this first data label and this second data label is established, for example as a mapping table from first data labels to second data labels, and saved. The mapping is used to complete second data label information that may be missing from some data objects. For instance, a data object may carry only the code "11" (first data label) and no label (second data label), while the code "11" has co-occurred with, and only with, the label "A" (second data label); "11" is then mapped to "A". Such a mapping can fill in the missing second data label of the data object. As a result, when a search request is answered, the recall of the aggregated same-class data objects can be improved.
For example, data object A contains only the first data label "1110" and lacks a second data label. According to the first data labels and second data labels extracted from all data objects, under the same class the first data label "1110" has co-occurred only with the second data label "BB" of data object B; in other words, among all data objects only data object B contains both the first data label "1110" and the second data label "BB". In this case, a mapping relation from the first data label "1110" to the second data label "BB" can be established.
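The unique co-occurrence condition can be sketched as follows. The input shape (one pair of label lists per data object) and the function name are assumptions for illustration; the sketch keeps a first data label only if every second data label it has ever appeared with is the same one.

```python
from collections import defaultdict

def mine_first_to_second_mapping(objects):
    """Build the first-data-label to second-data-label mapping.

    `objects` is an iterable of (first_labels, second_labels) pairs, one per
    data object (an assumed shape). Following the condition in the text, only
    objects carrying exactly one first data label are considered, and a
    mapping is kept only when that label co-occurred with one unique second
    data label.
    """
    co_occurring = defaultdict(set)
    for first_labels, second_labels in objects:
        if len(first_labels) != 1:
            continue
        co_occurring[first_labels[0]].update(second_labels)
    return {
        first: next(iter(seconds))
        for first, seconds in co_occurring.items()
        if len(seconds) == 1
    }

mapping = mine_first_to_second_mapping([
    (["1110"], ["BB"]),   # data object B
    (["1110"], []),       # data object A, second data label missing
    (["2200"], ["X"]),
    (["2200"], ["Y"]),    # "2200" is ambiguous, so it is not mapped
])
```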
Further, based on the results aggregated by the above data mining (the second data label table, the second data label synonym table, the first data label synonym table, the mapping table from first data labels to second data labels, and the second data label classification distribution table), all data objects can be normalized and mapped online, which yields the mapping relations among data objects, first data labels, and second data labels.
From the above data mining results over the mass of data objects, an initial association among homogeneous data objects can be formed: homogeneous data objects are multiple data objects that, under the same class, have identical or synonymous second data labels and identical or synonymous first data labels.
Secondly, for a given data object (or for each data object), the second data label and first data label are extracted from the title information and attribute information of the data object (data file) using the sets (tables) mined offline. According to the data mining results, all data objects to be searched are then normalized (unified) and mapped, and data objects belonging to the same class are finally integrated, as shown in Figure 4. This further optimizes the integration of homogeneous data objects, the classification distribution table of data objects, and the mapping relations among data objects, second data labels, and first data labels.
Step S402: extract one or more second data labels in the same data object to obtain one or more candidate second data labels. The title information of a data object is segmented into words (attribute information can be treated the same way), and each segment (or set of segments) is matched against the second data label table; a segment that exactly matches a second data label in the table becomes a candidate second data label. For example, if a data object contains the second data labels "A", "B", and "C", and the second data label table contains only "A" and "B", then "A" and "B" are the candidate second data labels of this data object.
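The exact-match step can be sketched in a few lines. Word segmentation itself is outside the scope of the sketch, so the title is assumed to be already split into segments; the function name is hypothetical.

```python
def extract_candidate_second_labels(title_segments, second_label_table):
    """Keep the segments that exactly match an entry of the label table.

    `title_segments` is the already segmented title (segmentation is assumed
    to have been done elsewhere); `second_label_table` is a set of known
    second data labels.
    """
    return [seg for seg in title_segments if seg in second_label_table]

# The example from the text: the object contains A, B, C; the table has A, B.
candidates = extract_candidate_second_labels(["A", "B", "C"], {"A", "B"})
```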
Based on the second data label classification distribution table, the number of times or frequency with which the different second data labels occur is counted and sorted from high to low, and the candidate second data label with the highest count or frequency can be taken as the second data label of the data object.
Step S404: disambiguate the one or more extracted candidate second data labels.
The disambiguation of candidate second data labels comprises filtering the candidates according to the number of times or frequency with which each candidate occurs in the classification distribution table, and taking the candidate that meets a predetermined condition as the second data label of the data object.
In one specific embodiment, the class to which the data object belongs is determined and, based on the classification distribution table of second data labels, the number of times (or frequency) that a candidate second data label occurs in that class is obtained; if the number exceeds a preset threshold (for example, 1), the candidate is taken as the second data label of the data object. In another embodiment, if multiple candidate second data labels appear in one data object, the candidate with the largest count (highest frequency) in the second data label classification distribution table is selected as the second data label of the data object. For example, the candidate second data labels of a data object are known to be "A" and "B". According to the second data label classification distribution table, under the class to which the data object belongs, "A" occurs 1000 times and "B" occurs once, so "A" can be determined to be the second data label of the data object.
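Both disambiguation variants can be sketched together. The sketch assumes the counts for the object's own class have already been looked up in the classification distribution table and are passed in as a plain dictionary; the function name and return shape are illustrative only.

```python
def disambiguate(candidates, class_counts, threshold=1):
    """Apply the two disambiguation variants to candidate second data labels.

    `class_counts` maps a second data label to its count under the class of
    the data object (assumed to be looked up beforehand).
    Returns (labels above the threshold, single most frequent label).
    """
    above_threshold = [c for c in candidates if class_counts.get(c, 0) > threshold]
    most_frequent = max(candidates, key=lambda c: class_counts.get(c, 0), default=None)
    return above_threshold, most_frequent

# Counts from the example: "A" occurs 1000 times, "B" once.
kept, best = disambiguate(["A", "B"], {"A": 1000, "B": 1})
```

With a threshold of 1, "B" is filtered out because its count does not exceed the threshold, and both variants agree on "A".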
Step S406: second data label synonym normalization. After disambiguation, the extracted second data label of a data object can be rewritten to a normalized second data label based on the second data label synonym table mined offline. For example, if the second data label "A" occurs 500 times in the second data label classification distribution table and its synonym "B" occurs 20 times, the second data label "B" can be changed to "A".
For example, suppose a data object is a commodity, and the second data label of the commodity includes its brand word. The brand word of the same commodity may be written in different ways, including synonyms and misspelled forms. For example, the brand word of a commodity is "new bolune"; it has the synonyms "New Balance" and "new balance", the misspelled form "newbalance", and the abbreviation "nb". Based on the brand word synonym table (the second data label synonym table) and the disambiguated brand word (second data label), the extracted brand word can be rewritten to a single most suitable brand word that serves as the second data label of the commodity, for example by uniformly using "New Balance".
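One possible way to realize this rewriting is sketched below. Choosing the most frequent member of a synonym set as the normalized form follows the "A"/"B" example above, but it is an assumption of the sketch rather than a rule stated by the application; the function name is hypothetical.

```python
def build_normalization_map(synonym_sets, counts):
    """Map every label of a synonym set to the set's most frequent label.

    `synonym_sets` is a list of sets of synonymous second data labels and
    `counts` maps each label to its occurrence count. Picking the most
    frequent label as the normalized form is an assumption of this sketch.
    """
    normalized = {}
    for synonyms in synonym_sets:
        target = max(synonyms, key=lambda label: counts.get(label, 0))
        for label in synonyms:
            normalized[label] = target
    return normalized

# "A" occurs 500 times, its synonym "B" occurs 20 times.
normalization = build_normalization_map([{"A", "B"}], {"A": 500, "B": 20})
rewritten = normalization.get("B", "B")  # unknown labels stay unchanged
```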
Step S408: according to the constructed mapping relations between first data labels and second data labels, complete the second data label of data objects that lack one.
If only a first data label and no second data label has been extracted from a data object (that is, the second data label is missing), and the extracted first data label exactly matches a first data label in the mapping table from first data labels to second data labels mined offline for objects of the same class, the second data label of the data object is obtained from that mapping table so that the data object can be aggregated into the corresponding set of same-class data objects. Further, this second data label can be written into the data object that lacked it.
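The completion step amounts to a dictionary lookup, as the sketch below shows. Representing a data object as a dictionary with "first_label" and "second_label" keys is an assumption made for the example.

```python
def complete_second_label(data_object, first_to_second):
    """Fill in a missing second data label from the offline mapping table.

    `data_object` is a dict with "first_label" and "second_label" keys (an
    assumed representation). A new dict is returned; the input is untouched.
    """
    if data_object.get("second_label") is None:
        mapped = first_to_second.get(data_object.get("first_label"))
        if mapped is not None:
            return {**data_object, "second_label": mapped}
    return data_object

first_to_second = {"1110": "BB"}  # mapping table mined offline
object_a = {"id": "A", "first_label": "1110", "second_label": None}
completed = complete_second_label(object_a, first_to_second)
```

An object whose first data label is not in the mapping table is returned unchanged, so the step never guesses a second data label.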
Step S410: extract the first data labels in multiple data objects based on configured rules. The first data label of a data object is extracted from the title information, attribute information, and so on of its data file according to configured rules, for example a configured regular expression.
Step S412: normalize the multiple extracted first data labels. For example, when a data object contains first data labels with the same main number, such as "1110", and different sub-numbers, such as "001" and "002", the sub-numbers are removed to normalize the first data label: "1110-001" and "1110-002" are both normalized to "1110", or alternatively both normalized to the same first data label "1110-001".
Taking commodity search over a mass of data objects as an example, the first data label extracted from a commodity data object is its article number. The article number is split on a separator: "537889-001" is split on "-" into "537889" and "001", and the leading main article number "537889" is taken as the normalized article number.
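Steps S410 and S412 can be illustrated together. The regular expression below is a hypothetical rule written only for article numbers shaped like the example; the actual configured rules are not given in the text, and the sample title is invented for the demonstration.

```python
import re

# Hypothetical rule: at least four digits, optionally followed by "-" and a
# three-digit sub-number. Real rules would be configured per class.
ARTICLE_NUMBER = re.compile(r"\b\d{4,}(?:-\d{3})?\b")

def extract_first_labels(title):
    """Extract article numbers (first data labels) from a title string."""
    return ARTICLE_NUMBER.findall(title)

def normalize_first_label(label, separator="-"):
    """Drop the sub-number and keep the main article number."""
    return label.split(separator, 1)[0]

extracted = extract_first_labels("running shoe 537889-001 black")
normalized = [normalize_first_label(label) for label in extracted]
```

Splitting on the first separator only keeps the main number intact even when a label contains more than one separator.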
Step S414: first data label synonym normalization. After the first data labels have been normalized, the extracted first data label of a data object is rewritten, based on the first data label synonym table mined offline, so that synonyms are unified into one first data label. For example, if the first data label "1110" occurs 500 times in the classification distribution table of data objects and its synonym "1111" occurs 20 times, the first data label "1111" can be changed to "1110".
Online operation relies on the tables mined offline. Each data object is integrated according to the first data label and second data label that occur most often among its data labels, and second data labels or first data labels that are synonyms of each other are normalized. Using the second data label table, the second data label synonym table, the first data label synonym table, the second data label classification distribution table, and the mapping table from first data labels to second data labels, it is determined which data objects under a given class should be integrated as homogeneous data objects, and their second data labels and first data labels are unified to facilitate search matching.
Through the offline data mining and the online normalization and completion, the mapping relations among data objects, first data labels, and second data labels are obtained and can be recorded in a data object mapping relation table, so that homogeneous data objects can be looked up in this table at search time.
Step S204: store the data object mapping relation table obtained by the integration in advance. That is, the integrated data object mapping relation table obtained through offline mining and online completion and normalization is stored in a database.
While the data objects are integrated in advance, the second data label table, the second data label classification distribution table, the second data label synonym table, the first data label synonym table, and the mapping table from first data labels to second data labels are formed, and these tables (sets) are stored in a database as the result of the pre-integration. They can then be called at any time during data search, which improves the computing speed of the system.
With the method of integrating data objects in advance according to the present application, associations are established between homogeneous data objects and homogeneous data objects are presented in the search results, which provides users with more complete data and improves the accuracy and recall of data search, thereby improving the quality of the search results.
The present application also provides a device for data search based on homogeneous data integration.
Figure 5 is a structural diagram of a device for data search based on homogeneous data integration according to an embodiment of the present application.
The device 500 according to the present application can comprise a receiving and searching module 501, an acquisition module 503, a matching module 505, and an integrating and returning module 507. Each module corresponds to the implementation of a step of the above method.
The receiving and searching module 501 is configured to receive a search request from a user and perform the search, that is, to search all data objects to be searched for one or more data objects that match the search request.
The acquisition module 503 is configured to analyze each of the one or more data objects found by the search and obtain the data labels of each data object. The data labels comprise a first data label and a second data label, which identify different attribute features of a data object. The acquisition module 503 can therefore be used to obtain the first data label and/or the second data label of each data object.
The matching module 505 is configured to perform matching according to the obtained data labels (the first data label and/or the second data label), that is, to further match each of the one or more data objects found by the search so as to obtain the one or more homogeneous data objects corresponding to each data object.
The integrating and returning module 507 is configured to integrate the one or more data objects whose data labels (the first data label and/or the second data label) match into a homogeneous data object combination and return it to the user as the search result. The homogeneous data object combination comprises multiple data objects that are homogeneous with one another, and one of the data objects in the combination can be displayed on the search results page.
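The grouping and display behaviour of the matching and integrating modules can be sketched as follows. The sketch assumes the labels of every search hit have already been normalized, so that homogeneous data objects share an identical (class, first data label, second data label) key; the field names, the sample data, and the choice of the first member as the displayed object are all assumptions.

```python
from collections import defaultdict

def integrate_results(results):
    """Group search hits that share class, first and second data label.

    `results` is a list of dicts whose labels are assumed to be normalized
    already, so homogeneous data objects have identical keys.
    """
    groups = defaultdict(list)
    for obj in results:
        key = (obj["class"], obj["first_label"], obj["second_label"])
        groups[key].append(obj["id"])
    return list(groups.values())

def pick_displayed(groups):
    """Show one object per combination; taking the first is an assumption."""
    return [group[0] for group in groups]

results = [
    {"id": "M1", "class": "shoes", "first_label": "537889", "second_label": "New Balance"},
    {"id": "M2", "class": "shoes", "first_label": "537889", "second_label": "New Balance"},
    {"id": "M3", "class": "shoes", "first_label": "1110", "second_label": "A"},
]
combinations_found = integrate_results(results)
displayed = pick_displayed(combinations_found)
```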
The device 500 according to the present application further comprises a pre-processing module 509 and a storage module 511.
The pre-processing module 509 is configured to integrate all data objects to be searched in advance and determine the one or more homogeneous data objects corresponding to each data object to be searched, so as to obtain a data object mapping relation table.
Specifically, the pre-processing module 509 performs offline data mining and online data object normalization and mapping on all data objects to be searched.
During offline data mining, the pre-processing module 509 can mine the second data labels in each data object and the second data label classification distribution table.
The pre-processing module 509 can perform second data label mining on the second data labels in each data object to generate the set of second data label synonyms of all data objects. Second data label synonyms arise from multiple data objects that, under the same class, have different second data labels but the same first data label.
The pre-processing module 509 can perform first data label mining on the first data labels in each data object to generate the first data label synonym set of all data objects. First data label synonyms are multiple similar first data labels in the same data object.
The pre-processing module 509 can mine the first data labels and second data labels in each data object to generate the mapping relations from first data labels to second data labels. Specifically, if a data object has only one first data label and that first data label has co-occurred with only one unique second data label, a mapping relation between this first data label and this second data label is established.
When performing online data object normalization and mapping, the pre-processing module 509 is configured to extract one or more second data labels in the same data object to obtain one or more candidate second data labels, and to disambiguate the extracted candidates. Further, based on the classification distribution table of second data labels, the number of times a candidate second data label occurs in the class is obtained; if the number exceeds a preset threshold, the candidate is taken as the second data label of the data object. In another embodiment, if multiple candidate second data labels appear in one data object, the candidate with the largest count in the second data label classification distribution table is selected as the second data label of the data object.
The pre-processing module 509 is further configured to: extract the first data labels in multiple data objects based on configured rules and normalize the extracted first data labels; normalize second data labels or first data labels that are synonyms of each other; and, according to the constructed mapping relations between first data labels and second data labels, complete the second data label of data objects that lack one.
The purpose of the pre-processing module 509 is to integrate all data objects to be searched (a mass of data objects) in advance to obtain homogeneous data objects, which are multiple data objects that, under the same class, have identical or synonymous second data labels and identical or synonymous first data labels. During this advance integration, the mapping relations among data objects, first data labels, and second data labels can be obtained to form the data object mapping relation table.
The storage module 511 is configured to store the data object mapping relation table obtained by the integration in advance. During data search, the homogeneous data objects corresponding to a found data object can be matched directly through this table.
In this way, important features or attributes of data objects, such as the first data label and the second data label, are used to classify and integrate a mass of data objects in advance and to establish associations between homogeneous data objects, which improves the accuracy and recall of data search and thereby the quality of the search results.
Moreover, the multiple homogeneous data objects returned by the search engine are integrated, and only one of them may be displayed on the search results page, so that the page can show a greater variety of data objects. This increases the diversity of the search results and gives a better user experience.
Because the embodiments of the modules of the device shown in Figure 5 correspond to the embodiments of the steps of the method of the present application, and Figures 1 to 4 have already been described in detail, the details of the modules are not described again here so as not to obscure the present application.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments can be understood by cross-reference.
The present application can be described in the general context of computer-executable instructions, such as program modules or units. Generally, program modules or units can include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. In general, program modules or units can be implemented in software, hardware, or a combination of both. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules or units can be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, commodity, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that comprises the element.
Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system, or a computer program product. The present application can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk memory, CD-ROM, and optical memory) that contain computer-usable program code.
Specific examples have been used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its main idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
In a typical configuration, a computing device comprises one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Claims (17)

1. A method for data search based on homogeneous data object integration, characterized by comprising:
receiving a search request from a user, and searching all data objects to be searched for one or more data objects that match the search request;
analyzing each of the one or more data objects found by the search to obtain the data labels of each data object;
matching the obtained data labels; and
integrating the one or more data objects whose data labels match into a homogeneous data object combination, and returning it to the user as the search result.
2. The method according to claim 1, characterized in that the data labels comprise a first data label and a second data label, which respectively identify different attribute features of a data object.
3. The method according to claim 2, characterized by further comprising: integrating all data objects to be searched in advance to determine the one or more homogeneous data objects corresponding to each data object to be searched, so as to obtain a data object mapping relation table.
4. The method according to claim 3, characterized in that integrating all data objects to be searched in advance comprises:
mining the second data labels in each data object and the second data label classification distribution table;
performing second data label mining on the second data labels in each data object to generate the set of second data label synonyms of all data objects;
performing first data label mining on the first data labels in each data object to generate the first data label synonym set of all data objects; and
mining the first data labels and second data labels in each data object to generate the mapping relations from first data labels to second data labels.
5. The method according to claim 4, characterized in that:
the second data label synonyms comprise: multiple data objects that, under the same class, have different second data labels and the same first data label; and
the first data label synonyms comprise: multiple similar first data labels in the same data object.
6. The method according to claim 4, characterized in that mining the first data labels and second data labels in each data object to generate the mapping relations from first data labels to second data labels comprises: if a data object has only one first data label and the first data label has co-occurred with only one unique second data label, establishing a mapping relation between the first data label and the second data label.
7. The method according to claim 3, characterized in that integrating all data objects to be searched in advance comprises:
extracting one or more second data labels in the same data object to obtain one or more candidate second data labels, and disambiguating the one or more extracted candidate second data labels;
extracting the first data labels in multiple data objects based on configured rules, and normalizing the multiple extracted first data labels;
normalizing second data labels or first data labels that are synonyms of each other; and
according to the constructed mapping relations between first data labels and second data labels, completing the second data label of data objects that lack a second data label.
8. The method according to claim 7, characterized in that disambiguating the one or more extracted candidate second data labels comprises:
based on the classification distribution table of second data labels, obtaining the number of times the candidate second data label occurs in the class and, if the number exceeds a preset threshold, taking the candidate as the second data label of the data object; and/or, if multiple candidate second data labels appear in one data object, selecting the second data label with the largest count in the second data label classification distribution table as the second data label of the data object.
9. the method for claim 1, is characterized in that, comprising:
In search results pages, show one of them of multiple data objects in the combination of described homogeneous data, wherein, described homogeneous data combination comprises: multiple data objects of homogeneous data object each other.
10. The method according to claim 2, characterized in that the homogeneous data objects comprise: multiple data objects that, under the same category, have identical or synonymous second data labels and identical or synonymous first data labels.
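The definition of homogeneous data objects in claim 10, together with the display rule of claim 9, can be sketched as a grouping step. This sketch assumes each search result carries a single `category`, `first_label`, and `second_label`, and that synonyms have been collected into tables mapping each variant to one canonical label; all of these names, and the choice of the first object in each group as its representative, are illustrative assumptions.

```python
from collections import defaultdict


def group_homogeneous(results, canonical_first=None, canonical_second=None):
    """Group results sharing category and identical-or-synonymous labels."""
    canonical_first = canonical_first or {}
    canonical_second = canonical_second or {}
    groups = defaultdict(list)
    for obj in results:
        key = (
            obj["category"],
            # Synonymous labels collapse onto the same canonical label.
            canonical_second.get(obj["second_label"], obj["second_label"]),
            canonical_first.get(obj["first_label"], obj["first_label"]),
        )
        groups[key].append(obj)
    # Groups come back in order of first appearance in `results`.
    return list(groups.values())


def representatives(groups):
    """Return one object per group, here simply the first one encountered."""
    return [group[0] for group in groups]
```

Two objects with the same labels but different categories end up in different groups, matching the "under the same category" condition.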
11. A device for data search based on integration of homogeneous data objects, characterized in that it comprises:
A receiving and searching module, configured to receive a search request from a user and to search, among all data objects to be searched, for one or more data objects matching the search request;
An acquisition module, configured to analyze each of the one or more data objects found, so as to obtain the data label of each data object;
A matching module, configured to match the obtained data labels;
An integrating and returning module, configured to integrate the one or more data objects whose data labels match into a homogeneous data object combination, and to return it to the user as the search result.
12. The device according to claim 11, characterized in that the data label comprises a first data label and a second data label, the first data label and the second data label respectively identifying different attribute features of the data object.
13. The device according to claim 12, characterized in that it further comprises:
A preprocessing module, configured to integrate and process all data objects to be searched in advance, so as to determine the one or more homogeneous data objects corresponding to each data object to be searched and obtain a data object mapping relation table.
14. The device according to claim 13, characterized in that the preprocessing module is further configured to:
Perform mining on the second data label in each data object and on the second data label category distribution table;
Perform second data label mining on the second data label in each data object, to generate the second data label synonym set of all data objects;
Perform first data label mining on the first data label in each data object, to generate the first data label synonym set of all data objects;
Mine the first data label and the second data label in each data object, to generate a mapping relation from the first data label to the second data label;
If a data object has only one first data label, and that first data label co-occurs with only one unique second data label, establish a mapping relation between the first data label and the second data label.
15. The device according to claim 14, characterized in that the second data label synonyms comprise: the differing second data labels of multiple data objects that, under the same category, have the same first data label; and the first data label synonyms comprise: multiple similar first data labels within the same data object.
16. The device according to claim 13, characterized in that the preprocessing module is further configured to:
Extract one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguate the extracted candidate second data labels;
Extract, based on configured rules, the first data labels from multiple data objects, and normalize the extracted first data labels;
Normalize second data labels, or first data labels, that are synonyms of one another;
Perform second data label completion on data objects lacking a second data label, according to the established mapping relation between the first data label and the second data label;
Based on the category distribution table of second data labels, obtain the number of times the candidate second data label occurs in the category, and, if the number is greater than a preset threshold, treat the candidate as the second data label of the data object; and/or
If multiple candidate second data labels appear in one data object, select the candidate with the most occurrences in the second data label category distribution table as the second data label of the data object.
17. The device according to claim 12, characterized in that the integrating and returning module is further configured to:
On the search results page, display one of the multiple data objects in the homogeneous data combination, wherein the homogeneous data combination comprises: multiple data objects that are homogeneous data objects of one another;
The homogeneous data objects comprise: multiple data objects that, under the same category, have identical or synonymous second data labels and identical or synonymous first data labels.
CN201310182427.3A 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object Active CN104166651B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711064889.XA CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device
CN201310182427.3A CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310182427.3A CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201711064889.XA Division CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Publications (2)

Publication Number Publication Date
CN104166651A true CN104166651A (en) 2014-11-26
CN104166651B CN104166651B (en) 2017-10-13

Family

ID=51910470

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201310182427.3A Active CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object
CN201711064889.XA Active CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201711064889.XA Active CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Country Status (1)

Country Link
CN (2) CN104166651B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462316A (en) * 2014-12-01 2015-03-25 苏州朗米尔照明科技有限公司 Label matching method
CN104794003A (en) * 2015-02-04 2015-07-22 汉鼎信息科技股份有限公司 Large data analysis system integrating real-time mode and non-real-time mode
CN105138637A (en) * 2015-08-24 2015-12-09 浪潮软件股份有限公司 Data processing method and device
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
CN105786922A (en) * 2014-12-25 2016-07-20 高德软件有限公司 Method and equipment for determining missing electronic map data
CN106033457A (en) * 2015-03-18 2016-10-19 阿里巴巴集团控股有限公司 Method and device for determining the attribute information of fruit target attributes
CN106372191A (en) * 2016-08-31 2017-02-01 广东华邦云计算股份有限公司 Data search method and device
CN107451141A (en) * 2016-05-30 2017-12-08 阿里巴巴集团控股有限公司 Processing exchange method, the apparatus and system of a kind of data recommendation
CN108415886A (en) * 2018-03-07 2018-08-17 清华大学 A kind of data label error correction method and device based on production process
US10073640B1 (en) 2017-03-10 2018-09-11 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
CN109063222A (en) * 2018-11-04 2018-12-21 吉铁磊 A kind of self-adapting data searching method based on big data
CN109558468A (en) * 2018-12-13 2019-04-02 北京百度网讯科技有限公司 Processing method, device, equipment and the storage medium of resource
US10289309B2 (en) 2016-09-12 2019-05-14 Toshiba Memory Corporation Automatic detection of multiple streams
CN110059967A (en) * 2019-04-23 2019-07-26 北京相数科技有限公司 A kind of data processing method and device applied to city decision Analysis
CN110263240A (en) * 2019-06-18 2019-09-20 深圳市酷开网络科技有限公司 Integration searching method, device and the computer readable storage medium of information
CN110516140A (en) * 2019-08-15 2019-11-29 北京泰迪熊移动科技有限公司 A kind of information processing method, equipment and computer storage medium
US10542089B2 (en) 2017-03-10 2020-01-21 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
CN111667235A (en) * 2020-05-18 2020-09-15 上海兴亚报关有限公司 Customs clearance information management method, system, device and storage medium
CN113763081A (en) * 2020-08-26 2021-12-07 北京沃东天骏信息技术有限公司 Article recall method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173547A1 (en) * 2008-04-22 2012-07-05 Uc4 Software Gmbh Method Of Detecting A Reference Sequence Of Events In A Sample Sequence Of Events
CN102890686A (en) * 2011-07-21 2013-01-23 腾讯科技(深圳)有限公司 Method and system for showing commodity search result
CN102915498A (en) * 2011-08-03 2013-02-06 腾讯科技(深圳)有限公司 Method and device for goods classification of e-commerce platform
CN102930038A (en) * 2012-11-12 2013-02-13 江苏外博资讯有限公司 Combined method of search result similar items and system of the same
CN103020207A (en) * 2012-12-06 2013-04-03 优视科技有限公司 Browser label page grouping management method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100776697B1 (en) * 2006-01-05 2007-11-16 주식회사 인터파크지마켓 Method for searching products intelligently based on analysis of customer's purchasing behavior and system therefor
CN101206674A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Enhancement type related search system and method using commercial articles as medium
CN101887436B (en) * 2009-05-12 2013-08-21 阿里巴巴集团控股有限公司 Retrieval method and device
CN101963966A (en) * 2009-07-24 2011-02-02 李占胜 Method for sorting search results by adding labels into search results
CN102207963A (en) * 2011-05-30 2011-10-05 何吴迪 Post-search instant intelligent navigation technique for cloud computing window platform
CN103092856B (en) * 2011-10-31 2015-09-23 阿里巴巴集团控股有限公司 Search result ordering method and equipment, searching method and equipment
CN102902806B (en) * 2012-10-17 2016-02-10 深圳市宜搜科技发展有限公司 A kind of method and system utilizing search engine to carry out query expansion
CN103106240A (en) * 2012-12-12 2013-05-15 江苏乐买到网络科技有限公司 Method of searching products in online shopping

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462316A (en) * 2014-12-01 2015-03-25 苏州朗米尔照明科技有限公司 Label matching method
CN104462316B (en) * 2014-12-01 2017-09-26 苏州朗米尔照明科技有限公司 A kind of tag match method
CN105786922A (en) * 2014-12-25 2016-07-20 高德软件有限公司 Method and equipment for determining missing electronic map data
CN104794003A (en) * 2015-02-04 2015-07-22 汉鼎信息科技股份有限公司 Large data analysis system integrating real-time mode and non-real-time mode
CN104794003B (en) * 2015-02-04 2019-06-04 汉鼎宇佑互联网股份有限公司 It is a kind of to integrate real-time and non-real-time mode big data analysis system
CN106033457A (en) * 2015-03-18 2016-10-19 阿里巴巴集团控股有限公司 Method and device for determining the attribute information of fruit target attributes
CN106033457B (en) * 2015-03-18 2019-10-18 阿里巴巴集团控股有限公司 A kind of method and apparatus of the attribute information of determining fruit objective attribute target attribute
CN105138637A (en) * 2015-08-24 2015-12-09 浪潮软件股份有限公司 Data processing method and device
WO2017080220A1 (en) * 2015-11-12 2017-05-18 百度在线网络技术(北京)有限公司 Knowledge data processing method and apparatus
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
CN107451141B (en) * 2016-05-30 2021-01-29 阿里巴巴集团控股有限公司 Data recommendation processing interaction method, device and system
CN107451141A (en) * 2016-05-30 2017-12-08 阿里巴巴集团控股有限公司 Processing exchange method, the apparatus and system of a kind of data recommendation
CN106372191A (en) * 2016-08-31 2017-02-01 广东华邦云计算股份有限公司 Data search method and device
US10289309B2 (en) 2016-09-12 2019-05-14 Toshiba Memory Corporation Automatic detection of multiple streams
US10542089B2 (en) 2017-03-10 2020-01-21 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
US10073640B1 (en) 2017-03-10 2018-09-11 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
CN108415886B (en) * 2018-03-07 2019-04-05 清华大学 A kind of data label error correction method and device based on production process
CN108415886A (en) * 2018-03-07 2018-08-17 清华大学 A kind of data label error correction method and device based on production process
CN109063222A (en) * 2018-11-04 2018-12-21 吉铁磊 A kind of self-adapting data searching method based on big data
CN109063222B (en) * 2018-11-04 2021-11-30 朗威寰球(北京)科技集团有限公司 Self-adaptive data searching method based on big data
CN109558468A (en) * 2018-12-13 2019-04-02 北京百度网讯科技有限公司 Processing method, device, equipment and the storage medium of resource
CN110059967A (en) * 2019-04-23 2019-07-26 北京相数科技有限公司 A kind of data processing method and device applied to city decision Analysis
CN110263240A (en) * 2019-06-18 2019-09-20 深圳市酷开网络科技有限公司 Integration searching method, device and the computer readable storage medium of information
CN110516140A (en) * 2019-08-15 2019-11-29 北京泰迪熊移动科技有限公司 A kind of information processing method, equipment and computer storage medium
CN111667235A (en) * 2020-05-18 2020-09-15 上海兴亚报关有限公司 Customs clearance information management method, system, device and storage medium
CN113763081A (en) * 2020-08-26 2021-12-07 北京沃东天骏信息技术有限公司 Article recall method and device

Also Published As

Publication number Publication date
CN107844565B (en) 2021-07-16
CN107844565A (en) 2018-03-27
CN104166651B (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN104166651A (en) Data searching method and device based on integration of data objects in same classes
CN107729336B (en) Data processing method, device and system
CN101593200B (en) Method for classifying Chinese webpages based on keyword frequency analysis
CN106484875B (en) MOLAP-based data processing method and device
CN102725759B (en) For the semantic directory of Search Results
CN111008265B (en) Enterprise information searching method and device
CN102419778B (en) Information searching method for discovering and clustering sub-topics of query statement
CN103049575B (en) A kind of academic conference search system of topic adaptation
US20150178273A1 (en) Unsupervised Relation Detection Model Training
CN103106199B (en) Text searching method and device
CN104376406A (en) Enterprise innovation resource management and analysis system and method based on big data
CN103577416A (en) Query expansion method and system
US20170109358A1 (en) Method and system of determining enterprise content specific taxonomies and surrogate tags
US20110208715A1 (en) Automatically mining intents of a group of queries
CN110008306A (en) A kind of data relationship analysis method, device and data service system
CN103838754A (en) Information searching device and method
WO2010008488A1 (en) Method and system for dynamically generating a search result
CN104216979A (en) Chinese technology patent automatic classification system and method for patent classification by using system
CN103778206A (en) Method for providing network service resources
CN104077385A (en) Classification and retrieval method of files
CN105138637A (en) Data processing method and device
Gasmi et al.: A new informative generic base of association rules
Tayal et al. Fast retrieval approach of sentimental analysis with implementation of bloom filter on Hadoop
CN103761286A (en) Method for retrieving service resources on basis of user interest
CN105404677A (en) Tree structure based retrieval method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant