CN104166651B - Data search method and apparatus based on integrating homogeneous data objects - Google Patents


Info

Publication number
CN104166651B
Authority
CN
China
Prior art keywords
data
label
data label
objects
data object
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310182427.3A
Other languages
Chinese (zh)
Other versions
CN104166651A (en)
Inventor
郎皓
欧海峰
张丙奇
孙健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201711064889.XA (patent CN107844565B)
Priority to CN201310182427.3A (patent CN104166651B)
Publication of CN104166651A
Application granted
Publication of CN104166651B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a data search method and apparatus based on integrating homogeneous data objects. The method includes: receiving a search request from a user, and searching all data objects to be searched for one or more data objects that match the search request; analyzing each of the one or more data objects found, to obtain the data labels of each data object; matching the obtained data labels; and integrating the one or more data objects whose data labels match into a homogeneous data object combination, which is returned to the user as the search result. The application uses the data labels of data objects to classify and integrate massive numbers of data objects in advance into homogeneous data objects, and the search engine displays only one of the multiple homogeneous data objects returned. This improves the accuracy and recall of data search and increases the diversity of search results.

Description

Data search method and apparatus based on integrating homogeneous data objects
Technical field
The present application relates to the field of data search, and in particular to a data search method and apparatus based on integrating homogeneous data objects.
Background
With the arrival of the cloud era, big data has attracted increasing attention. The point of big data technology is not merely to hold massive amounts of data or data objects, but to collect, process, and organize, within a reasonable time, the data that users actually need. Networks hold enormous amounts of data, and making full use of it can greatly improve users' lives. Users search for the data they want with a search engine. Taking web search as an example, a search engine crawls web pages on the Internet in advance and, after preprocessing the crawled pages, can provide a retrieval service. The most important preprocessing step is extracting keywords from the pages; others include removing duplicate pages, word segmentation, determining page type, analyzing hyperlinks, and computing the importance or richness of each page.
During a data search, the search engine retrieves items highly relevant to the keyword entered by the user. However, the number of results matching the keyword is enormous and spans every area of social life, so result quality is low: the results are hard for users to work with and their accuracy is poor.
Information-integration techniques let a search engine select, analyze, and classify the content of the massive number of data objects it crawls, which narrows the scope of a search and makes results more targeted. However, ambiguity in the data (for example, the same keyword belonging to different fields) lowers the accuracy of the results, and a keyword may have alternative forms (for example, "以太网" (Ethernet) and its variant spelling "乙太网"), so the results returned are incomplete.
For example, a search for the keyword "以太网" (Ethernet) returns results related to "以太网". But "以太网" and "乙太网" are two spellings of the same term, and because no association exists between the two keywords, results related to "乙太网" do not appear on the results page. Part of the relevant results are therefore never retrieved, which lowers result quality, for example the recall of the search.
Moreover, because the search engine selects, analyzes, and classifies the content of massive numbers of data objects, multiple identical or similar data objects may be displayed on the results page, wasting result slots. For example, if each results page can show only 20 results and 10 of those 20 are identical or similar data objects, the user has to click "next page" repeatedly to see different data objects.
Summary of the invention
The main purpose of the present application is to provide a data search method and apparatus based on integrating homogeneous data objects, to solve the problem of low-quality search results that arises when a prior-art search engine searches data, because the data volume is excessive and data objects are not associated with one another.
To solve the above technical problem, the application achieves its purpose through the following technical solutions:
The present application provides a data search method based on integrating homogeneous data objects, comprising the following steps: receiving a search request from a user, and searching all data objects to be searched for one or more data objects that match the search request; analyzing each of the one or more data objects found, to obtain the data labels of each data object; matching the obtained data labels; and integrating the one or more data objects whose data labels match into a homogeneous data object combination, and returning it to the user as the search result.
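The four claimed steps can be sketched in Python as below. This is a minimal illustration under assumptions that are not in the source: the hypothetical `DataObject` record carries one category, one first data label (an article number) and one second data label (a brand word), plain substring matching on the title stands in for the real retrieval engine, and two hits are treated as matching when all three fields are equal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    object_id: str
    title: str
    category: str
    first_label: str   # e.g. an article number such as "1111"
    second_label: str  # e.g. a brand word such as "Nike"


def search(objects, keyword):
    """Step 1 (stand-in): return objects whose title contains the keyword."""
    kw = keyword.lower()
    return [obj for obj in objects if kw in obj.title.lower()]


def integrate(hits):
    """Steps 2-4: read each hit's labels and merge hits whose labels match."""
    groups = defaultdict(list)
    for obj in hits:
        key = (obj.category, obj.first_label, obj.second_label)
        groups[key].append(obj)
    # Each value is one homogeneous data object combination.
    return list(groups.values())


catalog = [
    DataObject("1", "Nike shell-toe sneakers", "shoes", "1111", "Nike"),
    DataObject("2", "Shell-toe sneakers, white", "shoes", "1111", "Nike"),
    DataObject("3", "Canvas sneakers", "shoes", "2222", "Converse"),
]
for combination in integrate(search(catalog, "sneakers")):
    print([obj.object_id for obj in combination])  # ['1', '2'], then ['3']
```

Grouping on an exact label tuple is the simplest possible reading of "matching"; synonym-aware matching would replace the key with canonical label forms.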
Preferably, in the method of the present application, the data labels include a first data label and a second data label, which respectively identify different attribute features of a data object.
Preferably, the method may further include: performing pre-integration processing on all data objects to be searched, to determine the one or more homogeneous data objects corresponding to each data object to be searched, and thereby obtain a data object mapping table.
Preferably, in the method, performing pre-integration processing on all data objects to be searched includes: mining the second data labels in each data object and a second data label category distribution table; mining the second data labels in each data object to generate a set of second data label synonyms for all data objects; mining the first data labels in each data object to generate a set of first data label synonyms for all data objects; and mining the first and second data labels in each data object to generate a mapping from first data labels to second data labels.
Preferably, in the method, the second data label synonyms include: multiple data objects in the same category that have different second data labels but the same first data label; and the first data label synonyms include: multiple similar first data labels in the same data object.
Preferably, in the method, mining the first and second data labels in each data object to generate the mapping from first data labels to second data labels includes: if a data object has only one first data label, and that first data label co-occurs with only one unique second data label, establishing a mapping between that first data label and that second data label.
Preferably, in the method, performing pre-integration processing on all data objects to be searched includes: extracting one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguating the extracted candidates; extracting, based on configured rules, the first data labels in multiple data objects, and normalizing the extracted first data labels; normalizing second data labels or first data labels that are synonyms of each other; and, based on the constructed mapping between first data labels and second data labels, completing the second data label of any data object that lacks one.
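A minimal sketch of the first-to-second label mapping and of second-label completion follows. It assumes each data object is reduced to a hypothetical `(first_label, second_label)` pair, with `None` marking a missing second label, and it reads the co-occurrence rule as: a first data label is mapped only if, across all objects that carry both labels, it appears with exactly one distinct second data label.

```python
from collections import defaultdict


def mine_first_to_second(objects):
    """Map a first label to a second label only if it co-occurs with exactly one."""
    seen = defaultdict(set)
    for first, second in objects:
        if first is not None and second is not None:
            seen[first].add(second)
    return {first: next(iter(seconds))
            for first, seconds in seen.items() if len(seconds) == 1}


def complete_second_labels(objects, mapping):
    """Fill a missing second label from the mapping; leave the others untouched."""
    completed = []
    for first, second in objects:
        if second is None:
            second = mapping.get(first)  # stays None if the first label is unmapped
        completed.append((first, second))
    return completed


objects = [("1111", "Nike"), ("1111", "Nike"), ("2222", "Adidas"),
           ("2222", "Puma"), ("1111", None), ("2222", None)]
mapping = mine_first_to_second(objects)
print(mapping)                                       # {'1111': 'Nike'}
print(complete_second_labels(objects, mapping)[4:])  # [('1111', 'Nike'), ('2222', None)]
```

Ambiguous first labels ("2222" above) are deliberately left unmapped, so completion never guesses between competing second labels.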
Preferably, in the method, disambiguating the extracted candidate second data labels includes: based on the category distribution table of second data labels, obtaining the number of times a candidate second data label occurs in the category, and if that number exceeds a preset threshold, treating it as the second data label of the data object; and/or, if a data object has multiple candidate second data labels, selecting as the data object's second data label the candidate that occurs most often in the second data label category distribution table.
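The disambiguation rule can be sketched as follows. The text allows the threshold test and the most-frequent test to be used alone or together ("and/or"); this hypothetical `disambiguate` function applies both, keeping candidates whose count in the object's category exceeds the threshold and then picking the most frequent survivor. The nested-dict shape of the distribution table is an assumption, and the counts reuse the "school bus" / "subway" / "car" example given later in the description.

```python
def disambiguate(candidates, category, distribution, threshold):
    """Pick the object's second data label from its candidates.

    distribution: {category: {second_label: occurrence count}}.
    Returns None when no candidate clears the threshold.
    """
    counts = distribution.get(category, {})
    # Threshold test: keep candidates that occur often enough in this category.
    accepted = [c for c in candidates if counts.get(c, 0) > threshold]
    if not accepted:
        return None
    # Most-frequent test: among the survivors, keep the most common one.
    return max(accepted, key=lambda c: counts[c])


distribution = {
    "geography": {"school bus": 15, "subway": 10, "car": 20},
    "life": {"school bus": 0, "subway": 0, "car": 20},
}
print(disambiguate(["subway", "school bus"], "geography", distribution, 5))  # school bus
print(disambiguate(["school bus"], "life", distribution, 5))                 # None
```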
Preferably, the method may include: displaying, on the search results page, one of the multiple data objects in the homogeneous data combination, where a homogeneous data combination contains multiple data objects that are homogeneous data objects of one another.
Preferably, in the method, homogeneous data objects may include: multiple data objects in the same category that have the same or synonymous second data labels and the same or synonymous first data labels.
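One way to express the homogeneity definition in code, assuming synonym tables that map each label variant to a canonical form (the hypothetical `second_syn` and `first_syn` dicts): two objects are homogeneous when they share a category and their canonical second and first data labels coincide. Objects are hypothetical `(category, first_label, second_label)` triples.

```python
def canonical(label, synonyms):
    """Return the canonical form of a label (the label itself if it has no entry)."""
    return synonyms.get(label, label)


def homogeneity_key(obj, second_syn, first_syn):
    category, first_label, second_label = obj
    return (category,
            canonical(second_label, second_syn),
            canonical(first_label, first_syn))


def group_homogeneous(objects, second_syn, first_syn):
    """Bucket objects by (category, canonical second label, canonical first label)."""
    groups = {}
    for obj in objects:
        groups.setdefault(homogeneity_key(obj, second_syn, first_syn), []).append(obj)
    return list(groups.values())


second_syn = {"NIKE": "Nike"}   # hypothetical second-label synonyms
first_syn = {"1111-A": "1111"}  # hypothetical first-label synonyms
objects = [("shoes", "1111", "Nike"),
           ("shoes", "1111-A", "NIKE"),
           ("bags", "1111", "Nike")]
for group in group_homogeneous(objects, second_syn, first_syn):
    print(group)
```

The first two objects fall into one group despite different spellings; the third stays separate because its category differs.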
The present application also provides a data search apparatus based on integrating homogeneous data objects, comprising: a receiving and searching module, configured to receive a search request from a user and search all data objects to be searched for one or more data objects that match the search request; an acquisition module, configured to analyze each of the one or more data objects found, to obtain the data labels of each data object; a matching module, configured to match the obtained data labels; and an integrating and returning module, configured to integrate the one or more data objects whose data labels match into a homogeneous data object combination and return it to the user as the search result.
Preferably, in the apparatus of the present application, the data labels include a first data label and a second data label, which respectively identify different attribute features of a data object.
Preferably, the apparatus may further include: a preprocessing module, configured to perform pre-integration processing on the data objects to be searched, to determine the one or more homogeneous data objects corresponding to each data object to be searched, and thereby obtain a data object mapping table.
Preferably, in the apparatus, the preprocessing module is further configured to: mine the second data labels in each data object and the second data label category distribution table; mine the second data labels in each data object to generate the set of second data label synonyms for all data objects; mine the first data labels in each data object to generate the set of first data label synonyms for all data objects; mine the first and second data labels in each data object to generate the mapping from first data labels to second data labels; and, if a data object has only one first data label and that first data label co-occurs with only one unique second data label, establish a mapping between that first data label and that second data label.
Preferably, in the apparatus, the second data label synonyms include: multiple data objects in the same category that have different second data labels but the same first data label; and the first data label synonyms include: multiple similar first data labels in the same data object.
Preferably, in the apparatus, the preprocessing module is further configured to: extract one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguate the extracted candidates; extract, based on configured rules, the first data labels in multiple data objects, and normalize the extracted first data labels; normalize second data labels or first data labels that are synonyms of each other; based on the constructed mapping between first and second data labels, complete the second data label of any data object that lacks one; based on the second data label category distribution table, obtain the number of times a candidate second data label occurs in the category and, if that number exceeds a preset threshold, treat it as the data object's second data label; and/or, if a data object has multiple candidate second data labels, select as its second data label the candidate that occurs most often in the second data label category distribution table.
Preferably, in the apparatus, the integrating and returning module is further configured to display, on the search results page, one of the multiple data objects in the homogeneous data combination, where the homogeneous data combination contains multiple data objects that are homogeneous with one another, and homogeneous data objects are multiple data objects in the same category that have the same or synonymous second data labels and the same or synonymous first data labels.
Compared with the prior art, the technical solutions of the present application have the following beneficial effects:
The application uses important labels or attributes of data objects, such as the first and second data labels, to classify and integrate massive numbers of data objects in advance and to establish associations between homogeneous data objects. This improves the accuracy and recall of data search, and thus the quality of the search results.
The application integrates the multiple homogeneous data objects returned by the search engine and displays only one of them on the search results page, so that the page shows a wider variety of data objects. This increases the diversity of the search results and improves the user experience.
Brief description of the drawings
The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the data search method based on integrating homogeneous data objects according to an embodiment of the application;
Fig. 2 is a flowchart of the pre-integration processing of homogeneous data objects according to an embodiment of the application;
Fig. 3 is a flowchart of the offline data mining performed on all data objects to be searched, according to an embodiment of the application;
Fig. 4 is a flowchart of the normalization and mapping processing performed on all data objects to be searched, according to an embodiment of the application; and
Fig. 5 is a structural diagram of the data search apparatus based on integrating homogeneous data objects according to an embodiment of the application.
Detailed description of embodiments
The main idea of the application is to use the data labels (attributes) contained in the data objects being searched to distinguish homogeneous data objects from non-homogeneous ones, and to use those labels (for example, first and second data labels) to integrate the massive number of data objects in a database in advance. For example: second data labels with the same meaning but different spellings, such as "以太网" and "乙太网", are mapped to each other; different first data labels contained in the same data object are mapped to each other; and different data objects sharing the same first data label are mapped to each other. The mapping relations between data objects and data labels then yield the sets of data objects that are homogeneous with one another. During a search, this pre-integration means that when a user's search request (for example, one based on a keyword) retrieves the matching data objects from the database, the homogeneous data objects of those matches can be retrieved as well, which improves the accuracy and recall of the search. The multiple homogeneous data objects can also be integrated so that only one of them is shown on the search results page; the page then displays a wider variety of data objects, which increases the diversity of the search results.
To make the objectives, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the drawings and specific embodiments.
According to an embodiment of the application, a data search method based on integrating homogeneous data objects is provided. Refer to Fig. 1, the flowchart of the data search method based on integrating homogeneous data objects according to an embodiment of the application.
At step S102, a search request is received from a user and a search is performed. The search request is used to find, among all data objects to be searched, one or more data objects that match it.
The search request may contain a keyword, a network link, or the like. The search is performed according to the request, to find data objects that match the keyword or one or more data objects that the network link points to. By sending the search request, the user can obtain, from the many data objects, those that match the keyword or the content the link represents; there may be one or more such matching data objects.
The one or more data objects may be stored in a database as data files. Each data object contains various data labels, such as a first data label and a second data label. The first and second data labels represent two completely different features or attributes; the names only distinguish the two features and are not limiting. The data files to be searched that are stored in the database need a suitable data structure to organize and integrate them, so that search is complete, high-quality, and efficient; this is described below in the processing for integrating homogeneous data objects.
At step S104, each of the one or more data objects found is analyzed to obtain its data labels. The data labels include a first data label and a second data label, which respectively identify different attribute features of the data object.
In other words, analyzing each of the one or more data objects found yields its first data label and/or second data label.
Each data object contains multiple data labels, such as its title, storage location, and index number. Each data object may include a first data label and a second data label that characterize different aspects of the object. Together, the first and second data labels can identify a data object, so the embodiments of the application use them as the basis for integrating homogeneous data. For example, in an employee information table, take the employee ID (12345) as the first data label and the name (Zhang San) as the second data label; the employee ID (12345) together with the name (Zhang San) identifies the employee Zhang San in that table.
Among massive numbers of data objects, the first and second data labels can be used to determine the characteristics of a data object and hence its homogeneous data objects. As another example, take a product as a data object, its article number as the first data label, and its brand word as the second data label. A product found by the search is analyzed to obtain its article number and brand word, which are then matched against the massive product catalog to find products of the same kind. For example, the article number "1111" and the brand word "Nike" determine the product "shell-toe sneakers"; matching "1111" and "Nike" against the catalog then yields multiple products that are the same model as the "shell-toe sneakers". Matching the obtained first and/or second data labels against the mass of data is detailed in step S106.
At step S106, the obtained data labels, such as the first and/or second data labels, are matched to obtain one or more homogeneous data objects that match each data object.
In this way more data objects matching the search request can be obtained, which makes data search more comprehensive, improves the quality of the search results, and gives users a convenient search service.
At step S108, each data object and its corresponding one or more homogeneous data objects are integrated (aggregated) into a homogeneous data combination, which is returned to the user.
In other words, the one or more data objects whose data labels (first and/or second data labels) match are integrated into a homogeneous data object combination and returned to the user as the search result.
A homogeneous data combination contains multiple data objects that are homogeneous with one another, and through it the user can view each of the homogeneous data objects it contains.
In one embodiment, the search results page may show only one of the data objects in a homogeneous data combination and hide the other homogeneous data objects. When the hidden objects need to be shown, an operation that reveals them can be triggered, for example by clicking a button.
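The collapsed presentation can be sketched as below. It assumes, hypothetically, that each homogeneous data combination arrives as a non-empty list whose first element is the representative to display, and it borrows the 20-results-per-page figure from the background section as a default; the remaining members are kept so that an "expand" action can reveal them.

```python
def render_page(combinations, page_size=20):
    """One entry per homogeneous combination: a shown object plus hidden members."""
    page = []
    for combination in combinations[:page_size]:
        shown, *hidden = combination
        page.append({"shown": shown, "hidden": hidden})
    return page


def expand(entry):
    """What an 'expand' button would reveal: every member of the combination."""
    return [entry["shown"], *entry["hidden"]]


combinations = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
page = render_page(combinations)
print([entry["shown"] for entry in page])  # ['a1', 'b1', 'c1']
print(expand(page[0]))                     # ['a1', 'a2', 'a3']
```

Because each page slot now holds a whole combination, 20 slots show 20 distinct kinds of object instead of repeating near-duplicates.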
Further, the search result returned to the user consists of the data objects that match the search request together with the one or more homogeneous data objects corresponding to each matching data object. Returning the homogeneous data objects alongside the matching data objects improves the accuracy and recall of the search. Moreover, the concept of a homogeneous data combination groups data objects of the same kind together, so the search results page can show a wider variety of data objects, which increases the diversity of the search results.
Matching additional homogeneous data objects by data label, such as by the first and/or second data label, relies on a data structure built by integrating homogeneous data objects. The integration process for homogeneous data objects is described in detail below.
Fig. 2 shows the flowchart of the pre-integration processing of homogeneous data objects according to an embodiment of the application.
At step S202, pre-integration processing is performed on all data objects to be searched, to determine the one or more homogeneous data objects corresponding to each data object to be searched and obtain a data object mapping table.
The purpose of the pre-integration processing is to obtain the one or more homogeneous data objects corresponding to each data object. Homogeneous data objects are multiple data objects in the same category that have the same or synonymous second data labels and the same or synonymous first data labels.
The pre-integration processing performs offline data mining on the massive set of data objects. Based on the mining results, it extracts second and first data labels online and applies the corresponding normalization and mapping, finally obtaining the mapping relations among data objects, first data labels, and second data labels. Data objects of the same kind in the same category are thereby integrated together, that is, the association relations among the integrated homogeneous data objects are obtained.
First, offline data mining is performed on the massive set of data objects (for example, hundreds of millions of them), that is, on all data objects to be searched. In a preferred implementation, as shown in Fig. 3, this produces a second data label table, a second data label synonym table, a second data label category distribution table, a first data label synonym table, and a mapping table from first data labels to second data labels.
At step S302, the second data labels in each data object and the second data label category distribution table are mined.
The second data labels in each data object can be extracted to generate the set of second data labels of all data objects, for example as a second data label table. Based on this set, all data objects in the database are divided into categories to obtain the category distribution of the second data labels of all data objects, for example as a second data label category distribution table. The categories of a document database might include geography, daily life, and so on; those of a merchandise database might include clothing, watches, and so on.
Preferably, within each category, all the distinct second data labels in the data file of each data object are counted; all second data labels of all data objects form the second data label set, and the number of times (or frequency) each second data label occurs is recorded. All second data labels of all data objects form the second data label table, and the number of times or frequency with which each second data label occurs in each category forms the second data label category distribution table. This table contains multiple categories, the second data labels under each category, and the count (or frequency) of each second data label.
For example, three distinct second data labels, "school bus", "subway", and "car", are extracted from the file of the data object "city" under the geography category. All three appear in files of the object "city" in the geography category, so all three are put into the second data label set. In this way all second data labels of data objects in the same and in different categories are extracted into the second data label set, which can be stored as a table. The number of times (frequency) each second data label occurs, such as "school bus", "subway", and "car" in the object "city" under geography, is also counted to form the category distribution of the second data labels: "school bus" occurs 15 times, "subway" 10 times, and "car" 20 times, and the labels are sorted from largest count to smallest. In another category, such as the file of the data object "large household items" under the daily-life category, "school bus" occurs 0 times, "subway" 0 times, and "car" 20 times. Hence the second data labels "school bus" and "subway" belong to the geography category, while "car" corresponds to both geography and daily life. The categories, the second data labels under each category, and the counts (frequencies) of those labels are stored in the second data label category distribution table.
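The distribution table in this example can be reproduced with `collections.Counter`, as a sketch assuming each data object is reduced to its category plus the list of second-label occurrences found in its file. The function names `build_distribution` and `categories_for` are illustrative, not from the source.

```python
from collections import Counter, defaultdict


def build_distribution(objects):
    """objects: iterable of (category, second-label occurrences in the object's file)."""
    table = defaultdict(Counter)
    for category, labels in objects:
        table[category].update(labels)
    return table


def categories_for(label, table):
    """Categories in which a second data label occurs at least once."""
    return sorted(cat for cat, counts in table.items() if counts[label] > 0)


objects = [
    ("geography", ["school bus"] * 15 + ["subway"] * 10 + ["car"] * 20),  # object "city"
    ("life", ["car"] * 20),                                                # "large household items"
]
table = build_distribution(objects)
print(table["geography"].most_common())  # [('car', 20), ('school bus', 15), ('subway', 10)]
print(categories_for("car", table))       # ['geography', 'life']
print(categories_for("subway", table))    # ['geography']
```

`most_common()` gives the largest-to-smallest ordering the text describes, and a missing label simply reads as a count of 0.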
The occurrence count or frequency of a second data label can be computed over the information related to that label in the data file of the data object, for example the second data labels contained in the object's attribute information or in the word segmentation result of its title.
Step S304: second data label mining can be performed on the second data labels of each data object to generate the set of second data label synonyms for all data objects, for example a second data label synonym table.
Within the same category, synonym mining is performed on the second data labels of each data object; for example, the second data labels and first data labels of all data objects in the category are extracted. When two data objects share the same first data label but contain different second data labels, this counts as one co-occurrence of that second data label synonym pair. For example, data object M1 has label a (a second data label) and code B (a first data label), and data object M2 has label A (a second data label) and code B (a first data label). Label A and label a can then be regarded as synonyms, and M1 and M2 together constitute one co-occurrence of the synonym pair. The co-occurrence counts of all second data label synonym pairs can then be accumulated based on the second data label category distribution table. The pairs are sorted by co-occurrence count (or frequency) from high to low, and high-count pairs can be given priority when generating and saving the second data label synonym table.
The mining thus associates multiple data objects that are in the same category, carry different second data labels, and share the same first data label, and forms the set of second data label synonyms; the second data labels of these data objects are synonyms of each other. That is, if several data objects share the same first data label but carry different second data labels, those different second data labels can be called second data label synonyms and saved in the form of a second data label synonym table.
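The pair-counting step can be sketched as follows, assuming objects arrive as (category, first label, second label) triples and that each object carries one of each. The function name, the category names "Shoes" and "Bags", and the `min_count` cut-off are illustrative assumptions, not part of the described method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_second_label_synonyms(objects, min_count=1):
    """Count co-occurrences of candidate second data label synonym pairs.

    objects: iterable of (category, first_label, second_label) triples.
    Two objects in the same category that share a first data label but carry
    different second data labels count as one co-occurrence of that pair.
    Returns [((label_x, label_y), count), ...] sorted by count, highest first.
    """
    by_code = defaultdict(list)
    for category, first_label, second_label in objects:
        by_code[(category, first_label)].append(second_label)

    pair_counts = Counter()
    for second_labels in by_code.values():
        # Every pair of objects sharing the code is one co-occurrence.
        for x, y in combinations(second_labels, 2):
            if x != y:
                pair_counts[tuple(sorted((x, y)))] += 1

    # min_count is a hypothetical cut-off for keeping only frequent pairs.
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_count]


objects = [
    ("Shoes", "B", "a"), ("Shoes", "B", "A"),   # M1 and M2 from the example
    ("Shoes", "C", "a"), ("Shoes", "C", "A"),
    ("Bags", "B", "z"),                          # different category: no pair
]
print(mine_second_label_synonyms(objects))  # [(('A', 'a'), 2)]
```

Grouping by (category, first label) ensures that only objects sharing a code within one category contribute co-occurrences.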
Step S306: first data label mining can be performed on the first data labels of each data object to generate a first data label thesaurus for all data objects, such as a first data label synonym table.
For example, the multiple first data labels appearing in a single data object (for example, in its data file), such as those contained in the file's title information, can be extracted. If two of these first data labels have the same length and the same prefix, they are regarded as a first data label synonym pair. Finally, all first data label synonym pairs are merged into first data label synonym clusters (the thesaurus), which can be saved as a first data label synonym table. In this way, multiple similar first data labels within the same data object form the set of first data label synonyms.
More specifically, a data object can contain multiple first data labels. For example, data object A contains the first data labels "1110" and "1111". These two labels have the same length and the same prefix, so "1110" and "1111" become first data label synonyms. A first data label synonym table can be built in this way.
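The text does not say how long the shared prefix must be. The sketch below assumes, purely for illustration, that two labels must agree on all but their last character (overridable via a hypothetical `prefix_len` parameter), and merges pairs into clusters with a small union-find structure. Both choices are assumptions.

```python
from itertools import combinations


def mine_first_label_clusters(labels_per_object, prefix_len=None):
    """Cluster first data labels that look like variants of one another.

    labels_per_object: iterable of lists, the first data labels of one object.
    Two labels from the same object form a synonym pair when they have the same
    length and share a prefix. prefix_len=None assumes "all but the last
    character", an illustrative choice that is not from the source.
    Returns a list of clusters (sorted lists) with at least two labels.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for labels in labels_per_object:
        for a, b in combinations(sorted(set(labels)), 2):
            n = len(a) - 1 if prefix_len is None else prefix_len
            if len(a) == len(b) and a[:n] == b[:n]:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for label in list(parent):
        clusters.setdefault(find(label), set()).add(label)
    return [sorted(c) for c in clusters.values() if len(c) > 1]


print(mine_first_label_clusters([["1110", "1111"]]))  # [['1110', '1111']]
```

Union-find lets pairs found in different data objects ("1110"/"1111" in one, "1111"/"1112" in another) merge into a single cluster.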
Step S308: the first data labels and second data labels of each data object can be mined to generate the mapping from first data labels to second data labels.
For example, the first and second data labels are extracted from the data files of all data objects, and, according to the second data label category distribution table, the number of times (or frequency) that each first data label co-occurs (appears together) with different second data labels in data objects is counted. If a data object has only one first data label, and that first data label has co-occurred with exactly one second data label, a mapping from that first data label to that second data label is established, for example in a first-to-second data label mapping table, which is then saved. This mapping can fill in second data label information that may be missing from some data objects, because some data objects may have only a first data label and no second data label. For example, a data object has only the code "11" (a first data label) and no label (second data label), but the code "11" has co-occurred with the label "A" (a second data label) and with no other label; "11" is then mapped to "A". Such mapping completes the second data label feature of the data object, so that when a search request is answered, the recall (also called recall ratio or retrieval completeness) of aggregated same-class data objects can be improved.
For example, data object A contains only the first data label "1110" and lacks a second data label. Among the extracted first and second data labels of all data objects in the same category, the first data label "1110" co-occurs only with the second data label "BB" of data object B; in other words, among all data objects, only data object B contains both "1110" and "BB". The first data label "1110" of data object A has therefore co-occurred only with the second data label "BB" of data object B, and a mapping from the first data label "1110" to the second data label "BB" can be established.
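A minimal sketch of the mapping rule, assuming each data object is given as a (category, first label, second label or None) triple, that is, one first data label per object. The function name and the category "Shoes" are hypothetical.

```python
from collections import defaultdict


def build_first_to_second_mapping(objects):
    """Map a first data label to a second data label when they co-occur uniquely.

    objects: iterable of (category, first_label, second_label) triples, where
    second_label may be None for objects that lack one.
    Returns {(category, first_label): second_label} for every first data label
    that, within its category, co-occurs with exactly one distinct second label.
    """
    co_occurring = defaultdict(set)
    for category, first_label, second_label in objects:
        if first_label is not None and second_label is not None:
            co_occurring[(category, first_label)].add(second_label)
    return {key: next(iter(seconds))
            for key, seconds in co_occurring.items() if len(seconds) == 1}


objects = [
    ("Shoes", "1110", None),   # data object A: first label only
    ("Shoes", "1110", "BB"),   # data object B
    ("Shoes", "2200", "CC"),
    ("Shoes", "2200", "DD"),   # "2200" is ambiguous, so it is not mapped
]
print(build_first_to_second_mapping(objects))  # {('Shoes', '1110'): 'BB'}
```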
Further, based on the second data label table, the second data label synonym table, the first data label synonym table, the first-to-second data label mapping table, and the second data label category distribution aggregated by the data mining above, all online data objects can be normalized and mapped, which captures the mapping relations among data objects, first data labels, and second data labels.
From the result of this data mining over the mass of data objects, an initial association of homogeneous data objects can be formed: multiple data objects in the same category that have identical or synonymous second data labels and identical or synonymous first data labels.
Next, for a given data object (or for each data object), second data labels and first data labels are extracted from the title information and attribute information of the data object (in its data file) using the sets (tables) mined offline. According to the data mining results, corresponding normalization (unification) and mapping are then performed on all data objects to be searched, and data objects belonging to the same class are finally integrated, as shown in Figure 4. This further refines the integration of same-class data objects, including the category distribution table of the data objects and the mapping relations among data objects, second data labels, and first data labels.
Step S402: one or more second data labels are extracted from the same data object to obtain one or more candidate second data labels. The title information of the data object (its attribute information can be used in the same way) is segmented into words, and the resulting segments are matched against the second data label table; a segment that exactly matches a second data label in the table becomes a candidate. For example, if a data object contains the second data labels "A", "B", and "C", and the second data label table contains only "A" and "B", then "A" and "B" are the candidate second data labels of the data object.
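A minimal sketch of candidate extraction, assuming the title has already been segmented into tokens (word segmentation itself is out of scope here) and that multi-token segments are rebuilt by joining tokens with a space. The `max_ngram` limit, the `joiner` parameter, and the function name are hypothetical.

```python
def extract_candidates(tokens, second_label_table, max_ngram=3, joiner=" "):
    """Return the second data labels that exactly match a contiguous span of tokens.

    tokens: the word-segmented title (segmentation itself is assumed done).
    second_label_table: set of known second data labels.
    max_ngram and joiner are illustrative assumptions.
    """
    candidates = []
    for start in range(len(tokens)):
        for length in range(1, max_ngram + 1):
            if start + length > len(tokens):
                break
            span = joiner.join(tokens[start:start + length])
            if span in second_label_table and span not in candidates:
                candidates.append(span)
    return candidates


print(extract_candidates(["A", "B", "C"], {"A", "B"}))  # ['A', 'B']
```

For Chinese titles, where segments are not space-separated, `joiner=""` would be the natural choice.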
Based on the second data label category distribution table, the number of times or frequency that each candidate second data label occurs is counted, the candidates are sorted from high to low, and the candidate with the highest count or frequency can be taken as the second data label of the data object.
Step S404: the extracted candidate second data labels are disambiguated.
Disambiguating the candidate second data labels includes selecting, according to each candidate's occurrence count or frequency in the category distribution table, the candidates that satisfy a predetermined condition as the second data labels of the data object.
In one specific embodiment, the category of the data object is determined, and the number of times (or frequency) that each candidate second data label occurs in that category is obtained from the second data label category distribution table; if the count exceeds a preset threshold (for example, 1), the candidate is regarded as a second data label of the data object. In another embodiment, if a data object has multiple candidate second data labels, the candidate with the highest occurrence count (highest frequency) in the second data label category distribution table is selected as the second data label of the data object. For example, suppose the candidate second data labels of a data object are "A" and "B", and according to the second data label category distribution table, in the category of the data object, "A" occurs 1,000 times and "B" occurs once; "A" can then be determined to be the second data label of the data object.
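The two embodiments can be combined (claim 8 allows "and/or"), as sketched below: candidates whose count in the object's category does not exceed the threshold are dropped, and the most frequent survivor is kept. The nested-dictionary layout of the distribution table and the function name are assumptions.

```python
def disambiguate(candidates, category, distribution, threshold=1):
    """Pick one second data label from the candidates of a data object.

    distribution: {category: {second_label: occurrence count}}, i.e. the
    second data label category distribution table.
    Candidates whose count in the object's category does not exceed threshold
    are dropped; among the rest, the most frequent one wins. Returns None if
    no candidate survives.
    """
    counts = distribution.get(category, {})
    kept = [c for c in candidates if counts.get(c, 0) > threshold]
    if not kept:
        return None
    return max(kept, key=lambda c: counts.get(c, 0))


distribution = {"Shoes": {"A": 1000, "B": 1}}
print(disambiguate(["A", "B"], "Shoes", distribution))  # A
```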
Step S406: second data label synonym normalization. After disambiguation, the extracted second data labels of the data object can be rewritten using the second data label synonym table mined offline, so that second data labels are normalized. For example, if the second data label "A" occurs 500 times in the second data label category distribution table and its synonym "B" occurs 20 times, the second data label "B" can be changed to "A".
For example, the data object is a commodity, and its second data labels can include the commodity's brand word. The brand word of the same commodity may be written in different ways, including synonyms, misspellings, and abbreviations. For example, a commodity's brand word is 新百伦 (the Chinese name of New Balance), which has the synonyms "New Balance" and "new balance", the misspelled form "newbalance", and the abbreviation "nb". Using the brand word synonym table (the second data label synonym table) and the disambiguated brand word (second data label), the extracted second data label (brand word) is rewritten into the single most suitable brand word; for example, "New Balance" is used uniformly as the second data label of the commodity.
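One way to realize this rewrite is to map every member of a synonym group to its most frequent member, as sketched below. The synonym groups and counts are assumed inputs from the offline synonym table and the category distribution table, and the helper names are hypothetical.

```python
def build_canonical_map(synonym_groups, counts):
    """Map every label in a synonym group to the group's most frequent member.

    synonym_groups: iterable of sets of labels that are synonyms of each other.
    counts: {label: occurrence count}, e.g. from the category distribution table.
    Ties are broken alphabetically so the result is deterministic.
    """
    canonical = {}
    for group in synonym_groups:
        best = max(sorted(group), key=lambda label: counts.get(label, 0))
        for label in group:
            canonical[label] = best
    return canonical


def normalize(label, canonical):
    """Rewrite a label to its canonical form; unknown labels pass through."""
    return canonical.get(label, label)


canonical = build_canonical_map([{"A", "B"}], {"A": 500, "B": 20})
print(normalize("B", canonical))  # A
```

Applied to the brand example, a group containing "New Balance", "new balance", "newbalance", and "nb" would collapse to whichever form is most frequent, which the text expects to be "New Balance".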
Step S408: according to the constructed mapping between first data labels and second data labels, second data label completion is performed for data objects that lack a second data label.
If only a first data label, and no second data label, has been extracted from a data object (that is, the second data label is missing), and the extracted first data label exactly matches a first data label in the first-to-second data label mapping table mined offline for the same category, the second data label of the data object is obtained from that mapping table. The data object can then be aggregated into the corresponding set of same-class data objects, and the obtained second data label can further be written into the data object that lacked it.
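Completion then reduces to a lookup keyed on category and first data label. The sketch below assumes data objects are plain dictionaries; the field names and the function name are hypothetical.

```python
def complete_second_label(obj, mapping):
    """Fill in a missing second data label from the first-to-second mapping.

    obj: dict with 'category', 'first_label' and 'second_label' (None if
    missing); field names are illustrative assumptions.
    mapping: {(category, first_label): second_label}, mined offline.
    Returns a completed copy, or obj itself when it already has a second data
    label or its first data label has no exact match in the mapping.
    """
    if obj.get("second_label") is None and obj.get("first_label") is not None:
        key = (obj.get("category"), obj["first_label"])
        if key in mapping:
            return {**obj, "second_label": mapping[key]}
    return obj


mapping = {("Shoes", "1110"): "BB"}
obj_a = {"id": "A", "category": "Shoes", "first_label": "1110", "second_label": None}
print(complete_second_label(obj_a, mapping)["second_label"])  # BB
```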
Step S410: first data labels are extracted from multiple data objects based on configured rules. The rules are applied to the title information, attribute information, and other content of a data file to extract the first data labels of the data object; for example, a regular expression can be configured to extract the first data label of a data object.
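The text leaves the configured rule unspecified. The sketch below assumes, hypothetically, that article numbers are runs of at least four digits optionally followed by a hyphen and a three-digit sub-number; a real deployment would configure its own pattern.

```python
import re

# Hypothetical configured rule: 4+ digits, optionally followed by "-" and a
# 3-digit sub-number. A real deployment would supply its own pattern(s).
ARTICLE_NUMBER_RULE = re.compile(r"\b\d{4,}(?:-\d{3})?\b")


def extract_first_labels(text, rule=ARTICLE_NUMBER_RULE):
    """Extract candidate first data labels from title or attribute text."""
    return rule.findall(text)


print(extract_first_labels("Running shoe 537889-001, size 42"))  # ['537889-001']
```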
Step S412: the extracted first data labels are normalized. For example, if the labels in a data object share the same base number, such as "1110", but have different sub-numbers, such as "001" and "002", the sub-numbers are removed to normalize the first data labels: "1110-001" and "1110-002" are both normalized to "1110", or alternatively both to the same first data label "1110-001".
Taking commodity search over a mass of data objects as an example, the first data label extracted from a commodity is, for example, its article number, which is split on a separator: the article number "537889-001" is split on "-" into "537889" and "001", and the leading main article number, "537889", is taken as the normalized article number.
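The separator-based normalization of step S412 can be sketched in a few lines. The default separator "-" comes from the example above; the function names are hypothetical, and the sketch implements the first option (keeping the base number).

```python
def normalize_first_label(label, separator="-"):
    """Keep only the main number before the first separator."""
    return label.split(separator, 1)[0]


def normalize_first_labels(labels, separator="-"):
    """Normalize and de-duplicate a data object's first data labels, keeping order."""
    return list(dict.fromkeys(normalize_first_label(label, separator) for label in labels))


print(normalize_first_label("537889-001"))               # 537889
print(normalize_first_labels(["1110-001", "1110-002"]))  # ['1110']
```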
Step S414: first data label synonym normalization. After the first data labels are normalized, the extracted first data labels of the data object are rewritten using the first data label synonym table mined offline and unified into a single first data label. For example, if the first data label "1110" occurs 500 times in the category distribution table and its synonym "1111" occurs 20 times, "1111" can be changed to "1110".
In online operation based on the tables produced by offline data mining, each data object is integrated based on the first and second data labels that occur most frequently among its data labels, and second or first data labels that are synonyms of each other are normalized. Using the second data label table, the second data label synonym table, the first data label synonym table, the second data label category distribution table, and the first-to-second data label mapping table, the system determines which data objects in a given category should be integrated as homogeneous data objects, and unifies their second and first data labels to facilitate search matching.
Through offline data mining and online normalization and completion, the mapping relations among data objects, first data labels, and second data labels can be obtained, and a data object mapping table can be formed, so that homogeneous data objects can be looked up in this table during data search.
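Once labels are normalized and completed, a data object mapping table can be formed by grouping objects on a shared key. The sketch assumes that key is (category, second data label, first data label), which reflects the definition of homogeneous data objects given above because synonyms have already been rewritten to one form. The dictionary field names and object ids are assumptions.

```python
from collections import defaultdict


def build_object_mapping(objects):
    """Group normalized data objects into homogeneous groups.

    objects: iterable of dicts with 'id', 'category', 'second_label' and
    'first_label', already normalized and completed (field names assumed).
    Returns (object_to_group, group_to_objects).
    """
    group_to_objects = defaultdict(list)
    for obj in objects:
        key = (obj["category"], obj["second_label"], obj["first_label"])
        group_to_objects[key].append(obj["id"])
    object_to_group = {oid: key
                       for key, ids in group_to_objects.items() for oid in ids}
    return object_to_group, dict(group_to_objects)


objects = [
    {"id": "m1", "category": "Shoes", "second_label": "A", "first_label": "1110"},
    {"id": "m2", "category": "Shoes", "second_label": "A", "first_label": "1110"},
    {"id": "m3", "category": "Shoes", "second_label": "A", "first_label": "2200"},
]
object_to_group, groups = build_object_mapping(objects)
print(object_to_group["m1"] == object_to_group["m2"])  # True
```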
Step S204: the data object mapping table obtained by the advance integration is stored. This includes storing in a database the integrated data object mapping table obtained through offline mining and online completion and normalization.
During the advance integration of data objects, the second data label table, the second data label category distribution table, the second data label synonym table, the first data label synonym table, and the first-to-second data label mapping table are formed. These tables (sets) are stored in the database as the results of the advance integration so that they can be retrieved at any time during data search, which improves system speed.
With the method of the present application for integrating data objects in advance, associations are established between homogeneous data objects, and homogeneous data objects are presented in the search results. This provides users with more complete data, improves the accuracy and recall of data search, and thereby improves the quality of search results.
The present application also provides an apparatus for data search based on homogeneous data integration.
Figure 5 is a structural diagram of the apparatus for data search based on homogeneous data integration according to an embodiment of the present application.
The apparatus 500 described herein can include a receiving-and-searching module 501, an acquisition module 503, a matching module 505, and an integrating-and-returning module 507. Each module implements the corresponding steps of the method above.
The receiving-and-searching module 501 receives a search request from a user and performs the search; the search request is used to search, among all data objects to be searched, for one or more data objects matching the search request.
The acquisition module 503 analyzes each of the one or more data objects found to obtain the data labels of each data object. The data labels include a first data label and a second data label, which respectively identify different attribute features of the data object. The acquisition module 503 can thus be used to obtain the first data label and/or the second data label of each data object.
The matching module 505 performs matching according to the obtained data labels (the first data labels and/or the second data labels); that is, it further matches each of the one or more data objects found to obtain the one or more homogeneous data objects corresponding to each data object.
The integrating-and-returning module 507 integrates the one or more data objects matched by the data labels (the first data labels and/or the second data labels) into a homogeneous data object combination and returns it to the user as the search result. The homogeneous data combination includes multiple data objects that are homogeneous data objects of each other, and only one of the data objects in the combination may be displayed on the search results page.
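At query time, the integrating-and-returning step can be sketched as collapsing a ranked result list so that each homogeneous group appears once, represented by its highest-ranked member. The object-to-group mapping is assumed to come from the stored data object mapping table; the function name and group keys are hypothetical.

```python
def collapse_results(ranked_ids, object_to_group):
    """Show each homogeneous group once, represented by its best-ranked hit.

    ranked_ids: object ids in search-result order.
    object_to_group: {object_id: group_key} from the stored mapping table.
    Returns [(representative_id, [all hits in the group]), ...] in rank order.
    """
    members = {}
    order = []
    for oid in ranked_ids:
        # Objects missing from the mapping table form a group of their own.
        key = object_to_group.get(oid, ("ungrouped", oid))
        if key not in members:
            members[key] = []
            order.append(key)
        members[key].append(oid)
    return [(members[key][0], members[key]) for key in order]


print(collapse_results(["m1", "m2", "m3"], {"m1": "g1", "m3": "g1"}))
# [('m1', ['m1', 'm3']), ('m2', ['m2'])]
```

Keeping the full member list alongside each representative leaves room for a results page that shows one item per group while still linking to its homogeneous alternatives.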
The apparatus 500 described herein further includes a preprocessing module 509 and a storage module 511.
The preprocessing module 509 performs advance integration on all data objects to be searched and determines the one or more homogeneous data objects corresponding to each data object to be searched, to obtain a data object mapping table.
Specifically, the preprocessing module 509 performs offline data mining and online normalization and mapping on all data objects to be searched.
During offline data mining, the preprocessing module 509 can perform mining on the second data labels of each data object and on the second data label category distribution table.
The preprocessing module 509 can perform second data label mining on the second data labels of each data object to generate the set of second data label synonyms for all data objects. The second data label synonyms cover multiple data objects that are in the same category, have different second data labels, and share the same first data label.
The preprocessing module 509 can perform first data label mining on the first data labels of each data object to generate a first data label thesaurus for all data objects. The first data label synonyms include multiple similar first data labels within the same data object.
The preprocessing module 509 can mine the first data labels and second data labels of each data object to generate the mapping from first data labels to second data labels. Specifically, if a data object has only one first data label and that first data label co-occurs with only one second data label, a mapping between that first data label and that second data label is established.
When normalizing and mapping data objects online, the preprocessing module 509 is configured to extract one or more second data labels from the same data object to obtain one or more candidate second data labels, and to disambiguate the extracted candidates. Specifically, based on the second data label category distribution table, it obtains the number of times each candidate occurs in the category; if the count exceeds a preset threshold, the candidate is regarded as a second data label of the data object. In another embodiment, if a data object has multiple candidate second data labels, the candidate with the highest occurrence count in the second data label category distribution table is selected as the second data label of the data object.
The preprocessing module 509 is further configured to: extract first data labels from multiple data objects based on configured rules and normalize the extracted first data labels; normalize second data labels or first data labels that are synonyms of each other; and, according to the constructed mapping between first and second data labels, complete the second data labels of data objects that lack them.
The purpose of the preprocessing module 509 is to perform advance integration on all data objects to be searched (the mass of data objects) to obtain homogeneous data objects, that is, multiple data objects in the same category that have identical or synonymous second data labels and identical or synonymous first data labels. During advance integration, the mapping relations among data objects, first data labels, and second data labels can be obtained to form a data object mapping table.
The storage module 511 stores the data object mapping table obtained by the advance integration. During data search, the homogeneous data objects corresponding to the data objects found can be matched directly through this table.
In this way, key features or attributes of data objects, such as their first and second data labels, are used to classify and integrate the mass of data objects in advance and to establish associations among homogeneous data objects. This improves the accuracy and recall of data search and thereby the quality of search results.
In addition, because the homogeneous data objects returned by the search engine are integrated, only one of several homogeneous data objects needs to be displayed on the search results page. The page can therefore present a wider variety of data objects, increasing the diversity of search results and improving user experience.
The specific implementation of the modules included in the apparatus of the present application described with reference to Figure 5 corresponds to the specific implementation of the steps of the method of the present application. Because Figures 1 to 4 have been described in detail, the details of each module are not repeated here, so as not to obscure the present application.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for parts that are the same or similar across embodiments, the embodiments may be referred to one another.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules or units. Generally, program modules or units can include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Program modules or units can be implemented in software, hardware, or a combination of both. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules or units can be located in local and remote computer storage media, including storage devices.
Finally, it should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes that element.
Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a system, or a computer program product. The present application can therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Specific examples are used herein to explain the principles and embodiments of the present application; the description of the above embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific embodiments and scope of application. In summary, the content of this specification should not be construed as limiting the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. Memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Claims (17)

1. A method for data search based on integration of homogeneous data objects, characterized by comprising:
receiving a search request from a user, and searching, among all data objects to be searched, for one or more data objects matching the search request;
analyzing each of the one or more data objects found to obtain the data labels of each data object;
matching the obtained data labels against the data objects to be searched to obtain one or more similar data objects matching the data labels;
integrating the one or more data objects matching the data labels into a homogeneous data object combination, and returning it to the user as a search result.
2. The method according to claim 1, characterized in that the data labels include a first data label and a second data label, the first data label and the second data label respectively identifying different attribute features of a data object.
3. The method according to claim 2, characterized by further comprising: performing advance integration on all data objects to be searched to determine the one or more homogeneous data objects corresponding to each data object to be searched, so as to obtain a data object mapping table.
4. The method according to claim 3, characterized in that performing advance integration on all data objects to be searched comprises:
performing mining on the second data labels of each data object and on a second data label category distribution table;
performing second data label mining on the second data labels of each data object to generate a set of second data label synonyms for all data objects;
performing first data label mining on the first data labels of each data object to generate a first data label thesaurus for all data objects;
mining the first data labels and second data labels of each data object to generate a mapping from first data labels to second data labels.
5. The method according to claim 4, characterized in that
the second data label synonyms include: multiple data objects in the same category that have different second data labels and the same first data label;
the first data label synonyms include: multiple similar first data labels within the same data object.
6. The method according to claim 4, characterized in that mining the first data labels and second data labels of each data object to generate a mapping from first data labels to second data labels comprises: if a data object has only one first data label and that first data label co-occurs only with one unique second data label, establishing a mapping between that first data label and that second data label.
7. The method according to claim 3, characterized in that performing advance integration on all data objects to be searched comprises:
extracting one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguating the extracted candidate second data labels;
extracting first data labels from multiple data objects based on configured rules, and normalizing the extracted first data labels;
normalizing second data labels or first data labels that are synonyms of each other;
according to the constructed mapping between first data labels and second data labels, performing second data label completion on data objects lacking a second data label.
8. The method according to claim 7, characterized in that disambiguating the extracted one or more candidate second data labels comprises:
based on the second data label category distribution table, obtaining the number of times a candidate second data label occurs in the category, and if the number exceeds a preset threshold, regarding it as a second data label of the data object; and/or, if a data object has multiple candidate second data labels, selecting the second data label with the highest occurrence count in the second data label category distribution table as the second data label of the data object.
9. The method according to claim 1, characterized by including:
displaying, on a search results page, one of the multiple data objects in the homogeneous data combination, wherein the homogeneous data combination includes multiple data objects that are homogeneous data objects of one another.
10. The method according to claim 2, characterized in that the homogeneous data objects include: multiple data objects in the same category that have the same or synonymous second data label and the same or synonymous first data label.
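Claims 9 and 10 together suggest collapsing the result list so that each homogeneous combination is shown once. A combination is defined by the same category, the same or synonymous second data label, and the same or synonymous first data label. The sketch below is minimal and assumes the following:

- Each result carries one `category`, `first` and `second` string.
- The synonym tables map a variant to its canonical label.
- The highest-ranked member is the one displayed. The claims do not specify this choice.

```python
from typing import Dict, List, Tuple


def collapse_homogeneous(
    results: List[Dict], first_synonyms: Dict[str, str], second_synonyms: Dict[str, str]
) -> List[Tuple[Dict, int]]:
    """Group ranked results into homogeneous combinations; return (representative, size) pairs."""
    groups: Dict[Tuple[str, str, str], List[Dict]] = {}
    for obj in results:
        key = (
            obj["category"],
            second_synonyms.get(obj["second"], obj["second"]),
            first_synonyms.get(obj["first"], obj["first"]),
        )
        groups.setdefault(key, []).append(obj)
    # Dicts keep insertion order, so groups appear in the order of their best-ranked member.
    return [(members[0], len(members)) for members in groups.values()]
```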
11. A device for data search based on integration of homogeneous data objects, characterized by including:
a receiving and search module, configured to receive a search request from a user and to search all data objects to be searched for one or more data objects matching the search request;
an acquisition module, configured to analyze each of the one or more data objects found, to obtain the data labels of each data object;
a matching module, configured to match the obtained data labels against the data objects to be searched, to obtain one or more similar data objects matching the data labels;
an integration and return module, configured to integrate the one or more data objects matching the data labels into a homogeneous data object combination and return it to the user as the search result.
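The four modules of claim 11 can be wired together as plain functions. In this hypothetical sketch, the search module is a naive substring match on a `title` field. Objects are matched on their exact (first, second) label pair, and synonym handling is omitted so the flow stays easy to follow.

```python
from typing import Dict, List, Set, Tuple

Label = Tuple[str, str]  # (first data label, second data label)


def receive_and_search(query: str, catalog: List[Dict]) -> List[Dict]:
    """Receiving and search module: naive case-insensitive substring match on titles."""
    q = query.lower()
    return [o for o in catalog if q in o["title"].lower()]


def acquire_labels(hits: List[Dict]) -> Set[Label]:
    """Acquisition module: collect the data labels of every hit."""
    return {(o["first"], o["second"]) for o in hits}


def match_by_labels(labels: Set[Label], catalog: List[Dict]) -> List[Dict]:
    """Matching module: find every object sharing a label pair, even if the query text missed it."""
    return [o for o in catalog if (o["first"], o["second"]) in labels]


def integrate(matched: List[Dict]) -> Dict[Label, List[int]]:
    """Integration and return module: one combination per label pair, listing object ids."""
    combos: Dict[Label, List[int]] = {}
    for o in matched:
        combos.setdefault((o["first"], o["second"]), []).append(o["id"])
    return combos


def run_search(query: str, catalog: List[Dict]) -> Dict[Label, List[int]]:
    hits = receive_and_search(query, catalog)
    return integrate(match_by_labels(acquire_labels(hits), catalog))
```

Under these assumptions, an object whose title does not contain the query can still be returned, provided it shares a label pair with an object that does.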
12. The device according to claim 11, characterized in that the data labels include a first data label and a second data label, which respectively identify different attribute features of a data object.
13. The device according to claim 12, characterized by further including:
a preprocessing module, configured to perform integration processing in advance on all data objects to be searched, to determine the one or more homogeneous data objects corresponding to each data object to be searched and thereby obtain a data object mapping table.
14. The device according to claim 13, characterized in that the preprocessing module is further configured to:
mine the second data labels in each data object to obtain a second data label category distribution table;
perform second data label mining on the second data labels in each data object, to generate a set of second data label synonyms for all data objects;
perform first data label mining on the first data labels in each data object, to generate a first data label synonym lexicon for all data objects;
mine the first data labels and second data labels in each data object, to generate mapping relations from first data labels to second data labels;
if a data object has only one first data label and that first data label co-occurs only with a unique second data label, establish a mapping relation between that first data label and that second data label.
15. The device according to claim 14, characterized in that the second data label synonyms include: multiple data objects in the same category that have different second data labels and the same first data label; the first data label synonyms include: multiple similar first data labels within the same data object.
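Claim 15 characterizes second data label synonyms as different second labels attached to objects that share a category and a first data label. The sketch below applies that rule directly and returns candidate synonym sets only. Real listings would probably need extra filtering, because mislabelled items can share a first label with unrelated second labels. The dict layout is an assumption.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_second_label_synonyms(objects: List[Dict]) -> Set[FrozenSet[str]]:
    """Return candidate synonym sets of second data labels."""
    seconds_by_key: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for obj in objects:
        for first in obj.get("first", []):
            # Same category + same first label: differing second labels are synonym candidates.
            seconds_by_key[(obj["category"], first)].update(obj.get("second", []))
    return {frozenset(s) for s in seconds_by_key.values() if len(s) > 1}
```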
16. The device according to claim 13, characterized in that the preprocessing module is further configured to:
extract one or more second data labels from the same data object to obtain one or more candidate second data labels, and disambiguate the one or more extracted candidate second data labels;
extract, based on configured rules, the first data labels from multiple data objects, and normalize the multiple extracted first data labels;
normalize second data labels or first data labels that are synonyms of each other;
complete, according to the constructed mapping relations from first data labels to second data labels, the second data label of data objects that lack one;
obtain, based on the category distribution table of second data labels, the number of times a candidate second data label occurs in the category, and, if that number is greater than a preset threshold, treat it as the second data label of the data object; and/or
if a data object has multiple candidate second data labels, select as its second data label the one that occurs most often in the second data label category distribution table.
17. The device according to claim 12, characterized in that the integration and return module is further configured to:
display, on a search results page, one of the multiple data objects in the homogeneous data combination, wherein the homogeneous data combination includes multiple data objects that are homogeneous data objects of one another;
the homogeneous data objects include: multiple data objects in the same category that have the same or synonymous second data label and the same or synonymous first data label.
CN201310182427.3A 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object Active CN104166651B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711064889.XA CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device
CN201310182427.3A CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310182427.3A CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201711064889.XA Division CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Publications (2)

Publication Number Publication Date
CN104166651A CN104166651A (en) 2014-11-26
CN104166651B (en) 2017-10-13

Family

ID=51910470

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201310182427.3A Active CN104166651B (en) 2013-05-16 2013-05-16 Method and apparatus based on the data search integrated to homogeneous data object
CN201711064889.XA Active CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201711064889.XA Active CN107844565B (en) 2013-05-16 2013-05-16 Commodity searching method and device

Country Status (1)

Country Link
CN (2) CN104166651B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462316B (en) * 2014-12-01 2017-09-26 苏州朗米尔照明科技有限公司 A tag matching method
CN105786922B (en) * 2014-12-25 2020-02-14 高德软件有限公司 Method and device for determining missing electronic map data
CN104794003B (en) * 2015-02-04 2019-06-04 汉鼎宇佑互联网股份有限公司 A big data analysis system integrating real-time and non-real-time modes
CN106033457B (en) * 2015-03-18 2019-10-18 阿里巴巴集团控股有限公司 A kind of method and apparatus of the attribute information of determining fruit objective attribute target attribute
CN105138637A (en) * 2015-08-24 2015-12-09 浪潮软件股份有限公司 Data processing method and device
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
CN107451141B (en) * 2016-05-30 2021-01-29 阿里巴巴集团控股有限公司 Data recommendation processing interaction method, device and system
CN106372191A (en) * 2016-08-31 2017-02-01 广东华邦云计算股份有限公司 Data search method and device
US10289309B2 (en) 2016-09-12 2019-05-14 Toshiba Memory Corporation Automatic detection of multiple streams
US10542089B2 (en) 2017-03-10 2020-01-21 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
US10073640B1 (en) 2017-03-10 2018-09-11 Toshiba Memory Corporation Large scale implementation of a plurality of open channel solid state drives
CN108415886B (en) * 2018-03-07 2019-04-05 清华大学 A data label error correction method and device based on the production process
CN109063222B (en) * 2018-11-04 2021-11-30 朗威寰球(北京)科技集团有限公司 Self-adaptive data searching method based on big data
CN109558468B (en) * 2018-12-13 2022-04-01 北京百度网讯科技有限公司 Resource processing method, device, equipment and storage medium
CN110059967B (en) * 2019-04-23 2021-02-23 北京相数科技有限公司 Data processing method and device applied to city aid decision analysis
CN110263240A (en) * 2019-06-18 2019-09-20 深圳市酷开网络科技有限公司 Information integration search method, device and computer-readable storage medium
CN110516140A (en) * 2019-08-15 2019-11-29 北京泰迪熊移动科技有限公司 An information processing method, device and computer storage medium
CN111667235A (en) * 2020-05-18 2020-09-15 上海兴亚报关有限公司 Customs clearance information management method, system, device and storage medium
CN113763081A (en) * 2020-08-26 2021-12-07 北京沃东天骏信息技术有限公司 Article recall method and device

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102890686A (en) * 2011-07-21 2013-01-23 腾讯科技(深圳)有限公司 Method and system for showing commodity search result
CN102915498A (en) * 2011-08-03 2013-02-06 腾讯科技(深圳)有限公司 Method and device for goods classification of e-commerce platform
CN102930038A (en) * 2012-11-12 2013-02-13 江苏外博资讯有限公司 Method and system for merging similar items in search results
CN103020207A (en) * 2012-12-06 2013-04-03 优视科技有限公司 Browser tab grouping management method and device

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
KR100776697B1 (en) * 2006-01-05 2007-11-16 주식회사 인터파크지마켓 Method for searching products intelligently based on analysis of customer's purchasing behavior and system therefor
CN101206674A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Enhanced related search system and method using commodities as the medium
US8200682B2 (en) * 2008-04-22 2012-06-12 Uc4 Software Gmbh Method of detecting a reference sequence of events in a sample sequence of events
CN101887436B (en) * 2009-05-12 2013-08-21 阿里巴巴集团控股有限公司 Retrieval method and device
CN101963966A (en) * 2009-07-24 2011-02-02 李占胜 Method for sorting search results by adding labels into search results
CN102207963A (en) * 2011-05-30 2011-10-05 何吴迪 Post-search instant intelligent navigation technique for cloud computing window platform
CN103092856B (en) * 2011-10-31 2015-09-23 阿里巴巴集团控股有限公司 Search result ordering method and equipment, searching method and equipment
CN102902806B (en) * 2012-10-17 2016-02-10 深圳市宜搜科技发展有限公司 Method and system for query expansion using a search engine
CN103106240A (en) * 2012-12-12 2013-05-15 江苏乐买到网络科技有限公司 Method of searching products in online shopping

Also Published As

Publication number Publication date
CN104166651A (en) 2014-11-26
CN107844565B (en) 2021-07-16
CN107844565A (en) 2018-03-27

Similar Documents

Publication Publication Date Title
CN104166651B (en) Method and apparatus based on the data search integrated to homogeneous data object
CN102725759B (en) Semantic directory for search results
US9323731B1 (en) Data extraction using templates
CN103678335B (en) Method and apparatus for labeling commodities, and commodity navigation method
CN104331446B (en) A massive data processing method based on memory mapping
CN105760495B (en) An exploratory search method for bug problems based on a knowledge graph
US20130290319A1 (en) Performing application searches
CN105045901A (en) Search keyword push method and device
CN106815307A (en) Public culture knowledge graph platform and its method of use
CN105159930A (en) Search keyword pushing method and apparatus
CN105302810A (en) Information search method and apparatus
Babu et al. Improving Quality of Content Based Image Retrieval with Graph Based Ranking
CN110008306A (en) A data relationship analysis method, device and data service system
WO2010008488A1 (en) Method and system for dynamically generating a search result
CN103838754A (en) Information searching device and method
EP2529323A2 (en) Improved searching using semantic keys
US8700624B1 (en) Collaborative search apps platform for web search
CN103995828B (en) A cloud storage log data analysis method
CN105138637A (en) Data processing method and device
US20140075299A1 (en) Systems and methods for generating extraction models
CN105404677A (en) Tree structure based retrieval method
CN103853771B (en) Method and system for pushing search results
JP5324677B2 (en) Similar document search support device and similar document search support program
CN102902705B (en) Ambiguity in location data
JP5780036B2 (en) Extraction program, extraction method and extraction apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant