Summary of the invention
In view of this, the present application provides a method, a system, and a processing device for determining object popularity, which can merge the objects of multiple data sources of the same category, so as to determine a ranked list of all objects in the data sources of that category.
To achieve the above purpose, the present application provides the following technical features:
A popularity computing system, comprising: a processing device and multiple data sources connected to the processing device;
The processing device is configured to standardize the popularity of objects in the multiple data sources, deduplicate and merge multiple identical objects so as to obtain, after the multiple data sources are merged, multiple deduplicated objects (the objects remaining after deduplication) and the popularity of the multiple deduplicated objects, and perform a sorting operation on the multiple deduplicated objects according to popularity.
Optionally, the processing device is further configured to obtain the ranking results of the sorting operation and send the ranking results to the multiple data sources;
The multiple data sources are configured to receive and display the ranking results.
A method for determining object popularity, comprising:
Standardizing the popularity of objects in multiple data sources;
Deduplicating and merging multiple identical objects, so as to obtain, after the multiple data sources are merged, multiple deduplicated objects and the popularity of the multiple deduplicated objects;
Performing a sorting operation on the multiple deduplicated objects according to popularity.
Optionally, before the popularity of objects in the multiple data sources is standardized, the method further comprises:
Obtaining multiple preset attributes of an object in a data source and the attribute values of the multiple preset attributes;
Determining, among the multiple preset attributes, the popularity attribute used to represent popularity;
Determining the attribute value corresponding to the popularity attribute as the original popularity value of the object.
Optionally, determining, among the multiple preset attributes, the popularity attribute used to represent popularity comprises:
Calculating the null-value rate and the uniformity of each of the multiple preset attributes in the data source;
Determining a preset attribute whose null-value rate in the data source is less than a first preset value and whose uniformity is greater than a second preset value as the popularity attribute of the data source.
Optionally, determining, among the multiple preset attributes, the popularity attribute used to represent popularity comprises:
Calculating the null-value rate of each of the multiple preset attributes in the data source;
Determining a preset attribute whose null-value rate in the data source is less than the first preset value as the popularity attribute of the data source.
Optionally, deduplicating and merging multiple identical objects, so as to obtain, after the multiple data sources are merged, multiple deduplicated objects and the popularity of the multiple deduplicated objects, comprises:
While merging the multiple data sources, calculating the similarity between objects in the multiple data sources, and grouping multiple identical objects whose similarity is greater than a preset threshold into an object set, so as to obtain multiple object sets;
Deduplicating each object set, and merging the single deduplicated object retained from each object set;
Obtaining the multiple deduplicated objects and the popularity of the multiple deduplicated objects.
Optionally, standardizing the popularity of objects in the multiple data sources comprises:
Determining the maximum popularity value and the minimum popularity value in each of the multiple data sources;
Calculating a first difference between the original popularity value of an object and the minimum popularity value, calculating a second difference between the maximum popularity value and the minimum popularity value, and taking the quotient of the first difference and the second difference as the standardized popularity value of the object;
Updating the original popularity value of the object with the standardized popularity value.
Optionally, the method further comprises:
Obtaining the ranking results of the sorting operation;
Sending the ranking results to each of the multiple data sources.
Optionally, the popularity includes: the playback frequency of the object, the number of times the object has been played, the number of users who have played the object, or the number of times the object has been favorited.
A processing device, comprising:
A communication module, configured to obtain the popularity of objects in multiple data sources;
A processor, configured to standardize the popularity of objects in the multiple data sources, deduplicate and merge multiple identical objects so as to obtain, after the multiple data sources are merged, multiple deduplicated objects and the popularity of the multiple deduplicated objects, and perform a sorting operation on the multiple deduplicated objects according to popularity.
The above technical means achieve the following beneficial effects:
The present application provides a method for calculating object popularity. Because different data sources use different attributes to calculate object popularity, the present application standardizes the popularity values of the multiple data sources so that object popularity is comparable across them.
To obtain the popularity of all objects in the same category, the present application merges the objects of the multiple data sources (during merging, only one of several identical objects is retained and the duplicates are deleted), obtaining multiple deduplicated objects and their popularity. The multiple deduplicated objects are all of the objects in the category, and a sorting operation can subsequently be performed on the popularity of all of them.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present application, without creative effort, fall within the protection scope of the present application.
Explanation of terms:
Data source: as the name implies, the source of data; a device or original medium that provides certain required data.
Popularity (heat): in the present application, the degree to which an object in a data source is liked by users. For example, the number of times or the frequency with which an object is played can indicate how much users like the object.
Null-value rate: within a group of object data, the quotient of the number of objects whose data value is null and the total number of objects.
Uniformity: how evenly objects are distributed across the grades of an attribute.
To help those skilled in the art understand the application scenario of the present application, the present application provides a popularity computing system. Referring to Fig. 1, the system specifically includes: a processing device 100 and multiple data sources 200 connected to the processing device.
It can be understood that, in practice, ranking is meaningful only among objects of the same category, that is, ranking the videos within a video category, ranking the songs within a song category, and so on. The multiple data sources in the present application may be sources that provide objects of the same category, for example, video websites that provide videos, or music websites that provide songs.
On the basis of the popularity computing system shown in Fig. 1, one embodiment of the present application provides a method for determining object popularity. As shown in Fig. 2, the method specifically includes the following steps:
Step S201: the processing device 100 determines the popularity of each object in each data source.
To determine a ranked list of all objects across the data sources of the same category, the inventor proposes the following scheme: merge the objects of the same category from each data source and obtain the popularity of all objects in that category. To this end, the popularity of each object in each data source must first be determined.
Referring to Fig. 3, according to one embodiment provided by the present application, this step may include the following steps:
Step S2011: the processing device 100 obtains the object information of multiple objects in each data source, wherein the object information includes multiple preset attributes and the attribute value of each preset attribute.
The processing device 100 can predefine multiple preset attributes for calculating object popularity, for example, the number of plays, the number of users who played the object, the number of favorites, the number of searches, and so on.
It can be understood that each data source records the attribute values of some attributes. The processing device 100 has a data connection with each data source, so the processing device 100 can use a crawler to obtain from each data source the object identifier of each object and the attribute values of the multiple preset attributes of each object.
A data source may not record the attribute value of every preset attribute set by the processing device. Therefore, when a data source does not record the attribute value of a preset attribute, the attribute value of that preset attribute is null for all objects in that data source.
The processing device 100 assigns a data source identifier to each data source and, after obtaining the multiple pieces of object information from each data source, stores the object information of each data source in association with the corresponding data source identifier.
Table 1 shows an example of the object information of each data source as stored by the processing device 100.
Table 1
Optionally, the processing device 100 performs data cleaning on the object information of each data source, to ensure that the object information in each data source is reasonable. Cleaning the data may include:
First, deleting abnormal attribute values. Some attribute values in a data source may be abnormal. Therefore, the processing device judges whether each attribute value lies within a preset reasonable range and, if it does not, deletes the abnormal attribute value.
Second, standardizing the data format of attribute values. Because the data formats of the data sources are not uniform, the data format of each attribute value can be unified to facilitate subsequent processing.
Third, deleting redundant data values. Some data sources may back up their data in order to protect it. This embodiment deletes the redundant data values.
It can be understood that cleaning the object information of each data source may also include other operations, which are not enumerated here.
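The three cleaning operations above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the record layout (`id`, `attrs`), the reasonable range `VALID_RANGE`, and the function names are assumptions introduced here, and an abnormal value is replaced by a null value instead of being physically removed.

```python
from typing import Optional

# Hypothetical reasonable range for an attribute value; the text does not give one.
VALID_RANGE = (0, 10**12)


def normalize_value(raw) -> Optional[int]:
    """Unify the data format: accept ints, floats, or strings such as '1,234'."""
    if raw is None or raw == "":
        return None
    try:
        value = int(float(str(raw).replace(",", "").strip()))
    except (ValueError, OverflowError):
        return None  # unparseable values are treated as null
    low, high = VALID_RANGE
    # Abnormal attribute values (outside the reasonable range) become null.
    return value if low <= value <= high else None


def clean_records(records: list[dict]) -> list[dict]:
    """Normalize every attribute and drop redundant copies of the same object id."""
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # redundant (backed-up) copy of an object already kept
        seen_ids.add(record["id"])
        attrs = {name: normalize_value(v) for name, v in record["attrs"].items()}
        cleaned.append({"id": record["id"], "attrs": attrs})
    return cleaned
```

Replacing abnormal values with null keeps the object itself in the data source, so the value is simply counted as missing when null-value rates are computed later.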
Step S2012: the processing device 100 determines, for each data source, a popularity attribute among the multiple preset attributes.
Each data source originally has its own attribute for calculating object popularity: some use the number of plays as the popularity attribute, some use the number of users who played the object, some use the number of favorites, and so on.
Therefore, the processing device 100 can re-determine the popularity attribute for each data source, so that the criteria by which each data source determines its popularity attribute are unified (the criteria are unified, but the resulting popularity attributes may differ).
The present application provides two criteria for determining the popularity attribute:
First criterion: the preset attribute with the lowest null-value rate.
Taking one preset attribute of one data source as an example, the processing device 100 can count the number of objects in the data source that have no data value (a null value) for the preset attribute, count the total number of objects in the data source, calculate the quotient of the two, and determine the quotient as the null-value rate of the preset attribute.
The lower the null-value rate of a preset attribute, the more objects have an attribute value for that preset attribute. Using that preset attribute to calculate object popularity therefore yields a popularity value for more objects.
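The null-value rate described above is a simple ratio. A minimal sketch follows, assuming each object is a dictionary that maps attribute names to values and that a missing or `None` entry represents a null value. The function name and the convention for an empty data source are assumptions made here.

```python
def null_rate(objects: list[dict], attribute: str) -> float:
    """Number of objects with a null value for `attribute` divided by all objects."""
    if not objects:
        # Assumed convention: an empty data source provides no usable values.
        return 1.0
    missing = sum(1 for obj in objects if obj.get(attribute) is None)
    return missing / len(objects)
```

For four objects of which one has no play count, `null_rate(objects, "plays")` is 0.25.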
Second criterion: the attribute with the highest uniformity.
Taking one preset attribute of one data source as an example, the processing device 100 can set several grades between the minimum value and the maximum value of the preset attribute. The processing device 100 can then count the number of objects whose attribute value falls within each grade. The closer the numbers of objects in the grades are to one another, the higher the uniformity.
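The text does not give a formula for uniformity. One possible measure, shown here purely as an assumption, divides the value range into equal-width grades and uses the normalized Shannon entropy of the per-grade object counts: 1.0 when every grade holds the same number of objects, 0.0 when all objects fall into a single grade. The function name and the default number of grades are hypothetical.

```python
import math


def uniformity(values: list, grades: int = 5) -> float:
    """Normalized entropy of grade counts (an assumed measure, not from the text)."""
    present = [v for v in values if v is not None]
    if len(present) == 0 or grades < 2:
        return 0.0
    low, high = min(present), max(present)
    if high == low:
        return 0.0  # every value falls into a single grade
    width = (high - low) / grades
    counts = [0] * grades
    for v in present:
        # Clamp so that the maximum value lands in the last grade.
        index = min(int((v - low) / width), grades - 1)
        counts[index] += 1
    total = len(present)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return entropy / math.log(grades)
```

Other measures (for example, one based on the variance of the grade counts) would fit the description equally well. The only property the text requires is that more even grade counts give a higher score.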
In this step, the processing device 100 can calculate the null-value rate and the uniformity of each preset attribute in each data source, and then determine a preset attribute whose null-value rate in the data source is less than the first preset value and whose uniformity is greater than the second preset value as the popularity attribute of that data source.
The first preset value and the second preset value are preset data values. Their specific values can be determined according to the actual situation and are not limited here.
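The selection rule can be sketched as a filter over precomputed statistics. In this sketch the input maps each preset attribute to a pair (null-value rate, uniformity), and `max_null_rate` and `min_uniformity` stand for the first and second preset values. The tie-break applied when several attributes qualify (lowest null-value rate, then highest uniformity) is an assumption, since the text does not specify one.

```python
from typing import Optional


def choose_popularity_attribute(
    stats: dict[str, tuple[float, float]],
    max_null_rate: float,
    min_uniformity: float,
) -> Optional[str]:
    """Pick a popularity attribute from {attribute: (null_rate, uniformity)}."""
    candidates = [
        (rate, -unif, name)
        for name, (rate, unif) in stats.items()
        if rate < max_null_rate and unif > min_uniformity
    ]
    if not candidates:
        return None  # no preset attribute satisfies both thresholds
    # Assumed tie-break: lowest null-value rate first, then highest uniformity.
    return min(candidates)[2]
```

With `stats = {"plays": (0.05, 0.8), "favorites": (0.6, 0.9), "searches": (0.02, 0.3)}` and thresholds 0.2 and 0.5, only `"plays"` passes both tests and is returned.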
Step S2013: the attribute value corresponding to the popularity attribute of each object in each data source is determined as the popularity of that object.
Returning to Fig. 2, the method proceeds to step S202: the processing device 100 standardizes the popularity of objects in the multiple data sources. That is, the popularity of each object in each data source is standardized to obtain a standardized popularity value, and the original popularity value of the object is updated with the standardized popularity value.
Because the object information in each data source differs, the popularity attributes determined under the same criteria also differ. Some data sources use the number of plays as the popularity attribute, some use the playback frequency, and so on.
Because the attributes used to calculate object popularity are inconsistent across data sources, the object popularity values of different data sources are not comparable. Therefore, the object popularity of each data source can be standardized.
This embodiment can standardize the object popularity of each data source using deviation standardization (min-max normalization). Referring to Fig. 4, the specific process may include:
Step S2021: the maximum popularity value and the minimum popularity value are determined in each data source.
Among all the objects contained in a data source, the popularity values of the objects are sorted to obtain the maximum popularity value and the minimum popularity value.
Step S2022: a first difference between the original popularity value of an object and the minimum popularity value is calculated, a second difference between the maximum popularity value and the minimum popularity value is calculated, and the quotient of the first difference and the second difference is taken as the standardized popularity value of the object.
Step S2023: the original popularity value of the object is updated with the standardized popularity value.
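Steps S2021 to S2023 amount to computing x' = (x - min) / (max - min) for every object of one data source. A minimal sketch follows, assuming the popularity values of a data source are held in a dictionary keyed by object identifier. The function name is hypothetical, and the handling of a data source whose values are all equal (where the second difference is zero) is an assumption, since the text does not cover that case.

```python
def min_max_standardize(popularity: dict[str, float]) -> dict[str, float]:
    """Map the raw popularity values of one data source onto the range [0, 1]."""
    if not popularity:
        return {}
    low = min(popularity.values())   # minimum popularity value (step S2021)
    high = max(popularity.values())  # maximum popularity value (step S2021)
    span = high - low                # second difference (step S2022)
    if span == 0:
        # Assumed convention: identical values carry no ranking information.
        return {obj: 0.0 for obj in popularity}
    # First difference divided by second difference (step S2022); the returned
    # mapping replaces the original values (step S2023).
    return {obj: (value - low) / span for obj, value in popularity.items()}
```

For example, `min_max_standardize({"a": 100, "b": 300, "c": 500})` returns `{"a": 0.0, "b": 0.5, "c": 1.0}`. The function is applied to each data source separately, so that a play count from one source and a favorites count from another end up on the same scale.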
This embodiment provides one way of standardizing data. It can be understood that there are many other ways to standardize data, for example the extreme value method, the standard deviation method, the three-fold-line (piecewise linear) method, and the half-normal distribution method, which are not enumerated here. The specific process of standardizing data is mature technology and is not described in detail here.
Returning again to Fig. 2, the method proceeds to step S203: multiple identical objects are deduplicated and merged, so as to obtain, after the multiple data sources are merged, multiple deduplicated objects and the popularity of the multiple deduplicated objects.
After the popularity of each object in each data source has been standardized, the objects of the data sources can be merged. Because one object can exist in several data sources (for example, several music websites may contain the same song), the case in which multiple data sources contain the same object will inevitably be encountered.
According to one embodiment of the present application, referring to Fig. 5, the execution of this step includes:
Step S2031: while the multiple data sources are merged, the similarity between objects in the multiple data sources is calculated, and multiple identical objects whose similarity is greater than a preset threshold are grouped into an object set, so as to obtain multiple object sets.
The object information includes indicators of object similarity, which may include the program summary, the creator, the production company, the release date, the region, the language, and so on. For an object of one data source, the similarity between that object and each object in the other data sources is calculated on the basis of these similarity indicators.
Then, the objects in the other data sources whose similarity to the object is greater than the preset similarity are determined to be the same object as that object. The identical objects in the data sources form an object set, which contains multiple identical objects.
It can be understood that an object outside the object sets has no identical object in any of the data sources.
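Step S2031 can be sketched as pairwise comparison followed by grouping. The text names the similarity indicators but not how they are combined, so the similarity function below (the fraction of indicator fields on which two objects agree) is an assumption, as are the field names, the `source` key, and the use of a union-find structure for grouping. Objects without a match are returned as sets of size one.

```python
from itertools import combinations

# Hypothetical field names for the similarity indicators listed in the text.
FIELDS = ("summary", "creator", "company", "release_date", "region", "language")


def similarity(a: dict, b: dict) -> float:
    """Fraction of indicator fields on which two objects agree (assumed measure)."""
    matches = 0
    for field in FIELDS:
        left, right = a.get(field), b.get(field)
        if left is not None and right is not None:
            if str(left).lower() == str(right).lower():
                matches += 1
    return matches / len(FIELDS)


def build_object_sets(objects: list[dict], threshold: float) -> list[list[dict]]:
    """Group objects from different sources whose similarity exceeds `threshold`."""
    parent = list(range(len(objects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        if objects[i]["source"] == objects[j]["source"]:
            continue  # only compare objects across data sources
        if similarity(objects[i], objects[j]) > threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[dict]] = {}
    for index, obj in enumerate(objects):
        groups.setdefault(find(index), []).append(obj)
    return list(groups.values())
```

The pairwise loop is quadratic in the number of objects. For large catalogues a blocking key (for example, comparing only objects that share a creator) would reduce the work, but that optimization is outside what the text describes.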
Step S2032: each object set is deduplicated, and the single deduplicated object retained from each object set is merged.
An object set contains multiple identical objects. The objects are the same, but their popularity values differ. Therefore, the popularity of the identical objects can be unified.
Ways of unifying the popularity of identical objects may include: determining the maximum popularity value in the object set as the unified object popularity, or determining the average popularity value in the object set as the unified object popularity, and so on.
For each object set, the object set is deduplicated so that only one object is retained and the extra objects are deleted. The object retained from an object set is called a deduplicated object, and the popularity value of the deduplicated object is the unified popularity value.
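The two unification options named above (maximum or average) can be sketched as follows, assuming each object is a dictionary with a `popularity` key holding its standardized value. Which member of the set is retained is not specified in the text, so this sketch keeps the first one. The function and parameter names are hypothetical.

```python
from statistics import mean


def merge_object_set(object_set: list[dict], strategy: str = "max") -> dict:
    """Keep one representative of a set of identical objects with a unified popularity."""
    values = [obj["popularity"] for obj in object_set]
    if strategy == "max":
        unified = max(values)
    elif strategy == "mean":
        unified = mean(values)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    kept = dict(object_set[0])  # retain the first object; the others are discarded
    kept["popularity"] = unified
    return kept
```

For a set with standardized popularity values 0.25 and 0.75, the `"max"` strategy yields 0.75 and the `"mean"` strategy yields 0.5. The maximum favors objects that are very popular on at least one site, while the average favors objects that are consistently popular.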
Step S2033: the multiple deduplicated objects and the popularity of the multiple deduplicated objects are obtained.
While the multiple data sources are merged, after the deduplication operation has been performed on each object set, the single deduplicated object retained from each object set is merged to obtain all objects in the data sources of the same category.
Returning again to Fig. 2, the method proceeds to step S204: the processing device 100 performs a sorting operation on the multiple deduplicated objects according to popularity.
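The sorting operation can be sketched as a descending sort on the unified popularity value. The `popularity` and `rank` keys and the function name are assumptions for illustration. Objects with equal popularity keep their input order because Python's sort is stable.

```python
def rank_by_popularity(objects: list[dict]) -> list[dict]:
    """Return copies of the objects ordered from most to least popular, with ranks."""
    ordered = sorted(objects, key=lambda obj: obj["popularity"], reverse=True)
    return [{**obj, "rank": position} for position, obj in enumerate(ordered, start=1)]
```

The resulting list is the ranking result that is sent back to the data sources in the following step.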
Step S205: the processing device 100 obtains the ranking results after performing the sorting operation, and sends the ranking results to the multiple data sources.
Step S206: the multiple data sources 200 receive and display the ranking results.
From the above, the beneficial effects of the present application are as follows:
The present application provides a method for calculating object popularity. Because different data sources use different attributes to calculate object popularity, the present application standardizes the popularity values of the multiple data sources so that object popularity is comparable across them.
To obtain the popularity of all objects in the same category, the present application merges the objects of the multiple data sources (during merging, only one of several identical objects is retained and the duplicates are deleted), obtaining multiple deduplicated objects and their popularity. The multiple deduplicated objects are all of the objects in the category, and a sorting operation can subsequently be performed on the popularity of all of them.
Referring to Fig. 6, the present application also provides a processing device, comprising:
A communication module, configured to obtain the popularity of multiple objects in each data source;
A processor, configured to standardize the popularity of objects in the multiple data sources, deduplicate and merge multiple identical objects so as to obtain, after the multiple data sources are merged, multiple deduplicated objects and the popularity of the multiple deduplicated objects, and perform a sorting operation on the multiple deduplicated objects according to popularity.
If the functions described in the method of this embodiment are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.