CN108776692A - Method and apparatus for handling information - Google Patents

Method and apparatus for handling information

Info

Publication number
CN108776692A
Authority
CN
China
Prior art keywords
information
aggregate
target
preset
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810572350.3A
Other languages
Chinese (zh)
Inventor
安金龙
张宁
刘业辉
张飞
王彦明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810572350.3A
Publication of CN108776692A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for handling information. One specific implementation of the method includes: obtaining target identification data; extracting, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and generating a first information subset based on the extracted first information; determining a target second information set from at least one preset second information set, and determining the quantity of second information included in the target second information set; in response to determining that the quantity is greater than a preset quantity threshold, dividing the target second information set into a preset number of second information subsets; and, for each second information subset of the preset number of second information subsets, storing that second information subset in association with the first information subset as a result information set. This implementation improves the flexibility of information processing and helps solve the data skew problem caused by processing large amounts of associated data.

Description

Method and apparatus for handling information
Technical Field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for handling information.
Background
With the rapid development of the Internet, data volumes are growing explosively, and the processing of massive data has become a research focus for those skilled in the art. Data association (joining) is an operation frequently performed in data processing, and includes inner joins, left outer joins, right outer joins, full joins, and so on. Because the amount of data after association is huge, data skew can result. Data skew means that, in a data set processed in parallel, some part holds significantly more data than the other parts, so that the processing speed of that part becomes the bottleneck for processing the entire data set. To resolve data skew, the amount of data handled by an individual processing task needs to be adjusted, for example by distributing the associated data handled in one task across multiple tasks, or by adding random prefixes or suffixes to the identifiers of the associated data to scatter the data before associating it.
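The random-prefix approach mentioned above can be illustrated with a minimal Python sketch. The function name `salt_key`, the salt count, and the sample keys are assumptions made for illustration; they do not come from the application.

```python
import random
from collections import Counter

def salt_key(key: str, num_salts: int, rng: random.Random) -> str:
    """Prepend a random salt so one hot key is scattered over several keys."""
    return f"{rng.randrange(num_salts)}_{key}"

rng = random.Random(0)
keys = ["hot"] * 1000 + ["cold"] * 10   # "hot" dominates: a skewed key distribution
salted = Counter(salt_key(k, 4, rng) for k in keys)

# Records of the hot key are now spread over at most 4 salted keys.
hot_counts = [count for key, count in salted.items() if key.endswith("_hot")]
```

In a real join, the other side of the association would also have to be replicated once per salt value so that every salted key still finds its match.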
Summary
Embodiments of the present application propose a method and apparatus for handling information.
In a first aspect, embodiments of the present application provide a method for handling information, the method including: obtaining target identification data; extracting, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and generating a first information subset based on the extracted first information; determining a target second information set from at least one preset second information set, and determining the quantity of second information included in the target second information set, where the identification data corresponding to the second information in the target second information set is related to the target identification data; in response to determining that the quantity is greater than a preset quantity threshold, dividing the target second information set into a preset number of second information subsets; and, for each second information subset of the preset number of second information subsets, storing that second information subset in association with the first information subset as a result information set.
In some embodiments, after determining the quantity of second information included in the target second information set, the method further includes: in response to determining that the quantity is less than or equal to the preset quantity threshold, storing the first information subset in association with the target second information set as a result information set.
In some embodiments, after storing the second information subset in association with the first information subset as a result information set, the method further includes: sending the stored result information set to a pre-assigned device for processing the stored result information set.
In some embodiments, the second information set is obtained in advance as follows: obtaining a preset second raw information set; for each piece of second raw information in the second raw information set, determining the identification data corresponding to that second raw information; and determining second raw information having identical identification data as second information, thereby generating a second information set.
In some embodiments, after second raw information having identical identification data is determined as a second information set, the method further includes: sorting the determined at least one second information set based on the identification data corresponding to each determined second information set, to obtain at least one sorted second information set.
In a second aspect, embodiments of the present application provide an apparatus for handling information, the apparatus including: an obtaining unit configured to obtain target identification data; a generation unit configured to extract, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and to generate a first information subset based on the extracted first information; a determination unit configured to determine a target second information set from at least one preset second information set and to determine the quantity of second information included in the target second information set, where the identification data corresponding to the second information in the target second information set is related to the target identification data; a division unit configured to divide the target second information set into a preset number of second information subsets in response to determining that the quantity is greater than a preset quantity threshold; and a first storage unit configured to, for each second information subset of the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set.
In some embodiments, the apparatus further includes a second storage unit configured to, in response to determining that the quantity is less than or equal to the preset quantity threshold, store the first information subset in association with the target second information set as a result information set.
In some embodiments, the first storage unit is further configured to send the stored result information set to a pre-assigned device for processing the stored result information set.
In some embodiments, the second information set is obtained in advance as follows: obtaining a preset second raw information set; for each piece of second raw information in the second raw information set, determining the identification data corresponding to that second raw information; and determining second raw information having identical identification data as second information, thereby generating a second information set.
In some embodiments, after second raw information having identical identification data is determined as a second information set, the following is further included: sorting the determined at least one second information set based on the identification data corresponding to each determined second information set, to obtain at least one sorted second information set.
In a third aspect, embodiments of the present application provide a server, the server including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method described in any implementation of the first aspect.
The method and apparatus for handling information provided by embodiments of the present application extract, from a preset first information set, first information whose corresponding identification data is related to the obtained target identification data and generate a first information subset; divide a second information set that contains more data than the quantity threshold into multiple second information subsets; and finally store the first information subset in association with the second information subsets. This improves the flexibility of information processing and helps solve the data skew problem caused by processing large amounts of associated data.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for handling information according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for handling information according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for handling information according to the present application;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for handling information according to the present application;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing the server of embodiments of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for handling information or the apparatus for handling information of embodiments of the present application may be applied.
As shown in Fig. 1, system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 serves as the medium that provides communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use terminal devices 101, 102, 103 to interact with server 105 over network 104 in order to receive or send messages and the like. Various communication client applications may be installed on terminal devices 101, 102, 103, such as web browser applications, shopping applications, and search applications.
Terminal devices 101, 102, 103 may be hardware or software. When terminal devices 101, 102, 103 are hardware, they may be various electronic devices with a data generation function, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
Server 105 may be a server that provides various services, for example a back-end data processing server that processes the data generated on terminal devices 101, 102, 103. The back-end data processing server may process the obtained data and store the processing results (for example, the generated first information subset and second information subsets) in association.
It should be noted that the method for handling information provided by embodiments of the present application is generally executed by server 105; accordingly, the apparatus for handling information is generally provided in server 105.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. It should be noted that the data processed by the server may also be obtained from other servers rather than from the terminal devices; in that scenario, the terminal devices in the above system architecture may be replaced by other servers. In addition, where the data processed by the server does not need to be obtained remotely, the above system architecture may omit the terminal devices.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for handling information according to the present application is shown. The method for handling information includes the following steps:
Step 201: obtain target identification data.
In this embodiment, the executing entity of the method for handling information (for example, the server shown in Fig. 1) may obtain the target identification data remotely through a wired or wireless connection, or obtain it locally. The target identification data may be identification data that the executing entity extracts from a preset identification data set. Identification data may be used to identify a certain category of information (for example, related information such as the name, place of origin, and specification of an item). Identification data may be data set by a technician, or data that the executing entity calculates from the information corresponding to the identification data using an identifier generation algorithm (for example, a hash algorithm or a message digest algorithm).
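As a minimal sketch of computing identification data with a hash algorithm, the Python snippet below derives an identifier from a piece of information. The choice of SHA-256 and the function name `make_identification_data` are assumptions; the application does not name a specific algorithm.

```python
import hashlib

def make_identification_data(info: str) -> str:
    """Derive identification data from the information it identifies.

    SHA-256 is an illustrative choice only; any stable hash or
    message-digest algorithm would play the same role.
    """
    return hashlib.sha256(info.encode("utf-8")).hexdigest()

id_a = make_identification_data("Brand A wine")
id_b = make_identification_data("Brand B wine")
```

The same input always yields the same identifier, which is what lets the identifier later serve as an association key.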
Step 202: extract, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and generate a first information subset based on the extracted first information.
In this embodiment, based on the target identification data obtained in step 201, the executing entity may first extract, from the preset first information set, first information whose corresponding identification data is related to the target identification data. Each piece of first information may have corresponding, preset identification data, and a single piece of identification data may correspond to at least one piece of first information. The first information may be information of various types, for example information (such as a name or specification) about some object (such as an item, an article, or a picture) displayed on a website. It should be noted that identification data related to the target identification data may be identification data identical to the target identification data, or identification data that has a preset correlation with the target identification data (for example, being in the same identification data set).
In practice, the first information set may be a table in a distributed computing platform (for example, Spark or Hadoop). Each piece of first information in the table may have identification data. For example, the identification data may be the number of a certain category of commodity (for example, alcoholic beverages), and that number may correspond to at least one piece of first information. Each of these pieces of first information may include at least one of the following items of information about the commodity: name, specification, and so on (for example, one piece of first information is "Brand A wine" and another is "Brand B wine").
Then, the executing entity may generate the first information subset based on the extracted first information. Specifically, the first information subset may include all of the extracted first information, or only part of it.
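Step 202 can be sketched in Python as a filter over records that carry identification data. The `Info` record type, the `related_ids` parameter, and the sample rows are hypothetical and serve only to make the filter concrete.

```python
from typing import Iterable, List, NamedTuple

class Info(NamedTuple):
    ident: str     # identification data attached to the record
    payload: str   # e.g. a product name or specification

def extract_first_subset(first_set: Iterable[Info], target_id: str,
                         related_ids: Iterable[str] = ()) -> List[Info]:
    """Keep records whose identification data equals the target or is
    listed as related to it (the preset correlation is assumed to be
    supplied by the caller as `related_ids`)."""
    wanted = {target_id, *related_ids}
    return [info for info in first_set if info.ident in wanted]

first_set = [
    Info("abc123", "Brand A wine"),
    Info("abc123", "Brand B wine"),
    Info("xyz789", "Brand C tea"),
]
first_subset = extract_first_subset(first_set, "abc123")
```

This version keeps every matching record; keeping only part of the matches, as the text also allows, would be a further filter on the returned list.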
Step 203: determine a target second information set from at least one preset second information set, and determine the quantity of second information included in the target second information set.
In this embodiment, the executing entity may first determine the target second information set from the at least one preset second information set. The identification data corresponding to the second information in the target second information set is related to the target identification data. In practice, the at least one second information set may be stored as tables in a distributed computing platform (for example, Spark or Hadoop); the at least one second information set may be stored in a single table, or in separate tables (for example, each table storing one second information set). Each piece of information in a table may have identification data. For example, the identification data may be the number of a certain commodity, and that number may correspond to at least one piece of second information; each of these pieces of second information may include information such as the name of the commodity and the orders for the commodity. It should be noted that identification data related to the target identification data may be identification data identical to the target identification data, or identification data that has a preset correlation with the target identification data (for example, being in the same identification data set).
Then, the executing entity may determine the quantity of second information included in the target second information set.
Step 204: in response to determining that the quantity is greater than a preset quantity threshold, divide the target second information set into a preset number of second information subsets.
In this embodiment, the executing entity may, in response to determining that the quantity is greater than the preset quantity threshold, divide the target second information set into the preset number of second information subsets. Specifically, the executing entity may distribute the second information in the target second information set evenly across the preset number of second information subsets, or distribute it across the second information subsets according to the number of pieces of second information that a technician has specified for each subset.
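The two division strategies described in step 204 can be sketched as follows. The function names and the convention of returning `None` when the threshold is not exceeded are assumptions made for this illustration.

```python
from typing import List, Optional, Sequence

def split_evenly(items: Sequence, n: int) -> List[list]:
    """Divide items into n contiguous subsets whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    subsets, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        subsets.append(list(items[start:start + size]))
        start += size
    return subsets

def split_by_sizes(items: Sequence, sizes: Sequence[int]) -> List[list]:
    """Divide items into subsets whose sizes were fixed in advance."""
    if sum(sizes) != len(items):
        raise ValueError("sizes must add up to the number of items")
    subsets, start = [], 0
    for size in sizes:
        subsets.append(list(items[start:start + size]))
        start += size
    return subsets

def divide_if_over_threshold(second_infos: Sequence, threshold: int,
                             n: int) -> Optional[List[list]]:
    """Apply the division only when the quantity exceeds the threshold."""
    if len(second_infos) <= threshold:
        return None  # the division step does not apply
    return split_evenly(second_infos, n)

even = split_evenly(list(range(7)), 3)
fixed = split_by_sizes(list("abcde"), [1, 4])
```

An even split keeps the later parallel tasks balanced; fixed sizes would suit a case where the downstream workers have unequal capacity.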
Step 205: for each second information subset of the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set.
In this embodiment, for each second information subset of the preset number of second information subsets, the executing entity may store that second information subset in association with the first information subset as a result information set. Specifically, the result information set may be the union of the first information subset and that second information subset, and each piece of result information has corresponding identification data, which may be the target identification data. As an example, assuming the preset number is a positive integer n, executing step 205 generates n result information sets.
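A minimal sketch of step 205, assuming result information sets are represented as plain dictionaries (a representation chosen here for illustration, not taken from the application):

```python
from typing import Dict, List, Sequence

def build_result_sets(first_subset: Sequence[str],
                      second_subsets: Sequence[Sequence[str]],
                      target_id: str) -> List[Dict]:
    """Pair the first information subset with each second information
    subset, producing one result information set per second subset."""
    return [
        {"id": target_id, "infos": list(first_subset) + list(subset)}
        for subset in second_subsets
    ]

first_subset = ["Brand A wine", "Brand B wine"]
second_subsets = [["order-1", "order-2"], ["order-3"]]
result_sets = build_result_sets(first_subset, second_subsets, "abc123")
```

With n second information subsets, the list holds n result information sets, each carrying a full copy of the (small) first information subset.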
In practice, taking the distributed computing platform Spark as an example, information in two tables usually needs to be associated on the Spark platform. Each piece of information stored in the two tables has corresponding key data, and the key data is the identification data of the information. The information stored in the two tables can be joined by key data, with information that corresponds to the same key data determined to be associated information. As an example, the two tables may be Table A and Table B, where Table A stores the categories of commodities and Table B stores the orders for commodities. Each piece of information stored in Table A is a piece of first information, each piece of information stored in Table B is a piece of second information, and the information in Table B that has the same identification data forms one second information set. Because the amount of information stored in Table B keeps growing, the amount of information in Table B is far larger than that in Table A. By executing steps 201 to 205 above, multiple associated result information sets can be generated. Each result information set can be stored in a different partition, and the result information sets in the partitions can be processed by different tasks (for example, sales statistics or sales forecasting), which avoids data skew and increases the efficiency of multi-task parallel processing. It should be noted that each partition may store other information in addition to the result information set.
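The Table A / Table B scenario can be imitated in plain Python to show the effect on partition sizes. This is a sketch only: it does not use the Spark API, and the table contents, the threshold of 4, and the subset count of 3 are invented for illustration.

```python
from collections import defaultdict

# Table A: (key, category info); Table B: (key, order info). "wine" is the hot key.
table_a = [("wine", "Brand A wine"), ("wine", "Brand B wine"), ("tea", "Brand C tea")]
table_b = [("wine", f"order-{i}") for i in range(10)] + [("tea", "order-t0")]

def join_with_split(table_a, table_b, threshold, n):
    """Join by key, splitting any key whose Table B rows exceed the threshold
    into n subsets so that each resulting partition stays small."""
    a_by_key, b_by_key = defaultdict(list), defaultdict(list)
    for key, value in table_a:
        a_by_key[key].append(value)
    for key, value in table_b:
        b_by_key[key].append(value)
    partitions = []
    for key, orders in b_by_key.items():
        if len(orders) > threshold:
            chunks = [orders[i::n] for i in range(n)]  # round-robin split
        else:
            chunks = [orders]
        for chunk in chunks:
            partitions.append((key, a_by_key[key] + chunk))
    return partitions

partitions = join_with_split(table_a, table_b, threshold=4, n=3)
largest = max(len(rows) for _, rows in partitions)
```

Without the split, the "wine" partition would hold 12 records (2 from Table A plus 10 orders); with it, the largest partition holds 6.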
In some optional implementations of this embodiment, the executing entity may, in response to determining that the quantity of second information included in the target second information set is less than or equal to the preset quantity threshold, store the first information subset in association with the target second information set as a result information set. Here, the result information set may be the union of the first information subset and the target second information set, and each piece of result information has corresponding identification data, which may be the target identification data. In practice, the result information set can be processed by a task assigned by the distributed computing platform; because the quantity of second information included in the target second information set is less than or equal to the preset quantity threshold, data skew can be avoided.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for handling information according to this embodiment. In the application scenario of Fig. 3, server 301 first obtains target identification data 302, namely "abc123", which represents alcoholic beverage commodities. Server 301 then extracts, from preset table 303 (each row of table 303 is a piece of first information), the first information whose identification data is "abc123" and generates first information subset 304. Next, server 301 determines, from table 305 (which stores at least one second information set; each row of table 305 is a piece of second information), target second information set 306, where the identification data of every piece of second information in target second information set 306 is "abc123". Server 301 then determines that the quantity of second information included in target second information set 306 is greater than the preset quantity threshold, and divides target second information set 306 into a preset number (two) of second information subsets. Finally, server 301 merges each second information subset with first information subset 304, generating result information sets 307 and 308. Server 301 may perform subsequent processing on result information sets 307 and 308 (for example, sending result information sets 307 and 308 to different devices in the distributed computing platform that execute the tasks of processing result information sets).
The method provided by the above embodiment of the present application extracts, from a preset first information set, first information whose corresponding identification data is related to the obtained target identification data and generates a first information subset; divides a second information set that contains more data than the quantity threshold into multiple second information subsets; and finally stores the first information subset in association with the second information subsets. This improves the flexibility of information processing and helps solve the data skew problem caused by processing large amounts of associated data.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for handling information is shown. Flow 400 of the method for handling information includes the following steps:
Step 401: obtain target identification data.
In this embodiment, step 401 is substantially the same as step 201 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 402: extract, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and generate a first information subset based on the extracted first information.
In this embodiment, step 402 is substantially the same as step 202 in the embodiment corresponding to Fig. 2 and is not described again here.
Step 403: determine a target second information set from at least one preset second information set, and determine the quantity of second information included in the target second information set.
In this embodiment, step 403 is substantially the same as step 203 in the embodiment corresponding to Fig. 2 and is not described again here.
In some optional implementations of this embodiment, the second information set may be obtained in advance by the executing entity of the method for handling information (for example, the server shown in Fig. 1) or by another executing entity, as follows:
First, obtain a preset second raw information set, either remotely or locally. As an example, the second raw information may be order information for the commodities sold on a website, with each piece of order information corresponding to one kind of commodity.
Then, for each piece of second raw information in the second raw information set, determine the identification data corresponding to that second raw information. The identification data may be data set by a technician, or data calculated by applying an encryption algorithm (for example, a hash algorithm) to certain information included in the second raw information (for example, the commodity name included in the order information).
Finally, determine second raw information having identical identification data as second information, thereby generating a second information set. As an example, if a certain piece of identification data is "123" and 10 pieces of second raw information have the identification data "123", then the set of these 10 pieces of second raw information is a second information set.
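The three steps above amount to grouping raw records by their identification data. A minimal Python sketch follows; the record layout and the `id_of` callback are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

def group_second_raw_info(raw_infos: Iterable[dict],
                          id_of: Callable[[dict], str]) -> Dict[str, List[dict]]:
    """Group raw records so that records sharing identification data
    form one second information set."""
    groups = defaultdict(list)
    for info in raw_infos:
        groups[id_of(info)].append(info)
    return dict(groups)

orders = [
    {"product": "123", "order": "o1"},
    {"product": "456", "order": "o2"},
    {"product": "123", "order": "o3"},
]
second_sets = group_second_raw_info(orders, id_of=lambda o: o["product"])
```

Here the identification data is read directly from a field; a hash of the commodity name, as the text also permits, could be supplied through `id_of` instead.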
In some optional implementations of this embodiment, after generating the at least one second information set, the executing entity that generates it may sort the determined at least one second information set based on the identification data corresponding to each determined second information set, obtaining at least one sorted second information set. As an example, suppose identification data "001" corresponds to second information set B, identification data "002" corresponds to second information set C, and identification data "003" corresponds to second information set A; the order of the sorted second information sets is then "second information set B, second information set C, second information set A". With the sets sorted, the subsequent associated storage operation can perform the association in the sorted order, which can improve the efficiency of associated storage.
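The ordering in this example can be reproduced with a short Python sketch, assuming the identification data is compared as plain strings (the application does not specify the comparison rule):

```python
# Identification data mapped to the second information set it identifies.
second_sets = {
    "003": "second information set A",
    "001": "second information set B",
    "002": "second information set C",
}

# Sort by identification data (the dictionary key), then keep the set names.
ordered = [name for _, name in sorted(second_sets.items())]
```

Zero-padded identifiers such as "001" sort correctly as strings; identifiers of varying length would need a numeric or custom sort key.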
Step 404: in response to determining that the quantity is greater than a preset quantity threshold, divide the target second information set into a preset number of second information subsets.
In this embodiment, step 404 is substantially the same as step 204 in the embodiment corresponding to Fig. 2, and is not described again here.
Step 405: for each second information subset among the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set, and send the stored result information set to a pre-assigned device for processing stored result information sets.
In this embodiment, for each second information subset among the preset number of second information subsets, the execution body may first store that second information subset in association with the first information subset as a result information set. The execution body may then send the stored result information set to a pre-assigned device for processing stored result information sets.
Specifically, the result information sets generated in step 405 may each be sent to a different device, and each device processes the result information set it receives. It should be noted that the device for processing result information sets may be hardware or software. As an example, when the device is hardware, it may be a cluster of electronic devices communicatively connected to the execution body, and the devices in the cluster perform the processing tasks on the result information sets in parallel. When the device is software, multiple software modules may be integrated in the same electronic device or in different electronic devices, and the software modules perform the processing tasks on the result information sets in parallel.
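The software case, in which several modules process result information sets in parallel, might look like the following sketch. It uses Python's standard concurrent.futures thread pool as a stand-in for the pre-assigned devices; the function names dispatch and process_result_set, and the idea that "processing" means counting records, are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def process_result_set(result_set):
    """Placeholder processing task: report how many records the set contains."""
    return len(result_set)


def dispatch(result_sets, max_workers=4):
    """Hand each result information set to a worker; workers run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(process_result_set, result_sets))


sizes = dispatch([["a", "b"], ["c"], ["d", "e", "f"]])
```

Because each result information set is of bounded size after the split, no single worker receives a disproportionate share of the records, which is the data-skew benefit the text describes.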
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for handling information in this embodiment highlights the step of sending the generated result information sets, which further improves the flexibility of information processing and helps improve the efficiency of data processing.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, this application provides an embodiment of an apparatus for handling information. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for handling information of this embodiment includes: an acquiring unit 501, configured to obtain target identification data; a generation unit 502, configured to extract, from a preset first information set, the first information whose corresponding identification data is related to the target identification data, and to generate a first information subset based on the extracted first information; a determination unit 503, configured to determine a target second information set from at least one preset second information set and to determine the quantity of second information included in the target second information set, where the identification data corresponding to the second information in the target second information set is related to the target identification data; a division unit 504, configured to divide the target second information set into a preset number of second information subsets in response to determining that the quantity is greater than a preset quantity threshold; and a first storage unit 505, configured to, for each second information subset among the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set.
In this embodiment, the acquiring unit 501 may obtain the target identification data remotely through a wired or wireless connection, or obtain it locally. The target identification data may be identification data that the acquiring unit 501 extracts from a preset identification data set. Identification data can be used to identify a certain category of information (for example, related information such as the name, place of origin, and specification of a certain article). Identification data may be set by a technician, or may be computed by the apparatus 500 from the information corresponding to the identification data using an identifier generation algorithm (for example, a hash algorithm or a message digest algorithm).
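As one illustration of computing identification data with a hash algorithm, the sketch below derives an identifier from an item's descriptive fields using SHA-256 from Python's standard hashlib module. The patent does not name a specific algorithm; the choice of SHA-256, the field names, and the function name make_identification_data are all assumptions.

```python
import hashlib


def make_identification_data(info, fields=("name", "origin", "spec")):
    """Derive identification data by hashing selected fields of an information item.

    Assumption: items are dicts, and the listed fields together characterize the item.
    """
    joined = "|".join(str(info.get(field, "")) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


cup_id = make_identification_data({"name": "cup", "origin": "Beijing", "spec": "300ml"})
same_cup_id = make_identification_data({"name": "cup", "origin": "Beijing", "spec": "300ml"})
other_id = make_identification_data({"name": "cup", "origin": "Shanghai", "spec": "300ml"})
```

Items with the same field values map to the same identification data, so they can later be grouped and joined on that value.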
In this embodiment, based on the target identification data obtained by the acquiring unit 501, the generation unit 502 may first extract, from a preset first information set, the first information whose corresponding identification data is related to the target identification data. Each piece of first information may have corresponding, preset identification data, and a single piece of identification data may correspond to at least one piece of first information. The first information may be of various types, for example, related information (such as name and specification) of objects displayed on a website (such as goods, articles, and pictures). It should be noted that identification data related to the target identification data may be identification data identical to the target identification data, or identification data that has a preset correlation with the target identification data (for example, belonging to the same identification data set). The generation unit 502 may then generate a first information subset based on the extracted first information. Specifically, the first information subset may include all of the extracted first information, or only part of it.
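A minimal sketch of the extraction step follows. It assumes each piece of first information is a dict with an "id" key, and it models the "preset correlation" as an optional collection of related identifiers; both the layout and the function name extract_first_information_subset are hypothetical.

```python
def extract_first_information_subset(first_information_set, target_id, related_ids=None):
    """Keep the first information whose identification data matches the target.

    related_ids optionally lists other identification data treated as related
    (for example, members of the same identification data set).
    """
    accepted = {target_id} | set(related_ids or ())
    return [info for info in first_information_set if info["id"] in accepted]


first_information_set = [
    {"id": "123", "name": "cup"},
    {"id": "124", "name": "mug"},
    {"id": "999", "name": "lamp"},
]
exact = extract_first_information_subset(first_information_set, "123")
related = extract_first_information_subset(first_information_set, "123", related_ids=["124"])
```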
In this embodiment, the determination unit 503 may first determine a target second information set from at least one preset second information set, where the identification data corresponding to the second information in the target second information set is related to the target identification data. For example, the identification data may be the number of a certain commodity, and that identification data may correspond to at least one piece of second information; each such piece of second information may include information such as the name of the commodity and orders for the commodity. It should be noted that identification data related to the target identification data may be identification data identical to the target identification data, or identification data that has a preset correlation with the target identification data (for example, belonging to the same identification data set). The determination unit 503 may then determine the quantity of second information included in the target second information set.
In this embodiment, the division unit 504 may, in response to determining that the quantity is greater than a preset quantity threshold, divide the target second information set into a preset number of second information subsets. Specifically, the division unit 504 may distribute the second information in the target second information set evenly among the preset number of second information subsets, or may distribute it among the second information subsets according to the number of pieces of second information per subset set by a technician.
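The even-distribution variant of the split can be sketched with a round-robin assignment, as below. Round-robin is one possible way to distribute items evenly and is an assumption of this example, as is the function name split_evenly.

```python
def split_evenly(target_second_set, preset_quantity):
    """Distribute items round-robin so subset sizes differ by at most one."""
    subsets = [[] for _ in range(preset_quantity)]
    for index, item in enumerate(target_second_set):
        subsets[index % preset_quantity].append(item)
    return subsets


# Ten pieces of second information divided into three subsets.
subsets = split_evenly(list(range(10)), 3)
subset_sizes = [len(subset) for subset in subsets]
```

With ten items and three subsets the sizes come out as 4, 3, and 3, so no subset is more than one item larger than another.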
In this embodiment, for each second information subset among the preset number of second information subsets, the first storage unit 505 may store that second information subset in association with the first information subset as a result information set. Specifically, a result information set may be the union of the first information subset and that second information subset, and each result information set has corresponding identification data, which may be the target identification data. As an example, if the preset number is a positive integer n, then n result information sets can be generated.
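The associated-storage step can be sketched as follows. The example represents each result information set as a dict holding the target identification data and the combined records, and it forms the union by list concatenation; this representation and the function name associate are assumptions, not the patent's storage format.

```python
def associate(first_subset, second_subsets, target_id):
    """Pair the first information subset with every second information subset.

    Returns one result information set per second information subset, so n
    second information subsets yield n result information sets.
    """
    return [
        {"id": target_id, "records": list(first_subset) + list(second_subset)}
        for second_subset in second_subsets
    ]


first_subset = [{"name": "cup"}]
second_subsets = [[{"order": 1}, {"order": 2}], [{"order": 3}]]
result_sets = associate(first_subset, second_subsets, "123")
```

Note that the first information subset is replicated into every result information set, which is what lets each result set be processed independently of the others.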
In some optional implementations of this embodiment, the apparatus 500 may further include a second storage unit (not shown in the figure), configured to store the first information subset in association with the target second information set as a result information set in response to determining that the quantity is less than or equal to the preset quantity threshold.
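The two branches around the quantity threshold can be put together in one short sketch. The function name build_result_sets, the use of plain lists, and the strided slicing used for the split are assumptions for illustration.

```python
def build_result_sets(first_subset, target_second_set, threshold, preset_quantity):
    """Store directly when the set is small; split first when it exceeds the threshold."""
    if len(target_second_set) <= threshold:
        # At or below the threshold: one result set holding the whole target set.
        return [list(first_subset) + list(target_second_set)]
    # Above the threshold: strided slices give preset_quantity near-equal subsets.
    chunks = [target_second_set[start::preset_quantity] for start in range(preset_quantity)]
    return [list(first_subset) + list(chunk) for chunk in chunks]


small = build_result_sets(["f"], [1, 2, 3], threshold=5, preset_quantity=2)
large = build_result_sets(["f"], [1, 2, 3, 4, 5, 6], threshold=5, preset_quantity=2)
```

In this sketch the small case yields a single result information set, while the large case yields as many result information sets as the preset number.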
In some optional implementations of this embodiment, the first storage unit 505 may be further configured to send the stored result information set to a pre-assigned device for processing stored result information sets.
In some optional implementations of this embodiment, the second information set may be obtained in advance as follows: obtain a preset second raw information set; for each piece of second raw information in the second raw information set, determine the identification data corresponding to that second raw information; take the second raw information items whose determined identification data are identical as second information, and generate a second information set.
In some optional implementations of this embodiment, after the second raw information items whose determined identification data are identical have been determined as a second information set, the procedure may further include: sorting the determined at least one second information set based on the identification data corresponding to each determined second information set, to obtain the sorted at least one second information set.
The apparatus provided by the above embodiment of this application extracts, from a preset first information set, the first information whose corresponding identification data is related to the obtained target identification data and generates a first information subset; divides a second information set whose quantity of data exceeds the quantity threshold into multiple second information subsets; and finally stores the first information subset in association with the second information subsets. This improves the flexibility of information processing and helps solve the data skew problem caused by processing a large amount of associated data.
Reference is now made to Fig. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing the server of the embodiments of this application. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of this application are performed. It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example (but is not limited to), an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and that module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including an acquiring unit, a generation unit, a determination unit, a division unit, and a first storage unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquiring unit may also be described as "a unit that obtains target identification data".
In another aspect, this application also provides a computer-readable medium, which may be included in the server described in the above embodiments, or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: obtain target identification data; extract, from a preset first information set, the first information whose corresponding identification data is related to the target identification data, and generate a first information subset based on the extracted first information; determine a target second information set from at least one preset second information set, and determine the quantity of second information included in the target second information set, where the identification data corresponding to the second information in the target second information set is related to the target identification data; in response to determining that the quantity is greater than a preset quantity threshold, divide the target second information set into a preset number of second information subsets; and, for each second information subset among the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set.
The above description is only of the preferred embodiments of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in this application.

Claims (12)

1. A method for handling information, comprising:
obtaining target identification data;
extracting, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and generating a first information subset based on the extracted first information;
determining a target second information set from at least one preset second information set, and determining the quantity of second information included in the target second information set, wherein the identification data corresponding to the second information in the target second information set is related to the target identification data;
in response to determining that the quantity is greater than a preset quantity threshold, dividing the target second information set into a preset number of second information subsets; and
for each second information subset among the preset number of second information subsets, storing that second information subset in association with the first information subset as a result information set.
2. The method according to claim 1, wherein, after determining the quantity of second information included in the target second information set, the method further comprises:
in response to determining that the quantity is less than or equal to the preset quantity threshold, storing the first information subset in association with the target second information set as a result information set.
3. The method according to claim 1, wherein, after storing the second information subset in association with the first information subset as a result information set, the method further comprises:
sending the stored result information set to a pre-assigned device for processing stored result information sets.
4. The method according to any one of claims 1-3, wherein the second information set is obtained in advance as follows:
obtaining a preset second raw information set;
for each piece of second raw information in the second raw information set, determining the identification data corresponding to that second raw information; and
determining the second raw information items whose determined identification data are identical as second information, and generating a second information set.
5. The method according to claim 4, wherein, after determining the second raw information items whose determined identification data are identical as a second information set, the method further comprises:
sorting the determined at least one second information set based on the identification data corresponding to each determined second information set, to obtain the sorted at least one second information set.
6. An apparatus for handling information, comprising:
an acquiring unit, configured to obtain target identification data;
a generation unit, configured to extract, from a preset first information set, first information whose corresponding identification data is related to the target identification data, and to generate a first information subset based on the extracted first information;
a determination unit, configured to determine a target second information set from at least one preset second information set, and to determine the quantity of second information included in the target second information set, wherein the identification data corresponding to the second information in the target second information set is related to the target identification data;
a division unit, configured to divide the target second information set into a preset number of second information subsets in response to determining that the quantity is greater than a preset quantity threshold; and
a first storage unit, configured to, for each second information subset among the preset number of second information subsets, store that second information subset in association with the first information subset as a result information set.
7. The apparatus according to claim 6, wherein the apparatus further comprises:
a second storage unit, configured to store the first information subset in association with the target second information set as a result information set in response to determining that the quantity is less than or equal to the preset quantity threshold.
8. The apparatus according to claim 6, wherein the first storage unit is further configured to:
send the stored result information set to a pre-assigned device for processing stored result information sets.
9. The apparatus according to any one of claims 6-8, wherein the second information set is obtained in advance as follows:
obtaining a preset second raw information set;
for each piece of second raw information in the second raw information set, determining the identification data corresponding to that second raw information; and
determining the second raw information items whose determined identification data are identical as second information, and generating a second information set.
10. The apparatus according to claim 9, wherein, after determining the second raw information items whose determined identification data are identical as a second information set, the following is further included:
sorting the determined at least one second information set based on the identification data corresponding to each determined second information set, to obtain the sorted at least one second information set.
11. A server, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
12. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201810572350.3A 2018-06-06 2018-06-06 Method and apparatus for handling information Pending CN108776692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810572350.3A CN108776692A (en) 2018-06-06 2018-06-06 Method and apparatus for handling information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810572350.3A CN108776692A (en) 2018-06-06 2018-06-06 Method and apparatus for handling information

Publications (1)

Publication Number Publication Date
CN108776692A true CN108776692A (en) 2018-11-09

Family

ID=64025717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810572350.3A Pending CN108776692A (en) 2018-06-06 2018-06-06 Method and apparatus for handling information

Country Status (1)

Country Link
CN (1) CN108776692A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181109