Embodiment
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said" and "the" used in this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in this specification to describe various information, the information should not be limited by these terms. The terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
With the continuous development and application of big data technology, data today can be regarded as an intangible asset; a party that holds more data has more advantages now or in the future. For example, personalized services for users are increasingly common, and implementing them generally relies on massive amounts of data. The data may include data of various types and dimensions, such as user behavior data, industry data and business data. By analyzing the data, a service provider can predict some of a user's possible needs and provide services in a targeted manner.
Homogeneous applications are becoming more and more numerous, and some application platforms even contain multiple different systems with the same or similar functions. Some systems have different applications because they serve different purposes or different end users, yet their functions are roughly the same. These different systems or applications are hereinafter referred to as channels. The data held by channels with the same business may contain a large amount of repetition, but because different channels may use different data formats, they cannot recognize each other's data, and numerous data barriers easily form. The Internet produces massive amounts of data every day, but at present the data cannot be collected effectively: there are too many channels, even the same business may have many different channels, and the differences between the data of different channels prevent the data from being mutually recognized or exchanged. For example, even when data is collected from multiple channels, the data of different channels is often stored separately, with no association between them. If the same data comes from different channels, the differences cause it to be collected repeatedly, which wastes storage resources.
For example, a user suffers from a chronic disease and needs regular check-ups. He registers for an appointment through platform A the first time; one month later, he registers through platform B the second time; another month later, he registers through platform C the third time. As a result, the behavior data of his three registrations is stored in three separate systems, each of which is independent and closed. A service provider therefore cannot discover the user's real needs from the data of any single system. If the user had registered in the same channel each time, the three registrations would reveal that the user needs regular check-ups, and based on that analysis result the channel could automatically help the user register in advance and remind the user.
A system architecture for implementing data processing provided in an embodiment of this specification is described below with reference to Fig. 1. The system shown in Fig. 1 may include multiple different channels 11, a processing component 12, and a data platform 13.
A channel 11 may refer to one of several different systems or applications with the same business, such as the appointment registration platform, the smart hospital, and the service window in Alipay. These channels all have the same or similar business in the medical industry, including appointment registration, viewing reports, and so on.
The processing component 12 may be used to receive the data flowing in from the multiple different channels 11 and, after processing the data, form unified, standardized data. The processing component 12 may specifically include Msgbroker (a message broker component).
The data platform 13 may be used to store the data processed by the processing component 12.
It is worth noting that the processing component 12 and the data platform 13 may be independent of each other; alternatively, the processing component 12 may be integrated into the data platform 13, that is, the processing component 12 may be a software module or a part of the data platform 13.
An embodiment of a method for implementing data processing in this specification is described below with reference to the example shown in Fig. 2. As shown in Fig. 2, the method may include the following steps:
Step 210: Obtain data flowing in from multiple different channels of the same business.
In this embodiment, channel data may flow in through an interface provided by the processing component described above. For example, the data of different channels may flow into the processing component through an openapi interface.
It is worth noting that this specification also provides a resend mechanism for the case where data inflow fails:
In the case where data inflow from any channel fails, an inflow failure notification is returned to that channel;
In the case where the channel receives the returned inflow failure notification, the channel resends the data. This ensures the integrity of the data flowing in from the channel.
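The resend mechanism can be illustrated with a minimal sketch. The names `InflowError`, `send_with_resend`, `push` and `max_attempts` are hypothetical and do not come from this specification; the failure notification is modeled as an exception, and the retry limit is an added assumption.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


class InflowError(Exception):
    """Hypothetical stand-in for the inflow failure notification."""


def send_with_resend(
    push: Callable[[Record], None],
    record: Record,
    max_attempts: int = 3,  # assumed limit; the specification sets none
) -> int:
    """Push one record to the processing component, resending on failure.

    Returns the number of attempts used and raises InflowError if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            push(record)
            return attempt
        except InflowError:
            # The channel received a failure notification, so it resends.
            continue
    raise InflowError(f"inflow failed after {max_attempts} attempts")


# Usage: a fake interface that fails on the first call, then succeeds.
received: List[Record] = []
calls = {"n": 0}


def flaky_push(record: Record) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise InflowError("simulated inflow failure")
    received.append(record)


attempts_used = send_with_resend(flaky_push, {"User": "user A"})
```

In this sketch the channel side drives the resend; a real deployment might instead queue failed records, which the text does not specify.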
Step 220: Translate the data flowing in from the different channels into standardized data.
The data of different channels may differ in format. In general, data may consist of field types and the field values of those field types.
For example, a piece of behavior data indicating that user A booked a general outpatient appointment in the "stomatology department" on day D may consist of the following field types and field values, and may be denoted as:
{ (User, user A), (Time, day D), (Department, stomatology department), (Type, general outpatient) }.
As noted above, different channels may set different field types for the same meaning. For example:
In channel A, the field type representing the user is: User;
In channel B, the field type that likewise represents the user is: Yonghu.
In this embodiment, step 220 may specifically include:
According to the translation template corresponding to each of the different channels, translating the different field types with the same meaning that flow in from the different channels into a unified field type;
In this embodiment, the translation template records the correspondence between the channel-specific field types of the different channels and the unified field types.
Continuing the above example, assume that the unified field type representing the user is: Username;
In channel A, the field type representing the user is User, so the translation template corresponding to channel A may record the correspondence between User and Username. After translation with this template, the field type representing the user in the data flowing in from channel A is changed from User to Username;
In channel B, the field type representing the user is Yonghu, so the translation template corresponding to channel B may record the correspondence between Yonghu and Username. After translation with this template, the field type representing the user in the data flowing in from channel B is changed from Yonghu to Username.
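The per-channel translation template can be sketched as a simple mapping. The field names User, Yonghu and Username come from the example above; the identifiers `TEMPLATES`, `translate`, `channel_A` and `channel_B` are hypothetical, and leaving unmapped field types unchanged is an assumption.

```python
from typing import Dict

Record = Dict[str, str]

# One translation template per channel: channel-specific field type -> unified field type.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "channel_A": {"User": "Username"},
    "channel_B": {"Yonghu": "Username"},
}


def translate(channel_id: str, record: Record) -> Record:
    """Rename channel-specific field types to the unified field types."""
    template = TEMPLATES[channel_id]
    # Assumption: field types absent from the template are already unified.
    return {template.get(field, field): value for field, value in record.items()}


from_a = translate("channel_A", {"User": "user A", "Time": "day D"})
from_b = translate("channel_B", {"Yonghu": "user A", "Time": "day D"})
```

After translation, both records carry the same field type Username, so records from channel A and channel B become directly comparable.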
In this embodiment, the translation template may be set manually in advance, that is, a correspondence is established between each channel-specific field type and a unified field type;
Or
The translation template may also be generated by machine learning. For example, semantic analysis is performed on the field values flowing in from a channel to determine the meaning of each field type, and field types with the same meaning in different channels are associated with a unified field type. For example, semantic analysis of the field value "Zhang San" flowing in from a channel can determine that the field type "user" corresponding to this field value represents a user, and a correspondence is then established between "user" and the unified field type "Username" that represents a user.
Step 230: Associate the standardized data obtained after translation with the corresponding channel.
After the data flowing in from a channel has been translated into standardized data, it also needs to be associated with the corresponding channel so that the source of the data is known.
The association may refer to associating the standardized data obtained after translation with the corresponding channel identifier. The channel identifier is unique, for example an ID.
Step 240: Store the standardized data associated with the channel.
As described above, the processing component may store the processed data in the data platform.
In this embodiment, Hbase may be used as the data storage method, and the Msgbroker message mode may be used for data transmission.
With this embodiment, the data flowing in from different channels is translated into data of a unified standard, which breaks down data barriers and allows the real meaning of data from different channels to be recognized; big data analysis can then be performed on the standardized data.
As described above, in practical applications the data of different channels is often repeated; the standardized data obtained after translation may therefore also contain a large amount of duplicate data, and storing it clearly wastes storage resources. To address this, on the basis of the embodiment of Fig. 2, before step 230 the method may further include:
Determining whether the standardized data obtained after translation is the same as existing data, the existing data being standardized data that has already been stored;
Step 230, associating the standardized data obtained after translation with the corresponding channel, may specifically include:
In the case where the standardized data obtained after translation is different from the existing data, associating that standardized data with the corresponding channel.
In addition, the method further includes:
In the case where the standardized data obtained after translation is the same as the existing data, associating the existing data with the channel corresponding to that standardized data.
In this embodiment, the standardized data obtained after translation is compared with the standardized data already stored to determine whether they are the same. If they are the same, the standardized data obtained after translation already exists and is an invalid (redundant) record, and the existing data can then be associated with the channel corresponding to that standardized data;
If they are not the same, the standardized data obtained after translation is a valid record, and it can then be associated with the corresponding channel.
As an illustration, assume that standardized data A obtained after translation comes from channel 1;
If comparing data A with the existing data shows that data A differs from all existing data, data A is a valid record and can be associated with channel 1;
If data A is the same as existing data B, data A is an invalid record. Suppose existing data B is associated with channel 2; existing data B can then also be associated with channel 1, that is, existing data B is associated with both channel 2 and channel 1. From existing data B it can be identified that the sources of data B are channel 1 and channel 2.
With this embodiment, the standardized data obtained after translation can be cleaned and duplicate data merged.
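The merge behavior in the channel 1 and channel 2 illustration can be sketched as follows. This is only a sketch: an in-memory list stands in for the data platform, exact equality stands in for the sameness judgment, and the names `store` and `store_with_merge` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]

# Each stored entry pairs a standardized record with the set of channel IDs it came from.
store: List[Tuple[Record, Set[str]]] = []


def store_with_merge(record: Record, channel_id: str) -> bool:
    """Store a record, or merge its channel into an identical existing record.

    Returns True if the record was new (valid) and False if it was a duplicate.
    """
    for existing, channels in store:
        if existing == record:  # assumption: exact equality as the sameness test
            channels.add(channel_id)  # existing data is now also associated with this channel
            return False
    store.append((record, {channel_id}))
    return True


data_b = {"Username": "user A", "Time": "day D"}
data_a = {"Username": "user A", "Time": "day D"}

first_is_new = store_with_merge(data_b, "channel_2")
second_is_new = store_with_merge(data_a, "channel_1")
```

Only one record is kept, and its channel set records both sources, which mirrors existing data B being associated with channel 2 and channel 1.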
It is worth noting that determining whether the standardized data obtained after translation is the same as the existing data may specifically include:
A1: Determining whether the field values of the same field types are consistent between the standardized data obtained after translation and the existing data;
A2: Counting the number of same field types whose field values are consistent;
A3: Calculating the ratio of the counted number to the total number of field types;
A4: In the case where the ratio exceeds a threshold, determining that the standardized data obtained after translation is the same as the existing data;
A5: In the case where the ratio does not exceed the threshold, determining that the standardized data obtained after translation is different from the existing data.
For example, standardized data A obtained after translation is { (User, user A), (Time, day D), (Department, stomatology department), (Type, general outpatient) };
Existing data B is { (User, user A), (Time, day D), (Department, stomatology department), (Type, expert outpatient) };
Here, the field values of the same field type User are consistent: both are "user A";
The field values of the same field type Time are consistent: both are "day D";
The field values of the same field type Department are consistent: both are "stomatology department";
The field values of the field type Type differ: data A has "general outpatient" and data B has "expert outpatient";
The counted number of same field types whose field values are consistent is 3;
The ratio of the counted number to the total number of field types is: 3/4 = 75%.
In this embodiment, the threshold may be set manually in advance;
With the continuous development of computer technology, particularly advances in artificial intelligence, the threshold may also be calculated by machine learning. For example, based on the thresholds used in historical processing, a machine learning algorithm may calculate an optimal threshold.
Furthermore, the threshold may also be calculated based on big data technology. For example, if analysis of a large amount of data finds that most thresholds are set to 90%, the threshold in the current detection process may also be set to 90%.
It should be noted that if the threshold is set to 100%, the standardized data obtained after translation is determined to be the same as the existing data only when the field values of all same field types are consistent between them, which can also be regarded as being exactly identical.
If the calculated ratio exceeds the set threshold, it can be determined that the standardized data obtained after translation is the same as the existing data;
Conversely, if the calculated ratio does not exceed the set threshold, it can be determined that the standardized data obtained after translation is different from the existing data.
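Steps A1 to A5 can be sketched as a short function. The names `match_ratio` and `is_same` are hypothetical. Two assumptions are made: the total number of field types is taken as the union of the field types in both records, and the comparison uses `>=` so that a threshold of 100% remains reachable, although the text says "exceeds" and a strict comparison may be intended.

```python
from typing import Dict

Record = Dict[str, str]


def match_ratio(new: Record, existing: Record) -> float:
    """A1 to A3: share of field types whose field values are consistent."""
    field_types = set(new) | set(existing)  # assumption: union of both records
    if not field_types:
        return 0.0
    consistent = sum(
        1
        for field in field_types
        if field in new and field in existing and new[field] == existing[field]
    )
    return consistent / len(field_types)


def is_same(new: Record, existing: Record, threshold: float) -> bool:
    """A4 and A5: compare the ratio with the threshold (>= is an assumption)."""
    return match_ratio(new, existing) >= threshold


data_a = {"User": "user A", "Time": "day D",
          "Department": "stomatology department", "Type": "general outpatient"}
data_b = {"User": "user A", "Time": "day D",
          "Department": "stomatology department", "Type": "expert outpatient"}

ratio = match_ratio(data_a, data_b)  # 3 of 4 field types match
```

With the example records the ratio is 3/4, so the two records count as the same under a 70% threshold but as different under a 90% threshold.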
It is worth noting that, in this specification, a weight may also be set for each field type according to the actual application. For example, a larger weight is set for a more important field type, and a smaller weight is set for a less important field type.
For example, standardized data A obtained after translation is { (User, user A), (Time, day D), (Department, stomatology department), (Type, general outpatient) };
Existing data B is { (User, user A), (Time, day D), (Department, stomatology department), (Type, expert outpatient) };
For the field type "User" representing the user, the weight is 1.5;
For the field type "Time" representing the date, the weight is 1.3;
For the field type "Department" representing the department, the weight is 1;
For the field type "Type" representing the outpatient type, the weight is 0.5;
The ratio of the weighted count of consistent field types to the total number of field types is then: (1.5+1.3+1)/4 = 95%.
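The weighted calculation can be sketched in the same way. The function name `weighted_ratio` and the default weight of 1.0 for unlisted field types are assumptions. The sketch follows the worked example and divides by the number of field types; dividing by the sum of all weights (4.3 here) would be a different normalization that the text does not describe.

```python
from typing import Dict

Record = Dict[str, str]

# Weights taken from the worked example.
WEIGHTS: Dict[str, float] = {"User": 1.5, "Time": 1.3, "Department": 1.0, "Type": 0.5}


def weighted_ratio(new: Record, existing: Record, weights: Dict[str, float]) -> float:
    """Sum the weights of consistent field types, divided by the field type count."""
    field_types = set(new) | set(existing)
    if not field_types:
        return 0.0
    matched_weight = sum(
        weights.get(field, 1.0)  # assumption: unlisted field types weigh 1.0
        for field in field_types
        if field in new and field in existing and new[field] == existing[field]
    )
    return matched_weight / len(field_types)


data_a = {"User": "user A", "Time": "day D",
          "Department": "stomatology department", "Type": "general outpatient"}
data_b = {"User": "user A", "Time": "day D",
          "Department": "stomatology department", "Type": "expert outpatient"}

ratio = weighted_ratio(data_a, data_b, WEIGHTS)  # approximately (1.5 + 1.3 + 1.0) / 4
```

Because the mismatched field type Type carries a low weight, the weighted ratio (about 95%) is higher than the unweighted 75%, so the two records would pass a 90% threshold.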
A system architecture for implementing data processing provided in an embodiment of this specification is described below with reference to Fig. 3. Fig. 3 may be regarded as the data application process built on Fig. 1; the system shown in Fig. 3 may include a data platform 31, a processing component 32, and multiple different channels 33.
The data platform 31 stores the standardized data and can provide data services for the channels. The data platform may send a message containing data to the processing component 32; the message can be understood as the content of the service provided.
The processing component 32 is responsible for receiving the messages sent by the data platform and, according to the message type (Message Type) and the subscription relationship (Binding), sending each message to the channels that subscribe to it. The subscription relationship may be established after a channel subscribes to a message. For example, if a medical channel subscribes to the registration service of the data platform, a subscription relationship between registration content messages and that medical channel is generated in the data platform.
The above messages may have message types, and different message types may be distinguished by a unique identifier, for example a TOPIC/eventcode unique identifier. Specifically, message types may include query service, data decision service, channel routing service, and so on.
A channel 33 may be a subscriber that has subscribed to certain services in the data platform, for example a registration platform that has subscribed to registration messages.
In general, a channel 33 may subscribe to multiple different messages, and the data platform may also send multiple different messages.
After translating the data collected from the various channels into standardized data as described above, the data platform 31 can analyze, summarize and compile statistics on the data to generate service content, and then send messages to the subscribing channels according to the subscription relationships.
Specifically, on the basis of the embodiment shown in Fig. 1, the method may further include:
Receiving a message, sent by the data platform, that is derived from analysis of the stored data and is used to provide a service;
Obtaining the message type of the message;
Determining, according to the subscription relationship, the channels that subscribe to the message type;
Sending the message to those channels.
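The four steps above amount to routing by message type over subscription relationships, which can be sketched as follows. The names `bindings`, `outbox`, `subscribe` and `dispatch`, and the message type strings, are hypothetical; an in-memory list stands in for actual delivery through Msgbroker.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Subscription relationships (bindings): message type -> subscribing channel IDs.
bindings: Dict[str, Set[str]] = defaultdict(set)

# Delivered messages as (channel ID, content); stands in for real delivery.
outbox: List[Tuple[str, str]] = []


def subscribe(channel_id: str, message_type: str) -> None:
    """Record that a channel subscribes to a message type."""
    bindings[message_type].add(channel_id)


def dispatch(message_type: str, content: str) -> List[str]:
    """Send a message to every channel subscribed to its type; return the targets."""
    targets = sorted(bindings.get(message_type, set()))
    for channel_id in targets:
        outbox.append((channel_id, content))
    return targets


subscribe("appointment_platform", "registration")
delivered_to = dispatch("registration", "doctor's clinic suspended next Wednesday")
not_delivered = dispatch("query", "no channel subscribes to this type")
```

A channel that has not subscribed to a message type receives nothing, so the second dispatch has no targets.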
For example, from the appointment registration platform channel, the data platform collects that a certain hospital doctor still has 30 appointment slots for next Wednesday; from the service window channel, that the doctor still has 35 slots; and from the smart hospital channel, data showing that the doctor's clinic is suspended next Wednesday. These data indicate that the appointment registration platform and the service window have not received the suspension notice and still accept registrations. Patients who go to the hospital next Wednesday will then be unable to collect their appointment numbers, which wastes the patients' time and reduces the credibility of the channels. Assuming the appointment registration platform has subscribed to the registration service of the data platform, the data platform can send the message that the doctor's clinic is suspended to the appointment registration platform, which can then cancel the doctor's appointment slots for next Wednesday.
Corresponding to the foregoing data processing method embodiments, this specification also provides embodiments of a data processing apparatus. The apparatus embodiments may be implemented in software, or in hardware or a combination of software and hardware. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile memory into internal memory and running them. At the hardware level, in addition to a processor, a network interface, internal memory and non-volatile memory, the device in which the data processing apparatus of this specification is located may generally include other hardware according to the actual function of the data processing, which is not described further here.
Fig. 4 is a module diagram of a data processing apparatus provided by an embodiment of this specification. The apparatus corresponds to the embodiment shown in Fig. 2 and includes:
An acquiring unit 410, which obtains data flowing in from multiple different channels of the same business;
A translation unit 420, which translates the data flowing in from the different channels into standardized data;
An association unit 430, which associates the standardized data obtained after translation with the corresponding channel;
A storage unit 440, which stores the standardized data associated with the channel.
In an optional embodiment:
The data consists of several field types and the field values of the field types;
The translation unit 420 specifically:
According to the translation template corresponding to each of the different channels, translates the different field types with the same meaning that flow in from the different channels into a unified field type; the translation template records the correspondence between the channel-specific field types of the different channels and the unified field types.
In an optional embodiment:
Before the association unit 430, the apparatus further includes:
A judgment subunit, which determines whether the standardized data obtained after translation is the same as existing data, the existing data being standardized data that has already been stored;
The association unit 430 specifically:
In the case where the standardized data obtained after translation is different from the existing data, associates that standardized data with the corresponding channel.
In an optional embodiment:
The apparatus further includes:
An association subunit, which, in the case where the standardized data obtained after translation is the same as the existing data, associates the existing data with the channel corresponding to that standardized data.
In an optional embodiment:
The judgment subunit specifically includes:
A field value judgment subunit, which determines whether the field values of the same field types are consistent between the standardized data obtained after translation and the existing data;
A quantity counting subunit, which counts the number of same field types whose field values are consistent;
A ratio calculation subunit, which calculates the ratio of the counted number to the total number of field types;
A first determination subunit, which, in the case where the ratio exceeds a threshold, determines that the standardized data obtained after translation is the same as the existing data;
A second determination subunit, which, in the case where the ratio does not exceed the threshold, determines that the standardized data obtained after translation is different from the existing data.
In an optional embodiment:
The apparatus further includes:
A receiving subunit, which receives a message, sent by the data platform, that is derived from analysis of the stored data and is used to provide a service;
An obtaining subunit, which obtains the message type of the message;
A determination subunit, which determines, according to the subscription relationship, the channels that subscribe to the message type;
A sending subunit, which sends the message to those channels.
In an optional embodiment:
The message types include:
Query service, data decision service, channel routing service.
In an optional embodiment:
After the acquiring unit 410, the apparatus further includes:
A returning unit, which, in the case where data inflow from any channel fails, returns an inflow failure notification to that channel.
The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For the implementation of the functions and roles of the units in the above apparatus, refer to the implementation of the corresponding steps in the above method, which is not repeated here.
Since the apparatus embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for related details. The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
Fig. 4 above describes the internal functional modules and structure of the data processing apparatus. Its actual executing entity may be an electronic device, including:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Obtain data flowing in from multiple different channels of the same business;
Translate the data flowing in from the different channels into standardized data;
Associate the standardized data obtained after translation with the corresponding channel;
Store the standardized data associated with the channel.
In an optional embodiment:
The data consists of several field types and the field values of the field types;
Translating the data flowing in from the different channels into standardized data specifically includes:
According to the translation template corresponding to each of the different channels, translating the different field types with the same meaning that flow in from the different channels into a unified field type; the translation template records the correspondence between the channel-specific field types of the different channels and the unified field types.
In an optional embodiment:
Before associating the standardized data obtained after translation with the corresponding channel, the processor is further configured to:
Determine whether the standardized data obtained after translation is the same as existing data, the existing data being standardized data that has already been stored;
Associating the standardized data obtained after translation with the corresponding channel specifically includes:
In the case where the standardized data obtained after translation is different from the existing data, associating that standardized data with the corresponding channel.
In an optional embodiment:
The processor is further configured to:
In the case where the standardized data obtained after translation is the same as the existing data, associate the existing data with the channel corresponding to that standardized data.
In an optional embodiment:
Determining whether the standardized data obtained after translation is the same as the existing data specifically includes:
Determining whether the field values of the same field types are consistent between the standardized data obtained after translation and the existing data;
Counting the number of same field types whose field values are consistent;
Calculating the ratio of the counted number to the total number of field types;
In the case where the ratio exceeds a threshold, determining that the standardized data obtained after translation is the same as the existing data;
In the case where the ratio does not exceed the threshold, determining that the standardized data obtained after translation is different from the existing data.
In an optional embodiment:
The processor is further configured to:
Receive a message, sent by the data platform, that is derived from analysis of the stored data and is used to provide a service;
Obtain the message type of the message;
Determine, according to the subscription relationship, the channels that subscribe to the message type;
Send the message to those channels.
In an optional embodiment:
The message types include:
Query service, data decision service, channel routing service.
In an optional embodiment:
After obtaining the data flowing in from multiple different channels of the same business, the processor is further configured to:
In the case where data inflow from any channel fails, return an inflow failure notification to that channel.
In the above embodiment of the electronic device, it should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), and so on. The general-purpose processor may be a microprocessor or any conventional processor. The aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk or a solid-state drive. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The electronic device embodiment in particular is described relatively briefly because it is basically similar to the method embodiments; refer to the description of the method embodiments for related details.
Other embodiments of this specification will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This specification is intended to cover any variations, uses or adaptations of this specification that follow its general principles and include common knowledge or conventional technical means in the art not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.