Detailed Description of the Embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "an", "said" and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
Es (ElasticSearch) is a Lucene-based search server that provides a distributed search engine with multi-user retrieval capability. A certain amount of business data can be stored in an Es cluster for later access. For example, in the anti-money laundering field, user business data is stored in the Es cluster in the form of data indexes so that it can be searched.
Because of the volume of business data and the limited capacity of an Es cluster, old data stored in the Es cluster is generally eliminated periodically so that newer data replaces older data. In the traditional data rotation scheme, expired data is eliminated at fixed intervals along the time dimension. For example, if the Es cluster is only allowed to store one month of business data, then under a first-in-first-out rotation algorithm, on the 1st of this month the business data from the 1st of last month must be eliminated to keep the capacity of the Es cluster healthy and balanced.
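For contrast, the traditional time-only rotation can be sketched as follows. This is a minimal illustration, assuming each record carries a hypothetical `created` date and approximating "one month" as a 30-day retention window; the sample records are made up.

```python
from datetime import date, timedelta

def fifo_eliminate(records, today, retention_days=30):
    """Traditional scheme: evict purely by age, ignoring how often data is accessed."""
    cutoff = today - timedelta(days=retention_days)
    kept = [r for r in records if r["created"] > cutoff]
    evicted = [r for r in records if r["created"] <= cutoff]
    return kept, evicted

# Hypothetical records; on October 1 the record from September 1 falls out.
records = [
    {"id": "t1", "created": date(2019, 9, 1)},
    {"id": "t2", "created": date(2019, 9, 15)},
    {"id": "t3", "created": date(2019, 9, 30)},
]
kept, evicted = fifo_eliminate(records, today=date(2019, 10, 1))
print([r["id"] for r in kept], [r["id"] for r in evicted])  # ['t2', 't3'] ['t1']
```

A record that users query every day is evicted just as readily as one that nobody reads, because only the date is consulted.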
The traditional scheme eliminates and deletes the data with the earliest dates along the date dimension. After elimination, if the eliminated business data needs to be accessed, it must be reloaded and synchronized offline. The traditional rotation scheme considers only one dimension: data is not eliminated in light of actual business needs, so misses are likely when an access spans the time dimension (that is, reaches into the eliminated range) or when the data of multiple subject clients is accessed consecutively. The eliminated data must then be reloaded, which increases system load and latency.
In view of the above problems, the embodiments of this specification provide a data elimination method applied to a search system, and a data elimination apparatus applied to a search system for executing the method. The data elimination method applied to a search system according to this embodiment is described in detail below. As shown in Fig. 1, the method may include the following steps:
S101: Obtain accessed information of business data, the accessed information including at least accessed-time information of the business data, and calculate and adjust the active value of the corresponding business data according to the accessed information.
This embodiment takes the search server Es (ElasticSearch) as an example of the search system. Es is a search engine; a certain amount of business data can be stored in an Es cluster for users to search. Hereinafter, this specification refers to Es as the search system.
Taking the anti-money laundering field as an example, historical business data is stored in the Es cluster in the form of data indexes. The business data may include information such as the payer, the payee, the transaction time, the traded goods and the transaction amount of a transaction. A user can input certain information from the business data into Es to search for one or more pieces of business data associated with that information. For example, if a user inputs the payer account name "Lin Yi" and the transaction date "October 8" into Es as filter information, all transactions made by "Lin Yi" on "October 8" can be retrieved, and each of these transactions can be regarded as one piece of business data.
It can be understood that if a piece of business data is frequently accessed by users, it can be regarded as "data of greater concern"; if a piece of business data is not accessed by users, it can be regarded as "data of no concern". Therefore, this embodiment introduces the active value of business data as a parameter: the accessed information of the business data is obtained first, and the active value of the corresponding business data is calculated and adjusted according to that information. For example, the active value of a piece of business data is raised each time it is accessed.
The accessed information of business data is recorded when users access the business data. Specifically, an access-information recording policy for business data can be added to the search history of the Es engine, so that the accessed information is recorded each time a piece of business data is accessed.
Specifically, access data can be configured for each piece of business data. The access data may include information such as the total number of accesses to the business data within a preset time period, the total number of distinct visitors, and the time of each access, and it is updated each time the business data is accessed.
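The per-record access data described above could be held in a structure like the following. This is a sketch under assumptions: the class name `AccessData`, the `(user_id, timestamp)` event layout and the fixed bump of 1.0 per access are illustrative choices, not taken from the source or from any Es API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class AccessData:
    """Hypothetical access record kept alongside one piece of business data."""
    events: List[Tuple[str, float]] = field(default_factory=list)  # (user_id, timestamp)
    active_value: float = 0.0

    def record_access(self, user_id: str, ts: float, bump: float = 1.0) -> None:
        """Update the record on every access and raise the active value."""
        self.events.append((user_id, ts))
        self.active_value += bump

    def total_accesses(self, start: float, end: float) -> int:
        """Total number of accesses within [start, end)."""
        return sum(1 for _, t in self.events if start <= t < end)

    def distinct_visitors(self, start: float, end: float) -> int:
        """Number of distinct users who accessed the data within [start, end)."""
        return len({u for u, t in self.events if start <= t < end})

    def last_access(self) -> Optional[float]:
        """Most recent access time, or None if the data was never accessed."""
        return max((t for _, t in self.events), default=None)

rec = AccessData()
for user, ts in [("alice", 100.0), ("bob", 200.0), ("alice", 300.0)]:
    rec.record_access(user, ts)
print(rec.total_accesses(0, 1000), rec.distinct_visitors(0, 1000), rec.active_value)  # 3 2 3.0
```

Storing raw events keeps both the preset-period totals and the distinct-visitor count derivable for any window, at the cost of growing storage; a production system would likely aggregate or prune old events.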
S102: Extract the time field of the business data, and determine the existence duration of the business data in the search system according to the time field.
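Step S102 amounts to parsing a timestamp and subtracting it from the current time. A minimal sketch, assuming the document stores its creation time in a hypothetical ISO-8601 field named `create_time` and that durations are measured in days:

```python
from datetime import datetime, timezone

def existence_days(doc, now, time_field="create_time"):
    """Existence duration of a document in the search system, in days."""
    created = datetime.fromisoformat(doc[time_field])  # e.g. "2019-10-01T00:00:00+00:00"
    return (now - created).total_seconds() / 86400.0

doc = {"id": "t1", "create_time": "2019-10-01T00:00:00+00:00"}
print(existence_days(doc, now=datetime(2019, 10, 8, tzinfo=timezone.utc)))  # 7.0
```

Both timestamps are timezone-aware here, since Python refuses to subtract a naive datetime from an aware one.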
S103: Calculate the heat score of each piece of business data using a preset data heat algorithm, the heat score being negatively correlated with the existence duration of the business data and positively correlated with the active value of the business data.
Because the heat score is negatively correlated with the existence duration of the business data and positively correlated with its active value, the algorithm formula may be:

$$\mathrm{Score} = \alpha\,(t_{\mathrm{now}} - t_{\mathrm{create}}) + \beta\,(A \cdot c) + S_{0}, \qquad \alpha < 0,\ \beta > 0,\ c > 0$$

where $t_{\mathrm{now}}$ is the current time, $t_{\mathrm{create}}$ is the creation time of the business data, $A$ is the active value of the business data, and $S_{0}$ is the heat base, a preset constant.
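The formula translates directly into code. In this sketch times are expressed in days, and the default coefficient values are placeholders chosen only to satisfy the stated constraints (alpha < 0, beta > 0, c > 0); they are not values from the source.

```python
def heat_score(now, created, active_value, alpha=-0.5, beta=2.0, c=1.0, base=10.0):
    """Score = alpha*(now - created) + beta*(active_value*c) + base."""
    if not (alpha < 0 and beta > 0 and c > 0):
        raise ValueError("expected alpha < 0, beta > 0, c > 0")
    return alpha * (now - created) + beta * (active_value * c) + base

# A 30-day-old but frequently accessed record vs. a 2-day-old untouched one.
old_hot = heat_score(now=30, created=0, active_value=10)   # -15 + 20 + 10 = 15.0
new_cold = heat_score(now=2, created=0, active_value=0)    # -1 + 0 + 10 = 9.0
print(old_hot, new_cold)
```

With these numbers the older but actively used record outranks the newer idle one, illustrating that age alone does not decide elimination.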
The algorithm expresses that whether a piece of business data is eliminated depends not only on the simple existence-time dimension but, more importantly, on the degree of customer attention, that is, the active value of the business data. Compared with the traditional first-in-first-out replacement policy, this replacement policy and algorithm can greatly reduce both the probability of misses on eliminated data and the response time.
When calculating the active value of business data, the parameters used in the calculation can be set according to specific needs. Generally, the active value of the business data can be calculated from its most recent accessed time and its access frequency within a predetermined period. For example, the closer the most recent accessed time of a piece of business data is to the current time, the higher its active value; and the more times a piece of business data was accessed in the previous week, the higher its active value.
In addition to calculating the active value from the accessed time and access frequency of the business data, other parameters of the business data can also be included as factors influencing the heat score. For example:
a) Record the number of distinct visitors to the business data within a predetermined time period and use it as one of the weight parameters for calculating the heat value. For example, for the same 100 accesses, business data accessed once each by 100 different people has a higher active value than business data accessed 100 times by a single person.
b) Predefine hotspot values for different business factors, such as business scenario, business location and business time period, and use the business hotspot value of the business data as one of the weight parameters for calculating the heat value. For example, if the "Double 11" period is defined as a hotspot business period, business transactions that occur during "Double 11" are considered more likely to be queried by users, so when the active value of business data is calculated, the active value of business data whose business time period is "Double 11" can be raised accordingly.
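One possible way to combine recency, frequency, visitor breadth and hotspot periods into an active value is sketched below. Everything here is an assumption for illustration: the exponential recency decay with time constant `tau`, the weights `w_recency` and `w_freq`, the visitor-breadth multiplier and the `HOTSPOT_FACTORS` table (including the 1.5 multiplier for a Double 11 period) are hypothetical, since the source leaves these parameters to be set according to specific needs.

```python
import math

# Hypothetical hotspot multipliers for predefined business time periods.
HOTSPOT_FACTORS = {"double11": 1.5}

def active_value(now, last_access, weekly_accesses, distinct_visitors,
                 period=None, w_recency=5.0, w_freq=1.0, tau=7.0):
    """Active value from recency, frequency, visitor breadth and hotspot period.

    Times are in days. A more recent last access and more accesses in the
    last week both raise the value; so does a higher share of distinct visitors.
    """
    recency = math.exp(-(now - last_access) / tau) if last_access is not None else 0.0
    base = w_recency * recency + w_freq * weekly_accesses
    # Breadth: 100 accesses by 100 people weigh more than 100 accesses by one person.
    breadth = distinct_visitors / weekly_accesses if weekly_accesses else 0.0
    hotspot = HOTSPOT_FACTORS.get(period, 1.0)  # 1.0 means "not a hotspot period"
    return base * (1.0 + breadth) * hotspot

many = active_value(now=10, last_access=10, weekly_accesses=100, distinct_visitors=100)
one = active_value(now=10, last_access=10, weekly_accesses=100, distinct_visitors=1)
print(many > one)  # True
```

Multiplicative weighting keeps an unaccessed record at zero regardless of its hotspot period; an additive scheme would be an equally valid reading of the text.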
S104: Compare the calculated heat score of the business data with a predefined elimination threshold, and delete business data whose heat score is lower than the elimination threshold from the search system.
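The comparison in S104 is a simple partition of documents by score. A minimal sketch, assuming the heat scores have already been computed into a dict keyed by a hypothetical document ID; the IDs in `delete_ids` would then be removed from the Es index, for example through delete actions in a bulk request.

```python
def split_by_threshold(scores, threshold):
    """Return (keep_ids, delete_ids): documents scoring below the threshold are eliminated."""
    keep_ids = [doc_id for doc_id, s in scores.items() if s >= threshold]
    delete_ids = [doc_id for doc_id, s in scores.items() if s < threshold]
    return keep_ids, delete_ids

keep_ids, delete_ids = split_by_threshold({"a": 15.0, "b": 9.0, "c": -5.0}, threshold=0.0)
print(keep_ids, delete_ids)  # ['a', 'b'] ['c']
```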
The specific timing of data elimination can be set according to the application scenario; a periodic elimination policy is generally adopted. Under the periodic policy, data elimination is performed once at each predetermined point in time, and the "cold data" eligible for elimination is deleted from the Es cluster in batches, for example once a week. With this approach, the heat score of the business data need not be updated on every access; instead, related data such as the accessed information is recorded with the business data, and when the periodic elimination starts, the heat score of every piece of business data is calculated in a single pass, after which the business data that does not meet the condition is eliminated.
In addition to the periodic elimination policy, other elimination policies can be set according to the actual business scenario. For example, a safety margin value can be set for the Es cluster: when, after new business data is added, the remaining space of the Es cluster is detected to be less than the safety margin value, the business data elimination mechanism is triggered, and the above business data elimination process is executed once to eliminate the "cold data".
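The two trigger policies, a fixed schedule and a safety margin on remaining space, can be combined into a single check. This sketch assumes hypothetical parameters: `interval` in the same time unit as `now` and `last_run`, and capacity, usage and margin in the same storage unit (for example GB).

```python
def should_trigger(now, last_run, interval, max_capacity, used, safety_margin):
    """Return True if an elimination pass should run now."""
    periodic_due = now - last_run >= interval      # e.g. once every 7 days
    remaining = max_capacity - used
    below_margin = remaining < safety_margin       # checked after new data is added
    return periodic_due or below_margin

print(should_trigger(now=8, last_run=0, interval=7, max_capacity=100, used=50, safety_margin=10))  # True
print(should_trigger(now=3, last_run=0, interval=7, max_capacity=100, used=50, safety_margin=10))  # False
print(should_trigger(now=3, last_run=0, interval=7, max_capacity=100, used=95, safety_margin=10))  # True
```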
The elimination threshold of the search system is not a fixed value but a dynamically updated one. The method for updating the elimination threshold is shown in Fig. 2 and includes the following steps:
S201: Determine the maximum carrying capacity and the currently used space of the search system, and calculate the remaining available space of the search system from the maximum carrying capacity and the currently used space;
S202: Dynamically update the elimination threshold according to the remaining available space, such that the smaller the remaining available space, the higher the elimination threshold.
The elimination threshold is adjusted dynamically according to the carrying capacity and the currently used space of the Es cluster. When little space remains in the Es cluster, business data must have a higher heat score to avoid elimination, and business data that does not satisfy heat score > elimination threshold is eliminated more aggressively, which raises the survival threshold of business data. This dynamic threshold adjustment policy lets the Es cluster avoid storage pressure while still storing as much hotspot business data as possible.
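Steps S201 and S202 only require that the threshold rise as the remaining available space falls. The linear mapping below, and the constants `base_threshold` and `k`, are assumptions chosen to make that relationship concrete.

```python
def dynamic_threshold(max_capacity, used, base_threshold=0.0, k=20.0):
    """S201: compute remaining space; S202: raise the threshold as it shrinks."""
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    remaining = max(max_capacity - used, 0)           # S201
    used_fraction = 1.0 - remaining / max_capacity
    return base_threshold + k * used_fraction          # S202

for used in (0, 50, 90):
    print(used, dynamic_threshold(100, used))
```

A linear rule is the simplest monotone choice; a steeper curve near full capacity would evict more aggressively as the cluster fills.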
Besides comparing the calculated heat score of the business data with the predefined elimination threshold and deleting business data whose heat score is below the threshold from the search system, other elimination policies can be set according to the actual business scenario. For example, the business data can be sorted by heat score from high to low, and the portion of business data ranked at the end is eliminated.
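The ranking alternative can be sketched as follows. Eliminating a fixed fraction of the lowest-ranked documents (`fraction`) is an assumption, since the source only says that the data ranked toward the end is eliminated.

```python
def eliminate_tail(scores, fraction=0.1):
    """Rank documents by heat score (high to low) and eliminate the lowest `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_drop = int(len(ranked) * fraction)  # round down: never drop more than requested
    cut = len(ranked) - n_drop
    return ranked[:cut], ranked[cut:]

keep, drop = eliminate_tail({"a": 5, "b": 1, "c": 9, "d": 3}, fraction=0.5)
print(keep, drop)  # ['c', 'a'] ['d', 'b']
```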
In some application scenarios, such as anti-money laundering, the business associations between the accounts of a risk group often form a network structure, and multi-subject transaction queries are common: when a user accesses business data within the group, the user is likely to query the associated transactions of multiple accounts in the group consecutively. In this case, a batch update mechanism can be introduced based on the characteristics of, and experience with, the business system; it can greatly reduce the miss rate of multi-subject transaction queries in anti-money laundering. The access data update process is shown in Fig. 3 and may specifically include the following steps:
S301: Monitor the business data and record the accessed information of the business data in real time;
S302: After the business data of a client is accessed, determine the associated clients of that client, the associated clients including at least other clients that have transacted with that client within a preset time range;
S303: Determine the associated transactions of the associated clients, and determine and update the active values of the corresponding business data according to the degree of association of the associated transactions.
It can be seen that, in the batch update mechanism, after the business data of an account is accessed, the access data of the transaction data of its associated accounts is updated together with the access data of that business data.
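A minimal sketch of the batch update in S302 and S303 follows. It assumes transactions are available as hypothetical `(txn_id, client_a, client_b, timestamp)` tuples, and it takes the number of transactions between two clients within the window as the "degree of association", normalised so that the most closely associated client's transactions receive the full `assoc_weight`. Both that degree measure and the 0.7 default (borrowed from the example associated-access weight) are illustrative.

```python
from collections import defaultdict

def associated_clients(client, transactions, now, window):
    """S302: clients that transacted with `client` within [now - window, now], with counts."""
    degree = defaultdict(int)
    for _, a, b, ts in transactions:
        if now - window <= ts <= now and client in (a, b):
            other = b if a == client else a
            degree[other] += 1
    return dict(degree)

def batch_update(client, transactions, active, now, window, assoc_weight=0.7):
    """S303: raise the active value of the associated clients' transactions in place."""
    degree = associated_clients(client, transactions, now, window)
    if not degree:
        return active
    max_deg = max(degree.values())
    for txn_id, a, b, _ in transactions:
        if client in (a, b):
            continue  # the accessed client's own data is updated on the normal access path
        parties = [p for p in (a, b) if p in degree]
        if parties:
            bump = assoc_weight * max(degree[p] for p in parties) / max_deg
            active[txn_id] = active.get(txn_id, 0.0) + bump
    return active

txns = [
    ("t1", "X", "Y", 10), ("t2", "X", "Y", 11), ("t3", "X", "Z", 12),
    ("t4", "Y", "W", 13), ("t5", "Z", "V", 14), ("t6", "Q", "R", 15),
    ("t7", "X", "P", 1),  # outside the window, so P is not associated
]
print(batch_update("X", txns, {}, now=15, window=10))
```

Here Y traded with X twice and Z once, so Y's other transaction receives the full weight and Z's receives half; the unrelated Q/R transaction is untouched.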
Specifically, different labels can be set to mark business data. For example, after a piece of business data is accessed, the directly accessed business data is labeled with a "direct access event" and the corresponding access time, and the associated data updated in batch with that business data is labeled with an "associated access event" and the corresponding access time. When the heat score is calculated, different weights can be applied to the heat score of the business data according to the type of access event; for example, the weight of "direct access" is set to 1 and the weight of "associated access" to 0.7.
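With events labeled by type, the weighted access count that feeds the heat score could be computed as below. The weights 1 and 0.7 are the example values given above; the label strings `"direct"` and `"associated"` and the `(event_type, timestamp)` layout are assumptions.

```python
EVENT_WEIGHTS = {"direct": 1.0, "associated": 0.7}

def weighted_access_count(events, since):
    """Weighted number of access events at or after `since`.

    events: list of (event_type, timestamp) labels stored with the business data.
    """
    return sum(EVENT_WEIGHTS[kind] for kind, ts in events if ts >= since)

events = [("direct", 10), ("associated", 11), ("associated", 12), ("direct", 1)]
print(weighted_access_count(events, since=5))  # 1.0 + 0.7 + 0.7
```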
The data elimination scheme provided in this specification takes effect at two points, business data access and business data elimination. When a user performs a business query, the most recent accessed time of the business data is updated; when data elimination is performed, the heat value of each piece of business data is calculated based on its accessed time. The more often a piece of business data is accessed, and the closer its accessed time is to the current time, the more it can be considered to be of concern to users, and its heat score is increased accordingly. The longer the business data has existed in Es, the "older" it can be considered, and its heat score is reduced accordingly. This achieves the technical effect that hot data stays resident in the Es cluster while cold data (data of no concern) is eliminated as early as possible, which improves users' search hit rate within the limited space of the Es cluster and avoids the system load and latency caused by frequently reloading eliminated data.
Corresponding to the above method embodiment, the embodiments of this specification also provide a data elimination apparatus applied to a search system. As shown in Fig. 4, the apparatus includes an access monitoring module 410, a duration determining module 420, a heat computing module 430 and a data elimination module 440.
Access monitoring module 410: configured to obtain accessed information of business data, the accessed information including at least accessed-time information of the business data, and to calculate and adjust the active value of the corresponding business data according to the accessed information;
Duration determining module 420: configured to extract the time field of the business data, and to determine the existence duration of the business data in the search system according to the time field;
Heat computing module 430: configured to calculate the heat score of each piece of business data using a preset data heat algorithm, the heat score being negatively correlated with the existence duration of the business data and positively correlated with the active value of the business data;
Data elimination module 440: configured to compare the calculated heat score of the business data with a predefined elimination threshold, and to delete business data whose heat score is lower than the elimination threshold from the search system.
The embodiments of this specification also provide a computer device, including at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned data elimination method applied to a search system, the method including at least:
obtaining accessed information of business data, the accessed information including at least accessed-time information of the business data, and calculating and adjusting the active value of the corresponding business data according to the accessed information;
extracting the time field of the business data, and determining the existence duration of the business data in the search system according to the time field;
calculating the heat score of each piece of business data using a preset data heat algorithm, the heat score being negatively correlated with the existence duration of the business data and positively correlated with the active value of the business data;
comparing the calculated heat score of the business data with a predefined elimination threshold, and deleting business data whose heat score is lower than the elimination threshold from the search system.
Fig. 5 is a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 are communicatively connected to one another within the device through the bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device or the like. The memory 1020 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and is called and executed by the processor 1010.
The input/output interface 1030 is used to connect input/output modules to implement information input and output. The input/output modules may be configured in the device as components (not shown in the figure) or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors and the like; output devices may include a display, a speaker, a vibrator, an indicator light and the like.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or a network cable) or wirelessly (for example, a mobile network, WIFI or Bluetooth).
The bus 1050 includes a path for transferring information between the components of the device (for example, the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040).
It should be noted that although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the aforementioned data elimination method applied to a search system, the method including at least:
obtaining accessed information of business data, the accessed information including at least accessed-time information of the business data, and calculating and adjusting the active value of the corresponding business data according to the accessed information;
extracting the time field of the business data, and determining the existence duration of the business data in the search system according to the time field;
calculating the heat score of each piece of business data using a preset data heat algorithm, the heat score being negatively correlated with the existence duration of the business data and positively correlated with the active value of the business data;
comparing the calculated heat score of the business data with a predefined elimination threshold, and deleting business data whose heat score is lower than the elimination threshold from the search system.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Since the apparatus embodiment basically corresponds to the method embodiment, reference may be made to the relevant description of the method embodiment. The apparatus embodiment described above is merely illustrative: the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of this specification. Those of ordinary skill in the art can understand and implement them without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of this specification may be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the embodiments of this specification, and these improvements and modifications shall also be regarded as falling within the protection scope of the embodiments of this specification.