WO2018028326A1 - Model updating method and apparatus - Google Patents

Model updating method and apparatus

Info

Publication number
WO2018028326A1
WO2018028326A1 PCT/CN2017/090609 CN2017090609W WO2018028326A1 WO 2018028326 A1 WO2018028326 A1 WO 2018028326A1 CN 2017090609 W CN2017090609 W CN 2017090609W WO 2018028326 A1 WO2018028326 A1 WO 2018028326A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature
point
sequence
slice
Prior art date
Application number
PCT/CN2017/090609
Other languages
French (fr)
Chinese (zh)
Inventor
谭银燕
周鹏飞
汪芳山
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018028326A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a method and apparatus for updating a model.
  • A machine learning algorithm is an algorithm that obtains a data model (hereinafter referred to as a model) by learning from known data, and uses the model to predict unknown data; for example, the model and the data to be received are used to provide content recommendation services.
  • Traditional machine learning algorithms require all known data to be prepared before learning, and once the model is available, it is not changed.
  • Incremental modeling technology supports the incremental updating of the acquired model with new data, so that the updated model can better adapt to the changing rules of the newly added data, thereby improving the accuracy of prediction of unknown data.
  • The model updating method provided by incremental modeling technology is as follows: obtain the new data, the historical model, and the update trigger point; then, at the update trigger point, update the historical model with the new data, thereby training a new model.
  • when to trigger the update of the model is a key issue, which affects the frequency of update of the model and the accuracy with which the model predicts unknown data.
  • In the prior art, a fixed duration or a fixed amount of data is generally used as the update trigger point. That is, a model update is triggered if the time period from the last update trigger point to the current time reaches the fixed duration, or if the amount of data newly added since the last update trigger point reaches the fixed amount of data.
  • However, the data features of the data newly added between two adjacent update trigger points may not change significantly compared with the data features of the earlier data. In that case, the later of the two model updates triggered by the two adjacent update trigger points has little or no value, which wastes resources.
  • Embodiments of the present invention provide a model updating method and apparatus, which at least address the following problem: when the data features of the data newly added between two adjacent update trigger points do not change significantly compared with the data features of the earlier data, the later of the two model updates triggered by the two adjacent update trigger points has little or no value, which wastes resources.
  • A method for updating a model is provided, including: acquiring first online service data received in the window where a trigger point to be tested is located, where the trigger point to be tested may be any one of the trigger points to be tested; constructing a first feature sequence according to a data feature of the first online service data; determining an association relationship between the first feature sequence and at least one representative slice, where a representative slice is a slice of a feature sequence constructed according to data features of historical service data; and updating the current model when the association relationship between the first feature sequence and the at least one representative slice satisfies a preset condition.
  • The technical solution provided by the embodiments of the present invention determines whether a trigger point to be tested is an update trigger point by combining the data features of the online service data, the data features of the historical service data, the association relationship between the feature sequences constructed from the two, and a preset condition. Compared with the prior-art solution in which a fixed duration or a fixed amount of data is used as the update trigger point, this reduces the cases in which the data features of the data newly added between two adjacent update trigger points do not change significantly compared with the data features of the earlier data, so that the later of the two triggered model updates has little or no value. Resources are thereby saved.
  • the association relationship refers to the relationship between a feature sequence and a representative slice.
  • the relationship between a feature sequence and a representative slice may be represented by a distance or similarity between a feature sequence and a representative slice.
  • If the at least one representative slice includes a plurality of representative slices, the association relationship between the first feature sequence and the at least one representative slice satisfying the preset condition may include: the association relationship between the first feature sequence and any one or more of the plurality of representative slices satisfies the preset condition.
  • The method may further include: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, acquiring second online service data received in the window where the next trigger point to be tested after the trigger point to be tested is located; then constructing a second feature sequence, in order of receiving time, according to the data feature of the first online service data and the data feature of the second online service data; determining an association relationship between the second feature sequence and the at least one representative slice; and updating the current model if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition.
  • That is, if the association relationship does not satisfy the preset condition, the online service data received in the window where the next trigger point to be tested is located is acquired; a new feature sequence is then constructed, in order of receiving time, according to the data features of the first online service data and of the online service data received in the window where the next trigger point to be tested is located; and the association relationship between the new feature sequence and the at least one representative slice is determined. If this association relationship satisfies the preset condition, the current model is updated. If it does not, the online service data received in the window where the trigger point to be tested after that one is located is acquired, and so on, until the association relationship between the newly constructed feature sequence and the at least one representative slice satisfies the preset condition and the current model is updated.
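  • The accumulate-and-test loop described above can be sketched as follows. This is only an illustrative sketch under simplifying assumptions: each window contributes a single numeric data feature, the feature sequence is just the list of those features in receiving order, and the preset condition is passed in as a hypothetical callback `satisfies_condition`. None of these names come from the original text.

```python
from typing import Callable, List, Sequence


def find_update_trigger(
    window_features: Sequence[float],
    satisfies_condition: Callable[[List[float]], bool],
) -> int:
    """Return the index of the first trigger point to be tested at which the
    accumulated feature sequence satisfies the preset condition, or -1 if
    no tested trigger point qualifies."""
    accumulated: List[float] = []
    for index, feature in enumerate(window_features):
        # Extend the sequence with the newest window, in receiving order.
        accumulated.append(feature)
        if satisfies_condition(accumulated):
            # This trigger point to be tested becomes an update trigger point.
            return index
    return -1


if __name__ == "__main__":
    # Hypothetical condition: the accumulated features sum to at least 10.
    print(find_update_trigger([3.0, 4.0, 5.0], lambda seq: sum(seq) >= 10))
```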
  • updating the current model may include: updating the current model if the distance is less than or equal to the first preset threshold.
  • updating the current model may include: if the similarity is greater than or equal to the second preset threshold, updating the current model.
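  • A minimal sketch of the two threshold checks, assuming the feature sequence and every representative slice have already been encoded as numeric vectors of equal length. Euclidean distance and cosine similarity are illustrative choices only; the text does not fix a particular metric, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_update_by_distance(seq, slices, first_threshold: float) -> bool:
    """True if the distance to any representative slice is <= the first preset threshold."""
    return any(math.dist(seq, s) <= first_threshold for s in slices)


def should_update_by_similarity(seq, slices, second_threshold: float) -> bool:
    """True if the similarity to any representative slice is >= the second preset threshold."""
    return any(cosine_similarity(seq, s) >= second_threshold for s in slices)
```

  • Using `any` reflects the statement that the condition may be satisfied by any one or more of a plurality of representative slices.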
  • Constructing the first feature sequence according to the data feature of the first online service data may include: constructing a first data sequence according to the data feature of the first online service data, where an element in the first data sequence is a data point, and a data point includes at least the following features: the time at which the data point is located, and the data feature of the service data corresponding to the data point. For example, the data point corresponding to the first online service data includes at least: the time at which the data point is located (that is, the end time of the receiving window of the first online service data), and the data feature of the first online service data.
  • the data point can be expressed as (t, v), where t represents the time at which the data point is located, and v represents the data characteristic of the service data corresponding to the data point.
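  • The first feature sequence is then generated from the first data sequence, where an element in the first feature sequence includes at least the following features: the time at which the data point is located, and the rate of change between the data point and the previous data point. An element may also include the following feature: the time period between the time at which the data point is located and the time at which the previous data point is located.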
  • An element in the first feature sequence can be represented as (t, ⁇, d), where t represents the time at which the data point is located, ⁇ represents the rate of change between the data point and the previous data point, and d represents the time period between the time at which the data point is located and the time at which the previous data point is located.
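  • The conversion from data points (t, v) to feature elements can be sketched as below. The rate of change is assumed here to be the difference in v divided by the elapsed time, which is one plausible reading rather than a formula given in the text; the function name and the requirement of strictly increasing times are likewise assumptions.

```python
from typing import List, Sequence, Tuple

DataPoint = Tuple[float, float]               # (t, v)
FeatureElement = Tuple[float, float, float]   # (t, rate of change, d)


def build_feature_sequence(data_points: Sequence[DataPoint]) -> List[FeatureElement]:
    """Turn consecutive data points into (t, rate, d) elements.

    Assumes the data points are sorted by strictly increasing time, so the
    first data point yields no element because it has no predecessor.
    """
    features: List[FeatureElement] = []
    for (t_prev, v_prev), (t, v) in zip(data_points, data_points[1:]):
        d = t - t_prev               # time period since the previous data point
        rate = (v - v_prev) / d      # assumed definition of the rate of change
        features.append((t, rate, d))
    return features


if __name__ == "__main__":
    print(build_feature_sequence([(0, 1.0), (2, 5.0), (3, 4.0)]))
```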
  • This optional design provides one specific implementation of constructing a feature sequence according to the data features of the online service data, but the specific implementation is not limited thereto. For example, the number and meaning of the features included in each element of the feature sequence can be changed according to actual needs; such variations still fall within the overall concept of this possible design.
  • The method may further include: extracting feature points from the first data sequence, and constructing a second data sequence based on the feature points in the first data sequence. A feature point is a special, or representative, data point; in a specific implementation, it can be determined according to actual needs. For example, if the first data sequence is plotted as a curve, the feature points may be local extremum points, inflection points, and the like on the curve, where the local extremum points may include peak points, valley points, and the like.
  • In this case, generating the first feature sequence from the first data sequence may include: generating the first feature sequence from the second data sequence, where an element in the first feature sequence includes the time at which the feature point is located, the rate of change between the feature point and the previous feature point, and the time period between the time at which the feature point is located and the time at which the previous feature point is located.
  • If the first feature sequence is generated directly from the first data sequence, the number of elements in the first feature sequence may be large, so the amount of computation in determining the association relationship between the first feature sequence and the at least one representative slice is large. This possible design obtains the second data sequence by extracting the feature points from the first data sequence, and generates the first feature sequence from the second data sequence. The first feature sequence generated in this way has fewer elements than one generated directly from the first data sequence, which reduces the amount of computation in determining the association relationship between the first feature sequence and the at least one representative slice and thereby speeds up processing.
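  • In addition, because feature points are representative data points, generating the first feature sequence from the second data sequence built from the feature points in the first data sequence does not greatly affect the association relationship between the first feature sequence and the at least one representative slice.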
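  • Feature-point extraction can be sketched as follows, restricted to strict local peaks and valleys. Keeping the first and last data points as well is an assumption made here so that the reduced sequence spans the same time range; inflection points and plateaus are not handled, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

DataPoint = Tuple[float, float]  # (t, v)


def extract_feature_points(data_points: Sequence[DataPoint]) -> List[DataPoint]:
    """Build a reduced (second) data sequence from strict local extrema.

    The two endpoints are always kept (an assumption of this sketch).
    """
    if len(data_points) <= 2:
        return list(data_points)
    kept = [data_points[0]]
    for prev, cur, nxt in zip(data_points, data_points[1:], data_points[2:]):
        is_peak = cur[1] > prev[1] and cur[1] > nxt[1]
        is_valley = cur[1] < prev[1] and cur[1] < nxt[1]
        if is_peak or is_valley:
            kept.append(cur)
    kept.append(data_points[-1])
    return kept


if __name__ == "__main__":
    curve = [(0, 1), (1, 2), (2, 3), (3, 2), (4, 1), (5, 2)]
    print(extract_feature_points(curve))
```

  • In this example six data points reduce to four, which is the kind of reduction in element count that the design relies on to lower the cost of comparing against representative slices.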
  • This possible design, if the server continuously receives online service data, can ensure that the server obtains one set of data features at any trigger point to be tested in the subsequent steps, so that it can be determined whether each trigger point to be tested is an update trigger point.
  • This possible design, if the server continuously receives online service data, can ensure that the server obtains multiple sets of data features at any trigger point to be tested in the subsequent steps, so that it can be determined whether each trigger point to be tested is an update trigger point. Moreover, compared with the previous possible design, the granularity (that is, the window) at which data features are obtained is smaller, so more data features are obtained, which, from a statistical point of view, can improve the accuracy of the calculation.
  • The method may further include: determining, as a trigger point to be tested, each time that is an integer multiple of a preset duration after the time when reception of the online service data starts.
  • The method may further include: determining, as a trigger point to be tested, each time at which the amount of online service data received since reception started reaches an integer multiple of a preset data amount.
  • Whichever rule is used to determine the trigger point to be tested, the basic concept of the technical solution provided by the embodiments of the present invention is not affected. Therefore, the specific implementation of determining the trigger point to be tested is not limited to the two possible designs above.
  • The method may further include: acquiring historical service data, and constructing a historical feature sequence according to the historical service data; then determining the model change points in the historical feature sequence, where a model change point is an update trigger point for which the magnitude of change between the models before and after the triggered model update is greater than or equal to a preset threshold; and then cutting the historical feature sequence based on the model change points in the historical feature sequence to obtain representative slices.
  • For the specific implementation of determining the model change points and cutting the historical feature sequence, reference may be made to FIG. 11.
  • The method provided in this possible design can be performed in an offline state or in an online state. Moreover, once generated, the representative slices may be left unchanged, updated when they need to be updated, or updated as the historical feature sequence is updated.
  • the representative slice can also be determined empirically, and then these representative slices are stored in advance.
  • Cutting the historical feature sequence based on the model change points to obtain representative slices may include: cutting the historical feature sequence based on the model change points, and clustering the slices obtained after the cutting to obtain the representative slices.
  • This possible design can reduce the number of representative slices, thereby saving the storage space occupied by the representative slice library. It can also reduce the amount of computation in determining the association relationship between an online feature sequence (for example, the first feature sequence or the second feature sequence) and the representative slices, thereby increasing the speed of model updating.
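  • The cutting step can be sketched as follows, assuming the model change points are already known as a list of times and that each slice ends at the element whose time reaches a change point. Elements after the last change point are discarded in this sketch, and the subsequent clustering of slices is not shown. The function name and these conventions are assumptions, not details from the text.

```python
from typing import List, Sequence, Tuple

FeatureElement = Tuple[float, float, float]  # (t, rate of change, d)


def cut_at_change_points(
    history: Sequence[FeatureElement],
    change_times: Sequence[float],
) -> List[List[FeatureElement]]:
    """Cut a time-ordered historical feature sequence into slices, each
    ending at the first element whose time reaches a model change point."""
    pending = sorted(change_times)
    slices: List[List[FeatureElement]] = []
    current: List[FeatureElement] = []
    for element in history:
        current.append(element)
        reached = False
        # Consume every change point that this element's time has reached.
        while pending and element[0] >= pending[0]:
            pending.pop(0)
            reached = True
        if reached:
            slices.append(current)
            current = []
    return slices


if __name__ == "__main__":
    hist = [(1, 0.5, 1), (2, 0.1, 1), (3, -0.2, 1), (4, 0.3, 1), (5, 0.0, 1)]
    print(cut_at_change_points(hist, [2, 4]))
```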
  • a model updating apparatus which can implement the functions performed in the above method examples.
  • the apparatus may include: an obtaining module, a building module, a determining module, and an updating module.
  • the acquiring module is configured to obtain the first online service data received in the window where the trigger point to be tested is located.
  • a building module configured to construct a first feature sequence according to data characteristics of the first online service data.
  • a determining module configured to determine an association relationship between the first feature sequence and the at least one representative slice; the representative slice is a slice of the feature sequence constructed according to the data feature of the historical service data.
  • an updating module configured to update the current model if an association relationship between the first feature sequence and the at least one representative slice satisfies a preset condition.
  • The acquiring module may be further configured to: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, acquire second online service data received in the window where the next trigger point to be tested after the trigger point to be tested is located. The building module may be further configured to: construct a second feature sequence, in order of receiving time, according to the data feature of the first online service data and the data feature of the second online service data. The determining module may be further configured to: determine an association relationship between the second feature sequence and the at least one representative slice. The updating module may be further configured to: update the current model if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition.
  • Vectors are used to represent the first feature sequence and the representative slice. The determining module is specifically configured to: determine a distance between the first feature sequence and the at least one representative slice. The updating module may be specifically configured to: update the current model if the distance is less than or equal to the first preset threshold.
  • Vectors are used to represent the first feature sequence and the representative slice. The determining module is specifically configured to: determine a similarity between the first feature sequence and the at least one representative slice. The updating module may be specifically configured to: update the current model if the similarity is greater than or equal to the second preset threshold.
  • The building module may be specifically configured to: construct a first data sequence according to the data feature of the first online service data, where an element in the first data sequence is a data point, and a data point includes at least the following features: the time at which the data point is located, and the data feature of the service data corresponding to the data point; and generate the first feature sequence from the first data sequence, where an element in the first feature sequence includes at least the following features: the time at which the data point is located, the rate of change between the data point and the previous data point, and the time period between the time at which the data point is located and the time at which the previous data point is located.
  • the building module may be further configured to: extract feature points in the first data sequence, and construct a second data sequence according to the feature points in the first data sequence.
  • When generating the first feature sequence from the first data sequence, the building module may be specifically configured to: generate the first feature sequence from the second data sequence, where an element in the first feature sequence includes the time at which the feature point is located, the rate of change between the feature point and the previous feature point, and the time period between the time at which the feature point is located and the time at which the previous feature point is located.
  • The determining module may be further configured to determine, as a trigger point to be tested, each time that is an integer multiple of a preset duration after the time when reception of the online service data starts.
  • The determining module may be further configured to determine, as a trigger point to be tested, each time at which the amount of online service data received since reception started reaches an integer multiple of a preset data amount.
  • the generating module may be specifically configured to: cut the historical feature sequence based on the model change point, and cluster the slice obtained after the cutting to obtain a representative slice.
  • a model updating apparatus which can implement the functions performed in the above method examples, and the functions can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The structure of the apparatus includes a processor, a memory, a system bus, and a communication interface; the processor is configured to support the apparatus in performing the corresponding functions in the above method.
  • the communication interface is used to support communication between the device and other network elements.
  • The apparatus may also include a memory, which is coupled to the processor and stores the program instructions and data necessary for the apparatus.
  • the communication interface may specifically be a transceiver.
  • an embodiment of the present invention provides a computer storage medium for storing computer software instructions corresponding to the foregoing method, which includes a program designed to execute the above aspects.
  • Any of the model updating apparatuses or computer storage media provided above is used to perform the model updating method provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding model updating method provided above; details are not described here again.
  • FIG. 1 is a schematic structural diagram of a system to which the technical solution provided by the embodiment of the present invention is applied;
  • FIG. 2 is a schematic structural diagram of a model updating apparatus according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart diagram of a method for updating a model according to an embodiment of the present disclosure
  • FIG. 3a is a schematic flowchart diagram of another method for updating a model according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a relationship between a window and a trigger point to be tested
  • FIG. 5 is a schematic diagram of another relationship between a window and a trigger point to be tested
  • FIG. 6 is a schematic flowchart diagram of another method for updating a model according to an embodiment of the present disclosure.
  • FIG. 6a is a schematic flowchart diagram of another method for updating a model according to an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of determining feature points according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart diagram of a method for acquiring a representative slice according to an embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a method for acquiring a representative slice according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a curve drawn according to a first data sequence according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of feature points determined by the curve shown in FIG. 9 according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a model change before and after an update trigger point according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a model change point determined based on the curve shown in FIG. 9 according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of a curve drawn according to data points according to an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of a model updating apparatus according to an embodiment of the present invention.
  • FIG. 15 is a schematic structural diagram of another model updating apparatus according to an embodiment of the present invention.
  • The basic principle of the technical solution provided by the embodiments of the present invention is that the current model is updated when the association relationship between the feature sequence constructed according to the data features of the online service data and a representative slice of the feature sequence constructed according to the data features of the historical service data satisfies a preset condition.
  • The technical solution provided by the embodiments of the present invention determines whether a trigger point to be tested is an update trigger point by combining the data features of the online service data, the data features of the historical service data, the association relationship between the feature sequences constructed from the two, and a preset condition. Compared with the prior-art solution in which a fixed duration or a fixed amount of data is used as the update trigger point, this reduces the cases in which the data features of the data newly added between two adjacent update trigger points do not change significantly compared with the data features of the earlier data, so that the later of the two triggered model updates has little or no value. Resources are thereby saved.
  • FIG. 1 is a schematic structural diagram of a system to which the technical solution provided by the embodiments of the present invention is applicable. The system may include a server and one or more service clients connected to the server. In FIG. 1, the system is illustrated with two service clients, service client 1 and service client 2, as an example.
  • the service client can be used by users of the online service, for example, a set-top box of an internet protocol television (IPTV), a smart phone, a computer, and the like.
  • the service client can obtain and record the service data, and send the service data to the server according to the preset rule.
  • Taking a video client as an example of the service client, the video client can obtain and record service data while playing a video, and send the service data to the server one by one, or in batches when the video ends.
  • the server is configured to receive service data sent by the service client, and maintain (or update) the model according to the service data, where the updated model is used to enable the server to perform prediction according to the service data to be received.
  • FIG. 2 is a schematic structural diagram of a model updating apparatus 20 according to an embodiment of the present invention.
  • the model updating device 20 may be a server, and the model updating device 20 may include a processor 201, a memory 202, a system bus 203, and a communication interface 204.
  • The memory 202 is used to store computer-executable instructions. The processor 201 is connected to the memory 202 via the system bus 203. When the model updating apparatus 20 is in operation, the processor 201 executes the computer-executable instructions stored in the memory 202, so that the model updating apparatus 20 performs any one of the model updating methods provided by the embodiments of the present invention.
  • For the specific model updating methods, refer to the related descriptions below and in the drawings; details are not described here.
  • the embodiment of the invention further provides a storage medium, which may include a memory 202.
  • the processor 201 can be a processor or a collective term for multiple processing elements.
  • the processor 201 can be a central processing unit (CPU).
  • The processor 201 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or any conventional processor, or the like.
  • the processor 201 may also be a dedicated processor, which may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may also include a chip having other dedicated processing functions of the model updating device 20.
  • The memory 202 may include a volatile memory, such as a random-access memory (RAM); the memory 202 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 202 may also include a combination of the above types of memory.
  • The system bus 203 can include a data bus, a power bus, a control bus, and a signal status bus. For clarity in the present embodiment, the various buses are all illustrated as the system bus 203 in FIG. 2.
  • Communication interface 204 may specifically be a transceiver on model update device 20.
  • the transceiver can be a wireless transceiver.
  • the wireless transceiver may be an antenna of the model updating device 20 or the like.
  • the processor 201 transmits and receives data to and from other devices, such as a service client, via the communication interface 204.
  • Each step in the flow of any one of the model updating methods provided below may be implemented by the processor 201, in hardware form, executing computer-executable instructions, in software form, stored in the memory 202. To avoid repetition, details are not described here again.
  • Business data refers to the data generated by the business client in the process of using the business.
  • the business data may include data of the service itself, and may also include feedback data of the user to the service.
  • the service data is represented as a time series.
  • the service client is an IPTV online video playback client.
  • the service data of the IPTV online video service may include but is not limited to any of the following information: session ID, user account, and video.
  • The operation record for the video may include, but is not limited to: the user's favoriting (collection) of the video, the user's browsing and viewing of the video, whether the user recommends the content of the video, and so on.
  • The online service data and the historical service data in the embodiments of the present invention are both defined from the perspective of the server.
  • The online service data refers to the service data that the server has received within a preset time period counted back from the current time.
  • The historical service data refers to the service data that the server received before that preset time period, that is, outside the preset time period counted back from the current time.
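  • The distinction between online and historical service data can be sketched as a split on receiving time. The record layout `(receive_time, payload)`, the function name, and the choice to treat the boundary as inclusive on the online side are assumptions made only for illustration.

```python
from typing import Any, List, Sequence, Tuple

Record = Tuple[float, Any]  # (receive_time, payload)


def split_service_data(
    records: Sequence[Record],
    now: float,
    preset_period: float,
) -> Tuple[List[Record], List[Record]]:
    """Split received records into (online, historical) relative to `now`.

    Online: received within `preset_period` before `now` (boundary inclusive).
    Historical: received earlier than that.
    """
    cutoff = now - preset_period
    online = [r for r in records if cutoff <= r[0] <= now]
    historical = [r for r in records if r[0] < cutoff]
    return online, historical


if __name__ == "__main__":
    data = [(1, "a"), (5, "b"), (9, "c")]
    print(split_service_data(data, now=10, preset_period=5))
```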
  • the trigger point to be tested, the update trigger point, and the model change point are concepts in the time domain, that is, a one-dimensional concept.
  • the trigger point to be tested, the update trigger point, and the model change point can all be represented by t.
  • For example, the trigger point to be tested t1 indicates that the time t1 is used as a trigger point to be tested; as another example, the update trigger point t2 indicates that the time t2 is used as an update trigger point.
  • The trigger point to be tested refers to a trigger point (a point in the time domain, that is, a time point) that is set according to a certain rule and is used to cause the server to determine whether the model needs to be updated.
  • the server may periodically or continuously receive online service data sent by one or more service clients connected to the server, and the server may determine whether to update the model at specific moments, and the specific moments are It is the trigger point to be tested.
  • the embodiment of the present invention does not limit how to determine the trigger point to be tested.
  • the server can use any time as the trigger point to be tested.
  • The server may determine the trigger points to be tested in, but not limited to, the following two manners:
  • Manner 1: the server may use each moment that is an integer multiple of a preset duration after the moment at which it starts receiving online service data as a trigger point to be tested. For example, if the preset duration is T and the moment at which the server starts receiving online service data is t0, the server may use the moments t0+nT as trigger points to be tested, where T is greater than 0 and n may be any integer greater than or equal to 0.
  • T is not limited in the embodiment of the present invention.
  • Manner 2: the server may use each moment, counted from when it starts receiving online service data, at which the amount of received online service data reaches an integer multiple of a preset data amount as a trigger point to be tested. For example, if the preset data amount is R and the moment at which the server starts receiving online service data is t0, then, starting from t0, each moment at which the server has received another R pieces of online service data is used as a trigger point to be tested.
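As an illustrative, non-limiting sketch, the two ways of placing trigger points to be tested can be written as follows. The function names `time_trigger_points` and `volume_trigger_points`, and the representation of received data as a list of arrival times, are assumptions made only for this example and do not appear in the embodiments.

```python
from typing import Iterator, List


def time_trigger_points(t0: float, period: float, count: int) -> List[float]:
    """Time-based placement: trigger points at t0 + n*T for n = 0 .. count-1."""
    if period <= 0:
        raise ValueError("the preset duration T must be greater than 0")
    return [t0 + n * period for n in range(count)]


def volume_trigger_points(arrival_times: List[float], batch: int) -> Iterator[float]:
    """Volume-based placement: yield the arrival time of every R-th received record."""
    if batch <= 0:
        raise ValueError("the preset data amount R must be greater than 0")
    for index, arrival in enumerate(arrival_times, start=1):
        if index % batch == 0:
            yield arrival


# Hypothetical inputs: T = 10 time units, R = 2 records.
print(time_trigger_points(t0=0.0, period=10.0, count=3))
print(list(volume_trigger_points([1.0, 2.5, 4.0, 4.2, 9.0, 9.1], batch=2)))
```

With time-based placement the trigger points are evenly spaced in time, whereas with volume-based placement their spacing follows the rate at which data arrives.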
  • An update trigger point can be understood as an actual or effective trigger point, that is, a trigger point at which the model update is performed.
  • the trigger point to be tested may be the update trigger point, or it may not be the update trigger point.
  • In the prior art, each trigger point to be tested determined according to manner 1 or manner 2 above is used as an update trigger point.
  • In the embodiments of the present invention, whether a trigger point to be tested is an update trigger point is determined according to a certain rule. Specific examples are given below.
  • The model change point is used in the process of determining representative slices. It refers to an update trigger point for which the magnitude of change between the model before and the model after the triggered model update is greater than or equal to a preset threshold.
  • the update trigger point herein may be an update trigger point in the prior art, or may be an update trigger point provided by the embodiment of the present invention. Specific instructions can be found below.
  • Both data points and feature points are concepts in the time domain and data feature domains, that is, two-dimensional concepts.
  • the data point can be expressed as (t, v), where t represents the time at which the data point is located, and v represents the data characteristic of the service data corresponding to the data point.
  • Feature points are special data points. Specific instructions can be found below.
  • FIG. 3 is a schematic flowchart diagram of a method for updating a model according to an embodiment of the present invention.
  • the execution body of the method shown in FIG. 3 may be a server, and the method may include the following steps:
  • the server can periodically or continuously receive the online service data sent by the one or more service clients connected to the server.
  • the server is based on the online service data received in the window.
  • S301 may include: the server obtains the online service data received, within the window in which the trigger point to be tested is located, from one or more service clients connected to the server, and uses that online service data as the first online service data.
  • the trigger point to be tested in S301 can be any one of the trigger points to be tested.
  • the window can be a time window or a data volume window.
  • A time window refers to a time period; in the limit where the length of the time period approaches zero, the time window is a single moment.
  • a data volume window can refer to a fixed amount of data. The size of the window in which the trigger point is to be measured is not limited in the embodiment of the present invention.
  • the server may receive online service data in each window, and may not receive online service data in some windows. For example, during peak business hours, the server may receive online business data in each window for a period of time; during low peak periods, the server may not receive online business data in certain windows.
  • the method may further include: acquiring data characteristics of the first online service data.
  • the embodiment of the present invention does not limit the specific content and quantity of the data features of the online service data, and the acquisition manner, and may be determined according to factors such as the service data itself and actual requirements.
  • For example, assume that the IPTV online video is an animation.
  • The data features of the first online service data may include, but are not limited to, the number of people watching animations in the receiving window of the first online service data (that is, the window in which the trigger point to be tested is located), the average playing time of animations in that receiving window, and the like.
  • The server can obtain the number of people watching animations in the receiving window of the first online service data by counting the distinct user accounts whose video type is animation in that window.
  • The server can obtain the average playing time of animations in the receiving window of the first online service data by averaging, over the distinct users watching animations in that window, the difference between the video end time and the video start time.
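The two example data features can be computed from the records of one window roughly as follows. This is a minimal sketch: the `PlayRecord` fields and the function `window_features` are hypothetical names, and averaging over playback records (rather than first aggregating per user) is an assumption of the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PlayRecord:
    """One playback record received in a window (hypothetical layout)."""
    user_account: str
    video_type: str
    start_time: float  # seconds
    end_time: float    # seconds


def window_features(records: List[PlayRecord]) -> Tuple[int, float]:
    """Return (number of distinct animation viewers, average animation playing time)."""
    animation = [r for r in records if r.video_type == "animation"]
    if not animation:
        return 0, 0.0
    viewers = {r.user_account for r in animation}  # distinct user accounts
    average = sum(r.end_time - r.start_time for r in animation) / len(animation)
    return len(viewers), average


# Hypothetical records for a single receiving window.
records = [
    PlayRecord("u1", "animation", 0.0, 600.0),
    PlayRecord("u2", "animation", 100.0, 400.0),
    PlayRecord("u1", "animation", 700.0, 1000.0),
    PlayRecord("u3", "news", 0.0, 50.0),
]
print(window_features(records))
```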
  • For brevity, a data feature and the feature value of that data feature are both referred to herein as a data feature.
  • Accordingly, "data feature" as used herein should in some scenarios be understood as the feature value of a data feature.
  • For example, "acquiring data features of the first online service data" above should be understood as: acquiring the feature values of the data features of the first online service data.
  • the vector may be used to represent the first feature sequence.
  • the elements in the first feature sequence are obtained according to data characteristics of online service data acquired in one or more windows.
  • the description will be made by taking an example in which a first feature sequence is represented by a vector.
  • S303 Determine an association relationship between the first feature sequence and the at least one representative slice; the representative slice is a slice of the feature sequence constructed according to the data feature of the historical service data.
  • The at least one representative slice includes one or more representative slices. A representative slice may be determined by a service expert or generated by the server according to a certain method; it may be pre-stored in the server or generated by the server before S303 is performed.
  • A representative slice may be represented by a vector.
  • the relationship between the first feature sequence and the at least one representative slice may be a similarity or distance between the two or the like.
  • S303 may include: obtaining, from the at least one representative slice, the representative slices whose number of elements equals the number of elements in the first feature sequence, and determining the association relationship between the first feature sequence and each such representative slice. Specific examples are given below.
  • If the at least one representative slice includes a plurality of representative slices, the association relationship between the first feature sequence and the at least one representative slice satisfying the preset condition may include: the association relationship between the first feature sequence and at least one of the plurality of representative slices satisfies the preset condition.
  • the preset condition may be predetermined according to one or more factors such as any representation of the relationship (such as distance or similarity, etc.), actual demand and experience.
  • S303 to S304 may include: the server determines the association relationship between the first feature sequence and one of the plurality of representative slices; if that association relationship does not satisfy the preset condition, the server determines the association relationship between the first feature sequence and another of the plurality of representative slices, and so on, until the association relationship between the first feature sequence and one of the representative slices satisfies the preset condition, at which point the association relationship between the first feature sequence and the plurality of representative slices is considered to satisfy the preset condition.
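A minimal sketch of this slice-by-slice comparison is given below. It assumes, purely for illustration, that the feature sequence and the representative slices are flat numeric vectors, that the association relationship is the Euclidean distance, and that the preset condition is "distance not greater than `max_distance`"; the embodiments leave the measure and the condition open, and the name `find_matching_slice` is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_matching_slice(feature_seq: Sequence[float],
                        slices: List[Sequence[float]],
                        max_distance: float) -> Optional[int]:
    """Return the index of the first slice that satisfies the condition, else None."""
    for index, candidate in enumerate(slices):
        if len(candidate) != len(feature_seq):
            continue  # only slices with the same number of elements are compared
        if euclidean(feature_seq, candidate) <= max_distance:
            return index  # stop at the first representative slice that matches
    return None


# Hypothetical representative slices and feature sequence.
slices = [[3, 0], [-2, 1, 3], [-2, 2, 2]]
print(find_matching_slice([-2, 1, 2], slices, max_distance=1.0))
```

Returning `None` corresponds to the case in which no representative slice satisfies the preset condition, so the model is not updated at this trigger point to be tested.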
  • In the model update method provided by the embodiments of the present invention, the model is updated when the association relationship between the feature sequence constructed from the data features of the online service data and a representative slice of the feature sequence constructed from the data features of the historical service data satisfies a preset condition.
  • That is, the technical solution provided by the embodiments of the present invention combines the data features of the online service data, the data features of the historical service data, the association relationship between the feature sequences constructed from the two, and the preset condition to determine whether a trigger point to be tested is an update trigger point. Compared with the prior-art solution in which a fixed duration or a fixed data amount determines the update trigger point, this reduces the cases in which the data features of the data newly added between two adjacent update trigger points change little relative to the data features of the earlier data, so that the later of the two model updates triggered by those update trigger points has little or no value; resources are thereby saved.
  • the method may further include:
  • the at least one representative slice includes a plurality of representative slices, and the relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, and may include: the first feature sequence and each of the plurality of representative slices The relationship between the representative slices does not satisfy the preset condition.
  • The second feature sequence is constructed, in order of receiving time, from the data features of the first online service data and the data features of the second online service data.
  • S307 Determine an association relationship between the second feature sequence and the at least one representative slice.
  • S308 Update the current model if the relationship between the second feature sequence and the at least one representative slice satisfies a preset condition.
  • the specific implementation manners of S307 to S308 may refer to the specific implementation manners of S303 to S304 in the foregoing, and details are not described herein again.
  • S305 to S308 may include: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, the server obtains the online service data received in the window in which the next trigger point to be tested is located, where the current trigger point to be tested is denoted as the i-th trigger point to be tested and the next trigger point to be tested is denoted as the (i+1)-th trigger point to be tested.
  • The server then constructs a feature sequence according to the data features of the online service data received in the window in which the i-th trigger point to be tested is located (that is, the first online service data) and of the online service data received in the window in which the (i+1)-th trigger point to be tested is located.
  • The association relationship between this feature sequence and the at least one representative slice is determined. If it satisfies the preset condition, the current model is updated. If it does not, the online service data received in the window in which the (i+2)-th trigger point to be tested is located is obtained, and a feature sequence is constructed according to the data features of the online service data received in the windows in which the i-th, (i+1)-th, and (i+2)-th trigger points to be tested are located. The association relationship between this feature sequence and the at least one representative slice is determined. If it satisfies the preset condition, the current model is updated. If it does not, the online service data received in the window in which the (i+3)-th trigger point to be tested is located is obtained, and so on, until the current model is updated.
  • Each new feature sequence obtained by the server may be formed by appending, after the elements of the previous feature sequence, an element obtained from the data features of the newly acquired online service data.
  • For example, S306 can be understood as: appending, after the elements of the first feature sequence, an element obtained from the data features of the second online service data, to obtain the second feature sequence.
  • After each model update, the server may delete the feature sequence used in that update process, or may keep the feature sequence last used in that update process as part of the subsequent historical feature sequence.
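The iterative flow described above can be sketched as a simple loop. The callables `matches` and `update_model` are hypothetical stand-ins for the association check against the representative slices and for the model update itself; the sketch also assumes the variant in which the feature sequence is discarded after each update.

```python
from typing import Callable, Iterable, List


def run_update_loop(window_elements: Iterable[float],
                    matches: Callable[[List[float]], bool],
                    update_model: Callable[[List[float]], None]) -> List[int]:
    """Return the indices of trigger points to be tested that became update trigger points."""
    feature_seq: List[float] = []
    update_points: List[int] = []
    for i, element in enumerate(window_elements, start=1):
        feature_seq.append(element)          # extend the previous feature sequence
        if matches(feature_seq):             # association satisfies the preset condition
            update_model(list(feature_seq))  # update the current model
            update_points.append(i)
            feature_seq = []                 # discard the sequence used for this update
    return update_points


# Hypothetical stand-ins: "match" when the accumulated elements sum to at least 3.
applied: List[List[float]] = []
points = run_update_loop([1, 1, 1, 0, 3, 1],
                         matches=lambda seq: sum(seq) >= 3,
                         update_model=applied.append)
print(points, applied)
```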
  • The method may further include: determining the size of the window in which the trigger point to be tested in S301 is located. Specifically, assume that the trigger point to be tested in S301 is the i-th trigger point to be tested, where i ≥ 1 and i is an integer; then:
  • Implementation manner 1: if i = 1, the window in which the trigger point to be tested is located may be the window between the moment when the server starts receiving online service data and the trigger point to be tested; if i ≥ 2, it may be the window between the (i-1)-th trigger point to be tested and the trigger point to be tested.
  • Implementation manner 2: if i = 1, the window in which the trigger point to be tested is located may be 1/N of the window between the moment when the server starts receiving online service data and the trigger point to be tested; if i ≥ 2, it may be 1/N of the window between the (i-1)-th trigger point to be tested and the trigger point to be tested, where N ≥ 2, N is an integer, and 1/N denotes one N-th.
  • If the window is a time window, the window between two moments refers to the time period between them. For example, if the time window is 10 minutes, the window between two adjacent trigger points to be tested means that the time period between them is 10 minutes long.
  • If the window is a data volume window, the window between two moments refers to the span in which the server receives a fixed amount of online service data, where the amount of online service data may be measured as the traffic or the number of pieces of online service data, and so on.
  • For example, if the data volume window is 10 MB (megabytes), the window between two adjacent trigger points to be tested means that the traffic of the online service data received by the server between them is 10 MB.
  • As another example, if the data volume window is 500, the window between two adjacent trigger points to be tested means that the number of pieces of online service data received by the server between them is 500.
  • FIG. 4 is a schematic diagram of a relationship between windows (specifically, time windows) and trigger points to be tested.
  • FIG. 4 takes as an example the case in which the period from the moment when the server starts receiving online service data to the current time includes two trigger points to be tested: trigger point 1 to be tested and trigger point 2 to be tested.
  • The window in which trigger point 1 to be tested is located is window 1, and the window in which trigger point 2 to be tested is located is window 2.
  • FIG. 5 is a schematic diagram of another relationship between windows (specifically, time windows) and trigger points to be tested.
  • FIG. 5 takes as an example the case in which N = 3 and the period from the moment when the server starts receiving online service data to the current time includes two trigger points to be tested: trigger point 1 to be tested and trigger point 2 to be tested.
  • The window in which trigger point 1 to be tested is located is window 3, and the window in which trigger point 2 to be tested is located is window 6.
  • This optional implementation ensures that the server can acquire a set of data features at any trigger point to be tested in the subsequent steps, so that it can determine whether each trigger point to be tested is an update trigger point.
  • Implementation manner 1 above ensures that one set of data features is acquired before the first trigger point to be tested, or between any two adjacent trigger points to be tested; implementation manner 2 above ensures that multiple sets of data features are acquired before the first trigger point to be tested, or between any two adjacent trigger points to be tested. For a description of the data features, refer to the related description herein.
  • S302 may include: constructing the first feature sequence according to the data features of the online service data received in the window in which the trigger point to be tested is located, and the data features of the online service data received in at least one window, other than the window in which the trigger point to be tested is located, between the moment when online service data starts to be received and the trigger point to be tested.
  • The at least one window refers to each such window. For example, based on FIG. 5, if the trigger point to be tested is trigger point 1 to be tested, S302 may include: constructing the first feature sequence according to the data features of the online service data received in window 1, the data features of the online service data received in window 2, and the data features of the online service data received in window 3.
  • Alternatively, S302 may include: constructing the first feature sequence according to the data features of the online service data received in the window in which the trigger point to be tested is located, and the data features of the online service data received in at least one window, other than the window in which the trigger point to be tested is located, between the (i-1)-th trigger point to be tested and the trigger point to be tested.
  • The at least one window refers to each such window. For example, based on FIG. 5, if the trigger point to be tested is trigger point 2 to be tested, S302 may include: constructing the first feature sequence according to the data features of the online service data received in window 4, the data features of the online service data received in window 5, and the data features of the online service data received in window 6.
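To make the window layout concrete, the following sketch splits the span before each trigger point to be tested into N equal time windows. It covers time windows only; the function name `sub_windows` and the numeric moments are assumptions of the example. With N = 1 each trigger point gets a single window, and with N = 3 the layout matches the window numbering used for FIG. 5.

```python
from typing import List, Tuple

Window = Tuple[float, float]  # (start moment, end moment)


def sub_windows(start: float, trigger_points: List[float], n: int) -> List[List[Window]]:
    """For each trigger point, split the span since the previous one into n equal windows."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result: List[List[Window]] = []
    previous = start
    for point in trigger_points:
        width = (point - previous) / n
        result.append([(previous + k * width, previous + (k + 1) * width)
                       for k in range(n)])
        previous = point
    return result


# Hypothetical moments: reception starts at 0, trigger points at 30 and 60, N = 3.
layout = sub_windows(0.0, [30.0, 60.0], n=3)
print(layout[0])  # windows 1 to 3; the last one contains trigger point 1
print(layout[1])  # windows 4 to 6; the last one contains trigger point 2
```

The first feature sequence for a given trigger point would then be built from the data features of every window in the corresponding inner list.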
  • S302 may include:
  • S302.1 Build a first data sequence according to data characteristics of the first online service data, where an element in the first data sequence is a data point, and the data point includes at least the following characteristics: a time point at which the data point is located, and a data point The data characteristics of the corresponding business data.
  • The time at which a data point is located refers to the end of the receiving window of the service data corresponding to the data point, that is, the trigger point to be tested. Optionally, it may be represented by the serial number of the receiving window of the service data corresponding to the data point; the specific implementation is not limited to this.
  • The data point corresponding to the data features of the first online service data may be represented as (t, v), where t represents the serial number of the receiving window of the first online service data, and v represents the data features of the first online service data.
  • the first data sequence can be understood as a set consisting of one data point, or a set consisting of a plurality of data points in chronological order of the time at which the plurality of data points are located, the set being represented by a vector.
  • The nth data point in the first data sequence can be represented as $(t_n, v_n)$, where $t_n$ represents the time at which the nth data point in the first data sequence is located, and $v_n$ represents the data features of the online service data corresponding to the nth data point in the first data sequence; $1 \le n \le N$, n and N are integers, and N represents the total number of data points in the first data sequence.
  • The first data sequence can be expressed as $\{(t_1, v_1), (t_2, v_2), \ldots, (t_n, v_n), \ldots, (t_N, v_N)\}$.
  • The $v_n$ in the nth data point may be represented in vector form. For example, the nth data point can be expressed as $(t_n, v_{n1}, v_{n2}, \ldots, v_{nm}, \ldots, v_{nM})$, where $v_{nm}$ represents the m-th data feature of the online service data corresponding to the nth data point; in this case, the first data sequence can be expressed as $\{(t_1, v_{11}, v_{12}, \ldots, v_{1m}, \ldots$
  • For example, the data point corresponding to the first online service data may be represented as (t, v1, v2), where t represents the serial number of the receiving window of the first online service data, v1 represents the number of people watching animations in that receiving window, and v2 represents the average playing time of animations in that receiving window.
  • S302.2 Generate the first feature sequence from the first data sequence, where each element in the first feature sequence includes at least the following features: the time at which the data point is located, and the rate of change between the data point and the previous data point.
  • the element in the first feature sequence may further include the following feature: a time period between a time when the data point is located and a time when the previous data point is located. Since the optional feature can be inferred according to the moment when the previous data point of the data point is located, the element in the first feature sequence may not include the optional feature.
  • The previous data point of the first data point in the first data sequence is the last data point in the previous data sequence. It should be noted that, according to the description of the model change point below, that last data point is the model change point closest to the current time. If no other data sequence precedes the first data sequence in time, the first data point in the first data sequence is actually the second data point counted from the moment when the server starts to receive online service data (that is, the starting point), and its previous data point is the first data point counted from that moment.
  • In that case, the time at which the first data point in the first data sequence is located is the second trigger point to be tested. The following description assumes that other data sequences precede the first data sequence.
  • The nth element in the first feature sequence can be represented as $(t_n, \theta_n, d_n)$, where $t_n$ represents the time at which the nth data point in the first data sequence is located, $\theta_n$ represents the rate of change between the nth data point in the first data sequence and the previous data point (specifically, the (n-1)-th data point in the first data sequence, or the last data point in the data sequence preceding the first data sequence), and $d_n$ represents the time period between the time at which the nth data point in the first data sequence is located and the time at which the previous data point is located.
  • The $\theta_n$ in the nth element $(t_n, \theta_n, d_n)$ may be expressed in vector form. For example, the nth element can be expressed as $(t_n, \theta_{n1}, \theta_{n2}, \ldots)$, where $\theta_{nm}$ represents, for the m-th data feature, the rate of change between the nth data point and the previous data point in the first data sequence.
  • For example, if a data point is expressed as (t, v1, v2) and the m-th data feature is the first data feature (such as the number of people watching animations in the receiving window of the online service data), $\theta_{nm}$ represents the rate of change between $(t_n, v_{n1})$ and $(t_{n-1}, v_{(n-1)1})$.
  • If the server receives online service data in each of a plurality of consecutive windows (excluding the first window after it starts receiving service data), one data point can be obtained from the online service data received in each window. In this case, the time period between the time at which a data point is located and the time at which the previous data point is located is the time period corresponding to one window.
  • The server may not receive online service data in some windows, and no data point can be obtained for such a window. In this case, the time period between the time at which a data point is located and the time at which the previous data point is located is not the time period corresponding to one window, and may be the time period corresponding to multiple windows.
  • The rate of change between a data point and the previous data point may be any of the following: the slope between the data point and the previous data point, a normalization of that slope, the arctangent of that slope, a normalization of the arctangent of that slope, or a symbol corresponding to the value of the arctangent of that slope.
  • An example of the rate of change between a data point and the previous data point is shown in Table 1:
  • In Table 1, the value range of the arctangent of the slope is divided into seven sub-ranges, that is, the rate of change between a data point and the previous data point is graded into seven levels; the actual implementation is not limited to this.
  • the rate of change between a data point and a previous data point can be located at any level.
  • the first feature sequence may be: ⁇ (3, -2, 1), (4, 3, 1), (5, 0, 1) ... ⁇ .
  • In the element (4, 3, 1), "4" indicates the time at which the data point corresponding to the element is located, specifically the serial number of the receiving window of the online service data corresponding to the data point; "3" indicates that the rate of change between the data point and the previous data point is a rapid rise (see Table 1); and "1" indicates the time period between the time at which the data point is located and the time at which the previous data point is located, specifically the time period corresponding to 1 window.
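The construction of feature-sequence elements from data points can be sketched as follows. Because the boundaries of the seven levels in Table 1 are not reproduced here, the sketch assumes equal-width bins over the arctangent range, mapped to the integers -3 to 3; the function names, the bin boundaries, and the sample feature values are all hypothetical. The sample values were chosen so that the output coincides with the example sequence above.

```python
import math
from typing import List, Tuple

DataPoint = Tuple[int, float]   # (window serial number t, feature value v)
Element = Tuple[int, int, int]  # (t, rate-of-change level, windows since previous point)


def rate_level(slope: float, levels: int = 7) -> int:
    """Map arctan(slope) onto integer levels -3..3 (equal-width bins: an assumption)."""
    angle = math.atan(slope)                     # in (-pi/2, pi/2)
    width = math.pi / levels
    index = int((angle + math.pi / 2) // width)  # bin index 0 .. levels-1
    index = min(index, levels - 1)
    return index - levels // 2


def build_feature_sequence(previous: DataPoint, data_seq: List[DataPoint]) -> List[Element]:
    """Turn data points (with strictly increasing t) into (t, level, d) elements."""
    elements: List[Element] = []
    for t, v in data_seq:
        d = t - previous[0]             # number of windows since the previous data point
        slope = (v - previous[1]) / d   # slope between the two data points
        elements.append((t, rate_level(slope), d))
        previous = (t, v)
    return elements


# Hypothetical viewer counts for windows 2 to 5.
print(build_feature_sequence((2, 100.0), [(3, 99.0), (4, 110.0), (5, 110.0)]))
```

Note that the slope, and therefore the level, depends on the units of the data feature, so in practice the bin boundaries would have to be chosen per feature.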
  • the method may further include:
  • S302.1a extract feature points in the first data sequence, and construct a second data sequence according to the feature points in the first data sequence.
  • feature points are local extreme points on the curve (eg, peak points, valley points), inflection points, and so on.
  • the feature points in the first data sequence may be feature points on the curve formed by each data point in the first data sequence.
  • the relationship between the data points and the feature points is: the feature points must be data points, but the data points are not necessarily feature points.
  • The server may determine whether the nth data point $(t_n, v_n)$ is a feature point based on the relationship among the (n-1)-th data point $(t_{n-1}, v_{n-1})$, the nth data point $(t_n, v_n)$, and the (n+1)-th data point $(t_{n+1}, v_{n+1})$. Specifically, the relationship is expressed by a threshold condition (Condition 1) in which Thre1 is a constant greater than or equal to 0.
  • the nth data point may be used as a feature point as long as the data feature of at least one dimension satisfies the above formula.
  • Further optionally, the nth data point is used as a feature point only if the time interval between it and the previous feature point is greater than or equal to Thre2 (Condition 2), where Thre2 is a constant greater than or equal to 0.
  • This further optional implementation avoids the problem that, when the feature values of the data features change abruptly between two adjacent data points, both consecutive data points are taken as feature points, which lowers the accuracy of the obtained feature points and ultimately the accuracy of model updates.
  • For example, the feature value of a data feature may suddenly become larger because the server repeatedly receives online service data in the later of two adjacent windows, or may suddenly become smaller because, owing to a network connection error in the later of two adjacent windows, the server receives erroneous service data or receives no online service data. That is, this further optional implementation prevents an abrupt change in the feature values of the data features between two adjacent data points, caused by the online service data received in the later of two adjacent windows, from producing spurious feature points.
  • the first data sequence is the same as the second data sequence.
  • FIG. 7 is a schematic diagram for determining feature points.
  • In FIG. 7, the abscissa represents t and the ordinate represents v. The three data points acquired by the server that are consecutive in time are data point $A(t_{n-1}, v_{n-1})$, data point $B(t_n, v_n)$, and data point $C(t_{n+1}, v_{n+1})$. Data point A indicates that the number of people watching animations in time window $t_{n-1}$ is $v_{n-1}$; data point B indicates that the number of people watching animations in time window $t_n$ is $v_n$; and data point C indicates that the number of people watching animations in time window $t_{n+1}$ is $v_{n+1}$.
  • The previous feature point of data point B is $(t_1, v_1)$, where, in this example, n is an integer greater than or equal to 2.
  • According to Condition 1 and Condition 2, if the ordinate of data point B deviates from the ordinate of point B' (the point on straight line AC at time $t_n$, which is a point in the mathematical sense only) by an amount greater than or equal to Thre1, and the time period between the time $t_n$ at which data point B is located and the time $t_1$ at which the previous feature point is located is greater than or equal to Thre2, data point B is determined to be a feature point.
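The geometric test described for FIG. 7 can be sketched as follows for a single data feature. The sketch reads Condition 1 as the vertical distance between B and the point B' obtained by linear interpolation on line AC at the time of B, which is an interpretation of the description rather than a reproduction of the original formula; the function name and the sample values are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]  # (t, v)


def is_feature_point(a: Point, b: Point, c: Point,
                     last_feature_time: float,
                     thre1: float, thre2: float) -> bool:
    """Check whether B is a feature point given its neighbours A and C."""
    (ta, va), (tb, vb), (tc, vc) = a, b, c
    # Ordinate of B': the point on straight line AC at time tb.
    vb_on_line = va + (vc - va) * (tb - ta) / (tc - ta)
    condition1 = abs(vb - vb_on_line) >= thre1     # deviation from line AC
    condition2 = (tb - last_feature_time) >= thre2  # distance from previous feature point
    return condition1 and condition2


# Hypothetical values: B lies 35 above line AC, previous feature point at t = 1.
print(is_feature_point((4, 100.0), (5, 140.0), (6, 110.0),
                       last_feature_time=1, thre1=20.0, thre2=3))
```

When a data point carries several data features, the same check could be applied per feature, with the data point accepted as soon as one feature satisfies Condition 1.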
  • Assume that the current trigger point to be tested is trigger point 1 to be tested, and that the data point obtained from the data features of the online service data received in the window in which trigger point 1 to be tested is located is data point B. That is, data point B is the newly added data point in the process of determining whether trigger point 1 to be tested is an update trigger point, and in that process data point B is directly used as a feature point. If the current trigger point to be tested is the next trigger point to be tested after trigger point 1 (that is, trigger point 2 to be tested), and data point C is the newly added data point in the process of determining whether trigger point 2 to be tested is an update trigger point, then in that process whether data point B is a feature point is determined according to the method shown in FIG. 7.
  • In addition, while it is being determined whether trigger point 2 to be tested is an update trigger point, data point C is directly used as a feature point; if trigger point 2 to be tested is not an update trigger point, it is then determined whether the subsequent trigger point to be tested is the next update trigger point.
  • S302.2 in FIG. 6 may include the following S302.2', as shown in FIG. 6a:
  • S302.2': Generate a first feature sequence from the second data sequence, where each element in the first feature sequence includes the time at which a feature point is located, the rate of change between the feature point and the previous feature point, and the time period between the time at which the feature point is located and the time at which the previous feature point is located.
  • For the specific implementation of step S302.2', refer to the specific implementation of S302.2 above; details are not described here again.
  • The first feature sequence may be: {(5, -2, 5), (14, -1, 9), ...}.
  • In the element (14, -1, 9), "14" indicates the time at which the feature point corresponding to the element is located, specifically the sequence number of the receiving window of the online service data corresponding to the feature point; "-1" indicates that the rate of change between the feature point and the previous feature point is slowly decreasing (see Table 1); and "9" indicates the time period between the time at which the feature point is located and the time at which the previous feature point is located, specifically the time period corresponding to 9 windows.
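  • The construction of such elements can be sketched as follows. This is a minimal illustration, not the patented procedure: the function names are hypothetical, and the slope thresholds in `quantize_rate` are invented stand-ins for Table 1, which is not reproduced in this text (only the meaning of "-1", slowly decreasing, is given).

```python
def quantize_rate(slope, steep=1.0):
    """Map a raw slope to a rate-of-change code.

    Hypothetical stand-in for Table 1: -2/-1 for fast/slow decrease,
    0 for no change, 1/2 for slow/fast increase.
    """
    if slope <= -steep:
        return -2
    if slope < 0:
        return -1
    if slope == 0:
        return 0
    if slope < steep:
        return 1
    return 2


def build_feature_sequence(feature_points):
    """Turn (time, value) feature points into (time, rate, duration) elements.

    The first point only serves as the "previous feature point" of the second.
    """
    elements = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(feature_points, feature_points[1:]):
        duration = t_cur - t_prev
        slope = (v_cur - v_prev) / duration
        elements.append((t_cur, quantize_rate(slope), duration))
    return elements


# Feature points at windows 0, 5 and 14 with made-up viewer counts.
print(build_feature_sequence([(0, 100), (5, 40), (14, 35)]))
```

  • With these made-up viewer counts and thresholds, the sketch yields the two elements used in the example above, (5, -2, 5) and (14, -1, 9).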
  • The first data sequence may contain many data points. If the first feature sequence were generated directly from the first data sequence, it would contain many elements, which would make determining the association relationship between the first feature sequence and the at least one representative slice computationally expensive. This optional implementation instead obtains the second data sequence by extracting the feature points in the first data sequence.
  • The number of elements in the first feature sequence generated in this optional implementation is therefore smaller than the number of elements in a first feature sequence obtained directly from the first data sequence. This reduces the amount of calculation needed to determine the association relationship between the first feature sequence and the at least one representative slice, thereby speeding up processing.
  • In addition, because the feature points are special data points in the first data sequence (referred to as representative data points), the association relationship between the at least one representative slice and the first feature sequence generated from the second data sequence (which is obtained from the feature points in the first data sequence) does not differ greatly from the association relationship between the at least one representative slice and a first feature sequence generated directly from the first data sequence.
  • Vectors are used to represent the first feature sequence and the representative slice; in this case, S303 may include: determining a distance between the first feature sequence and the at least one representative slice. S304 may include: updating the current model if the distance is less than or equal to the first preset threshold.
  • The representative slice is a feature sequence constructed from the data features of the historical service data, so it can be represented in the same way as the first feature sequence described above.
  • The data feature of the online service data used to determine the first feature sequence is the same as the data feature of the historical service data.
  • For example, the data features of the online service data and of the historical service data are both: the number of people watching the animation in the receiving window and the average playing time of the animation in the receiving window.
  • the distance between the first feature sequence and the representative slice can be seen as the distance between the two vectors.
  • the distance between two vectors can be determined in any way.
  • the first feature sequence and the representative slice can also be regarded as slices.
  • An optional implementation for determining the distance between two slices is provided below. It should be noted that the two slices whose distance is calculated contain the same number of elements:
  • the first feature sequence is represented as Slice p and the representative slice is represented as Slice q .
  • the following formula determines the distance between Slice p and Slice q :
  • D(Slice_p, Slice_q) represents the distance between Slice_p and Slice_q;
  • I represents the number of data points (optionally, feature points) in the first feature sequence, and I is an integer greater than or equal to 1;
  • D_m(Slice_pi, Slice_qi) represents the mode distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q;
  • D_d(Slice_pi, Slice_qi) represents the temporal distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q, where:
  • ⁇ pi represents the rate of change between the i-th data point and the previous data point in Slice p
  • ⁇ qi represents the rate of change between the i-th data point and the previous data point in Slice q
  • R_pi represents the proportion of the total time period of Slice_p accounted for by d_pi, where d_pi represents the time period between the i-th data point in Slice_p and the previous data point;
  • t_last represents the time at which the last data point of Slice_p is located, and t_first represents the time at which the first data point of Slice_p is located;
  • d_first represents the time period between the first data point of Slice_p and the last data point in the previous slice (this time period is saved in the first element of Slice_p).
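  • The formulas themselves are not reproduced in this text, so the following Python sketch is only one plausible reading of the definitions above, not the patent's formula. It assumes that the mode distance is the absolute difference of the rate-of-change codes, that the total time period of a slice is t_last - t_first + d_first, that the temporal distance is the absolute difference of the proportions R_pi and R_qi, and that the total distance is the sum of both over all I elements. All function names are hypothetical.

```python
def total_period(slice_):
    """Assumed total time period: t_last - t_first + d_first."""
    t_first, _, d_first = slice_[0]
    t_last = slice_[-1][0]
    return (t_last - t_first) + d_first


def slice_distance(slice_p, slice_q):
    """Assumed distance between two slices of (time, rate, duration) elements."""
    if len(slice_p) != len(slice_q):
        raise ValueError("both slices must contain the same number of elements")
    total_p = total_period(slice_p)
    total_q = total_period(slice_q)
    distance = 0.0
    for (_, rate_p, d_p), (_, rate_q, d_q) in zip(slice_p, slice_q):
        mode_distance = abs(rate_p - rate_q)                    # assumed D_m
        temporal_distance = abs(d_p / total_p - d_q / total_q)  # assumed D_d
        distance += mode_distance + temporal_distance
    return distance


same = [(33, 2, 3), (38, 1, 5)]
print(slice_distance(same, same))                   # identical slices
print(slice_distance([(5, -2, 5)], [(7, 3, 2)]))    # differing rate codes
```

  • Under these assumptions, two slices with the same rate codes and the same relative timing have distance 0 even if they occur at different absolute times, which is what allows a slice of online data to be matched against a slice of historical data.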
  • Vectors are used to represent the first feature sequence and the representative slice; in this case, S303 may include: determining a similarity between the first feature sequence and the at least one representative slice. S304 may include: updating the current model if the similarity is greater than or equal to the second preset threshold.
  • the similarity between the first feature sequence and the representative slice can be seen as the similarity between the two vectors.
  • the similarity between two vectors can be determined in any way.
  • the first feature sequence and the representative slice can also be regarded as slices.
  • An optional implementation for determining the similarity between two slices is provided below. It should be noted that the two slices whose similarity is calculated contain the same number of elements:
  • the first feature sequence is represented as Slice p and the representative slice is represented as Slice q .
  • the following formula determines the similarity between Slice p and Slice q :
  • D(Slice_p, Slice_q) represents the similarity between Slice_p and Slice_q;
  • D_m(Slice_pi, Slice_qi) represents the mode distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q;
  • D_d(Slice_pi, Slice_qi) represents the temporal distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q, where:
  • I represents the number of data points (optionally, feature points) in the first feature sequence, and I is an integer greater than or equal to 1;
  • R_pi represents the proportion of the total time period of Slice_p accounted for by d_pi, where d_pi represents the time period between the i-th data point in Slice_p and the previous data point; t_last represents the time at which the last data point of Slice_p is located; t_first represents the time at which the first data point of Slice_p is located; and d_first represents the time period between the first data point of Slice_p and the last data point in the previous slice (this time period is saved in the first element of Slice_p).
  • FIG. 8 is a schematic flowchart diagram of a method for acquiring a representative slice according to an embodiment of the present invention.
  • the method shown in Figure 8 can include:
  • S801: Acquire historical service data, and construct a historical feature sequence according to the data features of the historical service data.
  • the historical service data refers to any part of historical business data or all historical business data relative to the current time.
  • Take as an example the case in which the IPTV online video is an animation and the data feature of the online service data is the number of people watching the animation. If the window is a time window, for example half an hour, S801 may include:
  • S1: The server counts the number of people watching the animation in each window over a period of time, and obtains data sequence 1.
  • Data sequence 1 can be similar to the first data sequence provided above.
  • A schematic diagram of a curve drawn according to data sequence 1 is shown in FIG. 9.
  • In FIG. 9, the abscissa indicates the window number and the ordinate indicates the number of people watching the animation; FIG. 9 shows the number of people watching the animation obtained in several windows.
  • S2: The server extracts the feature points in data sequence 1 and constructs data sequence 2 from the extracted feature points.
  • Data sequence 2 can be similar to the second data sequence provided above.
  • the extracted feature points are respectively represented as feature points A to P, as shown in FIG. 10 (FIG. 10 is drawn based on FIG. 9).
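  • Step S1 can be sketched as follows. This is a hedged illustration only: the event format `(timestamp_seconds, user_id)`, the function name, and the choice to count distinct users per window are assumptions, not details given in the text; the half-hour window follows the example above.

```python
def count_viewers_per_window(view_events, window_seconds=1800):
    """Build data sequence 1: (window number, number of viewers) pairs.

    view_events is an iterable of (timestamp_seconds, user_id) pairs
    (a hypothetical input format). Windows are numbered from 1, and
    windows without any viewer are reported with a count of 0.
    """
    viewers = {}
    for timestamp, user_id in view_events:
        window = int(timestamp // window_seconds) + 1
        viewers.setdefault(window, set()).add(user_id)
    if not viewers:
        return []
    last_window = max(viewers)
    return [(w, len(viewers.get(w, ()))) for w in range(1, last_window + 1)]


events = [(0, "u1"), (100, "u2"), (100, "u1"), (1900, "u1"), (5400, "u3")]
print(count_viewers_per_window(events))
```

  • The resulting (window number, count) pairs correspond to the data points plotted in a curve such as FIG. 9; feature point extraction (step S2) would then run over this sequence.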
  • A model change point is an update trigger point for which the magnitude of change between the two models before and after the model update it triggers is greater than or equal to a preset threshold.
  • The update trigger point may be one determined according to a prior-art method for determining an update trigger point, or one determined by any method for determining an update trigger point provided in the embodiments of the present invention.
  • The model change point is explained below through a specific example:
  • Assume that the model in the server is model 1 and that the sequence obtained by arranging the update trigger points in chronological order is: update trigger points 1, 2. Then, at the time of update trigger point 1, the current model (i.e., model 1) is updated to obtain model 2; at the time of update trigger point 2, the current model (i.e., model 2) is updated to obtain model 3, as shown in FIG.
  • For update trigger point 1, if the magnitude of change between the two models before and after the model update triggered by update trigger point 1 (i.e., model 1 and model 2) is greater than or equal to the preset threshold, update trigger point 1 is taken as a model change point. For update trigger point 2, if the magnitude of change between the two models before and after the model update triggered by update trigger point 2 (i.e., model 2 and model 3) is greater than or equal to the preset threshold, update trigger point 2 is taken as a model change point.
  • The embodiments of the present invention do not limit how the magnitude of change between two models is determined; any prior-art approach may be used.
  • For example, the magnitude of change between the two models can be determined in either of the following ways:
  • Mode 1: Taking the model in the embodiment of the present invention as a logistic regression model, the Euclidean distance between the vectors formed by the parameters of the two models may be used as the magnitude of change between the two models.
  • Mode 2: Taking the model in the embodiment of the present invention as a naive Bayesian model, the Euclidean distance between the vectors formed by the prior probabilities of the two models may be used as the magnitude of change between the two models.
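  • Both modes reduce to a Euclidean distance between two vectors, which can be sketched as below. The function names and the list-of-vectors layout (`models[i]` is the model before update trigger point `i`, `models[i + 1]` the model after it) are assumptions made for this illustration.

```python
import math


def change_magnitude(vector_before, vector_after):
    """Euclidean distance between the two models' parameter (or prior) vectors."""
    return math.dist(vector_before, vector_after)


def find_model_change_points(update_trigger_points, models, threshold):
    """Keep the update trigger points whose update changed the model enough.

    models[i] is the vector before update_trigger_points[i] and
    models[i + 1] is the vector after it (hypothetical layout).
    """
    return [
        point
        for i, point in enumerate(update_trigger_points)
        if change_magnitude(models[i], models[i + 1]) >= threshold
    ]


# Model 1 -> model 2 changes a lot, model 2 -> model 3 barely changes.
models = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.1]]
print(find_model_change_points([1, 2], models, threshold=1.0))
```

  • In this toy case only update trigger point 1 is kept as a model change point, because the second update moves the parameters by only about 0.1.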
  • Assuming that the update trigger point in this optional implementation is determined by any method for determining an update trigger point provided in the embodiments of the present invention, the relationship between the trigger point to be tested, the update trigger point, and the model change point is as follows. First, the trigger point to be tested, the update trigger point, and the model change point are all time concepts. Second, a trigger point to be tested may or may not be an update trigger point; an update trigger point must be a trigger point to be tested; an update trigger point may or may not be a model change point; a model change point must be an update trigger point.
  • The interval between adjacent update trigger points is an integer multiple of the interval between adjacent trigger points to be tested; the interval between adjacent model change points is an integer multiple of the interval between adjacent trigger points to be tested; the interval between adjacent update trigger points is not directly related to the interval between adjacent model change points.
  • The relationship between the time at which a data point is located, the time at which a feature point is located, and the model change point is as follows: the time at which a data point is located may or may not be a model change point; a model change point must be the time at which some data point is located; there is no direct relationship between the time at which a feature point is located and the model change point.
  • FIG. 12 is drawn based on FIG. 10. In actual implementation, the server determines the model change points based on the update trigger points; in order to show clearly that the time at which a feature point is located is independent of the model change points, FIG. 11 is combined with the determined model change points in one figure (i.e., FIG. 12). It can be seen from FIG. 12 that:
  • the times at which one or more feature points are located may fall between adjacent model change points; for example, the times at which feature points F, G, and H are located fall between the adjacent model change points F and H;
  • one or more model change points may fall between the times at which adjacent feature points are located; for example, three model change points fall between the times at which the adjacent feature points B and C are located. Therefore, the time at which a feature point is located is independent of the model change points.
  • the historical feature sequence is cut to obtain a plurality of segments; wherein, when cutting, the model change point can be used as the starting point of the latter segment.
  • Each fragment is a subset of the historical feature sequence.
  • k is an integer greater than or equal to 2
  • L is an integer greater than or equal to 2
  • n is an integer greater than or equal to 2.
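  • The cutting described above, in which each model change point starts the latter segment, can be sketched as follows. The function name and the use of `bisect` are choices made for this illustration; elements are assumed to be (time, rate, duration) tuples and model change points are assumed to be given as times on the same axis.

```python
from bisect import bisect_right


def cut_by_change_points(historical_feature_sequence, model_change_points):
    """Cut a historical feature sequence into segments at model change points.

    An element located exactly at a model change point is placed at the
    start of the latter segment. Empty segments are dropped.
    """
    change_points = sorted(model_change_points)
    segments = [[] for _ in range(len(change_points) + 1)]
    for element in historical_feature_sequence:
        time = element[0]
        # Number of change points at or before this element's time.
        index = bisect_right(change_points, time)
        segments[index].append(element)
    return [segment for segment in segments if segment]


history = [(5, -2, 5), (14, -1, 9), (20, 1, 6), (33, 2, 3), (38, 1, 5)]
for segment in cut_by_change_points(history, [14, 30]):
    print(segment)
```

  • Each returned segment is a subset of the historical feature sequence, and the duration stored in the first element of a segment still refers back to the last data point of the previous segment.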
  • The method shown in FIG. 8 may be performed by the server in an offline state or in an online state. If the server performs the method shown in FIG. 8 in the online state, the method can be performed at any step before S303 described above is performed.
  • the representative slice library may not be updated once determined; or may be updated as the historical feature sequence is updated.
  • Updating the historical feature sequence may include: online feature sequences (for example, the first feature sequence and the second feature sequence) gradually become newly added historical feature sequences. In this case, these newly added historical feature sequences can be used as new representative slices to update the representative slice library, or they can be combined with the original historical feature sequence to re-determine the representative slices and thereby update the representative slice library.
  • In this way, the server need not save the historical service data or its data features and only needs to save the historical feature sequence, which saves storage space, increases the rate at which the representative slice library is updated, and shortens the time required for the update.
  • The slice between model change points 8 and 9 can be expressed as: {(33, 2, 3), (38, 1, 5)}.
  • S803 may include:
  • S803': Cut the historical feature sequence based on the model change points, and cluster the slices obtained after cutting to obtain representative slices.
  • Specifically, the historical feature sequence is cut based on the model change points, and the slices obtained after cutting are clustered using a clustering algorithm to obtain representative slices.
  • the embodiment of the present invention does not limit the specific implementation of the clustering algorithm, and may be any clustering algorithm in the prior art, for example, may be a k-means clustering algorithm.
  • For example, the relationship between any two slices is determined. If the relationship satisfies a certain condition, the two slices may be clustered together (i.e., the two slices are considered to be slices of the same class), and any one of the slices in the class is then selected as the representative slice of that class.
  • For the relationship between two slices, refer to the description above. For example, the distance between any two slices obtained after cutting is determined according to the above calculation method for the distance between Slice_p and Slice_q; if the distance is less than or equal to a preset threshold, the two slices are clustered together and one of them is used as the representative slice.
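  • The threshold-based grouping described above can be sketched as a simple greedy pass. This is an assumption-laden illustration rather than the patent's clustering algorithm (which may be any algorithm, such as k-means): the function name is hypothetical, the distance function is passed in by the caller, and the first slice seen in each class is kept as its representative.

```python
def select_representative_slices(slices, distance_fn, threshold):
    """Greedily group slices and keep one representative slice per group.

    A slice joins an existing group when it has the same number of elements
    as that group's representative and distance_fn(...) <= threshold.
    """
    representatives = []
    for candidate in slices:
        for representative in representatives:
            same_length = len(representative) == len(candidate)
            if same_length and distance_fn(representative, candidate) <= threshold:
                break  # candidate belongs to an existing class
        else:
            representatives.append(candidate)  # candidate starts a new class
    return representatives


def toy_distance(slice_p, slice_q):
    """Hypothetical distance: sum of absolute differences of the rate codes."""
    return sum(abs(p[1] - q[1]) for p, q in zip(slice_p, slice_q))


slices = [[(5, -2, 5)], [(9, -2, 4)], [(7, 3, 2)]]
print(select_representative_slices(slices, toy_distance, threshold=0))
```

  • Here the first two slices share the same rate code and collapse into one class, so only two representative slices remain in the library.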
  • the mode distance Dm (Slice_1, Slice_2), the temporal distance Dt (Slice_1, Slice_2), and the total distance D (Slice_1, Slice_2) between Slice_1 and Slice_2 are as shown in Table 2:
  • Similarly, D(Slice_1, Slice_3) and D(Slice_2, Slice_3) can both be calculated to be 0. Therefore, any one of the three slices can be selected as the representative slice.
  • The features of some of the segments (i.e., slices) obtained by cutting the historical feature sequence may be similar. Clustering the slices obtained after cutting, as in this optional implementation, reduces the number of representative slices and thus saves the storage space occupied by the representative slice library. It also reduces the amount of computation needed to determine the association relationship between an online feature sequence (for example, the first feature sequence or the second feature sequence) and representative slices with similar features, thereby increasing the rate of model updating.
  • the data point a is determined as a feature point in the process of determining whether the last trigger point to be tested is an update trigger point.
  • S13 may specifically include: determining whether data point b is a feature point according to the method provided above, and directly using data point c as a feature point. It is assumed that the determined second data sequence is: {data point a (104, v1), data point b (109, v2), data point c (114, v3)}.
  • A preset distance threshold (i.e., the first preset threshold above) is used to determine whether the time at which data point c is located (i.e., the trigger point to be tested) is an update trigger point.
  • Because the current time is the time at which data point c is located, it is only necessary to determine whether the time at which data point c is located is an update trigger point.
  • Determining whether the time at which data point b is located is an update trigger point may include: calculating the distances between the slice Slice_ab (109, 2, 3) represented by ab and the representative slices Slice_AB {(5, -2, 5)} and Slice_BC1 {(7, 3, 2)}, as shown in Table 5.
  • the solution provided by the embodiment of the present invention is mainly introduced from the perspective of a model updating device (specifically, a server).
  • the model updating apparatus includes hardware structures and/or software modules corresponding to the execution of the respective functions.
  • In combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
  • In the embodiments of the present invention, the model updating apparatus may be divided into functional modules according to the above method examples.
  • each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the module in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • FIG. 14 shows a schematic structural diagram of a model updating apparatus 140.
  • the model updating device 140 may be the server involved in the above embodiment.
  • the model updating apparatus 140 may include: an obtaining module 1401, a building module 1402, a determining module 1403, and an updating module 1404; and optionally, the generating module 1405.
  • The function of each of the functional modules may be inferred from the steps in the method embodiments provided above, or may be found in the summary of the invention above; details are not described here again.
  • the above-mentioned obtaining module 1401, building module 1402, determining module 1403, updating module 1404, and generating module 1405 can all be integrated into one processing module in one model updating device.
  • the model updating apparatus may further include a communication module and a storage module.
  • FIG. 15 is a schematic structural diagram of a model updating apparatus 150 according to an embodiment of the present invention.
  • the model updating means 150 may be the server involved in the above embodiment.
  • the model updating apparatus 150 may include a processing module 1501 and a communication module 1502.
  • the processing module 1501 is configured to perform control management on the operation of the model updating apparatus 150.
  • the processing module 1501 is configured to support the model updating apparatus 150 to perform the operations in FIG. 3, FIG. 3a, FIG. 6, FIG. 6a, FIG. 8, and FIG.
  • various steps, and/or other processes for the techniques described herein; it can also be used to support steps S1 to S3, S11 to S15, and the like provided in the specific examples above.
  • the communication module 1502 is configured to support communication of the model update device 150 with other network entities, such as communication with a service client, and the like.
  • the model updating apparatus 150 may further include: a storage module 1503, configured to store the program code and data corresponding to the model updating apparatus 150 to perform any of the model updating methods provided above.
  • the processing module 1501 may be a processor or a controller, such as a CPU, a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It is possible to implement or carry out various exemplary logical blocks, modules and circuits described in connection with the disclosure of the embodiments of the invention.
  • the processor may also be a combination of computing functions, for example, including one or more microprocessor combinations, a combination of a DSP and a microprocessor, and the like.
  • the communication module 1502 can be a transceiver, a transceiver circuit, a communication interface, or the like.
  • the storage module 1503 can be a memory.
  • The model updating apparatus 150 may be the model updating apparatus 20 shown in FIG. 2.
  • The device above is described only by way of example using the division into the functional modules mentioned above.
  • In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • For the specific working process of the device and the modules described above, refer to the corresponding process in the foregoing method embodiments; details are not described here again.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • For example, the division into modules is only a logical function division.
  • In actual implementation there may be other division manners; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or module, and may be electrical, mechanical or otherwise.
  • the modules described as separate components may or may not be physically separated.
  • the components displayed as modules may or may not be physical modules, that is, may be located in one place, or may be distributed to multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer readable storage medium.
  • The part of the technical solution of the present invention that is essential or that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Abstract

A model updating method and apparatus, relating to the technical field of computers and used to at least solve the problem of a waste of resources caused by the poor significance or even insignificance of a later model update in two model updates triggered by two adjacent updating triggering points, due to the fact that changes between data features of newly-added data of two adjacent updating triggering points and data features of previous data are non-obvious. The method comprises: acquiring first online service data received within a window where a triggering point to be detected is located; according to the data features of the first online service data, constructing a first feature sequence; determining an association relationship between the first feature sequence and at least one representative slice, wherein the representative slice is a slice of a feature sequence constructed according to the data feature of historical service data; if the association relationship between the first feature sequence and the at least one representative slice satisfies a pre-set condition, updating the current model.

Description

Model updating method and apparatus
This application claims priority to Chinese Patent Application No. 201610645496.7, filed with the Chinese Patent Office on August 8, 2016 and entitled "Model updating method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a model updating method and apparatus.
Background
A machine learning algorithm is an algorithm that obtains a data model (hereinafter referred to as a model) by analyzing known data and uses the model to make predictions about unknown data; for example, the model and the data to be received are used to provide content recommendation services. Traditional machine learning algorithms require all known data to be prepared before learning, and once the model is obtained, it is no longer changed.
With the development of online services (such as online recommendation services and online marketing services), the scale of data keeps growing and data changes faster and faster. A model obtained with a traditional machine learning algorithm cannot adapt well to the changing patterns of newly added data, which lowers the accuracy with which the model predicts unknown data. Incremental modeling technology emerged for this reason. Incremental modeling supports progressively updating an existing model with newly added data, so that the updated model better adapts to the changing patterns of the newly added data, thereby improving the accuracy of predictions about unknown data.
At present, the model updating method provided by incremental modeling technology is as follows: newly added data, a historical model, and an update trigger point are acquired; at the time of the update trigger point, the historical model is updated with the newly added data, thereby training a new model. In incremental modeling, when to trigger a model update is a key issue, which affects the update frequency of the model and the accuracy with which the model predicts unknown data. At present, a fixed duration or a fixed data amount is generally used as the update trigger point; that is, if the time period from the previous update trigger point to the current time reaches the fixed duration, a model update is triggered; or, if the amount of data added since the previous update trigger point reaches the fixed data amount, a model update is triggered.
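The prior-art trigger rule described above can be sketched as follows, to make the baseline concrete. The function and parameter names are hypothetical; either criterion can be disabled by leaving it as `None`.

```python
def fixed_trigger_reached(now, last_update_time, new_record_count,
                          fixed_duration=None, fixed_amount=None):
    """Prior-art style trigger: fixed duration or fixed data amount.

    Returns True when the time since the previous update trigger point
    reaches fixed_duration, or when the amount of data added since then
    reaches fixed_amount.
    """
    if fixed_duration is not None and now - last_update_time >= fixed_duration:
        return True
    if fixed_amount is not None and new_record_count >= fixed_amount:
        return True
    return False


# One hour has passed since the last update; the fixed duration is one hour.
print(fixed_trigger_reached(now=7200, last_update_time=3600,
                            new_record_count=10, fixed_duration=3600))
```

Note that this rule never inspects the data itself, so an update is triggered even when the newly added data looks just like the data the model was already trained on.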
在利用上述确定更新触发点的方法进行模型更新的过程中,若相邻两个更新触发点之间的新增数据的数据特征与之前数据的数据特征之间变化不明显,则会导致该相邻两个更新触发点所触发的两次模型更新中在后的一次模型更新 的意义不大,甚至毫无疑义,从而造成资源的浪费。In the process of updating the model by using the method for determining the update trigger point, if the data characteristics of the newly added data between the two adjacent update trigger points are not significantly changed from the data characteristics of the previous data, the phase may be caused. The next model update in the two model updates triggered by the two adjacent update trigger points The meaning is not big, even no doubt, resulting in waste of resources.
Summary of the invention
Embodiments of the present invention provide a model updating method and apparatus that at least address the following problem: when the data features of the data added between two adjacent update trigger points differ little from the data features of the earlier data, the later of the two model updates triggered by those two adjacent update trigger points has little or no value, and resources are wasted.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
In one aspect, a model updating method is provided, including: obtaining first online service data received within the window in which a trigger point to be tested is located, where the trigger point to be tested may be any trigger point to be tested; constructing a first feature sequence according to data features of the first online service data; determining an association relationship between the first feature sequence and at least one representative slice, where a representative slice is a slice of a feature sequence constructed according to data features of historical service data; and updating the current model if the association relationship between the first feature sequence and the at least one representative slice satisfies a preset condition. The technical solution provided by the embodiments of the present invention thus decides whether a trigger point to be tested is an update trigger point by combining the data features of online service data, the data features of historical service data, the association relationship between the feature sequences built from the two, and a preset condition. Compared with the prior-art solution that uses a fixed duration or a fixed data volume as the update trigger point, this reduces the cases in which the data features of the data added between two adjacent update trigger points differ little from those of the earlier data, so that the later of the two triggered model updates has little or no value, and thereby saves resources.
Here, an association relationship is the relationship between one feature sequence and one representative slice. In a specific implementation, if the first feature sequence and the representative slices are represented as vectors, the association relationship between a feature sequence and a representative slice may be expressed as, for example, the distance or the similarity between them. If the at least one representative slice includes multiple representative slices, the association relationship between the first feature sequence and the at least one representative slice satisfying the preset condition may include: the association relationship between the first feature sequence and any one or more of the multiple representative slices satisfying the preset condition.
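The "any one or more representative slices" rule can be illustrated with a minimal Python sketch. The names `should_update` and `satisfies` are hypothetical and do not come from the patent; the preset condition is passed in as a predicate because the text leaves its exact form open.

```python
from math import dist
from typing import Callable, Sequence

Vector = Sequence[float]


def should_update(feature_sequence: Vector,
                  representative_slices: Sequence[Vector],
                  satisfies: Callable[[Vector, Vector], bool]) -> bool:
    """Return True if the relationship between the feature sequence and
    at least one representative slice meets the preset condition.

    `satisfies` is a hypothetical predicate standing in for the preset
    condition (for example, a distance or similarity threshold).
    """
    return any(satisfies(feature_sequence, s) for s in representative_slices)


# Example predicate (an assumption): Euclidean distance of at most 1.0.
def near(a: Vector, b: Vector) -> bool:
    return dist(a, b) <= 1.0
```

With an empty slice library the function returns False, so no update is triggered; whether that is the desired behaviour is a design choice the text does not settle.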
In one possible design, after the association relationship between the first feature sequence and the at least one representative slice is determined, the method may further include: if that association relationship does not satisfy the preset condition, obtaining second online service data received within the window in which a subsequent trigger point to be tested (one following the trigger point to be tested) is located; constructing a second feature sequence, in order of receiving time, according to the data features of the first online service data and the data features of the second online service data; determining the association relationship between the second feature sequence and the at least one representative slice; and updating the current model if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition. In an actual implementation, optionally, if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, the online service data received within the window of the next trigger point to be tested is obtained; a new feature sequence is then constructed, in order of receiving time, according to the data features of the first online service data and of the online service data received within the window of that next trigger point to be tested, and the association relationship between the new feature sequence and the at least one representative slice is determined. If this association relationship satisfies the preset condition, the current model is updated. If it does not, the online service data received within the window of the trigger point to be tested that follows is obtained, and so on, until the association relationship between the newly constructed feature sequence and the at least one representative slice satisfies the preset condition, at which point the current model is updated.
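The iteration described above, in which windows keep being appended until the preset condition is met, might look like the following sketch. It assumes each window has already been reduced to a list of numeric features; `first_update_index` and `meets_condition` are illustrative names, not terms from the patent.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def first_update_index(
    window_features: Iterable[Sequence[float]],
    meets_condition: Callable[[List[float]], bool],
) -> Optional[int]:
    """Return the index of the first trigger point to be tested at which
    the accumulated feature sequence meets the preset condition, or None
    if no tested trigger point qualifies.

    `window_features` yields, in receive-time order, the features of the
    data received in the window of each trigger point to be tested.
    """
    accumulated: List[float] = []
    for index, features in enumerate(window_features):
        # Append in receive-time order, as the design requires.
        accumulated.extend(features)
        if meets_condition(accumulated):
            return index  # this tested trigger point becomes an update trigger point
    return None
```

What happens to the accumulated sequence after an update is not specified in this passage, so the sketch simply stops at the first update.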
In one possible design, the first feature sequence and the representative slices are represented as vectors, and determining the association relationship between the first feature sequence and the at least one representative slice may include: determining the distance between the first feature sequence and the at least one representative slice. In this case, updating the current model if the association relationship satisfies the preset condition may include: updating the current model if the distance is less than or equal to a first preset threshold.
In one possible design, the first feature sequence and the representative slices are represented as vectors, and determining the association relationship between the first feature sequence and the at least one representative slice may include: determining the similarity between the first feature sequence and the at least one representative slice. In this case, updating the current model if the association relationship satisfies the preset condition may include: updating the current model if the similarity is greater than or equal to a second preset threshold.
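The distance-based and similarity-based designs can be sketched side by side. The patent does not name a specific metric, so Euclidean distance and cosine similarity are assumptions here, as is the requirement that the vectors have equal length; the function names are hypothetical.

```python
from math import dist, sqrt
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_by_distance(seq: Vector, slices: Sequence[Vector],
                       first_threshold: float) -> bool:
    """Update if the distance to any representative slice is <= the first preset threshold."""
    return any(dist(seq, s) <= first_threshold for s in slices)


def update_by_similarity(seq: Vector, slices: Sequence[Vector],
                         second_threshold: float) -> bool:
    """Update if the similarity to any representative slice is >= the second preset threshold."""
    return any(cosine_similarity(seq, s) >= second_threshold for s in slices)
```

In practice the feature sequence and a slice may differ in length, in which case a length-tolerant measure would be needed; that choice is outside what this passage specifies.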
In one possible design, constructing the first feature sequence according to the data features of the first online service data may include: constructing a first data sequence according to the data features of the first online service data, where each element of the first data sequence is a data point and a data point contains at least the following features: the moment of the data point and the data features of the service data corresponding to the data point. For example, the data point corresponding to the first online service data contains at least the moment of that data point (that is, the end of the receiving window of the first online service data) and the data features of the first online service data. For example, a data point may be written as (t, v), where t is the moment of the data point and v is the data feature of the service data corresponding to the data point. The first feature sequence is then generated from the first data sequence, where each element of the first feature sequence contains at least the following features: the moment of the data point and the rate of change between the data point and the previous data point (that is, the data point immediately preceding it). Optionally, an element may also contain the period between the moment of the data point and the moment of the previous data point. For example, an element of the first feature sequence may be written as (t, Δ, d), where t is the moment of the data point, Δ is the rate of change between the data point and the previous data point, and d is the period between the moment of the data point and the moment of the previous data point. This optional design gives one specific way of constructing a feature sequence from the data features of online service data, but implementations are not limited to it. For example, the number and meaning of the features contained in each element of the feature sequence may be changed as needed; even then, the overall approach still follows the idea of this possible design.
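As an illustration of turning data points (t, v) into feature elements (t, Δ, d), here is a minimal sketch. It assumes v is a single number and that the rate of change is the slope (v - v_prev) / (t - t_prev); the patent does not fix either detail, and `to_feature_sequence` is a hypothetical name.

```python
from typing import List, Tuple

DataPoint = Tuple[float, float]               # (t, v)
FeatureElement = Tuple[float, float, float]   # (t, delta, d)


def to_feature_sequence(points: List[DataPoint]) -> List[FeatureElement]:
    """Convert a time-ordered data sequence into a feature sequence.

    The first data point has no predecessor, so it produces no element.
    """
    features: List[FeatureElement] = []
    for (t_prev, v_prev), (t, v) in zip(points, points[1:]):
        d = t - t_prev  # period between the two moments
        if d <= 0:
            raise ValueError("data points must be in strictly increasing time order")
        delta = (v - v_prev) / d  # assumed definition of the rate of change
        features.append((t, delta, d))
    return features
```

For instance, the data points (0, 1), (2, 5), (3, 4) yield the elements (2, 2.0, 2) and (3, -1.0, 1).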
In one possible design, after the first data sequence is constructed according to the data features of the first online service data, the method may further include: extracting the feature points in the first data sequence and constructing a second data sequence from them. Feature points are special, or representative, data points; which points qualify can be decided as needed in a specific implementation. In physical terms, feature points are the local extremum points, inflection points, and the like on the curve, where local extremum points may include peak points and valley points. In this case, generating the first feature sequence from the first data sequence may include: generating the first feature sequence from the second data sequence, where each element of the first feature sequence includes the moment of a feature point, the rate of change between the feature point and the previous feature point, and the period between the moment of the feature point and the moment of the previous feature point.
In an actual implementation the first data sequence contains many data points. If the first feature sequence were generated directly from the first data sequence, it would contain many elements, which would make determining the association relationship between the first feature sequence and the at least one representative slice computationally expensive. In this possible design, the second data sequence is obtained by extracting the feature points of the first data sequence, and the first feature sequence is generated from the second data sequence. The first feature sequence generated this way has fewer elements than one obtained directly from the first data sequence, which reduces the computation needed to determine the association relationship and speeds up processing. In addition, because the feature points are special data points of the first data sequence, the association relationship obtained from a first feature sequence generated from the second data sequence does not deviate much from the one obtained from a first feature sequence generated from the full first data sequence.
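A simple way to picture feature-point extraction is to keep only strict local peaks and valleys. The sketch below does that and also keeps the two endpoints; keeping the endpoints and ignoring inflection points are simplifying assumptions, and `extract_feature_points` is a hypothetical name.

```python
from typing import List, Tuple

DataPoint = Tuple[float, float]  # (t, v)


def extract_feature_points(points: List[DataPoint]) -> List[DataPoint]:
    """Reduce a data sequence to its endpoints plus strict local extrema."""
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]  # assumption: the first point is always kept
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        is_peak = cur[1] > prev[1] and cur[1] > nxt[1]
        is_valley = cur[1] < prev[1] and cur[1] < nxt[1]
        if is_peak or is_valley:
            kept.append(cur)
    kept.append(points[-1])  # assumption: the last point is always kept
    return kept
```

The result plays the role of the second data sequence, from which the shorter first feature sequence would then be generated.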
In one possible design, the trigger point to be tested is the i-th trigger point to be tested, where i ≥ 1 and i is an integer. If i = 1, the window in which the trigger point to be tested is located is the window from the moment at which online service data starts to be received to the trigger point to be tested; if i ≥ 2, it is the window from the (i-1)-th trigger point to be tested to the trigger point to be tested. With this design, if the server receives online service data continuously, the server obtains one set of data features at every trigger point to be tested in the subsequent steps, so that it can determine for every trigger point to be tested whether it is an update trigger point.
In one possible design, the trigger point to be tested is the i-th trigger point to be tested, where i ≥ 1 and i is an integer. If i = 1, the window in which the trigger point to be tested is located is 1/N of the window from the moment at which online service data starts to be received to the trigger point to be tested; if i ≥ 2, it is 1/N of the window from the (i-1)-th trigger point to be tested to the trigger point to be tested. Here N > 2, N is an integer, and 1/N denotes one N-th. With this design, if the server receives online service data continuously, the server obtains multiple sets of data features at every trigger point to be tested in the subsequent steps, so that it can determine for every trigger point to be tested whether it is an update trigger point. Moreover, compared with the previous possible design, the granularity at which data features are obtained (that is, the window) is smaller, so more data features are obtained, which from a statistical point of view improves the precision of the computation.
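The two window designs differ only in how the interval between consecutive trigger points to be tested is divided. A minimal sketch, assuming equal-width sub-windows and numeric timestamps (`sub_windows` is a hypothetical helper):

```python
from typing import List, Tuple


def sub_windows(start: float, end: float, n: int) -> List[Tuple[float, float]]:
    """Split the interval between the previous trigger point to be tested
    (or the start of reception) and the current one into n equal windows.

    n = 1 reproduces the single-window design; larger n gives the finer
    1/N windows, yielding n sets of data features per tested trigger point.
    """
    if n < 1 or end <= start:
        raise ValueError("need n >= 1 and end > start")
    width = (end - start) / n
    return [(start + k * width, start + (k + 1) * width) for k in range(n)]
```

Data features would then be computed once per returned sub-window.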
In one possible design, the method may further include: determining, as trigger points to be tested, the moments that lie an integer multiple of a preset duration after the moment at which online service data starts to be received.
In one possible design, the method may further include: determining, as trigger points to be tested, the moments at which the amount of online service data received since reception started reaches an integer multiple of a preset data volume.
It should be noted that, in an actual implementation, determining the trigger points to be tested according to any rule does not affect the basic idea of the technical solution provided by the embodiments of the present invention. The way the trigger points to be tested are determined is therefore not limited to the two possible designs given above.
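The two rules above for placing trigger points to be tested can be sketched as follows. The function names and the `(timestamp, amount)` arrival format are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def time_trigger_points(start: float, preset_duration: float,
                        count: int) -> List[float]:
    """Moments at integer multiples of the preset duration after `start`."""
    return [start + k * preset_duration for k in range(1, count + 1)]


def volume_trigger_points(arrivals: Sequence[Tuple[float, int]],
                          preset_amount: int) -> List[float]:
    """Moments at which the cumulative received amount reaches the next
    integer multiple of the preset data volume.

    `arrivals` is a receive-ordered sequence of (timestamp, amount) pairs.
    """
    triggers: List[float] = []
    total = 0
    next_multiple = preset_amount
    for timestamp, amount in arrivals:
        total += amount
        if total >= next_multiple:
            triggers.append(timestamp)
            # Skip past every multiple already covered by this arrival.
            next_multiple = (total // preset_amount + 1) * preset_amount
    return triggers
```

A single large arrival that crosses several multiples produces only one trigger point in this sketch; other policies are equally compatible with the text.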
In one possible design, before the association relationship between the first feature sequence and the at least one representative slice is determined, the method may further include: obtaining historical service data and constructing a historical feature sequence according to the historical service data; determining the model change points in the historical feature sequence, where a model change point is an update trigger point for which the magnitude of change between the two models before and after the triggered model update is greater than or equal to a preset threshold; and cutting the historical feature sequence based on the model change points in it to obtain the representative slices. For specific ways of determining the model change points and cutting the historical feature sequence, refer to FIG. 11. The representative slices in this possible design may be obtained offline or online. Once generated, they may remain unchanged, may be updated only when an update is needed, or may be updated whenever the historical feature sequence is updated. In a specific implementation, the representative slices may also be determined empirically and stored in advance.
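A rough sketch of the two steps, finding model change points and cutting at them, is shown below. It assumes the change magnitude of each model update is already available as a number and that a change point marks the index at which a new slice begins; both are assumptions, since this passage defers the details to FIG. 11. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def model_change_points(change_magnitudes: Sequence[float],
                        preset_threshold: float) -> List[int]:
    """Indices of update trigger points whose model change magnitude is
    greater than or equal to the preset threshold."""
    return [i for i, m in enumerate(change_magnitudes) if m >= preset_threshold]


def cut_at(sequence: Sequence[T], boundaries: Sequence[int]) -> List[List[T]]:
    """Cut `sequence` so that a new slice starts at each boundary index."""
    slices: List[List[T]] = []
    start = 0
    for idx in sorted(set(boundaries)):
        if start < idx <= len(sequence):
            slices.append(list(sequence[start:idx]))
            start = idx
    if start < len(sequence):
        slices.append(list(sequence[start:]))
    return slices
```

For example, change magnitudes 0.1, 0.9, 0.2, 0.8, 0.1 with a threshold of 0.5 give change points at indices 1 and 3, which cut a five-element sequence into slices of length 1, 2 and 2.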
In one possible design, cutting the historical feature sequence based on the model change points to obtain the representative slices may include: cutting the historical feature sequence based on the model change points and clustering the resulting slices to obtain the representative slices. Compared with the previous possible design, this design reduces the number of representative slices and therefore the storage space occupied by the representative slice library. It also reduces the computation needed to determine the association relationships between an online feature sequence (for example, the first feature sequence or the second feature sequence) and representative slices that have similar features, which speeds up model updating.
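The patent does not name a clustering algorithm. As a stand-in, the sketch below uses a naive greedy grouping: a slice is kept as a new representative only if no existing representative of the same length lies within a chosen radius. Both the algorithm and the name `greedy_representatives` are assumptions made for illustration.

```python
from math import dist
from typing import List, Sequence


def greedy_representatives(slices: Sequence[Sequence[float]],
                           radius: float) -> List[List[float]]:
    """Keep one representative per group of near-identical slices.

    Two slices are treated as similar when they have the same length and
    their Euclidean distance is at most `radius`.
    """
    representatives: List[List[float]] = []
    for s in slices:
        covered = any(len(r) == len(s) and dist(r, s) <= radius
                      for r in representatives)
        if not covered:
            representatives.append(list(s))
    return representatives
```

Fewer representatives mean fewer comparisons per tested trigger point, which is the saving this design describes.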
In another aspect, a model updating apparatus is provided that can implement the functions performed in the above method examples. For example, the apparatus may include an obtaining module, a construction module, a determining module, and an updating module. The obtaining module is configured to obtain the first online service data received within the window in which the trigger point to be tested is located. The construction module is configured to construct the first feature sequence according to the data features of the first online service data. The determining module is configured to determine the association relationship between the first feature sequence and at least one representative slice, where a representative slice is a slice of a feature sequence constructed according to data features of historical service data. The updating module is configured to update the current model if the association relationship between the first feature sequence and the at least one representative slice satisfies the preset condition.
In one possible design, the obtaining module may be further configured to: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, obtain the second online service data received within the window in which a subsequent trigger point to be tested (one following the trigger point to be tested) is located. The construction module may be further configured to construct the second feature sequence, in order of receiving time, according to the data features of the first online service data and the data features of the second online service data. The determining module may be further configured to determine the association relationship between the second feature sequence and the at least one representative slice. The updating module may be further configured to update the current model if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition.
In one possible design, the first feature sequence and the representative slices are represented as vectors. The determining module may be specifically configured to determine the distance between the first feature sequence and the at least one representative slice, and the updating module may be specifically configured to update the current model if the distance is less than or equal to the first preset threshold.
In one possible design, the first feature sequence and the representative slices are represented as vectors. The determining module may be specifically configured to determine the similarity between the first feature sequence and the at least one representative slice, and the updating module may be specifically configured to update the current model if the similarity is greater than or equal to the second preset threshold.
In one possible design, the construction module may be specifically configured to: construct the first data sequence according to the data features of the first online service data, where each element of the first data sequence is a data point and a data point contains at least the moment of the data point and the data features of the service data corresponding to the data point; and generate the first feature sequence from the first data sequence, where each element of the first feature sequence contains at least the moment of the data point, the rate of change between the data point and the previous data point, and the period between the moment of the data point and the moment of the previous data point.
In one possible design, the construction module may be further configured to extract the feature points in the first data sequence and construct the second data sequence from them. In this case, when generating the first feature sequence from the first data sequence, the construction module may be specifically configured to generate the first feature sequence from the second data sequence, where each element of the first feature sequence includes the moment of a feature point, the rate of change between the feature point and the previous feature point, and the period between the moment of the feature point and the moment of the previous feature point.
In one possible design, the trigger point to be tested is the i-th trigger point to be tested, where i ≥ 1 and i is an integer. If i = 1, the window in which the trigger point to be tested is located is the window from the moment at which online service data starts to be received to the trigger point to be tested; if i ≥ 2, it is the window from the (i-1)-th trigger point to be tested to the trigger point to be tested.
In one possible design, the trigger point to be tested is the i-th trigger point to be tested, where i ≥ 1 and i is an integer. If i = 1, the window in which the trigger point to be tested is located is 1/N of the window from the moment at which online service data starts to be received to the trigger point to be tested; if i ≥ 2, it is 1/N of the window from the (i-1)-th trigger point to be tested to the trigger point to be tested. Here N ≥ 2 and N is an integer.
In one possible design, the determining module may be further configured to determine, as trigger points to be tested, the moments that lie an integer multiple of a preset duration after the moment at which online service data starts to be received.
In one possible design, the determining module may be further configured to determine, as trigger points to be tested, the moments at which the amount of online service data received since reception started reaches an integer multiple of a preset data volume.
In one possible design, the obtaining module may be further configured to obtain historical service data; the construction module may be further configured to construct a historical feature sequence according to the historical service data; and the determining module may be further configured to determine the model change points in the historical feature sequence. The apparatus may further include a generating module configured to cut the historical feature sequence based on the model change points in it to obtain the representative slices.
In one possible design, the generating module may be specifically configured to cut the historical feature sequence based on the model change points and cluster the resulting slices to obtain the representative slices.
In yet another aspect, a model updating apparatus is provided that can implement the functions performed in the above method examples. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the apparatus includes a processor, a memory, a system bus, and a communication interface. The processor is configured to support the apparatus in performing the corresponding functions of the above method. The communication interface supports communication between the apparatus and other network elements. The memory is coupled to the processor and stores the program instructions and data necessary for the apparatus. The communication interface may specifically be a transceiver.
In still another aspect, an embodiment of the present invention provides a computer storage medium for storing the computer software instructions corresponding to the above method, including a program designed to perform the above aspects.
It can be understood that any of the model updating apparatuses or computer storage media provided above is used to perform the model updating method provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding model updating method above; they are not repeated here.
Brief description of the drawings
FIG. 1 is a schematic architecture diagram of a system to which the technical solution provided by the embodiments of the present invention applies;
FIG. 2 is a schematic structural diagram of a model updating apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a model updating method according to an embodiment of the present invention;
FIG. 3a is a schematic flowchart of another model updating method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a relationship between windows and trigger points to be tested;
FIG. 5 is a schematic diagram of another relationship between windows and trigger points to be tested;
FIG. 6 is a schematic flowchart of another model updating method according to an embodiment of the present invention;
FIG. 6a is a schematic flowchart of another model updating method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of determining feature points according to an embodiment of the present invention;
FIG. 8 is a schematic flowchart of a method for obtaining representative slices according to an embodiment of the present invention;
FIG. 8a is a schematic flowchart of a method for obtaining representative slices according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a curve plotted from a first data sequence according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of feature points determined from the curve shown in FIG. 9 according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the model change before and after an update trigger point according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of model change points determined based on the curve shown in FIG. 9 according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of a curve plotted from data points according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of a model updating apparatus according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of another model updating apparatus according to an embodiment of the present invention.
DETAILED DESCRIPTION
The basic principle of the technical solution provided by the embodiments of the present invention is as follows: the model is updated when the association between a feature sequence constructed from the data features of online service data and a representative slice of a feature sequence constructed from the data features of historical service data satisfies a preset condition. The technical solution combines the data features of the online service data, the data features of the historical service data, the association between the feature sequences constructed from the two, and the preset condition to determine whether a trigger point to be tested is an update trigger point. The prior art uses a fixed duration or a fixed data amount as the update trigger point. In that case, when the data features of the data added between two adjacent update trigger points do not differ noticeably from those of the earlier data, the later of the two model updates triggered by those points has little or even no significance. The present solution reduces this problem and thereby saves resources.
FIG. 1 is a schematic architectural diagram of a system to which the technical solution provided by an embodiment of the present invention is applicable. The system may include a server and one or more service clients connected to the server. FIG. 1 takes as an example a system that contains two service clients, namely service client 1 and service client 2. A service client may be the user-side terminal of an online service, for example, a set-top box of an Internet Protocol television (IPTV), a smartphone, or a computer.
A service client can acquire and record service data and send the service data to the server according to a preset rule. For example, where the service client is a video playback client, the client can acquire and record service data while a video is playing, and send the service data to the server one record at a time or in batches when the video ends. The server is configured to receive the service data sent by the service clients and to maintain (or update) the model according to the service data; the updated model is used by the server to make predictions based on service data yet to be received.
FIG. 2 is a schematic structural diagram of a model updating apparatus 20 according to an embodiment of the present invention. The model updating apparatus 20 may be a server, and may include a processor 201, a memory 202, a system bus 203, and a communication interface 204. The memory 202 is configured to store computer-executable instructions, and the processor 201 is connected to the memory 202 through the system bus 203. When the model updating apparatus 20 runs, the processor 201 executes the computer-executable instructions stored in the memory 202, so that the model updating apparatus 20 performs any model updating method provided by the embodiments of the present invention. For the specific model updating methods, refer to the related descriptions below and in the drawings; details are not repeated here.
An embodiment of the present invention further provides a storage medium, which may include the memory 202.
The processor 201 may be a single processor, or may be a collective term for multiple processing elements. For example, the processor 201 may be a central processing unit (CPU). The processor 201 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 201 may also be a dedicated processor, which may include at least one of a baseband processing chip, a radio frequency processing chip, and the like. Further, the dedicated processor may also include a chip having other dedicated processing functions of the model updating apparatus 20.
The memory 202 may include a volatile memory, such as a random-access memory (RAM). The memory 202 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 202 may also include a combination of the foregoing types of memory.
The system bus 203 may include a data bus, a power bus, a control bus, a signal status bus, and the like. For clarity, all of these buses are shown as the system bus 203 in FIG. 2.
The communication interface 204 may specifically be a transceiver on the model updating apparatus 20. The transceiver may be a wireless transceiver; for example, it may be an antenna of the model updating apparatus 20. The processor 201 sends data to and receives data from other devices, such as service clients, through the communication interface 204.
In a specific implementation, each step in the flow of any model updating method provided below may be implemented by the processor 201, in hardware form, executing the computer-executable instructions, in software form, stored in the memory 202. To avoid repetition, details are not described here.
Some terms used in the embodiments of the present invention are explained below to aid understanding:
1) Service data, online service data, and historical service data
Service data refers to data generated while a service client uses a service. Service data may include data about the service itself, and may also include the user's feedback on the service. Service data is expressed in time-series form. Taking an IPTV online video playback client as an example of a service client, the service data of the IPTV online video service may include, but is not limited to, any of the following information: session identifier (Session ID), user account, playback start time of the video, playback end time of the video, playback type, video type, video ID, and the user's operation records for the video. Here, ID is short for identity. The user's operation records for the video may include, but are not limited to, the user's bookmarking and browsing of videos and whether the user clicked on recommended video content.
In the embodiments of the present invention, online service data and historical service data are both defined relative to the server. Specifically, online service data refers to service data that the server received within a preset time period counted back from the current time. Historical service data refers to service data that the server received before that preset time period, that is, outside the preset time period counted back from the current time.
2) Trigger point to be tested, update trigger point, and model change point
The trigger point to be tested, the update trigger point, and the model change point are all concepts in the time domain, that is, one-dimensional concepts. For example, each of them can be denoted by t: the trigger point to be tested t1 indicates that time t1 is used as a trigger point to be tested, and the update trigger point t2 indicates that time t2 is used as an update trigger point.
A trigger point to be tested is a trigger point (a point in the time domain, that is, a point in time) that is set according to a certain rule and at which the server determines whether the model needs to be updated. It should be noted that the server may periodically or continuously receive online service data sent by one or more service clients connected to it, and the server may decide at certain specific moments whether to update the model; these specific moments are the trigger points to be tested. The embodiments of the present invention do not limit how the trigger points to be tested are determined. In theory, the server can use any moment as a trigger point to be tested. In practice, the server may determine the trigger points to be tested in ways that include, but are not limited to, the following two:
Mode 1: The server may use, as trigger points to be tested, the moments that fall at integer multiples of a preset duration counted from the moment it starts receiving online service data. For example, if the preset duration is T and the moment at which the server starts receiving online service data is t0, the server may use time t0+nT as a trigger point to be tested, where T is greater than 0 and n may be any integer greater than or equal to 0. The embodiments of the present invention do not limit the specific value of T.
Mode 2: The server may use, as trigger points to be tested, the moments at which the amount of online service data received since it started receiving reaches an integer multiple of a preset data amount. For example, if the preset data amount is R and the moment at which the server starts receiving online service data is t0, the server may use as a trigger point to be tested each moment at which another R items of online service data have been received since t0.
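The two modes above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names, the `count` parameter that bounds n, and the representation of arrivals as a list of timestamps are all assumptions introduced here.

```python
from typing import Iterable, List


def time_based_trigger_points(t0: float, period: float, count: int) -> List[float]:
    """Mode 1 (sketch): trigger points to be tested at t0 + n*T for n = 0..count-1.

    `count` is a hypothetical bound on n; the source allows any integer n >= 0.
    """
    if period <= 0:
        raise ValueError("the preset duration T must be greater than 0")
    return [t0 + n * period for n in range(count)]


def volume_based_trigger_points(arrival_times: Iterable[float], batch_size: int) -> List[float]:
    """Mode 2 (sketch): the arrival time of every R-th record is a trigger point to be tested.

    `arrival_times` is assumed to be the receive times of individual records,
    in order, starting from t0.
    """
    if batch_size <= 0:
        raise ValueError("the preset data amount R must be positive")
    return [t for i, t in enumerate(arrival_times, start=1) if i % batch_size == 0]
```

In Mode 1 the trigger points are evenly spaced in time regardless of traffic, whereas in Mode 2 their spacing follows the arrival rate of the data.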
An update trigger point, which can be understood as an actual or effective trigger point, is a trigger point at which a model update is performed. A trigger point to be tested may or may not be an update trigger point. In the prior art, every trigger point to be tested that is determined according to Mode 1 or Mode 2 above is used as an update trigger point. In the embodiments of the present invention, whether a trigger point to be tested is an update trigger point must be determined according to a certain rule. Specific examples are given below.
A model change point is used in the process of determining representative slices. It refers to an update trigger point for which the magnitude of change between the two models before and after the triggered model update is greater than or equal to a preset threshold. The update trigger point here may be an update trigger point in the prior art, or an update trigger point provided by the embodiments of the present invention. A detailed description is given below.
3) Data points and feature points
Data points and feature points are both concepts in the time domain and the data feature domain, that is, two-dimensional concepts. For example, a data point can be expressed as (t, v), where t is the moment at which the data point is located and v is the data feature of the service data corresponding to the data point. A feature point is a special data point. A detailed description is given below.
It should be noted that, to describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used in the embodiments of the present invention to distinguish between identical or similar items whose functions and effects are substantially the same. Those skilled in the art will understand that "first", "second", and similar words do not limit quantity or order of execution. "Multiple" means two or more.
The technical solutions in the embodiments of the present invention are described below by way of example with reference to the accompanying drawings.
FIG. 3 is a schematic flowchart of a model updating method according to an embodiment of the present invention. The method shown in FIG. 3 may be performed by a server, and may include the following steps:
S301. Acquire the first online service data received in the window in which the trigger point to be tested is located.
It can be understood that the server may periodically or continuously receive online service data sent by one or more service clients connected to it. In the subsequent steps of this embodiment, the server updates the model based on the data features of the online service data received within windows. Specifically, S301 may include: the server acquires the online service data that was received, in the window in which the trigger point to be tested is located, from the one or more service clients connected to the server, and uses this online service data as the first online service data.
The trigger point to be tested in S301 may be any trigger point to be tested. The window may be a time window, a data amount window, or the like. A time window may refer to a time period; a time period approaching 0 is a moment. A data amount window may refer to a fixed amount of data. The embodiments of the present invention do not limit the size of the window in which the trigger point to be tested is located.
It should be noted that, in practice, the server may receive online service data in every window, or may receive none in some windows. For example, during peak service hours the server may receive online service data in every window over a period of time, whereas during off-peak hours the server may receive no online service data in some windows.
S302. Construct a first feature sequence according to the data features of the first online service data.
Before S302, the method may further include: acquiring the data features of the first online service data. The embodiments of the present invention do not limit the specific content or number of the data features of the online service data, or how they are acquired; these can be determined according to factors such as the service data itself and actual requirements. For example, where the IPTV online video is a cartoon, the data features of the first online service data may include, but are not limited to, the number of people watching cartoons within the receiving window of the first online service data (that is, the window in which the trigger point to be tested is located) and the average playback duration of cartoons within that receiving window. Specifically, suppose the service data of the IPTV online video service received by the server consists of the Session ID, user account, playback start time of the video, playback end time of the video, playback type, video type, video ID, the user's operation records for the video, and so on. The server can then obtain the number of people watching cartoons within the receiving window of the first online service data by counting the distinct user accounts whose video type is cartoon within that window. Similarly, the server can obtain the average playback duration of cartoons within the receiving window by averaging, over the distinct users watching cartoons within that window, the difference between the playback end time and the playback start time of the video.
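The two example data features can be computed along the following lines. This is a minimal sketch under stated assumptions: the `PlayRecord` type, its field names, and the `"cartoon"` label are hypothetical, and the average is taken over playback records rather than per user, which is one possible reading of the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PlayRecord:
    """Hypothetical subset of one service data record received in a window."""
    user_account: str
    video_type: str
    start_time: float  # playback start time, in seconds
    end_time: float    # playback end time, in seconds


def window_features(records: Iterable[PlayRecord], video_type: str = "cartoon") -> Tuple[int, float]:
    """Return (number of distinct viewers, average playback duration) for one window."""
    matching = [r for r in records if r.video_type == video_type]
    # Count distinct user accounts so a user with several sessions is counted once.
    viewers = len({r.user_account for r in matching})
    if not matching:
        return 0, 0.0
    # Assumption: average over records, not over users.
    average = sum(r.end_time - r.start_time for r in matching) / len(matching)
    return viewers, average
```

Each value returned for a window can then serve as one candidate element of the feature sequence built in S302.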
It should be noted that, for brevity, a data feature and the value of that data feature are both referred to below as the data feature. Those of ordinary skill in the art will understand that a data feature described herein should in some contexts be understood as the value of the data feature. For example, "acquiring the data features of the first online service data" above should be understood as acquiring the values of the data features of the first online service data. This applies to the related descriptions below and is not restated each time.
Optionally, the first feature sequence may be represented as a vector. In this case, the elements of the first feature sequence are obtained from the data features of the online service data acquired in one or more windows. The description below takes the vector representation of the first feature sequence as an example.
S303. Determine the association between the first feature sequence and at least one representative slice, where a representative slice is a slice of a feature sequence constructed from the data features of historical service data.
The at least one representative slice includes one or more representative slices. A representative slice may be determined by a service expert or generated by the server according to a certain method. It may be stored in the server in advance, or generated by the server before S303 is performed. A representative slice may be represented as a vector, although a specific implementation is not limited to this. The association between the first feature sequence and the at least one representative slice may be the similarity or the distance between the two, or the like.
Specifically, if the first feature sequence and the representative slices are represented as vectors, S303 may include: acquiring, from the at least one representative slice, the representative slices whose number of elements equals that of the first feature sequence, and determining the association between the first feature sequence and those representative slices. Specific examples are given below.
S304. If the association between the first feature sequence and the at least one representative slice satisfies the preset condition, update the current model.
If the at least one representative slice includes multiple representative slices, the association between the first feature sequence and the at least one representative slice satisfying the preset condition may include: the association between the first feature sequence and at least one of the multiple representative slices satisfies the preset condition. The preset condition may be determined in advance according to one or more factors, such as the chosen representation of the association (for example, distance or similarity), actual requirements, and experience.
Optionally, if the at least one representative slice includes multiple representative slices, S303 to S304 may include: the server determines the association between the first feature sequence and one of the multiple representative slices; if that association does not satisfy the preset condition, the server determines the association between the first feature sequence and another of the multiple representative slices, and so on, until the association between the first feature sequence and one of the representative slices satisfies the preset condition. At that point, the association between the first feature sequence and the multiple representative slices is considered to satisfy the preset condition.
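The sequential check in S303 to S304 can be illustrated with the sketch below. The source leaves both the association measure and the preset condition open; here Euclidean distance and a "distance at most `max_distance`" threshold are assumed purely for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two equal-length vectors (one assumed form of association)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def should_update(feature_sequence: Sequence[float],
                  representative_slices: Sequence[Sequence[float]],
                  max_distance: float) -> bool:
    """Return True as soon as one representative slice satisfies the assumed condition."""
    for candidate in representative_slices:
        # Only slices with the same number of elements are compared.
        if len(candidate) != len(feature_sequence):
            continue
        if euclidean_distance(feature_sequence, candidate) <= max_distance:
            return True  # stop early: one match is enough
    return False  # no slice of matching length satisfied the condition
```

Stopping at the first match avoids comparing against the remaining slices; a similarity measure with a "greater than or equal to" threshold would fit the same structure.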
In the model updating method provided by the embodiments of the present invention, the model is updated when the association between a feature sequence constructed from the data features of online service data and a representative slice of a feature sequence constructed from the data features of historical service data satisfies a preset condition. The technical solution combines the data features of the online service data, the data features of the historical service data, the association between the feature sequences constructed from the two, and the preset condition to determine whether a trigger point to be tested is an update trigger point. The prior art uses a fixed duration or a fixed data amount as the update trigger point. In that case, when the data features of the data added between two adjacent update trigger points do not differ noticeably from those of the earlier data, the later of the two model updates triggered by those points has little or even no significance. The present solution reduces this problem and thereby saves resources.
Optionally, as shown in FIG. 3a (which is drawn based on FIG. 3), after S303 the method may further include:
S305: If the association between the first feature sequence and the at least one representative slice does not satisfy the preset condition, acquire the second online service data received in the window in which a subsequent trigger point to be tested, following the current trigger point to be tested, is located.
If the at least one representative slice includes multiple representative slices, the association between the first feature sequence and the at least one representative slice not satisfying the preset condition may include: the association between the first feature sequence and each of the multiple representative slices fails to satisfy the preset condition.
S306: Construct a second feature sequence, in order of receiving time, according to the data features of the first online service data and the data features of the second online service data.
S307: Determine the association between the second feature sequence and the at least one representative slice.
S308: If the association between the second feature sequence and the at least one representative slice satisfies the preset condition, update the current model.
For example, for the specific implementation of S307 to S308, refer to the specific implementation of S303 to S304 above; details are not repeated here.
可选的,S305~S308可以包括:若第一特征序列与至少一个代表切片之间的关联关系不满足预设条件,则获取待测触发点的下一个待测触发点所在的窗口内接收到的在线业务数据;其中,将该待测触发点表示为第i个待测触发点,将该待测触发点的下一个待测触发点表示为第i+1个待测触发点。然后,根据第i个待测触发点所在的窗口内接收到的在线业务数据(即上述第一在线业务数据)的数据特征和第i+1个待测触发点所在的窗口内接收到的在线业务数据,构建特征序列。确定该特征序列与至少一个代表切片之间的关联关系。若该关联关系满足预设条件,则更新当前模型。若该关联关系不满足预设条件,则获取第i+2个待测触发点所在的窗口内接收到的在线业务数据;然后,根据第i个待测触发点所在的窗口内接收到的在线业务数据(即上述第一在线业务数据)的数据特征、第i+1个待测触发点所在的窗口内接收到的在线业务数据和第i+2个待测触发点所在的窗口内接收到的在线业务数据,构建特征序列。确定该特征序列与至少一个代表切片之间的关联关系。若该关联关系满足预设条件,则更新当前模型。若该关联关系不满足预设条件,则获取第i+3个待测触发点所在的窗口内接收到的在线业务数据,依次类推,直至更新当前模型。Optionally, S305-S308 may include: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, the window in which the next trigger point to be tested is obtained is received in the window where the trigger point to be tested is obtained. The online service data; wherein the trigger point to be tested is represented as the i-th trigger point to be tested, and the next test trigger point of the to-be-tested trigger point is represented as the i+1th trigger point to be tested. Then, according to the data feature of the online service data (ie, the first online service data) received in the window where the i th test trigger is located, and the online received in the window of the i+1th test trigger point Business data, building a sequence of features. A relationship between the feature sequence and at least one representative slice is determined. If the association meets the preset condition, the current model is updated. 
If the association relationship does not meet the preset condition, the online service data received in the window where the i+2 test trigger points are located is obtained; and then, the online received according to the window where the i th test trigger point is located The data characteristics of the service data (that is, the first online service data), the online service data received in the window where the i+1th test trigger point is located, and the window in which the i+2 test trigger points are located are received. Online business data, building a sequence of features. A relationship between the feature sequence and at least one representative slice is determined. If the association meets the preset condition, the current model is updated. If the association relationship does not meet the preset condition, the online service data received in the window of the i+3th trigger point to be tested is obtained, and so on, until the current model is updated.
With reference to the description below, if feature sequences (including the first feature sequence and the second feature sequence) are represented as vectors, each new feature sequence obtained by the server can be understood as the previous feature sequence with additional elements appended, where the appended elements are derived from the data features of the newly obtained online service data. For example, compared with S302, S306 can be understood as: appending, after the elements of the first feature sequence, the elements derived from the data features of the second online service data, to obtain the second feature sequence.
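The repeated "append, then test" procedure described above can be sketched as a short loop. This is only an illustration under stated assumptions: the function name `find_update_trigger`, the per-window element lists, and the `is_associated` callback (standing in for the preset-condition check against the representative slices) are hypothetical and do not come from the source.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Element = Tuple[float, ...]  # one feature-sequence element, e.g. (t, delta, d)


def find_update_trigger(
    windows: Iterable[Sequence[Element]],
    is_associated: Callable[[List[Element]], bool],
) -> Tuple[int, List[Element]]:
    """Grow the feature sequence window by window until the test passes.

    `windows` yields, for each trigger point to be tested in order, the
    elements derived from the data received in that trigger point's window.
    `is_associated` is a hypothetical stand-in for the preset-condition check.
    Returns (number of windows consumed, feature sequence); the count is 0
    if the windows run out before the check ever passes.
    """
    feature_sequence: List[Element] = []
    for count, window_elements in enumerate(windows, start=1):
        # Append the new elements after the elements of the previous sequence.
        feature_sequence.extend(window_elements)
        if is_associated(feature_sequence):
            return count, feature_sequence  # this trigger point updates the model
    return 0, feature_sequence
```

In this sketch the sequence passed to the check at the (i+1)-th trigger point is exactly the previous sequence plus the newly derived elements, which mirrors how S306 extends the result of S302.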
It should be noted that, in a specific implementation, after each model update the server may delete the feature sequences used in that update process, or may keep the feature sequence last used in that update process as part of the subsequent historical feature sequences.
Optionally, the method may further include determining the size of the window in which the trigger point to be tested in S301 is located. Specifically, assume that the trigger point to be tested in S301 is the i-th trigger point to be tested, where i ≥ 1 and i is an integer. Then:
Implementation 1: If i = 1, the window in which the trigger point to be tested is located may be the window from the moment the server starts receiving online service data to the trigger point to be tested. If i ≥ 2, the window in which the trigger point to be tested is located may be the window from the (i-1)-th trigger point to be tested to the trigger point to be tested.
Implementation 2: If i = 1, the window in which the trigger point to be tested is located may be 1/N of the window from the moment the server starts receiving online service data to the trigger point to be tested. If i ≥ 2, the window in which the trigger point to be tested is located may be 1/N of the window from the (i-1)-th trigger point to be tested to the trigger point to be tested. Here N ≥ 2, N is an integer, and 1/N denotes one N-th.
If the window is a time window, the window between two moments is the time period between those two moments. For example, if the time window is 10 minutes, the window between two adjacent trigger points to be tested means that the time period between them is 10 minutes long.
If the window is a data-volume window, the window between two moments is the span over which the server receives a fixed volume of online service data, where the volume of online service data may be measured as traffic, as a count of items, and so on. For example, if the data-volume window is 10M (megabytes), the window between two adjacent trigger points to be tested means that the server receives 10M of online service data traffic between them. If the data-volume window is 500 items, the window between two adjacent trigger points to be tested means that the server receives 500 items of online service data between them.
FIG. 4 is a schematic diagram of the relationship between windows (specifically, time windows) and trigger points to be tested. FIG. 4 takes Implementation 1 as an example, with two trigger points to be tested, trigger point 1 and trigger point 2, falling in the period from the moment the server starts receiving online service data to the current moment. In FIG. 4, the window in which trigger point 1 is located is window 1, and the window in which trigger point 2 is located is window 2.
FIG. 5 is a schematic diagram of the relationship between windows (specifically, time windows) and trigger points to be tested. FIG. 5 takes Implementation 2 with N = 3 as an example, with two trigger points to be tested, trigger point 1 and trigger point 2, falling in the period from the moment the server starts receiving online service data to the current moment. In FIG. 5, the window in which trigger point 1 is located is window 3, and the window in which trigger point 2 is located is window 6.
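The window numbering of Implementation 2 can be reproduced with a small sketch. It assumes time windows and numeric timestamps; the function name `sub_windows` and the sample trigger times (30 and 60) are hypothetical and chosen only for illustration.

```python
def sub_windows(start, trigger_points, n):
    """Split each span between consecutive trigger points into n equal windows.

    `start` is the moment the server starts receiving online service data,
    `trigger_points` are the moments of the trigger points to be tested in
    ascending order, and `n` is the N of Implementation 2 (n >= 2).
    Returns a list of (window_start, window_end) pairs in time order.
    """
    windows = []
    previous = start
    for trigger in trigger_points:
        step = (trigger - previous) / n  # each window is 1/N of the span
        for k in range(n):
            windows.append((previous + k * step, previous + (k + 1) * step))
        previous = trigger
    return windows
```

With `start=0`, trigger points at 30 and 60, and `n=3`, the third window ends at the first trigger point and the sixth window ends at the second, matching the "window 3" and "window 6" labels of FIG. 5.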
Because the subsequent steps determine the data features of online service data from the online service data received within a window, and the online service data received within one window yields one group of data features (one or more data features), this optional implementation ensures that the server obtains a group of data features at every trigger point to be tested, and can therefore determine whether each trigger point to be tested is an update trigger point. In addition, Implementation 1 ensures that one group of data features is obtained before the first trigger point to be tested and between any two adjacent trigger points to be tested, whereas Implementation 2 ensures that multiple groups of data features are obtained before the first trigger point to be tested and between any two adjacent trigger points to be tested. Data features are described below.
Based on Implementation 2:
If i = 1, S302 may include: constructing the first feature sequence from the data features of the online service data received in the window in which the trigger point to be tested is located, together with the data features of the online service data received in at least one other window among the windows between the moment reception of online service data starts and the trigger point to be tested. Optionally, the at least one window means every such window. For example, based on FIG. 5, if the trigger point to be tested is trigger point 1, S302 may include: constructing the first feature sequence from the data features of the online service data received in window 1, in window 2, and in window 3.
If i ≥ 2, S302 may include: constructing the first feature sequence from the data features of the online service data received in the window in which the trigger point to be tested is located, together with the data features of the online service data received in at least one other window among the windows between the (i-1)-th trigger point to be tested and the first trigger point to be tested. Optionally, the at least one window means every such window. For example, based on FIG. 5, if the trigger point to be tested is trigger point 2, S302 may include: constructing the first feature sequence from the data features of the online service data received in window 4, in window 5, and in window 6.
Optionally, as shown in FIG. 6 (which is drawn based on FIG. 3), S302 may include:
S302.1: Construct a first data sequence from the data features of the first online service data. Each element of the first data sequence is a data point, and a data point contains at least the following: the moment of the data point, and the data features of the service data corresponding to the data point.
The moment of a data point is the end of the receiving window of the service data corresponding to that data point, that is, the trigger point to be tested. Optionally, it may be represented by the sequence number of that receiving window, although a specific implementation is not limited to this. For example, the data point corresponding to the data features of the first online service data may be represented as (t, v), where t is the sequence number of the receiving window of the first online service data and v is the data feature of the first online service data.
The first data sequence can be understood as a set consisting of one data point, or as a set of multiple data points arranged in chronological order of their moments. The set may be represented as a vector.
For example, the n-th data point in the first data sequence may be represented as (t_n, v_n), where t_n is the moment of the n-th data point in the first data sequence and v_n is the data feature of the online service data corresponding to the n-th data point; 1 ≤ n ≤ N, n and N are integers, and N is the total number of data points in the first data sequence. In this case, the first data sequence may be represented as {(t_1, v_1), (t_2, v_2), ..., (t_n, v_n), ..., (t_N, v_N)}. If the data features of the online service data are multi-dimensional (that is, the online service data has more than one data feature), v_n in the n-th data point (t_n, v_n) may be written as a vector. For example, the n-th data point may be represented as (t_n, v_n1, v_n2, ..., v_nm, ..., v_nM), where v_nm is the m-th data feature of the online service data corresponding to the n-th data point. In this case, the first data sequence may be represented as {(t_1, v_11, v_12, ..., v_1m, ..., v_1M), (t_2, v_21, v_22, ..., v_2m, ..., v_2M), ..., (t_n, v_n1, v_n2, ..., v_nm, ..., v_nM), ..., (t_N, v_N1, v_N2, ..., v_Nm, ..., v_NM)}.
Based on the example in S302, the data features of the first online service data may be represented as (t, v1, v2), where t is the sequence number of the receiving window of the first online service data, v1 is the number of people watching cartoons within that receiving window, and v2 is the average playback duration of cartoons within that receiving window.
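As an illustration of S302.1 with the two example features, the sketch below turns raw viewing records into data points (t, v1, v2). The record layout `(window_index, user_id, play_duration)`, the use of distinct users as "number of people", and the function name `build_data_sequence` are assumptions made for this example only.

```python
from collections import defaultdict


def build_data_sequence(records):
    """Aggregate viewing records into one data point per receiving window.

    `records` is an iterable of (window_index, user_id, play_duration)
    tuples, a hypothetical layout. Each data point is (t, v1, v2) with
    t = window sequence number, v1 = number of distinct viewers, and
    v2 = average playback duration. Windows without any record produce
    no data point.
    """
    by_window = defaultdict(list)
    for window, user, duration in records:
        by_window[window].append((user, duration))

    sequence = []
    for window in sorted(by_window):  # chronological order of the data points
        rows = by_window[window]
        viewers = len({user for user, _ in rows})
        average = sum(duration for _, duration in rows) / len(rows)
        sequence.append((window, viewers, average))
    return sequence
```

Ordering by window number keeps the data points in chronological order of their moments, as the description of the first data sequence requires.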
S302.2: Generate the first feature sequence from the first data sequence. An element of the first feature sequence contains at least the following: the moment of the data point, and the rate of change between the data point and the previous data point.
Optionally, an element of the first feature sequence may further contain the time period between the moment of the data point and the moment of the previous data point. Because this optional item can be derived from the moment of the data point and the moment of the previous data point, an element of the first feature sequence may omit it in a specific implementation.
If, in time order, other data sequences precede the first data sequence, the data point preceding the first data point of the first data sequence is the last data point of the preceding data sequence. It should be noted that, according to the description of model change points below, that last data point is the model change point closest to the current moment. If no other data sequence precedes the first data sequence in time order, the first data point of the first data sequence is actually the second data point counted from the moment the server starts receiving online service data (the starting point), and its preceding data point is the first data point counted from that moment. This is because the first data point counted from the starting point has no preceding data point, so its rate of change relative to a preceding data point is undefined, which makes that data point meaningless for this purpose. For example, as shown in FIG. 4, the moment of the first data point in the first data sequence is the second trigger point to be tested. The description below takes the case in which other data sequences precede the first data sequence as an example.
For example, the n-th element of the first feature sequence may be represented as (t_n, Δ_n, d_n), where t_n is the moment of the n-th data point in the first data sequence, Δ_n is the rate of change between the n-th data point in the first data sequence and the previous data point (which may be the (n-1)-th data point in the first data sequence, or the last data point in the data sequence preceding the first data sequence), and d_n is the time period between the moment of the n-th data point and the moment of the previous data point. In this case, the first feature sequence may be represented as TS = {(t_1, Δ_1, d_1), (t_2, Δ_2, d_2), ..., (t_n, Δ_n, d_n), ..., (t_N, Δ_N, d_N)}. If the data features of the online service data are multi-dimensional, Δ_n in the n-th element (t_n, Δ_n, d_n) of TS may be written as a vector. For example, the n-th element may be represented as (t_n, Δ_n1, Δ_n2, ..., Δ_nm, ..., Δ_nM, d_n), where Δ_nm is, for the m-th data feature, the rate of change between the n-th data point in the first data sequence and the previous data point. For example, based on the example in S302.1, the data features of the first online service data may be represented as (t, v1, v2); if the m-th data feature is the first data feature, such as the number of people watching cartoons within the receiving window of the online service data, then Δ_nm is the slope between (t_n, v_n1) and (t_(n-1), v_(n-1)1). In this case, the first feature sequence may be represented as {(t_1, Δ_11, Δ_12, ..., Δ_1m, ..., Δ_1M, d_1), (t_2, Δ_21, Δ_22, ..., Δ_2m, ..., Δ_2M, d_2), ..., (t_n, Δ_n1, Δ_n2, ..., Δ_nm, ..., Δ_nM, d_n), ..., (t_N, Δ_N1, Δ_N2, ..., Δ_Nm, ..., Δ_NM, d_N)}.
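The mapping from data points to feature-sequence elements in S302.2 can be sketched as follows. The sketch assumes that the rate of change is the plain slope (one of several options the description allows) and that each data point is stored as `(t, (v_1, ..., v_M))`; the function name `build_feature_sequence` is hypothetical.

```python
def build_feature_sequence(previous_point, data_sequence):
    """Turn data points into elements (t_n, delta_n1, ..., delta_nM, d_n).

    `previous_point` is the data point preceding the first data point of
    `data_sequence` (for example, the last data point of the preceding data
    sequence). Every data point has the form (t, (v_1, ..., v_M)). The rate
    of change used here is the plain slope, an assumption for illustration.
    """
    features = []
    prev_t, prev_v = previous_point
    for t, v in data_sequence:
        d = t - prev_t  # time period since the previous data point
        slopes = tuple((cur - old) / d for cur, old in zip(v, prev_v))
        features.append((t, *slopes, d))
        prev_t, prev_v = t, v
    return features
```

When a window delivers no data, the next data point simply ends up with a larger `d`, because the gap then spans several windows.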
It should be noted that if the server receives online service data in every one of several consecutive windows (excluding the first window after reception of service data starts), one data point can be obtained from the online service data received in each window. In this case, the time period between the moment of a data point and the moment of the previous data point is the time period of one window. In an actual implementation, the server may receive no online service data in some windows, and no data point can be obtained from such a window. In this case, the time period between the moment of a data point and the moment of the previous data point is not the time period of one window, and may be the time period of several windows.
The rate of change Δ between a data point and the previous data point may be any one of the following: the slope between the two data points, the normalized slope, the arctangent of the slope, the normalized arctangent of the slope, a symbol corresponding to the arctangent of the slope, and so on. One example of representing the rate of change between a data point and the previous data point is shown in Table 1:
Table 1
Figure PCTCN2017090609-appb-000001
In Table 1, the range of the arctangent of the slope is divided into the seven sub-ranges shown, that is, the rate of change between a data point and the previous data point is assigned one of seven levels. An actual implementation is not limited to this; for example, the rate of change between a data point and the previous data point may be assigned any number of levels.
For example, based on Table 1, the first feature sequence may be {(3, -2, 1), (4, 3, 1), (5, 0, 1), ...}. In the element (4, 3, 1), "4" is the moment of the data point corresponding to the element, specifically the sequence number of the receiving window of the online service data corresponding to that data point; "3" indicates that the rate of change between the data point and the previous data point is a rapid rise (see Table 1); and "1" is the time period between the moment of the data point and the moment of the previous data point, specifically the time period of one window.
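A seven-level quantization of the arctangent of the slope can be sketched as below. Table 1 is only available as an image here, so the boundary angles in `EDGES_DEG` are hypothetical placeholders and not the values of Table 1; only the idea of mapping sub-ranges of the arctangent to the levels -3 to 3 is taken from the text.

```python
import math
from bisect import bisect_right

# Hypothetical boundaries (in degrees) between the seven levels -3..3.
# The real sub-ranges are defined by Table 1 and may differ.
EDGES_DEG = [-60.0, -30.0, -5.0, 5.0, 30.0, 60.0]


def rate_level(slope):
    """Map a slope to one of seven levels via the arctangent of the slope.

    Negative levels mean falling, 0 means roughly flat, positive levels
    mean rising; a larger absolute value means a faster change.
    """
    angle = math.degrees(math.atan(slope))  # always within (-90, 90)
    return bisect_right(EDGES_DEG, angle) - 3
```

Using the arctangent keeps the quantity bounded, so a fixed set of sub-ranges covers every possible slope.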
Further optionally, as shown in FIG. 6a (which is drawn based on FIG. 3 and FIG. 6), after S302.1 the method may further include:
S302.1a: Extract the feature points in the first data sequence, and construct a second data sequence from those feature points.
In physical terms, feature points are the local extreme points (for example, peaks and valleys), inflection points, and similar points of a curve. In the embodiments of the present invention, the feature points in the first data sequence may be the feature points of the curve formed by the data points of the first data sequence. The relationship between data points and feature points is as follows: every feature point is a data point, but a data point is not necessarily a feature point.
Optionally, for a data feature of any one dimension, the server may determine whether the n-th data point (t_n, v_n) is a feature point according to its relationship with the (n-1)-th data point (t_(n-1), v_(n-1)) and the (n+1)-th data point (t_(n+1), v_(n+1)). Specifically, the relationship can be expressed by the following formula:
Figure PCTCN2017090609-appb-000002
where Thre1 is a constant greater than or equal to 0.
It should be noted that if the data features of the online service data are multi-dimensional, the n-th data point may be taken as a feature point as long as the data feature of at least one dimension satisfies the above formula.
Further optionally, the time interval between the data point and the previous feature point is greater than or equal to Thre2, where Thre2 is a constant greater than or equal to 0. This further optional implementation prevents two consecutive data points from both being taken as feature points because of an abrupt change in the feature value between two adjacent data points, which would lower the accuracy of the extracted feature points and, ultimately, the accuracy of the model update. For example, the feature value may suddenly increase because the server receives duplicate online service data in the later of two adjacent windows; or the feature value may suddenly decrease because, in the later of two adjacent windows, a network connection fault causes the server to receive erroneous service data or no online service data at all. In other words, this further optional implementation limits the effect that an abrupt change in the data features of the online service data received in the later of two adjacent windows has on the accuracy of feature-point extraction.
In a specific implementation, if Thre1 is 0 and Thre2 is less than or equal to the time period of the smallest window, the first data sequence and the second data sequence are identical.
For example, FIG. 7 is a schematic diagram of determining a feature point. In FIG. 7, the abscissa is t and the ordinate is v. The three chronologically consecutive data points obtained by the server are data point A (t_(n-1), v_(n-1)), data point B (t_n, v_n), and data point C (t_(n+1), v_(n+1)). Data point A indicates that the number of people watching cartoons in time window t_(n-1) is v_(n-1); data point B indicates that the number of people watching cartoons in time window t_n is v_n; and data point C indicates that the number of people watching cartoons in time window t_(n+1) is v_(n+1). The previous feature point of data point B is (t_1, v_1), where, in this example, n is an integer greater than or equal to 2. Then, according to Condition 1 and Condition 2 above (the Thre1 and Thre2 requirements), data point B is determined to be a feature point if the ordinate of data point B deviates by at least Thre1 from the ordinate of point B' (a point in the mathematical sense) on the straight line AC at time t_n, and the time period between the moment t_n of data point B and the moment t_1 of the previous feature point is greater than or equal to Thre2.
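The check described for FIG. 7 can be sketched for a single feature dimension as follows. The formula itself is only available as an image, so this is a reconstruction from the verbal description: the use of the absolute vertical deviation from line AC is an assumption, and the function and parameter names are hypothetical.

```python
def is_feature_point(a, b, c, prev_feature_t, thre1, thre2):
    """Decide whether data point B is a feature point (one dimension).

    `a`, `b`, `c` are consecutive data points (t, v) with distinct,
    increasing t. Assumed condition 1: the vertical distance between B and
    the point B' on line AC at time t_n is at least `thre1`. Assumed
    condition 2: the time since the previous feature point, whose moment is
    `prev_feature_t`, is at least `thre2`.
    """
    (ta, va), (tb, vb), (tc, vc) = a, b, c
    # Ordinate of B': linear interpolation between A and C at time tb.
    v_on_line = va + (vc - va) * (tb - ta) / (tc - ta)
    deviation = abs(vb - v_on_line)
    return deviation >= thre1 and (tb - prev_feature_t) >= thre2
```

For multi-dimensional data features, the description lets a data point qualify when at least one dimension satisfies the formula, so a caller could apply this check per dimension and combine the results with `any`.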
It should be noted that, in an actual implementation, if the current trigger point to be tested is trigger point 1, and the data point obtained from the data features of the online service data received in the window of trigger point 1 is data point B (that is, data point B is the data point newly added while determining whether trigger point 1 is an update trigger point), then in this determination data point B is directly taken as feature point B. If the current trigger point to be tested is the trigger point following trigger point 1 (that is, trigger point 2), and data point C is the data point newly added while determining whether trigger point 2 is an update trigger point, then in this determination whether data point B is a feature point is decided by the method shown in FIG. 7. In addition, once trigger point 2 has been found not to be an update trigger point, data point C is directly taken as a feature point while determining whether a subsequent trigger point to be tested is the next update trigger point. This continues until some trigger point to be tested is determined to be an update trigger point.
In the optional implementation that includes S302.1a, S302.2 in FIG. 6 may include the following S302.2', as shown in FIG. 6a:
S302.2': Generate the first feature sequence from the second data sequence. An element of the first feature sequence includes the moment of the feature point, the rate of change between the feature point and the previous feature point, and the time period between the moment of the feature point and the moment of the previous feature point.
For the specific implementation of step S302.2', refer to the specific implementation of S302.2 above; details are not repeated here.
For example, based on Table 1, the first feature sequence may be {(5, -2, 5), (14, -1, 9), ...}. In the element (14, -1, 9), "14" is the moment of the feature point corresponding to the element, specifically the sequence number of the receiving window of the online service data corresponding to that feature point; "-1" indicates that the rate of change between the feature point and the previous feature point is a slow decline (see Table 1); and "9" is the time period between the moment of the feature point and the moment of the previous feature point, specifically the time period of nine windows.
It should be noted that, in an actual implementation, the first data sequence may contain a large number of data points. If the first feature sequence were generated directly from the first data sequence, it would contain a correspondingly large number of elements, and determining the association between the first feature sequence and the at least one representative slice would require a large amount of computation. This optional implementation instead extracts the feature points of the first data sequence to obtain the second data sequence, and generates the first feature sequence from the second data sequence. The first feature sequence generated in this way has fewer elements than one generated from the first data sequence, which reduces the computation needed to determine the association between the first feature sequence and the at least one representative slice and thus speeds up processing. In addition, because the feature points are particular, representative data points of the first data sequence, the association determined using the first feature sequence generated from the second data sequence does not differ greatly from the association determined using a first feature sequence generated from the first data sequence.
Optionally, the first feature sequence and the representative slice are each represented as a vector. In this case, S303 may include: determining the distance between the first feature sequence and the at least one representative slice. S304 may include: updating the current model if the distance is less than or equal to a first preset threshold.
In fact, a representative slice is a feature sequence constructed from the data features of historical service data, so it can be represented in the same way as the first feature sequence described above; see the examples above. It should be noted that the data features of the online service data used to determine the first feature sequence are the same as the data features of the historical service data. For example, both are: the number of viewers watching cartoons within a receive window, and the average playback duration of cartoons within the receive window.
The distance between the first feature sequence and a representative slice can be regarded as the distance between two vectors, and in a specific implementation any method may be used to determine it. Alternatively, both the first feature sequence and the representative slice can be regarded as slices. An optional implementation for determining the distance between two slices is given below; note that the two slices whose distance is calculated contain the same number of elements:
Denote the first feature sequence as Slice_p and the representative slice as Slice_q. The distance between Slice_p and Slice_q is determined by the following formula:
Figure PCTCN2017090609-appb-000003
where D(Slice_p, Slice_q) is the distance between Slice_p and Slice_q; I is the number of data points (optionally, feature points) in the first feature sequence, and I is an integer greater than or equal to 1; D_m(Slice_pi, Slice_qi) is the pattern distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q; and D_d(Slice_pi, Slice_qi) is the time distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q. Specifically:
D_m(Slice_pi, Slice_qi) = |Δ_pi - Δ_qi|;
D_d(Slice_pi, Slice_qi) = |R_pi - R_qi|;
where Δ_pi is the rate of change between the i-th data point in Slice_p and the preceding data point, and Δ_qi is the rate of change between the i-th data point in Slice_q and the preceding data point;
Figure PCTCN2017090609-appb-000004
R_pi is the proportion of the total duration of Slice_p taken up by d_pi, the time period between the i-th data point in Slice_p and the preceding data point; t_last is the time of the last data point of Slice_p; t_first is the time of the first data point of Slice_p; and d_first is the time period between the first data point of Slice_p and the last data point of the preceding slice (this time period is stored in the first element of Slice_p).
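The definitions above can be sketched in Python as follows. Because the formula images are not reproduced in this text, the sketch rests on two assumptions: the per-element distances are summed over the I elements without normalisation, and the total duration of a slice is taken as t_last - t_first + d_first. Both choices agree with the worked values in Tables 2 and 5 of this description, but neither is confirmed by the formula images. The names `time_ratios` and `slice_distance` are hypothetical.

```python
from typing import List, Tuple

# One element of a slice: (t, delta, d) = (time of the data point,
# rate of change from the preceding data point, time since the preceding data point).
Element = Tuple[float, float, float]
Slice = List[Element]


def time_ratios(s: Slice) -> List[float]:
    """R_i for every element: d_i divided by the total duration of the slice.

    Assumption: total duration = t_last - t_first + d_first.
    """
    t_first, _, d_first = s[0]
    t_last = s[-1][0]
    total = t_last - t_first + d_first
    return [d / total for _, _, d in s]


def slice_distance(p: Slice, q: Slice) -> float:
    """Pattern distance plus time distance, summed over all elements (assumed)."""
    if len(p) != len(q):
        raise ValueError("both slices must contain the same number of elements")
    pattern = sum(abs(a[1] - b[1]) for a, b in zip(p, q))   # sum of |delta_pi - delta_qi|
    ratios_p, ratios_q = time_ratios(p), time_ratios(q)
    timing = sum(abs(x - y) for x, y in zip(ratios_p, ratios_q))  # sum of |R_pi - R_qi|
    return pattern + timing
```

With S303 and S304 as described, the current model would be updated when `slice_distance(first_feature_sequence, representative_slice)` is less than or equal to the first preset threshold.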
Optionally, the first feature sequence and the representative slice are each represented as a vector. In this case, S303 may include: determining the similarity between the first feature sequence and the at least one representative slice. S304 may include: updating the current model if the similarity is greater than or equal to a second preset threshold.
The similarity between the first feature sequence and a representative slice can be regarded as the similarity between two vectors, and in a specific implementation any method may be used to determine it. Alternatively, both the first feature sequence and the representative slice can be regarded as slices. An optional implementation for determining the similarity between two slices is given below; note that the two slices whose similarity is calculated contain the same number of elements:
Denote the first feature sequence as Slice_p and the representative slice as Slice_q. The similarity between Slice_p and Slice_q is determined by the following formula:
D(Slice_p, Slice_q) = D_m(Slice_p, Slice_q) + D_t(Slice_p, Slice_q);
where D(Slice_p, Slice_q) is the similarity between Slice_p and Slice_q; D_m(Slice_pi, Slice_qi) is the pattern distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q; and D_d(Slice_pi, Slice_qi) is the time distance between the i-th data feature of the online service data corresponding to Slice_p and the i-th data feature of the historical service data corresponding to Slice_q. Specifically:
Figure PCTCN2017090609-appb-000005
Figure PCTCN2017090609-appb-000006
where I is the number of data points (optionally, feature points) in the first feature sequence, and I is an integer greater than or equal to 1;
Figure PCTCN2017090609-appb-000007
R_pi is the proportion of the total duration of Slice_p taken up by d_pi, the time period between the i-th data point in Slice_p and the preceding data point; t_last is the time of the last data point of Slice_p; t_first is the time of the first data point of Slice_p; and d_first is the time period between the first data point of Slice_p and the last data point of the preceding slice (this time period is stored in the first element of Slice_p).
FIG. 8 is a schematic flowchart of a method for obtaining representative slices according to an embodiment of the present invention. The method shown in FIG. 8 may include:
S801: Obtain historical service data, and construct a historical feature sequence from the data features of the historical service data.
Here, historical service data means any part or all of the service data that is historical relative to the current moment. For how to construct the historical feature sequence from the data features of the historical service data, refer to the description above of constructing the first feature sequence from the data features of the online service data; details are not repeated here.
For example, suppose the IPTV online video is a cartoon, the data feature of the online service data is the number of viewers watching cartoons, and the window is a time window of, say, half an hour. Then S801 may include:
S1: The server counts the number of viewers watching cartoons in each window over a period of time to obtain data sequence 1.
Data sequence 1 may be similar to the first data sequence provided above. A curve plotted from data sequence 1 is shown in FIG. 9, in which the abscissa is the window number and the ordinate is the number of viewers watching cartoons; FIG. 9 shows the viewer counts obtained in several windows.
S2: The server extracts the feature points in data sequence 1 and constructs data sequence 2 from the extracted feature points.
Data sequence 2 may be similar to the second data sequence provided above. The extracted feature points are denoted feature points A to P, as shown in FIG. 10 (FIG. 10 is drawn based on FIG. 9).
S3: Construct the historical feature sequence from data sequence 2.
Based on the example shown in FIG. 10, the historical feature sequence obtained in S3 may be: HTS = {(5, -2, 5), (14, 3, 9), ...}, where (5, -2, 5) represents feature point B and (14, 3, 9) represents feature point C.
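Step S3 can be sketched as follows, assuming that each element of the feature sequence has the form (t, m, d), where t is the window number of a feature point, m is the rate of change from the preceding feature point, and d is the number of windows between the two. The function name `build_feature_sequence` is hypothetical, and the viewer counts in the usage line are made-up values chosen only so that the result reproduces the two elements quoted for feature points B and C; FIG. 10 is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]            # (window number, viewer count)
Element = Tuple[float, float, float]   # (t, m, d)


def build_feature_sequence(feature_points: List[Point]) -> List[Element]:
    """Turn time-ordered feature points into (t, m, d) elements.

    The first feature point only serves as the predecessor of the second,
    so n feature points give n - 1 elements.
    """
    sequence: List[Element] = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(feature_points, feature_points[1:]):
        d = t_cur - t_prev              # time since the preceding feature point
        m = (v_cur - v_prev) / d        # rate of change over that period
        sequence.append((t_cur, m, d))
    return sequence


# Hypothetical viewer counts for feature points A, B and C.
hts = build_feature_sequence([(0, 20), (5, 10), (14, 37)])
```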
S802: Determine the model change points in the historical feature sequence.
A model change point is an update trigger point for which the magnitude of change between the two models before and after the model update it triggers is greater than or equal to a preset threshold. The update trigger point may be determined by a method for determining update trigger points provided in the prior art, or by any of the methods for determining update trigger points provided in the embodiments of the present invention. Model change points are illustrated below with a specific example:
Suppose that at time t0 the model in the server is model 1, and that arranging the update trigger points in chronological order gives the sequence: update trigger points 1 and 2. Then updating the current model (model 1) at the moment of update trigger point 1 yields model 2, and updating the current model (model 2) at the moment of update trigger point 2 yields model 3, as shown in FIG. 11. In this case, for update trigger point 1, if the magnitude of change between the two models before and after the model update it triggers (model 1 and model 2) is greater than or equal to the preset threshold, update trigger point 1 is taken as a model change point. Likewise, for update trigger point 2, if the magnitude of change between the two models before and after the model update it triggers (model 2 and model 3) is greater than or equal to the preset threshold, update trigger point 2 is taken as a model change point.
The embodiments of the present invention do not limit how the magnitude of change between two models is determined; any method in the prior art may be used. Optionally, the magnitude of change between two models may be determined in either of the following ways:
Mode 1: Taking the case where the model in the embodiments of the present invention is a logistic regression model as an example, the Euclidean distance between the vectors formed by the parameters of the two models may be used as the magnitude of change between the two models.
Mode 2: Taking the case where the model in the embodiments of the present invention is a naive Bayes model as an example, the Euclidean distance between the vectors formed by the prior probabilities of the two models may be used as the magnitude of change between the two models.
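A minimal sketch of S802 under these two modes follows. It assumes that each model is summarised by a flat vector of numbers (parameters for logistic regression, prior probabilities for naive Bayes) and that `models[k]` is the vector in force before update trigger point k and `models[k + 1]` the vector after it. The function names and this data layout are hypothetical.

```python
import math
from typing import List, Sequence


def change_magnitude(before: Sequence[float], after: Sequence[float]) -> float:
    """Euclidean distance between two model vectors."""
    return math.dist(before, after)


def model_change_points(trigger_times: Sequence[float],
                        models: Sequence[Sequence[float]],
                        threshold: float) -> List[float]:
    """Keep the update trigger points whose update changed the model by at least threshold."""
    if len(models) != len(trigger_times) + 1:
        raise ValueError("expected one more model vector than update trigger points")
    return [t for k, t in enumerate(trigger_times)
            if change_magnitude(models[k], models[k + 1]) >= threshold]
```

For the example with models 1 to 3, `trigger_times` would hold the moments of update trigger points 1 and 2, and `models` the three vectors in order.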
It should be noted that, taking as an example the case where the update trigger points in this optional implementation are determined by any of the methods for determining update trigger points provided in the embodiments of the present invention, the relationship among to-be-tested trigger points, update trigger points, and model change points is as follows. First, all three are concepts of time. Second, a to-be-tested trigger point may or may not be an update trigger point, whereas every update trigger point is a to-be-tested trigger point; an update trigger point may or may not be a model change point, whereas every model change point is an update trigger point. In addition, the interval between adjacent update trigger points is an integer multiple of the interval between adjacent to-be-tested trigger points, as is the interval between adjacent model change points; the interval between adjacent update trigger points has no direct relationship to the interval between adjacent model change points.
It should also be noted that the moments of data points, the moments of feature points, and model change points are related as follows: the moment of a data point may or may not be a model change point; every model change point is the moment of a data point; and there is no direct relationship between the moment of a feature point and a model change point.
Based on the example of S1 to S3 above, suppose the model change points determined in S802 are as shown in FIG. 12, where the abscissa of each small dot represents a model change point and some model change points coincide with the moments of feature points. Note that FIG. 12 is drawn based on FIG. 10. In practice the server determines model change points from the update trigger points; FIG. 11 and the determined model change points are merged into one drawing (FIG. 12) so that the discussion below can show clearly that the moments of feature points are independent of model change points. As FIG. 12 shows, the moments of one or more feature points may lie between adjacent model change points; for example, the span between adjacent model change points F and H contains the moments of feature points F, G, and H. Conversely, one or more model change points may lie between the moments of adjacent feature points; for example, there are 3 model change points between the moments of adjacent feature points B and C. The moment of a feature point is therefore independent of the model change points.
S803: Cut the historical feature sequence at the model change points to obtain representative slices.
Specifically, the historical feature sequence is cut at each model change point to obtain multiple segments; when cutting, a model change point may be taken as the starting point of the following segment. Each segment is a subset of the historical feature sequence.
Suppose the historical feature sequence is HTS = {(t_1, m_1, d_1), (t_2, m_2, d_2), ..., (t_n, m_n, d_n)} and L-1 model change points are determined in S802. Then, in S803, cutting the historical feature sequence HTS yields L slices, as follows:
Figure PCTCN2017090609-appb-000008
where k is an integer greater than or equal to 2, L is an integer greater than or equal to 2, and n is an integer greater than or equal to 2.
It should be noted that the method shown in FIG. 8 may be performed by the server offline or online. If the server performs it online, the method shown in FIG. 8 may be performed at any step before S303 described above.
If the set of representative slices obtained in S803 is called the representative slice library, the library may be left unchanged once determined, or it may be updated as the historical feature sequence is updated. Updating the historical feature sequence may work as follows: over time, online feature sequences (for example, the first feature sequence and the second feature sequence above) gradually become newly added historical feature sequences. These newly added historical feature sequences may be added to the representative slice library as new representative slices, or they may be combined with the original historical feature sequence and the representative slices determined anew to update the library. In this way, the server need not store the historical service data or its data features and can store only the historical feature sequence, which saves storage space and shortens the time needed to update the representative slice library.
Based on the example in S802, the slice between model change points 8 and 9 can be expressed as {(33, 2, 3), (38, 1, 5)}. The slices between adjacent feature points B and C can be expressed as Slice_1 = {(7, 3, 2)}, Slice_2 = {(10, 3, 3)}, and Slice_3 = {(14, 3, 4)}. Note that the multiple slices lying between two adjacent feature points all share the rate of change between those two feature points.
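The cutting in S803 can be sketched as below. The sketch assumes that an element (t, m, d) covers the period from t - d to t, that a model change point falling strictly inside that period splits the element into pieces that keep the same rate m, and that an element ending exactly on a model change point closes the current slice. The function name `cut_at_change_points` is hypothetical, and the change-point times in the usage line (5, 7 and 10) are assumed values chosen to reproduce Slice_1 to Slice_3; FIG. 12 is not reproduced here.

```python
from typing import Iterable, List, Tuple

Element = Tuple[float, float, float]  # (t, m, d)


def cut_at_change_points(hts: List[Element],
                         change_points: Iterable[float]) -> List[List[Element]]:
    """Split a historical feature sequence into slices at the model change points."""
    cps = sorted(change_points)
    slices: List[List[Element]] = []
    current: List[Element] = []
    for t, m, d in hts:
        start = t - d
        for cp in cps:
            if start < cp < t:
                # The change point falls inside this element: emit the part
                # up to the change point and close the current slice.
                current.append((cp, m, cp - start))
                slices.append(current)
                current = []
                start = cp
        current.append((t, m, t - start))
        if t in cps:
            # The element ends exactly on a change point.
            slices.append(current)
            current = []
    if current:
        slices.append(current)
    return slices


# Assumed change points at windows 5, 7 and 10.
pieces = cut_at_change_points([(5, -2, 5), (14, 3, 9)], [5, 7, 10])
```

With three change points the sketch yields four slices, in line with L-1 change points giving L slices.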
Optionally, as shown in FIG. 8a (FIG. 8a is drawn based on FIG. 8), S803 may include:
S803': Cut the historical feature sequence at the model change points, and cluster the slices obtained by cutting to obtain representative slices.
Specifically, the historical feature sequence is cut at the model change points, and a clustering algorithm is applied to the resulting slices to obtain representative slices. The embodiments of the present invention do not limit the clustering algorithm; it may be any clustering algorithm in the prior art, for example, the k-means clustering algorithm.
For example, the association between any two slices is determined; if the association satisfies a certain condition, the two slices may be clustered together (that is, regarded as slices of the same class), and any one of them is then selected as the representative slice of that class. For how to determine the association between two slices, refer to the description above. For example, the distance between any two slices obtained by cutting is determined using the method given above for the distance between Slice_p and Slice_q; if the distance is less than or equal to a preset threshold, the two slices are clustered together and one of them is taken as the representative slice.
Based on the example in S803, the pattern distance D_m(Slice_1, Slice_2), the time distance D_t(Slice_1, Slice_2), and the total distance D(Slice_1, Slice_2) between Slice_1 and Slice_2 are shown in Table 2:
Table 2
D_m(Slice_1, Slice_2)    D_t(Slice_1, Slice_2)    D(Slice_1, Slice_2)
|3 - 3| = 0              |2/2 - 3/3| = 0          0
Similarly, D(Slice_1, Slice_3) and D(Slice_2, Slice_3) can both be calculated to be 0, so Slice_1 may be selected as the representative slice of these three slices.
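The threshold-based grouping in this example can be sketched as a single greedy pass. This is a simple stand-in rather than k-means: each slice is compared with the representatives chosen so far and becomes a new representative only if none of them is within the threshold. The function name `pick_representatives` is hypothetical, and `distance` stands for any callable that implements the slice distance described above.

```python
from typing import Callable, List, Sequence, Tuple

Element = Tuple[float, float, float]  # (t, m, d)
Slice = Sequence[Element]


def pick_representatives(slices: Sequence[Slice],
                         distance: Callable[[Slice, Slice], float],
                         threshold: float) -> List[Slice]:
    """Greedy grouping: the first slice seen in each class represents the class."""
    representatives: List[Slice] = []
    for s in slices:
        # Only slices with the same number of elements can be compared.
        covered = any(len(r) == len(s) and distance(r, s) <= threshold
                      for r in representatives)
        if not covered:
            representatives.append(s)
    return representatives
```

The result depends on the order in which slices are visited, which is acceptable here because the text allows any slice of a class to serve as its representative.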
Similarly, the representative slices of the slices shown in FIG. 12 can be obtained, as shown in Table 3:
Table 3
Slice_AB        Slice_BC       Slice_CE                     Slice_FH
{(5, -2, 5)}    {(7, 3, 2)}    {(16, -3, 3), (21, 2, 5)}    {(33, 2, 3), (38, 1, 5)}
It should be noted that some of the segments (that is, slices) obtained by cutting the historical feature sequence may have similar features. Clustering the slices in this optional implementation reduces the number of representative slices and thus saves the storage space occupied by the representative slice library. It also reduces the computation needed to determine the association between an online feature sequence (for example, the first feature sequence or the second feature sequence) and representative slices with similar features, thereby increasing the model update rate.
The model update method provided above is illustrated below with a specific example:
Suppose the data sequence in the server at the current moment is {data point a (104, v1), data point b (109, v2)}; that is, data points a and b are not update trigger points, and the data point preceding data point a is an update trigger point. Here "104" denotes the 104th time window and "109" denotes the 109th time window. Then, if the current moment is a to-be-tested trigger point:
S11: Obtain the online service data received at the to-be-tested trigger point and the data feature v3 of that online service data, and obtain data point c (114, v3), where "114" denotes the 114th time window. Suppose the curve plotted from data points a, b, and c is as shown in FIG. 13.
S12: Add data point c (114, v3) to the data sequence {data point a (104, v1), data point b (109, v2)} to obtain the first data sequence {data point a (104, v1), data point b (109, v2), data point c (114, v3)}.
S13: Extract the feature points in the first data sequence according to the feature point determination method provided above, and generate the second data sequence.
It should be noted that, because the data sequence {data point a (104, v1), data point b (109, v2)} contains data point a and data point a is not the last data point in that sequence, data point a was already determined to be a feature point when judging whether the previous to-be-tested trigger point was an update trigger point. S13 may therefore specifically include: determining, according to the method provided above, whether data point b is a feature point, and directly taking data point c as a feature point. Suppose the resulting second data sequence is: {data point a (104, v1), data point b (109, v2), data point c (114, v3)}.
S14: Determine the feature sequence from the second data sequence: TS = {(109, 2, 3), (114, 1, 5)}.
S15: Determine the distance between the feature sequence TS = {(109, 2, 3), (114, 1, 5)} and at least one representative slice, and determine whether the distance is less than or equal to a preset distance threshold (the first preset threshold above), so as to judge whether the moment of data point c (the to-be-tested trigger point) is an update trigger point.
Because the current moment is the moment of data point c, it is only necessary to judge whether the moment of data point c is an update trigger point.
Suppose the representative slices in the representative slice library are as shown in Table 3. Because the feature sequence TS contains 2 elements, in S15 the distances between Slice_ac {(109, 2, 3), (114, 1, 5)} and each of Slice_CE and Slice_FH need to be calculated. The calculation is shown in Table 4:
Table 4
Figure PCTCN2017090609-appb-000009
Assuming the preset distance threshold is 0.5, Table 4 shows that D(Slice_ac, Slice_FH) = 0 < 0.5, so the moment of data point c is an update trigger point.
Without loss of generality, before judging whether the moment of data point c is an update trigger point, it has already been judged whether the moment of data point b is an update trigger point.
For example, assuming the current moment is the moment of data point b, judging whether data point b is an update trigger point may include: calculating the distances between the slice Slice_ab {(109, 2, 3)} represented by ab and the representative slices Slice_AB {(5, -2, 5)} and Slice_BC {(7, 3, 2)}, as shown in Table 5.
Table 5
D(Slice_ab, Slice_AB)              D(Slice_ab, Slice_BC)
|2 - (-2)| + |3/3 - 5/5| = 4       |2 - 3| + |3/3 - 2/2| = 1
Assuming the preset distance threshold is 0.5, Table 5 shows that the moment of data point b is not an update trigger point.
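The decision made in S15, and the earlier decision for data point b, can be sketched as follows. The sketch assumes that the online feature sequence is compared only with representative slices that contain the same number of elements, and that a single representative within the threshold is enough to mark the moment as an update trigger point. The function name `is_update_trigger` is hypothetical, and `distance` stands for any callable that implements the slice distance described above.

```python
from typing import Callable, Sequence, Tuple

Element = Tuple[float, float, float]  # (t, m, d)
Slice = Sequence[Element]


def is_update_trigger(feature_sequence: Slice,
                      representatives: Sequence[Slice],
                      distance: Callable[[Slice, Slice], float],
                      threshold: float) -> bool:
    """True if some equally long representative slice is within the threshold."""
    candidates = [r for r in representatives if len(r) == len(feature_sequence)]
    return any(distance(feature_sequence, r) <= threshold for r in candidates)
```

With the representative slices of Table 3 and a threshold of 0.5, the two-element sequence for data point c is matched against Slice_CE and Slice_FH only, and the one-element sequence for data point b against Slice_AB and Slice_BC only.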
上述主要从模型更新装置(具体是指服务器)的角度对本发明实施例提供的方案进行了介绍。可以理解的是,为了实现上述各个功能,模型更新装置包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的模块及算法步骤,本发明能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。The solution provided by the embodiment of the present invention is mainly introduced from the perspective of a model updating device (specifically, a server). It can be understood that in order to implement the above various functions, the model updating apparatus includes hardware structures and/or software modules corresponding to the execution of the respective functions. Those skilled in the art will readily appreciate that the present invention can be implemented in a combination of hardware or hardware and computer software in combination with the modules and algorithm steps of the various examples described in the embodiments disclosed herein. Whether a function is implemented in hardware or computer software to drive hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods for implementing the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
本发明实施例可以根据上述方法示例对模型更新装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本发明实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。 The embodiment of the present invention may divide the function module by the model update device according to the above method example. For example, each function module may be divided according to each function, or two or more functions may be integrated into one processing module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the module in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
When each functional module corresponds to one function, FIG. 14 shows a schematic structural diagram of a model updating apparatus 140. The model updating apparatus 140 may be the server involved in the foregoing embodiments. The model updating apparatus 140 may include an obtaining module 1401, a building module 1402, a determining module 1403, and an updating module 1404; optionally, it may further include a generating module 1405. The function of each of these functional modules can be inferred from the steps in the method embodiments provided above, or from the content provided in the Summary above, and details are not repeated here.
When an integrated module is used, the obtaining module 1401, the building module 1402, the determining module 1403, the updating module 1404, and the generating module 1405 may all be integrated into one processing module of a model updating apparatus. In addition, the model updating apparatus may further include a communication module and a storage module.
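The module division described above can be illustrated with a minimal Python sketch, given here only as one possible reading. The class name `ModelUpdatingApparatus`, the method `on_trigger_point`, and the toy callables in the usage lines are hypothetical and do not appear in the application; each field merely stands in for one of the modules 1401 to 1404, integrated into a single processing module.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class ModelUpdatingApparatus:
    """Hypothetical integrated processing module; all names are illustrative."""
    acquire: Callable[[], Sequence[float]]      # stands in for obtaining module 1401
    build: Callable[[Sequence[float]], Vector]  # stands in for building module 1402
    determine: Callable[[Vector], bool]         # stands in for determining module 1403
    update: Callable[[], None]                  # stands in for updating module 1404

    def on_trigger_point(self) -> bool:
        """Run the modules in order and report whether the model was updated."""
        data = self.acquire()
        features = self.build(data)
        should_update = self.determine(features)
        if should_update:
            self.update()
        return should_update


# Toy usage: the callables below are placeholders, not the claimed algorithms.
updates: List[str] = []
apparatus = ModelUpdatingApparatus(
    acquire=lambda: [1.0, 2.0, 4.0],
    build=lambda data: [b - a for a, b in zip(data, data[1:])],
    determine=lambda feats: max(feats) >= 2.0,
    update=lambda: updates.append("updated"),
)
updated = apparatus.on_trigger_point()
```

Passing the modules in as callables reflects the statement that the division is only logical: the same functions could equally be separate hardware or software units.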
FIG. 15 is a schematic structural diagram of a model updating apparatus 150 according to an embodiment of the present invention. The model updating apparatus 150 may be the server involved in the foregoing embodiments. The model updating apparatus 150 may include a processing module 1501 and a communication module 1502. The processing module 1501 is configured to control and manage the operation of the model updating apparatus 150. For example, the processing module 1501 is configured to support the model updating apparatus 150 in performing the steps in FIG. 3, FIG. 3a, FIG. 6, FIG. 6a, FIG. 8, and FIG. 8a, and/or other processes of the techniques described herein. It may also be configured to support steps S1 to S3 and S11 to S15 provided in the specific examples above. The communication module 1502 is configured to support communication between the model updating apparatus 150 and other network entities, for example, communication with a service client. Optionally, the model updating apparatus 150 may further include a storage module 1503, configured to store the program code and data with which the model updating apparatus 150 performs any of the model updating methods provided above.
The processing module 1501 may be a processor or a controller, for example, a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules, and circuits described in connection with the disclosure of the embodiments of the present invention. The processor may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 1502 may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1503 may be a memory.
When the processing module 1501 is a processor, the communication module 1502 is a transceiver, and the storage module 1503 is a memory, the model updating apparatus 150 in this embodiment of the present invention may be the model updating apparatus 20 shown in FIG. 2.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the apparatus described above is illustrated only with the foregoing division into functional modules. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus is divided into different functional modules to perform all or some of the functions described above. For the specific working processes of the system, apparatus, and modules described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into modules is merely a logical division of functions, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (23)

  1. A model updating method, comprising:
    obtaining first online service data received within a window in which a to-be-tested trigger point is located;
    constructing a first feature sequence according to data features of the first online service data;
    determining an association relationship between the first feature sequence and at least one representative slice, wherein the representative slice is a slice of a feature sequence constructed according to data features of historical service data; and
    if the association relationship between the first feature sequence and the at least one representative slice satisfies a preset condition, updating a current model.
  2. The method according to claim 1, wherein after the determining of the association relationship between the first feature sequence and the at least one representative slice, the method further comprises:
    if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, obtaining second online service data received within a window in which a subsequent to-be-tested trigger point following the to-be-tested trigger point is located;
    constructing a second feature sequence, in order of reception time, according to the data features of the first online service data and data features of the second online service data;
    determining an association relationship between the second feature sequence and the at least one representative slice; and
    if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition, updating the current model.
  3. The method according to claim 1 or 2, wherein the first feature sequence and the representative slice are represented by vectors; the determining of the association relationship between the first feature sequence and the at least one representative slice comprises:
    determining a distance between the first feature sequence and the at least one representative slice; and
    the updating of the current model if the association relationship between the first feature sequence and the at least one representative slice satisfies the preset condition comprises:
    if the distance is less than or equal to a first preset threshold, updating the current model.
  4. The method according to claim 1 or 2, wherein the first feature sequence and the representative slice are represented by vectors; the determining of the association relationship between the first feature sequence and the at least one representative slice comprises:
    determining a similarity between the first feature sequence and the at least one representative slice; and
    the updating of the current model if the association relationship between the first feature sequence and the at least one representative slice satisfies the preset condition comprises:
    if the similarity is greater than or equal to a second preset threshold, updating the current model.
  5. The method according to any one of claims 1 to 4, wherein the constructing of the first feature sequence according to the data features of the first online service data comprises:
    constructing a first data sequence according to the data features of the first online service data, wherein one element of the first data sequence is one data point, and the data point comprises at least the following features: the time at which the data point is located, and the data feature of the service data corresponding to the data point; and
    generating the first feature sequence from the first data sequence, wherein an element of the first feature sequence comprises at least the following features: the time at which the data point is located, the rate of change between the data point and the previous data point, and the time period between the time at which the data point is located and the time at which the previous data point is located.
  6. The method according to claim 5, wherein after the constructing of the first data sequence according to the data features of the first online service data, the method further comprises:
    extracting feature points from the first data sequence, and constructing a second data sequence according to the feature points in the first data sequence; and
    the generating of the first feature sequence from the first data sequence comprises:
    generating the first feature sequence from the second data sequence, wherein an element of the first feature sequence comprises the time at which the feature point is located, the rate of change between the feature point and the previous feature point, and the time period between the time at which the feature point is located and the time at which the previous feature point is located.
  7. The method according to any one of claims 1 to 6, wherein the to-be-tested trigger point is an i-th to-be-tested trigger point, i ≥ 1, and i is an integer; if i = 1, the window in which the to-be-tested trigger point is located is the window from the time at which reception of online service data starts to the to-be-tested trigger point; and if i ≥ 2, the window in which the to-be-tested trigger point is located is the window from the (i-1)-th to-be-tested trigger point to the to-be-tested trigger point.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    determining, as the to-be-tested trigger point, a time that is an integer multiple of a preset duration after the time at which reception of online service data starts.
  9. The method according to any one of claims 1 to 7, wherein the method further comprises:
    determining, as the to-be-tested trigger point, a time at which the amount of online service data received since the time at which reception of online service data starts is an integer multiple of a preset data amount.
  10. The method according to any one of claims 1 to 9, wherein before the determining of the association relationship between the first feature sequence and the at least one representative slice, the method further comprises:
    obtaining historical service data, and constructing a historical feature sequence according to the historical service data;
    determining a model change point in the historical feature sequence; and
    cutting the historical feature sequence based on the model change point in the historical feature sequence to obtain the representative slice.
  11. The method according to claim 10, wherein the cutting of the historical feature sequence based on the model change point to obtain the representative slice comprises:
    cutting the historical feature sequence based on the model change point, and clustering the slices obtained after the cutting to obtain the representative slice.
  12. A model updating apparatus, comprising:
    an obtaining module, configured to obtain first online service data received within a window in which a to-be-tested trigger point is located;
    a building module, configured to construct a first feature sequence according to data features of the first online service data;
    a determining module, configured to determine an association relationship between the first feature sequence and at least one representative slice, wherein the representative slice is a slice of a feature sequence constructed according to data features of historical service data; and
    an updating module, configured to update a current model if the association relationship between the first feature sequence and the at least one representative slice satisfies a preset condition.
  13. The apparatus according to claim 12, wherein
    the obtaining module is further configured to: if the association relationship between the first feature sequence and the at least one representative slice does not satisfy the preset condition, obtain second online service data received within a window in which a subsequent to-be-tested trigger point following the to-be-tested trigger point is located;
    the building module is further configured to construct a second feature sequence, in order of reception time, according to the data features of the first online service data and data features of the second online service data;
    the determining module is further configured to determine an association relationship between the second feature sequence and the at least one representative slice; and
    the updating module is further configured to update the current model if the association relationship between the second feature sequence and the at least one representative slice satisfies the preset condition.
  14. The apparatus according to claim 12 or 13, wherein the first feature sequence and the representative slice are represented by vectors;
    the determining module is specifically configured to determine a distance between the first feature sequence and the at least one representative slice; and
    the updating module is specifically configured to update the current model if the distance is less than or equal to a first preset threshold.
  15. The apparatus according to claim 12 or 13, wherein the first feature sequence and the representative slice are represented by vectors;
    the determining module is specifically configured to determine a similarity between the first feature sequence and the at least one representative slice; and
    the updating module is specifically configured to update the current model if the similarity is greater than or equal to a second preset threshold.
  16. The apparatus according to any one of claims 12 to 15, wherein the building module is specifically configured to:
    construct a first data sequence according to the data features of the first online service data, wherein one element of the first data sequence is one data point, and the data point comprises at least the following features: the time at which the data point is located, and the data feature of the service data corresponding to the data point; and
    generate the first feature sequence from the first data sequence, wherein an element of the first feature sequence comprises at least the following features: the time at which the data point is located, the rate of change between the data point and the previous data point, and the time period between the time at which the data point is located and the time at which the previous data point is located.
  17. The apparatus according to claim 16, wherein
    the building module is further configured to extract feature points from the first data sequence, and construct a second data sequence according to the feature points in the first data sequence; and
    when generating the first feature sequence from the first data sequence, the building module is specifically configured to generate the first feature sequence from the second data sequence, wherein an element of the first feature sequence comprises the time at which the feature point is located, the rate of change between the feature point and the previous feature point, and the time period between the time at which the feature point is located and the time at which the previous feature point is located.
  18. The apparatus according to any one of claims 12 to 17, wherein the to-be-tested trigger point is an i-th to-be-tested trigger point, i ≥ 1, and i is an integer; if i = 1, the window in which the to-be-tested trigger point is located is the window from the time at which reception of online service data starts to the to-be-tested trigger point; and if i ≥ 2, the window in which the to-be-tested trigger point is located is the window from the (i-1)-th to-be-tested trigger point to the to-be-tested trigger point.
  19. The apparatus according to any one of claims 12 to 18, wherein
    the determining module is further configured to determine, as the to-be-tested trigger point, a time that is an integer multiple of a preset duration after the time at which reception of online service data starts.
  20. The apparatus according to any one of claims 12 to 18, wherein
    the determining module is further configured to determine, as the to-be-tested trigger point, a time at which the amount of online service data received since the time at which reception of online service data starts is an integer multiple of a preset data amount.
  21. The apparatus according to any one of claims 12 to 20, wherein
    the obtaining module is further configured to obtain historical service data;
    the building module is further configured to construct a historical feature sequence according to the historical service data;
    the determining module is further configured to determine a model change point in the historical feature sequence; and
    the apparatus further comprises:
    a generating module, configured to cut the historical feature sequence based on the model change point in the historical feature sequence to obtain the representative slice.
  22. The apparatus according to claim 21, wherein
    the generating module is specifically configured to cut the historical feature sequence based on the model change point, and cluster the slices obtained after the cutting to obtain the representative slice.
  23. A model updating apparatus, comprising a processor, a memory, a system bus, and a communication interface, wherein
    the memory is configured to store computer-executable instructions, the processor is connected to the memory through the system bus, and when the apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the apparatus performs the model updating method according to any one of claims 1 to 11.
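
As an illustration only, the steps recited in claims 1, 3, 5, and 8 can be sketched in Python as follows. The function names, the choice of Euclidean distance, the use of the rate-of-change components as the vector, and the restriction to equal-length slices are all assumptions made for this sketch; the claims require only "a distance" between vectors and do not fix these details.

```python
import math
from typing import List, Sequence, Tuple

DataPoint = Tuple[float, float]           # (time, data feature value)
FeatureElem = Tuple[float, float, float]  # (time, rate of change, elapsed time)


def trigger_points(start: float, preset_duration: float, count: int) -> List[float]:
    """Claim 8: trigger points at integer multiples of a preset duration after start."""
    return [start + k * preset_duration for k in range(1, count + 1)]


def build_feature_sequence(points: Sequence[DataPoint]) -> List[FeatureElem]:
    """Claim 5: for each data point after the first, record its time, the rate of
    change from the previous point, and the time elapsed since the previous point.
    Assumes strictly increasing times, so the elapsed time is never zero."""
    sequence = []
    for (t_prev, v_prev), (t_cur, v_cur) in zip(points, points[1:]):
        elapsed = t_cur - t_prev
        sequence.append((t_cur, (v_cur - v_prev) / elapsed, elapsed))
    return sequence


def should_update(sequence: Sequence[FeatureElem],
                  representative_slices: Sequence[Sequence[float]],
                  threshold: float) -> bool:
    """Claims 1 and 3: update when the distance to at least one representative
    slice is less than or equal to the threshold.
    Assumption: vectors hold rates of change; only equal-length slices are compared."""
    vector = [rate for _, rate, _ in sequence]
    return any(
        len(rep) == len(vector) and math.dist(vector, rep) <= threshold
        for rep in representative_slices
    )


# Toy data, invented for this sketch.
points = [(0.0, 10.0), (1.0, 12.0), (3.0, 18.0)]
features = build_feature_sequence(points)
slices = [[2.0, 3.5], [-1.0, 0.0]]
print(trigger_points(0.0, 5.0, 3))
print(features)
print(should_update(features, slices, threshold=1.0))
```

A similarity-based variant in the sense of claim 4 would replace the distance test with a similarity score compared against a lower bound; the rest of the flow would stay the same.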
PCT/CN2017/090609 2016-08-08 2017-06-28 Model updating method and apparatus WO2018028326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610645496.7A CN107704929B (en) 2016-08-08 2016-08-08 Model updating method and device
CN201610645496.7 2016-08-08

Publications (1)

Publication Number Publication Date
WO2018028326A1 true WO2018028326A1 (en) 2018-02-15

Family

ID=61162616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/090609 WO2018028326A1 (en) 2016-08-08 2017-06-28 Model updating method and apparatus

Country Status (2)

Country Link
CN (1) CN107704929B (en)
WO (1) WO2018028326A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080092A (en) * 2019-11-29 2020-04-28 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175084B (en) * 2019-04-04 2023-04-25 阿里巴巴集团控股有限公司 Data change monitoring method and device
CN111538767B (en) * 2020-05-28 2023-07-14 支付宝(杭州)信息技术有限公司 Data processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050008212A1 (en) * 2003-04-09 2005-01-13 Ewing William R. Spot finding algorithm using image recognition software
CN101290626A (en) * 2008-06-12 2008-10-22 昆明理工大学 Text categorization feature selection and weight computation method based on field knowledge
CN103488705A (en) * 2013-09-06 2014-01-01 电子科技大学 User interest model incremental update method of personalized recommendation system
CN105589971A (en) * 2016-01-08 2016-05-18 车智互联(北京)科技有限公司 Method and device for training recommendation model, and recommendation system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620597A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 Method for analyzing product association of data service in mobile communication industry
US10289959B2 (en) * 2010-05-26 2019-05-14 Automation Anywhere, Inc. Artificial intelligence and knowledge based automation enhancement
CN105095271B (en) * 2014-05-12 2019-04-05 北京大学 Microblogging search method and microblogging retrieve device
CN104217040A (en) * 2014-10-11 2014-12-17 清华大学 Rapid pollution incident detection method based on traditional online monitor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080092A (en) * 2019-11-29 2020-04-28 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium
CN111080092B (en) * 2019-11-29 2023-04-18 北京云聚智慧科技有限公司 Data annotation management method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN107704929A (en) 2018-02-16
CN107704929B (en) 2020-10-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17838456

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17838456

Country of ref document: EP

Kind code of ref document: A1