CN109189822B - Data processing method and device - Google Patents


Info

Publication number
CN109189822B
Authority
CN
China
Prior art keywords
data
processed
identifier
time window
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810895106.0A
Other languages
Chinese (zh)
Other versions
CN109189822A (en)
Inventor
杨竞霜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute Of Big Data Research
Original Assignee
Beijing Institute Of Big Data Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute Of Big Data Research
Priority to CN201810895106.0A
Publication of CN109189822A
Application granted
Publication of CN109189822B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present disclosure relates to a data processing method and apparatus, the method comprising: determining the identifiers of a plurality of data to be processed, grouping the data to be processed according to the occurrence time of the data to be processed and a plurality of target time periods in the statistical time interval, acquiring the identifier and the first occurrence frequency of the data to be processed in each data group according to the obtained identifiers of the data to be processed in the data groups, acquiring the identifier and the second occurrence frequency of the data to be processed in the first time window according to the identifiers of the data to be processed and the first occurrence frequency, moving the first time window by taking the duration of the target time period as a unit, and acquiring the identifier and the third occurrence frequency of the data to be processed in the moved first time window. The method and the device can realize the rapid statistics of various types of data in different time windows and acquire the occurrence frequency of the data to be processed.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
With the development of wireless technology and the progress of terminal devices, a trend toward data quantification is emerging across industries. In scientific research, for example astronomical observation data, meteorological data, and ocean monitoring data, maturing sensor networks have made data collection easy and caused explosive growth in log information. In decision making, daily stock-market trading data, enterprise reports, microblog data, and the like are also growing rapidly.
With the increasing scale of data, how to process massive data becomes a major topic of current research.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided a data processing method, the method including:
preprocessing a plurality of data to be processed received in a statistical time interval, and determining identifiers of the plurality of data to be processed, wherein the identifiers are used for determining the types and/or sources of the data to be processed;
grouping the data to be processed according to the occurrence time of the data to be processed and a plurality of target time periods in the statistical time interval to obtain a plurality of data groups;
acquiring the identifier and the first occurrence frequency of the data to be processed in each data group according to the identifiers of the data to be processed in the plurality of data groups, wherein the first occurrence frequency is the occurrence frequency of the identifier of the data to be processed in each data group;
acquiring an identifier and a second occurrence number of the to-be-processed data in a first time window according to the plurality of data groups, the identifier of the to-be-processed data in the plurality of data groups and the first occurrence number, wherein the second occurrence number is the occurrence number of the identifier of the to-be-processed data in the first time window; and
moving the first time window by taking the duration of the target time period as a unit, and acquiring the identifier of the data to be processed in the moved first time window and a third occurrence frequency, wherein the third occurrence frequency is the occurrence frequency of the identifier of the data to be processed in the moved first time window.
In a possible embodiment, preprocessing a plurality of pieces of data to be processed received within a statistical time interval, and determining an identifier of the plurality of pieces of data to be processed includes:
converting the information of the categories or the sources of the data to be processed into the identifiers, wherein the identifiers are integers in the range 0 to K.
In one possible embodiment, the length of the first time window is M times the duration of the target time period, where M is an integer greater than 2.
In a possible embodiment, the number of the data groups is N, where the duration of the target time period is smaller than the duration of the statistical time interval, and N is the ratio of the duration of the statistical time interval to the duration of the target time period.
In one possible embodiment, the method further comprises:
acquiring the identifier of the data to be processed in the second time window and the fourth occurrence frequency of the corresponding data to be processed according to the identifiers of the data to be processed in the plurality of first time windows and the second occurrence frequency of the corresponding data to be processed and/or the identifier of the data to be processed in the moved first time window and the third occurrence frequency of the corresponding data to be processed, wherein the length of the second time window is greater than that of the first time window.
In one possible embodiment, the method further comprises:
acquiring the identifier of the data to be processed in a third time window and a fifth occurrence frequency of the corresponding data to be processed according to the identifier of the data to be processed in the first time window and/or the second time window and the fourth occurrence frequency of the corresponding data to be processed, wherein the length of the third time window is greater than that of the second time window.
According to another aspect of the present disclosure, there is provided a data processing apparatus, the apparatus comprising:
a preprocessing module configured to preprocess a plurality of pieces of data to be processed received in a statistical time interval and determine identifiers of the plurality of pieces of data to be processed, wherein the identifiers are used for determining the types and/or sources of the data to be processed;
the grouping module is connected with the preprocessing module and is configured to perform grouping processing on the plurality of data to be processed according to the occurrence time of the plurality of data to be processed and a plurality of target time periods in the statistic time interval to obtain a plurality of data groups;
the first acquisition module is connected with the grouping module and is configured to acquire the identifier and the first occurrence number of the to-be-processed data in each data group according to the identifiers of the to-be-processed data in the plurality of data groups, wherein the first occurrence number is the occurrence number of the identifier of the to-be-processed data in each data group;
the second acquisition module is connected with the first acquisition module and configured to acquire an identifier of the data to be processed in a first time window and a second occurrence number according to the data groups, the identifier of the data to be processed in the data groups and the first occurrence number, wherein the second occurrence number is the occurrence number of the identifier of the data to be processed in the first time window; and
the third acquisition module is connected with the second acquisition module and is configured to move the first time window by taking the duration of the target time period as a unit, and acquire the identifier of the data to be processed in the moved first time window and a third occurrence frequency, wherein the third occurrence frequency is the occurrence frequency of the identifier of the data to be processed in the moved first time window.
In one possible embodiment, the preprocessing module comprises:
the first determining submodule is configured to convert the information of the categories or the sources of the plurality of pieces of data to be processed into the identifiers, and the identifiers are integer identifiers of 0-K.
In one possible embodiment, the length of the first time window is M times the duration of the target time period, where M is an integer greater than 2.
In a possible embodiment, the number of the data groups is N, where the duration of the target time period is smaller than the duration of the statistical time interval, and N is the ratio of the duration of the statistical time interval to the duration of the target time period.
According to another aspect of the present disclosure, there is provided a data processing apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the above data processing method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the above-described data processing method.
In the present disclosure, a plurality of data to be processed received in a statistical time interval are preprocessed to determine their identifiers, which are used for determining the types and/or sources of the data to be processed. The data to be processed are grouped according to their occurrence times and a plurality of target time periods in the statistical time interval to obtain a plurality of data groups. The identifier and the first occurrence number of the data to be processed in each data group are obtained according to the identifiers of the data to be processed in the plurality of data groups. The identifier and the second occurrence number of the data to be processed in a first time window are obtained according to the plurality of data groups, the identifiers of the data to be processed in the plurality of data groups, and the first occurrence number. The first time window is then moved by taking the duration of the target time period as a unit, and the identifier and the third occurrence frequency of the data to be processed in the moved first time window are acquired. In this way, the method and the device can rapidly count various data under different time windows and obtain the occurrence frequency of the data to be processed in preparation for subsequent data processing, which greatly improves the speed and efficiency of data processing and saves computing resources.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure.
FIG. 2 shows a flow diagram of a data processing method according to an embodiment of the present disclosure.
FIG. 3 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
FIG. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
FIG. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein exclusively to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
FIG. 1 is a flow chart illustrating a data processing method according to an embodiment of the present disclosure.
The data processing method can be applied to a terminal or a server, and the terminal can comprise a computer, a mobile phone, a tablet computer and the like.
As shown in fig. 1, the method includes:
Step S110, preprocessing a plurality of to-be-processed data received within a statistical time interval, and determining identifiers of the plurality of to-be-processed data, where the identifiers are used to determine categories and/or sources of the to-be-processed data.
In a possible implementation, the data processing method can compile statistics on data in fields such as finance and scientific research. The statistical time interval may be, for example, one month or two months, and may be expressed in months or in other units of time. For example, the data processing method may count the login data (or transaction data, etc.) of users of a financial transaction system over two months.
In one possible embodiment, the identifier may correspond to a category and a source of the to-be-processed data, the category may include login data, transaction data, information modification data, and the like, and the source may include an IP address of a device transmitting the to-be-processed data, a user account number, a login terminal device ID, and the like.
In a possible implementation, preprocessing a plurality of to-be-processed data received within a statistical time interval, and determining an identifier of the plurality of to-be-processed data may include the following steps:
converting the information of the categories or the sources of the data to be processed into the identifiers, wherein the identifiers are integers in the range 0 to K.
In this way, information representing the category and source of the data to be processed (usually a character string) can be converted into an integer (the identifier), thereby reducing the computing and storage resources consumed on the server or the terminal.
In a possible implementation, for example, an array may be established for the to-be-processed data. The to-be-processed data in the statistical time interval are numbered in chronological order, and data that appears repeatedly reuses its existing number. The number may be used as the identifier, and each number (identifier) corresponds to an IP address of the to-be-processed data (or to other information indicating its category and/or source).
For example, taking IP addresses as the data to be processed, when the IP addresses that occur in sequence are IP0, IP1, IP2, IP1, IP1, and IP3, an array is established to number them. IP0 is numbered 0; IP1 is numbered 1; IP2 is numbered 2; the second and third occurrences of IP1 reuse the previous number 1; and IP3 is numbered 3. It should be noted that, when the to-be-processed data is preprocessed, the numbered data may carry a timestamp indicating its occurrence time within the statistical time interval. The above forms of the IP addresses and numbers are merely examples and are not limiting; in practice, neither the IP addresses nor the numbers necessarily start from 0.
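The numbering described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `assign_identifiers` and the use of a dictionary as the lookup table are hypothetical choices, not part of the disclosure.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def assign_identifiers(keys: Iterable[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each distinct key to an integer in order of first appearance.

    `keys` stands for the category/source information (e.g. IP address strings).
    The name and signature are hypothetical, chosen for this sketch.
    """
    table: Dict[Hashable, int] = {}
    ids: List[int] = []
    for key in keys:
        # A repeated key reuses the number it was given the first time.
        if key not in table:
            table[key] = len(table)
        ids.append(table[key])
    return ids, table


ids, table = assign_identifiers(["IP0", "IP1", "IP2", "IP1", "IP1", "IP3"])
print(ids)  # [0, 1, 2, 1, 1, 3]
```

Working with small integers instead of strings is what allows the later counting steps to use compact keys.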
Step S120, performing grouping processing on the multiple pieces of data to be processed according to the occurrence time of the multiple pieces of data to be processed and the multiple target time periods within the statistical time interval, so as to obtain multiple data groups.
In a possible embodiment, the number of the data groups is N, where the duration of the target time period is smaller than the duration of the statistical time interval, and N is the ratio of the duration of the statistical time interval to the duration of the target time period.
In a possible implementation manner, the number of the plurality of target time periods may be determined according to needs, and may also be determined according to the duration of the target time period and the statistical time interval.
For example, for a statistical time interval of 10000 minutes, 10000 target time periods may be set, in which case each target time period lasts 1 minute. Alternatively, the duration of the target time period may first be set to 1 minute (or another value), in which case the number of target time periods is 10000.
Each data group comprises the identifiers of the data to be processed in order of appearance and the first occurrence numbers corresponding to the identifiers.
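A minimal sketch of the grouping step, assuming each preprocessed record is a `(timestamp, identifier)` pair with timestamps in seconds measured from the start of the statistical time interval. The record layout, the function name `group_by_period`, and the sample data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def group_by_period(
    records: Sequence[Tuple[float, int]],
    start: float,
    period: float,
    n_groups: int,
) -> List[List[int]]:
    """Place each identifier in the data group of the target time period it falls in."""
    groups: List[List[int]] = [[] for _ in range(n_groups)]
    for timestamp, identifier in records:
        index = int((timestamp - start) // period)
        # Records outside the statistical time interval are ignored in this sketch.
        if 0 <= index < n_groups:
            groups[index].append(identifier)
    return groups


# Hypothetical records covering three one-minute target time periods.
records = [(5, 0), (20, 1), (59, 0), (61, 1), (90, 0), (119, 3), (125, 3), (150, 2), (179, 0)]
groups = group_by_period(records, start=0, period=60, n_groups=3)
print(groups)  # [[0, 1, 0], [1, 0, 3], [3, 2, 0]]
```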
Step S130, obtaining an identifier of the to-be-processed data in each data group and a first occurrence number according to the identifier of the to-be-processed data in the plurality of data groups, where the first occurrence number is an occurrence number of the identifier of the to-be-processed data in each data group.
In one possible implementation, the identifier of the data to be processed in the current data set and the first occurrence number may be obtained by accumulating the occurrence number of the identifier of the data to be processed in the current data set.
For example, when the statistical time interval is 10000 minutes and there are 10000 target time periods, each target time period lasts 1 minute. The three target time periods covering minutes 1-3 (from the beginning of the first minute to the end of the third minute) are taken as an example for explanation.
Suppose the first-minute data group includes: number 0 (corresponding to IP0, device 0, or other data to be processed); number 1; number 0.
Suppose the second-minute data group includes: number 1 (corresponding to IP1, device 1, or other data to be processed); number 0; number 3.
Suppose the third-minute data group includes: number 3 (corresponding to IP3, device 3, or other data to be processed); number 2; number 0.
From the above assumptions, the following can be obtained:
The first-minute data group includes: number 0, with a first occurrence number of 2; number 1, with a first occurrence number of 1.
The second-minute data group includes: number 1, with a first occurrence number of 1; number 0, with a first occurrence number of 1; number 3, with a first occurrence number of 1.
The third-minute data group includes: number 3, with a first occurrence number of 1; number 2, with a first occurrence number of 1; number 0, with a first occurrence number of 1.
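The per-group counts in this example can be reproduced with a short Python sketch. Using `collections.Counter` is an implementation choice made here for illustration; the disclosure does not prescribe a data structure.

```python
from collections import Counter

# Identifiers observed in minutes 1, 2 and 3, taken from the example above.
groups = [[0, 1, 0], [1, 0, 3], [3, 2, 0]]

# One Counter per data group: identifier -> first occurrence number.
first_counts = [Counter(group) for group in groups]

for minute, counts in enumerate(first_counts, start=1):
    print(minute, dict(counts))
```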
Step S140, obtaining an identifier of the to-be-processed data in the first time window and a second occurrence number according to the plurality of data groups, the identifier of the to-be-processed data in the plurality of data groups, and the first occurrence number, where the second occurrence number is the occurrence number of the identifier of the to-be-processed data in the first time window.
In one possible embodiment, the length of the first time window is M times the duration of the target time period, where M is an integer greater than 2.
For example, when the statistical time interval is 10000 minutes and there are 10000 target time periods, each target time period lasts 1 minute, and the length of the first time window may be 10 minutes, 20 minutes, 30 minutes, and so on.
In one possible implementation, the first occurrence numbers of the to-be-processed data in the data groups within the first time window may be accumulated per identifier to obtain the second occurrence number of each identifier in the first time window.
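As a sketch of this accumulation, the per-group counters can be summed over the M data groups that make up the first time window. The function name `window_counts` and the zero-based `start` index are assumptions for illustration; the sample data reuses the three one-minute groups from the earlier example with M = 3.

```python
from collections import Counter
from typing import List


def window_counts(first_counts: List[Counter], start: int, m: int) -> Counter:
    """Sum the first occurrence numbers of the m data groups beginning at index `start`."""
    total: Counter = Counter()
    for group in first_counts[start:start + m]:
        total.update(group)  # adds counts identifier by identifier
    return total


first_counts = [Counter([0, 1, 0]), Counter([1, 0, 3]), Counter([3, 2, 0])]
second_counts = window_counts(first_counts, start=0, m=3)
print(dict(second_counts))
```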
Step S150, moving the first time window by using the duration of the target time period as a unit, and obtaining an identifier of the to-be-processed data in the moved first time window and a third occurrence number, where the third occurrence number is the occurrence number of the identifier of the to-be-processed data in the moved first time window.
In a possible implementation, the first occurrence numbers of the to-be-processed data in the data groups within the moved first time window may be accumulated per identifier to obtain the third occurrence number of each identifier in the moved first time window.
In one possible embodiment, the third occurrence number may be obtained from the second occurrence number in the first time window before the move and the first occurrence numbers in the data groups.
For example, when the first time window (minutes 1-10) is moved by the duration of one target time period (1 minute), a moved first time window (minutes 2-11) is formed. The identifier and the third occurrence number of the data to be processed in the moved first time window can be obtained from the identifier and the second occurrence number of the data to be processed in the first time window (minutes 1-10) and the identifier and the first occurrence number of the data to be processed in the data group of the 11th minute. Specifically, the data of the 1st minute is removed from the first time window (minutes 1-10), the data of the 2nd to 10th minutes is retained, and the identifier and the first occurrence number of the data to be processed in the data group of the 11th minute are added on this basis, yielding the identifier and the third occurrence number of the data to be processed in the moved first time window (minutes 2-11).
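The update in this example can be written compactly. As a sketch in notation that does not appear in the original text, let $c_j(k)$ be the first occurrence number of identifier $k$ in the data group of target time period $j$, and let $W_t(k)$ be the occurrence number of identifier $k$ in the first time window covering periods $t$ to $t+M-1$:

```latex
W_t(k) = \sum_{j=t}^{t+M-1} c_j(k),
\qquad
W_{t+1}(k) = W_t(k) - c_t(k) + c_{t+M}(k)
```

With $M = 10$ and $t = 1$ this gives $W_2(k) = W_1(k) - c_1(k) + c_{11}(k)$, which matches the move from minutes 1-10 to minutes 2-11.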
By moving the first time window step by step, one target time period at a time, and acquiring the corresponding data, the identifiers and occurrence numbers of the data to be processed in every first time window across the whole statistical time interval can be obtained.
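A minimal Python sketch of this step-by-step move, assuming the per-group counts are held as `collections.Counter` objects. The generator name `sliding_window_counts` and the sample data are hypothetical. Each move subtracts the data group that leaves the window and adds the one that enters, rather than recounting the whole window.

```python
from collections import Counter
from typing import Iterator, List


def sliding_window_counts(first_counts: List[Counter], m: int) -> Iterator[Counter]:
    """Yield the occurrence counts of every window of m consecutive data groups."""
    if m <= 0 or m > len(first_counts):
        return
    window: Counter = Counter()
    for group in first_counts[:m]:
        window.update(group)
    yield Counter(window)  # counts for the initial first time window
    # zip pairs the group leaving the window with the group entering it.
    for leaving, entering in zip(first_counts, first_counts[m:]):
        window.subtract(leaving)
        window.update(entering)
        window = +window  # unary plus drops identifiers whose count fell to zero
        yield Counter(window)


# Hypothetical data: four one-minute data groups, window of M = 3 groups.
first_counts = [Counter([0, 1, 0]), Counter([1, 0, 3]), Counter([3, 2, 0]), Counter([1, 1])]
windows = list(sliding_window_counts(first_counts, m=3))
print([dict(w) for w in windows])
```

Each move costs time proportional to the size of two data groups rather than M of them, which is the saving the incremental approach is after.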
It should be noted that the above examples of the respective data are for convenience of explaining the technical solutions of the present disclosure, and are not intended to limit the present disclosure.
Thus, the identifiers of a plurality of data to be processed are determined by preprocessing the data received in the statistical time interval. The data to be processed are grouped according to their occurrence times and a plurality of target time periods in the statistical time interval to obtain a plurality of data groups. The identifier and the first occurrence number of the data to be processed in each data group are obtained according to the identifiers of the data to be processed in the data groups. The identifier and the second occurrence frequency of the data to be processed in a first time window are acquired according to the data groups, the identifiers of the data to be processed in the data groups, and the first occurrence frequency. The first time window is then moved by taking the duration of the target time period as a unit, and the identifier and the third occurrence frequency of the data to be processed in the moved first time window are acquired. In this way, the method and the device can rapidly count various data under different time windows and obtain the occurrence frequency of the data to be processed in preparation for subsequent data processing, which greatly improves the speed and efficiency of data processing and saves computing resources.
FIG. 2 is a flow chart illustrating a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method may further include, in addition to the foregoing steps S110 to S150:
Step S260, obtaining the identifier of the to-be-processed data in the second time window and the fourth occurrence frequency of the corresponding to-be-processed data according to the identifiers of the to-be-processed data in the plurality of first time windows and the second occurrence frequency of the corresponding to-be-processed data and/or the identifier of the to-be-processed data in the moved first time window and the third occurrence frequency of the corresponding to-be-processed data, wherein the length of the second time window is greater than the length of the first time window.
In a possible embodiment, the length of the second time window may be an integer multiple of the length of the first time window. For example, if the first time window is 10 minutes long (minutes 1-10), the moved first time window covers minutes 11-20, and the second time window is 20 minutes long (minutes 1-20), the identifier and the corresponding fourth occurrence number of the data to be processed in the second time window can be obtained by adding, per identifier, the second occurrence number in the first time window of minutes 1-10 and the third occurrence number in the moved first time window of minutes 11-20.
In a possible embodiment, the length of the second time window may not be an integer multiple of the length of the first time window. For example, if the first time window is 10 minutes long (minutes 1-10) and the second time window is 15 minutes long (minutes 1-15), the identifier of the data to be processed in the second time window and the corresponding fourth occurrence number may be obtained by adding, per identifier, the second occurrence number in the first time window of minutes 1-10 and the first occurrence numbers in the data groups of the 11th, 12th, 13th, 14th, and 15th minutes.
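Both cases can be sketched with a single merging helper. The helper name `merge_counts` and the sample data are hypothetical, and the example is scaled down so that a first time window spans 3 data groups instead of 10.

```python
from collections import Counter

def merge_counts(*parts: Counter) -> Counter:
    """Add occurrence counts from smaller windows and/or single data groups."""
    total: Counter = Counter()
    for part in parts:
        total.update(part)
    return total


# Hypothetical first occurrence numbers for six one-minute data groups.
first_counts = [Counter(g) for g in ([0, 1, 0], [1, 0, 3], [3, 2, 0], [1, 1], [2], [0, 3])]

window_a = merge_counts(*first_counts[0:3])  # first time window, groups 1-3
window_b = merge_counts(*first_counts[3:6])  # moved first time window, groups 4-6

# Integer multiple: a 6-group second window is the sum of two first windows.
second_multiple = merge_counts(window_a, window_b)

# Not a multiple: a 5-group second window is one first window plus groups 4 and 5.
second_partial = merge_counts(window_a, first_counts[3], first_counts[4])

print(dict(second_multiple), dict(second_partial))
```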
Step S270, obtaining the identifier of the to-be-processed data in a third time window and a fifth occurrence frequency of the corresponding to-be-processed data according to the identifier of the to-be-processed data in the first time window and/or the second time window and the fourth occurrence frequency of the corresponding to-be-processed data, wherein the length of the third time window is greater than the length of the second time window.
For example, if the first time window is 10 minutes long (minutes 1-10), the second time window is 20 minutes long (minutes 11-30), and the third time window is 30 minutes long (minutes 1-30), the identifier of the data to be processed in the third time window and the fifth occurrence number of the corresponding data to be processed can be obtained from the data of the first time window and the second time window.
It should be understood that the lengths of the first time window, the second time window, and the third time window are not necessarily integer multiples of one another. When they are not, the identifier and the corresponding occurrence number of the to-be-processed data in the third time window may be obtained by additionally combining the identifiers and first occurrence numbers in a plurality of data groups.
It should be noted that, although the data processing method is described above by taking the first time window, the second time window, and the third time window as examples, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, the user can set a plurality of time windows flexibly according to personal preferences and/or actual application scenarios, for example, more time windows such as a fourth time window (forty minutes), a fifth time window (fifty minutes), and so on may be included, as long as the above-described method is adopted to calculate the identifier and the corresponding occurrence number of the data to be processed in the large time window by using the small time window.
By this method, the identifier and occurrence number of the data to be processed in a large time window can be obtained from those in smaller time windows. This dynamic, superposition-based calculation saves computing resources and greatly improves calculation speed.
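One way to picture this superposition is a routine that assembles a large window from whatever smaller windows have already been computed, falling back to single data groups where none fits. The sketch below is an assumption-laden illustration, not the disclosed implementation: the cache layout `(start, length) -> Counter`, the greedy choice of the longest fitting cached window, and the name `compose_window` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compose_window(
    start: int,
    length: int,
    cache: Dict[Tuple[int, int], Counter],
    first_counts: List[Counter],
) -> Counter:
    """Build counts for groups [start, start + length) from cached smaller windows."""
    total: Counter = Counter()
    pos, end = start, start + length
    while pos < end:
        # Longest cached window that begins at `pos` and still fits (0 if none).
        best = max(
            (size for (begin, size) in cache if begin == pos and 0 < size <= end - pos),
            default=0,
        )
        if best:
            total.update(cache[(pos, best)])
            pos += best
        else:
            total.update(first_counts[pos])  # fall back to a single data group
            pos += 1
    return total


first_counts = [Counter(g) for g in ([0, 1, 0], [1, 0, 3], [3, 2, 0], [1, 1], [2], [0, 3])]
cache = {
    (0, 3): sum(first_counts[0:3], Counter()),  # a 3-group window starting at group 0
    (3, 2): sum(first_counts[3:5], Counter()),  # a 2-group window starting at group 3
}
third = compose_window(0, 6, cache, first_counts)
print(dict(third))
```

Here the 6-group window is assembled from two cached windows plus one single data group, so only three additions are needed instead of six.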
FIG. 3 is a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 3, the apparatus includes:
the preprocessing module 10 is configured to preprocess a plurality of pieces of data to be processed received within a statistical time interval, and determine identifiers of the plurality of pieces of data to be processed, where the identifiers are used to uniquely determine categories and/or sources of the data to be processed.
The grouping module 20 is connected to the preprocessing module, and is configured to perform grouping processing on the multiple pieces of data to be processed according to the occurrence time of the multiple pieces of data to be processed and multiple target time periods within the statistical time interval, so as to obtain multiple data groups.
The first obtaining module 30 is connected to the grouping module, and configured to obtain, according to the identifier of the to-be-processed data in the plurality of data groups, the identifier of the to-be-processed data in each data group and a first occurrence number, where the first occurrence number is the occurrence number of the identifier of the to-be-processed data in each data group.
The second obtaining module 40 is connected to the first obtaining module, and is configured to obtain, according to the plurality of data groups, the identifiers of the data to be processed in the plurality of data groups, and the first occurrence number, the identifiers of the data to be processed in a first time window and a second occurrence number, where the second occurrence number is the occurrence number of the identifiers of the data to be processed in the first time window.
A third obtaining module 50, connected to the second obtaining module, configured to move the first time window by using the duration of the target time period as a unit, and obtain an identifier of the to-be-processed data in the moved first time window and a third occurrence frequency, where the third occurrence frequency is an occurrence frequency of the identifier of the to-be-processed data in the moved first time window.
It should be noted that this data processing apparatus corresponds to the data processing method described above. For details of each module, refer to the foregoing description of the method; they are not repeated here.
In this way, the plurality of pieces of data to be processed received within a statistical time interval are preprocessed to determine their identifiers, which are used to uniquely determine the category and/or source of the data to be processed. The data to be processed are grouped according to their occurrence times and a plurality of target time periods within the statistical time interval to obtain a plurality of data groups. The identifier of the data to be processed in each data group and the first occurrence number are obtained according to the identifiers of the data to be processed in the plurality of data groups. The identifier of the data to be processed in a first time window and the second occurrence number are then obtained according to the plurality of data groups, the identifiers of the data to be processed in the plurality of data groups, and the first occurrence number. Finally, the first time window is moved in units of the duration of the target time period, and the identifier of the data to be processed in the moved first time window and the third occurrence number are obtained. The present disclosure thus enables rapid statistics of various types of data under different time windows and obtains the occurrence numbers of the data to be processed in preparation for subsequent data processing, which greatly improves the speed and efficiency of data processing and saves computing resources.
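The grouping, per-group counting, and window movement summarized above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the disclosure: the function names, the representation of events as (timestamp, identifier) pairs, and the sample values are hypothetical, and updating the moved window by adding the entering group and subtracting the leaving group is one possible way to move the window by one target time period.

```python
from collections import Counter


def group_counts(events, start, period, n_groups):
    """Bucket (timestamp, identifier) pairs into n_groups target time periods
    and count each identifier per group (the first occurrence number)."""
    groups = [Counter() for _ in range(n_groups)]
    for timestamp, identifier in events:
        index = int((timestamp - start) // period)
        if 0 <= index < n_groups:
            groups[index][identifier] += 1
    return groups


def sliding_window_counts(groups, m):
    """Return counts for every window of m consecutive groups,
    moving the window one group (one target time period) at a time."""
    if m > len(groups):
        return []
    window = Counter()
    for group in groups[:m]:
        window.update(group)
    results = [+window]  # unary plus copies the Counter and drops zero counts
    for i in range(m, len(groups)):
        window.update(groups[i])        # group entering the window
        window.subtract(groups[i - m])  # group leaving the window
        results.append(+window)
    return results


# Hypothetical input: a 240-second statistical interval split into 60-second target periods.
events = [(0, 0), (30, 1), (70, 0), (130, 0), (200, 2)]
groups = group_counts(events, start=0, period=60, n_groups=4)
windows = sliding_window_counts(groups, m=3)
print([dict(w) for w in windows])  # [{0: 3, 1: 1}, {0: 2, 2: 1}]
```

Here the number of groups (4) is the ratio of the statistical time interval to the target time period, and the first time window spans M = 3 target time periods. Each move of the window touches only two groups instead of recounting all data inside the window.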
Referring to fig. 4, fig. 4 is a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the apparatus includes:
A preprocessing module 10, configured to preprocess a plurality of pieces of data to be processed received within a statistical time interval and to determine identifiers of the plurality of pieces of data to be processed, where the identifiers are used to uniquely determine categories and/or sources of the data to be processed.
In a possible implementation, the preprocessing module 10 may include:
A first determining sub-module 110, configured to convert the category or source information of the plurality of pieces of data to be processed into the identifiers, where each identifier is an integer from 0 to K.
A grouping module 20, connected to the preprocessing module 10 and configured to group the plurality of pieces of data to be processed according to their occurrence times and a plurality of target time periods within the statistical time interval, so as to obtain a plurality of data groups.
A first obtaining module 30, connected to the grouping module 20 and configured to obtain, according to the identifiers of the data to be processed in the plurality of data groups, the identifier of the data to be processed in each data group and a first occurrence number, where the first occurrence number is the number of occurrences of that identifier in each data group.
A second obtaining module 40, connected to the first obtaining module 30 and configured to obtain, according to the plurality of data groups, the identifiers of the data to be processed in the plurality of data groups, and the first occurrence number, the identifier of the data to be processed in a first time window and a second occurrence number, where the second occurrence number is the number of occurrences of that identifier in the first time window.
A third obtaining module 50, connected to the second obtaining module 40 and configured to move the first time window in units of the duration of the target time period, and to obtain the identifier of the data to be processed in the moved first time window and a third occurrence number, where the third occurrence number is the number of occurrences of that identifier in the moved first time window.
A fourth obtaining module 60, connected to the third obtaining module 50 and configured to obtain the identifier of the data to be processed in a second time window and a corresponding fourth occurrence number, according to the identifiers of the data to be processed in a plurality of first time windows and the corresponding second occurrence numbers, and/or the identifier of the data to be processed in the moved first time window and the corresponding third occurrence number, where the length of the second time window is greater than the length of the first time window.
A fifth obtaining module 70, connected to the fourth obtaining module 60 and the third obtaining module 50 and configured to obtain the identifier of the data to be processed in a third time window and a corresponding fifth occurrence number, according to the identifier of the data to be processed in the first time window and/or the second time window and the corresponding fourth occurrence number, where the length of the third time window is greater than the length of the second time window.
It should be noted that this data processing apparatus corresponds to the data processing method described above. For details of each module, refer to the foregoing description of the method; they are not repeated here.
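The conversion performed by the first determining sub-module 110, from category or source information to integer identifiers from 0 to K, can be sketched as follows. This is a minimal sketch under assumptions: the function name `build_identifier_map`, the first-seen numbering order, and the sample labels are all hypothetical, since the disclosure does not specify how the integers are assigned.

```python
def build_identifier_map(labels):
    """Assign each distinct category/source label an integer, in first-seen order."""
    mapping = {}
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)  # next unused integer: 0, 1, 2, ...
    return mapping


# Hypothetical source labels attached to incoming data.
labels = ["sensor-a", "sensor-b", "sensor-a", "gateway"]
id_map = build_identifier_map(labels)
identifiers = [id_map[label] for label in labels]
print(identifiers)  # [0, 1, 0, 2]
```

With three distinct labels, K is 2 in this example. Compact integer identifiers can serve directly as dictionary keys or array indices when the occurrence numbers are counted per data group.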
Referring to fig. 5, fig. 5 shows a block diagram of a data processing apparatus 800 according to an embodiment of the present disclosure. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor component 814 may also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the device 800 to perform the above-described methods.
Referring to fig. 6, fig. 6 shows a block diagram of a data processing apparatus 1900 according to an embodiment of the present disclosure. For example, the apparatus 1900 may be provided as a server.
Referring to fig. 6, the device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by a memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the apparatus 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. A method of data processing, the method comprising:
preprocessing a plurality of data to be processed received in a statistical time interval, and determining identifiers of the plurality of data to be processed, wherein the identifiers are used for determining the types and/or sources of the data to be processed;
grouping the data to be processed according to the occurrence time of the data to be processed and a plurality of target time periods in the statistical time interval to obtain a plurality of data groups, wherein the number of the data groups is the ratio of the duration of the statistical time interval to the duration of the target time period;
acquiring an identifier of the data to be processed in each data group and a first occurrence number according to the identifiers of the data to be processed in the data groups, wherein the first occurrence number is the number of occurrences of the identifier of the data to be processed in each data group;
acquiring an identifier and a second occurrence number of the data to be processed in a first time window according to the plurality of data groups, the identifiers of the data to be processed in the plurality of data groups and the first occurrence number, wherein the second occurrence number is the number of occurrences of the identifier of the data to be processed in the first time window; the length of the first time window is M times the duration of the target time period, and M is an integer greater than 2; and
moving the first time window in units of the duration of the target time period, and acquiring the identifier of the data to be processed in the moved first time window and a third occurrence number, wherein the third occurrence number is the number of occurrences of the identifier of the data to be processed in the moved first time window.
2. The data processing method of claim 1, wherein preprocessing a plurality of data to be processed received within a statistical time interval, and determining the identifiers of the plurality of data to be processed, comprises:
converting the information of the categories or the sources of the data to be processed into the identifiers, wherein the identifiers are integers from 0 to K.
3. The data processing method of claim 1, wherein the number of the plurality of data groups is N, the duration of the target time period is less than the duration of the statistical time interval, and N is the ratio of the duration of the statistical time interval to the duration of the target time period.
4. The data processing method of claim 1, wherein the method further comprises:
acquiring the identifier of the data to be processed in a second time window and a corresponding fourth occurrence number according to the identifiers of the data to be processed in a plurality of first time windows and the corresponding second occurrence numbers, and/or the identifier of the data to be processed in the moved first time window and the corresponding third occurrence number, wherein the length of the second time window is greater than that of the first time window.
5. The data processing method of claim 4, wherein the method further comprises:
acquiring the identifier of the data to be processed in a third time window and a corresponding fifth occurrence number according to the identifier of the data to be processed in the first time window and/or the second time window and the corresponding fourth occurrence number, wherein the length of the third time window is greater than that of the second time window.
6. A data processing apparatus, characterized in that the apparatus comprises:
a preprocessing module, configured to preprocess a plurality of pieces of data to be processed received in a statistical time interval and determine identifiers of the plurality of pieces of data to be processed, wherein the identifiers are used for determining the types and/or sources of the data to be processed;
a grouping module, connected with the preprocessing module and configured to group the plurality of pieces of data to be processed according to the occurrence time of the plurality of pieces of data to be processed and a plurality of target time periods in the statistical time interval to obtain a plurality of data groups;
a first acquisition module, connected with the grouping module and configured to acquire the identifier of the data to be processed in each data group and a first occurrence number according to the identifiers of the data to be processed in the plurality of data groups, wherein the first occurrence number is the number of occurrences of the identifier of the data to be processed in each data group;
a second acquisition module, connected with the first acquisition module and configured to acquire an identifier of the data to be processed in a first time window and a second occurrence number according to the data groups, the identifiers of the data to be processed in the data groups and the first occurrence number, wherein the second occurrence number is the number of occurrences of the identifier of the data to be processed in the first time window; and
a third acquisition module, connected with the second acquisition module and configured to move the first time window in units of the duration of the target time period, and acquire the identifier of the data to be processed in the moved first time window and a third occurrence number, wherein the third occurrence number is the number of occurrences of the identifier of the data to be processed in the moved first time window.
7. The data processing apparatus of claim 6, wherein the pre-processing module comprises:
a first determining submodule, configured to convert the information of the categories or the sources of the data to be processed into the identifiers, wherein the identifiers are integers from 0 to K.
8. The data processing apparatus of claim 6, wherein the number of the plurality of data groups is N, the duration of the target time period is less than the duration of the statistical time interval, and N is the ratio of the duration of the statistical time interval to the duration of the target time period.
CN201810895106.0A 2018-08-08 2018-08-08 Data processing method and device Active CN109189822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810895106.0A CN109189822B (en) 2018-08-08 2018-08-08 Data processing method and device

Publications (2)

Publication Number Publication Date
CN109189822A CN109189822A (en) 2019-01-11
CN109189822B true CN109189822B (en) 2022-01-14

Family

ID=64920453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810895106.0A Active CN109189822B (en) 2018-08-08 2018-08-08 Data processing method and device

Country Status (1)

Country Link
CN (1) CN109189822B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569329B (en) * 2019-10-28 2022-08-02 深圳市商汤科技有限公司 Data processing method and device, electronic equipment and storage medium
CN111641704B (en) * 2020-05-28 2021-08-03 深圳华锐金融技术股份有限公司 Resource-related data transmission method, device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101873478A (en) * 2010-06-18 2010-10-27 杭州海康威视数字技术股份有限公司 Control method and related device of network monitoring system
CN104699422A (en) * 2015-03-11 2015-06-10 华为技术有限公司 Determination method and determination device of cache data
CN105224543A (en) * 2014-05-30 2016-01-06 国际商业机器公司 For the treatment of seasonal effect in time series method and apparatus

Also Published As

Publication number Publication date
CN109189822A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant