WO2023019560A1 - Data processing method, apparatus, electronic device, and computer-readable storage medium - Google Patents
- Publication number
- WO2023019560A1 (application PCT/CN2021/113809)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- statistical
- data
- sample data
- interval
- group
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
Definitions
- Embodiments of the present disclosure relate to a data processing method, device, electronic device, and computer-readable storage medium.
- At least one embodiment of the present disclosure provides a data processing method, including: acquiring at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index; creating a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements that are respectively used to count different index values; and traversing the index value of the at least one sample data and using the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, in which the plurality of first elements in the first statistical array are the respective statistical sub-results of the index values.
- In some embodiments, acquiring at least one sample data in the statistical group includes: acquiring a plurality of sample data corresponding to each of a plurality of statistical groups. Traversing the index value of the at least one sample data and using the first statistical array to perform statistics on the at least one sample data to obtain the data statistical result then includes: for each statistical group, traversing the index value of the at least one sample data, and using the first statistical array to perform statistics on the at least one sample data to obtain the data statistical result.
- In some embodiments, the at least one sample data acquired within one statistical unit is used as one statistical group, and the method further includes: determining at least one index value to be counted from the different index values; acquiring a plurality of statistical intervals, each of which includes at least one statistical unit; establishing a second statistical array for the plurality of statistical intervals, the second statistical array including a plurality of second elements in one-to-one correspondence with the plurality of statistical intervals; filtering, from the data statistical results of the statistical groups, the interval statistical results that belong to the plurality of statistical intervals; and using the plurality of second elements to respectively perform statistics on the index value to be counted in the interval statistical results, so as to obtain an index value statistical result of the index value to be counted in each statistical interval.
- In some embodiments, using the plurality of second elements to respectively perform statistics on the index value to be counted in the interval statistical results includes: for each interval statistical result, determining the statistical interval to which the statistical group corresponding to that interval statistical result belongs; extracting the statistical sub-result of the index value to be counted from the interval statistical result; and adding the statistical sub-result to the second element corresponding to that statistical interval, so as to obtain the index value statistical result of the index value to be counted in each statistical interval.
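- The interval aggregation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `aggregate_by_interval`, the representation of a statistical group by its start time, and the half-open `(start, end)` interval tuples are all hypothetical, and a statistical group that falls within several nested intervals is added to each of them.

```python
from datetime import datetime


def aggregate_by_interval(group_results, intervals, value_position):
    """Sum one statistical sub-result per statistical interval.

    group_results:  list of (group_start, first_array) pairs, one per
                    statistical group (hypothetical representation).
    intervals:      list of half-open (start, end) datetime pairs.
    value_position: position in the first statistical array of the index
                    value to be counted.
    """
    # The second statistical array: one element per statistical interval.
    second_array = [0] * len(intervals)
    for group_start, first_array in group_results:
        sub_result = first_array[value_position]
        for i, (start, end) in enumerate(intervals):
            # Assumption: a group belongs to every interval that contains
            # its start time, so nested intervals all receive the value.
            if start <= group_start < end:
                second_array[i] += sub_result
    return second_array
```

- With hourly groups and the current-day, 7-day, and 30-day intervals, one pass over the per-group results fills all three second elements at once.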
- In some embodiments, traversing the index value of the at least one sample data in the statistical group and using the first statistical array to perform statistics on the at least one sample data in the statistical group to obtain the data statistical result of the statistical group includes: traversing the index value of the at least one sample data in the statistical group, and using the plurality of first elements in the first statistical array to respectively count each index value, so as to obtain the data statistical result of the statistical group.
- In some embodiments, the plurality of statistical intervals include a first statistical interval and a second statistical interval, the range of the second statistical interval is larger than that of the first statistical interval, and the first statistical interval lies within the second statistical interval.
- In some embodiments, the data processing method further includes: receiving initial data from a data source; and establishing the at least one sample data according to the initial data.
- In some embodiments, the initial data includes statistical attribute information, and establishing the at least one sample data according to the initial data includes: determining whether a storage file for storing the initial data exists; in response to the storage file existing, storing the initial data in the storage file for use as the at least one sample data; and in response to the storage file not existing, determining the statistical group to which the initial data belongs according to the statistical attribute information, generating a file path according to the statistical group, and creating the storage file at the file path, the storage file being used to store the initial data so that the initial data serves as the at least one sample data. Initial data belonging to the same statistical group is stored in the same storage file.
- In some embodiments, acquiring at least one sample data in the statistical group includes: generating the file path of the storage file corresponding to the statistical group; determining whether the file path exists; and in response to the file path existing, acquiring the initial data of the statistical group from the storage file at the file path as the at least one sample data.
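- The storage and retrieval steps above can be sketched in Python as follows. This is an illustration only: the one-file-per-hour layout `<base>/<YYYY-MM-DD>/<HH>.jsonl`, the JSON Lines encoding, the `event_time` field in milliseconds, and all function names are assumptions that do not come from the disclosure. The sketch also derives the file path before checking for the file, which is a simplification of the order of steps described above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def group_file_path(base_dir, event_time_ms):
    """Map an event time (ms since the epoch) to its group's storage file.

    Assumption: one statistical group per UTC hour, one file per group.
    """
    t = datetime.fromtimestamp(event_time_ms / 1000, tz=timezone.utc)
    return Path(base_dir) / t.strftime("%Y-%m-%d") / (t.strftime("%H") + ".jsonl")


def store_initial_data(base_dir, record):
    """Append a record to its group's storage file, creating it if absent."""
    path = group_file_path(base_dir, record["event_time"])
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path


def load_sample_data(base_dir, event_time_ms):
    """Return the sample data of the group containing event_time_ms."""
    path = group_file_path(base_dir, event_time_ms)
    if not path.exists():
        return []  # no storage file, so no sample data for this group
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

- Because the path is derived from the statistical attribute (here, the event time), records of the same statistical group always land in the same file, and a missing path simply means the group has no data.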
- In some embodiments, the statistical unit includes a preset time period, or a preset number (at least one) of terminal devices.
- In some embodiments, the data processing method is applied to multiple electronic devices, the at least one sample data includes multiple sample data groups, and the multiple electronic devices correspond one-to-one to the multiple sample data groups. Each electronic device is configured to perform statistics on its corresponding sample data group, and the resulting statistical values are added to obtain the data statistical result.
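- A minimal Python sketch of this split-and-add scheme follows. It runs in a single process purely for illustration: each call to the hypothetical `count_partition` stands in for one electronic device, and `merge_counts` stands in for the final addition. Both names, and the representation of a sample data group as a plain list of index values, are assumptions.

```python
def count_partition(index_values_seen, index_value_list):
    """Count one sample data group; stands in for one electronic device."""
    position = {value: i for i, value in enumerate(index_value_list)}
    counts = [0] * len(index_value_list)
    for value in index_values_seen:
        counts[position[value]] += 1
    return counts


def merge_counts(partial_counts):
    """Add the per-device arrays element by element."""
    return [sum(column) for column in zip(*partial_counts)]
```

- Since every device produces an array with the same layout, merging is a plain element-wise sum and does not depend on the order in which the devices finish.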
- At least one embodiment of the present disclosure provides a data processing device, including: a sample acquisition unit configured to acquire at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index; an array creation unit configured to create a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements that are respectively used to count different index values; and a traversal unit configured to traverse the index value of the at least one sample data and use the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, in which the plurality of first elements in the first statistical array are the respective statistical sub-results of the index values.
- At least one embodiment of the present disclosure provides an electronic device, including a processor and a memory that includes one or more computer program instructions. The one or more computer program instructions are stored in the memory and, when executed by the processor, implement the data processing method provided by any embodiment of the present disclosure.
- At least one embodiment of the present disclosure provides a computer-readable storage medium that stores computer-readable instructions in a non-transitory manner. When the computer-readable instructions are executed by a processor, the data processing method provided by any embodiment of the present disclosure is implemented.
- Fig. 1A shows a system architecture to which a data processing method according to an embodiment of the present disclosure can be applied.
- Fig. 1B shows a flowchart of a data processing method provided by at least one embodiment of the present disclosure.
- Fig. 2 shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- Fig. 3 shows a flowchart of step S80 in Fig. 2 provided by at least one embodiment of the present disclosure.
- Fig. 4 shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- Fig. 5 shows a flowchart of step S420 in Fig. 4 provided by at least one embodiment of the present disclosure.
- Fig. 6 shows a flowchart of step S10 in Fig. 1B provided by at least one embodiment of the present disclosure.
- Fig. 7A shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- Fig. 7B shows a schematic diagram of another data processing method provided by at least one embodiment of the present disclosure.
- Fig. 8 shows a schematic block diagram of a data processing device provided by at least one embodiment of the present disclosure.
- Fig. 9 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure.
- Fig. 10 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
- Fig. 11 shows a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure.
- The "item" here may have different meanings in different statistical situations: for example, an "item" may refer to a statistical index, or to an index value of a given statistical index.
- For example, the "items" may include the statistical index "total quantity", the statistical index "equipment running time", and so on.
- For example, the "items" may be index values 1 and 2 of gender, where "1" represents "male" and "2" represents "female".
- The transportation all-media management and control platform is an integrated software-and-hardware digital industry platform that provides public transportation operators or media operators with customer group analysis, information management and precise delivery, centralized management and control of media equipment, and real-time environmental monitoring.
- The front-end visual large screen of the transportation all-media management and control platform displays a variety of topics, such as institutional users, traffic passenger flow, media advertising content, and hardware equipment information. Each topic contains many statistical indexes and report components.
- For example, the customer group analysis page of the front-end visual large screen includes time-sharing statistics of the numbers of passengers of each type (passing by, touching, and paying attention), statistics of the numbers of passengers by gender and age group, rankings of the number of people paying attention to each advertisement, and so on.
- The media content page of the front-end visual large screen includes the number of play plans, the number of normal play plans, the number of failed play plans, popular advertisements, popular materials, and so on.
- The hardware device page of the front-end visual large screen includes information such as the running time of the various servers and playback devices.
- The above statistical indexes need to be calculated along the organization dimension (for example, lines, stations, and equipment in a subway scenario) and the time dimension (for example, the current day, 7 days, and 30 days), and most pages need to be refreshed at a fixed frequency (for example, every hour).
- If the transportation all-media management and control platform uses the statistical methods of the above-mentioned related technologies to analyze these statistical indexes and their index values, the statistics not only take a long time, but the large data throughput may also cause performance problems in the device used for the statistical analysis. It should be understood that, although the present disclosure uses the transportation all-media management and control platform as an example to illustrate its implementation, this does not mean that the present disclosure applies only to such transportation scenarios. The data processing method can be applied to any scenario in which data is statistically analyzed.
- At least one embodiment of the present disclosure provides a data processing method, a data processing device, electronic equipment, and a computer-readable storage medium.
- The data processing method includes: acquiring at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index; creating a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements that are respectively used to count different index values; and traversing the index value of the at least one sample data and using the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, in which the plurality of first elements in the first statistical array are the respective statistical sub-results of the index values.
- The data processing method uses an array to count multiple index values in a single pass over the data, which gives a highly parallel calculation and thereby improves the efficiency of data calculation. Moreover, applying the method to a different statistical index only requires updating the statistical index, so the method is also highly reusable.
- The data processing method can also be executed multiple times, with each execution treated as a stage: the data statistical results obtained in one stage serve as the basis for the next stage, so that the results can be further statistically analyzed. This reduces data redundancy, regularizes the data, and effectively avoids the performance problems that instantaneous high-throughput data causes for computing devices.
- FIG. 1A shows a system architecture to which the data processing method provided by at least one embodiment of the present disclosure can be applied.
- The system architecture 100 includes a front-end visual large screen 101, a business end 102, and a big data background 103.
- The front-end visual large screen 101 is used to present a variety of thematic content, including, for example, institutional users, traffic passenger flow, media advertising content, and hardware equipment information.
- For example, the customer group analysis page of the front-end visual large screen 101 includes time-sharing statistics of the numbers of passengers of each type (passing by, touching, and paying attention), statistics of the numbers of passengers by gender and age group, and rankings of the number of people paying attention to each advertisement.
- The media content page of the front-end visual large screen 101 includes the number of play plans, the number of normal play plans, the number of failed play plans, popular advertisements, popular materials, and so on, and the hardware device page includes information such as the running time of servers and playback devices.
- The business end 102 may be, for example, a server that supports the front-end visual large screen 101.
- The server can analyze and process the data received from the front-end visual large screen 101, and feed the processing results (such as web pages, information, or data obtained or generated according to a request) back to the front-end visual large screen 101.
- The business end 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server.
- The business end 102 interacts not only with the front-end visual large screen 101 but also with the big data background 103, and can serve as the interface between the front-end visual large screen 101 and the big data background 103.
- For example, the business end 102 can pass sample data from a terminal device, or a request from the front-end visual large screen 101, to the big data background 103, which then performs statistical analysis on the sample data or responds to the request.
- The business end 102 can send the statistical results of the big data background 103 to the front-end visual large screen 101, so that the statistical analysis results are displayed on the front-end visual large screen 101.
- The big data background 103 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server.
- The data processing method provided by at least one embodiment of the present disclosure may be executed by the big data background 103.
- This system architecture pushes batch data processing down to the big data background 103 for calculation, which can effectively reduce the computational load and response time of the front-end visual large screen 101 and the business end 102.
- The system architecture shown in FIG. 1A is only an example and does not limit the present disclosure; those skilled in the art can implement the data processing method of the present disclosure with any system architecture.
- For example, the data processing method can also be carried out by only the business end 102 and the front-end visual large screen 101. In that case the method is mainly executed (for example, the calculation is performed) by the business end 102, and the system architecture may omit the big data background 103.
- Fig. 1B shows a flowchart of a data processing method provided by at least one embodiment of the present disclosure.
- The method may include steps S10 to S30.
- Step S10: Acquire at least one sample data in the statistical group, each sample data including a statistical index and an index value of the statistical index.
- Step S20: Create a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements that are respectively used to count different index values.
- Step S30: Traverse the index value of the at least one sample data, and use the first statistical array to perform statistics on the at least one sample data, so as to obtain the data statistical result.
- For step S10, for example, at least one sample data collected within a preset time period is used as a statistical group, or at least one sample data collected by a preset terminal device (for example, a display screen arranged at a station in a subway scenario) is used as a statistical group.
- For example, the preset time period is 1 hour, and the at least one sample data collected from 15:00 to 16:00 on March 30, 2021 is acquired.
- For example, the sample data may include track records obtained after computer vision technology analyzes images collected by a camera, where the camera is built into a display screen arranged at a subway station.
- Table 1 below schematically shows at least one sample data of the statistical group collected from 15:00 to 16:00 on March 30, 2021.
- Each sample data may include multiple sample indexes, for example: event time, camera identification code (ID), head ID, head state, age, gender, and duration.
- Each sample index has a corresponding index value, that is, the value taken by that sample index.
- For example, the index values of the head state are 1, 2, and 3, indicating respectively that the passenger is recognized as passing the screen, touching the screen, or paying attention to the screen.
- The "screen" here refers to, for example, the screen of a display in a subway station in a subway scenario.
- The index values of age are 1, 2, 3, and 4, each representing an identified age range.
- The index values of gender are 1 and 2, representing male and female respectively.
- The index value of the event time may be the moment at which the event occurs, that is, the moment at which the head ID is captured.
- The index value of the camera ID may be the number of the camera that captures the head ID. For example, in a subway scenario, the camera may be one arranged on a display screen in a station.
- For example, sample data 1 indicates that at event time 1599129468132, camera04 captured a passenger with head ID 1 touching the screen; the passenger's age is between 0 and 18, and the passenger is female.
- One or more of the sample indexes can be used as statistical indexes.
- For example, the head state alone can be used as the statistical index.
- For example, gender alone can be used as the statistical index.
- For example, the head state and gender can both be used as statistical indexes.
- The statistical index may be selected from the multiple sample indexes by a person skilled in the art or a statistician.
- For step S20, for example, the first statistical array corresponding to the statistical index is created according to the index value list of the statistical index.
- The index value list contains all index values of the statistical index.
- For example, when the statistical index is the head state, whose index value list contains 1, 2, and 3, a first statistical array res1 corresponding to the head state is created. The array res1 includes 3 elements: res1[0], res1[1], and res1[2].
- For example, when the statistical indexes are the head state and gender, with the index value list of the head state containing 1, 2, and 3 and the index value list of gender containing 1 and 2, a first statistical array res2 corresponding to the head state and gender is created. The array res2 includes 5 elements: res2[0], res2[1], res2[2], res2[3], and res2[4].
- The first statistical array includes a plurality of first elements, and the number of first elements may equal the number of index values in the index value list of the statistical index, so that the first elements correspond one-to-one to the index values of the statistical index.
- For example, res1[0] corresponds to index value 1, res1[1] to index value 2, and res1[2] to index value 3.
- The first elements res1[0], res1[1], and res1[2] are used to count index value 1, index value 2, and index value 3 respectively; that is, res1[0] counts the number of people passing the screen, res1[1] the number of people touching the screen, and res1[2] the number of people paying attention to the screen.
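- The single-index case can be sketched in Python as follows. This is a minimal illustration, assuming each sample data is a dictionary with a hypothetical `head_state` key; the function name `count_head_states` and the sample values used with it are likewise illustrative and do not come from Table 1.

```python
# Index value list of the head state:
# 1 = passing the screen, 2 = touching it, 3 = paying attention to it.
HEAD_STATE_VALUES = [1, 2, 3]


def count_head_states(samples):
    """Steps S20 and S30 for the head state: build res1, then fill it."""
    # Step S20: one element per index value, in index-value-list order.
    position = {value: i for i, value in enumerate(HEAD_STATE_VALUES)}
    res1 = [0] * len(HEAD_STATE_VALUES)
    # Step S30: a single traversal updates the matching element.
    for sample in samples:
        res1[position[sample["head_state"]]] += 1
    return res1
```

- After the traversal, `res1[0]`, `res1[1]`, and `res1[2]` hold the numbers of people passing, touching, and paying attention to the screen respectively.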
- Similarly, res2[0], res2[1], and res2[2] correspond to index values 1, 2, and 3 of the statistical index "head state", and res2[3] and res2[4] correspond to index values 1 and 2 of the statistical index "gender".
- The first elements res2[0], res2[1], res2[2], res2[3], and res2[4] are thus used to count head state index value 1, head state index value 2, head state index value 3, gender index value 1, and gender index value 2 respectively. That is, res2[0] counts the number of people passing the screen, res2[1] the number of people touching the screen, res2[2] the number of people paying attention to the screen, res2[3] the number of men, and res2[4] the number of women.
- The plurality of first elements in the first statistical array are the respective statistical sub-results of the index values.
- When the statistical indexes are the head state and gender, step S30 is similar to the case in which the statistical index is the head state, and is not repeated here.
- For example, the elements of the first statistical array may be split out, and an interval statistical result table may be generated from the elements and the index values they correspond to.
- Table 2 below schematically shows the interval statistical result table obtained by performing statistics on the sample data collected from 15:00 to 16:00 on March 30, 2021.
- For example, running a statistical function may execute steps S20 and S30 of Fig. 1B described above.
- A person skilled in the art or a statistician only needs to supply the name of the statistical index to be counted and its list of index values, and then call the statistical function to perform statistics on that index. The data processing of this embodiment is therefore highly reusable.
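- Such a reusable statistical function might look like the following Python sketch. The name `run_statistics`, its signature, and the dictionary-based sample format are assumptions made for illustration; the disclosure does not specify the function's interface.

```python
def run_statistics(samples, index_value_lists):
    """Count every index value of every requested statistical index.

    index_value_lists maps each statistical index name to its list of
    index values, for example {"head_state": [1, 2, 3], "gender": [1, 2]}.
    Returns (labels, result), where labels[i] is the (name, value) pair
    counted by result[i].
    """
    # Step S20: lay the index values out one after another in one array.
    labels = []
    position = {}
    for name, values in index_value_lists.items():
        for value in values:
            position[(name, value)] = len(labels)
            labels.append((name, value))
    result = [0] * len(labels)
    # Step S30: one traversal of the samples updates all indexes at once.
    for sample in samples:
        for name in index_value_lists:
            result[position[(name, sample[name])]] += 1
    return labels, result
```

- Pairing `labels` with `result` gives the rows of a result table, and counting a different statistical index only requires passing a different name and index value list.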
- For example, step S30 may traverse the index value of the at least one sample data in the statistical group and use the plurality of elements in the first statistical array to count each index value in the statistical group, so as to obtain the data statistical result of each index value in the statistical group.
- For example, the sample data may first be grouped by one sample index to obtain logical blocks of grouped data; then, within each logical block, other sample indexes are used as statistical indexes. For instance, the sample data is first grouped by camera ID to obtain the logical block of grouped data corresponding to each camera, and then, for each logical block, statistics are performed with the head state and/or gender as statistical indexes.
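- Grouping before counting can be sketched in Python as follows. The function name `count_by_group` and the `camera_id` and `head_state` keys are hypothetical. Note also that this sketch folds grouping and counting into a single pass rather than materializing the logical blocks first, which is a simplification of the two-step description above.

```python
from collections import defaultdict


def count_by_group(samples, group_key, index_name, index_value_list):
    """Keep one statistical array per logical block of grouped data."""
    position = {value: i for i, value in enumerate(index_value_list)}
    # Each new group lazily receives its own zero-filled array.
    blocks = defaultdict(lambda: [0] * len(index_value_list))
    for sample in samples:
        blocks[sample[group_key]][position[sample[index_name]]] += 1
    return dict(blocks)
```

- Each key of the returned dictionary identifies one logical block (for example, one camera), and its value is the statistical array for that block alone.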
- For example, step S10 includes acquiring a plurality of sample data corresponding to each of a plurality of statistical groups, and step S30 includes, for each statistical group, traversing the index value of the at least one sample data and using the first statistical array to perform statistics on the at least one sample data to obtain the data statistical result.
- For example, step S10 may acquire the sample data collected in each of a plurality of preset time periods.
- For example, the preset time period is 1 hour, that is, the statistical unit is 1 hour, and the sample data of each hour is acquired in turn. If the current moment is 16:32 on March 30, 2021, the sample data collected from 15:00 to 16:00 on March 30, 2021 is acquired; at a moment after 15:00 and before 16:00 on March 30, 2021, the sample data collected from 14:00 to 15:00 was acquired; and so on, so that the sample data collected in each of the plurality of preset time periods is acquired. In other words, step S10 acquires hour-level sample data in sequence.
- In step S30, for the sample data of each hour, the index value of the at least one sample data is traversed, and the first statistical array is used to perform statistics on the at least one sample data, so as to obtain the data statistical result of that hour.
- For example, statistics on all the sample data of a preset time period are performed at some moment in the next, adjacent preset time period. For the sample data from 15:00 to 16:00 on March 30, 2021, for instance, statistics are performed at some moment between 16:00 and 17:00 to obtain the data statistical result.
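- The choice of which hour to process can be sketched in Python as follows. The helper name `previous_hour_window` is hypothetical, and the sketch assumes a 1-hour statistical unit aligned to the top of the hour, as in the example above.

```python
from datetime import datetime, timedelta


def previous_hour_window(now):
    """Return the half-open [start, end) window of the last completed hour."""
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end
```

- Called at 16:32 on March 30, 2021, the helper returns the window from 15:00 to 16:00, which matches the example in which that hour's sample data is counted at some moment between 16:00 and 17:00.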
- Fig. 2 shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- the data processing method may further include steps S40 to S80 in addition to steps S10 to S30 in FIG. 1B .
- Step S40 Determine at least one index value to be counted from different index values.
- for the index value to be counted, one or more of the number of people passing by the screen, the number of people touching the screen, the number of people following the screen, the number of men and the number of women in Table 2 are selected as the index value to be counted.
- the value of the indicator to be counted is the number of people passing by the screen.
- Step S50 Obtain multiple statistical intervals.
- At least one sample data obtained in a statistical unit is used as a statistical group, and each statistical interval includes at least one statistical unit.
- the statistical unit may be a preset time period, or a preset number of terminal devices.
- the ranges of the multiple statistical intervals may increase sequentially.
- the plurality of statistical intervals includes a first statistical interval and a second statistical interval, the range of the second statistical interval is larger than the range of the first statistical interval, and the first statistical interval is within the second statistical interval.
- the plurality of statistical intervals include a first statistical interval, a second statistical interval and a third statistical interval; the range of the third statistical interval is larger than the range of the second statistical interval, and the second statistical interval is within the third statistical interval; the range of the second statistical interval is greater than the range of the first statistical interval, and the first statistical interval is within the second statistical interval.
- the statistical unit may be a preset time period, and each statistical interval may be at least one continuous preset time period.
- the current time is 2021-03-30 16:32
- the multiple statistical intervals can be: the current-day statistical interval [2021-03-30 00:00:00-2021-03-30 16:00:00), the 7-day statistical interval [2021-03-23 16:00:00-2021-03-30 16:00:00), and the 30-day statistical interval [2021-02-28 16:00:00-2021-03-30 16:00:00). That is, the multiple statistical intervals respectively represent different time dimensions.
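As a rough sketch, the three nested intervals in this example can be derived from the current time as shown below. The function name `statistical_intervals` and the list ordering (day, 7 days, 30 days) are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def statistical_intervals(now: datetime) -> List[Tuple[datetime, datetime]]:
    """Return half-open [start, end) intervals: current day, 7 days, 30 days."""
    # All intervals end at the last full hour before the current moment.
    end = now.replace(minute=0, second=0, microsecond=0)
    return [
        (end.replace(hour=0), end),        # current-day interval
        (end - timedelta(days=7), end),    # 7-day interval
        (end - timedelta(days=30), end),   # 30-day interval
    ]


intervals = statistical_intervals(datetime(2021, 3, 30, 16, 32))
for start, end in intervals:
    print(start, end)
```

Because every interval shares the same end point, each smaller interval lies inside the larger ones, which matches the nesting described for the first, second and third statistical intervals.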
- the statistical unit is one terminal device, and the multiple statistical intervals may be, for example, a certain terminal device, all terminal devices on the station containing the terminal device, and all terminal devices on the line where the station is located.
- Step S60 Establish a second statistical array for the multiple statistical intervals, the second statistical array includes multiple second elements, and the multiple second elements correspond to the multiple statistical intervals one by one.
- the number of statistical intervals is N, where N is an integer greater than or equal to 2; the number of second elements in the second statistical array may then also be N, so that the multiple second elements correspond to the multiple statistical intervals one-to-one.
- the number of statistical intervals is 3, and the second statistical array ges may include 3 elements, namely ges[0], ges[1] and ges[2].
- ges[0] corresponds to [2021-03-30 00:00:00-2021-03-30 16:00:00)
- ges[1] corresponds to [2021-03-23 16:00:00-2021-03-30 16:00:00)
- ges[2] corresponds to [2021-02-28 16:00:00-2021-03-30 16:00:00).
- the second statistical array ges can include 6 elements, which are ges[0], ges[1], ges[2], ges[3], ges[4], and ges[5].
- ges[0], ges[1], and ges[2] are used to count different time dimensions of the number of people passing by the screen
- ges[3], ges[4], and ges[5] are used to count the number of people on screen in different time dimensions.
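One way to picture the six-element layout is a flat array in which each index value to be counted owns a contiguous run of elements, one per statistical interval. The helper `ges_index` below is a hypothetical illustration of that layout; the source does not state how positions are computed.

```python
def ges_index(metric_pos: int, interval_pos: int, n_intervals: int) -> int:
    """Map (index value position, interval position) to a slot in a flat ges array."""
    # Each index value to be counted occupies n_intervals consecutive slots.
    return metric_pos * n_intervals + interval_pos


n_metrics, n_intervals = 2, 3
ges = [0] * (n_metrics * n_intervals)

# Slots 0..2 belong to the first index value, slots 3..5 to the second.
layout = [ges_index(m, i, n_intervals) for m in range(n_metrics) for i in range(n_intervals)]
print(layout)
```

With this layout, adding another index value or another interval only changes the array length, which is consistent with the later remark that the array can be expanded as needed.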
- Step S70 From the data statistical results of each statistical group, filter out the interval statistical results that belong to the multiple statistical intervals.
- the data statistics results within the maximum statistical interval are filtered out from the data statistics results.
- the interval statistical results within the 30-day statistical interval [2021-02-28 16:00:00-2021-03-30 16:00:00) can be selected from the data statistical results.
- Step S80 Use a plurality of second elements to separately count the index values to be counted in the interval statistical results, so as to obtain the index value statistical results of the index values to be counted in each statistical interval.
- ges[0] is used to count index value 1 to be counted (that is, the number of people passing by the screen) in [2021-03-30 00:00:00-2021-03-30 16:00:00), ges[1] is used to count the number of people passing by the screen in [2021-03-23 16:00:00-2021-03-30 16:00:00), and ges[2] is used to count the number of people passing by the screen in [2021-02-28 16:00:00-2021-03-30 16:00:00), so as to obtain the number of people passing by the screen in each of these three statistical intervals.
- the method shown in Figure 2 can realize data statistics of different dimensions through one data processing pass and achieve highly parallel calculation, thereby improving the efficiency of data calculation. Moreover, only the index value to be counted needs to be updated for the method to be applied to the updated index value, so the method also has high reusability.
- the processing method can reduce data redundancy and achieve data regularization, while effectively avoiding performance problems of computing equipment caused by instantaneous high-throughput data.
- the number of second elements included in the second statistical array and the number of first elements included in the first statistical array may be dynamically expanded according to actual needs.
- the statistical dimension of the second statistical array can be dynamically expanded through parameter configuration according to actual needs.
- Fig. 3 shows a flowchart of the method of step S80 in Fig. 2 provided by at least one embodiment of the present disclosure.
- the data processing method includes steps S81-S83.
- Step S81 For each interval statistical result, determine the statistical interval to which the statistical group corresponding to the interval statistical result belongs.
- the preset time period of the statistical group corresponding to the interval statistical results is [2021-03-30 15:00:00-2021-03-30 16:00:00)
- the statistical interval to which the preset time period belongs not only includes the current statistical interval [2021-03-30 00:00:00-2021-03-30 16:00:00), but also includes the 7-day statistical interval [2021-03- 23 16:00:00-2021-03-30 16:00:00), and the 30-day statistical interval [2021-02-28 16:00:00-2021-03-30 16:00:00).
- Step S82 Extract statistical sub-results of index values to be counted from the interval statistical results.
- the value of the index to be counted is the number of passers-by, and the statistical sub-result of the number of passers-by is extracted from each interval statistical result.
- the number of passers-by in the statistical interval [2021-03-30 15:00:00-2021-03-30 16:00:00) is extracted from the interval statistical results shown in Table 2 to be 29 people.
- Step S83 Add the statistical sub-results to the second element corresponding to the statistical interval to which the statistical group corresponding to the interval statistical result belongs, so as to obtain the index value statistical result of the index value to be counted in each statistical interval.
- the type of the second statistical array ges is an array format containing three second elements.
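Steps S81 to S83 can be sketched as a single loop over the hourly results. This is an illustrative sketch, not the patent's implementation: the function name, the dictionary layout of an interval statistical result, and the key `"passing"` are assumptions, and apart from the value 29 quoted in the text, the sample numbers are invented.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Interval = Tuple[datetime, datetime]  # half-open [start, end)


def aggregate_by_interval(hourly_results: Dict[datetime, Dict[str, int]],
                          intervals: List[Interval],
                          metric: str) -> List[int]:
    """Accumulate one index value to be counted into the second statistical array."""
    ges = [0] * len(intervals)  # one second element per statistical interval
    for group_start, result in hourly_results.items():
        sub_result = result[metric]                   # S82: extract the sub-result
        for i, (start, end) in enumerate(intervals):  # S81: find owning intervals
            if start <= group_start < end:
                ges[i] += sub_result                  # S83: accumulate
    return ges


end = datetime(2021, 3, 30, 16)
intervals = [(end.replace(hour=0), end),
             (end - timedelta(days=7), end),
             (end - timedelta(days=30), end)]
hourly_results = {
    datetime(2021, 3, 30, 15): {"passing": 29},  # value quoted in the text
    datetime(2021, 3, 29, 10): {"passing": 5},   # invented for illustration
    datetime(2021, 3, 1, 9): {"passing": 7},     # invented for illustration
}
ges = aggregate_by_interval(hourly_results, intervals, "passing")
print(ges)
```

A single hourly result can fall into several nested intervals, so it is added to every matching second element in the same pass rather than being re-read once per time dimension.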
- Fig. 4 shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- the data processing method may further include steps S410 to S420 in addition to steps S10 to S30 in FIG. 1B .
- steps S410 and step S420 are performed before step S10.
- Step S410 Receive initial data from a data source.
- the big data background 103 receives initial data from the business end 102 .
- the display screen may include a camera. The camera transmits the collected initial data to the business end 102, and the business end 102 then transmits the initial data from the camera to the big data background 103, so that data processing is pushed down to the big data background 103, reducing the load on the front-end visualization large screen and the calculation and response time of the business end.
- Step S420 Create at least one sample data according to the initial data.
- the big data background 103 creates at least one sample data according to the initial data.
- the initial data includes statistical attribute information.
- Statistical attribute information serves as information for storing initial data. For example, if the initial data is stored according to time, the statistical attribute information may be the event time in Table 1, or if the initial data is stored according to the camera ID, the statistical attribute information may be the camera ID, etc.
- the storage file for storing the initial data is determined according to the preset time period to which the event time in the initial data belongs, and the initial data in each preset time period is stored in the same storage file.
- alternatively, the storage file for storing the initial data is determined according to the camera ID in the initial data, and the initial data collected by the same camera is stored in the same storage file.
- FIG. 5 shows a method flowchart of step S420 in FIG. 4 provided by at least one embodiment of the present disclosure.
- the method may include steps S421-S423.
- Step S421 According to the statistical attribute information, determine whether there is a storage file for storing the initial data.
- the statistical attribute information is event time, and it is determined whether there is a storage file for storing the initial data according to the event time of the initial data.
- the initial data belonging to the same statistical group are stored in the same storage file.
- initial data belonging to the same preset time period are stored in the same storage file.
- the statistical attribute information of the initial data is 15:32, March 30, 2021
- the preset time period to which 15:32, March 30, 2021 belongs is from 15:00, March 30, 2021 to 16:00, March 30, 2021
- Step S422 In response to the existence of a storage file for storing the initial data, store the initial data in the storage file for use as at least one sample data.
- Step S423 In response to the absence of a storage file for storing the initial data, determine the statistical group to which the initial data belongs according to the statistical attribute information, generate a file path according to the statistical group, and create a storage file in the file path, where the storage file is used for storing the initial data so that the initial data can be used as the at least one sample data.
- the initial data belonging to the same statistical group is stored in the same storage file, so that the initial data can be divided when storing the initial data, which can facilitate the subsequent search for the initial data, and further improve the statistical efficiency.
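Steps S421 to S423 can be sketched with local files as below. Everything specific here is an assumption for illustration: the `<root>/YYYY-MM-DD/HH.jsonl` layout, the JSON-lines encoding, the `event_time` and `camera_id` field names, and the helper names. The source only states that a path is generated from the statistical group.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def storage_path(root: Path, event_time: datetime) -> Path:
    # Hypothetical layout: one file per hourly statistical group.
    return root / event_time.strftime("%Y-%m-%d") / (event_time.strftime("%H") + ".jsonl")


def store_initial_data(root: Path, record: dict) -> Path:
    event_time = datetime.fromisoformat(record["event_time"])
    path = storage_path(root, event_time)
    if not path.exists():                         # S421: does a storage file exist?
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()                              # S423: create it under the generated path
    with path.open("a", encoding="utf-8") as f:   # S422: append the initial data
        f.write(json.dumps(record) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    p1 = store_initial_data(root, {"event_time": "2021-03-30T15:32:00", "camera_id": "c1"})
    p2 = store_initial_data(root, {"event_time": "2021-03-30T15:47:00", "camera_id": "c2"})
    p3 = store_initial_data(root, {"event_time": "2021-03-30T16:05:00", "camera_id": "c1"})
    lines_in_first = len(p1.read_text(encoding="utf-8").splitlines())
    print(p1 == p2, p1 == p3, lines_in_first)
```

Partitioning at write time means a later statistics job only has to open the one file that corresponds to its statistical group.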
- the Kafka distributed log system can be used to store the initial data from the business end to the big data platform 103 according to the above path structure (for example, in the local memory of the big data platform 103, or in other storage devices associated with the big data platform 103), so that the big data platform 103 can perform statistical analysis on the sample data.
- step S10 may be to obtain at least one sample data from a storage file landed by the Kafka distributed log system.
- the initial data may also be stored in a database.
- step S10 may be to obtain at least one sample data from the database.
- Fig. 6 shows a flow chart of the method in step S10 in Fig. 1 provided by at least one embodiment of the present disclosure.
- the method may include step S11 to step S13.
- Step S11 Generate the file path of the storage file corresponding to the statistics group.
- Step S12 Determine whether there is a file path.
- Step S13 In response to the existence of the file path, acquire initial data in the statistics group from a storage file in the file path as at least one sample data.
- the initial data in the statistical group is obtained from a storage file in the file path as at least one sample data.
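Steps S11 to S13 can be sketched as the read-side counterpart of the storage step. The path layout `<root>/YYYY-MM-DD/HH.jsonl`, the JSON-lines encoding and the function name `load_statistical_group` are assumptions for illustration and are not taken from the source.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def load_statistical_group(root: Path, group_start: datetime) -> Optional[List[dict]]:
    # S11: generate the file path of the storage file for this statistical group.
    path = root / group_start.strftime("%Y-%m-%d") / (group_start.strftime("%H") + ".jsonl")
    # S12: determine whether the file path exists.
    if not path.exists():
        return None
    # S13: read the stored initial data as the sample data of the group.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    hour = datetime(2021, 3, 30, 15)
    missing = load_statistical_group(root, hour)   # nothing stored yet
    path = root / "2021-03-30" / "15.jsonl"
    path.parent.mkdir(parents=True)
    path.write_text('{"camera_id": "c1", "gender": 1}\n', encoding="utf-8")
    loaded = load_statistical_group(root, hour)
    print(missing, loaded)
```

Returning `None` for a missing path lets the caller skip an hour in which no data arrived, instead of treating it as an error.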
- the data processing method is applied to multiple electronic devices.
- multiple electronic devices form the big data background 103 shown in FIG. 1 , that is, multiple electronic devices form a server cluster.
- The at least one sample data includes a plurality of sample data groups, the plurality of electronic devices correspond to the plurality of sample data groups one-to-one, and the plurality of electronic devices are configured to respectively perform statistics based on the corresponding sample data groups and add the resulting values to obtain the data statistical result.
- a plurality of sample data sets may refer to sample data respectively stored in different electronic devices, for example.
- part of the sample data of the statistical group from 15:00 to 16:00 on March 30, 2021 landed on the first server among the multiple electronic devices, and this part of the sample data is one sample data group; the other sample data of the statistical group from 15:00 to 16:00 on March 30, 2021, that is, all sample data except the above-mentioned part, landed on the second server among the multiple electronic devices.
- the other sample data is another sample data group.
- the first server performs statistics on part of the sample data in the first server to obtain a first data statistical result res(a)
- the second server performs statistics on other sample data in the second server to obtain a second data statistical result res(b).
- Fig. 7A shows a flowchart of another data processing method provided by at least one embodiment of the present disclosure.
- the data processing method includes steps S701 to S707.
- the data processing method is described by taking the preset time period as 1 hour, that is, the statistical unit as 1 hour as an example.
- Step S701 Generate the file path of the previous hour according to the current time.
- This step S701 is similar to step S11 in FIG. 6 .
- Step S702 Determine whether the file path exists.
- This step S702 is similar to step S12 in FIG. 6 .
- Step S703 aggregate and preprocess the sample data of the last hour.
- the aggregation preprocessing may be, for example, executing step S20 and step S30 in FIG. 1B , so as to obtain a data statistical result of counting the index values of the statistical indexes of the sample data.
- steps S701 to S703 can obtain hour-level data statistics results.
- Step S704 Store the hour-level data statistics results in the data warehouse.
- the data statistical results are stored in the data warehouse according to the form of Table 2 above.
- Step S705 Determine whether there are data records in the statistical interval.
- step S705 may be to determine whether there is a data record in the statistical interval with the largest range among the multiple statistical intervals. For example, it is judged whether there is a statistical result of data belonging to the statistical interval with the largest range in the data warehouse.
- the interval statistical results belonging to the statistical interval with the largest range may be selected from the data statistical results in each statistical group, and step S706 is executed.
- Step S706 Hierarchical aggregation and parallel processing of interval statistical results according to the time dimension.
- steps S81-S83 described above with reference to FIG. 3 are executed, and interval statistical results are hierarchically aggregated and processed in parallel according to the time dimension, so as to obtain the index value statistical results of each time dimension.
- the time dimension includes: the statistical interval of the day [2021-03-30 00:00:00-2021-03-30 16:00:00), the 7-day statistical interval [2021-03-23 16:00:00-2021- 03-30 16:00:00), and the 30-day statistical interval [2021-02-28 16:00:00-2021-03-30 16:00:00).
- Step S707 Insert the index value statistical results into the summary data table.
- the index value statistical result of the index value to be counted is inserted into the summary data table.
- the summary data table may include statistical results of respective index values of different index values to be counted. Summarizing the index value statistical results of each index value to be counted into the summary data table can facilitate comparison and analysis by statisticians.
- Fig. 7B shows a schematic diagram of another data processing method provided by at least one embodiment of the present disclosure.
- the data processing method includes steps S710 to S730.
- Step S710 is, for example, executed by the first server
- step S720 is, for example, executed by the second server.
- Step S710 The first server performs data processing on the sample data in the first sample data group in the statistics group according to the method shown in FIG. 1B.
- the index values of the statistical indexes of the sample data in the first sample data group are counted by using the first statistical array res.
- the first statistical array res can contain 3 first elements, namely res[0], res[1] and res[2], so that the three first elements are used to perform statistics on three index values to obtain the first data statistical result, that is, the statistical sub-results of each index value in the first sample data group.
- the first statistical array res is initialized so that each first element is 0.
- Step S720 The second server performs data processing on the sample data of the second sample data group in the statistical group according to the method described in FIG. 1B.
- the first statistics array res is used to perform statistics on the index values of the statistical indicators of the sample data in the second sample data group, so as to obtain the second data statistical result.
- the first statistical array res is initialized so that each first element is 0.
- the processing method of the second server is basically the same as that of the first server; the difference is the specific sample data each one targets: the second server processes the second sample data group, whereas the first server processes the first sample data group.
- the statistical groups in step S710 and step S720 are the same statistical group, for example, they are both statistical groups composed of sample data obtained at 15:00-16:00 on March 30, 2021.
- the sample data of the same statistical group may land on different servers for storage. Therefore, to perform statistics on the sample data of one statistical group, the statistical results computed on the different servers for sample data belonging to that group must be combined to obtain the statistical result of the statistical group.
- Step S730 Add the corresponding first elements in the first data statistical result and the second data statistical result to obtain the data statistical result 700.
- the res[0] obtained by the statistics of the first server and the res[0] obtained by the statistics of the second server are added together to obtain the res[0] of the data statistics result 700.
- the res[1] obtained by the statistics of the first server and the res[1] obtained by the statistics of the second server are added together to obtain the res[1] of the data statistics result 700.
- the res[2] obtained by the statistics of the first server and the res[2] obtained by the statistics of the second server are added together to obtain the res[2] of the data statistics result 700.
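The element-wise addition of step S730 can be sketched as follows. The function name `merge_results` and the numeric values are invented for illustration; only the rule of adding corresponding first elements comes from the text.

```python
from typing import List


def merge_results(partials: List[List[int]]) -> List[int]:
    """Add the corresponding first elements of per-server statistics arrays."""
    # zip(*partials) groups res[0] of every server, then res[1], and so on.
    return [sum(column) for column in zip(*partials)]


res_first_server = [12, 5, 3]    # invented values for the first sample data group
res_second_server = [17, 4, 6]   # invented values for the second sample data group
result_700 = merge_results([res_first_server, res_second_server])
print(result_700)
```

Because counts are additive, the same merge works unchanged for more than two servers, as long as every server uses the same element order in its array.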
- Fig. 8 shows a schematic block diagram of a data processing apparatus 800 provided by at least one embodiment of the present disclosure.
- the data processing apparatus 800 includes a sample acquisition unit 810 , an array creation unit 820 and a traversal unit 830 .
- the sample obtaining unit 810 is configured to obtain at least one sample data in the statistical group, each sample data includes a statistical index and an index value of the statistical index.
- the sample acquisition unit 810 can, for example, execute step S10 described in FIG. 1B .
- the array creating unit 820 is configured to create a first statistical array corresponding to the statistical index, the first statistical array includes a plurality of first elements, and the plurality of first elements are respectively used to perform statistics on different index values.
- the array creation unit 820 may, for example, execute step S20 described in FIG. 1B .
- the traversal unit 830 is configured to traverse the index values of the at least one sample data, and use the first statistical array to perform statistics on the at least one sample data to obtain data statistical results, wherein the multiple first elements in the first statistical array are the respective statistical sub-results of each index value.
- the traversal unit 830 may, for example, execute step S30 described in FIG. 1B .
- the sample acquiring unit 810, the array creating unit 820 and the traversing unit 830 may be hardware, software, firmware and any feasible combination thereof.
- the sample acquisition unit 810, the array creation unit 820, and the traversal unit 830 may be dedicated or general-purpose circuits, chips or devices, or a combination of processors and memories.
- the embodiment of the present disclosure does not limit it.
- each unit of the data processing device 800 corresponds to a step of the aforementioned data processing method; for the specific functions of the data processing device 800, reference may be made to the relevant descriptions of the data processing method, which are not repeated here.
- the components and structure of the data processing apparatus 800 shown in FIG. 8 are only exemplary, not limiting, and the data processing apparatus 800 may further include other components and structures as required.
- At least one embodiment of the present disclosure further provides an electronic device, the electronic device includes a processor and a memory, and the memory includes one or more computer program modules.
- One or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for implementing the above data processing method.
- the electronic device can realize the parallel statistics of multiple index values through one data processing by using the array, which improves the statistical efficiency and has high reusability.
- Fig. 9 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure.
- the electronic device 900 includes a processor 910 and a memory 920 .
- Memory 920 is used to store non-transitory computer readable instructions (eg, one or more computer program modules).
- the processor 910 is configured to execute non-transitory computer-readable instructions, and when the non-transitory computer-readable instructions are executed by the processor 910, one or more steps in the data processing method described above may be performed.
- the memory 920 and the processor 910 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).
- the processor 910 may be a central processing unit (CPU), a graphics processing unit (GPU), or other forms of processing units having data processing capabilities and/or program execution capabilities.
- the central processing unit (CPU) may be of X86 or ARM architecture and the like.
- the processor 910 can be a general purpose processor or a dedicated processor, and can control other components in the electronic device 900 to perform desired functions.
- memory 920 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
- the volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example.
- Non-volatile memory may include, for example, read only memory (ROM), hard disks, erasable programmable read only memory (EPROM), compact disc read only memory (CD-ROM), USB memory, flash memory, and the like.
- One or more computer program modules can be stored on the computer-readable storage medium, and the processor 910 can run one or more computer program modules to realize various functions of the electronic device 900 .
- Various application programs, various data, and various data used and/or generated by the application programs can also be stored in the computer-readable storage medium.
- Fig. 10 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure.
- the electronic device 1000 is, for example, suitable for implementing the data processing method provided by the embodiments of the present disclosure.
- the electronic device 1000 may be a terminal device or the like. It should be noted that the electronic device 1000 shown in FIG. 10 is only an example, which does not impose any limitation on the functions and application scope of the embodiments of the present disclosure.
- the electronic device 1000 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1010, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1020 or a program loaded from a storage device 1080 into a random access memory (RAM) 1030.
- In the RAM 1030, various programs and data necessary for the operation of the electronic device 1000 are also stored.
- the processing device 1010, the ROM 1020, and the RAM 1030 are connected to each other through a bus 1040.
- An input/output (I/O) interface 1050 is also connected to bus 1040 .
- the following devices can be connected to the I/O interface 1050: input devices 1060 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 1070 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 1080 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1090 .
- the communication means 1090 may allow the electronic device 1000 to perform wireless or wired communication with other electronic devices to exchange data.
- Although FIG. 10 shows the electronic device 1000 having various means, it should be understood that it is not required to implement or have all of the means shown, and the electronic device 1000 may alternatively implement or have more or fewer means.
- the data processing method described above can be implemented as a computer software program.
- embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the above data processing method.
- the computer program may be downloaded and installed from a network via communication means 1090, or installed from storage means 1080, or installed from ROM 1020.
- When the computer program is executed by the processing device 1010, the functions defined in the data processing method provided by the embodiments of the present disclosure can be realized.
- At least one embodiment of the present disclosure also provides a computer-readable storage medium for storing non-transitory computer-readable instructions; when the non-transitory computer-readable instructions are executed by a computer, the above-mentioned data processing method can be implemented.
- the array can be used to realize parallel statistics on multiple index values through one data processing, which improves the statistical efficiency and has high reusability.
- Fig. 11 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.
- a storage medium 1100 is used to store non-transitory computer readable instructions 1110 .
- When the non-transitory computer-readable instructions 1110 are executed by a computer, one or more steps in the data processing method described above may be performed.
- the storage medium 1100 can be applied to the above-mentioned electronic device 900 .
- the storage medium 1100 may be the memory 920 in the electronic device 900 shown in FIG. 9 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Complex Calculations (AREA)
Abstract
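A data processing method, a data processing apparatus, an electronic device and a computer-readable storage medium. The data processing method includes: obtaining at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index (S10); creating a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements, the plurality of first elements being respectively used to perform statistics on different index values (S20); and traversing the index value of the at least one sample data, and using the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, the plurality of first elements in the first statistical array being the respective statistical sub-results of each index value (S30). By using an array, the method can perform statistics on multiple index values in a single data processing pass, thereby achieving highly parallel computation, improving statistical efficiency, and offering high reusability.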
Description
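Embodiments of the present disclosure relate to a data processing method, an apparatus, an electronic device and a computer-readable storage medium.
With the rapid development of science, technology and the economy, society has entered the era of big data. Statistics and analysis of big data can mine valuable information from it, and this information can then be used to improve and optimize various fields of society. For example, statistics and analysis of traffic data can reveal shortcomings in urban traffic, so that urban traffic can be improved accordingly.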
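Summary
At least one embodiment of the present disclosure provides a data processing method, including: obtaining at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index; creating a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements, the plurality of first elements being respectively used to perform statistics on different index values; and traversing the index value of the at least one sample data, and using the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, the plurality of first elements in the first statistical array being the respective statistical sub-results of each index value.
For example, in the data processing method provided by an embodiment of the present disclosure, obtaining the at least one sample data in the statistical group includes: obtaining a plurality of sample data corresponding to each statistical group in a plurality of statistical groups; and traversing the index value of the at least one sample data and using the first statistical array to perform statistics on the at least one sample data to obtain the data statistical result includes: for each statistical group, traversing the index value of the at least one sample data, and using the first statistical array to perform statistics on the at least one sample data to obtain the data statistical result.
For example, in the data processing method provided by an embodiment of the present disclosure, the at least one sample data obtained within a statistical unit serves as one statistical group, and the method further includes: determining at least one index value to be counted from the different index values; obtaining a plurality of statistical intervals, wherein each statistical interval includes at least one statistical unit; establishing a second statistical array for the plurality of statistical intervals, wherein the second statistical array includes a plurality of second elements, and the plurality of second elements correspond to the plurality of statistical intervals one-to-one; filtering, from the data statistical results of each statistical group, the interval statistical results that belong to the plurality of statistical intervals; and using the plurality of second elements to respectively perform statistics on the index value to be counted in the interval statistical results, so as to obtain the index value statistical result of the index value to be counted for each statistical interval.
For example, in the data processing method provided by an embodiment of the present disclosure, using the plurality of second elements to respectively perform statistics on the index value to be counted in the interval statistical results, so as to obtain the index value statistical result of the index value to be counted for each statistical interval, includes: for each interval statistical result, determining the statistical interval to which the statistical group corresponding to the interval statistical result belongs; extracting the statistical sub-result of the index value to be counted from the interval statistical result; and adding the statistical sub-result to the second element corresponding to the statistical interval to which the statistical group corresponding to the interval statistical result belongs, so as to obtain the index value statistical result of the index value to be counted for each statistical interval.
For example, in the data processing method provided by an embodiment of the present disclosure, traversing the index value of the at least one sample data in the statistical group, and using the first statistical array to perform statistics on the at least one sample data in the statistical group to obtain the data statistical result of the statistical group, includes: traversing the index value of the at least one sample data in the statistical group, and using the plurality of elements in the first statistical array to respectively count each index value in the statistical group, so as to obtain the data statistical result of the statistical group.
For example, in the data processing method provided by an embodiment of the present disclosure, the plurality of statistical intervals include a first statistical interval and a second statistical interval, the range of the second statistical interval is larger than the range of the first statistical interval, and the first statistical interval is within the second statistical interval.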
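For example, the data processing method provided by an embodiment of the present disclosure further includes: receiving initial data from a data source; and establishing the at least one sample data according to the initial data.
For example, in the data processing method provided by an embodiment of the present disclosure, the initial data includes statistical attribute information, and establishing the at least one sample data according to the initial data includes: determining, according to the statistical attribute information, whether there is a storage file for storing the initial data; in response to the existence of a storage file for storing the initial data, storing the initial data in the storage file for use as the at least one sample data; and in response to the absence of a storage file for storing the initial data, determining the statistical group to which the initial data belongs according to the statistical attribute information, generating a file path according to the statistical group, and creating the storage file in the file path, wherein the storage file is used to store the initial data so that the initial data is used as the at least one sample data, and wherein initial data belonging to the same statistical group is stored in the same storage file.
For example, in the data processing method provided by an embodiment of the present disclosure, obtaining the at least one sample data in the statistical group includes: generating the file path of the storage file corresponding to the statistical group; determining whether the file path exists; and in response to the existence of the file path, obtaining the initial data in the statistical group from the storage file in the file path as the at least one sample data.
For example, in the data processing method provided by an embodiment of the present disclosure, the statistical unit includes a preset time period or a preset number of at least one terminal device.
For example, in the data processing method provided by an embodiment of the present disclosure, the data processing method is applied to a plurality of electronic devices, the at least one sample data includes a plurality of sample data groups, the plurality of electronic devices correspond to the plurality of sample data groups one-to-one, and the plurality of electronic devices are configured to respectively perform statistics based on the corresponding sample data groups and add the resulting values to obtain the data statistical result.
At least one embodiment of the present disclosure provides a data processing apparatus, including: a sample acquisition unit configured to obtain at least one sample data in a statistical group, each sample data including a statistical index and an index value of the statistical index; an array creation unit configured to create a first statistical array corresponding to the statistical index, the first statistical array including a plurality of first elements, the plurality of first elements being respectively used to perform statistics on different index values; and a traversal unit configured to traverse the index value of the at least one sample data and use the first statistical array to perform statistics on the at least one sample data to obtain a data statistical result, wherein the plurality of first elements in the first statistical array are the respective statistical sub-results of each index value.
At least one embodiment of the present disclosure provides an electronic device, including a processor and a memory including one or more computer program instructions, wherein the one or more computer program instructions are stored in the memory and, when executed by the processor, implement the data processing method provided by any embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a computer-readable storage medium that non-transitorily stores computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the data processing method provided by any embodiment of the present disclosure.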
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1A示出了一种可以应用本公开实施例的数据处理方法的系统架构;
图1B示出了本公开至少一个实施例提供的一种数据处理方法的流程图;
图2示出了本公开至少一个实施例提供的另一种数据处理方法的流程图;
图3示出了本公开至少一个实施例提供的图2中步骤S80的方法流程图;
图4示出了本公开至少一个实施例提供的另一种数据处理方法的流程图;
图5示出了本公开至少一个实施例提供的图4中步骤S420的方法流程图;
Fig. 6 shows a flowchart of step S10 in Fig. 1B provided by at least one embodiment of the present disclosure;
图7A示出了本公开至少一个实施例提供的另一数据处理方法的流程图;
图7B示出了本公开至少一个实施例提供的另一数据处理方法的示意图;
图8示出了本公开至少一个实施例提供的一种数据处理装置的示意框图;
图9为本公开一些实施例提供的一种电子设备的示意框图;
图10为本公开一些实施例提供的另一种电子设备的示意框图;以及
图11示出了本公开至少一个实施例提供的一种计算机可读存储介质的示意图。
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。同样,“一个”、“一”或者“该”等类似词语也不表示数量限制,而是表示存在至少一个。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
在大数据统计中,经常需要对样本数据中的多个项目进行统计。在相关技术中,通常需要对多个项目中的每个项目分别遍历一次样本数据,才能够得到多个项目中各个项目的统计结果。这种统计分析方法费时费力。需要理解的是,此处的“项目”在不同的统计情形中可以代表不同的含义,例如,“项目”可以是指不同的统计指标,或者也可以是指同一统计指标的指标值。例如,“项目”可以是包括统计指标“总数量”和统计指标“设备运行时长”等。又例如,在统计指标为“性别”的情形中,“项目”可以是指性别的指标值1和2,“1”例如代表“男”,“2”例如代表“女”。
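For example, the all-media traffic management and control platform is an integrated software-and-hardware industrial digitalization platform that provides operators of public transportation venues or media operators with passenger-group analysis, information management and targeted delivery, centralized management and control of media devices, and real-time environmental monitoring. The content presented on the platform's front-end visualization large screen covers a rich variety of topics, for example institutional users, passenger traffic, media advertising content, and hardware device information, and each topic involves many statistical indicators and many report components. For example, the passenger-group analysis page of the front-end visualization large screen includes time-segmented statistics of the various passenger counts (passing-by, reaching, and paying-attention counts), statistics of passenger counts by gender and by age group, ranking statistics of the attention counts of various advertisements, and so on; the media content page includes the number of playback plans, the number of normally played plans, the number of failed plans, popular advertisements, popular materials, and so on; and the hardware device page includes information such as the running time of various servers and playback devices. For example, all of the above statistical indicators need to be calculated along an organizational dimension (for example line, station, and device in a subway scenario) and a time dimension (for example the current day, 7 days, and 30 days), and most pages need to be refreshed periodically at a certain frequency (for example every hour).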
若交通全媒体管控平台利用上述相关技术中的统计方法来对上述各个统计指标以及统计指标的指标值进行统计分析,不仅需要较长的统计时间,还可能由于大数据的吞吐造成用于统计分析的设备的性能问题。需要理解的是,虽然在本公开中以交通全媒体管控平台为例来说明本公开的实施方式,但是这并不意味着本公开只应用于交通全媒体管控平台等交通应用场景,本公开的数据处理方法可以应用于任何对数据进行统计分析的应用场景。
本公开至少一个实施例提供一种数据处理方法、数据处理装置、电子设备和计算机可读存储介质。该数据处理方法包括:获取统计组内的至少一个样本数据,每个样本数据包括统计指标和统计指标的指标值;创建对应于统计指标的第一统计数组,第一统计数组包括多个第一元素,多个第一元素分别用于对不同的指标值进行统计;以及遍历至少一个样本数据的指标值,并利用第一统计数组对至少一个样本数据进行统计,以获得数据统计结果,第一统计数组中的多个第一元素分别为每个指标值各自的统计子结果。
该数据处理方法利用数组通过一次数据处理便可实现对多个指标值的统计,实现了高并行度的计算,从而提高了数据计算的效率,并且只需要更新统计指标,便可以实现将该方法应用到对更新后的统计指标进行统计,因此该方法还具有高可复用性。在一些实施例中,该数据处理方法还可以被执行多次,每次执行该数据处理方法作为一个阶段,在前一阶段执行数据处理方法得到的数据统计结果的基础上,再进入后一阶段,这样可以对数据统计结果进一步地统计分析,从而减少数据冗余,实现数据规整,同时有效避免瞬时大吞吐数据造成的计算设备的性能问题。
图1A示出了可以应用本公开至少一个实施例提供的数据处理方法的系统架构。
如图1A所示,在该系统架构100中包括前端可视化大屏101、业务端102和大数据后台103。
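The front-end visualization large screen 101 is used to present a rich variety of topic content, for example institutional users, passenger traffic, media advertising content, and hardware device information. For example, the passenger-group analysis page of the front-end visualization large screen 101 includes time-segmented statistics of the various passenger counts (passing-by, reaching, and paying-attention counts), statistics of passenger counts by gender and by age group, ranking statistics of the attention counts of various advertisements, and so on; the media content page of the front-end visualization large screen 101 includes the number of playback plans, the number of normally played plans, the number of failed plans, popular advertisements, popular materials, and so on; and the hardware device page of the front-end visualization large screen 101 includes information such as the running time of various servers and playback devices.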
业务端102例如可以是为前端可视化大屏101提供支持的服务器。服务器例如可以对接收到的来自前端可视化大屏101的请求等数据进行分析等处理,并将处理结果(例如根据请求获取或生成的网页、信息、或数据等)反馈给前端可视化大屏101。业务端102可以为独立的物理服务器,也可以为多个物理服务器构成的服务器集群或者分布式系统,也可以是云服务器等。
在本公开的实施例中,业务端102不仅与前端可视化大屏101交互,还与大数据后台103交互,业务端102可以作为前端可视化大屏101和大数据后台交互的接口。例如,业务端102可以将来自终端设备的样本数据或者来自前端可视化大屏101的请求传输到大数据后台103,由大数据后台103对样本数据进行统计分析或者对请求进行响应。业务端102可以将大数据后台103的统计结果发送至前端可视化大屏101,以在前端可视化大屏101中展示统计分析的结果。
大数据后台103可以是独立的物理服务器,也可以为多个物理服务器构成的服务器集群或者分布式系统,也可以是云服务器等。本公开至少一实施例提供的数据处理方法可以由大数据后台103执行。该系统架构可以将批量的数据处理下推至大数据后台103进行计算,可以有效降低前端可视化大屏101和业务端102的计算与响应时间的压力。
可以理解的是,图1A所示的系统架构仅为一种示例,对本公开不具有限定作用,本领域技术人员可以通过任何系统架构来实现本公开的数据处理方法。例如,该数据处理方法也可以仅由业务端102和前端可视化大屏101执行,本公开至少一个实施例提供的数据处理方法主要由业务端102执行(例如进行计算),此时,系统架构中可以不包括大数据后台103。
图1B示出了本公开至少一实施例提供的一种数据处理方法的流程图。
如图1B所示,该方法可以包括步骤S10~S30。
步骤S10:获取统计组内的至少一个样本数据,每个样本数据包括统计指标和统计指标的指标值。
步骤S20:创建对应于统计指标的第一统计数组,第一统计数组包括多个第一元素,多个第一元素分别用于对不同的指标值进行统计。
步骤S30:遍历至少一个样本数据的指标值,并利用第一统计数组对至少一个样本数据进行统计,以获得数据统计结果。
对于步骤S10,例如,预设时间段内采集到的至少一个样本数据作为一个统计组,或者预设的终端设备(例如,在地铁场景中,布置于站点的显示屏)采集到的至少一个样本数据作为一个统计组。
例如,预设时间段为1小时,获取2021年3月30日15:00至2021年3月30日16:00采集到的至少一个样本数据。例如,在地铁场景中,样本数据可以包括经计算机视觉技术对摄像头采集到的图像分析以后的轨迹记录,摄像头为地铁站点上布置的显示屏中的摄像头。
下表一示意性示出了统计组2021年3月30日15:00至2021年3月30日16:00内采集到的至少一个样本数据。
如表一所示,表一包含多个字段,每个字段表示一个样本指标。每个样本数据可以包括多个样本指标,例如多个样本指标分别为:事件时间、摄像头标识码(ID)、人头ID、人头状态、年龄、性别以及持续时长等。每个样本指标具有与样本指标相对应的指标值,指标值可以是指样本指标的取值。例如,人头状态的指标值包括1、2、3,分别表示乘客的状态被识别为路过屏幕、触达屏幕和关注屏幕。此处的“屏幕”例如是指在地铁场景中,地铁站点中的显示屏的屏幕。年龄的指标值为1、2、3、4,分别表示识别出的年龄所属区间段,如1代表0-18岁,2代表19-30岁,3代表31-50岁,4表示51岁及以上。性别的指标值为1、2,分别表示男性、女性。事件时间的指标值可以是该事件发生的时刻,即采集到人头ID的时刻。摄像头ID的指标值可以是指采集到人头ID的摄像头的编号。例如,在地铁场景中,摄像头可以是布置于站点中的显示屏上的摄像头。
例如,在表一中,示例数据1表示,在事件时间为1599129468132时,摄像头camera04采集到被标记为人头ID为1的乘客触达屏幕,该乘客的年龄在0-18岁之间,该乘客为女性。
在本公开的一些实施例中,多个样本指标中的一个或者多个可以作为统计指标。例如,人头状态可以作为统计指标。又例如,性别可以作为统计指标。又例如,人头状态和性别均为统计指标。
表一
在本公开的一些实施例中,统计指标可以是本领域技术人员或者统计人员从多个样本指标中选择出来的。
对于步骤S20,例如根据统计指标的指标值列表,创建对应于统计指标的第一统计数组。指标值列表中包含该统计指标的所有指标值。
例如,统计指标为人头状态,根据人头状态的指标值列表包括1、2和3,创建对应于统计指标人头状态的第一统计数组res1。第一统计数组res1中包括3个元素,分别为res1[0]、res1[1]和res1[2]。
又例如,统计指标为人头状态和性别,根据人头状态的指标值列表包括1、2、3以及性别的指标值列表中包括1和2,创建对应于统计指标人头状态和性别的第一统计数组res2。第一统计数组res2中包括5个元素,分别为res2[0]、res2[1]、res2[2]、res2[3]、res2[4]。
在本公开的一些实施例中,第一统计数组包括多个第一元素,第一元素个数可以与统计指标的指标值列表中指标值的个数相同,使得多个第一元素与统计指标的多个指标值一一对应。
例如,在上述统计指标为人头状态的情形中,res1[0]与指标值1对应,res1[1]与指标值2对应,以及res1[2]与指标值3对应。多个第一元素res1[0]、res1[1]和res1[2]分别用于对指标值1、指标值2和指标值3进行统计,也就是,res1[0]用于对路过屏幕的人数进行统计,res1[1]用于对触达屏幕的人数进行统计,以及res1[2]用于对关注屏幕的人数进行统计。
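For another example, in the above case where the statistical indicators are head status and gender, res2[0], res2[1] and res2[2] correspond to indicator value 1, indicator value 2 and indicator value 3 of the statistical indicator "head status", respectively, and res2[3] and res2[4] correspond to indicator value 1 and indicator value 2 of the statistical indicator "gender", respectively. The first elements res2[0], res2[1], res2[2], res2[3] and res2[4] are used to count indicator value 1 of head status, indicator value 2 of head status, indicator value 3 of head status, indicator value 1 of gender, and indicator value 2 of gender, respectively. That is, res2[0] counts the number of people passing by the screen, res2[1] counts the number of people reaching the screen, res2[2] counts the number of people paying attention to the screen, res2[3] counts the number of males, and res2[4] counts the number of females.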
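The one-to-one mapping between first elements and indicator values described above can be sketched as follows. This is only an illustrative sketch: the helper name `build_first_array`, the dictionary `slot_of`, and the field names `head_status` and `gender` are assumptions introduced here, not names from the disclosure.

```python
from typing import Dict, List, Tuple


def build_first_array(
    value_lists: Dict[str, List[int]],
) -> Tuple[List[int], Dict[Tuple[str, int], int]]:
    """Create a zero-filled array with one slot per (indicator, value) pair.

    Slots are assigned in the order the indicators and their values are
    listed, so head status values come first and gender values follow.
    """
    slot_of: Dict[Tuple[str, int], int] = {}
    for indicator, values in value_lists.items():
        for value in values:
            slot_of[(indicator, value)] = len(slot_of)
    return [0] * len(slot_of), slot_of


# Hypothetical field names; the value lists follow the example in the text.
res2, slot_of = build_first_array({"head_status": [1, 2, 3], "gender": [1, 2]})
```

With these inputs, `slot_of[("head_status", 1)]` is 0 and `slot_of[("gender", 2)]` is 4, matching the res2[0] to res2[4] layout in the text.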
对于步骤S30,第一统计数组中的多个第一元素分别为每个指标值各自的统计子结果。
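For example, in the above case where the statistical indicator is head status, the indicator values of the head status in Table 1 above are traversed, and the first elements res[0], res[1] and res[2] of the first statistical array res (that is, the array res1 created above) are used to count the occurrences of indicator value 1, indicator value 2 and indicator value 3, respectively, so as to obtain the data statistical result. The data statistical result is, for example, res = [29, 42, 96]; that is, the statistical sub-result of indicator value 1 is 29, the statistical sub-result of indicator value 2 is 42, and the statistical sub-result of indicator value 3 is 96. For the above case where the statistical indicators are head status and gender, step S30 is similar to the case where the statistical indicator is head status, and is not repeated here.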
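For the two-indicator case that the text leaves undescribed, a minimal sketch of the single traversal might look like this. The field names and the four demo records are assumptions made for illustration; they are not the data of Table 1.

```python
from typing import Dict, List

STATUS_VALUES = [1, 2, 3]  # passing by, reaching, paying attention
GENDER_VALUES = [1, 2]     # male, female


def count_status_and_gender(samples: List[Dict[str, int]]) -> List[int]:
    """Fill a five-element array in one pass over the samples."""
    res2 = [0] * (len(STATUS_VALUES) + len(GENDER_VALUES))
    for sample in samples:
        # Slots 0..2 hold head status counts.
        # list.index raises ValueError for a value outside the list.
        res2[STATUS_VALUES.index(sample["head_status"])] += 1
        # Slots 3..4 hold gender counts.
        res2[len(STATUS_VALUES) + GENDER_VALUES.index(sample["gender"])] += 1
    return res2


# Made-up demo records, not taken from the disclosure.
demo = [
    {"head_status": 2, "gender": 2},
    {"head_status": 1, "gender": 1},
    {"head_status": 3, "gender": 2},
    {"head_status": 3, "gender": 1},
]
res2 = count_status_and_gender(demo)
```

Each sample is visited once and updates two slots, so adding a further indicator only lengthens the array rather than adding another pass.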
例如,在得到数据统计结果的情况下,可以对第一统计数组中的元素进行拆分,根据第一统计数组中的元素和元素对应的指标值,生成区间统计结果表。下表二示意性示出了对2021年3月30日15:00到16:00采集到的样本数据进行统计得到的区间统计结果表。
表二
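As shown in Table 2, the first statistical array res2 of the above embodiment is used to obtain the number of people passing by the screen, the number of people reaching the screen, the number of people paying attention to the screen, the number of males, and the number of females between 15:00:00 and 16:00:00 on March 30, 2021.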
在本公开的一些实施例中,例如可以定义统计函数y=f(col,res_list),col为指定字段名称(即,统计指标的名称),res_list为该字段取值列表(即,统计指标的指标值列表),返回值y为长度为res_list大小的数组。例如,运行该统计函数可以执行上述参考图1B中步骤S20和步骤S30。在该实施例中,本领域技术人员或者统计人员只需要输入要统计的统计指标的名称和该统计指标的指标值列表,便可以调用统计函数对统计指标进行统计。因此,该实施例中的数据处理可以实现高复用性。
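A possible shape for the statistical function y = f(col, res_list) is sketched below. The text gives only `col` and `res_list` as parameters; passing the samples in explicitly as a first argument, and skipping values that are not in `res_list`, are assumptions of this sketch.

```python
from typing import Any, Dict, Iterable, List


def f(samples: Iterable[Dict[str, Any]], col: str, res_list: List[Any]) -> List[int]:
    """Return an array of len(res_list) counts for field `col`."""
    position = {value: i for i, value in enumerate(res_list)}
    y = [0] * len(res_list)
    for sample in samples:
        i = position.get(sample.get(col))
        if i is not None:  # values outside res_list are ignored (assumption)
            y[i] += 1
    return y


# Made-up demo records with hypothetical field names.
samples = [
    {"head_status": 1, "gender": 2},
    {"head_status": 3, "gender": 1},
    {"head_status": 3, "gender": 2},
]
status_counts = f(samples, "head_status", [1, 2, 3])
gender_counts = f(samples, "gender", [1, 2])
```

Reusing the function for a different indicator only requires a different field name and value list, which is the reusability the paragraph describes.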
在本公开的一些实施例中,步骤S30可以是遍历统计组内至少一个样本数据的指标值,并利用第一统计数组中的多个元素分别对统计组中各个指标值计数,以获得统计组内各个指标值的数据统计结果。
例如,在统计指标为人头状态的情景中,首先将第一统计数组res中各第一元素初始化为0,然后,响应于获取到样本数据,对该条样本数据的人头状态的指标值进行判断:若该样本数据的人头状态的指标值1,则res[0]+=1,若该样本数据的人头状态的指标值2,则res[1]+=1,以及若该样本数据的人头状态的指标值3,则res[2]+=1,从而获得统计组内的数据统计结果,也即获得统计组内各个指标值的统计子结果。
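In some embodiments of the present disclosure, the sample data may also, for example, first be grouped by a certain sample indicator to obtain grouped logical data blocks. Then, within each grouped logical data block, statistics are performed with some other sample indicators as the statistical indicators. For example, the sample data is first grouped by camera ID to obtain the grouped logical data block corresponding to each camera, and then, for each grouped logical data block, statistics are performed with head status and/or gender as the statistical indicator.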
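Grouping by camera ID and then counting head status within each group could be sketched as below. The field names and the identifier `camera07` are hypothetical; `camera04` is the identifier used in the Table 1 example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def count_by_camera(
    samples: Iterable[Dict[str, object]],
    status_values: Sequence[int] = (1, 2, 3),
) -> Dict[str, List[int]]:
    """Keep one head status array per camera, filled in a single pass."""
    position = {value: i for i, value in enumerate(status_values)}
    per_camera = defaultdict(lambda: [0] * len(status_values))
    for sample in samples:
        per_camera[sample["camera_id"]][position[sample["head_status"]]] += 1
    return dict(per_camera)


# Made-up demo records.
demo = [
    {"camera_id": "camera04", "head_status": 2},
    {"camera_id": "camera04", "head_status": 3},
    {"camera_id": "camera07", "head_status": 1},
]
by_camera = count_by_camera(demo)
```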
在本公开的一些实施例中,步骤S10包括获取多个统计组内每个统计组对应的多个样本数据;以及步骤S30包括针对每个统计组,遍历至少一个样本数据的指标值,并利用第一统计数组对至少一个样本数据进行统计,以获得数据统计结果。
例如,步骤S10可以是获取多个预设时间段中每个预设时间段采集到的样本数据。例如,预设时间段为1小时,以1小时为统计单位,依次获取每1小时的样本数据。例如,当前时刻为2021年3月30日16:32,则获取2021年3月30日15:00到16:00采集到的样本数据,在2021年3月30日15:00之后以及16:00之前的时刻获取14:00到15:00之间采集到的样本数据,依次类推,获取多个预设时间段中每个预设时间段采集到的样本数据。在该实施例中,步骤S10为依次获取小时级的样本数据。
在该实施例中,对于步骤S30,针对每小时的样本数据,遍历至少一个样本数据的指标值,并利用第一统计数组对至少一个样本数据进行统计,以获得该小时的数据统计结果。
在本公开的一些实施例中,例如对于每个预设时间段内的样本数据,都在该预设时间段之后并且与该预设时间段相邻的下一个预设时间段中的某个时刻进行统计。例如,对于2021年3月30日15:00-16:00的样本数据,在16:00-17:00之间的某个时刻进行统计,以得到数据统计结果。
图2示出了本公开至少一个实施例提供的另一种数据处理方法的流程图。
如图2所示,该数据处理方法在包括图1B中的步骤S10~S30的基础上,还可以包括步骤S40~S80。
步骤S40:从不同的指标值中确定至少一个待统计指标值。
例如,从表二的路过屏幕人数、触达屏幕人数、关注屏幕人数、男性人数和女性人数中选择一个或者多个作为待统计指标值。例如,待统计指标值为路过屏幕人数。
步骤S50:获取多个统计区间。
在本公开的一些实施例中,在统计单位内获取到的至少一个样本数据作为一个统计组,每个统计区间包括至少一个统计单位。例如,统计单位可以是预设时间段,或者是预设数量的终端设备。
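In some embodiments of the present disclosure, the ranges of the multiple statistical intervals may increase successively. For example, the multiple statistical intervals include a first statistical interval and a second statistical interval, the range of the second statistical interval is larger than the range of the first statistical interval, and the first statistical interval lies within the second statistical interval.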
又例如,多个统计区间包括第一统计区间、第二统计区间和第三统计区间,第三统计区间的范围大于第二统计区间的范围,并且第二统计区间在第三统计区间中,第二统计区间的范围大于第一统计区间的范围,并且第一统计区间在第二统计区间中。
例如,统计单位可以是预设时间段,每个统计区间可以是至少一个连续的预设时间段。例如,当前时刻为2021-03-30 16:32,多个统计区间可以分别为:当天统计区间[2021-03-30 00:00:00-2021-03-30 16:00:00),7天统计区间[2021-03-23 16:00:00-2021-03-30 16:00:00),以及30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)。也即,多个统计区间分别代表不同的时间维度。
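The three time dimensions in this example can be derived from the current time, for instance as in the following sketch. The function name `build_intervals` is an assumption, and each interval is treated as closed on the left and open on the right, as the bracket notation in the text suggests.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def build_intervals(now: datetime) -> List[Tuple[datetime, datetime]]:
    """Return the same-day, 7-day and 30-day intervals as [start, end) pairs."""
    # All intervals end at the start of the current hour.
    end = now.replace(minute=0, second=0, microsecond=0)
    start_of_day = end.replace(hour=0)
    return [
        (start_of_day, end),
        (end - timedelta(days=7), end),
        (end - timedelta(days=30), end),
    ]


intervals = build_intervals(datetime(2021, 3, 30, 16, 32))
```

For the current time 2021-03-30 16:32 this yields start times of 2021-03-30 00:00, 2021-03-23 16:00 and 2021-02-28 16:00, all ending at 2021-03-30 16:00, which agrees with the intervals listed in the text.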
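For another example, the statistical unit is one terminal device, and the multiple statistical intervals may be, for example, a certain terminal device, all terminal devices at the station containing that terminal device, and all terminal devices on the line where that station is located.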
步骤S60:为多个统计区间建立第二统计数组,第二统计数组包括多个第二元素,多个第二元素与多个统计区间一一对应。
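For example, if the number of statistical intervals is N, where N is an integer greater than or equal to 2, the number of second elements in the second statistical array may also be N, so that the multiple second elements correspond one-to-one to the multiple statistical intervals.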
例如,统计区间的数量为3个,第二统计数组ges可以包括3个元素,分别为ges[0]、ges[1]和ges[2]。例如,ges[0]与[2021-03-30 00:00:00-2021-03-30 16:00:00)对应,ges[1]与[2021-03-23 16:00:00-2021-03-30 16:00:00)对应,以及ges[2]与[2021-02-28 16:00:00-2021-03-30 16:00:00)对应。
又例如,待统计指标值为两个,分别为路过屏幕人数和触达屏幕人数,以及统计区间的数量为3个,则第二统计数组ges可以包括6个元素,分别为ges[0]、ges[1]、ges[2]、ges[3]、ges[4]和ges[5]。例如,ges[0]、ges[1]和ges[2]分别用于对路过屏幕人数的不同时间维度进行统计,ges[3]、ges[4]和ges[5]分别用于对触达屏幕人数的不同时间维度进行统计。
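One way to lay out the six-element second statistical array is to place the slots of each indicator value to be counted next to each other, as in this sketch. The helper `ges_slot` and the names `num_pass` and `num_reach` in `targets` are illustrative assumptions.

```python
def ges_slot(target_idx: int, interval_idx: int, n_intervals: int) -> int:
    """Map (indicator value to be counted, statistical interval) to a slot."""
    return target_idx * n_intervals + interval_idx


targets = ["num_pass", "num_reach"]  # hypothetical names for the two counts
n_intervals = 3                      # same-day, 7-day, 30-day
ges = [0] * (len(targets) * n_intervals)
```

Under this layout ges[0] to ges[2] belong to the passing-by count and ges[3] to ges[5] to the reaching count, as described in the text.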
步骤S70:从每个统计组内的数据统计结果中筛选属于多个统计区间的区间统计结果。
例如,从数据统计结果中筛选出最大统计区间范围内的数据统计结果。例如在上述多个统计区间分别为当天统计区间、7天统计区间和30天统计区间的情景中,可以从数据统计结果中筛选出30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)内的区间统计结果。
步骤S80:利用多个第二元素分别对区间统计结果中的待统计指标值进行统计,以获得每个统计区间的待统计指标值的指标值统计结果。
例如,利用ges[0]对[2021-03-30 00:00:00-2021-03-30 16:00:00)中的待统计指标值1(即,路过屏幕人数)进行统计,ges[1]对[2021-03-23 16:00:00-2021-03-30 16:00:00)中的路过屏幕人数进行统计,以及ges[2]对[2021-02-28 16:00:00-2021-03-30 16:00:00)中的路过屏幕人数进行统计,从而分别获取在[2021-03-30 00:00:00-2021-03-30 16:00:00)中的路过屏幕人数,[2021-03-23 16:00:00-2021-03-30 16:00:00)中的路过屏幕人数,以及[2021-02-28 16:00:00-2021-03-30 16:00:00)中的路过屏幕人数。
图2所示的方法通过一次数据处理便可实现不同维度的数据统计,实现了高并行度的计算,从而提高了数据计算的效率,并且只需要更新待统计指标值,便可以实现将该方法应用到对更新后的待统计指标值进行统计,因此该方法还具有高可复用性。通过这种两段式(第一阶段是利用图1B所示的方法得到数据统计结果,第二阶段利用图2所示的方法得到指标值统计结果)处理方法可以减少数据冗余,实现数据规整,同时有效避免瞬时大吞吐数据造成的计算设备的性能问题。
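In some embodiments of the present disclosure, the number of second elements included in the second statistical array and the number of first elements included in the first statistical array may be extended dynamically according to actual needs. For example, the statistical dimensions of the second statistical array may be extended dynamically through parameter configuration according to actual requirements.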
图3示出了本公开至少一个实施例提供的图2中步骤S80的方法流程图。
As shown in Fig. 3, the method may include steps S81 to S83.
步骤S81:针对每个区间统计结果,确定区间统计结果对应的统计组所属的统计区间。
例如,对于表二所示的区间统计结果,该区间统计结果对应的统计组的预设时间段为[2021-03-30 15:00:00-2021-03-30 16:00:00),而该预设时间段所属的统计区间不仅包括当天统计区间[2021-03-30 00:00:00-2021-03-30 16:00:00),还包括7天统计区间[2021-03-23 16:00:00-2021-03-30 16:00:00),以及30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)。
步骤S82:从区间统计结果中提取待统计指标值的统计子结果。
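For example, the indicator value to be counted is the number of people passing by, and the statistical sub-result of the number of people passing by is extracted from each interval statistical result. For example, from the interval statistical result shown in Table 2, the number of people passing by in the preset time period [2021-03-30 15:00:00-2021-03-30 16:00:00) is extracted as 29.
Step S83: accumulate the statistical sub-result onto the second element corresponding to the statistical interval to which the statistical group of the interval statistical result belongs, so as to obtain the indicator-value statistical result of the indicator value to be counted for each statistical interval.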
例如,将29分别累加到当天统计区间[2021-03-30 00:00:00-2021-03-30 16:00:00),7天统计区间[2021-03-23 16:00:00-2021-03-30 16:00:00),以及30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)中,从而获得当天统计区间[2021-03-30 00:00:00-2021-03-30 16:00:00),7天统计区间[2021-03-23 16:00:00-2021-03-30 16:00:00),以及30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)各自的路过人数的统计结果。
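For example, for the 3 statistical intervals, the second statistical array ges is an array containing 3 second elements. In step S83, the elements of ges may first be initialized to 0, and then the time of each interval statistical result is checked: in response to the time of the interval statistical result being within [2021-03-30 00:00:00-2021-03-30 16:00:00), ges[0]+=num_pass, ges[1]+=num_pass, ges[2]+=num_pass; otherwise, in response to the time of the interval statistical result being within [2021-03-23 16:00:00-2021-03-30 16:00:00), ges[1]+=num_pass, ges[2]+=num_pass; otherwise, in response to the time of the interval statistical result being within [2021-02-28 16:00:00-2021-03-30 16:00:00), ges[2]+=num_pass. Here num_pass is the number of people passing by the screen recorded in that interval statistical result.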
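The accumulation of step S83 can be sketched with a membership test per interval, which gives the same result as the nested conditions because the intervals are nested. The field name `hour_start` and the second and third demo rows are made up for illustration; only the value 29 comes from the Table 2 example.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def accumulate(
    rows: List[Dict[str, object]],
    intervals: List[Tuple[datetime, datetime]],
    target: str = "num_pass",
) -> List[int]:
    """Add each hourly sub-result to every interval that contains its hour."""
    ges = [0] * len(intervals)
    for row in rows:
        for i, (start, end) in enumerate(intervals):
            if start <= row["hour_start"] < end:  # [start, end)
                ges[i] += row[target]
    return ges


end = datetime(2021, 3, 30, 16)
intervals = [
    (datetime(2021, 3, 30, 0), end),   # same day
    (datetime(2021, 3, 23, 16), end),  # 7 days
    (datetime(2021, 2, 28, 16), end),  # 30 days
]
rows = [
    {"hour_start": datetime(2021, 3, 30, 15), "num_pass": 29},
    {"hour_start": datetime(2021, 3, 25, 9), "num_pass": 10},  # made-up row
    {"hour_start": datetime(2021, 3, 1, 12), "num_pass": 5},   # made-up row
]
ges = accumulate(rows, intervals)
```

With these rows, ges becomes [29, 39, 44]: the 15:00 result contributes to all three intervals, the 25 March result to the 7-day and 30-day intervals, and the 1 March result to the 30-day interval only.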
图4示出了本公开至少一个实施例提供的另一种数据处理方法的流程图。
如图4所示,该数据处理方法在包括图1B中的步骤S10~S30的基础上,还可以包括步骤S410~S420。例如,步骤S410和步骤S420在步骤S10之前执行。
步骤S410:接收来自数据源的初始数据。
例如,在图1A所示的系统架构中,大数据后台103接收来自业务端102的初始数据。
例如,在地铁场景中,显示屏可以包括摄像头,摄像头将采集到的初始数据传输到业务端102,然后业务端102将来自摄像头的初始数据传输到大数据后台103,以将数据处理下沉至大数据后台103,以降低前端可视化大屏和业务端的计算和响应时间的压力。
步骤S420:根据初始数据,建立至少一个样本数据。
例如,大数据后台103根据初始数据,建立至少一个样本数据。
在本公开的一些实施例中,初始数据包括统计属性信息。统计属性信息作为用于存储初始数据的信息。例如,初始数据按照时间进行存储,则统计属性信息可以是表一中的事件时间,或者,初始数据按照摄像头ID存储,则统计属性信息可以是摄像头ID等。
例如,根据初始数据中事件时间所属的预设时间段,确定存储初始数据的存储文件,每个预设时间段内的初始数据存储到相同的存储文件中。或者,根据初始数据中摄像头ID,确定存储初始数据的存储文件,每个摄像头采集到的初始数据存储到相同的存储文件中。
图5示出了本公开至少一个实施例提供的图4中步骤S420的方法流程图。
如图5所示,该方法可以包括步骤S421~S423。
步骤S421:根据统计属性信息,确定是否存在用于存储初始数据的存储文件。
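For example, the statistical attribute information is the event time, and whether a storage file for storing the initial data exists is determined according to the event time of the initial data. For example, the initial data is stored into a storage file under a file path following the path structure "…/business topic/load_date=YYYY-MM-dd/load_hour=HH". If a storage file exists under the file path corresponding to the event time, a storage file for storing the initial data exists; if no storage file exists under the file path corresponding to the event time, no storage file for storing the initial data exists.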
在本公开的一些实施例中,属于同一统计组的初始数据存储到同一存储文件中。
例如,属于同一预设时间段的初始数据存储到同一存储文件中。
例如,初始数据的统计属性信息为2021年3月30日15:32,2021年3月30日15:32所属的预设时间段为2021年3月30日15:00-2021年3月30日16:00,则存储该初始数据的存储文件的文件路径为“topic_name/load_date=2021-03-30/load_hour=15”。在步骤S421,判断文件路径topic_name/load_date=2021-03-30/load_hour=15中是否存在存储文件。
步骤S422:响应于存在用于存储初始数据的存储文件,将初始数据存储到存储文件中,以作为至少一个样本数据使用。
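For example, if a storage file exists under the file path topic_name/load_date=2021-03-30/load_hour=15, the initial data is stored into the storage file under the file path topic_name/load_date=2021-03-30/load_hour=15, so that the initial data serves as sample data.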
步骤S423:响应于不存在用于存储初始数据的存储文件,根据统计属性信息确定初始数据所属的统计组,并且根据统计组生成文件路径,并且在文件路径中创建存储文件,存储文件用于存储初始数据,以将初始数据作为所述至少一个样本数据使用。
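For example, if no storage file exists under the file path topic_name/load_date=2021-03-30/load_hour=15, the statistical group to which the initial data belongs is determined, according to the statistical attribute information 15:32 on March 30, 2021, to be the preset time period from 15:00 to 16:00 on March 30, 2021. Then, the file path topic_name/load_date=2021-03-30/load_hour=15 is generated according to this statistical group, and a storage file is created under that file path to store the initial data whose event time falls within the preset time period from 15:00 to 16:00 on March 30, 2021, so that the initial data is used as sample data.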
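Steps S421 to S423 can be sketched on a local file system as follows. This is only a sketch under assumptions: the helper names, the file name `part-0.log`, the line-per-record format, and the use of a temporary directory as the storage root are all hypothetical; the disclosure only fixes the `load_date=…/load_hour=…` path structure.

```python
import os
import tempfile
from datetime import datetime


def partition_dir(root: str, topic: str, event_time: datetime) -> str:
    """Build the hourly directory for the group the event time belongs to."""
    return os.path.join(
        root, topic, f"load_date={event_time:%Y-%m-%d}", f"load_hour={event_time:%H}"
    )


def store_record(root: str, topic: str, event_time: datetime, line: str,
                 file_name: str = "part-0.log") -> str:
    """Append a record to its group's storage file, creating it if absent."""
    directory = partition_dir(root, topic, event_time)
    path = os.path.join(directory, file_name)
    if not os.path.exists(path):
        # No storage file yet: create the directory for this group first.
        os.makedirs(directory, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:  # mode "a" creates the file
        fh.write(line + "\n")
    return path


root = tempfile.mkdtemp()
p1 = store_record(root, "topic_name", datetime(2021, 3, 30, 15, 32), "record-1")
p2 = store_record(root, "topic_name", datetime(2021, 3, 30, 15, 47), "record-2")
```

Both records fall in the 15:00 to 16:00 group, so they end up in the same storage file.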
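According to the above embodiments, initial data belonging to the same statistical group is stored in the same storage file, so that the initial data is already partitioned at the time it is stored, which makes subsequent lookup of the initial data easier and further improves statistical efficiency.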
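In some embodiments of the present disclosure, for example, a Kafka distributed log system may be used to store the initial data from the service end to the big data backend 103 according to the above path structure (for example, the data may be stored in a local memory of the big data backend 103, or in another storage device associated with the big data backend 103), so that the big data backend 103 can perform statistical analysis on the sample data. In this embodiment, step S10 may be acquiring the at least one sample data from the storage files persisted via the Kafka distributed log system.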
在本公开的另一些实施例中,初始数据也可以是存储到数据库中,在该实施例中,步骤S10可以是从数据库中获取至少一个样本数据。
Fig. 6 shows a flowchart of step S10 in Fig. 1B provided by at least one embodiment of the present disclosure.
如图6所示,该方法可以包括步骤S11~步骤S13。
步骤S11:生成统计组对应的存储文件的文件路径。
例如,当前时刻为2021年3月30日16:32,则获取2021年3月30日15:00到16:00统计组内的样本数据,根据上述图5描述的存储规则,存储2021年3月30日15:00到16:00统计组内的样本数据的存储文件的文件路径为topic_name/load_date=2021-03-30/load_hour=15。
步骤S12:判断是否存在文件路径。
例如,判断是否存在文件路径topic_name/load_date=2021-03-30/load_hour=15。
步骤S13:响应于存在文件路径,从文件路径中的存储文件中获取统计组内的初始数据以作为至少一个样本数据。
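For example, in response to the file path existing, the initial data within the statistical group is acquired from the storage file under the file path as the at least one sample data.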
在本公开的一些实施例中,响应于不存在文件路径,则继续获取下一个统计组的样本数据或执行后续数据处理步骤。
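Steps S11 to S13, together with the skip-on-missing behaviour, might be sketched as below. The function name `load_group`, the line-per-record file format, and the demo directory built with `tempfile` are assumptions for illustration.

```python
import os
import tempfile
from datetime import datetime
from typing import List, Optional


def load_group(root: str, topic: str, group_start: datetime) -> Optional[List[str]]:
    """Return the records of one hourly group, or None if its path is absent."""
    # S11: generate the file path of the statistical group.
    directory = os.path.join(
        root, topic, f"load_date={group_start:%Y-%m-%d}", f"load_hour={group_start:%H}"
    )
    # S12: check whether the file path exists.
    if not os.path.isdir(directory):
        return None  # caller moves on to the next group
    # S13: read the stored initial data as sample data.
    records: List[str] = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), encoding="utf-8") as fh:
            records.extend(line.rstrip("\n") for line in fh)
    return records


# Build a small demo partition in a temporary directory.
root = tempfile.mkdtemp()
hour_dir = os.path.join(root, "topic_name", "load_date=2021-03-30", "load_hour=15")
os.makedirs(hour_dir)
with open(os.path.join(hour_dir, "part-0.log"), "w", encoding="utf-8") as fh:
    fh.write("record-1\nrecord-2\n")

found = load_group(root, "topic_name", datetime(2021, 3, 30, 15))
missing = load_group(root, "topic_name", datetime(2021, 3, 30, 14))
```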
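In some embodiments of the present disclosure, the data processing method is applied to multiple electronic devices. The multiple electronic devices form, for example, the big data backend 103 shown in Fig. 1A; that is, the multiple electronic devices form a server cluster. The at least one sample data includes multiple sample data groups, the multiple electronic devices correspond one-to-one to the multiple sample data groups, and the multiple electronic devices are configured to perform statistics based on their respective sample data groups and to add the resulting values together to obtain the data statistical result. The multiple sample data groups may refer, for example, to sample data stored on different electronic devices.
For example, part of the at least one sample data in the statistical group for 15:00 to 16:00 on March 30, 2021 is persisted to a first server among the multiple electronic devices, and this part of the sample data forms one sample data group; the rest of the sample data in that statistical group is persisted to a second server among the multiple electronic devices, and this remaining sample data forms another sample data group. The first server performs statistics on the part of the sample data stored on the first server to obtain a first data statistical result res(a), and the second server performs statistics on the remaining sample data stored on the second server to obtain a second data statistical result res(b). Next, the corresponding first elements of res(a) and res(b) are added to obtain the data statistical result res of the statistical group for 15:00 to 16:00 on March 30, 2021. For example, res(a)[0]+res(b)[0]=res[0], res(a)[1]+res(b)[1]=res[1], and res(a)[2]+res(b)[2]=res[2].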
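The element-wise addition of per-server results can be sketched as follows. The function name `merge_results` and the two partial arrays are made up for illustration; they are merely chosen so that the sum equals the [29, 42, 96] example used earlier.

```python
from typing import List


def merge_results(*partials: List[int]) -> List[int]:
    """Add arrays from different servers slot by slot."""
    if len({len(p) for p in partials}) > 1:
        raise ValueError("all partial results must have the same length")
    return [sum(column) for column in zip(*partials)]


res_a = [10, 20, 30]  # hypothetical result from the first server
res_b = [19, 22, 66]  # hypothetical result from the second server
res = merge_results(res_a, res_b)
```

Because addition is associative and commutative, the same function works for any number of servers and in any arrival order.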
图7A示出了本公开至少一实施例提供的另一数据处理方法的流程图。
如图7A所示,该数据处理方法包括步骤S701~步骤S707。在该实施例中以预设时间段为1小时,也即,统计单位为1小时为例说明该数据处理方法。
步骤S701:根据当前时间生成上一小时文件路径。
例如,对于每个预设时间段内的样本数据,都在该预设时间段之后并且与该预设时间段相邻的下一个预设时间段中的某个时刻进行统计。例如,对于2021年3月30日15:00-16:00的样本数据,在16:00-17:00之间的某个时刻进行统计,以得到数据统计结果。因此,例如当前时刻为2021年3月30日16:32,则根据当前时间生成上一小时的文件路径,即,生成2021年3月30日15:00-16:00的样本数据的文件路径。根据上述路径结构,生成的文件路径例如为topic_name/load_date=2021-03-30/load_hour=15。
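Deriving the previous hour's file path from the current time in step S701 could look like the following sketch. The function name and the default topic `topic_name` follow the example path in the text; treating the path as a plain string with "/" separators is an assumption.

```python
from datetime import datetime, timedelta


def previous_hour_path(now: datetime, topic: str = "topic_name") -> str:
    """Return the path of the hourly group that ended most recently."""
    hour_start = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    return f"{topic}/load_date={hour_start:%Y-%m-%d}/load_hour={hour_start:%H}"


path = previous_hour_path(datetime(2021, 3, 30, 16, 32))
```

Subtracting a full hour from the truncated timestamp also handles the day boundary: shortly after midnight the path points at hour 23 of the previous date.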
该步骤S701与图6中的步骤S11类似。
步骤S702:判断文件路径是否存在。
例如,判断topic_name/load_date=2021-03-30/load_hour=15是否存在。
若文件路径topic_name/load_date=2021-03-30/load_hour=15存在,则执行步骤S703。若文件路径topic_name/load_date=2021-03-30/load_hour=15不存在,则执行步骤S705。
Step S702 is similar to step S12 in Fig. 6.
步骤S703:对上一小时样本数据聚合预处理。
例如,对文件路径topic_name/load_date=2021-03-30/load_hour=15中的样本数据进行聚合预处理。聚合预处理例如可以是执行图1B中步骤S20和步骤S30,以获取对样本数据的统计指标的指标值进行统计的数据统计结果。
上述步骤S701~步骤S703可以得到小时级的数据统计结果。
步骤S704:将小时级的数据统计结果存储到数据仓库。例如,按照上文表二的形式将数据统计结果存储到数据仓库。
步骤S705:判断统计区间内是否有数据记录。
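For example, there are multiple statistical intervals whose ranges may increase successively, and step S705 may then be determining whether there are data records within the statistical interval with the largest range among the multiple statistical intervals. For example, it is determined whether the data warehouse contains data statistical results belonging to the statistical interval with the largest range.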
若范围最大的统计区间内有数据记录,则可以从每个统计组内的数据统计结果中筛选属于范围最大的统计区间内的区间统计结果,并且执行步骤S706。
若范围最大的统计区间内没有数据记录,则返回无数据记录的提示信息,至此该数据处理方法结束。
步骤S706:对区间统计结果按照时间维度分级聚合并行化处理。
例如,执行上文参考图3描述的步骤S81~S83,对区间统计结果按照时间维度分级聚合并行化处理,以获得各个时间维度的指标值统计结果。时间维度例如包括:当天统计区间[2021-03-30 00:00:00-2021-03-30 16:00:00),7天统计区间[2021-03-23 16:00:00-2021-03-30 16:00:00),以及30天统计区间[2021-02-28 16:00:00-2021-03-30 16:00:00)。
步骤S707:将指标值统计结果插入汇总数据表。
例如,将待统计指标值的指标值统计结果插入汇总数据表。例如,汇总数据表中可以包括不同待统计指标值各自的指标值统计结果。将各个待统计指标值的指标值统计结果汇总到汇总数据表中可以便于统计人员进行比较和分析。
图7B示出了本公开至少一实施例提供的另一数据处理方法的示意图。
如图7B所示,该数据处理方法包括步骤S710~步骤S730。步骤S710例如是第一服务器执行的,步骤S720例如是第二服务器执行的。
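Step S710: the first server processes the sample data in the first sample data group of the statistical group according to the method shown in Fig. 1B. For example, the first statistical array res is used to count the indicator values of the statistical indicator of the sample data in the first sample data group. For example, if the indicator value list of the statistical indicator contains 3 indicator values, the first statistical array res may contain 3 first elements, namely res[0], res[1] and res[2], and the 3 first elements are used to count the 3 indicator values respectively to obtain the first data statistical result, that is, the statistical sub-result of each indicator value in the first sample data group. For example, before the first statistical array res is used to count the indicator values of the statistical indicator of the sample data in the first sample data group, the first statistical array res is initialized so that each first element is 0.
Step S720: the second server processes the sample data in the second sample data group of the statistical group according to the method shown in Fig. 1B. For example, the first statistical array res is used to count the indicator values of the statistical indicator of the sample data in the second sample data group to obtain the second data statistical result. For example, before the first statistical array res is used to count the indicator values of the statistical indicator of the sample data in the second sample data group, the first statistical array res is initialized so that each first element is 0. It should be noted that the second server processes data in essentially the same way as the first server; the difference lies in the specific sample data each one handles: the second server processes the second sample data group, and the first server processes the first sample data group.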
步骤S710和步骤S720的统计组为同一统计组,例如都是2021年3月30日15:00-16:00获得的样本数据所组成的统计组。同一统计组的样本数据可能会落入到不同的服务器中存储,因此,对于同一统计组的样本数据统计,需要将不同服务器中属于该同一统计组的样本数据的统计结果进行计算而得到该同一统计组的数据统计结果。
步骤S730:将第一数据统计结果和第二数据统计结果中对应第一元素相加得到数据统计结果700。例如,第一服务器统计得到的res[0]和第二服务器统计得到的res[0]相加得到数据统计结果700的res[0],类似地,第一服务器统计得到的res[1]和第二服务器统计得到的res[1]相加得到数据统计结果700的res[1],第一服务器统计得到的res[2]和第二服务器统计得到的res[2]相加得到数据统计结果700的res[2]。
图8示出了本公开至少一个实施例提供的一种数据处理装置800的示意框图。
例如,如图8所示,该数据处理装置800包括样本获取单元810、数组创建单元820和遍历单元830。
样本获取单元810配置为获取统计组内的至少一个样本数据,每个样本数据包括统计指标和统计指标的指标值。
样本获取单元810例如可以执行图1B描述的步骤S10。
数组创建单元820配置为创建对应于所述统计指标的第一统计数组,所述第一统计数组包括多个第一元素,所述多个第一元素分别用于对不同的指标值进行统计。
数组创建单元820例如可以执行图1B描述的步骤S20。
遍历单元830配置为遍历所述至少一个样本数据的指标值,并利用所述第一统计数组对所述至少一个样本数据进行统计,以获得数据统计结果,其中,第一统计数组中的多个第一元素分别为每个指标值各自的统计子结果。
遍历单元830例如可以执行图1B描述的步骤S30。
例如,样本获取单元810、数组创建单元820和遍历单元830可以为硬件、软件、固件以及它们的任意可行的组合。例如,样本获取单元810、数组创建单元820和遍历单元830可以为专用或通用的电路、芯片或装置等,也可以为处理器和存储器的结合。关于上述各个单元的具体实现形式,本公开的实施例对此不作限制。
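It should be noted that, in the embodiments of the present disclosure, the units of the data processing apparatus 800 correspond to the steps of the aforementioned data processing method. For the specific functions of the data processing apparatus 800, reference may be made to the description of the data processing method, which is not repeated here. The components and structure of the data processing apparatus 800 shown in Fig. 8 are merely exemplary and not limiting, and the data processing apparatus 800 may further include other components and structures as needed.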
本公开的至少一个实施例还提供了一种电子设备,该电子设备包括处理器和存储器,存储器包括一个或多个计算机程序模块。一个或多个计算机程序模块被存储在存储器中并被配置为由处理器执行,一个或多个计算机程序模块包括用于实现上述的数据处理方法的指令。该电子设备可以利用数组通过一次数据处理便可实现对多个指标值的并行统计,提高了统计效率,并且具有高可复用性。
图9为本公开一些实施例提供的一种电子设备的示意框图。如图9所示,该电子设备900包括处理器910和存储器920。存储器920用于存储非暂时性计算机可读指令(例如一个或多个计算机程序模块)。处理器910用于运行非暂时性计算机可读指令,非暂时性计算机可读指令被处理器910运行时可以执行上文所述的数据处理方法中的一个或多个步骤。存储器920和处理器910可以通过总线系统和/或其它形式的连接机构(未示出)互连。
例如,处理器910可以是中央处理单元(CPU)、图形处理单元(GPU)或者具有数据处理能力和/或程序执行能力的其它形式的处理单元。例如,中央处理单元(CPU)可以为X86或ARM架构等。处理器910可以为通用处理器或专用处理器,可以控制电子设备900中的其它组件以执行期望的功能。
例如,存储器920可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序模块,处理器910可以运行一个或多个计算机程序模块,以实现电子设备900的各种功能。在计算机可读存储介质中还可以存储各种应用程序和各种数据以及应用程序使用和/或产生的各种数据等。
需要说明的是,本公开的实施例中,电子设备900的具体功能和技术效果可以参考上文中关于数据处理方法的描述,此处不再赘述。
图10为本公开一些实施例提供的另一种电子设备的示意框图。该电子设备1000例如适于用来实施本公开实施例提供的数据处理方法。电子设备1000可以是终端设备等。需要注意的是,图10示出的电子设备1000仅仅是一个示例,其不会对本公开实施例的功能和使用范围带来任何限制。
如图10所示,电子设备1000可以包括处理装置(例如中央处理器、图形处理器等)1010,其可以根据存储在只读存储器(ROM)1020中的程序或者从存储装置1080加载到随机访问存储器(RAM)1030中的程序而执行各种适当的动作和处理。在RAM 1030中,还存储有电子设备1000操作所需的各种程序和数据。处理装置1010、ROM 1020以及RAM1030通过总线1040彼此相连。输入/输出(I/O)接口1050也连接至总线1040。
通常,以下装置可以连接至I/O接口1050:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置1060;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置1070;包括例如磁带、硬盘等的存储装置1080;以及通信装置1090。通信装置1090可以允许电子设备1000与其他电子设备进行无线或有线通信以交换数据。虽然图10示出了具有各种装置的电子设备1000,但应理解的是,并不要求实施或具备所有示出的装置,电子设备1000可以替代地实施或具备更多或更少的装置。
例如,根据本公开的实施例,上述数据处理方法可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包括用于执行上述数据处理方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置1090从网络上被下载和安装,或者从存储装置1080安装,或者从ROM 1020安装。在该计算机程序被处理装置1010执行时,可以实现本公开实施例提供的数据处理方法中限定的功能。
本公开的至少一个实施例还提供了一种计算机可读存储介质,该计算机可读存储介质用于存储非暂时性计算机可读指令,当非暂时性计算机可读指令由计算机执行时可以实现上述的数据处理方法。利用该计算机可读存储介质,可以利用数组通过一次数据处理便可实现对多个指标值的并行统计,提高了统计效率,并且具有高可复用性。
图11为本公开一些实施例提供的一种存储介质的示意图。如图11所示,存储介质1100用于存储非暂时性计算机可读指令1110。例如,当非暂时性计算机可读指令1110由计算机执行时可以执行根据上文所述的数据处理方法中的一个或多个步骤。
例如,该存储介质1100可以应用于上述电子设备900中。例如,存储介质1100可以为图9所示的电子设备900中的存储器920。例如,关于存储介质1100的相关说明可以参考图9所示的电子设备900中的存储器920的相应描述,此处不再赘述。
有以下几点需要说明:
(1)本公开实施例附图只涉及到本公开实施例涉及到的结构,其他结构可参考通常设计。
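(2) Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another to obtain new embodiments.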
以上所述,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,本公开的保护范围应以所述权利要求的保护范围为准。
Claims (14)
- 一种数据处理方法,包括:获取统计组内的至少一个样本数据,其中,每个样本数据包括统计指标和所述统计指标的指标值;创建对应于所述统计指标的第一统计数组,其中,所述第一统计数组包括多个第一元素,所述多个第一元素分别用于对不同的指标值进行统计;以及遍历所述至少一个样本数据的指标值,并利用所述第一统计数组对所述至少一个样本数据进行统计,以获得数据统计结果,其中,所述第一统计数组中的多个第一元素分别为每个指标值各自的统计子结果。
- 根据权利要求1所述的方法,其中,获取所述统计组内的至少一个样本数据,包括:获取多个统计组内每个统计组对应的多个样本数据;遍历所述至少一个样本数据的指标值,并利用所述第一统计数组对所述至少一个样本数据进行统计,以获得所述数据统计结果,包括:针对每个统计组,遍历所述至少一个样本数据的指标值,并利用所述第一统计数组对所述至少一个样本数据进行统计,以获得所述数据统计结果。
- 根据权利要求2所述的方法,其中,在统计单位内获取到的至少一个样本数据作为一个所述统计组,所述方法还包括:从所述不同的指标值中确定至少一个待统计指标值;获取多个统计区间,其中,每个统计区间包括至少一个所述统计单位;为所述多个统计区间建立第二统计数组,其中,所述第二统计数组包括多个第二元素,所述多个第二元素与所述多个统计区间一一对应;以及从所述每个统计组内的数据统计结果中筛选属于所述多个统计区间的区间统计结果;利用所述多个第二元素分别对所述区间统计结果中的所述待统计指标值进行统计,以获得每个统计区间的所述待统计指标值的指标值统计结果。
- 根据权利要求3所述的方法,其中,利用所述多个第二元素分别对所述区间统计结果中的所述待统计指标值进行统计,以获得每个统计区间的所述待统计指标值的指标值统计结果,包括:针对每个所述区间统计结果,确定所述区间统计结果对应的统计组所属的统计区间;从所述区间统计结果中提取所述待统计指标值的统计子结果;以及将所述统计子结果累加到与所述区间统计结果对应的统计组所属的统计区间相对应的第二元素上,以获得每个统计区间的所述待统计指标值的指标值统计结果。
- 根据权利要求2-4任一项所述的方法,其中,遍历所述统计组内至少一个样本数据的指标值,并利用所述第一统计数组对所述统计组中所述至少一个样本数据进行统计,以获得所述统计组内的数据统计结果,包括:遍历所述统计组内至少一个样本数据的指标值,并利用所述第一统计数组中的多个元素分别对所述统计组中各个指标值计数,以获得所述统计组的数据统计结果。
- 根据权利要求3或4所述的方法,其中,所述多个统计区间包括第一统计区间和第二统计区间,所述第二统计区间的范围大于所述第一统计区间的范围,并且所述第一统计区间在所述第二统计区间中。
- 根据权利要求1-6任一项所述的方法,还包括:接收来自数据源的初始数据;以及根据所述初始数据,建立所述至少一个样本数据。
- 根据权利要求7所述的方法,其中,所述初始数据包括统计属性信息,根据所述初始数据,建立所述至少一个样本数据,包括:根据所述统计属性信息,确定是否存在用于存储所述初始数据的存储文件;响应于存在用于存储所述初始数据的存储文件,将所述初始数据存储到所述存储文件中,以作为所述至少一个样本数据使用;响应于不存在用于存储所述初始数据的存储文件,根据所述统计属性信息确定所述初始数据所属的统计组,并且根据所述统计组生成文件路径,并且在所述文件路径中创建所述存储文件,其中,所述存储文件用于存储所述初始数据,以将所述初始数据作为所述至少一个样本数据使用,其中,属于同一统计组的初始数据存储到同一存储文件中。
- 根据权利要求8所述的方法,其中,获取所述统计组内的至少一个样本数据,包括:生成所述统计组对应的存储文件的文件路径;判断是否存在所述文件路径;以及响应于存在所述文件路径,从所述文件路径中的存储文件中获取所述统计组内的初始数据以作为所述至少一个样本数据。
- 根据权利要求3或4所述的方法,其中,所述统计单位包括预设时间段或者预设数量的至少一个终端设备。
- 根据权利要求1~10任一项所述的方法,其中,所述数据处理方法应用于多个电子设备,所述至少一个样本数据包括多个样本数据组,所述多个电子设备与所述多个样本数据组一一对应,所述多个电子设备配置为分别基于对应的样本数据组进行统计,并将统计的数值相加以得到所述数据统计结果。
- 一种数据处理装置,包括:样本获取单元,配置为获取统计组内的至少一个样本数据,其中,每个样本数据包括统计指标和统计指标的指标值;数组创建单元,配置为创建对应于所述统计指标的第一统计数组,其中,所述第一统计数组包括多个第一元素,所述多个第一元素分别用于对不同的指标值进行统计;以及遍历单元,配置为遍历所述至少一个样本数据的指标值,并利用所述第一统计数组对所述至少一个样本数据进行统计,以获得数据统计结果,其中,所述第一统计数组中的多个第一元素分别为每个指标值各自的统计子结果。
- 一种电子设备,包括:处理器;存储器,包括一个或多个计算机程序指令;其中,所述一个或多个计算机程序指令被存储在所述存储器中,并由所述处理器执行时实现权利要求1-11任一项所述的数据处理方法的指令。
- 一种计算机可读存储介质,非暂时性存储有计算机可读指令,其中,当所述计算机可读指令由处理器执行时实现权利要求1-11任一项所述的数据处理方法。
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/908,271 US20230334068A1 (en) | 2021-08-20 | 2021-08-20 | Data processing method and apparatus thereof, electronic device, and computer-readable storage medium |
CN202180002237.1A CN115997203A (zh) | 2021-08-20 | 2021-08-20 | 数据处理方法、装置、电子设备和计算机可读存储介质 |
PCT/CN2021/113809 WO2023019560A1 (zh) | 2021-08-20 | 2021-08-20 | 数据处理方法、装置、电子设备和计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023019560A1 true WO2023019560A1 (zh) | 2023-02-23 |
Family
ID=85239397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/113809 WO2023019560A1 (zh) | 2021-08-20 | 2021-08-20 | 数据处理方法、装置、电子设备和计算机可读存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230334068A1 (zh) |
CN (1) | CN115997203A (zh) |
WO (1) | WO2023019560A1 (zh) |
Also Published As
Publication number | Publication date |
---|---|
US20230334068A1 (en) | 2023-10-19 |
CN115997203A (zh) | 2023-04-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21953801 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 12/06/2024) |