CN108090158B - Data processing method and data processing system

Info

Publication number: CN108090158B
Application number: CN201711322010.7A
Authority: CN
Inventors: 彭佳
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2021-02-02
Anticipated expiration: 2037-12-12
Also published as: CN108090158A

Abstract

The invention discloses a data processing method and a data processing system. The method comprises: aggregating collected user behavior data according to a set service policy to form a plurality of specification data, wherein each specification data comprises first dimension data, second dimension data, and the remaining user behavior data corresponding to the second dimension data; and distributing the remaining user behavior data in the specification data to corresponding servers by a ring scheduling method. With this scheme, the impact on user experience during server capacity expansion is avoided; system efficiency is improved when a data hot spot occurs, data overheats, or new resources are added; and a newly added server can be dynamically added to the cluster, which facilitates system expansion and efficiency improvement.

Description

Data processing method and data processing system

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a data processing method and a data processing system.

Background

In recent years, with the rapid development of network and big data technologies, operators' business has shifted increasingly toward data services, and real-time storage and analysis of massive data has become increasingly important to operator business. How to guarantee data security and service stability, and how to minimize the impact on services when a data security problem occurs, has therefore become an important issue.

However, the conventional method has the following technical problems:

1. In a system with many data servers, adding servers in batches triggers data load rebalancing across the whole system, and data synchronization takes a long time. Service processing is therefore delayed for a period after the machines are first added, and a synchronization window appears during capacity expansion, which affects the user experience;

2. In real-time analysis of massive data, when a data hot spot occurs or some processing units go offline because of data overheating, existing resources cannot be allocated in real time to compensate, or all data must be redistributed after new resources are added, which seriously affects system efficiency;

3. Capacity expansion of equipment is complex. In particular, when several heterogeneous systems form a combined cluster, their load balancing methods differ, so a newly added server cannot be dynamically added to the cluster and must be tightly bound to services, which hinders expansion and efficiency improvement.

Disclosure of Invention

The invention provides a data processing method and a data processing system that avoid affecting the user experience during server capacity expansion, improve system efficiency, and facilitate system expansion.

In order to achieve the above object, the present invention provides a data processing method, including:

aggregating collected user behavior data according to a set service policy to form a plurality of specification data, wherein each specification data comprises first dimension data, second dimension data, and the remaining user behavior data corresponding to the second dimension data;

and distributing the remaining user behavior data in the specification data to corresponding servers by a ring scheduling method.

Optionally, the aggregating the collected user behavior data to form a plurality of specification data includes:

and merging the user behavior data with the same first dimension data to form a plurality of specification data.

Optionally, the allocating, by using a ring scheduling method, the remaining user behavior data in the specification data to the corresponding server includes:

generating a first weight value of a server in a server group according to the serial number of the server, the size of the ring data template, and the number of servers;

generating the range of the server group corresponding to the specification data according to the serial number of the first dimension data of the specification data, the size of the ring data template, and the number of servers;

determining whether the number of servers included in the range of the server group corresponding to the specification data is equal to 1 or greater than 1;

if the number of servers included in the range of the server group corresponding to the specification data is greater than 1: generating a second weight value corresponding to the second dimension data according to the serial number of the second dimension data and the maximum value in the range of the server group corresponding to the specification data; determining a server corresponding to the second dimension data from the range of the server group corresponding to the specification data according to the second weight value corresponding to the second dimension data and the first weight value; and placing the remaining user behavior data corresponding to the second dimension data in the specification data on the server corresponding to the second dimension data.

Optionally, if the number of servers included in the range of the server group corresponding to the specification data is equal to 1, the specification data is placed on the server in that range.

Optionally, before the generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the number of servers, the method further includes:

constructing a ring data template, wherein the size N of the ring data template is a set value;

generating a number of the first dimension data according to the first dimension data, and generating a number of the second dimension data according to the second dimension data;

generating a serial number of the server according to the mac address and the IP address of the server;

grouping the servers according to the number of the servers to obtain a plurality of server groups;

and generating the range of each server group according to the size of the ring data template, the number of servers, and the serial number of the server.

Optionally, the generating the range of each server group according to the size of the ring data template, the number of servers, and the serial number of the server includes:

dividing the size of the ring data template by the number of servers to obtain a range value of the server group;

and obtaining the range of each server group according to the range value of the server group and the serial number of the server.

Optionally, the generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the number of servers includes:

dividing the serial number of the server by the size of the ring data template to obtain a division result;

and taking the remainder of the division result divided by the number of servers to obtain the first weight value of the server in the server group.

Optionally, the generating the range of the server group corresponding to the specification data according to the serial number of the first dimension data of the specification data, the size of the ring data template, and the number of servers includes:

dividing the serial number of the first dimension data by the size of the ring data template to obtain a division result;

taking the remainder of the division result divided by the number of servers to obtain a range value of the server corresponding to the specification data;

and finding, among the ranges of the server groups, the range of the server group that contains the range value of the server.

Optionally, the generating a second weight value according to the serial number of the second dimension data and the maximum value in the range of the server group corresponding to the specification data includes:

taking the remainder of the serial number of the second dimension data divided by the maximum value to obtain the second weight value.

Optionally, the determining a server corresponding to the second dimension data from the range of the server group corresponding to the specification data according to the second weight value corresponding to the second dimension data and the first weight value includes:

calculating the ratio of the second weight value to the first weight value;

multiplying the ratio by the maximum value in the range of the server group corresponding to the specification data to obtain a multiplication result;

selecting, from the range of the server group, the serial number of the server that differs least from the multiplication result;

and determining the server with that serial number as the server corresponding to the second dimension data.

To achieve the above object, the present invention provides a data processing system comprising:

the acquisition module is used for acquiring user behavior data;

the service aggregation module is used for aggregating the collected user behavior data according to the set service policy to form a plurality of specification data, wherein each specification data comprises first dimension data, second dimension data, and the remaining user behavior data corresponding to the second dimension data;

and the scheduling module is used for distributing the remaining user behavior data in the specification data to the corresponding servers by a ring scheduling method.

Optionally, the scheduling module includes:

the first generation submodule is used for generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the number of servers;

the second generation submodule is used for generating the range of the server group corresponding to the specification data according to the serial number of the first dimension data of the specification data, the size of the ring data template, and the number of servers;

the judgment submodule is used for determining whether the number of servers included in the range of the server group corresponding to the specification data is equal to 1 or greater than 1;

the third generation submodule is used for generating a second weight value corresponding to the second dimension data according to the serial number of the second dimension data and the maximum value in the range of the server group corresponding to the specification data if the judgment submodule determines that the number of servers included in that range is greater than 1;

the fourth generation submodule is used for determining a server corresponding to the second dimension data from the range of the server group corresponding to the specification data according to the second weight value and the first weight value corresponding to the second dimension data;

and the placing submodule is used for placing the rest user behavior data corresponding to the second dimension data in the specification data on a server corresponding to the second dimension data.

The invention has the following beneficial effects:

According to the technical scheme of the data processing method and the data processing system provided by the invention, collected user behavior data are aggregated according to a set service policy to form a plurality of specification data, and the remaining user behavior data in the specification data are distributed to corresponding servers by a ring scheduling method. With this scheme, system efficiency is improved when a data hot spot occurs, data overheats, or new resources are added; and a newly added server can be dynamically added to the cluster, which facilitates system expansion and efficiency improvement.

Drawings

Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention;

Fig. 3 is a schematic diagram of the server groups in the second embodiment;

Fig. 4 is a schematic structural diagram of a data processing system according to a third embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present invention, the following describes the data processing method and the data processing system provided by the present invention in detail with reference to the accompanying drawings.

Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention. As shown in Fig. 1, the method includes:

Step 101, according to a set service policy, aggregating collected user behavior data to form a plurality of specification data, where each specification data includes first dimension data, second dimension data, and the remaining user behavior data corresponding to the second dimension data.

Step 102, distributing the remaining user behavior data in the specification data to corresponding servers by a ring scheduling method.

In the technical scheme of the data processing method provided by this embodiment, collected user behavior data are aggregated according to a set service policy to form a plurality of specification data, and the remaining user behavior data in the specification data are distributed to corresponding servers by a ring scheduling method. With the scheme of this embodiment, the impact on user experience during server capacity expansion is avoided; system efficiency is improved when a data hot spot occurs, data overheats, or new resources are added; and a newly added server can be dynamically added to the cluster, which facilitates system expansion and efficiency improvement.

Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention. As shown in Fig. 2, the method includes:

Step 201, collecting user behavior data.

In this embodiment, user behavior data may be collected by Deep Packet Inspection (DPI) equipment. The user behavior data may include a mobile phone number, a source IP, a source port, a destination IP, a destination port, a URL, and a time, where the time is the time at which the user accessed the internet.

Preferably, a user behavior data set may also be formed from the collected user behavior data, that is, the collected user behavior data may be organized as a set. The user behavior data set includes a plurality of user behavior data, for example user behavior data A, user behavior data B, user behavior data C, and so on, and each user behavior data includes a mobile phone number, a source IP, a source port, a destination IP, a destination port, a URL, and a time.

Step 202, according to the set service policy, aggregating the collected user behavior data to form a plurality of specification data, where each specification data includes first dimension data, second dimension data corresponding to the first dimension data, and the remaining user behavior data corresponding to the second dimension data.

In this embodiment, the service policy may be preset. The service policy is to take the first dimension data as the data merging condition and merge the user behavior data with the same first dimension data to form a plurality of specification data. The first dimension data is one field of the user behavior data and the second dimension data is another field. For example, if the first dimension data is the mobile phone number and the second dimension data is the time, then the second dimension data corresponding to the first dimension data is the time, and the remaining user behavior data corresponding to the second dimension data includes the source IP, source port, destination IP, destination port, and URL.

Specifically, the user behavior data with the same first dimension data are merged to form a plurality of specification data. The formed specification data may be represented as A1[B1 (data 1), B2 (data 2), B3 (data 3) …], where A1 is the first dimension data, B1, B2, B3 … are the second dimension data, data 1 is the remaining user behavior data corresponding to second dimension data B1, data 2 is the remaining user behavior data corresponding to second dimension data B2, and data 3 is the remaining user behavior data corresponding to second dimension data B3. For example, when A1 is a mobile phone number and B1, B2, B3 … are times, each formed specification data is: mobile phone number [time 1 (source IP, source port, destination IP, destination port, URL …), time 2 (source IP, source port, destination IP, destination port, URL …) …].
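The merge described above can be sketched as follows. This is only a minimal illustration, assuming each record is a dictionary; the field names `phone` and `time` and the function name `aggregate` are hypothetical and do not come from the source. Keeping a list for records that share both dimensions is also an assumption.

```python
from collections import defaultdict


def aggregate(records, first_dim="phone", second_dim="time"):
    """Group records into the form A1[B1(data 1), B2(data 2), ...]."""
    spec = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # Everything except the two dimension fields is "remaining" data.
        rest = {k: v for k, v in rec.items() if k not in (first_dim, second_dim)}
        spec[rec[first_dim]][rec[second_dim]].append(rest)
    # Convert to plain dictionaries for easier inspection.
    return {a: dict(b) for a, b in spec.items()}


records = [
    {"phone": "13800000000", "time": "t1", "url": "a.example"},
    {"phone": "13800000000", "time": "t2", "url": "b.example"},
    {"phone": "13900000000", "time": "t1", "url": "c.example"},
]
specification_data = aggregate(records)
```

Here `specification_data` holds one entry per mobile phone number, each mapping a time to the remaining fields recorded at that time.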

Preferably, a specification data set may also be generated from the formed plurality of specification data, the specification data set including the plurality of specification data. That is, the formed plurality of specification data may be provided in the form of specification data sets.

Step 203, constructing a ring data template, where the size N of the ring data template is a set value.

The ring data template is constructed according to the future cluster size or the maximum size the cluster can reach. Assume that the size N of the ring data template is 2³². Typically, N is taken as 5 to 10 times the size of the future cluster to ensure that each server can correspond to a range.

Step 204, generating the serial number of the first dimension data according to the first dimension data, and generating the serial number of the second dimension data according to the second dimension data.

In this embodiment, the serial number of the first dimension data may be generated by calculation on the first dimension data. Specifically, the first dimension data includes a plurality of character positions, for example a1, a2, a3 …, and the numerical values of all positions are added to obtain the serial number: ID(A1) = a1 + a2 + a3 + …, where a1, a2, a3 … are the numerical values of the positions of the first dimension data. If a position holds a digit, the digit itself is its numerical value; if it holds a character, the ASCII code of the character is its numerical value.

The serial number of the second dimension data may be generated in the same way. Specifically, the second dimension data includes a plurality of character positions, for example b1, b2, b3 …, and the numerical values of all positions are added to obtain the serial number: ID(B1) = b1 + b2 + b3 + …, where b1, b2, b3 … are the numerical values of the positions of the second dimension data. If a position holds a digit, the digit itself is its numerical value; if it holds a character, the ASCII code of the character is its numerical value.
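The numbering rule for dimension data can be sketched as below. The function name `dimension_id` is hypothetical, and the sketch assumes the dimension data is available as a text string whose non-digit characters all have ASCII codes.

```python
def dimension_id(value: str) -> int:
    """Sum digit values and ASCII codes over all character positions."""
    total = 0
    for ch in value:
        if "0" <= ch <= "9":
            total += int(ch)   # a digit counts as itself
        else:
            total += ord(ch)   # any other character counts as its code
    return total


phone_id = dimension_id("13800000000")  # 1 + 3 + 8 = 12
mixed_id = dimension_id("1a")           # 1 + 97 = 98
```

Because the rule is a plain sum, different inputs with the same characters in a different order map to the same serial number.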

Step 205, generating the serial number of the server according to the MAC address and the IP address of the server.

In this embodiment, the serial number of the server may be generated by calculation on the MAC address and the IP address of the server. Specifically, the MAC address includes a plurality of character positions, for example c1, c2, c3 …, and the IP address includes a plurality of character positions, for example d1, d2, d3 …. The numerical values of all positions of the MAC address and of the IP address are added to obtain the serial number of the server: ID(C1) = c1 + c2 + c3 + … + d1 + d2 + d3 + …. For both the MAC address and the IP address, if a position holds a digit, the digit itself is its numerical value; if it holds a character, the ASCII code of the character is its numerical value. The serial number of the server is used to identify the server and is preferably a unique identifier of the server.
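A possible reading of the server numbering rule is sketched below. The source does not say how the separators in a MAC or IP address are treated, so this sketch assumes they are skipped; the function name `server_id` is likewise hypothetical.

```python
def server_id(mac: str, ip: str) -> int:
    """Sum digit values and ASCII codes over the MAC and IP addresses."""
    total = 0
    for ch in mac + ip:
        if ch in ":-.":
            continue           # assumption: separators are not counted
        if "0" <= ch <= "9":
            total += int(ch)   # a digit counts as itself
        else:
            total += ord(ch)   # a hex letter counts as its ASCII code
    return total


example_id = server_id("00:1a:2b", "10.0.0.1")  # (0+0+1+97+2+98) + (1+0+0+0+1)
```

Under this rule upper-case and lower-case hex letters give different sums, so the address strings would need a consistent case.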

Step 206, grouping the servers according to the number of servers to obtain a plurality of server groups.

Assuming that the number n of servers is 1000, the servers may be divided into 100 server groups.

Fig. 3 is a schematic diagram of the server groups in the second embodiment. Fig. 3 shows 3 server groups, namely server group 1, server group 2, and server group 3. Each server group may include a plurality of servers; in Fig. 3 each server group is illustrated with 2 servers.

Step 207, generating the range of each server group according to the size of the ring data template, the number of servers, and the serial number of the server.

Specifically, the size of the ring data template is divided by the number of servers to obtain the range value of the server group, and the range of each server group is then obtained from the range value and the serial numbers of the servers. For example, with N = 10000 and n = 1000, the range value of the server group is N/n = 10000/1000 = 10, and the ranges of the server groups are [0, 9], [10, 19], …. Taking [0, 9] as an example, [0, 9] is the range of a server group that includes a plurality of servers, 0 to 9 are the serial numbers of the servers included in the group, and the group therefore includes 10 servers.
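The range computation can be sketched as below, assuming the ranges simply tile the ring consecutively, as the example [0, 9], [10, 19] suggests. The function name `group_ranges` is hypothetical, and any tail of the ring left over when N is not a multiple of n is ignored in this sketch.

```python
def group_ranges(ring_size: int, server_count: int):
    """Return inclusive (start, end) ranges of width ring_size // server_count."""
    width = ring_size // server_count
    if width < 1:
        raise ValueError("ring_size must be at least server_count")
    return [
        (start, start + width - 1)
        for start in range(0, width * server_count, width)
    ]


ranges = group_ranges(10000, 1000)
first_two = ranges[:2]  # [(0, 9), (10, 19)]
```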

Step 208, generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the number of servers.

Specifically, the serial number of the server is divided by the size of the ring data template to obtain a division result, and the remainder of the division result divided by the number of servers is taken as the first weight value of the server in the server group, that is, (serial number of the server / N) % n. Dividing by N deploys the servers uniformly on the ring data template, and taking the remainder modulo n divides the values mapped onto the ring data template into n groups. The first weight value of each server may be used to indicate the priority of the server in the server group: the higher the first weight value, the higher the priority.
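The first weight value can be written as a one-line function. The source does not state whether the division is an integer or a real-valued division; this sketch assumes real-valued division, since integer division would give 0 whenever the serial number is smaller than N. The function name `first_weight` is hypothetical.

```python
def first_weight(server_number: int, ring_size: int, server_count: int) -> float:
    """Compute (server_number / N) % n using real-valued division."""
    return (server_number / ring_size) % server_count


weight = first_weight(25000, 10000, 1000)  # 2.5 % 1000 = 2.5
```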

Step 209, generating the range of the server group corresponding to the specification data according to the serial number of the first dimension data of the specification data, the size of the ring data template, and the number of servers.

Specifically, the serial number of the first dimension data is divided by the size of the ring data template to obtain a division result, the remainder of the division result divided by the number of servers is taken as the range value of the server corresponding to the specification data, and the range of the server group that contains this range value is then looked up among the ranges of the server groups. Taking specification data A1[B1 (data 1), B2 (data 2), B3 (data 3) …] as an example, the range value of the server corresponding to the specification data is ID(A1)/N % n. If this range value lies within the range [e, f] of a server group, for example [e, f] = [10, 19], the range of the server group corresponding to the specification data is [e, f].
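The lookup can be sketched as below. As assumptions not stated in the source, the division is real-valued and a fractional range value such as 9.5 is assigned to the range [0, 9]; the function name `find_group_range` is hypothetical.

```python
def find_group_range(first_dim_id, ring_size, server_count, ranges):
    """Return the inclusive (e, f) range containing (ID / N) % n, or None."""
    value = (first_dim_id / ring_size) % server_count
    for start, end in ranges:
        # Treat each inclusive range [start, end] as covering [start, end + 1).
        if start <= value < end + 1:
            return (start, end)
    return None


ranges = [(0, 9), (10, 19), (20, 29)]
matched = find_group_range(150000, 10000, 1000, ranges)  # 15.0 falls in (10, 19)
```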

Step 210, determining whether the number of servers included in the range of the server group corresponding to the specification data is equal to 1 or greater than 1; if it is equal to 1, executing step 211; if it is greater than 1, executing step 212.

Step 211, the specification data is placed on the server in the range of the server group corresponding to the specification data, and the process is ended.

In this step, since only one server exists in the range of the server group, the remaining user behavior data corresponding to the different second-dimension data in the specification data are all placed on the server.

Step 212, according to the number of the second dimension data and the maximum value in the range of the server group corresponding to the specification data, a second weight value corresponding to the second dimension data is generated.

Specifically, the remainder of the serial number of the second dimension data divided by the maximum value is taken as the second weight value. Taking specification data A1[B1 (data 1), B2 (data 2), B3 (data 3) …] as an example, the second weight value corresponding to B1 is ID(B1) % f, the second weight value corresponding to B2 is ID(B2) % f, and the second weight value corresponding to B3 is ID(B3) % f, where f is the maximum value in the range [e, f] of the server group corresponding to the specification data.

Step 213, determining a server corresponding to the second dimension data from the range of the server group corresponding to the specification data according to the second weight value and the first weight value corresponding to the second dimension data.

Specifically, step 213 includes:

Step 2131, calculating the ratio of the second weight value to the first weight value.

Specifically, the second weight value is divided by the first weight value to obtain the ratio.

Step 2132, multiplying the ratio by the maximum value in the range of the server group corresponding to the specification data to obtain a multiplication result.

Step 2133, selecting, from the range of the server group, the serial number of the server that differs least from the multiplication result.

Step 2134, determining the server with that serial number as the server corresponding to the second dimension data.

For example, the server corresponding to B1, the server corresponding to B2, the server corresponding to B3, and the like can be determined in this step.
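Steps 212 to 2134 can be sketched together as below. The source does not say which server's first weight value enters the ratio, so it is passed in as a parameter here; the function name `pick_server` and the error handling for a zero divisor are also assumptions.

```python
def pick_server(second_dim_id: int, first_weight: float, group_range):
    """Choose the server number in [e, f] closest to (w2 / w1) * f."""
    start, end = group_range
    if end == 0 or first_weight == 0:
        raise ValueError("maximum value and first weight must be non-zero")
    second_weight = second_dim_id % end            # step 212
    ratio = second_weight / first_weight           # step 2131
    target = ratio * end                           # step 2132
    # Steps 2133 and 2134: nearest server number within the group range.
    return min(range(start, end + 1), key=lambda num: abs(num - target))


chosen = pick_server(47, 4.5, (10, 19))  # w2 = 9, ratio = 2.0, target = 38.0
```

Targets that fall outside the group range are clamped to the nearest end of the range by the `min` selection, so every second dimension value still maps to some server in the group.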

Step 214, placing the rest of user behavior data corresponding to the second dimension data in the specification data on the server corresponding to the second dimension data, and ending the process.

For example, data 1 is placed on the server corresponding to B1, data 2 is placed on the server corresponding to B2, and data 3 is placed on the server corresponding to B3, thereby achieving load balancing.

Optionally, when a server in this embodiment is added for capacity expansion, goes offline, or is damaged, the following processing may be used.

If server capacity needs to be expanded, step 201 to step 215 of this embodiment may be executed. The serial number of the added server calculated in step 205 is derived from the MAC address and the IP address of the server, and the MAC address behaves like a sufficiently large random number that is not duplicated among servers worldwide, so the servers can be distributed uniformly.

If a server is damaged or goes offline and there are other servers in the server group where it is located, step 201 to step 215 of this embodiment are executed, and the specification data is automatically dispatched to the other servers of the server group.

If no other server exists in the server group where the server is located, the serial number of the first dimension data of the specification data on the damaged or offline server is multiplied by a set positive integer (for example, 2) to form a new serial number of the first dimension data, and the subsequent steps 205 to 215 are then performed with the new serial number to determine a corresponding server for the specification data. If a corresponding server still cannot be determined, the multiplication is repeated to form another new serial number, and steps 205 to 215 are performed again. Data scheduled in this way remain evenly distributed, the balance degree is 1/2 of the original balance degree, and the data service is not affected.
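The renumbering loop for an empty server group can be sketched as below. The callable `locate` stands in for the placement steps and is hypothetical, as are the function name and the `max_attempts` bound, which the source does not mention but which keeps the loop from running forever.

```python
def renumber_until_placed(first_dim_id, locate, multiplier=2, max_attempts=8):
    """Multiply the serial number until locate() returns a server."""
    new_id = first_dim_id
    for _ in range(max_attempts):
        new_id *= multiplier        # form a new first-dimension serial number
        server = locate(new_id)     # rerun placement with the new number
        if server is not None:
            return new_id, server
    return None                     # no live server found within the bound


# Toy placement rule: only serial numbers of 40 or more map to a live server.
result = renumber_until_placed(12, lambda i: "server-7" if i >= 40 else None)
```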

Fig. 4 is a schematic structural diagram of a data processing system according to a third embodiment of the present invention. As shown in Fig. 4, the system includes an acquisition module 1, a service aggregation module 2, and a scheduling module 3.

The acquisition module 1 is used for acquiring user behavior data. The service aggregation module 2 is configured to aggregate the collected user behavior data according to the set service policy to form a plurality of specification data, where each specification data includes first dimension data, second dimension data, and the remaining user behavior data corresponding to the second dimension data. The scheduling module 3 is configured to distribute the remaining user behavior data in the specification data to corresponding servers by a ring scheduling method.

Specifically, the scheduling module 3 includes: a first generation submodule 31, a second generation submodule 32, a judgment submodule 33, a third generation submodule 34, a fourth generation submodule 35 and a placement submodule 36.

The first generating submodule 31 is configured to generate a first weight value of the server in the server group according to the number of the servers, the size of the ring data template, and the number of the servers. The second generation submodule 32 is configured to generate a range of a server group corresponding to the specification data according to the number of the first dimension data of the specification data, the size of the annular data template, and the number of servers. The judgment sub-module 33 is configured to judge that the range of the server corresponding to the specification data includes a number of servers greater than or equal to 1. The third generating sub-module 34 is configured to generate a second weight value corresponding to the second dimension data according to the number of the second dimension data and a maximum value in the range of the server group corresponding to the specification data, if the determining sub-module 33 determines that the range of the server group corresponding to the specification data is greater than 1. The fourth generation submodule 35 is configured to determine, according to the second weight value and the first weight value corresponding to the second dimension data, a server corresponding to the second dimension data from the range of the server group corresponding to the specification data. The placement sub-module 36 is configured to place the remaining user behavior data corresponding to the second dimension data in the specification data on the server corresponding to the second dimension data.

In the technical solution of the data processing system provided in this embodiment, the collected user behavior data are converged according to a set service policy to form a plurality of specification data, and the remaining user behavior data in the specification data are allocated to corresponding servers by using a ring scheduling method. With the solution of this embodiment, the impact on user experience during server capacity expansion is avoided; system efficiency is improved when a data hot spot occurs, data becomes overheated, or new resources are added; and a newly added server can be dynamically added to the cluster, which facilitates system expansion and efficiency improvement.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. A data processing method, comprising:

allocating the remaining user behavior data in the specification data to corresponding servers by using a ring scheduling method;

wherein the allocating the remaining user behavior data in the specification data to corresponding servers by using the ring scheduling method comprises:

determining whether the number of servers included in the range of the server group corresponding to the specification data is greater than 1 or equal to 1;

2. The data processing method of claim 1, wherein the converging the collected user behavior data to form a plurality of specification data comprises:

3. The data processing method of claim 1, wherein if the number of servers included in the range of the server group corresponding to the specification data is determined to be equal to 1, the specification data is placed on the server in the range of the server group corresponding to the specification data.

4. The data processing method of claim 1, wherein, before the generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the total number of servers, the method further comprises:

5. The data processing method of claim 4, wherein the generating the range of each server group according to the size of the ring data template, the number of servers and the number of servers comprises:

6. The data processing method of claim 1, wherein the generating a first weight value of the server in the server group according to the serial number of the server, the size of the ring data template, and the total number of servers comprises:

7. The data processing method according to claim 1, wherein the generating a range of the server group corresponding to the specification data according to the number of the first dimension data of the specification data, the size of the ring data template, and the number of the servers comprises:

8. The data processing method according to claim 1, wherein the generating a second weight value according to the number of the second dimension data and a maximum value in a range of the server group corresponding to the specification data comprises:

9. The data processing method according to claim 1, wherein the determining, according to the second weight value and the first weight value corresponding to the second dimension data, the server corresponding to the second dimension data from the range of the server group corresponding to the specification data comprises:

calculating the ratio of the second weight value to the first weight value;

10. A data processing system, comprising:

an acquisition module, configured to collect user behavior data;

a scheduling module, configured to allocate the remaining user behavior data in the specification data to corresponding servers by using a ring scheduling method;

the scheduling module includes:

a judgment submodule, configured to determine whether the number of servers included in the range of the server group corresponding to the specification data is greater than 1 or equal to 1;