CN115344756A

CN115344756A - Method, system, device and medium for identifying data block in data table

Info

Publication number: CN115344756A
Application number: CN202211007711.2A
Authority: CN
Inventors: 李金鑫; 朱海霞; 邵峰峰; 张坤莲
Original assignee: Ctrip Travel Network Technology Shanghai Co Ltd
Current assignee: Ctrip Travel Network Technology Shanghai Co Ltd
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-11-15

Abstract

The present disclosure provides a method, a system, a device and a medium for identifying data blocks in a data table, and relates to the field of data identification. The identification method comprises the following steps: querying a data table to be processed for valid data bits in a preset direction; when the first valid data bit is found, determining that the valid data bit belongs to the current data domain and pushing it onto a stack to be detected; popping positions to be detected from the stack to be detected in sequence, determining whether any adjacent position of the valid data bit holds a new valid data bit, and if so, expanding the boundary range of the current data domain according to the adjacent position holding the new valid data bit; and pushing all unverified data positions within the boundary range of the current data domain onto the stack to be detected, then returning to the popping step until the stack to be detected is empty, and outputting the current data domain. The data domain can thus be acquired accurately even when data is severely missing, and the low efficiency of manual operation is avoided.

Description

Method, system, device and medium for identifying data block in data table

Technical Field

The present disclosure relates to the field of data identification, and in particular, to a method, a system, a device, and a medium for identifying data blocks in a data table.

Background

At present, many data visualization frameworks and solutions have grown out of data warehouses in big data systems based on the Hadoop framework. Open-source frameworks such as Grafana and SuperSet are highly compatible with such big data systems and, with simple configuration on top of the underlying database, achieve good data visualization. However, an open-source framework must be deployed before it can be used, and the data it presents is not portable. In many application scenarios, such as exchanging data in industry research reports, academic reports and government reports, data is usually presented with an Office tool such as PPT, a role that the popular open-source visualization tools cannot fill.

The technical solution to this problem is to develop an automated Office report generation system based on the data warehouse that periodically generates PPT reports from templates. At present, analysts are buried in a large amount of repetitive work, and because reported data must be updated over time, this work is long-term and complex.

Therefore, it is important to make Office access to the data warehouse a part of data visualization. However, data blocks in a data table are difficult to match with the source-data blocks of a chart or table, so updated data cannot be accurately inserted into the chart or table being updated, and manual operation makes the work inefficient. In addition, most connected-domain algorithms based on data matrices are applied to image recognition. The two mainstream algorithms, Two-Pass and Seed-Filling, work well for ordinary pixel connected-domain analysis but are not suitable for the present application.

Disclosure of Invention

The technical problem to be solved by the present disclosure is to provide a method, a system, a device and a medium for identifying a data block in a data table, in order to overcome the defects that the data block in the data table cannot be accurately and effectively identified and the manual operation is inefficient in the prior art.

The technical problem is solved by the following technical scheme:

in a first aspect, a method for identifying a data block in a data table is provided, the method including:

querying a data table to be processed for valid data bits in a preset direction;

when the first valid data bit is found, determining that the valid data bit belongs to the current data domain, and pushing the valid data bit onto a stack to be detected;

popping positions to be detected from the stack to be detected in sequence, determining whether any adjacent position of the valid data bit holds a new valid data bit, and if so, expanding the boundary range of the current data domain according to the adjacent position holding the new valid data bit;

and pushing all unverified data positions within the boundary range of the current data domain onto the stack to be detected, then returning to the step of popping positions to be detected from the stack to be detected in sequence until the stack to be detected is empty, and outputting the current data domain.

Preferably, the step of outputting the current data domain is followed by: taking the data table with the current data domain removed as the data table to be processed, then returning to the step of querying the data table to be processed for valid data bits in the preset direction, and continuing to query for new valid data bits until all valid data bits have been queried, at which point the process ends.

Preferably, the step of querying the data table to be processed for valid data bits in the preset direction specifically includes: querying for valid data bits row by row, from left to right, starting from the start position of the data table to be processed.

Preferably, in the step of determining whether any adjacent position of the valid data bit holds a new valid data bit, if the coordinates of the valid data bit are (x, y), the coordinates of its adjacent positions are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1), and (x+1, y+1).

Preferably, the step of expanding the boundary range of the current data domain according to the adjacent position holding the new valid data bit specifically includes: comparing the horizontal and vertical coordinates of the adjacent position holding the new valid data bit with the minimum and maximum horizontal and vertical coordinates of the current data domain, taking the smaller and larger values in turn, and using them as the expanded boundary coordinates of the current data domain.

In a second aspect, a system for identifying data blocks in a data table is provided, the system comprising:

a query module, configured to query a data table to be processed for valid data bits in a preset direction;

a stack pushing module, configured to determine, when the first valid data bit is found, that the valid data bit belongs to the current data domain, and to push the valid data bit onto a stack to be detected;

an expansion module, configured to pop positions to be detected from the stack to be detected in sequence, determine whether any adjacent position of the valid data bit holds a new valid data bit, and if so, expand the boundary range of the current data domain according to the adjacent position holding the new valid data bit;

and an output module, configured to push all unverified data positions within the boundary range of the current data domain onto the stack to be detected, then invoke the expansion module until the stack to be detected is empty, and output the current data domain.

Preferably, the output module is specifically configured to take the data table with the current data domain removed as the data table to be processed, and then invoke the query module to continue querying for new valid data bits until all valid data bits have been queried.

Preferably, the query module is specifically configured to query for valid data bits row by row, from left to right, starting from the start position of the data table to be processed.

Preferably, if the coordinates of the valid data bit are (x, y), the coordinates of its adjacent positions are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1), and (x+1, y+1).

Preferably, the expansion module is specifically configured to compare the horizontal and vertical coordinates of the adjacent position holding the new valid data bit with the minimum and maximum horizontal and vertical coordinates of the current data domain, take the smaller and larger values in turn, and use them as the expanded boundary coordinates of the current data domain.

In a third aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for identifying a data block in a data table when executing the computer program.

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for identifying data blocks in a data table described in the first aspect.

The beneficial effects of the present disclosure are as follows: the method can effectively identify the data blocks in a data table and can accurately acquire the data domain even when data is severely missing. It also eliminates inefficient manual inspection, ensures the stable update of data blocks in Office to the data warehouse, and lays a foundation for using Office tools as visualization tools for the data warehouse.

Drawings

Fig. 1 is a schematic flowchart of a method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 2 is a diagram of the adjacent positions checked in the method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 3 is a schematic diagram illustrating an effect of a method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 4 is a diagram illustrating the effect of a conventional connected domain algorithm.

Fig. 5 is a schematic diagram illustrating a first effect of an application example of a method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 6 is a schematic diagram illustrating a second effect of an application example of the method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 7 is a schematic diagram illustrating a third effect of an application example of the method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 8 is a schematic diagram illustrating a fourth effect of an application example of the method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 9 is a schematic diagram of a first effect of another application example of a method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 10 is a schematic diagram of a second effect of another application example of the identification method for data blocks in a data table according to embodiment 1 of the present disclosure.

Fig. 11 is a schematic diagram of a third effect of another application example of the identification method for data blocks in a data table according to embodiment 1 of the present disclosure.

Fig. 12 is a schematic diagram of a fourth effect of another application example of the identification method for data blocks in a data table according to embodiment 1 of the present disclosure.

Fig. 13 is a schematic diagram illustrating a fifth effect of another application example of the method for identifying a data block in a data table according to embodiment 1 of the present disclosure.

Fig. 14 is a schematic diagram of a sixth effect of another application example of the identification method for data blocks in a data table according to embodiment 1 of the present disclosure.

Fig. 15 is a schematic structural diagram of a system for identifying data blocks in a data table according to embodiment 2 of the present disclosure.

Fig. 16 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure.

Detailed Description

The present disclosure is further illustrated by the following examples, but is not thereby limited to the scope of the examples described.

Example 1

This embodiment provides a method for identifying a data block in a data table. Fig. 1 is a schematic flowchart of the method. As shown in Fig. 1, the method includes:

Step 101: query the data table to be processed for valid data bits in a preset direction.

In step 101, valid data bits are searched for row by row, from left to right, starting from the start position of the data table to be processed.

The position of a cell that holds data in the data table is referred to as a valid data bit.
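The row-wise query of step 101 can be sketched as follows. This is only an illustrative sketch: the names `is_valid` and `first_valid_bit`, the 0-based `(x, y)` coordinates, and the rule that `None` and blank strings count as empty cells are assumptions, since the disclosure does not define how an empty cell is represented.

```python
from typing import Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (x, y): column index, row index, 0-based in this sketch


def is_valid(value: object) -> bool:
    """Assumed emptiness rule: None and blank strings mean 'no data'."""
    return value is not None and str(value).strip() != ""


def first_valid_bit(table: Sequence[Sequence[object]]) -> Optional[Cell]:
    """Scan row by row, left to right, and return the first cell holding data."""
    for y, row in enumerate(table):
        for x, value in enumerate(row):
            if is_valid(value):
                return (x, y)
    return None  # the table holds no valid data bit
```

With this rule a numeric zero still counts as data, because only `None` and whitespace-only text are treated as empty.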

Step 102: when the first valid data bit is found, determine that the valid data bit belongs to the current data domain, and push it onto a stack to be detected.

Step 103: pop positions to be detected from the stack to be detected in sequence and determine whether any adjacent position of the valid data bit holds a new valid data bit; if so, expand the boundary range of the current data domain according to the adjacent position holding the new valid data bit.

Expanding the boundary range of the data domain requires locating the adjacent positions. Unlike the four-neighbor and eight-neighbor search methods of conventional algorithms, this embodiment uses a five-neighbor search: if the coordinates of the valid data bit are (x, y), the coordinates of its adjacent positions are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1), and (x+1, y+1). As shown in Fig. 2, the black position is the position to be detected and the gray positions are its five adjacent positions to be checked. If a checked adjacent position holds a valid data bit, the coordinates of that position are used to expand the boundary range of the current data domain.
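The five adjacent positions listed above can be written as a small offset table. The sketch below is illustrative only: `FIVE_OFFSETS` and `five_neighbors` are hypothetical names, coordinates are 0-based, and clipping to the table size is an added assumption, because the text does not say how positions outside the table are handled.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (x, y), 0-based in this sketch

# Left, right, and the three positions in the next row (y + 1).
FIVE_OFFSETS = [(-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def five_neighbors(cell: Cell, width: int, height: int) -> List[Cell]:
    """Return the adjacent positions of `cell` that fall inside the table."""
    x, y = cell
    return [
        (x + dx, y + dy)
        for dx, dy in FIVE_OFFSETS
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]
```

No position in the previous row (y - 1) appears in the offset table, which matches the coordinate list given in the text.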

In step 103, the horizontal and vertical coordinates of each adjacent position holding a new valid data bit are compared with the minimum and maximum horizontal and vertical coordinates of the current data domain; the smaller and larger values are taken in turn and become the expanded boundary coordinates of the current data domain.

Specifically, the x value of the adjacent position holding the new valid data bit is compared with the minimum x value of the current data domain, and its y value with the minimum y value; the smaller value is taken in each case, and the result becomes the upper-left corner of the current data domain. Likewise, the x value is compared with the maximum x value of the current data domain and the y value with the maximum y value; the larger value is taken in each case, and the result becomes the opposite (lower-right) boundary coordinate of the current data domain.
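The minimum/maximum comparison described above amounts to growing a bounding rectangle. A minimal sketch, assuming the boundary range is stored as four integers; the names `Bounds` and `expand_bounds` are hypothetical:

```python
from typing import NamedTuple, Tuple


class Bounds(NamedTuple):
    """Boundary range of a data domain: two opposite corners."""
    min_x: int
    min_y: int
    max_x: int
    max_y: int


def expand_bounds(bounds: Bounds, cell: Tuple[int, int]) -> Bounds:
    """Return the smallest range that contains both `bounds` and `cell`."""
    x, y = cell
    return Bounds(
        min(bounds.min_x, x),
        min(bounds.min_y, y),
        max(bounds.max_x, x),
        max(bounds.max_y, y),
    )
```

For instance, expanding the single-cell range (1, 1)-(1, 1) with the position (2, 2) gives the range (1, 1)-(2, 2), as in the first step of the Fig. 5 example.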

Step 104: push all unverified data positions within the boundary range of the current data domain onto the stack to be detected, then return to step 103 until the stack to be detected is empty, and output the current data domain.

The newly expanded boundary of the data domain must itself be verified: all unverified data positions within the current boundary are pushed onto the stack to be detected, and the boundary is expanded again from them. This shields the result from the interference of missing data and yields an accurate data domain. When the stack to be detected is empty, every data position in the current data domain has been checked, so the current data domain is output.
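Steps 102 to 104 can be sketched together as one loop over the stack to be detected. This is a sketch under stated assumptions, not the patented implementation: the table is given as a rectangular boolean mask `valid[y][x]`, coordinates are 0-based, and `grow_domain` and the `queued` set (used to make sure no position is checked twice) are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]              # (x, y), 0-based in this sketch
Bounds = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)

OFFSETS = [(-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]  # five adjacent positions


def grow_domain(valid: Sequence[Sequence[bool]], seed: Cell) -> Bounds:
    """Grow the boundary range of one data domain starting from `seed`."""
    height, width = len(valid), len(valid[0])
    min_x, min_y = seed
    max_x, max_y = seed
    stack: List[Cell] = [seed]
    queued: Set[Cell] = {seed}  # every position ever pushed

    while stack:
        x, y = stack.pop()
        # Step 103: expand the range for each adjacent position holding data.
        for dx, dy in OFFSETS:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and valid[ny][nx]:
                min_x, max_x = min(min_x, nx), max(max_x, nx)
                min_y, max_y = min(min_y, ny), max(max_y, ny)
        # Step 104: push every position inside the range that has not been
        # queued yet. Empty cells are pushed too, which lets the search
        # bridge gaps left by missing data.
        for by in range(min_y, max_y + 1):
            for bx in range(min_x, max_x + 1):
                if (bx, by) not in queued:
                    queued.add((bx, by))
                    stack.append((bx, by))
    return (min_x, min_y, max_x, max_y)


diagonal = [[row == col for col in range(5)] for row in range(5)]
print(grow_domain(diagonal, (0, 0)))  # prints (0, 0, 4, 4)
```

Rescanning the whole range after every pop keeps the sketch short; a practical version would probably rescan only when the range has actually changed.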

In a specific implementation, the following step is performed after step 104:

Step 105: take the data table with the current data domain removed as the data table to be processed, return to step 101, and continue querying for new valid data bits until all valid data bits have been queried, at which point the process ends.

Because the data table with the current data domain removed continues to serve as the data table to be processed, and the query for new valid data bits continues from there, all valid data domains are obtained in a single traversal of the data table.
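Putting steps 101 to 105 together, the whole procedure might look like the sketch below. Several points are assumptions rather than statements from the disclosure: the table is a rectangular list of rows, `None` and blank strings mean "empty", coordinates are 0-based, "removing" a data domain is modeled by clearing its range in a working copy, and `identify_blocks` and `_grow` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y), 0-based
OFFSETS = [(-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]  # five adjacent positions


def _grow(valid: List[List[bool]], seed: Tuple[int, int]) -> Bounds:
    """Steps 102-104: grow one boundary range from the seed position."""
    height, width = len(valid), len(valid[0])
    min_x, min_y = seed
    max_x, max_y = seed
    stack, queued = [seed], {seed}
    while stack:
        x, y = stack.pop()
        for dx, dy in OFFSETS:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and valid[ny][nx]:
                min_x, max_x = min(min_x, nx), max(max_x, nx)
                min_y, max_y = min(min_y, ny), max(max_y, ny)
        for by in range(min_y, max_y + 1):
            for bx in range(min_x, max_x + 1):
                if (bx, by) not in queued:
                    queued.add((bx, by))
                    stack.append((bx, by))
    return (min_x, min_y, max_x, max_y)


def identify_blocks(table: Sequence[Sequence[object]]) -> List[Bounds]:
    """Return the boundary range of every data block, in scan order."""
    # Work on a boolean copy so that removing a block never alters the input.
    valid = [[v is not None and str(v).strip() != "" for v in row] for row in table]
    blocks: List[Bounds] = []
    for y0, row in enumerate(valid):          # step 101: row by row,
        for x0 in range(len(row)):            # left to right
            if not valid[y0][x0]:
                continue
            bounds = _grow(valid, (x0, y0))
            blocks.append(bounds)
            # Step 105: remove the finished data domain and keep scanning.
            min_x, min_y, max_x, max_y = bounds
            for y in range(min_y, max_y + 1):
                for x in range(min_x, max_x + 1):
                    valid[y][x] = False
    return blocks


sheet = [
    ["a", "b", None, None, "x"],
    ["c", None, None, None, "y"],
    [None, None, None, None, None],
    ["p", "q", None, None, None],
]
print(identify_blocks(sheet))  # prints [(0, 0, 1, 1), (4, 0, 4, 1), (0, 3, 1, 3)]
```

Continuing the scan from the current position, rather than restarting at the top, gives the same result here: every cell earlier in scan order was already empty or removed when the current seed was found, and removal only clears cells.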

In this embodiment, as shown in Fig. 3, the gray areas are the data blocks obtained by the method. The position (4,6) (that is, the position with abscissa 4 and ordinate 6; the same notation applies below) is assigned to the largest data block, on the left side of the data table. This clearly matches expectations and is consistent with how Excel is conventionally used; even when the data of a whole data block is severely missing, all data blocks can still be distinguished accurately. Fig. 4 shows the result obtained with a conventional connected-domain detection method: because none of the eight neighbors of position (4,6) holds valid data, the conventional algorithm treats that position as a separate data block, which is not the expected result.
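For contrast, the behavior attributed to the conventional approach can be reproduced with an ordinary eight-neighbor seed fill. The sketch below is a generic textbook-style flood fill, not code from the disclosure, and the 5 × 5 table is a hypothetical example rather than the table of Fig. 3 or Fig. 4; `eight_connected_components` is an assumed name.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (x, y), 0-based in this sketch

EIGHT = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]


def eight_connected_components(valid: Sequence[Sequence[bool]]) -> List[Set[Cell]]:
    """Group valid cells into components using eight-neighbor connectivity."""
    height = len(valid)
    width = len(valid[0]) if height else 0
    seen: Set[Cell] = set()
    components: List[Set[Cell]] = []
    for y0 in range(height):
        for x0 in range(width):
            if not valid[y0][x0] or (x0, y0) in seen:
                continue
            component = {(x0, y0)}
            seen.add((x0, y0))
            stack = [(x0, y0)]
            while stack:
                x, y = stack.pop()
                for dx, dy in EIGHT:
                    nx, ny = x + dx, y + dy
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and valid[ny][nx] and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        component.add((nx, ny))
                        stack.append((nx, ny))
            components.append(component)
    return components


# A ring of data with one isolated value in the middle (hypothetical table).
ring = [
    [True, True, True, True, True],
    [True, False, False, False, True],
    [True, False, True, False, True],
    [True, False, False, False, True],
    [True, True, True, True, True],
]
print(len(eight_connected_components(ring)))  # prints 2
```

The isolated center cell becomes its own component because none of its eight neighbors holds data. A method that checks every position inside the boundary range, as in this embodiment, would instead be expected to report a single 5 × 5 range for this table.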

In this embodiment, as can be seen from Fig. 5, the 5 × 5 matrix contains data only on its diagonal, and most of the data is missing. The light gray position is the initial search point of the data domain. Black positions are valid data positions detected in the previous round, which are used to expand the boundary range of the data domain. Dark gray positions are detection points that newly appear after the range of the data domain has been expanded; these also enter the stack to be detected and wait to be checked. In Fig. 5, the traversal first finds the start position (1,1) of the data domain and then checks its adjacent positions. Position (2,2) is found to hold a valid data bit, so (1,1) to (2,2) is set as the effective region of the current data domain (that is, the minimum coordinates of the region are (1,1) and the maximum coordinates are (2,2)). At the same time, the dark gray positions, that is, the positions in the current data domain that have not yet been checked, are pushed onto the stack to be detected. Proceeding in this way, the whole data domain is obtained, as shown in Fig. 6, Fig. 7 and Fig. 8.

In this embodiment, as shown in Fig. 9, the light gray position (1,1) is the start position of the data table, and detection begins there. Among its five adjacent positions, position (2,2) is found to hold valid data, so the boundary coordinates of the current data domain are expanded to (2,2). Similarly, the boundary is expanded to position (3,3) in Fig. 10 and to position (2,4) in Fig. 11. In Fig. 12, when position (3,4) is checked, the two valid data positions (4,5) and (5,4) are seen not to be among the five adjacent positions of any other valid data bit. However, once position (1,5) is determined to be a valid data bit, the y value of the boundary coordinates of the current data domain expands to 5, and the empty positions shown in dark gray are also pushed onto the stack to be detected. When position (3,5) is then checked, position (4,5) is brought into the current data domain, as shown in Fig. 13. Similarly, as shown in Fig. 14, when position (5,4) is detected, the x value of the boundary coordinates expands to 5. The region of the data domain is now a 5 × 5 data matrix; once all valid data bits in it have been checked, the current data domain is output. Figs. 9 to 14 illustrate the advantage of the present disclosure over conventional algorithms by showing in detail how valid data bits that are not adjacent to any other valid data bit are still identified as part of the data block.

The method of this embodiment for identifying data blocks in a data table replaces manual operation and accurately identifies the data region of each data block even when data in the data table is severely missing. It adapts the connected-domain algorithm so that the program better matches how users actually work, eliminates inefficient manual inspection, ensures the stable update of data blocks in Office to the data warehouse, and lays a foundation for using Office tools as visualization tools for the data warehouse.

Example 2

Fig. 15 is a schematic structural diagram of the identification system for data blocks in a data table provided in this embodiment, and as shown in fig. 15, the system includes:

a query module 1, configured to query the data table to be processed for valid data bits in a preset direction;

a stack pushing module 2, configured to determine, when the first valid data bit is found, that the valid data bit belongs to the current data domain, and to push the valid data bit onto a stack to be detected;

an expansion module 3, configured to pop positions to be detected from the stack to be detected in sequence, determine whether any adjacent position of the valid data bit holds a new valid data bit, and if so, expand the boundary range of the current data domain according to the adjacent position holding the new valid data bit;

and an output module 4, configured to push all unverified data positions within the boundary range of the current data domain onto the stack to be detected, then invoke the expansion module 3 until the stack to be detected is empty, and output the current data domain.

In a specific implementation, the output module 4 is specifically configured to take the data table with the current data domain removed as the data table to be processed, and then invoke the query module 1 to continue querying for new valid data bits until all valid data bits have been queried.

In a specific implementation, the query module 1 is specifically configured to query for valid data bits row by row, from left to right, starting from the start position of the data table to be processed.

In a specific implementation, if the coordinates of the valid data bit are (x, y), the coordinates of its adjacent positions are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1), and (x+1, y+1).

In a specific implementation, the expansion module 3 is specifically configured to compare the horizontal and vertical coordinates of the adjacent position holding the new valid data bit with the minimum and maximum horizontal and vertical coordinates of the current data domain, take the smaller and larger values in turn, and use them as the expanded boundary coordinates of the current data domain.

Through the cooperation of its modules, the system of this embodiment for identifying data blocks in a data table accurately distinguishes the data region of each data block even when data in the data table is severely missing. It avoids the low efficiency of manual operation, ensures the stable update of data blocks in Office to the data warehouse, and lays a foundation for using Office tools as visualization tools for the data warehouse.

Example 3

Fig. 16 is a schematic structural diagram of an electronic device provided in this embodiment, where the electronic device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the method for identifying a data block in the data table in embodiment 1 is implemented. The electronic device 30 shown in fig. 16 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure. As shown in fig. 16, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).

The bus 33 includes a data bus, an address bus, and a control bus.

The memory 32 may include volatile memory, such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.

Memory 32 may also include a program tool 325 (or utility tool) having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The processor 31 executes various functional applications and data processing, such as the identification method of the data block in the data table in embodiment 1 described above, by running the computer program stored in the memory 32.

The electronic device 30 may also communicate with one or more external devices 34. Such communication may be through input/output (I/O) interfaces 35. Also, the electronic device 30 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 36. As shown in Fig. 16, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems.

It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.

Example 4

The present embodiment provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for identifying a data block in the data table in embodiment 1 above.

More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a possible implementation, the present disclosure may also be implemented as a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to execute the steps of the method for identifying data blocks in a data table in embodiment 1 described above.

Program code for carrying out the present disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.

While specific embodiments of the disclosure have been described above, it will be understood by those skilled in the art that this is by way of illustration only, and that the scope of the disclosure is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of this disclosure, and these changes and modifications are intended to be within the scope of this disclosure.

Claims

1. A method for identifying a data block in a data table, the method comprising:

sequentially popping up positions to be detected through the stack to be detected, judging whether the adjacent positions of the effective data bits have new effective data bits, and if so, expanding the boundary range of the current data domain according to the adjacent positions with the new effective data bits;

2. A method for identifying a data block in a data table as claimed in claim 1, wherein said step of outputting the current data field is followed by the steps of: continuously taking the data table without the current data field as a data table to be processed, then returning to the step of inquiring the effective data bits of the data table to be processed according to the preset direction, continuously inquiring new effective data bits until all the new effective data bits are inquired, and ending the process;

and/or the step of querying the valid data bits of the data table to be processed according to the preset direction specifically comprises the following steps: and inquiring valid data bits from the starting position of the data table to be processed by rows from left to right.

3. The method for identifying data blocks in a data table according to claim 1, wherein in the step of determining whether the adjacent position of the valid data bit has a new valid data bit, if the coordinates of the valid data bit are (x, y), the coordinates of the adjacent positions of the valid data bit are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1) and (x+1, y+1), respectively.

4. A method as claimed in claim 3, wherein the step of expanding the boundary range of the current data field according to the adjacent bits with new valid data bits comprises: and comparing the adjacent horizontal and vertical coordinate values with new effective data bits with the minimum and maximum horizontal and vertical coordinate values of the current data domain respectively, sequentially taking the minimum value and the maximum value, and expanding the minimum value and the maximum value into the boundary coordinate of the current data domain.

5. A system for identifying blocks in a data table, the system comprising:

the expanding module is used for sequentially popping the position to be detected through the stack to be detected and judging whether the adjacent position of the effective data bit has a new effective data bit, if so, expanding the boundary range of the current data domain according to the adjacent position having the new effective data bit;

and the output module is used for pressing all the unverified data positions in the boundary range of the current data domain into the stack to be detected, and then calling the expansion module to output the current data domain until the stack to be detected is empty.

6. The system for identifying data blocks in a data table according to claim 5, wherein the output module is specifically configured to continue to use the data table after removing the current data field as the data table to be processed, and then invoke the query module to continue querying new valid data bits until all new valid data bits have been queried;

and/or the query module is specifically configured to query the valid data bits from the start position of the to-be-processed data table from left to right in rows.

7. The system for identifying data blocks in a data table according to claim 5, wherein if the coordinates of the valid data bit are (x, y), the coordinates of the adjacent positions of the valid data bit are (x-1, y), (x+1, y), (x-1, y+1), (x, y+1) and (x+1, y+1), respectively.

8. The system of claim 7, wherein the expansion module is specifically configured to compare the abscissa and ordinate values of the neighboring positions with the new valid data bits with the minimum and maximum abscissa and ordinate values of the current data field, respectively, take the minimum and maximum values in turn, and expand the minimum and maximum values into the boundary coordinates of the current data field.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for identifying a data block in a data table according to any of claims 1-4 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for identifying a data block in a data table according to any one of claims 1 to 4.