WO2020155757A1 - Histogram data conversion control method, device, computer equipment, and storage medium

Histogram data conversion control method, device, computer equipment, and storage medium

Info

Publication number
WO2020155757A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
information
histogram
columnar
text
Prior art date
Application number
PCT/CN2019/117470
Other languages
English (en)
French (fr)
Inventor
孙强
卢波
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155757A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor

Definitions

  • The embodiments of the present application relate to the field of data analysis technology, and in particular to a method, device, computer equipment, and storage medium for controlling histogram data conversion.
  • Bar charts are often used for data statistics and analysis, for example in the analysis and testing of modern electronic products and software, or in quarterly reports on product sales.
  • A bar chart is a statistical chart in which the length of a rectangle expresses the value of a variable. A series of vertical bars of different heights represents the data distribution, and the chart is used to compare two or more values (at different times or under different conditions) of a single variable.
  • The histogram can also be arranged horizontally or expressed in a multi-dimensional manner.
  • However, the histogram is an image, so its data is represented graphically and cannot be read directly like data in a database, which makes the data hard to obtain. In addition, because the histogram is stored as an image, it takes up a large amount of storage space and is inconvenient to use.
  • The embodiments of the present application provide a method, device, computer equipment, and storage medium for controlling histogram data conversion that parse a histogram into structured data.
  • A technical solution adopted in the embodiments of this application is to provide a method for controlling the conversion of histogram data, including the following steps:
  • Acquiring image information of at least one columnar target in the target histogram, where the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
  • Calculating quantity value information of the object according to the standardized information preset in the target histogram and the columnar attribute information;
  • Structurally converting the object attribute information and the quantity value information to generate structured target data in the form of key-value pairs.
  • An embodiment of the present application further provides a histogram data conversion control device, including:
  • The first acquisition module is configured to acquire image information of at least one columnar target in the target histogram, where the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
  • The first processing module is configured to calculate the quantity value information of the object according to the standardized information preset in the target histogram and the columnar attribute information;
  • The first execution module is configured to structurally convert the object attribute information and the quantity value information to generate structured target data in the form of key-value pairs.
  • The device further includes:
  • The second acquisition module is configured to acquire a text image corresponding to the columnar target in the target histogram;
  • The second execution module is configured to recognize the name information of the object mapped by the columnar target according to the text image, where the object attribute information includes the name information.
  • The device further includes:
  • The first execution sub-module is configured to input the text image into a preset text recognition model, where the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
  • The first acquisition sub-module is configured to obtain the name information of the object output by the text recognition model.
  • The device further includes:
  • The third acquisition module is configured to acquire statistical value information representing the number of objects in the text image;
  • The comparison module is configured to compare the quantity difference between the statistical value information and the quantity value information with a preset comparison threshold;
  • The third execution module is configured to replace the quantity value information with the statistical value information when the quantity difference is greater than the comparison threshold.
  • The device further includes:
  • The second acquisition sub-module is configured to obtain, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
  • The second execution sub-module is configured to calculate the quantity value information of the object mapped by the columnar target according to the height information and the standardized information.
  • The device further includes:
  • The third acquisition sub-module is configured to obtain target ordinate information of the highest point of the columnar target in the target histogram;
  • The third execution sub-module is configured to calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
  • An embodiment of the present application further provides a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above method for controlling the conversion of histogram data.
  • The embodiments of the present application also provide a non-volatile computer-readable storage medium storing computer-readable instructions. When the instructions are executed by one or more processors, the one or more processors execute the steps of the above-mentioned histogram data conversion control method.
  • The beneficial effect of the embodiments of the present application is as follows. Image information of multiple columnar targets in the target histogram is acquired, where the columnar targets are the columnar members of the target histogram and the image information includes the columnar attribute information of each columnar target and the object attribute information of the object it maps. The quantity value information of the mapped object is calculated from the columnar attribute information and the standardized information in the target histogram. The object attribute information and quantity value information are then structurally processed into structured target data in the form of key-value pairs, which can be stored in a structured database to facilitate data reading and reduce the space the data occupies.
  • FIG. 1 is a schematic diagram of the basic flow of a method for controlling conversion of histogram data according to an embodiment of the application;
  • FIG. 2 is a schematic diagram of a process of obtaining object attribute information according to an embodiment of the application
  • FIG. 3 is a schematic diagram of the process of recognizing text images according to an embodiment of the application.
  • FIG. 4 is a schematic diagram of the flow of resetting quantity value information according to an embodiment of the application.
  • FIG. 5 is a schematic diagram of a process for calculating the quantity value information of an object according to an embodiment of the application
  • FIG. 6 is a schematic diagram of a process of obtaining height information of a columnar target according to an embodiment of the application.
  • FIG. 7 is a schematic diagram of the basic structure of a histogram data conversion control device according to an embodiment of the application.
  • FIG. 8 is a block diagram of the basic structure of a computer device according to an embodiment of the application.
  • FIG. 1 is a schematic diagram of the basic flow of the method for controlling the conversion of histogram data in this embodiment.
  • a method for controlling histogram data conversion includes the following steps:
  • S1100: Acquire image information of at least one columnar target in the target histogram, where the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target.
  • The columnar target refers to the data content in the target histogram. Taking a target histogram that shows the quarterly sales report of a car as an example, each columnar target corresponds to the car sales of one quarter. Since the car sales data differ from quarter to quarter, the image information of each columnar target also differs.
  • The image information includes the columnar attribute information of the columnar target and the object attribute information of the object it maps. The columnar attribute information includes the position and height of the columnar target in the target histogram and the number of objects it represents. The object attribute information includes information such as the name of the object and the category to which it belongs.
  • Several columnar targets are set in the target histogram. For example, if the current month has 30 days, 30 columnar targets are set, each representing one day's turnover. The columnar attribute information of each columnar target then refers to the amount of that day's turnover, and the object mapped by the columnar target refers to the company's product. For example, in a histogram of the monthly turnover from selling pens, the object mapped by each columnar target is a pen, and its object attribute information is the name of the object or the category it belongs to; for instance, the pen is classified as stationery.
  • The image information of each columnar target in the target histogram can be obtained by image recognition technology; for example, the columnar attribute information of the columnar target is calculated through image processing. Image processing is the technique of analyzing an image with a computer to achieve a desired result, for example by using image processing software such as Adobe Photoshop, Adobe Illustrator, or CorelDRAW to analyze and process the target histogram and obtain the image information of the multiple columnar targets in it.
  • S1200: Calculate the quantity value information of the object according to the standardized information preset in the target histogram and the columnar attribute information.
  • Standardized information refers to the benchmark used to measure and label the quantity level of the columnar targets in the target histogram. For example, a scale is set along the ordinate of the target histogram, and each scale division is a quantity mapped by the standardized information. The quantity value information of the object can be calculated from the standardized information and the columnar attribute information.
  • For example, if the height of the columnar target along the ordinate of the target histogram equals 3 scale divisions, the number of objects mapped by its columnar attribute information is 3 times the quantity represented by one division. In the example, the number of objects mapped by the columnar target is 150,000, which corresponds to 50,000 per division. The quantity value information refers to the specific value of the object.
  • S1300: Structurally convert the object attribute information and the quantity value information to generate structured target data in the form of key-value pairs.
  • The object attribute information and quantity value information of the object are structurally converted to generate structured target data in the form of key-value pairs. Key-value storage is the simplest organizational form of a database. Here the object attribute information is used as the key and the quantity value information as the value to form key-value structured data.
  • Taking a histogram of monthly sports shoe sales as an example, the columnar attribute information of each columnar target is different; specifically, the height of a columnar target in the histogram indicates the number of sports shoes sold in that month.
  • The system first obtains the image information of the three columnar targets in the histogram. The columnar attribute information of each columnar target includes its height in the histogram, which can be extracted through image processing. The quantity value information of the object is then calculated from the standardized information preset in the histogram and the columnar attribute information of each columnar target.
  • The standardized information is the preset standard used in the histogram to measure the number of objects a columnar target represents. For example, the height of the histogram is divided into 10 standard units, and each standard unit, counted from bottom to top, represents 100 more pairs of sports shoes sold. The standard unit is the standardized information preset in the histogram.
  • From the image information of each columnar target, the number of sports shoes sold in January, February, and March can be obtained. For example, the columnar target for January is 4.5 standard units high, the one for February is 5.0 standard units high, and the one for March is 3.45 standard units high. The quantity value information of the object corresponding to each columnar target is calculated from the columnar attribute information and the standardized information, and the object attribute information and quantity value information are then structurally converted to generate structured target data in key-value form. For example, the three columnar targets are structured as: sports shoe sales in January: 450; sports shoe sales in February: 500; sports shoe sales in March: 345.
  • The generated structured target data can be stored in a structured database so that the specific data in the histogram can be read directly.
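  • The sports shoe example above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation described in the application: the function name `to_key_value`, the constant `PAIRS_PER_UNIT`, and the key strings are assumptions introduced here, while the heights and the 100-pairs-per-unit scale come from the example.

```python
import json

# From the example: each standard unit of column height stands for 100 pairs.
PAIRS_PER_UNIT = 100

def to_key_value(heights_in_units, quantity_per_unit):
    """Map each object name to its quantity value (height x quantity per unit)."""
    return {
        name: round(height * quantity_per_unit)
        for name, height in heights_in_units.items()
    }

# Heights are taken from the example; the key strings are hypothetical.
heights = {
    "sports shoe sales, January": 4.5,
    "sports shoe sales, February": 5.0,
    "sports shoe sales, March": 3.45,
}

structured = to_key_value(heights, PAIRS_PER_UNIT)
print(json.dumps(structured, indent=2))
```

  • Rounding to an integer is a design choice made for this sketch, since the quantities are counts of pairs sold; the source does not say how fractional results are handled.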
  • In some cases, different types of objects are involved in the target histogram. For example, the consumables among a company's office supplies include printing paper, pens, and erasers, and the histogram of the company's annual office supply consumption includes a first, a second, and a third columnar target corresponding to printing paper, pens, and erasers respectively. The quantity value information of the printing paper, pens, and erasers is calculated from the image information of the columnar targets and the standardized information in the histogram, and the three objects and their corresponding quantity value information are then structurally converted to generate structured target data in key-value form. The generated structured target data reads: annual consumption of printing paper, 100,000 sheets; annual consumption of pens, 50,000; annual consumption of erasers, 2,000. It can be stored in a structured database to facilitate the reading and storage of the data.
  • This embodiment acquires image information of multiple columnar targets in the target histogram, where the columnar targets are the columnar members of the target histogram and the image information includes the columnar attribute information of each columnar target and the object attribute information of the object it maps. The quantity value information of the mapped object is calculated from the columnar attribute information and the standardized information in the target histogram. The object attribute information and quantity value information are then structurally processed into structured target data in the form of key-value pairs, which can be stored in a structured database to facilitate data reading and reduce the space the data occupies.
  • FIG. 2 is a schematic diagram of a specific process of obtaining object attribute information in an embodiment of the present application.
  • In relation to step S1100, the method also includes the following steps:
  • In implementation, the target histogram is provided with label information for each columnar target. The label information carries the name information of the object mapped by the columnar target and the specific quantity information of the object, and it appears in the target histogram as a text image corresponding to the columnar target. For example, the text image is placed above the corresponding columnar target along the ordinate direction, and it can be obtained by scanning the target histogram.
  • Image text recognition can be implemented through OCR. OCR (Optical Character Recognition) refers to the process in which electronic devices (such as scanners or digital cameras) examine characters printed on paper, determine their shapes by detecting dark and light patterns, and then use character recognition methods to translate the shapes into computer text.
  • The recognized name information is added to the object attribute information of the object, which improves the accuracy of identifying and obtaining the object attribute information. Character recognition requires no human intervention and improves the efficiency of data conversion.
  • FIG. 3 is a schematic diagram of the basic process of recognizing text images in an embodiment of the present application.
  • Step S1020 includes the following steps:
  • S1021: Input the text image into a preset text recognition model, where the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images.
  • After the text image is acquired, it can be input into the text recognition model, which performs image text recognition. The text recognition model is a convolutional neural network model trained to convergence for recognizing text in images.
  • After the text image is input into the text recognition model, the convolutional neural network model recognizes it and outputs the text it contains. Since the text image corresponds to the columnar target, the output of the text recognition model is the name information of the object mapped by the columnar target.
  • The neural network model may be an LSTM (Long Short-Term Memory) network.
  • The LSTM network uses "gates" to control the discarding or adding of information, thereby realizing forgetting or memory. A gate is a structure that lets information pass selectively; it consists of a sigmoid (S-shaped curve) function and an element-wise multiplication. The output of the sigmoid function lies in the interval [0, 1], where 0 means completely discarded and 1 means completely passed.
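  • The gate described above can be illustrated with a short, simplified sketch. It shows only the sigmoid-then-multiply mechanism; in a trained LSTM the gate inputs are computed from learned weights, which are omitted here. The function names and sample numbers are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic function; its output lies in the interval [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

def gate(values, gate_logits):
    """Scale each value by sigmoid(logit): near 0 discards it, near 1 passes it."""
    return [v * sigmoid(z) for v, z in zip(values, gate_logits)]

# Hypothetical inputs: a strongly negative logit closes the gate,
# a strongly positive logit opens it, and a logit of 0 passes half the value.
gated = gate([2.0, 2.0, 2.0], [-20.0, 20.0, 0.0])
print(gated)
```

  • The first value is almost entirely discarded, the second passes almost unchanged, and the third is halved, which matches the "0 means discarded, 1 means passed" reading of the sigmoid output.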
  • The neural network model trained to convergence has recognition classifiers that can recognize text information in text images. The text recognition model includes the above-mentioned neural network model, which includes N+1 recognition classifiers, where N is a positive integer.
  • The classification result of each word of the text image in the recognition classifiers is obtained, where the classification result includes the text classification corresponding to the text image and the confidence of that text classification.
  • The confidence of a text classification is the proportion assigned to that classification when the text recognition model screens and classifies the text image, which may place the image in more than one text classification. Since a word in the text image ultimately corresponds to a single piece of word information, the confidences of the text classifications of the same text image must be compared. For example, if the text image carries the information "notebook computer", the confidence of classifying it as the electronic device "computer" is 0.95 and the confidence of classifying it as the stationery "notebook" is 0.75.
  • The two confidences are compared with a preset first threshold. When a confidence is greater than the preset first threshold, the text classification result it represents is confirmed as the name information of the object. The preset first threshold is generally set to a value between 0.9 and 1.
  • The text information whose confidence is greater than the preset first threshold is selected as the final text classification result, that is, it is confirmed as the name information of the object. For example, when the preset first threshold is 0.9 and the text image carries the information "pen", the confidence of classifying it as stationery is 0.95. Since 0.95 > 0.9, this classification is confirmed and "pen" is taken as the name information of the object.
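  • The threshold comparison can be sketched as follows. The function `select_name` and the label strings are hypothetical; the confidences and the 0.9 threshold come from the examples above. Picking the highest-confidence label when several exceed the threshold is an assumption, since the source does not cover that case.

```python
def select_name(classifications, threshold=0.9):
    """Return the label whose confidence exceeds the threshold, or None.

    If several labels pass (a case the source does not describe), the one
    with the highest confidence is returned.
    """
    passing = {label: c for label, c in classifications.items() if c > threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)

# Confidences from the "notebook computer" example in the text.
candidates = {"electronic device: computer": 0.95, "stationery: notebook": 0.75}
print(select_name(candidates))
```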
  • FIG. 4 is a schematic diagram of the basic flow of resetting the quantity value information in an embodiment of the present application.
  • In relation to step S1020, the method further includes the following steps:
  • The text image corresponding to the columnar target also carries the specific quantity information of the objects mapped by the columnar target. The statistical value information in the text image can be recognized through OCR, which gives the number of objects mapped by the columnar target. Because this numerical information is carried in the target histogram itself, it accurately determines the specific number of objects. For example, if the text image corresponding to a columnar target includes the words "2.5 million", recognizing the text image by OCR shows that the number of objects corresponding to that columnar target is 2.5 million.
  • The quantity difference between the statistical value information and the quantity value information is compared with a comparison threshold, which is a numerical value preset in the system. In implementation, the comparison threshold can also be set by the user to meet the user's needs.
  • When the quantity difference is greater than the comparison threshold, the quantity value information is replaced with the statistical value information. The number of objects represented by the quantity value information is calculated from the columnar attribute information of the columnar target and the standardized information in the target histogram, and errors in that calculation may make the quantity value information inaccurate.
  • When the quantity difference is less than the comparison threshold (for example, 2, 3, or 5), the difference can be ignored, and the quantity value information continues to be used together with the object attribute information in the structural conversion that generates the structured target data. When the quantity difference is greater than the comparison threshold, the quantity value information is replaced with the statistical value information, and the structured target data is generated from the statistical value information and the object attribute information.
  • In implementation, other methods can also be used. For example, when the quantity difference is less than the comparison threshold, the quantity value information is replaced with the statistical value information and the structured target data is generated from the statistical value information and the object attribute information; when the quantity difference is greater than the comparison threshold, the quantity value information continues to be used together with the object attribute information to generate the structured target data.
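  • A minimal sketch of the main variant (replace the computed value when the difference exceeds the threshold) is given below. The function name and the sample numbers are assumptions for illustration; the source does not say what happens when the difference equals the threshold, so this sketch keeps the computed value in that case.

```python
def reconcile_quantity(quantity_value, statistical_value, comparison_threshold):
    """Choose the value to use in the key-value conversion.

    quantity_value is computed from the column height; statistical_value is
    read from the label text. If they differ by more than the threshold, the
    label value replaces the computed one; otherwise the computed one is kept.
    """
    difference = abs(statistical_value - quantity_value)
    if difference > comparison_threshold:
        return statistical_value
    return quantity_value

# Hypothetical readings against a label of 2,500,000 and a threshold of 5.
print(reconcile_quantity(2_499_998, 2_500_000, 5))  # small gap: keep computed value
print(reconcile_quantity(2_400_000, 2_500_000, 5))  # large gap: use label value
```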
  • FIG. 5 is a schematic diagram of a basic process of calculating the quantity value information of an object in an embodiment of the present application.
  • Step S1200 includes the following steps:
  • S1210: Obtain, from the columnar attribute information, height information representing the height of the columnar target in the target histogram.
  • The columnar attribute information of the columnar target carries height information, which characterizes the height of the columnar target in the target histogram. Specifically, the height of the columnar target is extracted through image processing; see FIG. 6.
  • FIG. 6 is a schematic diagram of a specific process of obtaining height information of the columnar target in an embodiment of the present application.
  • Step S1210 includes the following steps:
  • S1211: Obtain target ordinate information of the highest point of the columnar target in the target histogram.
  • Each columnar target in the target histogram has an elongated column shape. The columnar targets are arranged in order along the abscissa of the target histogram, and their heights along the ordinate differ: because the numbers of objects they map differ, their heights in the target histogram differ as well. The height of a columnar target is proportional to the number of objects it maps, so the more objects there are, the higher the columnar target. In implementation, image processing technology can be used to obtain the target ordinate information of the highest point of the columnar target in the target histogram. Image processing refers to the technique of analyzing an image with a computer to achieve a desired result. For example, OCR or OpenCV is used to perform character recognition and positioning on the image of the target histogram, so as to obtain the target ordinate information of the highest point of the columnar target.
  • S1212: Calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
  • The origin ordinate information refers to the ordinate of the origin of the target histogram. The origin is the starting point of the abscissa and the ordinate: extending to the right from the origin is the increasing direction of the abscissa, and extending upward from the origin is the increasing direction of the ordinate, so the origin is expressed as (0, 0). The height information of the columnar target can be calculated from its target ordinate information and the origin ordinate information. For example, if the target ordinate information of the columnar target is 640 and the origin ordinate information is 0, the height of the columnar target is 640.
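  • Steps S1211 and S1212 can be sketched on a toy image as follows. This is an illustrative stand-in that uses a plain list-of-lists mask instead of OCR or OpenCV. All function names, the mask, and the assumption that the x-axis lies just below the last pixel row are hypothetical; only the subtraction of the origin ordinate from the target ordinate comes from the text.

```python
def top_row_of_column(mask, x):
    """Return the index of the first (topmost) row where column x is filled."""
    for row_index, row in enumerate(mask):
        if row[x]:
            return row_index
    return None

def to_chart_ordinate(row_index, axis_row):
    """Convert an image row (growing downward) to an ordinate (growing upward)."""
    return axis_row - row_index

def column_height(target_ordinate, origin_ordinate=0):
    """Height = ordinate of the column's highest point minus origin ordinate."""
    return target_ordinate - origin_ordinate

# Hypothetical binary mask (1 = bar pixel); row 0 is the top of the image.
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 1],
]
axis_row = len(mask)  # assumption: the x-axis sits just below the last row

heights = [
    column_height(to_chart_ordinate(top_row_of_column(mask, x), axis_row))
    for x in range(3)
]
print(heights)                # per-column heights in pixels
print(column_height(640, 0))  # the worked example from the text
```

  • The conversion in `to_chart_ordinate` is needed because image rows are usually counted from the top, whereas the text measures the ordinate upward from the origin.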
  • S1220: Calculate the quantity value information of the object mapped by the columnar target according to the height information and the standardized information.
  • The quantity value information of the objects mapped by the columnar target can be calculated according to the height information and the standardized information preset in the target histogram.
  • The standardized information is the standard preset in the histogram for measuring the number of objects a columnar target represents. Take a target histogram showing the test scores of students in a grade as an example. The height of the target histogram is divided into 10 standard units, each worth 10 points and increasing from bottom to top, so the highest score is 100 points and the lowest is 0 points. The score per standard unit is the standardized information preset in the target histogram, and the students' scores can be obtained from the image information of each columnar target. For example, the columnar target corresponding to the first student is 9.5 standard units high, and the one corresponding to the second student is 9.8 standard units high.
  • The system obtains the height information of the columnar targets corresponding to the first and second students in the target histogram, and then calculates each student's grade from the height information and the standardized information. The object attribute information and quantity value information of the two students are then structurally converted to generate structured target data; that is, each student's name is paired with that student's grade to generate structured target data in the form of key-value pairs. For example, if the first and second students are named Zhang San and Li Si respectively, the resulting data is: Zhang San: 95 points; Li Si: 98 points. The generated structured target data can be stored in a structured database so that the specific data in the histogram can be read directly.
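  • Storing the resulting key-value pairs in a structured database can be sketched with Python's built-in `sqlite3` module. The choice of SQLite, the table name `histogram_data`, and its column names are assumptions made for illustration, since the source does not name a database; the heights, names, and 10-points-per-unit scale come from the example above.

```python
import sqlite3

SCORE_PER_UNIT = 10  # from the text: each standard unit is worth 10 points

# Column heights in standard units, taken from the example in the text.
heights = {"Zhang San": 9.5, "Li Si": 9.8}
grades = {name: round(h * SCORE_PER_UNIT) for name, h in heights.items()}

# Hypothetical table layout: one row per key-value pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE histogram_data (object_name TEXT PRIMARY KEY, quantity_value INTEGER)"
)
conn.executemany("INSERT INTO histogram_data VALUES (?, ?)", grades.items())
conn.commit()

rows = conn.execute(
    "SELECT object_name, quantity_value FROM histogram_data ORDER BY quantity_value"
).fetchall()
print(rows)
```

  • Once stored this way, a single value can be read back with an ordinary query on `object_name` instead of re-analyzing the image.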
  • an embodiment of the present application also provides a histogram data conversion control device.
  • FIG. 7 is a schematic diagram of the basic structure of the histogram data conversion control device of this embodiment.
  • A histogram data conversion control device includes a first acquisition module 2100, a first processing module 2200, and a first execution module 2300. The first acquisition module 2100 is configured to acquire image information of at least one columnar target in the target histogram, where the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target. The first processing module 2200 is configured to calculate the quantity value information of the object according to the standardized information preset in the target histogram and the columnar attribute information. The first execution module 2300 is configured to structurally convert the object attribute information and the quantity value information to generate structured target data in the form of key-value pairs.
  • This embodiment acquires image information of multiple columnar targets in the target histogram.
  • the columnar targets are columnar members in the target histogram.
  • the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target. , Calculate the quantitative value information of the object mapped by the column target according to the column attribute information of the column target and the standardized information in the target histogram, and then perform structural processing on the object attribute information and quantitative value information of the object to convert
  • the structured target data in the form of key-value pairs can be stored in a structured database to facilitate data reading and reduce the space occupied by the data.
  • the histogram data conversion control device further includes: a second acquisition module and a second execution module, wherein the second acquisition module is configured to acquire a text image corresponding to the column target in the target histogram; The second execution module is configured to recognize the name information of the object mapped by the columnar target according to the text image, wherein the object attribute information includes the name information.
  • the histogram data conversion control device further includes: a first execution sub-module and a first acquisition sub-module, wherein the first execution sub-module is used to input the text image into a preset text recognition model , Wherein the character recognition model is a convolutional neural network model trained to convergence for recognizing characters in an image; the first obtaining submodule is used to obtain the name information of the object output by the character recognition model.
  • the histogram data conversion control device further includes: a third acquisition module, a comparison module, and a third execution module, wherein the third acquisition module is used to acquire statistics representing the number of objects in the text image Numerical value information; the comparison module is used to compare the quantity difference between the statistical value information and the quantity value information with a preset comparison threshold; the third execution module is used for when the quantity difference is greater than all When comparing the threshold, replace the quantitative value information with the statistical value information.
  • the histogram data conversion control device further includes: a second acquisition sub-module and a second execution sub-module, wherein the second acquisition sub-module is used to acquire the column-shaped attribute information that indicates that the columnar target is The height information of the height in the target histogram; the second execution sub-module is configured to calculate the quantity value information of the objects mapped by the column target according to the height information and the standardized information.
  • the histogram data conversion control device further includes: a third acquisition submodule and a third execution submodule, wherein the third acquisition submodule is used to acquire the highest value of the histogram target in the target histogram.
  • the target ordinate information of the point; the third execution submodule is used to calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
  • FIG. 8 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus.
  • the non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • when the computer-readable instructions are executed by the processor, the processor can implement a histogram data conversion control method.
  • the processor of the computer equipment is used to provide calculation and control capabilities, and supports the operation of the entire computer equipment.
  • a computer readable instruction may be stored in the memory of the computer device, and when the computer readable instruction is executed by the processor, the processor may execute a method for controlling the conversion of histogram data.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • the processor is configured to execute the first acquisition module 2100, the first processing module 2200, and the first execution module 2300 in FIG. 7, and the memory stores program codes and various data required to execute the above modules.
  • the network interface is used for data transmission to and from user terminals or servers.
  • the memory in this embodiment stores the program codes and data required to execute all sub-modules in the histogram data conversion control device, and the server can call the program codes and data of the server to execute the functions of all the sub-modules.
  • the computer obtains image information of multiple columnar targets in the target histogram.
  • the columnar target is the columnar member in the target histogram.
  • the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target.
  • according to the columnar attribute information of the columnar target and the standardized information in the target histogram, the quantitative value information of the object mapped by the columnar target is calculated, and the object attribute information and quantitative value information of the object are then structured and converted into structured target data in the form of key-value pairs.
  • the structured target data in the form of key-value pairs can then be stored in a structured database to facilitate the reading of the data and reduce the space occupied by the data.
  • the present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the histogram data conversion control method described in any of the above embodiments.
  • the computer program can be stored in a computer-readable storage medium. When executed, the program may include the procedures of the above method embodiments.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

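A histogram data conversion control method, device, computer equipment, and storage medium. The method includes the following steps: acquiring image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target (S1100); calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information (S1200); and performing structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs (S1300). The method acquires the columnar attribute information and object attribute information of multiple columnar targets in the target histogram, calculates the quantitative value information of the objects, and converts the object attribute information and quantitative value information into structured target data, which can then be stored in a structured database, making the data easy to read and reducing the space the data occupies.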

Description

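Histogram data conversion control method, device, computer equipment, and storage medium
This application claims priority to Chinese patent application No. 201910079912.5, filed with the Chinese Patent Office on January 28, 2019 and entitled "Histogram data conversion control method, device, computer equipment, and storage medium", the entire contents of which are incorporated herein by reference.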
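Technical Field
The embodiments of the present application relate to the field of data analysis technology, in particular to a histogram data conversion control method, device, computer equipment, and storage medium.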
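Background
In work and daily life, histograms are often used for data statistics and analysis, for example in the analysis and testing of modern electronic products and software, or in quarterly reports of product sales. A histogram (bar chart) is a statistical chart that uses the length of rectangles as its variable: a series of vertical bars of different heights shows how the data is distributed and is used to compare two or more values (at different times or under different conditions). It has only one variable and is usually used to analyze smaller data sets. In practice, a histogram can also be arranged horizontally or expressed in a multi-dimensional manner.
However, because a histogram is an image, the data in it is also represented graphically. The data in a histogram cannot be read directly like data in a database, so it is not easy to obtain. Moreover, a histogram is stored as an image, which occupies a large amount of storage space and is inconvenient to use.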
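Summary of the Invention
The embodiments of the present application provide a histogram data conversion control method, device, computer equipment, and storage medium that parse a histogram into structured data.
To solve the above technical problem, one technical solution adopted by the embodiments of the present application is to provide a histogram data conversion control method, including the following steps:
acquiring image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information;
performing structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.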
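To solve the above technical problem, the embodiments of the present application further provide a histogram data conversion control device, including:
a first acquisition module, configured to acquire image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
a first processing module, configured to calculate quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information;
a first execution module, configured to perform structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.
Optionally, the device further includes:
a second acquisition module, configured to acquire a text image corresponding to the columnar target in the target histogram;
a second execution module, configured to recognize, according to the text image, name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
Optionally, the device further includes:
a first execution sub-module, configured to input the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
a first acquisition sub-module, configured to acquire the name information of the object output by the text recognition model.
Optionally, the device further includes:
a third acquisition module, configured to acquire statistical value information in the text image that represents the quantity of the object;
a comparison module, configured to compare the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold;
a third execution module, configured to replace the quantitative value information with the statistical value information when the quantity difference is greater than the comparison threshold.
Optionally, the device further includes:
a second acquisition sub-module, configured to acquire, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
a second execution sub-module, configured to calculate the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
Optionally, the device further includes:
a third acquisition sub-module, configured to acquire target ordinate information of the highest point of the columnar target in the target histogram;
a third execution sub-module, configured to calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.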
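To solve the above technical problem, the embodiments of the present application further provide a computer device, including a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the above histogram data conversion control method.
To solve the above technical problem, the embodiments of the present application further provide a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the above histogram data conversion control method.
The beneficial effects of the embodiments of the present application are as follows. Image information of multiple columnar targets in the target histogram is acquired, where a columnar target is a columnar member of the target histogram and the image information includes the columnar attribute information of the columnar target and the object attribute information of the object mapped by the columnar target. The quantitative value information of the object mapped by the columnar target is calculated according to the columnar attribute information of the columnar target and the standardized information in the target histogram. The object attribute information and quantitative value information of the object are then structured and converted into structured target data in the form of key-value pairs, which can be stored in a structured database, making the data easy to read and reducing the space the data occupies.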
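Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are described below.
FIG. 1 is a schematic diagram of the basic flow of the histogram data conversion control method according to an embodiment of the present application;
FIG. 2 is a schematic flow diagram of acquiring object attribute information according to an embodiment of the present application;
FIG. 3 is a schematic flow diagram of recognizing a text image according to an embodiment of the present application;
FIG. 4 is a schematic flow diagram of resetting the quantitative value information according to an embodiment of the present application;
FIG. 5 is a schematic flow diagram of calculating the quantitative value information of an object according to an embodiment of the present application;
FIG. 6 is a schematic flow diagram of acquiring the height information of a columnar target according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the basic structure of the histogram data conversion control device according to an embodiment of the present application;
FIG. 8 is a block diagram of the basic structure of the computer device according to an embodiment of the present application.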
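Detailed Description
To help those skilled in the art better understand the solutions of the present application, the following description refers to the drawings in the embodiments of the present application.
Embodiment 1
Refer to FIG. 1, which is a schematic diagram of the basic flow of the histogram data conversion control method of this embodiment.
As shown in FIG. 1, a histogram data conversion control method includes the following steps:
S1100: acquire image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
Converting a histogram into structured data requires first acquiring the image information of each columnar target in the target histogram. A columnar target refers to each data item in the target histogram. Taking a quarterly car sales report as the target histogram, four columnar targets are provided, each corresponding to the car sales of one quarter. Because the sales figures differ from quarter to quarter, the image information of each columnar target also differs. The image information includes the columnar attribute information of the columnar target and the attribute information of the object mapped by that columnar target. The columnar attribute information includes information such as the position and height of the columnar target in the target histogram and the quantity of the object it represents. The attribute information of the object includes information such as the name of the object and the category to which it belongs.
In one embodiment, the target histogram is a company's monthly turnover report. The target histogram contains multiple columnar targets; for example, if the month has 30 days, there are 30 columnar targets, each representing the turnover of one day, so the columnar attribute information of each columnar target refers to the turnover amount of that day. The object mapped by a columnar target is the product the company sells. For example, in a histogram of the monthly turnover from selling pens, the object mapped by each columnar target is the pen, and the object attribute information is the name information of the object or the category information of the class it belongs to, for example classifying pens as stationery. In implementation, the image information of each columnar target in the target histogram can be obtained through image recognition technology, for example by calculating the columnar attribute information of the columnar target through image processing. Image processing is a technique of analyzing an image with a computer to achieve a desired result, for example using image processing software applications such as Adobe Photoshop, Adobe Illustrator, or CorelDRAW to analyze the target histogram and obtain the image information of the multiple columnar targets in it.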
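S1200: calculate quantitative value information of the object according to the standardized information preset in the target histogram and the columnar attribute information;
After the image information of the columnar target in the target histogram is acquired, a calculation is performed according to the standardized information preset in the target histogram and the columnar attribute information of the columnar target to generate the quantitative value information of the object mapped by the columnar target. In implementation, the standardized information is the reference used in the target histogram to measure and mark the quantity level of a columnar target. For example, a scale is provided along the ordinate of the target histogram, and each scale division corresponds to the quantity mapped by the standardized information. The quantitative value information of the object can be calculated from the standardized information and the columnar attribute information. For example, if the height of a columnar target along the ordinate equals 3 scale divisions, the quantity of the object mapped by the columnar target is 3 times the quantity mapped by the standardized information; if the standardized information represents 50,000, the quantity of the object mapped by the columnar target is 150,000. In other words, the quantitative value information of the object corresponding to the columnar target, which is the specific numerical value of the object, can be calculated from the standardized information and the columnar attribute information.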
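The scale arithmetic in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `quantity_from_height` and its parameter names are introduced here for illustration only.

```python
def quantity_from_height(height_in_ticks, value_per_tick):
    """Return the quantity a bar represents.

    height_in_ticks: bar height measured in scale divisions
                     (part of the columnar attribute information).
    value_per_tick:  quantity that one scale division stands for
                     (the standardized information).
    """
    if height_in_ticks < 0 or value_per_tick < 0:
        raise ValueError("height and tick value must be non-negative")
    return height_in_ticks * value_per_tick


# Example from the text: a bar 3 divisions high, 50,000 per division.
print(quantity_from_height(3, 50_000))  # 150000
```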
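S1300: perform structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.
After the quantitative value information of the object mapped by the columnar target is calculated, the object attribute information and quantitative value information of the object are converted into structured target data in the form of key-value pairs. Key-value storage is the simplest way of organizing a database: the object attribute information serves as the key and the quantitative value information serves as the value, forming key-value structured data. Take a shopping mall's first-quarter sneaker sales histogram (the target histogram) as an example. The histogram contains 3 columnar targets, for January, February, and March, which represent the number of sneakers sold in each of those months, for example 450 pairs in January, 500 pairs in February, and 345 pairs in March. The columnar attribute information of each columnar target is therefore different; specifically, the height of a columnar target in the histogram represents the number of sneakers sold that month. The system first acquires the image information of the 3 columnar targets, including the columnar attribute information of each columnar target and the object attribute information of the object it maps. The object mapped by the columnar targets is the sneaker, and the object attribute information is the name of the sneaker or its category; in this histogram it is, for example, "XX sneakers, sales in month YY" or "sporting goods, sales in month YY", where "XX" is the sneaker brand name and "YY" is the specific month. The columnar attribute information includes the height of the columnar target in the histogram, and the height of each columnar target can be extracted through image processing. The quantitative value information of the object is then calculated from the standardized information preset in the histogram and the columnar attribute information of each columnar target. In implementation, the standardized information is the standard preset in the histogram for measuring the quantity of the object in a columnar target. For example, the ordinate of the histogram is divided into 10 standard units of height, and each additional standard unit from bottom to top represents 100 more pairs of sneakers sold; the standard unit is the standardized information preset in the histogram. The number of sneakers sold in January, February, and March can then be obtained from the image information of each columnar target: for example, the January columnar target is 4.5 standard units high, the February columnar target is 5.0 standard units high, and the March columnar target is 3.45 standard units high. The system calculates the quantitative value information of the object corresponding to each columnar target from the columnar attribute information and the standardized information, and then converts the object attribute information and quantitative value information into structured target data in the form of key-value pairs. For example, the three columnar targets are converted into: sneaker sales in month 1 - 450, sneaker sales in month 2 - 500, and sneaker sales in month 3 - 345. The generated structured target data can be stored in a structured database, which makes it convenient to read the specific data in the histogram directly.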
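The sneaker example can be reproduced with a short sketch. It assumes the bar heights have already been measured in standard units; the key strings, `UNIT_VALUE`, and `to_structured_data` are hypothetical names chosen for illustration, and JSON is used only as one possible serialization of key-value pairs.

```python
import json
from typing import Dict

UNIT_VALUE = 100  # one standard unit on the ordinate = 100 pairs sold (from the example)

# Bar heights in standard units, keyed by the object attribute information.
bar_heights = {
    "Sneaker sales, month 1": 4.5,
    "Sneaker sales, month 2": 5.0,
    "Sneaker sales, month 3": 3.45,
}


def to_structured_data(heights: Dict[str, float], unit_value: float) -> Dict[str, int]:
    """Pair each object attribute (key) with its computed quantity (value)."""
    # round() absorbs floating-point noise in products such as 3.45 * 100.
    return {name: round(h * unit_value) for name, h in heights.items()}


structured = to_structured_data(bar_heights, UNIT_VALUE)
print(json.dumps(structured, indent=2))
```

Rounding to whole pairs is a design choice of this sketch; quantities that are not naturally integers would need a different rule.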
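In one embodiment, the target histogram involves different kinds of objects. Take a company's office consumables as an example: they include printing paper, pens, erasers, and so on. The histogram of the company's annual office supply consumption includes a first, second, and third columnar target corresponding to printing paper, pens, and erasers respectively. The image information of each columnar target in the histogram is acquired, and the quantitative value information of the printing paper, pens, and erasers is calculated from that image information and the standardized information in the histogram. The printing paper, pens, and erasers and their corresponding quantitative value information are then converted into structured target data in the form of key-value pairs. For example, the generated structured target data expresses: annual consumption of printing paper - 100,000 sheets, annual consumption of pens - 50,000, annual consumption of erasers - 2,000. The data can thus be stored in a structured database, which makes the data easy to read and store.
In this embodiment, image information of multiple columnar targets in the target histogram is acquired, where a columnar target is a columnar member of the target histogram and the image information includes the columnar attribute information of the columnar target and the object attribute information of the object mapped by the columnar target. The quantitative value information of the object mapped by the columnar target is calculated according to the columnar attribute information of the columnar target and the standardized information in the target histogram. The object attribute information and quantitative value information of the object are then structured and converted into structured target data in the form of key-value pairs, which can be stored in a structured database, making the data easy to read and reducing the space the data occupies.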
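In an optional embodiment, refer to FIG. 2, which is a schematic diagram of the specific flow of acquiring object attribute information according to an embodiment of the present application.
As shown in FIG. 2, before step S1100, the method further includes the following steps:
S1010: acquire a text image corresponding to the columnar target in the target histogram;
Before the image information of the columnar target is acquired, the text image corresponding to the columnar target can be acquired first. In implementation, the target histogram contains annotation information for each columnar target. The annotation information carries the name information of the object mapped by the columnar target and the specific quantity of that object, and appears in the target histogram as a text image corresponding to the columnar target. Specifically, the text image is placed above the corresponding columnar target in the ordinate direction, and the text image corresponding to a columnar target can be obtained by scanning the target histogram.
S1020: recognize, according to the text image, the name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
After the text image corresponding to the columnar target is acquired, the text image is recognized to obtain the name information of the object mapped by the corresponding columnar target. In implementation, the text in the picture can be recognized through OCR. OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method. After the name information of the object mapped by the columnar target is recognized, it is added to the object attribute information of the object, which improves the accuracy of recognizing and acquiring the object attribute information. Text recognition requires no manual intervention, which improves the efficiency of data conversion.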
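In another optional embodiment, refer to FIG. 3, which is a schematic diagram of the basic flow of recognizing a text image according to an embodiment of the present application.
As shown in FIG. 3, step S1020 includes the following steps:
S1021: input the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
After the text image is acquired, it can be input into the text recognition model, which performs the picture text recognition. In implementation, the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images.
S1022: acquire the name information of the object output by the text recognition model.
After the text image is input into the text recognition model, the convolutional neural network model recognizes the text image and outputs the text it contains. Because the text image corresponds to a columnar target, what the text recognition model outputs is the name information of the object mapped by that columnar target. In this embodiment, an LSTM network (Long Short-Term Memory artificial neural network model) may be used as the neural network model. An LSTM network uses "gates" to control whether information is discarded or added, thereby implementing forgetting or memory. A "gate" is a structure that lets information pass selectively, and consists of a sigmoid (S-shaped growth curve) function and a pointwise multiplication operation. The output of the sigmoid function lies in the interval [0, 1], where 0 means completely discard and 1 means completely pass. A neural network model trained to convergence has recognition classifiers capable of recognizing the text information in a text image. The text recognition model includes the above neural network model, which includes N+1 recognition classifiers, where N is a positive integer.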
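The gate mechanism described above (a sigmoid followed by a pointwise multiplication) can be illustrated in isolation. The sketch below is not an LSTM and not the model of the application; `sigmoid` and `apply_gate` are illustrative helper names, and the gate inputs are made-up numbers.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_gate(values, gate_inputs):
    """Pointwise gate: scale each value by the sigmoid of its gate input."""
    return [v * sigmoid(g) for v, g in zip(values, gate_inputs)]


# A strongly negative gate input almost blocks the value, a strongly positive
# one passes it almost unchanged, and an input of 0 passes exactly half.
gated = apply_gate([2.0, 2.0, 2.0], [-20.0, 0.0, 20.0])
print(gated)
```

For finite inputs the mathematical sigmoid only approaches 0 and 1, so "completely discard" and "completely pass" are limiting cases.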
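Specifically, the text image is input into the preset text recognition model to obtain the classification result of each word of the text image in the recognition classifiers, wherein the classification result includes the text classification corresponding to the text image and the confidence of that text classification.
The above classification result is then acquired. The confidence of a text classification means that, after the text image is screened and classified by the text recognition model, the text image is assigned to one or more text classifications together with the percentage with which it belongs to each of them. Because only one piece of text information can finally be assigned to a word in the text image, the confidences of the text classifications of the same text image need to be compared. For example, if the text image carries the information "notebook computer", its confidence for the classification electronic device (computer) is 0.95 and its confidence for the classification stationery (notebook) is 0.75.
The confidences are compared with a preset first threshold; when a confidence is greater than the preset first threshold, the text classification result represented by that confidence is confirmed as the name information of the object. The preset first threshold is generally set to a value between 0.9 and 1. The text information whose confidence is greater than the preset first threshold is selected as the final text classification result, that is, the text information represented by that confidence is confirmed as the name information of the object. For example, if the preset first threshold is 0.9 and the text image carries the information "pen" with a confidence of 0.95 for the classification stationery, then because 0.95 > 0.9 the classification result "pen" (stationery) is confirmed as the name information of the object.
By inputting the text image into the preset text recognition model, acquiring the confidence of the text classification output by the model, and confirming the text classification result represented by the confidence as the name information of the object when the confidence is greater than the preset first threshold, the accuracy of recognizing the text in the text image is improved.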
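The confidence comparison can be sketched as a small selection function. It assumes the classifier output is already available as a mapping from classification to confidence; `pick_name`, the label strings, and the default threshold of 0.9 are assumptions for illustration.

```python
from typing import Dict, Optional


def pick_name(confidences: Dict[str, float], first_threshold: float = 0.9) -> Optional[str]:
    """Return the highest-confidence classification if it exceeds the
    threshold, otherwise None (no result is confirmed)."""
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] > first_threshold else None


# "Notebook computer" example from the text.
scores = {"electronic device: computer": 0.95, "stationery: notebook": 0.75}
print(pick_name(scores))                            # electronic device: computer
print(pick_name({"stationery: notebook": 0.75}))    # None
```

Returning `None` when nothing passes the threshold is a choice of this sketch; the text does not say how that case is handled.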
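In an optional embodiment, refer to FIG. 4, which is a schematic diagram of the basic flow of resetting the quantitative value information according to an embodiment of the present application.
As shown in FIG. 4, after step S1020, the method further includes the following steps:
S1030: acquire statistical value information in the text image that represents the quantity of the object;
The text image corresponding to a columnar target also carries the specific quantity of the object mapped by the columnar target. In implementation, the statistical value information in the text image can be recognized through OCR to obtain the quantity of the object mapped by the columnar target. Because the statistical value information is carried in the target histogram itself, it accurately determines the specific quantity of the object. For example, if the text image corresponding to a columnar target contains the text "2.5 million", recognizing the text image through OCR shows that the quantity of the object corresponding to that columnar target is 2.5 million.
S1040: compare the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold;
After the statistical value information is acquired, the difference between it and the quantitative value information calculated from the columnar target is computed, and this quantity difference is compared with the preset comparison threshold. The comparison threshold is a value preset in the system; in implementation, it can also be set by the user to meet the user's needs.
S1050: when the quantity difference is greater than the comparison threshold, replace the quantitative value information with the statistical value information.
When the quantity difference is greater than the preset comparison threshold, the quantitative value information is replaced with the statistical value information. The quantity represented by the quantitative value information is calculated from the columnar attribute information of the columnar target and the standardized information in the target histogram, so the result may contain errors that make the quantitative value information inaccurate. The quantity difference between the statistical value information and the quantitative value information is therefore compared with the comparison threshold. When the quantity difference is smaller than the comparison threshold (for example 2, 3, or 5), the gap between the two values is negligible, and the quantitative value information continues to be used together with the object attribute information to generate the structured target data. When the quantity difference is greater than the comparison threshold, the quantitative value information is replaced with the statistical value information, which is used together with the object attribute information to generate the structured target data. Other approaches are also possible: for example, when the quantity difference is smaller than the comparison threshold, the quantitative value information is replaced with the statistical value information, and when the quantity difference is greater than the comparison threshold, the quantitative value information continues to be used to generate the structured target data.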
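Steps S1030 to S1050 amount to a simple reconciliation rule, sketched below for the first variant described above (replace when the difference exceeds the threshold). The function name `reconcile`, its parameters, and the sample numbers other than 2.5 million are hypothetical.

```python
def reconcile(computed: float, labelled: float, threshold: float) -> float:
    """Choose between the value computed from the bar height and the value
    read from the bar's text label.

    If the two differ by more than the threshold, the labelled (statistical)
    value replaces the computed one; otherwise the computed value is kept.
    """
    difference = abs(labelled - computed)
    return labelled if difference > threshold else computed


# Computed 2,480,000 from the bar height, label reads 2,500,000: replaced.
print(reconcile(2_480_000, 2_500_000, threshold=5))  # 2500000
# Computed 98, label reads 100, threshold 5: the computed value is kept.
print(reconcile(98, 100, threshold=5))               # 98
```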
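In an optional embodiment, refer to FIG. 5, which is a schematic diagram of the basic flow of calculating the quantitative value information of an object according to an embodiment of the present application.
As shown in FIG. 5, step S1200 includes the following steps:
S1210: acquire, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
The columnar attribute information of a columnar target carries height information, which represents the height of the columnar target in the target histogram. Specifically, the height of a columnar target can be extracted through image processing. In implementation, refer to FIG. 6, which is a schematic diagram of the specific flow of acquiring the height information of a columnar target according to an embodiment of the present application.
As shown in FIG. 6, step S1210 includes the following steps:
S1211: acquire target ordinate information of the highest point of the columnar target in the target histogram;
Each columnar target in the target histogram has the shape of a long bar. The columnar targets are arranged one after another along the abscissa of the target histogram and have different heights along the ordinate. Because the quantities of the objects mapped by the columnar targets differ, the heights of the columnar targets in the target histogram also differ. The height of a columnar target is proportional to the quantity of the object it maps: the larger the quantity of the object, the taller the columnar target. In implementation, the target ordinate information of the highest point of a columnar target can be obtained through image processing technology. Image processing is a technique of analyzing an image with a computer to achieve a desired result; for example, OCR or OpenCV can be used to recognize and locate text and graphics in the target histogram and thereby obtain the target ordinate information of the highest point of the columnar target.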
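Locating the highest point of a bar can be sketched on a toy image without any imaging library. The sketch assumes the bar has already been segmented into a binary mask (1 for bar pixels, 0 for background) and that rows are indexed from the top of the image, as is usual for raster images; both are assumptions of this illustration, and `top_row_of_bar` is a hypothetical helper.

```python
from typing import List, Optional


def top_row_of_bar(mask: List[List[int]], column: int) -> Optional[int]:
    """Return the index of the first row (counting from the top of the image)
    in which the given column contains a bar pixel, or None if there is none."""
    for row_index, row in enumerate(mask):
        if row[column]:
            return row_index
    return None


# Toy mask, 5 rows by 4 columns: 1 marks a bar pixel, 0 marks background.
mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
]
print(top_row_of_bar(mask, 0))  # 2
print(top_row_of_bar(mask, 2))  # 1
print(top_row_of_bar(mask, 1))  # None
```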
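S1212: calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
The origin ordinate information is the ordinate of the origin of the target histogram. In implementation, the origin of the target histogram is the starting point of the horizontal and vertical axes. In general, extending rightward from the origin is the increasing direction of the abscissa, and extending upward from the origin is the increasing direction of the ordinate; the origin is expressed as (0, 0). The height information of a columnar target can be calculated from its target ordinate information and the origin ordinate information. For example, if the coordinates of the highest point of a columnar target are (100, 640), the target ordinate information is 640 and the origin ordinate information is 0, so the height of the columnar target is 640. Calculating the height of the columnar target from coordinate information effectively improves the accuracy of the quantity obtained for the object mapped by the columnar target.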
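The height calculation is a single subtraction, shown here as a minimal sketch. `bar_height` is an illustrative name; taking the absolute value is an assumption added so that the same function also works with raster coordinates, where row numbers grow downward.

```python
def bar_height(top_y: float, origin_y: float = 0) -> float:
    """Height of a bar: distance along the ordinate between its highest
    point and the origin of the chart."""
    # abs() also covers image coordinates, where the origin row has a
    # larger index than the top of the bar.
    return abs(top_y - origin_y)


# Example from the text: highest point at (100, 640), origin at (0, 0).
print(bar_height(640, 0))    # 640
# Same bar in image coordinates: top at row 120, axis origin at row 760.
print(bar_height(120, 760))  # 640
```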
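S1220: calculate the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
After the height information of the columnar target is acquired, the quantitative value information of the object mapped by the columnar target can be calculated from the height information and the standardized information preset in the target histogram. In implementation, the standardized information is the standard preset in the histogram for measuring the quantity of the object in a columnar target. Take a target histogram showing the exam scores of students in a grade as an example. The ordinate of the target histogram is divided into 10 standard units of height, each standard unit is worth 10 points, and the values increase from bottom to top, so the highest is 100 points and the lowest is 0 points. The score per standard unit is the standardized information preset in the target histogram, and the students' scores can be obtained from the image information of each columnar target. For example, the columnar target of the first student is 9.5 standard units high and that of the second student is 9.8 standard units high. The system acquires the heights, in the target histogram, of the columns corresponding to the first and second students, and then calculates each student's score from the height information and the standardized information. The object attribute information and quantitative value information of the two students are then converted into structured target data; that is, each student's name is paired with that student's score to generate structured target data in the form of key-value pairs. For example, if the first and second students are named Zhang San and Li Si, the generated structured target data is: Zhang San - 95 points, Li Si - 98 points. The generated structured target data can be stored in a structured database, which makes it convenient to read the specific data in the histogram directly.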
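The student-score example can be extended one step back, from pixel heights to scores. The pixel figures below are invented for illustration (a 640-pixel ordinate covering 10 standard units, hence 64 pixels per unit), as are the names `score_from_pixels`, `PX_PER_UNIT`, and `VALUE_PER_UNIT`; only the resulting scores come from the text.

```python
PX_PER_UNIT = 64      # hypothetical: 640-pixel ordinate / 10 standard units
VALUE_PER_UNIT = 10   # from the example: each standard unit is worth 10 points


def score_from_pixels(height_px: float, px_per_unit: float, value_per_unit: float) -> int:
    """Convert a measured bar height in pixels to a whole-number score."""
    units = height_px / px_per_unit        # height in standard units
    return round(units * value_per_unit)   # rounding absorbs pixel quantization


# Hypothetical measured bar heights in pixels.
bars_px = {"Zhang San": 608, "Li Si": 627}
records = {
    name: score_from_pixels(px, PX_PER_UNIT, VALUE_PER_UNIT)
    for name, px in bars_px.items()
}
print(records)  # {'Zhang San': 95, 'Li Si': 98}
```

The second bar measures 627 pixels, which is 9.796875 units rather than exactly 9.8. Small errors of this kind are one reason a computed value may differ from the labelled value.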
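To solve the above technical problem, the embodiments of the present application further provide a histogram data conversion control device.
Refer to FIG. 7, which is a schematic diagram of the basic structure of the histogram data conversion control device of this embodiment.
As shown in FIG. 7, a histogram data conversion control device includes: a first acquisition module 2100, a first processing module 2200, and a first execution module 2300. The first acquisition module 2100 is configured to acquire image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target; the first processing module 2200 is configured to calculate quantitative value information of the object according to the standardized information preset in the target histogram and the columnar attribute information; the first execution module 2300 is configured to perform structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.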
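One way to picture the division of work among the three modules is the sketch below. It is an assumption-laden illustration, not the device of the application: the class and method names are hypothetical, the acquisition step takes already-extracted records instead of analyzing an image, and the input numbers are chosen to reproduce the office-supplies figures used earlier in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnarTarget:
    name: str            # object attribute information
    height_units: float  # columnar attribute information (height in scale units)


class HistogramConverter:
    def __init__(self, value_per_unit: float):
        self.value_per_unit = value_per_unit  # standardized information

    def acquire(self, raw: List[dict]) -> List[ColumnarTarget]:
        """Role of the first acquisition module (2100): collect per-bar information."""
        return [ColumnarTarget(r["name"], r["height_units"]) for r in raw]

    def process(self, target: ColumnarTarget) -> int:
        """Role of the first processing module (2200): height times scale gives quantity."""
        return round(target.height_units * self.value_per_unit)

    def execute(self, targets: List[ColumnarTarget]) -> Dict[str, int]:
        """Role of the first execution module (2300): build key-value pairs."""
        return {t.name: self.process(t) for t in targets}

    def convert(self, raw: List[dict]) -> Dict[str, int]:
        return self.execute(self.acquire(raw))


converter = HistogramConverter(value_per_unit=10_000)
result = converter.convert([
    {"name": "printing paper", "height_units": 10},
    {"name": "pens", "height_units": 5},
    {"name": "erasers", "height_units": 0.2},
])
print(result)
```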
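In this embodiment, image information of multiple columnar targets in the target histogram is acquired, where a columnar target is a columnar member of the target histogram and the image information includes the columnar attribute information of the columnar target and the object attribute information of the object mapped by the columnar target. The quantitative value information of the object mapped by the columnar target is calculated according to the columnar attribute information of the columnar target and the standardized information in the target histogram. The object attribute information and quantitative value information of the object are then structured and converted into structured target data in the form of key-value pairs, which can be stored in a structured database, making the data easy to read and reducing the space the data occupies.
In some implementations, the histogram data conversion control device further includes: a second acquisition module and a second execution module, wherein the second acquisition module is configured to acquire a text image corresponding to the columnar target in the target histogram; the second execution module is configured to recognize, according to the text image, the name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
In some implementations, the histogram data conversion control device further includes: a first execution sub-module and a first acquisition sub-module, wherein the first execution sub-module is configured to input the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images; the first acquisition sub-module is configured to acquire the name information of the object output by the text recognition model.
In some implementations, the histogram data conversion control device further includes: a third acquisition module, a comparison module, and a third execution module, wherein the third acquisition module is configured to acquire statistical value information in the text image that represents the quantity of the object; the comparison module is configured to compare the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold; the third execution module is configured to replace the quantitative value information with the statistical value information when the quantity difference is greater than the comparison threshold.
In some implementations, the histogram data conversion control device further includes: a second acquisition sub-module and a second execution sub-module, wherein the second acquisition sub-module is configured to acquire, from the columnar attribute information, height information representing the height of the columnar target in the target histogram; the second execution sub-module is configured to calculate the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
In some implementations, the histogram data conversion control device further includes: a third acquisition sub-module and a third execution sub-module, wherein the third acquisition sub-module is configured to acquire target ordinate information of the highest point of the columnar target in the target histogram; the third execution sub-module is configured to calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
Regarding the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.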
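To solve the above technical problem, the embodiments of the present application further provide a computer device. Refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
FIG. 8 is a schematic diagram of the internal structure of the computer device. As shown in FIG. 8, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions. The database may store control information sequences, and when the computer-readable instructions are executed by the processor, the processor can implement a histogram data conversion control method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to execute a histogram data conversion control method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in the figure is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In this implementation, the processor is configured to execute the first acquisition module 2100, the first processing module 2200, and the first execution module 2300 in FIG. 7, and the memory stores the program code and data required to execute these modules. The network interface is used for data transmission to and from user terminals or servers. The memory in this implementation stores the program code and data required to execute all sub-modules of the histogram data conversion control device, and the server can call its program code and data to execute the functions of all the sub-modules.
The computer acquires image information of multiple columnar targets in the target histogram, where a columnar target is a columnar member of the target histogram and the image information includes the columnar attribute information of the columnar target and the object attribute information of the object mapped by the columnar target. The quantitative value information of the object mapped by the columnar target is calculated according to the columnar attribute information of the columnar target and the standardized information in the target histogram. The object attribute information and quantitative value information of the object are then structured and converted into structured target data in the form of key-value pairs, which can be stored in a structured database, making the data easy to read and reducing the space the data occupies.
The present application further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the histogram data conversion control method described in any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the procedures in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the procedures of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the drawings may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be performed at different moments, and they are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some of the implementations of the present application. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also fall within the scope of protection of the present application.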

Claims (20)

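  1. A histogram data conversion control method, characterized by including the following steps:
    acquiring image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
    calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information;
    performing structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.
  2. The histogram data conversion control method according to claim 1, characterized in that, before the step of acquiring image information of at least one columnar target in a target histogram, the method further includes the following steps:
    acquiring a text image corresponding to the columnar target in the target histogram;
    recognizing, according to the text image, name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
  3. The histogram data conversion control method according to claim 2, characterized in that the step of recognizing, according to the text image, the name information of the object mapped by the columnar target includes the following steps:
    inputting the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
    acquiring the name information of the object output by the text recognition model.
  4. The histogram data conversion control method according to claim 2, characterized in that, after the step of recognizing, according to the text image, the name information of the object mapped by the columnar target, the method further includes the following steps:
    acquiring statistical value information in the text image that represents the quantity of the object;
    comparing the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold;
    when the quantity difference is greater than the comparison threshold, replacing the quantitative value information with the statistical value information.
  5. The histogram data conversion control method according to claim 1, characterized in that the step of calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information includes the following steps:
    acquiring, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
    calculating the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
  6. The histogram data conversion control method according to claim 5, characterized in that the step of acquiring, from the columnar attribute information, height information representing the height of the columnar target in the target histogram includes the following steps:
    acquiring target ordinate information of the highest point of the columnar target in the target histogram;
    calculating the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
  7. The histogram data conversion control method according to claim 3, characterized in that inputting the text image into a preset text recognition model and acquiring the name information of the object output by the text recognition model includes:
    inputting the text image into the preset text recognition model to obtain the classification result of each word of the text image in the recognition classifiers, wherein the classification result includes the text classification corresponding to the text image and the confidence of the text classification;
    comparing the confidence with a preset first threshold, and when the confidence is greater than the first threshold, confirming the text classification result represented by the confidence as the name information of the object.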
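  8. A histogram data conversion control device, characterized by including:
    a first acquisition module, configured to acquire image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
    a first processing module, configured to calculate quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information;
    a first execution module, configured to perform structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.
  9. The histogram data conversion control device according to claim 8, characterized by further including:
    a second acquisition module, configured to acquire a text image corresponding to the columnar target in the target histogram;
    a second execution module, configured to recognize, according to the text image, name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
  10. The histogram data conversion control device according to claim 9, characterized in that the histogram data conversion control device further includes a first execution sub-module and a first acquisition sub-module,
    the first execution sub-module being configured to input the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
    the first acquisition sub-module being configured to acquire the name information of the object output by the text recognition model.
  11. The histogram data conversion control device according to claim 9, characterized in that the histogram data conversion control device further includes a third acquisition module, a comparison module, and a third execution module,
    the third acquisition module being configured to acquire statistical value information in the text image that represents the quantity of the object;
    the comparison module being configured to compare the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold;
    the third execution module being configured to replace the quantitative value information with the statistical value information when the quantity difference is greater than the comparison threshold.
  12. The histogram data conversion control device according to claim 8, characterized in that the histogram data conversion control device further includes a second acquisition sub-module and a second execution sub-module,
    the second acquisition sub-module being configured to acquire, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
    the second execution sub-module being configured to calculate the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
  13. The histogram data conversion control device according to claim 12, characterized in that the histogram data conversion control device further includes a third acquisition sub-module and a third execution sub-module,
    the third acquisition sub-module being configured to acquire target ordinate information of the highest point of the columnar target in the target histogram;
    the third execution sub-module being configured to calculate the height information of the columnar target according to the target ordinate information and the origin ordinate information of the target histogram.
  14. The histogram data conversion control device according to claim 10, characterized in that
    the first acquisition sub-module is specifically configured to obtain the classification result of each word of the text image in the recognition classifiers, wherein the classification result includes the text classification corresponding to the text image and the confidence of the text classification; and to compare the confidence with a preset first threshold and, when the confidence is greater than the first threshold, confirm the text classification result represented by the confidence as the name information of the object.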
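  15. A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to execute the following steps:
    acquiring image information of at least one columnar target in a target histogram, wherein the image information includes columnar attribute information of the columnar target and object attribute information of the object mapped by the columnar target;
    calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information;
    performing structured conversion on the object attribute information and the quantitative value information to generate structured target data in the form of key-value pairs.
  16. The computer device according to claim 15, characterized in that, before executing the step of acquiring image information of at least one columnar target in a target histogram, the processor further executes the following steps:
    acquiring a text image corresponding to the columnar target in the target histogram;
    recognizing, according to the text image, name information of the object mapped by the columnar target, wherein the object attribute information includes the name information.
  17. The computer device according to claim 16, characterized in that, when executing the step of recognizing, according to the text image, the name information of the object mapped by the columnar target, the processor specifically executes the following steps:
    inputting the text image into a preset text recognition model, wherein the text recognition model is a convolutional neural network model trained to convergence for recognizing text in images;
    acquiring the name information of the object output by the text recognition model.
  18. The computer device according to claim 16, characterized in that, after executing the step of recognizing, according to the text image, the name information of the object mapped by the columnar target, the processor further executes the following steps:
    acquiring statistical value information in the text image that represents the quantity of the object;
    comparing the quantity difference between the statistical value information and the quantitative value information with a preset comparison threshold;
    when the quantity difference is greater than the comparison threshold, replacing the quantitative value information with the statistical value information.
  19. The computer device according to claim 15, characterized in that, when executing the step of calculating quantitative value information of the object according to standardized information preset in the target histogram and the columnar attribute information, the processor specifically executes the following steps:
    acquiring, from the columnar attribute information, height information representing the height of the columnar target in the target histogram;
    calculating the quantitative value information of the object mapped by the columnar target according to the height information and the standardized information.
  20. A non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the histogram data conversion control method according to any one of claims 1 to 7.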
PCT/CN2019/117470 2019-01-28 2019-11-12 柱状图数据转换控制方法、装置、计算机设备及存储介质 WO2020155757A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910079912.5 2019-01-28
CN201910079912.5A CN109840278A (zh) 2019-01-28 2019-01-28 柱状图数据转换控制方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020155757A1 true WO2020155757A1 (zh) 2020-08-06

Family

ID=66884234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117470 WO2020155757A1 (zh) 2019-01-28 2019-11-12 柱状图数据转换控制方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN109840278A (zh)
WO (1) WO2020155757A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840278A (zh) * 2019-01-28 2019-06-04 平安科技(深圳)有限公司 柱状图数据转换控制方法、装置、计算机设备及存储介质
CN110210455B (zh) * 2019-06-18 2022-03-01 石家庄捷弘科技有限公司 一种打印内容格式化提取方法
CN110688363B (zh) * 2019-09-02 2023-07-21 中国平安人寿保险股份有限公司 一种数据的标准化处理方法及系统、电子设备及存储介质
CN111143544B (zh) * 2019-12-23 2023-06-16 中南大学 一种基于神经网络的柱形图信息提取方法及装置
CN112101237A (zh) * 2020-09-17 2020-12-18 新华智云科技有限公司 一种柱状图数据提取和转化方法
CN112269828A (zh) * 2020-11-18 2021-01-26 网易(杭州)网络有限公司 数据生成方法、装置和电子设备
CN114143446A (zh) * 2021-10-20 2022-03-04 深圳航天智慧城市系统技术研究院有限公司 基于边缘计算的柱状图识别方法、系统、存储介质及设备
CN115205859A (zh) * 2022-09-13 2022-10-18 通联数据股份公司 用于将位图解析为结构化数据的方法、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090245640A1 (en) * 2008-03-31 2009-10-01 Jilin Li Image determination apparatus, image search apparatus and a recording medium on which an image search program is recorded
CN108399386A (zh) * 2018-02-26 2018-08-14 阿博茨德(北京)科技有限公司 饼图中的信息提取方法及装置
CN108416377A (zh) * 2018-02-26 2018-08-17 阿博茨德(北京)科技有限公司 柱状图中的信息提取方法及装置
CN108446717A (zh) * 2018-02-07 2018-08-24 苏州工业大数据创新中心有限公司 一种基于图像识别的机台状态采集方法及系统
CN109840278A (zh) * 2019-01-28 2019-06-04 平安科技(深圳)有限公司 柱状图数据转换控制方法、装置、计算机设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050471B (zh) * 2014-05-27 2017-02-01 华中科技大学 一种自然场景文字检测方法及系统
CN106934386B (zh) * 2017-03-30 2019-06-25 湖南师范大学 一种基于自启发式策略的自然场景文字检测方法及系统
US10726252B2 (en) * 2017-05-17 2020-07-28 Tab2Ex Llc Method of digitizing and extracting meaning from graphic objects
CN107578457A (zh) * 2017-08-21 2018-01-12 中云开源数据技术(上海)有限公司 一种套叠柱状图的可视化系统及其显示方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090245640A1 (en) * 2008-03-31 2009-10-01 Jilin Li Image determination apparatus, image search apparatus and a recording medium on which an image search program is recorded
CN108446717A (zh) * 2018-02-07 2018-08-24 苏州工业大数据创新中心有限公司 一种基于图像识别的机台状态采集方法及系统
CN108399386A (zh) * 2018-02-26 2018-08-14 阿博茨德(北京)科技有限公司 饼图中的信息提取方法及装置
CN108416377A (zh) * 2018-02-26 2018-08-17 阿博茨德(北京)科技有限公司 柱状图中的信息提取方法及装置
CN109840278A (zh) * 2019-01-28 2019-06-04 平安科技(深圳)有限公司 柱状图数据转换控制方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN109840278A (zh) 2019-06-04

Similar Documents

Publication Publication Date Title
WO2020155757A1 (zh) 柱状图数据转换控制方法、装置、计算机设备及存储介质
US11244208B2 (en) Two-dimensional document processing
US11714841B2 (en) Systems and methods for processing a natural language query in data tables
US10915788B2 (en) Optical character recognition using end-to-end deep learning
US20220004878A1 (en) Systems and methods for synthetic document and data generation
CN108427953A (zh) 一种文字识别方法及装置
US20240012846A1 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
CN112035653A (zh) 一种政策关键信息提取方法和装置、存储介质、电子设备
CN111459967A (zh) 结构化查询语句生成方法、装置、电子设备及介质
CN110162754B (zh) 一种岗位描述文档的生成方法及设备
CN112270604B (zh) 信息结构化处理方法、装置及计算机可读存储介质
CN112036295B (zh) 票据图像处理方法、装置、存储介质及电子设备
US11341319B2 (en) Visual data mapping
US20210350068A1 (en) Descriptive insight generation and presentation system
CN110853739A (zh) 图像管理显示方法、装置、计算机设备及存储介质
CN110378516B (zh) 分析师画像生成方法、装置、设备及计算机可读存储介质
CN108369647B (zh) 基于图像的质量控制
CN111966600A (zh) 网页测试方法、装置、计算机设备及计算机可读存储介质
US20230023636A1 (en) Methods and systems for preparing unstructured data for statistical analysis using electronic characters
CN112270350A (zh) 组织机构的画像方法、装置、设备及存储介质
EP3640861A1 (en) Systems and methods for parsing log files using classification and a plurality of neural networks
CN112257400B (zh) 表格数据提取方法、装置、计算机设备和存储介质
CN115205877A (zh) 一种不规则排版发票单据布局预测方法、装置及存储介质
CN115617790A (zh) 数据仓库创建方法、电子设备及存储介质
CN113779231A (zh) 基于知识图谱的大数据可视化分析方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19912667

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19912667

Country of ref document: EP

Kind code of ref document: A1