CN112214620A - Information query method and device, chart processing method and electronic equipment - Google Patents

Information query method and device, chart processing method and electronic equipment

Info

Publication number
CN112214620A
Authority
CN
China
Prior art keywords
information
chart
graph
sub
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011024153.1A
Other languages
Chinese (zh)
Inventor
冯博豪
庞敏辉
谢国斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011024153.1A priority Critical patent/CN112214620A/en
Publication of CN112214620A publication Critical patent/CN112214620A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/535Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose an information query method and apparatus, a chart processing method, an electronic device, and a storage medium, relating to computer vision, speech technology, and deep learning. The method includes: acquiring audio information for querying a chart in picture format, where the audio information carries a query intention and is used to query target information in the chart corresponding to that intention; determining the target information corresponding to the audio information according to preset chart information for the chart, where the chart information is generated by determining a saliency map of the chart and by fusing and analyzing the context information of the chart based on the saliency map; and outputting the target information.

Description

Information query method and device, chart processing method and electronic equipment
Technical Field
The present disclosure relates to image processing and artificial intelligence technologies, in particular to computer vision, speech technology, and deep learning, and specifically to an information query method and apparatus, a chart processing method, an electronic device, and a storage medium.
Background
With the development of digitization, paper documents such as corporate financial statements, annual sales reports, and stock market trend reports are increasingly stored digitally, and these documents generally include charts.
In the prior art, information is mainly obtained from a chart manually: for example, a user selects a chart from a digitally stored document and analyzes it to obtain the information it expresses, such as which month in the annual sales figures is the highest and what percentage of the total it accounts for.
Disclosure of Invention
Provided are an information query method and apparatus, a chart processing method, an electronic device, and a storage medium for quickly and conveniently acquiring related information from a chart.
According to a first aspect, there is provided an information query method, comprising:
acquiring audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information in the chart corresponding to the query intention;
determining target information corresponding to the audio information according to preset chart information corresponding to the chart, wherein the chart information is generated by determining a saliency map of the chart and by fusing and analyzing context information of the chart based on the saliency map;
and outputting the target information.
In the embodiments of the application, the user can acquire the relevant information of a chart through voice question answering, which makes information query intelligent and automatic and improves query efficiency and accuracy.
According to a second aspect, an embodiment of the present application provides an information query apparatus, including:
an acquisition module, configured to acquire audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information in the chart corresponding to the query intention;
a first determining module, configured to determine target information corresponding to the audio information according to preset chart information corresponding to the chart, wherein the chart information is generated by determining a saliency map of the chart and by fusing and analyzing context information of the chart based on the saliency map;
and an output module, configured to output the target information.
According to a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the method of any one of the above embodiments.
According to a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the above embodiments.
According to a fifth aspect, an embodiment of the present application provides a chart processing method, including:
acquiring a to-be-processed chart in picture format;
determining a saliency map of the chart;
and fusing and analyzing context information of the chart based on the saliency map to generate chart information corresponding to the chart, the chart information being used for information query.
The present application provides an information query method and apparatus, a chart processing method, an electronic device, and a storage medium. The method includes: acquiring audio information for querying a chart in picture format, where the audio information carries a query intention and is used to query target information in the chart corresponding to the query intention; determining the target information corresponding to the audio information according to preset chart information corresponding to the chart, where the chart information is generated by determining a saliency map of the chart and by fusing and analyzing context information of the chart based on the saliency map; and outputting the target information. On the one hand, the embodiments of the application provide a way to obtain relevant information about a chart through question answering, so the user does not need to observe and analyze the chart, and no back-office staff of the server needs to analyze it either; this saves labor cost, avoids the low efficiency and low accuracy of manual chart analysis, makes information query intelligent and automatic, and improves the efficiency and reliability of information query. On the other hand, by generating the chart information from two dimensions, the chart information describes the chart sufficiently and comprehensively, which improves the accuracy and reliability of the query result (i.e., the target information) and the user's query experience.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flow chart illustrating an information query method according to an embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario of an information query method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart illustrating an information query method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a picture that includes charts, according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a pie chart according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a chart whose category attribute is a point (scatter) chart, according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a chart whose category attribute is a line chart, according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating an information query method according to another embodiment of the present application;
FIG. 9 is a flowchart illustrating an information query method according to another embodiment of the present application;
FIG. 10 is a diagram of an information query device according to an embodiment of the present application;
FIG. 11 is a diagram illustrating an information query device according to another embodiment of the present application;
FIG. 12 is a block diagram of an electronic device of an embodiment of the present application;
fig. 13 is a flowchart illustrating a chart processing method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings, in which various details of the embodiments of the application are included to assist understanding, and which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the embodiments of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiments of the application provide an information query method that can be applied to scenarios of querying related information in a chart. For example, for a chart included in a corporate financial statement, a user may query relevant information in the chart using the information query method of the embodiments; likewise for a chart included in a stock market report; likewise for a chart of a company's finished-product spot-check pass rate over a certain period (for example, the past half year); and so on, which are not listed one by one here.
In the related art, users mainly rely on manual effort to query related information in a chart: for example, the user observes and analyzes the chart to obtain the desired information, or consults staff online, who analyze the chart and feed the desired information back to the user.
However, analyzing a chart manually is inefficient, and especially when the chart is large and the data are complex, the reliability of manually fed-back results is low.
The inventors arrived at the inventive concept of the application through creative work: by generating chart information from both a global dimension and a local dimension, and, upon receiving audio information initiated by a user to query information, determining the related information corresponding to the audio information from the chart information and feeding it back to the user, the user can query the information he or she wants to acquire through voice question answering.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The application provides an information query method applied to computer vision, deep learning, and speech technologies in the fields of computer technology and artificial intelligence, so as to query relevant information in a chart through human-computer interaction.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating an information query method according to an embodiment of the present application.
As shown in fig. 1, the method includes:
S101: acquiring audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information in the chart corresponding to the query intention.
The execution subject of this embodiment may be an information query apparatus, which may be a terminal device, a processor, a chip, or a server (including a local server and a cloud server), or a system composed of a terminal device and a server, etc.; this embodiment is not limited in this respect.
Specifically, when the information query apparatus is a terminal device, the terminal device may be a mobile terminal, such as a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal, for example, a portable, pocket-sized, hand-held, computer-built-in, or vehicle-mounted mobile apparatus that exchanges voice and/or data with a radio access network. The terminal device may also be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), a tablet computer, a wireless modem, a handset, a laptop computer, a Machine Type Communication (MTC) terminal, or the like. The terminal device may also be referred to as a system, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, an access terminal, a user terminal, a user agent, a user device, or user equipment, which is not limited herein.
The query intention may be understood as at least part of the information that the user desires to obtain from the chart.
Based on the analysis, the method of the embodiment of the application can be applied to application scenarios of information query of various charts. This step is now exemplarily described in connection with the application scenario shown in fig. 2.
As shown in fig. 2, the application scenario includes a system composed of the terminal device 100 and the server 200, and the terminal device 100 and the server 200 are connected through a network. The Network includes, but is not limited to, the internet, an intranet, a local area Network, a Block-chain-Based Service Network (BSN), a mobile communication Network, and a combination thereof.
The terminal device 100 may include a display, which may display a chart, a document containing a chart, and the like. As shown in FIG. 2, the display may display a chart of the finished-product spot-check pass rate of a company's division from January to June.
The display may be any device for displaying images or video, such as a liquid crystal display (LCD), a light-emitting diode (LED) display, or an organic light-emitting diode (OLED) display; the embodiments of the present application are not limited in this respect.
The terminal device 100 may further include an audio component. The user may send, to the server 200, audio information asking which of the 6 months has the lowest pass rate, and the server 200 may feed back to the user that the month with the lowest pass rate is March (for the specific implementation, see the description below).
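As a concrete illustration of this question-and-answer flow, the following minimal sketch assumes the server has already parsed the Fig. 2 chart into month/pass-rate pairs; the function name and the data values are illustrative assumptions, not taken from the patent:

```python
def answer_lowest_month(pass_rates):
    """Answer 'which month has the lowest pass rate?' over parsed chart data."""
    month, rate = min(pass_rates.items(), key=lambda kv: kv[1])
    return f"The month with the lowest pass rate is {month} ({rate:.0%})."

# Hypothetical values parsed from a chart like the one in Fig. 2.
pass_rates = {"Jan": 0.96, "Feb": 0.94, "Mar": 0.88,
              "Apr": 0.95, "May": 0.97, "Jun": 0.93}
answer = answer_lowest_month(pass_rates)
```

The answer string would then be converted to speech and broadcast by the terminal's audio component, as described below.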
It should be noted that, in the related art, the user has to determine the month with the lowest pass rate among the 6 months by observation and analysis, whereas in this embodiment the user can learn it through question answering. This avoids the long time and high error probability of user observation and analysis, in particular the time wasted browsing and searching when the document is long and the charts are numerous, and the low accuracy caused by interference between close and numerous data points. Information query is thereby made intelligent and automatic, saving the user's time and improving the efficiency and accuracy of information query.
S102: determining target information corresponding to the audio information according to preset chart information corresponding to the chart, wherein the chart information is generated by determining a saliency map of the chart and by fusing and analyzing context information of the chart based on the saliency map.
The chart information is related information used for expressing the content of the chart.
For example, continuing the above example, a user may transmit an electronic document to the server through the terminal device. After receiving it, the server parses the electronic document to obtain each chart in it (i.e., each chart in picture format). For instance, if the electronic document includes the chart shown in fig. 2, the server analyzes that chart to obtain its related information (i.e., the chart information); when the server later acquires audio information, the information corresponding to the audio information (i.e., the target information) is determined from the chart information.
In this embodiment, the chart information is obtained by the server through two levels of processing: one level determines a saliency map of the chart, and the other fuses and analyzes the context information of the chart based on the determined saliency map.
The saliency map may be understood as a map containing the saliency information of the chart, obtained by analyzing the overall structural framework of the chart; the saliency information includes at least one of global contrast, object properties, and compactness.
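The "global contrast" component of the saliency information can be illustrated with a minimal sketch: saliency is approximated as each pixel's color distance from the image's mean color. This is only a simple proxy for intuition; the patent's saliency detection network is a trained model, not this heuristic:

```python
import numpy as np

def global_contrast_saliency(img: np.ndarray) -> np.ndarray:
    """Saliency as each pixel's color distance from the mean image color,
    normalized to [0, 1] -- a simple proxy for 'global contrast'."""
    mean_color = img.reshape(-1, img.shape[-1]).mean(axis=0)
    dist = np.linalg.norm(img - mean_color, axis=-1)
    return dist / dist.max() if dist.max() > 0 else dist

# Toy 4x4 RGB image: one bright "bar" on a dark background.
img = np.zeros((4, 4, 3), dtype=float)
img[1:3, 1:3] = [1.0, 0.5, 0.0]
sal = global_contrast_saliency(img)
```

On this toy input the bright bar region receives the highest saliency, matching the intuition that chart content stands out from the background.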
It should be noted that, in this embodiment, the server generates the chart information through two-level processing: the chart is first analyzed along the global dimension and then along the local dimension, which amounts to processing the chart along the two dimensions of "global + detail" to obtain the chart information.
S103: outputting the target information.
Continuing the above example, when the server determines the target information corresponding to the audio information, it may send the target information to the terminal device, and the terminal device may broadcast it by voice through the audio component. For instance, if the server determines target information carrying the month with the lowest pass rate and sends it to the terminal device, the terminal device converts the target information into voice information and broadcasts it through the audio component.
Based on the above analysis, an embodiment of the present application provides an information query method including: acquiring audio information for querying a chart in picture format, where the audio information carries a query intention and is used to query target information in the chart corresponding to the query intention; determining the target information corresponding to the audio information according to preset chart information corresponding to the chart, where the chart information is generated by determining a saliency map of the chart and by fusing and analyzing context information of the chart based on the saliency map; and outputting the target information. On the one hand, the embodiment provides a way to obtain relevant information about a chart through question answering, so the user does not need to observe and analyze the chart, and no back-office staff of the server needs to analyze it either; this saves labor cost, avoids the low efficiency and low accuracy of manual chart analysis, makes information query intelligent and automatic, and improves the efficiency and reliability of information query. On the other hand, by generating the chart information from two dimensions, the chart information describes the chart sufficiently and comprehensively, which improves the accuracy and reliability of the query result (i.e., the target information) and the user's query experience.
To help the reader understand in more depth how the server generates the chart information, the process is now explained in detail with reference to fig. 3. Fig. 3 is a schematic flow chart of an information query method according to another embodiment of the present application.
As shown in fig. 3, the method includes:
S201: analyzing the structure of the obtained chart to obtain a saliency map.
In some embodiments, a saliency detection network model (SAL) may be used to detect the chart and generate the saliency map corresponding to it.
The saliency detection network model may be obtained by collecting samples and training on them; this embodiment is not limited in this respect.
In connection with the chart shown in fig. 2, analyzing the structure of the chart may be understood as detecting and analyzing its title and content, so as to obtain the title (the heading in fig. 2) and the content (the rectangular-bar portion in fig. 2).
It should be noted that a chart in picture form may be somewhat tilted or suffer strong background interference due to shooting or scanning; therefore, in some embodiments, the server may correct the chart when acquiring it.
It should be understood that one electronic document (or picture) may include multiple charts. When there are multiple charts, the server may correct the electronic document as a whole, or correct an individual chart when analyzing it; this embodiment is not limited in this respect.
For example, as shown in fig. 4, if the picture includes three charts, the server may correct the picture, i.e., correct the three charts at the same time, or correct a chart individually when analyzing it.
The server may correct a chart as follows: perform outer-frame detection on the chart, crop the chart according to the outer-frame coordinates, and apply skew correction using the four corner coordinates of the detection box. The outer-frame detection may be implemented with a rotated-object detection model for remote sensing imagery (SCRDet).
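The skew correction from four corner coordinates can be sketched as solving for a perspective (homography) transform mapping the detected corners to an upright rectangle. This is a generic direct-linear-transform computation, not the patent's specific implementation, and the corner coordinates below are hypothetical:

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply homography H to a 2D point (homogeneous coordinates)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical tilted chart corners (from the outer-frame detector) -> upright rectangle.
corners = [(12, 8), (208, 20), (200, 156), (5, 148)]
target  = [(0, 0), (200, 0), (200, 150), (0, 150)]
H = homography(corners, target)
```

In practice the resulting matrix would be applied to every pixel of the cropped chart (e.g., via an image-warping routine) to produce the deskewed chart.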
S202: at least one sub-graph of the graph is determined based on the saliency map and the contextual information of the graph.
The server may use a conditional random field (CRF) model to supplement the saliency map with the detailed information of the chart.
Similarly, the conditional random field model may be obtained by collecting samples and training on them; this embodiment is not limited in this respect.
In some embodiments, S202 may include:
S2021: dividing the chart into at least one region according to the saliency map and the context information of the chart.
For example, after determining the saliency map of the chart, the server may refine it using the context information of the chart, supplementing the details in the saliency map. Since different positions in the saliency map correspond to more or less detail information, the saliency map supplemented with the context information can be divided into regions, which is equivalent to dividing the chart into several regions, each containing one sub-chart. As shown in fig. 4, the chart may be divided into 3 regions: region A, region B, and region C.
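Dividing the detail-supplemented saliency map into regions can be sketched as connected-component labeling on a binary saliency mask; this is an illustrative stand-in for the CRF-refined segmentation described above, not the patent's exact procedure:

```python
from collections import deque

def split_regions(mask):
    """Label 4-connected regions of truthy cells in a 2D grid; return list of cell sets."""
    h, w = len(mask), len(mask[0])
    seen, regions = set(), []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and (i, j) not in seen:
                comp, q = set(), deque([(i, j)])
                seen.add((i, j))
                while q:  # breadth-first flood fill of one region
                    y, x = q.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx)); q.append((ny, nx))
                regions.append(comp)
    return regions

# Toy mask with two separate salient blobs (two sub-charts).
mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
regions = split_regions(mask)
```

Each resulting region would then be cropped from the chart picture and passed to the classification step below.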
S2022: performing chart classification processing on the at least one region to obtain at least one sub-chart with a chart type.
Continuing the above example, if the at least one region includes region A, region B, and region C, the server may perform chart classification on each of them, i.e., classify the chart in region A, the chart in region B, and the chart in region C, so as to obtain a typed sub-chart for each region.
In one possible implementation, a preset chart classifier may be used to classify the charts in the at least one region. Similarly, the classifier may be obtained by collecting samples and training on them; this embodiment is not limited in this respect.
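The embodiment uses a trained chart classifier; as a hedged stand-in for intuition only, the following toy classifier picks a chart type from counts of detected graphic primitives. All feature names and thresholds are illustrative assumptions:

```python
def classify_chart(features: dict) -> str:
    """Toy stand-in for the trained chart classifier: choose a type from
    counts of detected graphic primitives (names/thresholds illustrative)."""
    if features.get("wedges", 0) >= 2:
        return "pie"        # pie charts decompose into circular wedges
    if features.get("bars", 0) >= 2:
        return "column"     # column charts contain rectangular bars
    if features.get("polylines", 0) >= 1:
        return "line"       # line charts contain connected polylines
    if features.get("points", 0) >= 5:
        return "scatter"    # scatter/point charts contain isolated markers
    return "unknown"

kind_a = classify_chart({"bars": 6})       # e.g., the sub-chart in region A of Fig. 4
kind_b = classify_chart({"polylines": 2})  # a line chart
```

A real classifier would instead operate on learned image features, but the interface — region in, chart type out — is the same.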
It should be noted that, in this embodiment, dividing the chart into one or more regions and performing chart classification on those regions prevents charts from being missed and improves the reliability and integrity of the determined sub-charts, so that the chart information generated from the sub-charts is accurate and comprehensive.
S203: analyzing the at least one sub-chart to generate the chart information.
Accordingly, S203 may include: analyzing the at least one sub-chart based on its chart type to generate the chart information.
It should be noted that, in this embodiment, each sub-chart is determined both from the global dimension of the structure and from the local dimension of the chart's context information, which avoids mis-segmentation of sub-charts, improves the accuracy and reliability of segmentation, and thus the accuracy of the chart information generated from the sub-charts.
In some embodiments, the chart type may be a column chart (e.g., the sub-chart in region A in fig. 4); then S203 may include:
S20311: performing character recognition on the at least one sub-chart to obtain first text information of the at least one sub-chart.
Here, "first" in "first text information" only distinguishes it from other text information below (such as the second text information) and should not be understood as limiting the content of the text information.
This embodiment does not limit the text recognition method: for example, Optical Character Recognition (OCR) may be used to recognize the text of the at least one sub-chart; as another example, a Fast Oriented Text Spotting (FOTS) model may be used.
It is worth mentioning that if the joint image-text detection and recognition training model is adopted to perform character recognition on the at least one sub-chart, the detection and recognition tasks of the model share the same convolutional feature layer, which saves computation time and learns more general image features than a two-stage training approach. By introducing rotated regions, text regions that are aligned with the horizontal axis and of fixed height can be generated from the convolutional feature map, which supports the recognition of oblique text and achieves accuracy and reliability of the recognition.
In some embodiments, the text recognition of the at least one sub-chart may be performed in two stages, one stage being text recognition of a header portion of the at least one sub-chart and the other stage being text recognition of other portions (hereinafter referred to as content portions) of the at least one sub-chart.
In conjunction with the sub-chart in the area a shown in fig. 4, the title portion of the sub-chart (i.e., "the completion of the key work" and "the percentage of pass of the enterprise product library for spot inspection") may be first subjected to character recognition, and then the content portion of the sub-chart (e.g., the horizontal and vertical coordinates and the percentage of columns) may be subjected to character recognition.
It should be noted that, by performing character recognition on the title portion and the content portion, confusion between the title portion and the content portion can be avoided, thereby improving the accuracy and reliability of character recognition.
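The two-stage recognition above can be sketched as follows. Cropping a fixed top strip as the title region is a simplifying assumption (in practice a text-detection model would locate the title), and `title_ratio` is a hypothetical parameter:

```python
import numpy as np

def split_title_and_content(chart_img: np.ndarray, title_ratio: float = 0.15):
    """Split a chart image into a title strip (stage one) and a
    content region (stage two); title_ratio is an illustrative guess
    at the fraction of image height occupied by the title."""
    h = chart_img.shape[0]
    cut = max(1, int(h * title_ratio))
    title_region = chart_img[:cut]      # recognized first
    content_region = chart_img[cut:]    # recognized second
    return title_region, content_region

img = np.zeros((200, 300, 3), dtype=np.uint8)  # stand-in chart image
title, content = split_title_and_content(img)
```

Recognizing the two regions separately keeps title text from being merged into axis or column labels.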
S20312: and determining the rectangular columns of at least one sub-chart based on a preset target detection model.
The target detection model may be an open-source object detection model, such as a YOLO target detection model; specifically, a YOLOv5 target detection model may be used. Similarly, the obtaining method of the YOLOv5 target detection model is not limited in this embodiment; for example, it may be obtained by collecting training samples and training based on them.
S20313: and determining the columnar attribute of the at least one sub-chart based on the rectangular columns, wherein the columnar attribute is horizontal columnar or vertical columnar.
In some embodiments, the network model for distinguishing the columnar attributes may be established in advance, such as by acquiring a training set including horizontal columnar samples and vertical columnar samples, and performing training based on the training set.
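As a minimal sketch of the columnar-attribute decision, an aspect-ratio heuristic over the detected bar bounding boxes (assumed here in (x, y, w, h) form) can stand in for the trained network model:

```python
def bar_orientation(bar_boxes):
    """Classify a bar chart as horizontal or vertical columnar from its
    bar bounding boxes (x, y, w, h); a trained classifier could replace
    this simple aspect-ratio heuristic."""
    wide = sum(1 for (_, _, w, h) in bar_boxes if w > h)
    return "horizontal" if wide > len(bar_boxes) / 2 else "vertical"

# three tall, narrow bars imply a vertical columnar attribute
boxes = [(10, 40, 20, 120), (40, 60, 20, 100), (70, 20, 20, 140)]
```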
S20314: and performing text positioning on the rectangular column to obtain first coordinate information of the rectangular column.
The image text detection and recognition combined training model can be adopted to determine the coordinate information of the rectangular column on the picture, and then the first coordinate information is determined based on a projection mode.
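The projection step can be illustrated as follows. The linear pixel-to-value mapping and the two calibration ticks are assumptions, standing in for the axis coordinates recovered by the joint training model:

```python
def pixel_to_value(y_pixel, ticks):
    """Linearly map a pixel row to a data value using two known axis
    ticks, each given as (pixel_y, data_value)."""
    (p0, v0), (p1, v1) = ticks
    return v0 + (y_pixel - p0) * (v1 - v0) / (p1 - p0)

# hypothetical axis calibration: pixel row 300 is value 0, row 100 is 100
ticks = [(300, 0.0), (100, 100.0)]
bar_top = 200  # pixel row of a bar's top edge
value = pixel_to_value(bar_top, ticks)
```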
The chart information comprises first text information, columnar attributes and first coordinate information.
In this embodiment, by respectively obtaining the first text information, the columnar attribute and the first coordinate information, the chart information can be described from multiple dimensions, the integrity and comprehensiveness of the chart description are realized, and the technical effects of accuracy and reliability of information query are realized.
In some embodiments, parsing at least one sub-chart of the columnar type may further include:
S20315: first color information of the rectangular bar is determined, wherein the chart information further includes the first color information.
Similarly, the "first" in the first color information is used for distinguishing from the color information (e.g., the second color information) in the following, and is not to be construed as a limitation on the content of the color information.
For example, the first color information of the rectangular bar is determined by a preset color recognition model.
Specifically, the server may determine the first color information of the rectangular column based on an Open Source Computer Vision Library (OpenCV).
It should be noted that, in the graph, different colors may be used to fill each rectangular column, and when a user performs information query based on the graph, the user may also perform query by directly combining the colors, and by determining the first color information of the rectangular columns in this embodiment, recall rate, flexibility and diversity of the information query may be improved, and query experience of the user may be improved.
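A minimal sketch of the color recognition, matching a bar region's mean RGB against a small hypothetical palette (OpenCV provides richer color-space tooling for a production model):

```python
import numpy as np

# illustrative reference palette; real systems would use more colors
PALETTE = {"red": (220, 40, 40), "green": (40, 180, 60), "blue": (40, 80, 220)}

def bar_color(region: np.ndarray) -> str:
    """Name the fill color of a bar crop by nearest mean-RGB match."""
    mean = region.reshape(-1, 3).mean(axis=0)
    return min(PALETTE, key=lambda n: np.linalg.norm(mean - np.array(PALETTE[n])))

patch = np.full((8, 8, 3), (215, 45, 42), dtype=np.uint8)  # reddish bar crop
```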
In some embodiments, the chart type may be a pie type (e.g., region B in fig. 4), and S203 may include:
S20321: and performing character recognition on the at least one sub-chart to obtain second text information of the at least one sub-chart.
For the principle of obtaining the second text information, reference may be made to the principle of obtaining the first text information, and details are not described here.
S20322: and detecting the at least one sub-chart based on a preset candidate region detection model to obtain each sector of the at least one sub-chart.
S20323: and determining the angle corresponding to each sector based on the detection frame corresponding to each sector.
The candidate region detection model (e.g., Faster R-CNN) may obtain each sector through a corresponding detection box, and on the basis of obtaining each sector with the candidate region detection model, the server may determine the angle of each sector based on the detection box corresponding to that sector.
For example, as shown in fig. 5, the pie-type chart includes 5 sectors, and each sector corresponds to the sales proportion of products of one model (model A, model B, model C, model D, and model E). The server acquires the 5 sectors of the pie chart by adopting the candidate region detection model, and may determine the angle of each sector based on the detection box corresponding to each of the 5 sectors.
Specifically, the intersection point of the 5 detection boxes is the center of the pie chart's circle, one edge of any detection box is tangent to the arc of its sector, and the angle of the sector can be determined from the center and the tangent line.
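The geometry can be sketched as follows, assuming the circle center and one point on each straight edge of the sector are known (recovered, e.g., from the detection box):

```python
import math

def sector_angle(center, edge_a, edge_b):
    """Angle in degrees of a pie sector whose two straight edges pass
    through edge_a and edge_b, with the circle center known."""
    ax, ay = edge_a[0] - center[0], edge_a[1] - center[1]
    bx, by = edge_b[0] - center[0], edge_b[1] - center[1]
    return math.degrees(math.atan2(by, bx) - math.atan2(ay, ax)) % 360

# a quarter sector: edges along +x and +y from the center
angle = sector_angle((0, 0), (1, 0), (0, 1))
share = angle / 360 * 100  # the sector's percentage of the pie
```

From the angle, the sector's percentage (and, with the radius, its area) follows directly, matching the note below that the chart information may further include these values.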
Wherein the chart information includes the second text information and the angle.
In this embodiment, by respectively obtaining the second text information and the angle, the chart information can be described from multiple dimensions, and the completeness and comprehensiveness of the chart description are realized, so that the technical effects of accuracy and reliability of information query are realized.
It is to be noted that, when the server determines the angle, the server may determine the area of the sectors, the percentage of each sector, and the like based on the angle, that is, the graph information may further include the area of each sector, the percentage of each sector, and the like.
The interaction between the user and the server is now exemplarily described with reference to the application scenario shown in fig. 2 and fig. 5 as follows:
The user may send audio information to the terminal device, the audio information querying which product model has the best sales volume. The terminal device sends the audio information to the server; the server determines, based on the chart information obtained through parsing, that the target information is model D, and outputs the target information of model D to the terminal device, which then broadcasts the target information of model D by voice.
Similarly, in some embodiments, parsing at least one sub-chart of a pie type may further include:
s20324: and determining second color information of each sector, wherein the chart information further comprises the second color information.
The principle of determining the second color information may refer to the principle of determining the first color information, and is not described herein again.
In the chart, different colors may be adopted to fill each sector, and when a user performs information query based on the chart, the user may also perform query by directly combining the colors, and by determining the second color information of each sector in the embodiment, the recall rate, flexibility and diversity of the information query can be improved, and the query experience of the user is improved.
In some embodiments, the chart type may be a dotted line type (e.g., a partial chart in area C in fig. 4), then S203 may include:
S20331: and performing character recognition on the at least one sub-chart to obtain third text information of the at least one sub-chart.
For the principle of obtaining the third text information, reference may be made to the principle of obtaining the first text information, and details are not described here.
S20332: and determining the category attribute of at least one sub chart, wherein the category attribute is a point diagram or a line diagram.
In this case, a network model (e.g., Inception-v4) for determining the category attribute of the at least one sub-chart may be trained in advance, for example, by training a basic network model using sample dot diagrams and sample line diagrams.
S20333: and determining the position information of the point in the at least one sub-chart according to the category attribute.
Wherein the chart information includes third text information and position information.
In this embodiment, by respectively acquiring the third text information and the position information, the chart information can be described from multiple dimensions, the completeness and comprehensiveness of the chart description are realized, and the technical effects of accuracy and reliability of information query are realized.
Based on the above analysis, the category attribute is a dot diagram or a line diagram, and when the category attribute is a dot diagram (as shown in fig. 6), S20333 includes:
S3311: and detecting the dot diagram to obtain each data point and coordinate icon information in the dot diagram.
Here, the dot diagram may be detected based on the YOLOv5 target detection model to obtain each data point in the dot diagram, such as the 5 data points shown in fig. 6. In fig. 6, the abscissa is the month and the ordinate is a quantity, specifically the sales quantity of a certain product (the unit may be, for example, sets).
The dot diagram may be detected based on the image text detection and recognition joint training model, and coordinate icon information in the dot diagram is obtained, such as information of abscissa and ordinate as shown in fig. 6.
S3312: and projecting each data point to a reconstructed image coordinate system, and obtaining a third coordinate of each data point based on coordinate icon information in the image coordinate system, wherein the position information comprises the third coordinate.
In this embodiment, the server may construct an image coordinate system, convert the coordinate icon information into coordinate information in the image coordinate system, and project each data point into the image coordinate system, thereby obtaining the third coordinate of each data point.
In this embodiment, the third coordinate is determined in a projection manner, so that the third coordinate can be determined quickly, and the technical effect of improving the analysis efficiency is achieved.
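The projection can be sketched as follows, assuming one pair of calibration ticks per axis, each a (pixel position, data value) pair recovered from the coordinate icon information:

```python
def project_point(px, py, x_ticks, y_ticks):
    """Convert a detected data point from pixel coordinates to chart
    coordinates by linear interpolation on each axis."""
    (xp0, xv0), (xp1, xv1) = x_ticks
    (yp0, yv0), (yp1, yv1) = y_ticks
    x = xv0 + (px - xp0) * (xv1 - xv0) / (xp1 - xp0)
    y = yv0 + (py - yp0) * (yv1 - yv0) / (yp1 - yp0)
    return x, y

# hypothetical calibration: x axis pixel 50 -> month 1, pixel 250 -> month 5;
# y axis pixel 300 -> 0 units, pixel 100 -> 400 units
pt = project_point(150, 200, [(50, 1), (250, 5)], [(300, 0), (100, 400)])
```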
Based on the above analysis, the category attribute is a dot diagram or a line diagram; when the category attribute is a line graph (as shown in fig. 7), S20333 includes:
S3321: and detecting the line graph to obtain each break point in the line graph.
Here, the line graph may be detected based on the YOLOv5 target detection model to obtain each break point in the line graph, such as the 5 break points shown in fig. 7. In fig. 7, the abscissa is the month and the ordinate is a quantity, specifically the production quantity.
S3322: and identifying each folding point to obtain a fourth coordinate of each folding point, wherein the position information comprises the fourth coordinate.
For the principle of obtaining the fourth coordinate, reference may be made to the principle of obtaining the third coordinate, and details are not described here.
In this embodiment, the fourth coordinate is determined in a projection manner, so that the fourth coordinate can be determined quickly, and the technical effect of improving the analysis efficiency is achieved.
In some examples, after each break point in the line graph is obtained, connecting lines passing through the break points may be generated, intersection points between the connecting lines may be detected, and the coordinates of each intersection point may be obtained by identifying it, wherein the fourth coordinates include the coordinates of the intersection points.
It should be noted that, by determining the intersections of the connecting lines, the association relationship between the connecting lines can be determined. For example, if one connecting line is the sales of product A and another is the sales of product B, the time point at which product A and product B have the same sales volume can be determined from the intersection of the two connecting lines, which improves the intelligence and efficiency of information query and the query experience of the user.
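Detecting where two connecting lines cross reduces to segment intersection; a standard geometric sketch (not the patent's specific implementation) is:

```python
def segment_intersection(p1, p2, p3, p4):
    """Intersection point of segments p1-p2 and p3-p4, or None if they
    are parallel or do not cross within both segments."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        return None  # parallel or collinear
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    u = ((x1 - x3) * (y1 - y2) - (y1 - y3) * (x1 - x2)) / d
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

# product A's curve segment crosses product B's at (1, 1)
crossing = segment_intersection((0, 0), (2, 2), (0, 2), (2, 0))
```

Applied pairwise to the segments of two polylines, this yields all points where the two sales curves coincide.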
In some embodiments, the chart type may also be a table, and S203 may include:
s20341: and detecting the table to obtain a table frame and a table line of the table.
Here, a candidate region detection model may be used to detect the table, obtaining both the table frame and the table lines.
Of course, in another possible implementation, different models may be used to perform table frame detection and table line detection respectively. For example, a candidate region detection model may be used to obtain the table frame, and an erosion-dilation (morphological) algorithm may be used to obtain the table lines.
In a specific implementation process, the candidate region detection model may be obtained by collecting sample table frames and sample table lines and training a basic network model based on them, which is not limited in this embodiment.
S20342: and reconstructing the table based on the table frame and the table line.
That is, when the server obtains the table frame and the table line, the table in the picture format may be reconstructed to obtain the table in the text format.
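The reconstruction step can be sketched as follows, assuming the detected table lines have been reduced to the x-positions of the vertical lines and the y-positions of the horizontal lines:

```python
def reconstruct_cells(v_lines, h_lines):
    """Rebuild a table's cell grid from vertical line x-positions and
    horizontal line y-positions; each cell is (x0, y0, x1, y1)."""
    xs, ys = sorted(v_lines), sorted(h_lines)
    return [[(xs[c], ys[r], xs[c + 1], ys[r + 1])
             for c in range(len(xs) - 1)]
            for r in range(len(ys) - 1)]

# a 2x2 table bounded by three vertical and three horizontal lines
grid = reconstruct_cells([0, 100, 200], [0, 30, 60])
```

The recognized text (S20343) can then be filled into whichever cell box contains it.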
S20343: and performing text recognition on the table, and filling the text information obtained by recognition into the reconstructed table.
The table can be subjected to text recognition by adopting an image text detection and recognition combined training model to obtain text information.
S204: and acquiring audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information corresponding to the query intention in the chart.
For the description of S204, reference may be made to S101, which is not described herein again.
S205: and determining target information corresponding to the audio information according to preset graph information corresponding to the graph, wherein the graph information is generated by determining a saliency map of the graph, and fusing and analyzing context information of the graph based on the saliency map.
For the description of S205, refer to S102, and the chart information is generated based on the above S201 to S203, which is not described herein again.
S206: and outputting the target information.
For the description of S206, reference may be made to S103, which is not described herein again.
In some embodiments, the query of information may be implemented by pre-constructing a prediction model, and the principle of constructing a prediction model and implementing information query based on the prediction model is now exemplarily described with reference to fig. 8. Fig. 8 is a schematic flow chart of an information query method according to another embodiment of the present application.
As shown in fig. 8, the method includes:
S301: and acquiring audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information corresponding to the query intention in the chart.
For the description of S301, reference may be made to S101, which is not described herein again.
S302: image characteristics of the graph are determined.
The image features can be understood as feature vectors of the diagram.
S303: a feature code corresponding to the chart information is generated.
In this embodiment, the feature code may be understood as a language that can be recognized by a machine, that is, in this embodiment, by the server.
S304: and training the basic network model according to the image characteristics, the characteristic codes corresponding to the chart information and preset training samples to generate a prediction model, wherein the training samples comprise question samples and answer samples.
In this embodiment, the type of the basic network model is not limited. For example, the basic network model may be a convolutional neural network model, or it may be a Dynamic Pointer Network (DPN). When the basic network model is a dynamic pointer network, because the dynamic pointer network uses an attention mechanism (Transformer), which enables each feature to be freely combined with other entities even when they are not from the same modality, the flexibility and reliability of generating the prediction model can be achieved, and further the technical effects of reliability and accuracy in the prediction process can be achieved.
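The attention mechanism referred to above can be illustrated with a minimal scaled dot-product attention in NumPy; this is a generic sketch of the mechanism, not the patent's dynamic pointer network itself:

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query row weights and combines
    the value rows, regardless of which modality the keys came from."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)                 # softmax weights
    return w @ v, w

# toy features: 2 query vectors attending over 3 key/value vectors
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out, weights = attention(q, k, v)
```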
In some embodiments, in the training process, the predicted answer and the sample answer may be compared, and the parameters of the base network are adaptively adjusted based on the comparison result until the iteration number is met, or the error between the predicted answer and the sample answer is smaller than an error threshold.
In some examples, the user may interact with the server during the training process, for example, when the predicted answer is wrong, the correct answer may be provided, and the server may increase the sample size of the sample answer accordingly to improve the accuracy and reliability of the training.
S305: target information is determined based on the predictive model, the audio information, and the graph information.
In the embodiment, the prediction model is obtained through training, and prediction is performed based on the prediction model to obtain the target information, so that the chart information generated by the server can be fully utilized, the times of analyzing the chart information are reduced, the calculation cost is saved, the efficiency of information query is improved, and the query experience of a user is improved.
In some embodiments, S305 may include:
S3051: and performing character conversion on the audio information to generate target text information corresponding to the audio information.
In this step, the server converts the audio information into intermediate information (i.e., target text information) in a language (i.e., feature code) recognizable by the machine.
Here, a deep neural network (LSTM-DNN) model may be employed to convert the audio information into the target text information. Similarly, the deep neural network model may be obtained by acquiring training samples and training a basic network model based on them.
S3052: and determining the feature code corresponding to the target text information.
In this step, the server converts the target text information into a machine-recognizable language (i.e., feature code).
S3053: and predicting the characteristic codes corresponding to the input chart information and the target text information by adopting a prediction model to obtain the target information.
S306: and outputting the target information.
For a description of S306, refer to S103, which is not described herein again.
In this embodiment, when a user initiates audio information for a certain chart, the server may predict the chart information (i.e., target information) corresponding to the audio information by using a prediction model to obtain the target information, so that the wide applicability and flexibility of information query may be improved, a convenient and fast information query manner may be provided for the user, and query experience of the user may be enhanced.
Based on the above analysis, in the above embodiments, the target information is predicted by using a prediction model, and in other embodiments, the target information may also be determined by using a mapping relationship, and the principle of implementing information query by using a mapping relationship is described in detail with reference to fig. 9. Fig. 9 is a schematic flow chart of an information query method according to another embodiment of the present application.
As shown in fig. 9, the method includes:
S401: and acquiring audio information for querying a chart in picture format, wherein the audio information carries a query intention and is used for querying target information corresponding to the query intention in the chart.
For the description of S401, reference may be made to S101, which is not described herein again.
S402: and performing character conversion on the audio information to generate target text information corresponding to the audio information.
For the description of S402, reference may be made to S3051, which is not described herein again.
S403: and performing semantic analysis on the target text information to obtain the query intention corresponding to the target text information.
S404: and determining target information corresponding to the query intention from the chart information, wherein the chart information is generated by determining a saliency map of the chart, and fusing and analyzing context information of the chart based on the saliency map.
For example, the server may previously construct and store a mapping relationship between intentions (including the query intention) and chart-related information (including the target information). When an intention is determined, the chart-related information corresponding to that intention may be extracted from the chart information based on the mapping relationship. In the present embodiment, when the server determines the query intention, the target information corresponding to the query intention may be extracted from the chart information based on the mapping relationship.
The mapping relationship may be embodied by a mapping relationship table, an index, and the like, and this embodiment is not limited.
In the embodiment, the target information is determined in a mapping relation mode, so that the diversity and flexibility of information query can be provided, and the accuracy of the information query can be improved.
The server can adaptively adjust the mapping relationship based on the user's feedback on the target information. For example, if the server outputs target information, the user feeds back that it is incorrect and provides the correct target information, and the server adaptively modifies the mapping relationship, improving the accuracy of subsequent information queries.
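A minimal sketch of the mapping-relationship lookup; the dictionary representation and field names are hypothetical stand-ins for the stored mapping table or index:

```python
def lookup_target(query_intent, mapping, chart_info):
    """Fetch chart-related target information for a query intent via a
    pre-built intent-to-field mapping table."""
    field = mapping.get(query_intent)
    return chart_info.get(field) if field else None

# hypothetical mapping and parsed chart information
mapping = {"best_selling_model": "max_share_sector"}
chart_info = {"max_share_sector": "model D"}
target = lookup_target("best_selling_model", mapping, chart_info)
```

User feedback that corrects a wrong answer would amount to updating an entry of `mapping`.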
S405: target information is determined based on the predictive model, the audio information, and the graph information.
For the description of S405, refer to S102 or S305, which is not described herein again.
S406: and outputting the target information.
For a description of S406, refer to S103, which is not described herein again.
According to another aspect of the embodiments of the present application, an information query apparatus is further provided, configured to perform the method according to any one of the above embodiments, for example, the information query method shown in any one of fig. 1, fig. 3, fig. 8, and fig. 9 is performed.
Referring to fig. 10, fig. 10 is a schematic diagram of an information query device according to an embodiment of the present application.
As shown in fig. 10, the apparatus includes:
the acquisition module 11 is configured to acquire audio information for querying a chart in picture format, where the audio information carries a query intention and is used to query target information corresponding to the query intention in the chart;
a first determining module 12, configured to determine target information corresponding to the audio information according to preset graph information corresponding to the graph, where the graph information is generated by determining a saliency map of the graph, and fusing and analyzing context information of the graph based on the saliency map;
and the output module 13 is used for outputting the target information.
As shown in fig. 11, in some embodiments, the apparatus further includes:
a second determination module 14 for determining image characteristics of the graph;
a generating module 15, configured to generate a feature code corresponding to the graph information;
the training module 16 is configured to train a basic network model according to the image features, feature codes corresponding to the graph information, and preset training samples to generate a prediction model, where the training samples include question samples and answer samples;
and the first determining module 12 is configured to determine the target information according to the prediction model, the audio information, and the graph information.
In some embodiments, the first determining module 12 is configured to perform word conversion on the audio information, generate target text information corresponding to the audio information, determine a feature code corresponding to the target text information, and predict the input chart information and the feature code corresponding to the target text information by using the prediction model to obtain the target information.
In some embodiments, the first determining module 12 is configured to perform word conversion on the audio information, generate target text information corresponding to the audio information, perform semantic analysis on the target text information, obtain the query intention corresponding to the target text information, and determine the target information corresponding to the query intention from the graph information.
As shown in fig. 11, in some embodiments, the apparatus further includes:
an analysis module 17, configured to analyze a structure of the graph to obtain the saliency map;
a third determining module 18, configured to determine at least one sub-graph of the graph according to the saliency map and the context information of the graph;
and the analysis module 19 is configured to analyze the at least one sub-graph to generate the graph information.
In some embodiments, the third determining module 18 is configured to divide the chart into at least one region according to the saliency map and the context information of the chart, perform chart classification processing on the at least one region, and obtain the at least one sub-chart including a chart type;
and the parsing module 19 is configured to parse the at least one sub-graph based on the graph type to generate the graph information.
In some embodiments, if the chart type is a column type, the parsing module 19 is configured to perform word recognition on the at least one sub-chart to obtain first text information of the at least one sub-chart, determine a rectangular column of the at least one sub-chart based on a preset target detection model, determine a column attribute of the at least one sub-chart based on the rectangular column, where the column attribute is a horizontal column or a vertical column, perform text localization on the rectangular column, and obtain first coordinate information of the rectangular column, where the chart information includes the first text information, the column attribute, and the first coordinate information.
In some embodiments, if the chart type is a pie type, the parsing module 19 is configured to perform character recognition on the at least one sub-chart, obtain second text information of the at least one sub-chart, detect the at least one sub-chart based on a preset candidate region detection model, obtain each sector of the at least one sub-chart, and determine an angle corresponding to each sector based on a detection box corresponding to each sector, where the chart information includes the second text information and the angle.
In some embodiments, if the chart type is a dotted line type, the parsing module 19 is configured to perform character recognition on the at least one sub-chart, obtain third text information of the at least one sub-chart, determine a category attribute of the at least one sub-chart, where the category attribute is a point chart or a line chart, and determine location information of a point in the at least one sub-chart according to the category attribute, where the chart information includes the third text information and the location information.
In some embodiments, if the category attribute is a point map, the analysis module 19 is configured to detect the point map, obtain each data point in the point map and coordinate icon information, project each data point to a reconstructed image coordinate system, and obtain a third coordinate of each data point based on the coordinate icon information in the image coordinate system, where the location information includes the third coordinate.
In some embodiments, if the category attribute is a line graph, the analysis module 19 is configured to detect the line graph, obtain each break point in the line graph, identify each break point, and obtain a fourth coordinate of each break point, where the location information includes the fourth coordinate.
In some embodiments, the parsing module 19 is configured to identify the at least one sub-chart based on a preset color identification model, and obtain color information of the at least one sub-chart, where the chart information further includes the color information.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 12, is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of embodiments of the present application described and/or claimed herein.
As shown in fig. 12, the electronic apparatus includes: one or more processors 101, memory 102, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 12 illustrates an example of one processor 101.
The memory 102 is a non-transitory computer readable storage medium provided by the embodiments of the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the information query method provided by the embodiment of the application. The non-transitory computer readable storage medium of the embodiments of the present application stores computer instructions for causing a computer to execute the information query method provided by the embodiments of the present application.
Memory 102, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules in embodiments of the present application. The processor 101 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 102, that is, implements the information query method in the above method embodiments.
The memory 102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 102 may optionally include memory located remotely from processor 101, which may be connected to an electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, Block-chain-Based Service Networks (BSNs), mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 103 and an output device 104. The processor 101, the memory 102, the input device 103, and the output device 104 may be connected by a bus or other means, and the bus connection is exemplified in fig. 12.
The input device 103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 104 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Block-chain-Based Service Networks (BSNs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability in conventional physical hosts and Virtual Private Server (VPS) services.
According to another aspect of the embodiments of the present application, there is also provided a chart processing method, which is used for generating chart information, where the chart information may be used in the information query method as described in any of the embodiments above.
Referring to fig. 13, fig. 13 is a flowchart illustrating a chart processing method according to an embodiment of the present disclosure.
As shown in fig. 13, the method includes:
S501: acquiring a to-be-processed chart in a picture format.
S502: determining a saliency map of the chart.
S503: fusing and analyzing context information of the chart based on the saliency map to generate chart information corresponding to the chart, wherein the chart information is used for information query.
For a specific scheme for determining the saliency map and generating the graph information, reference may be made to the description of the foregoing embodiments, and details are not described here again.
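The three steps S501-S503 amount to a small pipeline. The sketch below is a hedged orchestration with the saliency, region-splitting, and parsing models injected as callables; all names are hypothetical stand-ins for the models described in the foregoing embodiments:

```python
def process_chart(image, context, compute_saliency, split_regions, parse_sub_chart):
    """Orchestrate S501-S503 for one chart acquired in picture format."""
    saliency = compute_saliency(image)                    # S502: saliency map
    sub_charts = split_regions(image, saliency, context)  # fuse context info
    # S503: the parsed results form the chart information used for querying.
    return [parse_sub_chart(sc) for sc in sub_charts]
```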
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solution of the present application can be achieved; the present application is not limited thereto.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (27)

1. An information query method, comprising:
acquiring audio information for querying a chart in a picture format, wherein the audio information carries a query intention, and the audio information is used for querying target information corresponding to the query intention in the chart;
determining target information corresponding to the audio information according to preset chart information corresponding to the chart, wherein the chart information is generated by determining a saliency map of the chart, and fusing and analyzing context information of the chart based on the saliency map;
and outputting the target information.
2. The method of claim 1, further comprising:
determining image features of the chart;
generating a feature code corresponding to the chart information;
training a basic network model according to the image features, feature codes corresponding to the chart information and preset training samples to generate a prediction model, wherein the training samples comprise question samples and answer samples;
and determining target information corresponding to the audio information according to preset chart information corresponding to the chart comprises: determining the target information according to the prediction model, the audio information and the chart information.
3. The method of claim 2, wherein determining the target information according to the prediction model, the audio information, and the chart information comprises:
performing character conversion on the audio information to generate target text information corresponding to the audio information;
determining a feature code corresponding to the target text information;
and predicting, by using the prediction model, the input chart information and the feature code corresponding to the target text information to obtain the target information.
4. The method according to any one of claims 1 to 3, wherein determining target information corresponding to the audio information according to preset chart information corresponding to the chart comprises:
performing character conversion on the audio information to generate target text information corresponding to the audio information;
performing semantic analysis on the target text information to obtain the query intention corresponding to the target text information;
and determining the target information corresponding to the query intention from the chart information.
5. The method of any of claims 1 to 3, further comprising:
analyzing the structure of the chart to obtain the saliency map;
determining at least one sub-chart of the chart according to the saliency map and the context information of the chart;
and parsing the at least one sub-chart to generate the chart information.
6. The method of claim 5, wherein determining at least one sub-chart of the chart according to the saliency map and the context information of the chart comprises:
dividing the chart into at least one region according to the saliency map and the context information of the chart;
performing chart classification processing on the at least one region to obtain at least one sub-chart comprising a chart type;
and parsing the at least one sub-chart to generate the chart information comprises: parsing the at least one sub-chart based on the chart type to generate the chart information.
7. The method of claim 6, wherein, if the chart type is a columnar type, parsing the at least one sub-chart comprises:
performing character recognition on the at least one sub-chart to obtain first text information of the at least one sub-chart;
determining a rectangular column of the at least one sub-chart based on a preset target detection model;
determining a columnar attribute of the at least one sub-chart based on the rectangular column, wherein the columnar attribute is a transverse columnar or a longitudinal columnar;
performing text positioning on the rectangular column to obtain first coordinate information of the rectangular column;
wherein the chart information includes the first text information, the columnar attribute, and the first coordinate information.
8. The method of claim 6, wherein, if the chart type is a pie type, parsing the at least one sub-chart comprises:
performing character recognition on the at least one sub-chart to obtain second text information of the at least one sub-chart;
detecting the at least one sub-chart based on a preset candidate area detection model to obtain each sector of the at least one sub-chart;
determining the angle corresponding to each sector based on the detection frame corresponding to each sector;
wherein the chart information includes the second text information and the angle.
9. The method of claim 6, wherein, if the chart type is a dotted line type, parsing the at least one sub-chart comprises:
performing character recognition on the at least one sub-chart to obtain third text information of the at least one sub-chart;
determining a category attribute of the at least one sub-chart, wherein the category attribute is a point diagram or a line diagram;
determining position information of points in the at least one sub-chart according to the category attributes;
wherein the chart information includes the third text information and the position information.
10. The method of claim 9, wherein, if the category attribute is a point diagram, determining the position information of the at least one sub-chart according to the category attribute comprises:
detecting the point diagram to obtain each data point and coordinate icon information in the point diagram;
and projecting the data points to a reconstructed image coordinate system, and obtaining third coordinates of the data points based on the coordinate icon information in the image coordinate system, wherein the position information comprises the third coordinates.
11. The method of claim 9, wherein, if the category attribute is a line diagram, determining the position information of the at least one sub-chart according to the category attribute comprises:
detecting the line diagram to obtain each break point in the line diagram;
and identifying each break point to obtain a fourth coordinate of each break point, wherein the position information comprises the fourth coordinate.
12. The method of claim 7, further comprising:
and identifying the at least one sub-chart based on a preset color identification model to obtain the color information of the at least one sub-chart, wherein the chart information further comprises the color information.
13. An information inquiry apparatus comprising:
an acquisition module, configured to acquire audio information for querying a chart in a picture format, wherein the audio information carries a query intention, and the audio information is used for querying target information corresponding to the query intention in the chart;
a first determining module, configured to determine target information corresponding to the audio information according to preset chart information corresponding to the chart, wherein the chart information is generated by determining a saliency map of the chart, and fusing and analyzing context information of the chart based on the saliency map;
and the output module is used for outputting the target information.
14. The apparatus of claim 13, further comprising:
a second determining module, configured to determine image features of the chart;
the generating module is used for generating a feature code corresponding to the chart information;
the training module is used for training a basic network model according to the image features, the feature codes corresponding to the chart information and preset training samples to generate a prediction model, wherein the training samples comprise question samples and answer samples;
and the first determining module is configured to determine the target information according to the prediction model, the audio information and the chart information.
15. The apparatus of claim 14, wherein the first determining module is configured to perform character conversion on the audio information to generate target text information corresponding to the audio information, determine a feature code corresponding to the target text information, and predict the input chart information and the feature code corresponding to the target text information by using the prediction model to obtain the target information.
16. The apparatus according to any one of claims 13 to 15, wherein the first determining module is configured to perform character conversion on the audio information to generate target text information corresponding to the audio information, perform semantic analysis on the target text information to obtain the query intention corresponding to the target text information, and determine the target information corresponding to the query intention from the chart information.
17. The apparatus of any of claims 13 to 15, further comprising:
the analysis module is used for analyzing the structure of the chart to obtain the saliency map;
a third determining module, configured to determine at least one sub-chart of the chart according to the saliency map and the context information of the chart;
and a parsing module, configured to parse the at least one sub-chart to generate the chart information.
18. The apparatus according to claim 17, wherein the third determining module is configured to divide the chart into at least one region according to the saliency map and the context information of the chart, and perform chart classification processing on the at least one region to obtain the at least one sub-chart comprising a chart type;
and the parsing module is configured to parse the at least one sub-chart based on the chart type to generate the chart information.
19. The apparatus of claim 18, wherein, if the chart type is a columnar type, the parsing module is configured to perform character recognition on the at least one sub-chart to obtain first text information of the at least one sub-chart, determine a rectangular column of the at least one sub-chart based on a preset target detection model, determine a columnar attribute of the at least one sub-chart based on the rectangular column, where the columnar attribute is a transverse columnar or a longitudinal columnar, and perform text positioning on the rectangular column to obtain first coordinate information of the rectangular column, wherein the chart information includes the first text information, the columnar attribute, and the first coordinate information.
20. The apparatus of claim 18, wherein, if the chart type is a pie type, the parsing module is configured to perform character recognition on the at least one sub-chart to obtain second text information of the at least one sub-chart, detect the at least one sub-chart based on a preset candidate area detection model to obtain each sector of the at least one sub-chart, and determine an angle corresponding to each sector based on the detection frame corresponding to each sector, wherein the chart information includes the second text information and the angle.
21. The apparatus of claim 18, wherein, if the chart type is a dotted line type, the parsing module is configured to perform character recognition on the at least one sub-chart to obtain third text information of the at least one sub-chart, determine a category attribute of the at least one sub-chart, where the category attribute is a point diagram or a line diagram, and determine position information of points in the at least one sub-chart according to the category attribute, wherein the chart information includes the third text information and the position information.
22. The apparatus of claim 21, wherein, if the category attribute is a point diagram, the parsing module is configured to detect the point diagram to obtain each data point and coordinate icon information in the point diagram, project each data point onto a reconstructed image coordinate system, and obtain a third coordinate of each data point based on the coordinate icon information in the image coordinate system, wherein the position information includes the third coordinate.
23. The apparatus of claim 21, wherein, if the category attribute is a line diagram, the parsing module is configured to detect the line diagram to obtain each break point in the line diagram, and identify each break point to obtain a fourth coordinate of each break point, wherein the position information includes the fourth coordinate.
24. The apparatus according to claim 19, wherein the parsing module is configured to identify the at least one sub-chart based on a preset color identification model, and obtain color information of the at least one sub-chart, wherein the chart information further includes the color information.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-12.
27. A chart processing method, comprising:
acquiring a to-be-processed chart in a picture format;
determining a saliency map of the chart;
and fusing and analyzing the context information of the chart based on the saliency map to generate chart information corresponding to the chart, wherein the chart information is used for information query.
CN202011024153.1A 2020-09-25 2020-09-25 Information query method and device, chart processing method and electronic equipment Pending CN112214620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011024153.1A CN112214620A (en) 2020-09-25 2020-09-25 Information query method and device, chart processing method and electronic equipment


Publications (1)

Publication Number Publication Date
CN112214620A true CN112214620A (en) 2021-01-12

Family

ID=74052378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011024153.1A Pending CN112214620A (en) 2020-09-25 2020-09-25 Information query method and device, chart processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN112214620A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706780A (en) * 2009-09-03 2010-05-12 北京交通大学 Image semantic retrieving method based on visual attention model
CN103064936A (en) * 2012-12-24 2013-04-24 北京百度网讯科技有限公司 Voice-input-based image information extraction analysis method and device
US20180267960A1 (en) * 2017-01-16 2018-09-20 Senthil Nathan Rajendran Method for Interpretation of Charts Using Statistical Techniques & Machine Learning & Creating Automated Summaries in Natural Language
CN109710733A (en) * 2018-11-28 2019-05-03 北京永洪商智科技有限公司 A kind of data interactive method and system based on intelligent sound identification
US20190163970A1 (en) * 2017-11-29 2019-05-30 Abc Fintech Co., Ltd Method and device for extracting chart information in file


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARCELLA CORNIA ET AL.: "Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention", ACM Transactions on Multimedia Computing, Communications and Applications, 21 May 2018 (2018-05-21) *

Similar Documents

Publication Publication Date Title
US11854246B2 (en) Method, apparatus, device and storage medium for recognizing bill image
EP3832541A2 (en) Method and apparatus for recognizing text
US7945097B2 (en) Classifying digital ink into a writing or a drawing
US20210312172A1 (en) Human body identification method, electronic device and storage medium
US20190095758A1 (en) Method and system for obtaining picture annotation data
US10891430B2 (en) Semi-automated methods for translating structured document content to chat-based interaction
EP4006909B1 (en) Method, apparatus and device for quality control and storage medium
CN112507090B (en) Method, apparatus, device and storage medium for outputting information
US11727701B2 (en) Techniques to determine document recognition errors
CN113627439A (en) Text structuring method, processing device, electronic device and storage medium
CN104881673A (en) Mode identification method based on information integration and system thereof
US20220027854A1 (en) Data processing method and apparatus, electronic device and storage medium
WO2021254251A1 (en) Input display method and apparatus, and electronic device
CN113377958A (en) Document classification method and device, electronic equipment and storage medium
JP2022185143A (en) Text detection method, and text recognition method and device
CN110532415A (en) Picture search processing method, device, equipment and storage medium
JP2022536320A (en) Object identification method and device, electronic device and storage medium
US20220351495A1 (en) Method for matching image feature point, electronic device and storage medium
US20220392243A1 (en) Method for training text classification model, electronic device and storage medium
CN111552829A (en) Method and apparatus for analyzing image material
CN112214620A (en) Information query method and device, chart processing method and electronic equipment
US20230186667A1 (en) Assisted review of text content using a machine learning model
CN115017922A (en) Method and device for translating picture, electronic equipment and readable storage medium
CN113971810A (en) Document generation method, device, platform, electronic equipment and storage medium
US11995905B2 (en) Object recognition method and apparatus, and electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination