CN114218935B - Entity display method and device in data analysis - Google Patents
Entity display method and device in data analysis
- Publication number
- CN114218935B
- Authority
- CN
- China
- Prior art keywords
- entity
- type
- category
- dimension
- entities
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
An embodiment of this specification provides an entity display method and device in data analysis. The method comprises: acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement for target data; judging, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence meet a preset rule, wherein the entity type indicates whether an entity is a numerical value or a character string; if the preset rule is met, combining the at least two entities to obtain an entity combination; and displaying the entity combination as one data filtering condition included in the data analysis requirement. In this way, the association between entities can be reflected when entities are displayed in data analysis.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computers, and more particularly, to a method and apparatus for entity display in data analysis.
Background
Currently, users' data analysis requirements are both flexible and numerous. To meet a data analysis requirement, a professional needs to convert it into a query statement that a computer can understand, for example a Structured Query Language (SQL) statement; the computer then performs the corresponding data analysis on a database by executing the SQL statement.
Because the number of professionals is limited, the data analysis requirements of the many non-professional users usually have to be converted into SQL statements by those professionals. This often involves a long wait, so the requirements cannot be met quickly. It is therefore desirable for a computer to receive natural language text that a user inputs to express a data analysis requirement, perform entity recognition on that text, and understand the requirement based on the recognized entities.
Among the recognized entities, some are logically related and some are not. How to reflect the association between entities when displaying entities in data analysis is a problem that urgently needs to be solved.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for entity presentation in data analysis, which can embody the association between entities in the process of entity presentation in data analysis.
In a first aspect, a method for displaying an entity in data analysis is provided, and the method includes:
acquiring an entity sequence obtained by entity recognition aiming at a natural language text input by a user, wherein the natural language text is used for expressing the data analysis requirement of the user on target data;
judging, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence meet a preset rule; the entity type indicates whether an entity is a numerical value or a character string;
if the judgment result is that the preset rule is met, combining the at least two entities to obtain an entity combination;
and displaying the entity combination as a data filtering condition included by the data analysis requirement.
In one possible implementation, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
Further, the at least two entities comprise a first entity, a second entity and a third entity which are arranged in sequence; the preset rules include:
the entity category of the first entity is a dimension category, and the entity type of the first entity is used for indicating that the entity belongs to a type of a numerical value;
the entity category of the second entity is the operator category, and the second entity is an equal-to or not-equal-to logical operator;
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that the entity is a numerical value.
Further, the at least two entities comprise a fourth entity, a second entity and a fifth entity which are arranged in sequence; the preset rules include:
the entity category of the fourth entity is a dimension category, and the entity type of the fourth entity is used for indicating that the entity belongs to the type of the character string;
the entity category of the second entity is the operator category, and the second entity is an equal-to or not-equal-to logical operator;
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the dimension represented by the fourth entity.
Further, the at least two entities comprise a sixth entity, a seventh entity and an eighth entity which are arranged in sequence; the preset rules include:
the entity category of the sixth entity is a dimension category, and the entity type of the sixth entity is used for indicating that the entity belongs to a type of numerical value;
the entity category of the seventh entity is the operator category, and the seventh entity is a greater-than, less-than, greater-than-or-equal-to or less-than-or-equal-to logical operator;
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that the entity is a numerical value.
Further, the at least two entities include a ninth entity and a tenth entity which are arranged in sequence; the preset rules comprise:
the entity categories of the ninth entity and the tenth entity are dimension value categories and belong to dimension values corresponding to the same dimension.
In one possible embodiment, the presenting the entity combination includes:
highlighting, in an input box, the part of the natural language text corresponding to the entity combination; or,
displaying the entity combination in a prompt box outside the input box, and indicating that the entity combination is one data filtering condition.
In one possible embodiment, the method further comprises:
and constructing a data query script according to the data filtering condition, wherein the data query script is used for executing query operation on the target data to obtain a query result corresponding to the data analysis requirement.
In a second aspect, an entity display apparatus in data analysis is provided, the apparatus including:
an acquiring unit, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement for target data;
a determining unit, configured to determine, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence acquired by the acquiring unit, whether at least two adjacent entities in the entity sequence satisfy a preset rule; the entity type indicates whether an entity is a numerical value or a character string;
a combining unit, configured to combine the at least two entities to obtain an entity combination if the determination result of the determining unit is that the preset rule is satisfied;
and a display unit, configured to display the entity combination obtained by the combining unit as one data filtering condition included in the data analysis requirement.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
According to the method and the device provided by the embodiments of this specification, an entity sequence obtained by performing entity recognition on a natural language text input by a user is first acquired, wherein the natural language text expresses the user's data analysis requirement for target data. It is then judged, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence meet a preset rule, wherein the entity type indicates whether an entity is a numerical value or a character string. If the preset rule is met, the at least two entities are combined to obtain an entity combination. Finally, the entity combination is displayed as one data filtering condition included in the data analysis requirement. As can be seen from the above, the embodiments of this specification group entities based on preset rules, so no model training is required, accuracy is high and cold start is fast; and because each displayed entity combination corresponds to a data filtering condition, the association between entities is reflected when entities are displayed in data analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a schematic diagram illustrating an implementation scenario of another embodiment disclosed in the present specification;
FIG. 3 illustrates a flow diagram of a method for entity presentation in data analysis, according to one embodiment;
FIG. 4 shows a schematic representation of a combination of entities according to one embodiment;
FIG. 5 illustrates a process diagram for building a data query script, according to one embodiment;
FIG. 6 shows a schematic block diagram of an entity display apparatus in data analysis, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The implementation scenario involves the display of entities in data analysis. Data analysis means analyzing a large amount of collected data with appropriate statistical methods, and summarizing, understanding and digesting the data so as to make the most of it. It is the process of studying and summarizing data in detail in order to extract useful information and form conclusions. In the embodiments of this specification, the storage manner of the data to be analyzed is not limited: the data may be stored in a database, or in another manner such as an Excel table. A database comprises a plurality of data tables, each data table comprises a plurality of fields, each field corresponds to a column, and each field has a field name and a column of field values. Referring to FIG. 1, in order to quickly meet a user's data analysis requirement, the embodiments of this specification propose a solution in which a computer receives natural language text input by the user, performs entity recognition on the text, and understands the data analysis requirement based on the recognized entities. The recognized entities are grouped for display so that the user can clearly see the association between them, and each displayed entity combination corresponds to one data filtering condition.
For example, the user inputs the natural language text "the total purchase amount of users whose age is greater than 20 and whose purchase time is between June 10 and June 12". This text expresses the user's data analysis requirement for the target data and contains two data filtering conditions: "age is greater than 20" and "purchase time is between June 10 and June 12". Each data filtering condition corresponds to one entity combination comprising a plurality of entities, and the entities in the same entity combination are logically associated with one another.
Entity recognition means recognizing entities with specific meanings in a natural language text, converting a character sequence into an entity sequence; a time expression, for example, is recognized as an entity. In the embodiments of this specification, an entity may be understood as a word, and each entity has a corresponding entity category. Entity categories may include, but are not limited to, a time category, an operator category, a dimension category, a dimension value category, and the like. The dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
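The entity, entity category and entity type described above could be modeled along the following lines. This is only a minimal sketch: the names `Entity`, `Category`, `ValueType` and `field_name` are hypothetical and do not come from the specification, which does not prescribe any data structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """Entity categories named in the text (the list is not exhaustive)."""
    TIME = "time"
    OPERATOR = "operator"
    DIMENSION = "dimension"              # corresponds to a field name
    DIMENSION_VALUE = "dimension_value"  # corresponds to a specific field value


class ValueType(Enum):
    """Entity type: whether the entity is a numerical value or a character string."""
    NUMERIC = "numeric"
    STRING = "string"


@dataclass(frozen=True)
class Entity:
    text: str                               # the word as recognized in the user's text
    category: Category
    value_type: Optional[ValueType] = None  # None where it does not apply, e.g. operators
    field_name: Optional[str] = None        # hypothetical: the field the entity maps to


# Hypothetical entity sequence for the phrase "age greater than 20".
sequence = [
    Entity("age", Category.DIMENSION, ValueType.NUMERIC, "age"),
    Entity(">", Category.OPERATOR),
    Entity("20", Category.DIMENSION_VALUE, ValueType.NUMERIC, "age"),
]
```

The order of the list mirrors the order of the words in the natural language text, which is what the adjacency-based rules rely on.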
The display of entity combinations may also be referred to as syntactic structuring: in data analysis, the words in the user's natural language text that can be logically combined are grouped together according to their function. For example, in "user details where yesterday's payment amount > 10", the words "payment amount > 10" can be grouped together.
In the data analysis, the embodiment of the specification performs grouping display on entities obtained by entity recognition on natural language texts input by users, and provides a specific solution, so that the relevance among the entities can be reflected in the entity display process in the data analysis.
FIG. 2 is a schematic diagram of an implementation scenario of another embodiment disclosed in this specification. The implementation scenario involves the display of entities in data analysis. Referring to FIG. 2, the user enters the natural language text "the top ten payment amounts in Beijing in the last thirty days", and the target database includes the field names user, city, amt and time. The user field contains two values, 001 and 002; the city field contains two values, Beijing and Hangzhou; the amt field contains two values, 20 and 10; and the time field contains two values, 20200521 and 20200522. It will be appreciated that databases typically store large amounts of data, and the figure depicts only part of the target database by way of example. After entity recognition, four entities are obtained: "0501-0530", whose category is Time, i.e. the time category; "Beijing", whose category is Col_Value, i.e. the dimension value category, with city as the corresponding field name; "payment amount", whose category is Measure, i.e. the dimension category, with amt as the corresponding field name; and "Top(10, desc)", whose category is Intent, i.e. the intention category, which denotes the top 10 in descending order. In the embodiments of this specification, every entity recognized from the natural language text has its own category; these categories help to express the data analysis requirement and can be used to display the entities in groups.
FIG. 3 shows a flowchart of an entity display method in data analysis according to one embodiment, which may be based on the implementation scenario shown in FIG. 1 or FIG. 2. As shown in FIG. 3, the entity display method in data analysis in this embodiment includes the following steps: step 31, acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement for target data; step 32, judging, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence meet a preset rule, wherein the entity type indicates whether an entity is a numerical value or a character string; step 33, if the preset rule is met, combining the at least two entities to obtain an entity combination; and step 34, displaying the entity combination as one data filtering condition included in the data analysis requirement. The specific manner of performing each step is described below.
First, in step 31, an entity sequence obtained by performing entity recognition on a natural language text input by a user is acquired, where the natural language text expresses the user's data analysis requirement for target data. The target data may be stored in any manner. When it is stored in a database, different databases generally have different field names and field values, so the data analysis requirements they face differ accordingly. For example, the field names of a first database may include name, age, identification number and education level, while the field names of a second database may include user number and transaction amount; because their fields differ, the two databases typically face different data analysis requirements.
In one example, the data analysis requirements include querying a first range of the target data and performing a first manner of statistical analysis on the first range of the target data.
It is understood that a small range of data to be analyzed can be determined from a large range of stored data by determining one or more data filtering conditions included in the data analysis requirement, for example, the target data is stored in a target database, the target database includes a plurality of data tables, each data table includes a plurality of fields, at least one data table can be selected from the plurality of data tables, and data of at least one field can be selected from each data table in the at least one data table for analysis. In addition, there are various ways of statistical analysis, such as sorting, summing, averaging, etc., and one or more specific ways of statistical analysis may be determined by determining the need for data analysis.
In this embodiment of the specification, a manner of entity identification is not specifically limited, where a result of entity identification includes an entity sequence formed by a plurality of entities and entity categories corresponding to the entities, and each entity in the entity sequence has a certain order, where the order is an order of the entities in a natural language text.
Further, it is understood that "is" and "equal" in the natural language text can each be recognized as the entity "=", and that "not", "not equal" and "except" in the natural language text can each be recognized as the entity "!=".
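The mapping from wording to operator entities described above can be sketched as a simple lookup. The table below contains only the five phrases named in the text; the name `OPERATOR_SYNONYMS` and the function `normalize_operator` are hypothetical, and a real recognizer would presumably cover more wording.

```python
from typing import Optional

# Hypothetical lookup table built from the phrases listed in the text.
OPERATOR_SYNONYMS = {
    "is": "=",
    "equal": "=",
    "not": "!=",
    "not equal": "!=",
    "except": "!=",
}


def normalize_operator(phrase: str) -> Optional[str]:
    """Return the operator entity for a phrase, or None if it is not a known operator."""
    return OPERATOR_SYNONYMS.get(phrase.strip().lower())
```

Phrases that are not in the table, such as a field name, simply yield `None` and would be handled by other parts of entity recognition.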
Then, in step 32, it is judged, according to at least one of the entity categories, the entity types and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence meet a preset rule, where the entity type indicates whether an entity is a numerical value or a character string. It is understood that the target data targeted by the data analysis generally has a specific data structure and a specific storage manner, for example, the target data is stored in a database. The association relationship between entities may be one embodied by that data structure, for example, the relationship between the field name and a field value of the same field.
In one example, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
Further, the at least two entities comprise a first entity, a second entity and a third entity which are arranged in sequence; the preset rules include:
the entity category of the first entity is a dimension category, and the entity type of the first entity is used for indicating that the entity belongs to a type of a numerical value;
the entity category of the second entity is the operator category, and the second entity is an equal-to or not-equal-to logical operator;
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that the entity is a numerical value.
In this example, three adjacent entities are taken as a group to determine whether the preset rule is met, where the second entity is located between the first entity and the third entity, and the rule involves the entity category and the entity type. For example, if the first entity is "age", the second entity is "equal to", and the third entity is "20", the group of entities satisfies the preset rule.
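The rule for a numerical dimension followed by an equality operator and a numerical value can be sketched as a predicate over three adjacent entities. The `Entity` class, the string labels for categories and types, and the function name `matches_numeric_equality` are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "numeric" or "string"; None for operators


def matches_numeric_equality(first: Entity, second: Entity, third: Entity) -> bool:
    """Check the three conditions of the rule on three adjacent entities."""
    return (
        first.category == "dimension" and first.value_type == "numeric"
        and second.category == "operator" and second.text in ("=", "!=")
        and third.category == "dimension_value" and third.value_type == "numeric"
    )


age = Entity("age", "dimension", "numeric")
equals = Entity("=", "operator")
twenty = Entity("20", "dimension_value", "numeric")
greater = Entity(">", "operator")
```

Here `matches_numeric_equality(age, equals, twenty)` holds, while replacing the operator with `greater` does not satisfy this particular rule.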
Further, the at least two entities comprise a fourth entity, a second entity and a fifth entity which are arranged in sequence; the preset rules include:
the entity category of the fourth entity is a dimension category, and the entity type of the fourth entity is used for indicating that the entity belongs to the type of the character string;
the entity category of the second entity is the operator category, and the second entity is an equal-to or not-equal-to logical operator;
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the dimension represented by the fourth entity.
In this example, three adjacent entities are taken as a group to determine whether the preset rule is met, where the second entity is located between the fourth entity and the fifth entity, and the rule involves the entity category and the association relationship between entities. For example, if the fourth entity is "city", the second entity is "equal to", and the fifth entity is "Shanghai", the group of entities satisfies the preset rule.
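The string-typed rule differs from the numerical one in that it checks the association relationship: the value must belong to the dimension named by the first entity. A minimal sketch follows, assuming each dimension value entity carries a hypothetical `dimension` attribute naming the field it belongs to; the specification does not say how this association is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "numeric" or "string"
    dimension: Optional[str] = None   # hypothetical: field a dimension value belongs to


def matches_string_equality(fourth: Entity, second: Entity, fifth: Entity) -> bool:
    """String dimension, equality operator, and a value of that same dimension."""
    return (
        fourth.category == "dimension" and fourth.value_type == "string"
        and second.category == "operator" and second.text in ("=", "!=")
        and fifth.category == "dimension_value" and fifth.dimension == fourth.text
    )


city = Entity("city", "dimension", "string")
equals = Entity("=", "operator")
shanghai = Entity("Shanghai", "dimension_value", "string", dimension="city")
male = Entity("male", "dimension_value", "string", dimension="gender")
```

With these sample entities, "city = Shanghai" satisfies the rule, whereas "city = male" does not, because "male" is not a value of the city dimension.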
Further, the at least two entities comprise a sixth entity, a seventh entity and an eighth entity which are arranged in sequence; the preset rules comprise:
the entity category of the sixth entity is a dimension category, and the entity type of the sixth entity is used for indicating that the entity belongs to a type of numerical value;
the entity category of the seventh entity is the operator category, and the seventh entity is a greater-than, less-than, greater-than-or-equal-to or less-than-or-equal-to logical operator;
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that the entity is a numerical value.
In this example, three adjacent entities are taken as a group to determine whether the preset rule is met, where the seventh entity is located between the sixth entity and the eighth entity, and the rule involves the entity category and the entity type. For example, if the sixth entity is "age", the seventh entity is "less than", and the eighth entity is "20", the group of entities satisfies the preset rule.
Further, the at least two entities include a ninth entity and a tenth entity which are arranged in sequence; the preset rules include:
the entity categories of the ninth entity and the tenth entity are dimension value categories and belong to dimension values corresponding to the same dimension.
In this example, two adjacent entities are taken as a group to determine whether the preset rule is met, and the rule involves the entity category and the association relationship between entities. For example, if the ninth entity is "Beijing" and the tenth entity is "Shanghai", both are dimension values corresponding to the city dimension, and the group of entities satisfies the preset rule.
It should be noted that the number of adjacent entities covered by a preset rule is not limited to two or three and may be larger. For example, four adjacent entities arranged in sequence, "Beijing", "Shanghai", "Nanjing" and "Guangzhou", all have the dimension value category and are dimension values corresponding to the same dimension, so they may be grouped together, and the group of entities satisfies the preset rule.
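Grouping any number of adjacent dimension values that share a dimension can be sketched with `itertools.groupby`, which collects consecutive items with the same key. The `Entity` class and its `dimension` attribute are hypothetical, as is the function name `same_dimension_runs`.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    text: str
    category: str
    dimension: Optional[str] = None  # hypothetical: field a dimension value belongs to


def same_dimension_runs(entities: List[Entity]) -> List[List[Entity]]:
    """Return runs of two or more adjacent dimension values of the same dimension."""
    runs = []
    for (category, dimension), group in groupby(
        entities, key=lambda e: (e.category, e.dimension)
    ):
        members = list(group)
        if category == "dimension_value" and dimension is not None and len(members) >= 2:
            runs.append(members)
    return runs


cities = [
    Entity(name, "dimension_value", "city")
    for name in ("Beijing", "Shanghai", "Nanjing", "Guangzhou")
]
sequence = cities + [Entity("male", "dimension_value", "gender")]
runs = same_dimension_runs(sequence)
```

In this sample the four city values form one run, and the single "male" value is left ungrouped because it belongs to a different dimension.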
In the embodiments of this specification, a plurality of parallel preset rules may be set, and the preset rule is considered satisfied as long as any one of them is satisfied. When several runs of adjacent entities each satisfy a different preset rule, all of those entities may also be regarded as one group that satisfies the preset rule. For example, given adjacent entities 1, 2, 3, 4 and 5 arranged in sequence, where entities 1, 2 and 3 satisfy rule A and entities 4 and 5 satisfy rule B, entities 1, 2, 3, 4 and 5 together may be considered to satisfy the preset rule.
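One way to apply several parallel rules and merge directly adjacent matches into a single group is a left-to-right scan, sketched below. The scanning strategy, the two sample rules (`rule_a`, `rule_b`) and all names are assumptions made for illustration; the specification does not describe how the rules are scheduled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    category: str
    value_type: Optional[str] = None
    dimension: Optional[str] = None


def rule_a(w: List[Entity]) -> bool:
    """Numerical dimension, comparison operator, numerical value."""
    return (
        len(w) == 3
        and w[0].category == "dimension" and w[0].value_type == "numeric"
        and w[1].category == "operator"
        and w[1].text in ("=", "!=", ">", "<", ">=", "<=")
        and w[2].category == "dimension_value" and w[2].value_type == "numeric"
    )


def rule_b(w: List[Entity]) -> bool:
    """Two adjacent dimension values of the same dimension."""
    return (
        len(w) == 2
        and all(e.category == "dimension_value" for e in w)
        and w[0].dimension is not None and w[0].dimension == w[1].dimension
    )


RULES = [(3, rule_a), (2, rule_b)]  # (window size, predicate)


def group_entities(seq: List[Entity]) -> List[Tuple[int, int]]:
    """Return (start, end) index spans; directly adjacent matches are merged."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(seq):
        for size, rule in RULES:
            if rule(seq[i:i + size]):
                if spans and spans[-1][1] == i:
                    spans[-1] = (spans[-1][0], i + size)  # extend the previous group
                else:
                    spans.append((i, i + size))
                i += size
                break
        else:
            i += 1  # no rule matched at this position
    return spans


e1 = Entity("age", "dimension", "numeric")
e2 = Entity(">", "operator")
e3 = Entity("20", "dimension_value", "numeric")
e4 = Entity("Beijing", "dimension_value", "string", "city")
e5 = Entity("Shanghai", "dimension_value", "string", "city")
other = Entity("and", "other")
```

With these sample entities, `group_entities([e1, e2, e3, e4, e5])` yields one span covering all five entities, while inserting the unrelated entity `other` between the two matches keeps them as separate groups.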
Next, in step 33, if the determination result is that the predetermined rule is satisfied, the at least two entities are combined to obtain an entity combination. It can be understood that, if the determination result is that the preset rule is not satisfied, the at least two entities cannot be combined.
For example, the natural language text input by the user is "the sales amount of users whose city equals Shanghai and who are not male". Here "city equals Shanghai" can be used as an entity combination, whereas "Shanghai is not male" cannot.
Finally, in step 34, the entity combination is displayed as one data filtering condition included in the data analysis requirement. It is understood that if entities outside the entity combination are displayed together with it, the entity combination needs to be displayed in a manner different from that of the other entities.
In one example, the presenting the combination of entities comprises:
highlighting, in an input box, the part of the natural language text corresponding to the entity combination; or,
displaying the entity combination in a prompt box outside the input box, and indicating that the entity combination is one data filtering condition.
FIG. 4 shows a schematic diagram of displaying entity combinations according to one embodiment. Referring to FIG. 4, the parts of the natural language text that correspond to entity combinations are highlighted in the input box. The user inputs "the total purchase amount of users whose age is greater than 20 and whose purchase time is between June 10 and June 12"; "age is greater than 20" is underlined and "purchase time is between June 10 and June 12" is underlined, indicating that each underlined part corresponds to an entity combination. The manner of highlighting is not limited to this; for example, a rectangular frame may be drawn around the entity combination, or a wavy line may be added below it.
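In a plain-text setting, the underlining described above can be approximated by printing a marker line beneath the input. The sketch below assumes each entity combination is available as a substring of the input and marks only its first occurrence; the function name `underline` is hypothetical, and an actual input box would use its own rendering.

```python
from typing import List


def underline(text: str, combinations: List[str]) -> str:
    """Return the text followed by a line that marks each combination with '-'."""
    marks = [" "] * len(text)
    for combo in combinations:
        start = text.find(combo)  # first occurrence only (simplifying assumption)
        if start == -1:
            continue              # combination not present in the text
        for i in range(start, start + len(combo)):
            marks[i] = "-"
    return text + "\n" + "".join(marks).rstrip()


rendered = underline("users with age greater than 20", ["age greater than 20"])
```

Printing `rendered` shows the original text on the first line and dashes under "age greater than 20" on the second.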
The embodiments of this specification display the entity combination as a data filtering condition included in the data analysis requirement, so as to inform the user that a data filtering condition has been recognized and will filter the target data.
In one example, the method further comprises:
and constructing a data query script according to the data filtering condition, wherein the data query script is used for executing query operation on the target data to obtain a query result corresponding to the data analysis requirement.
FIG. 5 illustrates a process diagram for building a data query script, according to one embodiment. Referring to fig. 5, a natural language text input by a user is firstly subjected to entity identification to obtain an entity sequence, then an entity combination corresponding to a data filtering condition is determined according to the entity sequence, and then through core steps of syntactic analysis, semantic analysis, query script conversion and the like, the natural language is controllably and interpretably translated into a data query script step by step, so that a non-data technician can obtain data by self and analyze the data to obtain a data analysis result with high timeliness and high accuracy.
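The last stage of this pipeline, turning data filtering conditions into a query script, could look roughly like the following when the script is SQL. This is a sketch under stated assumptions: the table name `orders`, the function `build_query`, the whitelists, and the pairing of sample values into rows are all hypothetical, and only the field names user, city, amt and time come from the FIG. 2 scenario.

```python
import sqlite3
from typing import Any, List, Tuple

ALLOWED_FIELDS = {"user", "city", "amt", "time"}       # field names from FIG. 2
ALLOWED_OPERATORS = {"=", "!=", ">", "<", ">=", "<="}


def build_query(table: str, conditions: List[Tuple[str, str, Any]]) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT whose WHERE clause joins the conditions with AND."""
    clauses, params = [], []
    for field, op, value in conditions:
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {field} {op}")
        clauses.append(f'"{field}" {op} ?')  # values are bound, never interpolated
        params.append(value)
    sql = f'SELECT * FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Hypothetical in-memory table; the row pairing of the sample values is assumed.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders ("user" TEXT, city TEXT, amt INTEGER, "time" TEXT)')
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [("001", "Beijing", 20, "20200521"), ("002", "Hangzhou", 10, "20200522")],
)

sql, params = build_query("orders", [("city", "=", "Beijing"), ("amt", ">", 10)])
rows = conn.execute(sql, params).fetchall()
```

Each entity combination contributes one `(field, operator, value)` triple, so the grouping shown to the user maps one-to-one onto the clauses of the generated query.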
According to the method provided by the embodiment of the specification, firstly, an entity sequence obtained by entity recognition is obtained aiming at a natural language text input by a user, wherein the natural language text is used for expressing the data analysis requirement of the user on target data; then judging whether at least two adjacent entities in the entity sequence meet a preset rule or not according to at least one of entity categories, entity types and incidence relations among the entities in the entity sequence; the entity type is used for indicating that the entity belongs to a type of a numerical value or a character string; then if the judgment result is that the preset rule is met, combining the at least two entities to obtain an entity combination; and finally, displaying the entity combination as a data filtering condition included by the data analysis requirement. As can be seen from the above, in the embodiments of the present specification, entities are grouped based on preset rules, model training is not required, accuracy is high, cold start speed is high, and by displaying entity combinations and corresponding the entity combinations to data filtering conditions, relevance between the entities can be reflected in an entity display process in data analysis.
According to an embodiment of another aspect, an entity display apparatus in data analysis is also provided, and the apparatus is configured to perform the method provided by the embodiments of this specification. FIG. 6 shows a schematic block diagram of an entity display apparatus in data analysis, according to one embodiment. As shown in FIG. 6, the apparatus 600 includes:
an acquiring unit 61, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data;
a determining unit 62, configured to determine whether at least two adjacent entities in the entity sequence satisfy a preset rule according to at least one of the entity category, the entity type, and the association relationship between the entities in the entity sequence acquired by the acquiring unit 61; the entity type is used for indicating that the entity belongs to a type of a numerical value or a character string;
a combining unit 63, configured to combine the at least two entities to obtain an entity combination if the determination result of the determining unit 62 is that the preset rule is satisfied;
and a display unit 64, configured to display the entity combination obtained by the combination unit 63 as a data filtering condition included in the data analysis requirement.
Optionally, as an embodiment, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
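As an illustration of how the three categories relate to the target data, the sketch below derives a small lexicon from tabular rows: field names become dimensions, observed field values become dimension values, and each field is tagged as numeric or string. The functions `build_lexicon` and `categorize`, the operator symbols, and the sample rows are all hypothetical; the specification does not prescribe how the lexicon is built.

```python
from typing import Dict, List, Optional, Union

Value = Union[int, float, str]

# Assumption: operators are written as symbols; the specification lists no vocabulary.
OPERATORS = {"=", "!=", ">", "<", ">=", "<="}


def build_lexicon(rows: List[Dict[str, Value]]) -> Dict[str, dict]:
    """Map each field name (a dimension) to its value type and observed values."""
    lexicon: Dict[str, dict] = {}
    for row in rows:
        for field, value in row.items():
            entry = lexicon.setdefault(field, {"value_type": "number", "values": set()})
            # A single non-numeric value makes the whole field a string field.
            # bool is a subclass of int in Python, so it is excluded explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                entry["value_type"] = "string"
            entry["values"].add(value)
    return lexicon


def categorize(token: str, lexicon: Dict[str, dict]) -> Optional[str]:
    """Return the entity category of a token, or None if it is not an entity."""
    if token in OPERATORS:
        return "operator"
    if token in lexicon:
        return "dimension"
    for entry in lexicon.values():
        if token in {str(value) for value in entry["values"]}:
            return "dimension_value"
    return None


rows = [
    {"city": "Hangzhou", "sales": 120},
    {"city": "Beijing", "sales": 80},
]
lexicon = build_lexicon(rows)
print(lexicon["city"]["value_type"], lexicon["sales"]["value_type"])
print([categorize(token, lexicon) for token in ["city", "=", "Hangzhou", "hello"]])
```

A lookup of this kind only recognizes values that already occur in the data; a numeric literal such as a threshold typed by the user would need separate handling.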
Further, the at least two entities comprise a first entity, a second entity, and a third entity arranged in sequence; the preset rules include:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates that the entity is of a numerical type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that the entity is of a numerical type.
Further, the at least two entities comprise a fourth entity, a second entity, and a fifth entity arranged in sequence; the preset rules include:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates that the entity is of a character string type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value of the dimension corresponding to the fourth entity.
Further, the at least two entities comprise a sixth entity, a seventh entity, and an eighth entity arranged in sequence; the preset rules include:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates that the entity is of a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is a "greater than", "less than", "greater than or equal to", or "less than or equal to" logical operator;
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that the entity is of a numerical type.
Further, the at least two entities comprise a ninth entity and a tenth entity arranged in sequence; the preset rules include:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
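The four preset rules above can be expressed as checks over a window of adjacent entities. The sketch below is one possible encoding under stated assumptions: the `Entity` dataclass, the `dimension` attribute used to carry the association relationship, the operator symbols, and the rule names returned by `match_rule` are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class Category(Enum):
    DIMENSION = "dimension"
    OPERATOR = "operator"
    DIMENSION_VALUE = "dimension_value"


class ValueType(Enum):
    NUMBER = "number"
    STRING = "string"


@dataclass(frozen=True)
class Entity:
    text: str
    category: Category
    value_type: Optional[ValueType] = None  # left as None for operators
    dimension: Optional[str] = None         # dimension that a dimension value belongs to


EQUALITY = {"=", "!="}
COMPARISON = {">", "<", ">=", "<="}


def _is(e: Entity, category: Category, value_type: Optional[ValueType] = None) -> bool:
    return e.category is category and (value_type is None or e.value_type is value_type)


def _op(e: Entity, symbols: set) -> bool:
    return e.category is Category.OPERATOR and e.text in symbols


def match_rule(w: Sequence[Entity]) -> Optional[str]:
    """Return the name of the preset rule that the window satisfies, if any."""
    if len(w) == 3:
        dim, op, val = w
        if (_is(dim, Category.DIMENSION, ValueType.NUMBER)
                and _is(val, Category.DIMENSION_VALUE, ValueType.NUMBER)):
            if _op(op, EQUALITY):
                return "numeric_equality"      # first, second, third entity
            if _op(op, COMPARISON):
                return "numeric_comparison"    # sixth, seventh, eighth entity
        if (_is(dim, Category.DIMENSION, ValueType.STRING) and _op(op, EQUALITY)
                and _is(val, Category.DIMENSION_VALUE) and val.dimension == dim.text):
            return "string_equality"           # fourth, second, fifth entity
    if len(w) == 2:
        a, b = w
        if (_is(a, Category.DIMENSION_VALUE) and _is(b, Category.DIMENSION_VALUE)
                and a.dimension is not None and a.dimension == b.dimension):
            return "same_dimension_values"     # ninth and tenth entity
    return None


sales = Entity("sales", Category.DIMENSION, ValueType.NUMBER)
city = Entity("city", Category.DIMENSION, ValueType.STRING)
gt = Entity(">", Category.OPERATOR)
eq = Entity("=", Category.OPERATOR)
hundred = Entity("100", Category.DIMENSION_VALUE, ValueType.NUMBER)
hangzhou = Entity("Hangzhou", Category.DIMENSION_VALUE, ValueType.STRING, dimension="city")
beijing = Entity("Beijing", Category.DIMENSION_VALUE, ValueType.STRING, dimension="city")

print(match_rule([sales, gt, hundred]))
print(match_rule([city, eq, hangzhou]))
print(match_rule([hangzhou, beijing]))
print(match_rule([city, gt, hangzhou]))
```

In this encoding a string dimension followed by a comparison operator matches no rule, which mirrors the text: only the numeric rules admit "greater than" and "less than" style operators.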
Optionally, as an embodiment, the display unit 64 is specifically configured to highlight, in the input box, the portion of the natural language text corresponding to the entity combination; or to display the entity combination in a prompt box outside the input box and indicate that the entity combination is one data filtering condition.
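The first display option, highlighting the matching part of the input text, can be illustrated with a small helper that wraps character spans in markers. This is only a sketch: the `highlight` and `span_of` functions, the bracket markers, and the use of character offsets are assumptions, since the specification does not say how the highlighted region is represented or rendered.

```python
from typing import List, Tuple


def span_of(text: str, phrase: str) -> Tuple[int, int]:
    """Return the (start, end) character offsets of the first occurrence of phrase."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return start, start + len(phrase)


def highlight(text: str, spans: List[Tuple[int, int]],
              open_mark: str = "[", close_mark: str = "]") -> str:
    """Wrap each non-overlapping span of text in the given markers."""
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("spans must not overlap")
        pieces.append(text[cursor:start])
        pieces.append(open_mark + text[start:end] + close_mark)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


query = "sales > 100 in Hangzhou Beijing"
combination_spans = [
    span_of(query, "sales > 100"),
    span_of(query, "Hangzhou Beijing"),
]
marked = highlight(query, combination_spans)
print(marked)
```

In a real input box the markers would be replaced by whatever styling the user interface supports; the point here is only that each entity combination is marked as a single unit rather than entity by entity.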
Optionally, as an embodiment, the apparatus further includes:
a construction unit, configured to construct a data query script according to the data filtering condition displayed by the display unit 64, where the data query script is used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
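To make the step from data filtering conditions to a data query script concrete, the sketch below turns a list of conditions into a parameterized SQL statement and runs it against an in-memory SQLite table. SQL is an assumption here, since the specification does not name a query language, and the `build_query` function, the condition tuple layout, the `orders` table, and its rows are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

ALLOWED_OPERATORS = {"=", "!=", ">", "<", ">=", "<=", "IN"}

# (field name, operator, values); "IN" carries one or more values, others exactly one.
Condition = Tuple[str, str, Sequence[object]]


def build_query(table: str, fields: Sequence[str],
                conditions: List[Condition]) -> Tuple[str, list]:
    """Build a parameterized SELECT from filter conditions.

    The table name is assumed to come from trusted configuration; field names
    and operators are checked against allow-lists, and values are bound as
    parameters rather than interpolated into the SQL text.
    """
    clauses, params = [], []
    for field, op, values in conditions:
        if field not in fields:
            raise ValueError(f"unknown field: {field}")
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        if op == "IN":
            if not values:
                raise ValueError("IN requires at least one value")
            placeholders = ", ".join("?" for _ in values)
            clauses.append(f"{field} IN ({placeholders})")
        else:
            if len(values) != 1:
                raise ValueError(f"{op} requires exactly one value")
            clauses.append(f"{field} {op} ?")
        params.extend(values)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, sales INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Hangzhou", 120), ("Beijing", 80), ("Shanghai", 300)])

sql, params = build_query(
    "orders", ["city", "sales"],
    [("sales", ">", [100]), ("city", "IN", ["Hangzhou", "Beijing"])],
)
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```

Mapping a combination of two dimension values of the same dimension to an `IN` clause is a design choice of this sketch; the specification only states that such a combination is shown as one data filtering condition.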
With the apparatus provided by the embodiments of this specification, the acquiring unit 61 first acquires an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data. The determining unit 62 then determines whether at least two adjacent entities in the entity sequence satisfy a preset rule according to at least one of the entity categories, the entity types, and the association relationships among the entities in the entity sequence, where the entity type indicates whether the entity is of a numerical type or a character string type. When the preset rule is satisfied, the combining unit 63 combines the at least two entities to obtain an entity combination. Finally, the display unit 64 displays the entity combination as a data filtering condition included in the data analysis requirement. As can be seen from the above, the embodiments of this specification group entities based on preset rules, so no model training is required, accuracy is high, and cold start is fast; and because each entity combination is displayed as a data filtering condition, the entity display process in data analysis reflects the relevance between entities.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with FIG. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having executable code stored therein, where the processor, when executing the executable code, implements the method described in connection with FIG. 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium, or transmitted over a computer-readable medium, as one or more instructions or code.
The embodiments described above explain the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.
Claims (8)
1. A method of entity presentation in data analysis, the method comprising:
acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement on target data;
judging whether at least two adjacent entities in the entity sequence satisfy a preset rule according to at least one of the entity categories, the entity types, and the association relationships among the entities in the entity sequence, wherein the entity type indicates whether the entity is of a numerical type or a character string type;
if the judgment result is that the preset rule is met, combining the at least two entities to obtain an entity combination;
displaying the entity combination as a data filtering condition included by the data analysis requirement;
wherein the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data;
the at least two entities comprise a first entity, a second entity, and a third entity arranged in sequence; the preset rules include:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates that the entity is of a numerical type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that the entity is of a numerical type;
or, the at least two entities comprise a fourth entity, a second entity, and a fifth entity arranged in sequence; the preset rules include:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates that the entity is of a character string type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value of the dimension corresponding to the fourth entity;
or, the at least two entities comprise a sixth entity, a seventh entity, and an eighth entity arranged in sequence; the preset rules include:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates that the entity is of a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is a "greater than", "less than", "greater than or equal to", or "less than or equal to" logical operator;
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that the entity is of a numerical type;
or, the at least two entities comprise a ninth entity and a tenth entity arranged in sequence; the preset rules include:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
2. The method of claim 1, wherein displaying the entity combination comprises:
highlighting, in an input box, the portion of the natural language text corresponding to the entity combination; or,
displaying the entity combination in a prompt box outside the input box, and indicating that the entity combination is one data filtering condition.
3. The method of claim 1, wherein the method further comprises:
constructing a data query script according to the data filtering condition, wherein the data query script is used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
4. An entity presentation apparatus in data analysis, the apparatus comprising:
an acquiring unit, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement on target data;
a determining unit, configured to determine whether at least two adjacent entities in the entity sequence acquired by the acquiring unit satisfy a preset rule according to at least one of the entity categories, the entity types, and the association relationships among the entities in the entity sequence, wherein the entity type indicates whether the entity is of a numerical type or a character string type;
a combining unit, configured to combine the at least two entities to obtain an entity combination if the determination result of the determining unit is that the preset rule is satisfied;
a display unit, configured to display the entity combination obtained by the combining unit as a data filtering condition included in the data analysis requirement;
wherein the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data;
the at least two entities comprise a first entity, a second entity, and a third entity arranged in sequence; the preset rules include:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates that the entity is of a numerical type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that the entity is of a numerical type;
or, the at least two entities comprise a fourth entity, a second entity, and a fifth entity arranged in sequence; the preset rules include:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates that the entity is of a character string type;
the entity category of the second entity is the operator category, and the second entity is an "equal to" or "not equal to" logical operator;
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value of the dimension corresponding to the fourth entity;
or, the at least two entities comprise a sixth entity, a seventh entity, and an eighth entity arranged in sequence; the preset rules include:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates that the entity is of a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is a "greater than", "less than", "greater than or equal to", or "less than or equal to" logical operator;
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that the entity is of a numerical type;
or, the at least two entities comprise a ninth entity and a tenth entity arranged in sequence; the preset rules include:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
5. The apparatus according to claim 4, wherein the display unit is specifically configured to highlight, in an input box, the portion of the natural language text corresponding to the entity combination; or to display the entity combination in a prompt box outside the input box and indicate that the entity combination is one data filtering condition.
6. The apparatus of claim 4, wherein the apparatus further comprises:
a construction unit, configured to construct a data query script according to the data filtering condition displayed by the display unit, wherein the data query script is used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
7. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-3.
8. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210952243.XA CN115345157A (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
CN202210135204.0A CN114218935B (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210135204.0A CN114218935B (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210952243.XA Division CN115345157A (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114218935A (en) | 2022-03-22 |
CN114218935B (en) | 2022-06-21 |
Family
ID=80709266
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210952243.XA Pending CN115345157A (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
CN202210135204.0A Active CN114218935B (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210952243.XA Pending CN115345157A (en) | 2022-02-15 | 2022-02-15 | Entity display method and device in data analysis |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN115345157A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491373A (en) * | 2018-02-01 | 2018-09-04 | 北京百度网讯科技有限公司 | A kind of entity recognition method and system |
CN110955752A (en) * | 2019-11-25 | 2020-04-03 | 三角兽(北京)科技有限公司 | Information display method and device, electronic equipment and computer storage medium |
CN111091883A (en) * | 2019-12-16 | 2020-05-01 | 东软集团股份有限公司 | Medical text processing method and device, storage medium and equipment |
CN112001188A (en) * | 2020-10-30 | 2020-11-27 | 北京智源人工智能研究院 | Method and device for rapidly realizing NL2SQL based on vectorization semantic rule |
CN113657113A (en) * | 2021-08-24 | 2021-11-16 | 北京字跳网络技术有限公司 | Text processing method and device and electronic equipment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8751505B2 (en) * | 2012-03-11 | 2014-06-10 | International Business Machines Corporation | Indexing and searching entity-relationship data |
US20140278983A1 (en) * | 2013-03-15 | 2014-09-18 | Microsoft Corporation | Using entity repository to enhance advertisement display |
CN106033466A (en) * | 2015-03-20 | 2016-10-19 | 华为技术有限公司 | Database query method and device |
US20180210883A1 (en) * | 2017-01-25 | 2018-07-26 | Dony Ang | System for converting natural language questions into sql-semantic queries based on a dimensional model |
CN111310469A (en) * | 2020-01-16 | 2020-06-19 | 北京明略软件系统有限公司 | Method and device for searching invisible relationship between entities, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115345157A (en) | 2022-11-15 |
CN114218935A (en) | 2022-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110543517B (en) | Method, device and medium for realizing complex query of mass data based on elastic search | |
CN109766497B (en) | Ranking list generation method and device, storage medium and electronic equipment | |
TWI643076B (en) | Financial analysis system and method for unstructured text data | |
CN101398758B (en) | Detection method of code copy | |
US20050183002A1 (en) | Data and metadata linking form mechanism and method | |
Van der Aa et al. | Checking process compliance against natural language specifications using behavioral spaces | |
KR20190076047A (en) | System and method for determining relationships between data elements | |
CN115061721A (en) | Report generation method and device, computer equipment and storage medium | |
CN109241075B (en) | Index basic data processing method and equipment and computer readable storage medium | |
CN109101541B (en) | Newly added index management method, device and computer readable storage medium | |
JP7015319B2 (en) | Data analysis support device, data analysis support method and data analysis support program | |
EP1745390A2 (en) | Data and metadata linking form mechanism and method | |
CN112966482A (en) | Report generation method, device and equipment | |
CN114218935B (en) | Entity display method and device in data analysis | |
JP7015320B2 (en) | Data analysis support device, data analysis support method and data analysis support program | |
Scaffidi et al. | Intelligently creating and recommending reusable reformatting rules | |
CN114090620B (en) | Query request processing method and device | |
CN111143398B (en) | Extra-large set query method and device based on extended SQL function | |
JP5020274B2 (en) | Semantic drift occurrence evaluation method and apparatus | |
CN114090627B (en) | Data query method and device | |
CN116127053B (en) | Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices | |
JP5324500B2 (en) | File sharing device | |
US11315590B2 (en) | Voice and graphical user interface | |
CN116126918A (en) | Data generation method, information screening method, device and medium | |
CN114610791A (en) | Data blood relationship analysis method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||