CN114218935B - Entity display method and device in data analysis - Google Patents


Info

Publication number
CN114218935B
Authority
CN
China
Prior art keywords
entity
type
category
dimension
entities
Prior art date
Legal status
Active
Application number
CN202210135204.0A
Other languages
Chinese (zh)
Other versions
CN114218935A (en)
Inventor
黄亚东
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210952243.XA (CN115345157A)
Priority to CN202210135204.0A (CN114218935B)
Publication of CN114218935A
Application granted
Publication of CN114218935B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

An embodiment of the specification provides an entity display method and device in data analysis. The method comprises the following steps: acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data; determining, according to at least one of the entity categories, the entity types, and the association relationships among the entities in the entity sequence, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is a numeric value or a character string; if the preset rule is satisfied, combining the at least two entities into an entity combination; and displaying the entity combination as a data filtering condition included in the data analysis requirement. In this way, the entity display process in data analysis can reflect the associations between entities.

Description

Entity display method and device in data analysis
Technical Field
One or more embodiments of the present disclosure relate to the field of computers, and more particularly, to a method and apparatus for entity display in data analysis.
Background
Currently, users' data analysis requirements are flexible and numerous. To fulfil a data analysis requirement, a professional must convert it into a query statement that a computer can understand, for example a Structured Query Language (SQL) statement; the computer then performs the corresponding data analysis on a database by executing the SQL statement.
Because the number of professionals is limited, the data analysis requirements of the many non-professional users usually have to be converted into SQL statements by professionals. This process often involves long waits and cannot meet data analysis requirements quickly. It is therefore desirable for a computer to receive natural language text in which users express their data analysis requirements, perform entity recognition on that text, and understand the requirements on the basis of the recognized entities.
Among the recognized entities, some are logically related to one another and others are not. How to reflect the associations between entities when the entities are displayed in data analysis is therefore a problem that urgently needs to be solved.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for entity presentation in data analysis, which can reflect the associations between entities when entities are presented in data analysis.
In a first aspect, a method for displaying an entity in data analysis is provided, and the method includes:
acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement on target data;
determining, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, whether at least two adjacent entities in the entity sequence satisfy a preset rule, wherein the entity type indicates whether the entity is a numeric value or a character string;
if the determination result is that the preset rule is satisfied, combining the at least two entities to obtain an entity combination;
and displaying the entity combination as a data filtering condition included in the data analysis requirement.
In one possible implementation, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
Further, the at least two entities comprise a first entity, a second entity and a third entity arranged in sequence, and the preset rule includes:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates that it is a numeric value;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that it is a numeric value.
Further, the at least two entities comprise a fourth entity, a second entity and a fifth entity arranged in sequence, and the preset rule includes:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates that it is a character string;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the fourth entity.
Further, the at least two entities comprise a sixth entity, a seventh entity and an eighth entity arranged in sequence, and the preset rule includes:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates that it is a numeric value;
the entity category of the seventh entity is the operator category, and the seventh entity is the logical operator "greater than", "less than", "greater than or equal to" or "less than or equal to";
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that it is a numeric value.
Further, the at least two entities comprise a ninth entity and a tenth entity arranged in sequence, and the preset rule includes:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
In one possible embodiment, presenting the entity combination includes:
highlighting the part of the natural language text that corresponds to the entity combination in an input box; or
displaying the entity combination in a prompt box outside the input box, presented as one data filtering condition.
In one possible embodiment, the method further comprises:
constructing a data query script according to the data filtering condition, wherein the data query script is used to perform a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
In a second aspect, an entity display apparatus in data analysis is provided, the apparatus including:
an acquisition unit, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, wherein the natural language text expresses the user's data analysis requirement on target data;
a determining unit, configured to determine, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence acquired by the acquisition unit, whether at least two adjacent entities in the entity sequence satisfy a preset rule, wherein the entity type indicates whether the entity is a numeric value or a character string;
a combination unit, configured to combine the at least two entities to obtain an entity combination if the determination result of the determining unit is that the preset rule is satisfied;
and a display unit, configured to display the entity combination obtained by the combination unit as a data filtering condition included in the data analysis requirement.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
According to the method and apparatus provided by the embodiments of this specification, an entity sequence obtained by performing entity recognition on a natural language text input by a user is first acquired, where the natural language text expresses the user's data analysis requirement on target data. Then, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, it is determined whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether the entity is a numeric value or a character string. If the preset rule is satisfied, the at least two entities are combined into an entity combination. Finally, the entity combination is displayed as a data filtering condition included in the data analysis requirement. Because entities are grouped by preset rules, no model training is required, accuracy is high, and cold start is fast; and by displaying entity combinations and mapping them to data filtering conditions, the entity display process in data analysis reflects the associations between entities.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a schematic diagram illustrating an implementation scenario of another embodiment disclosed in the present specification;
FIG. 3 illustrates a flow diagram of a method for entity presentation in data analysis, according to one embodiment;
FIG. 4 shows a schematic representation of a combination of entities according to one embodiment;
FIG. 5 illustrates a process diagram for building a data query script, according to one embodiment;
FIG. 6 shows a schematic block diagram of an entity display apparatus in data analysis, according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. The scenario involves the presentation of entities in data analysis. Data analysis means analyzing large amounts of collected data with suitable statistical methods and summarizing, understanding and digesting the data so as to make the most of it; it is the process of studying and summarizing data in detail in order to extract useful information and form conclusions. The embodiments of this specification do not limit how the data to be analyzed is stored: it may be stored in a database, or in other ways such as an Excel spreadsheet. A database includes multiple data tables; each data table includes multiple fields, each field corresponds to a column, and each field has a field name and a column of field values. Referring to Fig. 1, to meet users' data analysis requirements quickly, the embodiments propose enabling a computer to receive natural language text input by a user, perform entity recognition on it, and understand the data analysis requirement on the basis of the recognized entities. The recognized entities are displayed in groups so that the user can clearly see how they are associated, with each displayed entity combination corresponding to one data filtering condition.
For example, the user inputs the natural language text "the total purchase amount of users whose age is greater than 20 and whose purchase time is between June 10 and June 12". This text expresses the user's data analysis requirement on the target data and contains two data filtering conditions: "the user's age is greater than 20" and "the user's purchase time is between June 10 and June 12". Each data filtering condition corresponds to an entity combination consisting of several entities, and the entities in the same combination are logically associated with one another.
Entity recognition identifies entities with specific meanings in a natural language text, such as time entities, and converts the character sequence into an entity sequence. In the embodiments of this specification, an entity can be understood as a word, and each entity has a corresponding entity category. Entity categories may include, but are not limited to, a time category, an operator category, a dimension category, a dimension value category, and the like. The dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
Presenting entity combinations can also be called syntactic structuring: in data analysis, the words in a user's natural language text that can be logically combined are grouped together by function. For example, in "user details with yesterday's payment amount >10", the words "payment amount", ">" and "10" can be grouped together.
In data analysis, the embodiments of this specification display, in groups, the entities obtained by performing entity recognition on the natural language text input by the user, and provide a concrete solution that allows the entity display process to reflect the associations between entities.
Fig. 2 is a schematic view of an implementation scenario of another embodiment disclosed in this specification. The scenario involves the presentation of entities in data analysis. Referring to Fig. 2, the user enters the natural language text "top ten payment amounts in Beijing in the last thirty days", and the target database includes the field names user, city, amt and time. The user field includes two values, 001 and 002; the city field includes two values, Beijing city and Hangzhou city; the amt field includes two values, 20 and 10; and the time field includes two values, 20200521 and 20200522. Databases typically store large amounts of data; the figure shows only part of the target database for illustration. Entity recognition yields 4 entities: "0501-0530", whose category is Time, i.e., the time category; "Beijing", whose category is Col_Value, i.e., the dimension value category, with city as the corresponding field name; "payment amount", whose category is Measure, with amt as the corresponding field name; and "Top(10, desc)", whose category is Intent, i.e., the intent category, denoting the top 10 in descending order. In the embodiments of this specification, every entity recognized from the natural language text has its own category; these categories help express the data analysis requirement and can be used to display the entities in groups.
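A Col_Value entity such as "Beijing" is tied to its field name (city) through the stored data. The patent does not say how this link is computed; one simple possibility, sketched below purely as an assumption, is a reverse index from field values to field names built from the sample values in Fig. 2. The names `SAMPLE_TABLE`, `build_value_index` and `fields_for_value` are hypothetical, and a real system would also need fuzzy matching, since the text says "Beijing" while the stored value is "Beijing city".

```python
from collections import defaultdict
from typing import Dict, List, Set

# Field values shown in Fig. 2, stored column by column (hypothetical layout).
SAMPLE_TABLE: Dict[str, list] = {
    "user": ["001", "002"],
    "city": ["Beijing city", "Hangzhou city"],
    "amt": [20, 10],
    "time": ["20200521", "20200522"],
}


def build_value_index(table: Dict[str, list]) -> Dict[str, Set[str]]:
    """Map each distinct field value (as text) to the field names it appears under."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for field, values in table.items():
        for value in values:
            index[str(value)].add(field)
    return dict(index)


def fields_for_value(index: Dict[str, Set[str]], text: str) -> List[str]:
    """Return the sorted field names whose values include `text`; empty if unknown."""
    return sorted(index.get(text, set()))


index = build_value_index(SAMPLE_TABLE)
print(fields_for_value(index, "Beijing city"))  # ['city']
```

A set is kept per value because the same literal (for example a number) can legitimately occur under several fields, in which case the association is ambiguous and would need further context.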
Fig. 3 shows a flowchart of an entity presentation method in data analysis according to an embodiment; the method may be based on the implementation scenario shown in Fig. 1 or Fig. 2. As shown in Fig. 3, the method includes the following steps: step 31, acquire an entity sequence obtained by performing entity recognition on the natural language text input by the user, where the natural language text expresses the user's data analysis requirement on target data; step 32, determine, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether the entity is a numeric value or a character string; step 33, if the preset rule is satisfied, combine the at least two entities to obtain an entity combination; and step 34, display the entity combination as a data filtering condition included in the data analysis requirement. The specific execution of each step is described below.
First, in step 31, an entity sequence obtained by performing entity recognition on the natural language text input by the user is acquired, where the natural language text expresses the user's data analysis requirement on target data. The target data may be stored in any manner. When it is stored in a database, different databases generally have different field names and field values, so the data analysis requirements they face differ accordingly. For example, a first database may have the field names name, age, ID number and education level, while a second database may have the field names user number and transaction amount; because the fields differ, the two databases are typically subject to different data analysis requirements.
In one example, the data analysis requirements include querying a first range of the target data and performing a first manner of statistical analysis on the first range of the target data.
By determining the one or more data filtering conditions included in the data analysis requirement, a small range of data to be analyzed can be selected from a large range of stored data. For example, if the target data is stored in a target database that includes multiple data tables, each with multiple fields, at least one data table can be selected, and data of at least one field can be selected from each selected table for analysis. In addition, there are various statistical analysis methods, such as sorting, summing and averaging, and the data analysis requirement determines which one or more of them to apply.
The embodiments of this specification do not specifically limit how entity recognition is performed. The result of entity recognition includes an entity sequence formed by multiple entities together with the entity category of each entity; the entities in the sequence are ordered as they appear in the natural language text.
Further, the words "is" and "equal" in the natural language text can each be recognized as the entity "=", and the words "not", "not equal" and "except" can each be recognized as the entity "!=".
Then, in step 32, it is determined, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, whether at least two adjacent entities satisfy a preset rule, where the entity type indicates whether the entity is a numeric value or a character string. The target data of a data analysis generally has a specific data structure and storage manner. For example, when the target data is stored in a database, the association relationship between entities may be one embodied by that data structure, such as the relationship between the field name and the field values of the same field.
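The narrowing described above (apply the filtering conditions, then a statistical method) can be sketched in memory. The records, field names and the year 2022 below are hypothetical; each filtering condition from the earlier example (age greater than 20, purchase time between June 10 and June 12) is modelled as a predicate, and summing stands in for the statistical analysis.

```python
from datetime import date
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical purchase records; field names and the year are illustrative only.
ROWS: List[Row] = [
    {"user": "001", "age": 25, "purchase_date": date(2022, 6, 11), "amount": 30.0},
    {"user": "002", "age": 18, "purchase_date": date(2022, 6, 11), "amount": 50.0},
    {"user": "003", "age": 40, "purchase_date": date(2022, 6, 15), "amount": 70.0},
    {"user": "004", "age": 33, "purchase_date": date(2022, 6, 12), "amount": 5.0},
]


def total_amount(rows: List[Row], conditions: List[Callable[[Row], bool]]) -> float:
    """Keep the rows that satisfy every filtering condition, then sum their amounts."""
    selected = [row for row in rows if all(cond(row) for cond in conditions)]
    return sum(row["amount"] for row in selected)


# One predicate per data filtering condition (the range bounds are inclusive).
conditions = [
    lambda r: r["age"] > 20,
    lambda r: date(2022, 6, 10) <= r["purchase_date"] <= date(2022, 6, 12),
]
print(total_amount(ROWS, conditions))  # 35.0
```

Only users 001 and 004 pass both conditions here; in a database setting the same conditions would become a WHERE clause instead of Python predicates.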
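The operator recognition just described can be pictured as a lookup from surface phrases to canonical operators. The table below is a hypothetical illustration covering only the words named above; a real recognizer would work on segmented text (for Chinese input, the original-language phrases) and would try longer phrases such as "not equal" before shorter ones such as "not".

```python
from typing import Optional

# Hypothetical surface-form table covering only the phrases named in the text.
OPERATOR_PHRASES = {
    "is": "=",
    "equal": "=",
    "not": "!=",
    "not equal": "!=",
    "except": "!=",
}


def normalize_operator(phrase: str) -> Optional[str]:
    """Return the canonical operator for `phrase`, or None if it is not an operator word."""
    # Lower-case and collapse internal whitespace before the lookup.
    return OPERATOR_PHRASES.get(" ".join(phrase.lower().split()))


print(normalize_operator("Not  equal"))  # !=
```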
In one example, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
Further, the at least two entities comprise a first entity, a second entity and a third entity arranged in sequence, and the preset rule includes:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates that it is a numeric value;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates that it is a numeric value.
In this example, three adjacent entities are checked as a group against the preset rule, with the second entity located between the first entity and the third entity; the rule involves the entity categories and the entity types. For example, if the first entity is "age", the second entity is "equal to", and the third entity is "20", this group of entities satisfies the preset rule.
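This first rule can be expressed as a predicate over three adjacent entities. It is a minimal sketch: the `Entity` class, the category labels ("dimension", "operator", "dimension_value") and the value-type labels ("number", "string") are assumptions introduced for illustration, and operators are assumed to be already normalized to "=" and "!=".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str                          # surface form, e.g. "age" or "="
    category: str                      # assumed labels: "dimension", "operator", "dimension_value"
    value_type: Optional[str] = None   # "number" or "string"; None for operators


EQUALITY_OPS = {"=", "!="}


def matches_rule_1(entities: Sequence[Entity]) -> bool:
    """Rule 1: numeric dimension, then = or !=, then numeric dimension value."""
    if len(entities) != 3:
        return False
    dim, op, val = entities
    return (dim.category == "dimension" and dim.value_type == "number"
            and op.category == "operator" and op.text in EQUALITY_OPS
            and val.category == "dimension_value" and val.value_type == "number")


age = Entity("age", "dimension", "number")
equals = Entity("=", "operator")
twenty = Entity("20", "dimension_value", "number")
print(matches_rule_1([age, equals, twenty]))  # True
```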
Further, the at least two entities comprise a fourth entity, a second entity and a fifth entity arranged in sequence, and the preset rule includes:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates that it is a character string;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the fourth entity.
In this example, three adjacent entities are checked as a group against the preset rule, with the second entity located between the fourth entity and the fifth entity; the rule involves the entity categories and the association relationship between the entities. For example, if the fourth entity is "city", the second entity is "equal to", and the fifth entity is "Shanghai", this group of entities satisfies the preset rule.
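For the string-valued case, the check on value types is replaced by the association relationship: the value must belong to the named dimension. The sketch below assumes, hypothetically, that each entity carries a `field` attribute (the field name for a dimension, the owning field for a dimension value), filled in for example from a value-to-field index; the labels are the same illustrative ones as before.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                      # assumed labels: "dimension", "operator", "dimension_value"
    value_type: Optional[str] = None   # "number" or "string"
    field: Optional[str] = None        # dimension: its field name; value: the field it belongs to


def matches_rule_2(entities: Sequence[Entity]) -> bool:
    """Rule 2: string dimension, then = or !=, then a value of that same dimension."""
    if len(entities) != 3:
        return False
    dim, op, val = entities
    return (dim.category == "dimension" and dim.value_type == "string"
            and op.category == "operator" and op.text in {"=", "!="}
            and val.category == "dimension_value"
            and val.field is not None and val.field == dim.field)


city = Entity("city", "dimension", "string", field="city")
equals = Entity("=", "operator")
shanghai = Entity("Shanghai", "dimension_value", "string", field="city")
male = Entity("male", "dimension_value", "string", field="gender")
print(matches_rule_2([city, equals, shanghai]))  # True
print(matches_rule_2([city, equals, male]))      # False
```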
Further, the at least two entities comprise a sixth entity, a seventh entity and an eighth entity arranged in sequence, and the preset rule includes:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates that it is a numeric value;
the entity category of the seventh entity is the operator category, and the seventh entity is the logical operator "greater than", "less than", "greater than or equal to" or "less than or equal to";
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates that it is a numeric value.
In this example, three adjacent entities are checked as a group against the preset rule, with the seventh entity located between the sixth entity and the eighth entity; the rule involves the entity categories and the entity types. For example, if the sixth entity is "age", the seventh entity is "less than", and the eighth entity is "20", this group of entities satisfies the preset rule.
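This third rule has the same shape as the first and differs only in the allowed operators. The sketch below uses the same hypothetical `Entity` class and labels as the earlier illustrations and assumes the comparison words have been normalized to ">", "<", ">=" and "<=".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                      # assumed labels: "dimension", "operator", "dimension_value"
    value_type: Optional[str] = None   # "number" or "string"; None for operators


COMPARISON_OPS = {">", "<", ">=", "<="}


def matches_rule_3(entities: Sequence[Entity]) -> bool:
    """Rule 3: numeric dimension, then a comparison operator, then numeric dimension value."""
    if len(entities) != 3:
        return False
    dim, op, val = entities
    return (dim.category == "dimension" and dim.value_type == "number"
            and op.category == "operator" and op.text in COMPARISON_OPS
            and val.category == "dimension_value" and val.value_type == "number")


age = Entity("age", "dimension", "number")
less_than = Entity("<", "operator")
twenty = Entity("20", "dimension_value", "number")
print(matches_rule_3([age, less_than, twenty]))  # True
```

Because only the operator set differs, an implementation could merge rules 1 and 3 into one check over the union of the two operator sets.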
Further, the at least two entities comprise a ninth entity and a tenth entity arranged in sequence, and the preset rule includes:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
In this example, two adjacent entities are checked as a group against the preset rule; the rule involves the entity categories and the association relationship between the entities. For example, if the ninth entity is "Beijing" and the tenth entity is "Shanghai", both are dimension values of the city dimension, so this group of entities satisfies the preset rule.
Note that the number of adjacent entities checked against a preset rule is not limited to two or three and may be larger. For example, four adjacent entities arranged in sequence, such as Beijing, Shanghai, Nanjing and Guangzhou, all belong to the dimension value category and are dimension values of the same dimension, so they may be grouped together, and the group conforms to the preset rule.
In the embodiments of this specification, multiple parallel preset rules may be defined, and a group is considered to satisfy the preset rule as long as it satisfies any one of them. When several runs of adjacent entities satisfy different preset rules, the runs may also be regarded as one group that satisfies the preset rule. For example, given entity 1, entity 2, entity 3, entity 4 and entity 5 arranged in sequence, where entities 1 to 3 satisfy rule A and entities 4 and 5 satisfy rule B, entities 1 to 5 may together be considered to satisfy the preset rule.
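The same-dimension rule, including its longer form with four or more values, can be written as one predicate over a run of two or more entities. As in the other sketches, the `Entity` class, its `field` attribute (the dimension a value belongs to) and the category labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                 # assumed labels, e.g. "dimension_value"
    field: Optional[str] = None   # for dimension values: the dimension (field) they belong to


def matches_rule_4(entities: Sequence[Entity]) -> bool:
    """Two or more adjacent dimension values that all belong to the same dimension."""
    if len(entities) < 2:
        return False
    if any(e.category != "dimension_value" or e.field is None for e in entities):
        return False
    return len({e.field for e in entities}) == 1


cities = [Entity(name, "dimension_value", field="city")
          for name in ("Beijing", "Shanghai", "Nanjing", "Guangzhou")]
print(matches_rule_4(cities[:2]))  # True
print(matches_rule_4(cities))      # True
```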
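Putting the rules together, one possible way to scan an entity sequence is a greedy left-to-right pass that tries the three-entity rules first, then a run of same-dimension values, and merges matches that touch each other. This is a sketch under assumptions: the scan order, the greedy strategy, the `merge_adjacent` switch (the text says adjacent matches "may" be regarded as one group) and all names are illustrative, not the patented procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                      # assumed labels: "dimension", "operator", "dimension_value"
    value_type: Optional[str] = None   # "number" or "string"
    field: Optional[str] = None        # field name (dimension) or owning field (dimension value)


def triple_rule(dim: Entity, op: Entity, val: Entity) -> bool:
    """Rules 1-3 folded into one check: dimension, operator, dimension value."""
    if (dim.category, op.category, val.category) != ("dimension", "operator", "dimension_value"):
        return False
    if dim.value_type == "number":   # rules 1 and 3
        return val.value_type == "number" and op.text in {"=", "!=", ">", "<", ">=", "<="}
    if dim.value_type == "string":   # rule 2
        return op.text in {"=", "!="} and val.field is not None and val.field == dim.field
    return False


def same_field_run(seq: List[Entity], start: int) -> int:
    """Length of the run of dimension values from `start` sharing one field (rule 4)."""
    first = seq[start]
    if first.category != "dimension_value" or first.field is None:
        return 0
    end = start + 1
    while end < len(seq) and seq[end].category == "dimension_value" and seq[end].field == first.field:
        end += 1
    return end - start


def group_entities(seq: List[Entity], merge_adjacent: bool = True) -> List[List[Entity]]:
    """Greedy scan collecting rule matches; optionally merge matches that touch."""
    groups: List[List[Entity]] = []
    i, prev_end = 0, -1
    while i < len(seq):
        if i + 2 < len(seq) and triple_rule(seq[i], seq[i + 1], seq[i + 2]):
            span = 3
        else:
            run = same_field_run(seq, i)
            span = run if run >= 2 else 0
        if span == 0:            # no rule starts here: move on by one entity
            i += 1
            continue
        matched = list(seq[i:i + span])
        if merge_adjacent and groups and prev_end == i:
            groups[-1].extend(matched)
        else:
            groups.append(matched)
        i = prev_end = i + span
    return groups


city = Entity("city", "dimension", "string", field="city")
eq, neq = Entity("=", "operator"), Entity("!=", "operator")
shanghai = Entity("Shanghai", "dimension_value", "string", field="city")
male = Entity("male", "dimension_value", "string", field="gender")
seq = [city, eq, shanghai, neq, male]
print([[e.text for e in g] for g in group_entities(seq)])  # [['city', '=', 'Shanghai']]
```

A greedy scan is simple and deterministic but can miss alternatives: in "city = Beijing Shanghai" it takes the triple and leaves "Shanghai" ungrouped.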
Next, in step 33, if the determination result is that the preset rule is satisfied, the at least two entities are combined to obtain an entity combination. Conversely, if the preset rule is not satisfied, the at least two entities are not combined.
For example, if the natural language text input by the user is "sales amount of users whose city is equal to Shanghai and who are not male", then "city", "equal to" and "Shanghai" can form an entity combination, whereas "Shanghai", "not" and "male" cannot.
Finally, in step 34, the entity combination is displayed as a data filtering condition included in the data analysis requirement. If entities outside the entity combination are displayed together with it, the entity combination should be displayed in a way that distinguishes it from those other entities.
In one example, presenting the entity combination includes:
highlighting the part of the natural language text that corresponds to the entity combination in an input box; or
displaying the entity combination in a prompt box outside the input box, presented as one data filtering condition.
FIG. 4 shows a presentation diagram of entity combinations according to one embodiment. Referring to Fig. 4, the part of the natural language text corresponding to each entity combination is highlighted in the input box. The user inputs "total purchase amount of users whose age is greater than 20 and whose purchase time is between June 10 and June 12"; "age is greater than 20" and "purchase time is between June 10 and June 12" are each underlined to indicate that the underlined parts correspond to entity combinations. Highlighting is not limited to underlining; for example, the entity combination may be enclosed in a rectangular frame, or a wavy line may be added beneath it.
By displaying the entity combination as a data filtering condition included in the data analysis requirement, the embodiments of this specification inform the user that the data filtering condition has been recognized and will be applied to filter the target data.
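The underlining in Fig. 4 can be imitated in plain text by printing a marker line beneath the character span of each entity combination. This console-style sketch is only an illustration: `underline_spans` is a hypothetical helper, the spans are assumed to come from the recognizer's character offsets, and it assumes one display column per character, which does not hold for full-width Chinese characters in most terminals.

```python
from typing import List, Tuple


def underline_spans(text: str, spans: List[Tuple[int, int]], mark: str = "~") -> str:
    """Return `text` followed by a line that puts `mark` under each [start, end) span."""
    marks = [" "] * len(text)
    for start, end in spans:
        for k in range(max(0, start), min(end, len(text))):
            marks[k] = mark
    return text + "\n" + "".join(marks).rstrip()


query = "users with age greater than 20 who bought between June 10 and June 12"
phrases = ["age greater than 20", "bought between June 10 and June 12"]
spans = [(query.index(p), query.index(p) + len(p)) for p in phrases]
print(underline_spans(query, spans))
```

A graphical input box would apply the same spans as styling (underline, frame or wavy line) rather than printing a second line.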
In one example, the method further comprises:
constructing a data query script according to the data filtering condition, where the data query script is used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
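As a rough illustration of this step, the sketch below turns recognized filter conditions into a parameterized SQL query and runs it with Python's built-in `sqlite3` module. The table name `sales`, its columns, the `SUM` aggregate, and the field whitelist are assumptions for the example; the specification does not fix a query language.

```python
import sqlite3

ALLOWED_OPS = {"=", "!=", ">", "<", ">=", "<="}


def build_query(table, measure, conditions, fields):
    """Build a parameterized SUM query from (field, operator, value) conditions.

    Field names and operators are checked against whitelists because they are
    spliced into the SQL text; values are passed separately as parameters.
    """
    if measure not in fields:
        raise ValueError(f"unknown measure: {measure}")
    clauses, params = [], []
    for field, op, value in conditions:
        if field not in fields:
            raise ValueError(f"unknown field: {field}")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT SUM({measure}) FROM {table}{where}", params


def run_example() -> float:
    """Run the query for 'city = Shanghai' and 'gender != male' on toy data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, gender TEXT, age INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [("Shanghai", "female", 25, 100.0),
         ("Shanghai", "male", 30, 50.0),
         ("Beijing", "female", 22, 70.0)],
    )
    sql, params = build_query(
        "sales", "amount",
        [("city", "=", "Shanghai"), ("gender", "!=", "male")],
        fields={"city", "gender", "age", "amount"},
    )
    total = conn.execute(sql, params).fetchone()[0]
    conn.close()
    return total
```

Only the first toy row matches both conditions, so `run_example()` returns `100.0`. Keeping values out of the SQL string is a design choice that avoids injection through user-typed dimension values.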
FIG. 5 illustrates a process of building a data query script according to one embodiment. Referring to FIG. 5, entity recognition is first performed on the natural language text input by the user to obtain an entity sequence; entity combinations corresponding to data filtering conditions are then determined from the entity sequence; and through core steps such as syntactic analysis, semantic analysis, and query script conversion, the natural language is translated step by step, in a controllable and interpretable way, into a data query script. This allows people without data engineering skills to retrieve and analyze data by themselves and obtain timely and accurate data analysis results.
With the method provided by the embodiments of this specification, an entity sequence obtained by performing entity recognition on a natural language text input by a user is first acquired, where the natural language text expresses the user's data analysis requirement on target data. Then, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, it is determined whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is of a numerical type or a character string type. If the preset rule is satisfied, the at least two entities are combined to obtain an entity combination. Finally, the entity combination is displayed as a data filtering condition included in the data analysis requirement. As can be seen, in the embodiments of this specification, entities are grouped based on preset rules, so no model training is required, accuracy is high, and cold start is fast; and by displaying entity combinations as the corresponding data filtering conditions, the relationships between entities are reflected when entities are displayed during data analysis.
According to an embodiment of another aspect, an entity presentation apparatus in data analysis is further provided, configured to perform the method provided by the embodiments of this specification. FIG. 6 shows a schematic block diagram of an entity presentation apparatus in data analysis according to one embodiment. As shown in FIG. 6, the apparatus 600 includes:
an acquiring unit 61, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data;
a determining unit 62, configured to determine, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence acquired by the acquiring unit 61, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is of a numerical type or a character string type;
a combining unit 63, configured to combine the at least two entities to obtain an entity combination if the determining unit 62 determines that the preset rule is satisfied; and
a presentation unit 64, configured to display the entity combination obtained by the combining unit 63 as a data filtering condition included in the data analysis requirement.
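One way to read units 61 to 64 is as four stages of a single pipeline. The sketch below is hypothetical: the recognizer and rules are injected as plain callables, and the greedy, left-to-right, non-overlapping scan (trying a three-entity window before a two-entity one) is an assumption, since the specification does not say how overlapping matches are resolved.

```python
from typing import Callable, Sequence

Rule = Callable[[Sequence], bool]


class EntityPresentationApparatus:
    """Hypothetical mapping of units 61-64 onto methods of one class."""

    def __init__(self, recognize: Callable[[str], list], rules: list[Rule],
                 window_sizes: tuple[int, ...] = (3, 2)):
        self.recognize = recognize
        self.rules = rules
        self.window_sizes = window_sizes

    def acquire(self, text: str) -> list:           # acquiring unit 61
        return self.recognize(text)

    def satisfies_rule(self, window: Sequence) -> bool:  # determining unit 62
        return any(rule(window) for rule in self.rules)

    def combine(self, entities: list) -> list[list]:     # combining unit 63
        combos, i = [], 0
        while i < len(entities):
            for size in self.window_sizes:
                window = entities[i:i + size]
                if len(window) == size and self.satisfies_rule(window):
                    combos.append(list(window))
                    i += size  # consume the matched entities
                    break
            else:
                i += 1  # no rule matched here; move to the next entity
        return combos

    def present(self, combos: list[list]) -> list[str]:  # presentation unit 64
        return [" ".join(map(str, c)) for c in combos]

    def run(self, text: str) -> list[str]:
        return self.present(self.combine(self.acquire(text)))
```

With whitespace splitting as a stand-in recognizer and a toy rule accepting any three-token window whose middle token is `=` or `>`, `run("age > 20 and city = Shanghai")` returns `["age > 20", "city = Shanghai"]`.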
Optionally, as an embodiment, the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data.
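The three entity categories and the two entity types can be modeled as small enums. In this sketch a hypothetical lexicon (`SCHEMA`, `VALUES`, `OPERATORS`) stands in for real entity recognition; for string values it also records which field the value belongs to, which is the association relationship the rules below rely on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    OPERATOR = "operator"
    DIMENSION = "dimension"              # a field name in the target data
    DIMENSION_VALUE = "dimension_value"  # a concrete value of some field


class ValueType(Enum):
    NUMBER = "number"
    STRING = "string"


@dataclass(frozen=True)
class Entity:
    text: str
    category: Category
    value_type: Optional[ValueType] = None
    dimension: Optional[str] = None  # owning field, for string dimension values


# Hypothetical schema and value lexicon of the target data.
SCHEMA = {"age": ValueType.NUMBER, "city": ValueType.STRING, "gender": ValueType.STRING}
VALUES = {"city": {"Shanghai", "Beijing"}, "gender": {"male", "female"}}
OPERATORS = {"=", "!=", ">", "<", ">=", "<="}


def tag(token: str) -> Optional[Entity]:
    """Label one token by lexicon lookup; return None for non-entities."""
    if token in OPERATORS:
        return Entity(token, Category.OPERATOR)
    if token in SCHEMA:
        return Entity(token, Category.DIMENSION, SCHEMA[token])
    for field, values in VALUES.items():
        if token in values:
            return Entity(token, Category.DIMENSION_VALUE, ValueType.STRING, field)
    try:
        float(token)
    except ValueError:
        return None
    return Entity(token, Category.DIMENSION_VALUE, ValueType.NUMBER)
```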
Further, the at least two entities include a first entity, a second entity, and a third entity arranged in sequence, and the preset rule includes:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates a numerical type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates a numerical type.
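The first rule can be checked with a simple predicate over a three-entity window. A minimal sketch, assuming entities are labeled with the hypothetical strings `"dimension"`, `"operator"`, `"dimension_value"` and `"number"`/`"string"`, and that operators have been normalized to `=` and `!=`:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "number" or "string"
    dimension: Optional[str] = None   # owning field, for string dimension values


EQUALITY_OPERATORS = {"=", "!="}


def matches_numeric_equality(window: Sequence[Entity]) -> bool:
    """Rule 1: numeric dimension, then '=' or '!=', then numeric dimension value."""
    if len(window) != 3:
        return False
    first, second, third = window
    return (first.category == "dimension" and first.value_type == "number"
            and second.category == "operator" and second.text in EQUALITY_OPERATORS
            and third.category == "dimension_value" and third.value_type == "number")
```

So "age = 20" satisfies the rule, while "age > 20" does not (it falls under the comparison rule instead).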
Further, the at least two entities include a fourth entity, a second entity, and a fifth entity arranged in sequence, and the preset rule includes:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates a character string type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the fourth entity.
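The second rule differs from the first in that it checks the association relationship instead of a numeric type. In this sketch that relationship is assumed to be precomputed as a hypothetical `dimension` attribute on each string value (for example by looking the value up in the field's known values), so the check reduces to comparing that attribute with the dimension's name:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "number" or "string"
    dimension: Optional[str] = None   # owning field, for string dimension values


EQUALITY_OPERATORS = {"=", "!="}


def matches_string_equality(window: Sequence[Entity]) -> bool:
    """Rule 2: string dimension, then '=' or '!=', then a value of that same dimension."""
    if len(window) != 3:
        return False
    dim, op, value = window
    return (dim.category == "dimension" and dim.value_type == "string"
            and op.category == "operator" and op.text in EQUALITY_OPERATORS
            and value.category == "dimension_value" and value.dimension == dim.text)
```

"city = Shanghai" passes because Shanghai is a value of `city`; "gender != Shanghai" fails because the association does not hold.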
Further, the at least two entities include a sixth entity, a seventh entity, and an eighth entity arranged in sequence, and the preset rule includes:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is the logical operator "greater than", "less than", "greater than or equal to", or "less than or equal to";
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates a numerical type.
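The third rule has the same shape as the first but accepts only ordering comparisons. A minimal sketch under the same hypothetical labels, with operators assumed to be normalized to `>`, `<`, `>=` and `<=`:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "number" or "string"
    dimension: Optional[str] = None   # owning field, for string dimension values


COMPARISON_OPERATORS = {">", "<", ">=", "<="}


def matches_numeric_comparison(window: Sequence[Entity]) -> bool:
    """Rule 3: numeric dimension, then an ordering operator, then numeric dimension value."""
    if len(window) != 3:
        return False
    dim, op, value = window
    return (dim.category == "dimension" and dim.value_type == "number"
            and op.category == "operator" and op.text in COMPARISON_OPERATORS
            and value.category == "dimension_value" and value.value_type == "number")
```

Restricting these operators to numeric dimensions means a phrase such as "city > 20" is never shown as a filtering condition.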
Further, the at least two entities include a ninth entity and a tenth entity arranged in sequence, and the preset rule includes:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
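The fourth rule pairs two adjacent dimension values that belong to the same field. A minimal sketch under the same hypothetical labels; values without a recorded owning field (such as bare numbers here) are assumed never to match:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entity:
    text: str
    category: str                     # "dimension", "operator" or "dimension_value"
    value_type: Optional[str] = None  # "number" or "string"
    dimension: Optional[str] = None   # owning field, for string dimension values


def matches_same_dimension_values(window: Sequence[Entity]) -> bool:
    """Rule 4: two adjacent dimension values owned by the same, known dimension."""
    if len(window) != 2:
        return False
    a, b = window
    return (a.category == "dimension_value" and b.category == "dimension_value"
            and a.dimension is not None and a.dimension == b.dimension)
```

One plausible rendering of such a pair is an `IN` condition on the shared field, e.g. `city IN ('Shanghai', 'Beijing')`, but that mapping is an assumption rather than something stated here.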
Optionally, as an embodiment, the presentation unit 64 is specifically configured to highlight, in an input box, the part of the natural language text corresponding to the entity combination; or to display the entity combination in a prompt box outside the input box, presented as one data filtering condition.
Optionally, as an embodiment, the apparatus further includes:
a construction unit, configured to construct a data query script according to the data filtering condition displayed by the presentation unit 64, the data query script being used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
With the apparatus provided by the embodiments of this specification, the acquiring unit 61 first acquires an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data; the determining unit 62 then determines, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is of a numerical type or a character string type; next, if the preset rule is satisfied, the combining unit 63 combines the at least two entities to obtain an entity combination; finally, the presentation unit 64 displays the entity combination as a data filtering condition included in the data analysis requirement. As can be seen, in the embodiments of this specification, entities are grouped based on preset rules, so no model training is required, accuracy is high, and cold start is fast; and by displaying entity combinations as the corresponding data filtering conditions, the relationships between entities are reflected when entities are displayed during data analysis.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 3.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A method of entity presentation in data analysis, the method comprising:
acquiring an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data;
determining, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is of a numerical type or a character string type;
if the preset rule is satisfied, combining the at least two entities to obtain an entity combination; and
displaying the entity combination as a data filtering condition included in the data analysis requirement;
wherein the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data;
the at least two entities include a first entity, a second entity, and a third entity arranged in sequence, and the preset rule includes:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates a numerical type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates a numerical type;
or, the at least two entities include a fourth entity, a second entity, and a fifth entity arranged in sequence, and the preset rule includes:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates a character string type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the fourth entity;
or, the at least two entities include a sixth entity, a seventh entity, and an eighth entity arranged in sequence, and the preset rule includes:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is the logical operator "greater than", "less than", "greater than or equal to", or "less than or equal to";
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates a numerical type;
or, the at least two entities include a ninth entity and a tenth entity arranged in sequence, and the preset rule includes:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
2. The method according to claim 1, wherein presenting the entity combination comprises:
highlighting, in an input box, the part of the natural language text corresponding to the entity combination; or
displaying the entity combination in a prompt box outside the input box, presented as one data filtering condition.
3. The method according to claim 1, further comprising:
constructing a data query script according to the data filtering condition, where the data query script is used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
4. An entity presentation apparatus in data analysis, the apparatus comprising:
an acquiring unit, configured to acquire an entity sequence obtained by performing entity recognition on a natural language text input by a user, where the natural language text expresses the user's data analysis requirement on target data;
a determining unit, configured to determine, according to at least one of the entity categories, the entity types, and the association relationships between the entities in the entity sequence acquired by the acquiring unit, whether at least two adjacent entities in the entity sequence satisfy a preset rule, where the entity type indicates whether an entity is of a numerical type or a character string type;
a combining unit, configured to combine the at least two entities to obtain an entity combination if the determining unit determines that the preset rule is satisfied; and
a display unit, configured to display the entity combination obtained by the combining unit as a data filtering condition included in the data analysis requirement;
wherein the entity categories include an operator category, a dimension category, and a dimension value category; the dimension category corresponds to a field name in the target data, and the dimension value category corresponds to a specific value of a field in the target data;
the at least two entities include a first entity, a second entity, and a third entity arranged in sequence, and the preset rule includes:
the entity category of the first entity is the dimension category, and the entity type of the first entity indicates a numerical type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the third entity is the dimension value category, and the entity type of the third entity indicates a numerical type;
or, the at least two entities include a fourth entity, a second entity, and a fifth entity arranged in sequence, and the preset rule includes:
the entity category of the fourth entity is the dimension category, and the entity type of the fourth entity indicates a character string type;
the entity category of the second entity is the operator category, and the second entity is the logical operator "equal to" or "not equal to";
the entity category of the fifth entity is the dimension value category, and the association relationship between the fifth entity and the fourth entity is that the fifth entity is a dimension value belonging to the fourth entity;
or, the at least two entities include a sixth entity, a seventh entity, and an eighth entity arranged in sequence, and the preset rule includes:
the entity category of the sixth entity is the dimension category, and the entity type of the sixth entity indicates a numerical type;
the entity category of the seventh entity is the operator category, and the seventh entity is the logical operator "greater than", "less than", "greater than or equal to", or "less than or equal to";
the entity category of the eighth entity is the dimension value category, and the entity type of the eighth entity indicates a numerical type;
or, the at least two entities include a ninth entity and a tenth entity arranged in sequence, and the preset rule includes:
the entity categories of the ninth entity and the tenth entity are both the dimension value category, and both entities are dimension values of the same dimension.
5. The apparatus according to claim 4, wherein the display unit is specifically configured to highlight, in an input box, the part of the natural language text corresponding to the entity combination; or to display the entity combination in a prompt box outside the input box, presented as one data filtering condition.
6. The apparatus according to claim 4, further comprising:
a construction unit, configured to construct a data query script according to the data filtering condition displayed by the display unit, the data query script being used to execute a query operation on the target data to obtain a query result corresponding to the data analysis requirement.
7. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-3.
8. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-3.
CN202210135204.0A 2022-02-15 2022-02-15 Entity display method and device in data analysis Active CN114218935B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210952243.XA CN115345157A (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis
CN202210135204.0A CN114218935B (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210135204.0A CN114218935B (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210952243.XA Division CN115345157A (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis

Publications (2)

Publication Number Publication Date
CN114218935A CN114218935A (en) 2022-03-22
CN114218935B true CN114218935B (en) 2022-06-21

Family

ID=80709266

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210952243.XA Pending CN115345157A (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis
CN202210135204.0A Active CN114218935B (en) 2022-02-15 2022-02-15 Entity display method and device in data analysis

Country Status (1)

Country Link
CN (2) CN115345157A (en)

Also Published As

Publication number Publication date
CN115345157A (en) 2022-11-15
CN114218935A (en) 2022-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant