CN115438048A - Table searching method, device, equipment and storage medium - Google Patents

Publication number
CN115438048A
Authority
CN
China
Prior art keywords
input
database
data
user input
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211201173.0A
Other languages
Chinese (zh)
Inventor
陈先丽
王阳
刘屹
李楠
王皖麟
孙猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Finance Technology Co Ltd filed Critical China Merchants Finance Technology Co Ltd
Priority to CN202211201173.0A priority Critical patent/CN115438048A/en
Publication of CN115438048A publication Critical patent/CN115438048A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to data processing technology and discloses a table searching method, a table searching device, electronic equipment and a storage medium. The method comprises: performing type recognition on acquired user input; if the type recognition result is text input, performing data cleaning on the user input, extracting entities from the cleaned data to obtain input entities, performing vector calculation on the input entities to obtain a first expression vector, performing vector calculation on the tables in a preset table database to obtain second expression vectors, and selecting a matching table from the table database according to the similarity between the first and second expression vectors; if the type recognition result is table input, performing table column name recognition and content recognition on the user input and on each table in the table database respectively, and calculating the table column name relevance and the content relevance of each table from the recognition results; and selecting a matching table from the table database according to the comprehensive relevance obtained by comprehensively scoring the table column name relevance and the content relevance. The invention can improve the efficiency and accuracy of table searches based on user input.

Description

Table searching method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a table search method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Faced with massive amounts of data, users often need to locate the relevant data accurately. As one of the main forms of data storage, tables have become a common target of user searches. In practice, a table contains both useful data and a large amount of interference data, so a table search must recognize the user input quickly and match data accurately in order to meet the user's needs efficiently and accurately. Existing table search techniques support only a single type of search object and lack a well-developed matching mechanism; as a result, table searches based on user input are inefficient and inaccurate, and it is difficult for users to obtain the data they need efficiently.
Disclosure of Invention
The invention provides a table searching method, a table searching device, electronic equipment and a computer readable storage medium, and mainly aims to solve the problems of low table searching efficiency and low accuracy based on user input.
In order to achieve the above object, the present invention provides a table search method, including:
acquiring user input, performing type recognition on the user input, and judging whether the user input is text input or table input according to a type recognition result;
when the user input is text input, cleaning the data of the user input to obtain cleaned data, and extracting entities from the cleaned data to obtain input entities;
performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
similarity calculation is carried out on the first expression vector and the second expression vector, and a matching table is selected from the table database according to the result of the similarity calculation;
when the user input is table input, performing table column name recognition and content recognition on the user input and on each table in the table database respectively, and calculating the table column name relevance and the content relevance of each table according to the column name recognition result and the content recognition result;
and comprehensively scoring according to the table column name relevance and the content relevance to obtain a comprehensive relevance, and selecting a matching table from the table database according to the comprehensive relevance.
Optionally, the performing type identification on the user input, and determining whether the user input is text input or table input according to a result of the type identification includes:
extracting the data format of the user input to obtain a target data format;
performing a similarity search in a preset text data format set and a preset table data format set by using the target data format to obtain a matching type;
if the matching type belongs to the text data format set, judging that the user input is text input;
and if the matching type belongs to the table data format set, judging that the user input is table input.
Optionally, the performing data cleansing on the user input to obtain cleansing data includes:
performing syntactic analysis on the user input according to a preset text rule to obtain interference data;
and filtering and correcting the interference data to obtain cleaning data.
Optionally, the extracting an entity from the cleaning data to obtain an input entity includes:
performing word segmentation and part-of-speech analysis on the cleaned data to obtain input word segments and their corresponding parts of speech;
acquiring a preset stop part-of-speech tag, and filtering the input word segments according to the stop part-of-speech tag and the parts of speech of the input word segments to obtain standard word segments;
and searching a preset entity database by using the standard word segments, and taking the standard word segments that are found as input entities.
Optionally, the performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database includes:
performing word vector conversion on the input entity to obtain a word vector corresponding to the input entity;
carrying out weighted average on the word vectors to obtain a first expression vector of the input entity;
obtaining a table field corresponding to a table in the table database, and performing vector conversion on the table field to obtain a table field vector corresponding to the table field;
and generating a weight coefficient corresponding to the table field according to the word frequency and the table frequency of the table field, and performing vector comprehensive calculation according to the table field vector and the weight coefficient to obtain a second expression vector of the table in the table database.
Optionally, the performing vector comprehensive calculation according to the table field vector and the weight coefficient to obtain a second expression vector of the table in the table database includes:
and carrying out vector comprehensive calculation according to the table field vector and the weight coefficient by using the following formula:
[Formula image BDA0003872089120000031]
wherein the symbol shown in image BDA0003872089120000032 is the table field vector of the j-th table field corresponding to a table in the table database, j = 1, 2, 3, …, N (N is a natural number); w(t_j) is the weight coefficient of the j-th table field; and the symbol shown in image BDA0003872089120000033 is the second expression vector of the table in the table database.
Optionally, the calculating the table column name relevance and the content relevance of each table according to the column name recognition result and the content recognition result includes:
calculating the table column name relevance of each table from the column name recognition result by using the following formula:
[Formula image BDA0003872089120000034]
wherein H is the identifier denoting column name recognition; the symbol shown in image BDA0003872089120000035 is the column name data of the user input in the column name recognition result; the symbol shown in image BDA0003872089120000036 is the column name data of the i-th table of the table database in the column name recognition result; and the symbol shown in image BDA0003872089120000037 is the table column name relevance of the i-th table in the table database;
calculating the content relevance of each table from the content recognition result by using the following formula:
[Formula image BDA0003872089120000038]
wherein C is the identifier denoting content recognition; the symbol shown in image BDA0003872089120000039 is the content data of the user input in the content recognition result; the symbol shown in image BDA00038720891200000310 is the content data of the i-th table of the table database in the content recognition result; and the symbol shown in image BDA00038720891200000311 is the content relevance of the i-th table in the table database.
In order to solve the above problem, the present invention also provides a table search apparatus, comprising:
the type identification module is used for acquiring user input, performing type identification on the user input, and judging whether the user input is text input or table input according to a type identification result;
the input entity generating module is used for cleaning the data of the user input to obtain cleaning data and extracting entities from the cleaning data to obtain input entities when the user input is text input;
the vector calculation module is used for performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
the similarity calculation module is used for calculating the similarity of the first expression vector and the second expression vector and selecting a matching table from the table database according to the result of the similarity calculation;
the relevance generation module is used for, when the user input is table input, performing table column name recognition and content recognition on the user input and on each table in the table database respectively, and calculating the table column name relevance and the content relevance of each table according to the column name recognition result and the content recognition result;
and the comprehensive scoring module is used for comprehensively scoring according to the table column name relevance and the content relevance to obtain a comprehensive relevance, and selecting a matching table from the table database according to the comprehensive relevance.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the table search method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being executed by a processor in an electronic device to implement the table search method described above.
The embodiment of the invention supports both text input and table input, meets the functional requirements of searching for tables by text and searching for tables by table, and improves the efficiency and practicability of table information retrieval. When searching by text, calculating vectors for the input entities and for the tables in the preset table database enriches the semantic features of the user input and of the table database, so that retrieval is more accurate than traditional keyword-based retrieval. When searching by table, similarity is calculated on both the table column names and the table contents, and the results are comprehensively scored with a scoring function; this makes full use of the table information, retrieves more related tables, widens the search range, offers the user more valid tables to choose from, and thus better meets the user's search needs. Therefore, the table searching method, the table searching device, the electronic equipment and the computer-readable storage medium provided by the invention can solve the problems of low efficiency and low accuracy of table searches based on user input.
Drawings
FIG. 1 is a flowchart illustrating a table search method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating the process of extracting entities from the cleaning data to obtain input entities according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating vector calculation for the input entities and tables in the predetermined table database according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a table search apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing the table search method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a table searching method. The execution subject of the table search method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the table search method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a block chain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and data processing platform.
Fig. 1 is a schematic flowchart of a table searching method according to an embodiment of the present invention. In this embodiment, the table search method includes:
S1, acquiring user input, performing type recognition on the user input, and judging whether the user input is text input or table input according to a type recognition result;
In the embodiment of the present invention, the user input is the content a user enters when searching for a table, and may be a text or a table. The text includes Chinese characters, English words, numbers and the like; the table includes a structured table or a table image. Type recognition determines whether the user input is a text or a table, and the text input processing flow or the table input processing flow is executed according to the result.
In the embodiment of the present invention, the performing type identification on the user input, and determining whether the user input is text input or table input according to a result of the type identification includes:
extracting the data format of the user input to obtain a target data format;
performing a similarity search in a preset text data format set and a preset table data format set by using the target data format to obtain a matching type;
if the matching type belongs to the text data format set, judging that the user input is text input;
and if the matching type belongs to the table data format set, judging that the user input is table input.
In the embodiment of the invention, the data formats contained in the text data format set include numerical values, character strings, Boolean values, arrays, character objects and the like; the data formats contained in the table data format set include the Excel format, the JPEG format, the TIF format, the EPS format and the like.
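The type recognition step can be sketched as a lookup of an extracted format label in two preset sets. This is only an illustration under stated assumptions: the set contents, the label names, and the helper names `detect_format` and `recognize_type` are hypothetical, and an exact-membership test stands in for the similarity search described above.

```python
from enum import Enum


class InputType(Enum):
    TEXT = "text"
    TABLE = "table"


# Hypothetical format sets, loosely following the examples listed in the text.
TEXT_FORMATS = {"number", "string", "boolean", "array", "object"}
TABLE_FORMATS = {"xlsx", "xls", "jpeg", "jpg", "tif", "eps"}


def detect_format(user_input) -> str:
    """Extract a coarse data-format label from the raw input (illustrative rules)."""
    if isinstance(user_input, bool):  # bool must be tested before int
        return "boolean"
    if isinstance(user_input, (int, float)):
        return "number"
    if isinstance(user_input, (list, tuple)):
        return "array"
    if isinstance(user_input, dict):
        return "object"
    if isinstance(user_input, str):
        # Treat the string as a file reference when it ends in a known table extension.
        _, dot, ext = user_input.rpartition(".")
        if dot and ext.lower() in TABLE_FORMATS:
            return ext.lower()
        return "string"
    raise ValueError(f"unsupported input type: {type(user_input).__name__}")


def recognize_type(user_input) -> InputType:
    """Map the extracted format to text input or table input."""
    fmt = detect_format(user_input)
    if fmt in TABLE_FORMATS:
        return InputType.TABLE
    if fmt in TEXT_FORMATS:
        return InputType.TEXT
    raise ValueError(f"format {fmt!r} is in neither preset set")
```

A caller would branch on the returned `InputType` to run either the text flow (S2 to S4) or the table flow (S5 and S6).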
When the user input is text input, executing S2, performing data cleaning on the user input to obtain cleaning data, and extracting an entity from the cleaning data to obtain an input entity;
In the embodiment of the present invention, the performing data cleaning on the user input to obtain cleaning data includes:
performing syntactic analysis on the user input according to a preset text rule to obtain interference data;
and filtering and correcting the interference data to obtain cleaning data.
In the embodiment of the invention, an ETL (Extract-Transform-Load) cleaning method may be used for the data cleaning. ETL cleaning removes interference data by analyzing how the interference data arise and in what form they exist; the types of interference data include incomplete data, erroneous data, duplicate data, emoji, sensitive data and the like. Original data that do not meet the text requirements are converted into data that do, which ensures the quality of the user input and improves the accuracy of the subsequent table retrieval.
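As a rough illustration of filtering interference data from a text input, the sketch below removes emoji, control characters and immediately repeated words. The three rules and the name `clean_text` are assumptions made for the example; the preset text rules of the embodiment are not specified in the source.

```python
import re

# Hypothetical cleaning rules; the actual preset text rules are not given in the source.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")      # emoji and pictographs
CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")                         # control characters
REPEAT_RE = re.compile(r"\b(\w+)(\s+\1\b)+", flags=re.IGNORECASE)   # "report report"


def clean_text(raw: str) -> str:
    """Filter interference data from a text input and normalize whitespace."""
    text = EMOJI_RE.sub("", raw)
    text = CONTROL_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)  # keep one copy of a repeated word
    return re.sub(r"\s+", " ", text).strip()
```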
Referring to FIG. 2, in an embodiment of the present invention, the extracting entities from the cleaned data to obtain input entities includes:
S21, performing word segmentation and part-of-speech analysis on the cleaned data to obtain input word segments and their corresponding parts of speech;
S22, acquiring a preset stop part-of-speech tag, and filtering the input word segments according to the stop part-of-speech tag and the parts of speech of the input word segments to obtain standard word segments;
and S23, searching a preset entity database by using the standard word segments, and taking the standard word segments that are found as input entities.
In the embodiment of the invention, a word segmenter may be used to segment the cleaned data, the word segmenter including but not limited to the Jieba word segmenter; the parts of speech of the input word segments include nouns, verbs, adjectives, adverbs, auxiliary words and the like. Further, the stop part-of-speech tag may be an adjective, an adverb, an auxiliary word, or the like.
In the embodiment of the invention, the entity database can be summarized by business personnel according to the table data of the actual business or constructed by the information of a historical data table.
For example, assume that the stop part-of-speech tag is "auxiliary word", that the input word segments are "23 number", an auxiliary word, and "business report", and that the corresponding parts of speech are "noun", "auxiliary word" and "noun". The input word segment whose part of speech is "auxiliary word" is therefore deleted, yielding the standard word segments "23 number" and "business report".
In the embodiment of the invention, extracting entities from the cleaned data to obtain the input entities identifies the key information in the cleaned data and thereby sorts out the key information in the whole user input.
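Steps S22 and S23 can be sketched on tokens that have already been segmented and tagged. The tag names, the sample entity database and the function name `extract_entities` are hypothetical; any segmenter that yields (token, part-of-speech) pairs could feed this function.

```python
from __future__ import annotations

from typing import Iterable


def extract_entities(
    tagged_tokens: Iterable[tuple[str, str]],
    stop_pos: set[str],
    entity_db: set[str],
) -> list[str]:
    """Filter tagged tokens by stop part-of-speech, then keep those found in the entity database."""
    # S22: drop every token whose part of speech is a stop tag.
    standard = [token for token, pos in tagged_tokens if pos not in stop_pos]
    # S23: a standard token becomes an input entity only if the entity database contains it.
    return [token for token in standard if token in entity_db]


# Hypothetical data mirroring the example in the text.
tagged = [("23 number", "noun"), ("of", "auxiliary"), ("business report", "noun")]
entities = extract_entities(
    tagged,
    stop_pos={"adjective", "adverb", "auxiliary"},
    entity_db={"23 number", "business report", "balance sheet"},
)
```

Here the auxiliary word is removed in S22, and both remaining segments pass the database lookup in S23.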
S3, carrying out vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
Referring to FIG. 3, in the embodiment of the present invention, the performing vector calculation on the input entity and the table in the preset table database to obtain the first expression vector of the input entity and the second expression vector of the table in the table database includes:
S31, performing word vector conversion on the input entity to obtain a word vector corresponding to the input entity;
S32, carrying out weighted average on the word vectors to obtain a first expression vector of the input entity;
S33, obtaining a table field corresponding to a table in the table database, and performing vector conversion on the table field to obtain a table field vector corresponding to the table field;
and S34, generating a weight coefficient corresponding to the table field according to the word frequency and the table frequency of the table field, and performing vector comprehensive calculation according to the table field vector and the weight coefficient to obtain a second expression vector of the table in the table database.
In the embodiment of the invention, a Word2vec model can be adopted to perform vector calculation on the input entity and the table fields corresponding to the tables in the table database.
Specifically, the word vectors are weighted-averaged using the following formula to obtain the first expression vector of the input entity:
$$\vec{v}_{1} = \frac{1}{n}\sum_{i=1}^{n}\vec{e}_{i}$$
wherein $\vec{e}_{i}$ is the i-th word vector corresponding to the input entity, i = 1, 2, 3, …, n; n is the total number of entities of the input entity; and $\vec{v}_{1}$ is the first expression vector of the input entity.
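A minimal sketch of step S32, assuming every word vector carries the same weight so that the weighted average reduces to the arithmetic mean over the n entity vectors. The function name `mean_vector` is illustrative, and the word vectors are taken as plain lists of floats rather than the output of any particular embedding model.

```python
from __future__ import annotations


def mean_vector(word_vectors: list[list[float]]) -> list[float]:
    """Average n equally weighted word vectors component by component."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    n = len(word_vectors)
    dim = len(word_vectors[0])
    if any(len(vec) != dim for vec in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    return [sum(vec[k] for vec in word_vectors) / n for k in range(dim)]
```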
Further, the weight coefficient corresponding to the table field may be generated according to the word frequency and the table frequency of the table field by using the following formulas:
$$T = \frac{a}{b}$$
wherein T is the word frequency of the table field; a is the number of times the table field appears in a table of the table database; and b is the total number of table fields in the table database;
$$F = \log\frac{c}{d}$$
wherein F is the table frequency of the table field; c is the total number of tables in the table database; and d is the number of tables in the table database that contain the table field;
$$w = T \times F$$
wherein w is the weight coefficient; T is the word frequency of the table field; and F is the table frequency of the table field.
In the embodiment of the invention, the word frequency and the table frequency of the table field may be represented by the TF (Term Frequency) and the IDF (Inverse Document Frequency) of the table field respectively, and the weight coefficient corresponding to the table field is determined by the TF-IDF algorithm.
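The weight w = T × F can be sketched directly from the four counts a, b, c and d defined above. The unsmoothed natural logarithm used for the table frequency is an assumption (TF-IDF variants differ in log base and smoothing), and `field_weight` is an illustrative name.

```python
import math


def field_weight(a: int, b: int, c: int, d: int) -> float:
    """TF-IDF style weight of a table field.

    a: occurrences of the field in the table
    b: total number of table fields
    c: total number of tables in the database
    d: number of tables that contain the field
    """
    if b <= 0 or c <= 0 or d <= 0:
        raise ValueError("b, c and d must be positive")
    word_frequency = a / b             # T
    table_frequency = math.log(c / d)  # F, assumed unsmoothed natural log
    return word_frequency * table_frequency
```

A field that occurs in every table (d equal to c) receives weight 0 under this variant, which is the usual TF-IDF behaviour for non-discriminating terms.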
Specifically, in the embodiment of the present invention, the second expression vector of the table in the table database may be obtained by performing vector comprehensive calculation according to the table field vector and the weight coefficient by using the following formula:
[Formula image BDA0003872089120000086]
wherein the symbol shown in image BDA0003872089120000087 is the table field vector of the j-th table field corresponding to a table in the table database, j = 1, 2, 3, …, N (N is a natural number); w(t_j) is the weight coefficient of the j-th table field; and the symbol shown in image BDA0003872089120000088 is the second expression vector of the table in the table database.
In the embodiment of the present invention, the first expression vector of the input entity is the vector obtained by weighted averaging of the word vectors of the input entities, and the second expression vector of a table in the table database is the vector obtained by weighted averaging of the table field vectors using the table field weight coefficients.
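One way to read the "weighted average" of table field vectors is a weighted sum normalized by the total weight. The source gives the formula only as an image, so the normalization below is an assumption rather than the patented formula, and `table_vector` is an illustrative name.

```python
from __future__ import annotations


def table_vector(field_vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Combine field vectors into one table vector (assumed: weighted sum / total weight)."""
    if not field_vectors or len(field_vectors) != len(weights):
        raise ValueError("need one weight per field vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(field_vectors[0])
    if any(len(vec) != dim for vec in field_vectors):
        raise ValueError("all field vectors must have the same dimension")
    return [
        sum(w * vec[k] for w, vec in zip(weights, field_vectors)) / total
        for k in range(dim)
    ]
```

With weights 3 and 1 on two orthogonal unit vectors, the result leans three quarters toward the first field.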
S4, similarity calculation is carried out on the first expression vector and the second expression vector, and a matching table is selected from the table database according to the result of the similarity calculation;
In the embodiment of the present invention, a string numerical hash algorithm may be applied to the first expression vector and the second expression vector to obtain a hash string for each of them, and a preliminary similarity comparison of the two vectors may be made by string matching on the hash strings.
In this embodiment of the present invention, the similarity between the first expression vector and the second expression vector may be calculated by using the following similarity calculation formula:
$$\operatorname{sim}(\vec{v}_{1}, \vec{v}_{2}) = \frac{\vec{v}_{1}\cdot\vec{v}_{2}}{\lVert\vec{v}_{1}\rVert\,\lVert\vec{v}_{2}\rVert}$$
wherein $\operatorname{sim}(\vec{v}_{1}, \vec{v}_{2})$ is the similarity between the first expression vector $\vec{v}_{1}$ and the second expression vector $\vec{v}_{2}$, and its value lies in the range [0, 1].
In the embodiment of the present invention, when the similarity calculation result lies in the range [0.7, 1], it may be determined that the input entity is similar to a table in the table database, and the corresponding matching table may be selected from the table database. For example, let the first expression vector (sentence A) be (1,1,2,1,1,1,0,0,0) and the second expression vector (sentence B) be (1,1,1,0,1,1,1,1,1). Taking the similarity as the cosine of the angle between the two vectors gives
$$\frac{1+1+2+0+1+1+0+0+0}{\sqrt{1+1+4+1+1+1}\,\sqrt{8}} = \frac{6}{3\times 2\sqrt{2}} \approx 0.71$$
The similarity of about 0.71 lies in the range [0.7, 1], so sentence A and sentence B are similar, and the corresponding matching table is selected from the table database according to the second expression vector.
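The threshold check can be sketched as follows, assuming the similarity is the cosine of the angle between the two vectors. With the two example vectors the cosine evaluates to about 0.707, which lies inside the [0.7, 1] band. The function name and the `MATCH_THRESHOLD` constant are illustrative.

```python
from __future__ import annotations

import math

MATCH_THRESHOLD = 0.7  # lower bound of the [0.7, 1] band stated in the text


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


sentence_a = [1, 1, 2, 1, 1, 1, 0, 0, 0]
sentence_b = [1, 1, 1, 0, 1, 1, 1, 1, 1]
score = cosine_similarity(sentence_a, sentence_b)
is_match = score >= MATCH_THRESHOLD
```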
When the user input is table input, executing S5, performing table column name recognition and content recognition on the user input and on each table in the table database respectively, and calculating the table column name relevance and the content relevance of each table according to the column name recognition result and the content recognition result;
In the embodiment of the present invention, the performing table column name recognition and content recognition on the user input and on each table in the table database respectively includes:
extracting the structure data of the user input and of the tables in the table database respectively;
and classifying the structure data according to their data attributes to obtain table column name data and content data.
In the embodiment of the present invention, the structure data have different data attributes because they occupy different positions in the table. For example, the data located in one row generally share a single data attribute, such as a time attribute or a type attribute.
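For a structured table, the split into column name data and content data can be sketched by position. The sketch assumes that the first row holds the column names and every later row holds content; real tables, and table images in particular, would need a more careful attribute classification. `split_table` is an illustrative name.

```python
from __future__ import annotations


def split_table(rows: list[list[object]]) -> tuple[list[str], list[str]]:
    """Split a table into column names (first row, by assumption) and flattened cell contents."""
    if not rows:
        return [], []
    header, *body = rows
    column_names = [str(cell).strip() for cell in header]
    content = [str(cell).strip() for row in body for cell in row]
    return column_names, content
```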
In the embodiment of the present invention, the calculating the table column name relevance and the content relevance of each table according to the column name recognition result and the content recognition result includes:
calculating the table column name relevance of each table from the column name recognition result by using the following formula:
[Formula image BDA0003872089120000101]
wherein H is the identifier denoting column name recognition; the symbol shown in image BDA0003872089120000102 is the column name data of the user input in the column name recognition result; the symbol shown in image BDA0003872089120000103 is the column name data of the i-th table of the table database in the column name recognition result; and the symbol shown in image BDA0003872089120000104 is the table column name relevance of the i-th table in the table database;
calculating the content relevance of each table from the content recognition result by using the following formula:
[Formula image BDA0003872089120000105]
wherein C is the identifier denoting content recognition; the symbol shown in image BDA0003872089120000106 is the content data of the user input in the content recognition result; the symbol shown in image BDA0003872089120000107 is the content data of the i-th table of the table database in the content recognition result; and the symbol shown in image BDA0003872089120000108 is the content relevance of the i-th table in the table database.
In the embodiment of the present invention, searching on both the table column names and the table content expands the table search range.
S6: performing comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and selecting a matching table from the table database according to the comprehensive relevancy.
In the embodiment of the present invention, the comprehensive relevancy may be obtained by scoring the table column name relevancy and the content relevancy with the following formula:
Figure BDA0003872089120000109
wherein score(i) is the comprehensive relevancy of the i-th table in the table database; α is a preset weight adjustment factor; the symbol shown in Figure BDA00038720891200001010 is the table column name relevancy of the i-th table in the table database; and the symbol shown in Figure BDA00038720891200001011 is the content relevancy of the i-th table in the table database.
In the embodiment of the present invention, a larger comprehensive relevancy indicates a closer match between the corresponding table and the user input; selecting the table with the highest comprehensive relevancy therefore yields the best matching table for the user input. For example, if the comprehensive relevancy of matching table E is 0.6 and that of matching table G is 0.8, then matching table G is the best matching table for the user input.
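The scoring and selection step can be illustrated with a short sketch. Because the scoring formula is available only as an image, the code assumes a convex combination, alpha times the column name relevancy plus (1 - alpha) times the content relevancy, which is one common way to apply a single weight adjustment factor; the function names and input values are hypothetical.

```python
def comprehensive_score(name_rel: float, content_rel: float, alpha: float = 0.5) -> float:
    """Assumed form: alpha * name_rel + (1 - alpha) * content_rel."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * name_rel + (1.0 - alpha) * content_rel


def best_matching_table(
    relevancy: dict[str, tuple[float, float]], alpha: float = 0.5
) -> tuple[str, dict[str, float]]:
    """Score every table and return the id with the highest score plus all scores."""
    scores = {
        table_id: comprehensive_score(name_rel, content_rel, alpha)
        for table_id, (name_rel, content_rel) in relevancy.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical relevancy pairs chosen so the scores land near 0.6 (E) and 0.8 (G).
relevancy = {"E": (0.7, 0.5), "G": (0.9, 0.7)}
best, scores = best_matching_table(relevancy, alpha=0.5)
```

With these inputs table G is selected, mirroring the E and G example in the text. Returning the full score dictionary also allows several top-ranked tables to be offered to the user instead of only one.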
The embodiment of the present invention supports both text input and table input, thereby meeting the functional requirements of searching tables by text and searching tables by table, and improving the efficiency and practicability of table information retrieval. When searching tables by text, computing vectors for the input entities and for the tables in the preset table database enriches the semantic features of the user input and of the table database, so retrieval accuracy is higher than with conventional keyword-based retrieval. When searching tables by table, similarity is calculated on both the table column names and the table content, and a scoring function combines the results into a comprehensive score; this makes full use of the table information, retrieves more related tables, expands the search range, gives the user more valid tables to choose from, and further meets the user's search requirements. Therefore, the table searching method provided by the present invention can solve the problems of low efficiency and low accuracy in table searching based on user input.
Fig. 4 is a functional block diagram of a table search apparatus according to an embodiment of the present invention.
The table searching apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the table search apparatus 100 may include a type identification module 101, an input entity generation module 102, a vector calculation module 103, a similarity calculation module 104, a correlation generation module 105, and a comprehensive scoring module 106. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the type identification module 101 is configured to obtain a user input, perform type identification on the user input, and determine whether the user input is text input or table input according to a result of the type identification;
the input entity generating module 102 is configured to, when the user input is a text input, perform data cleaning on the user input to obtain cleaning data, and extract an entity from the cleaning data to obtain an input entity;
the vector calculation module 103 is configured to perform vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
the similarity calculation module 104 is configured to perform similarity calculation on the first representation vector and the second representation vector, and select a matching table from the table database according to a result of the similarity calculation;
the relevancy generating module 105 is configured to, when the user input is a table input, perform table column name recognition and content recognition on the user input and each table in the table database respectively, and calculate a table column name relevancy and a content relevancy of each table according to the table column name recognition result and the content recognition result;
and the comprehensive scoring module 106 is configured to perform comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and select a matching table from the table database according to the comprehensive relevancy.
In detail, when the modules in the table searching apparatus 100 according to the embodiment of the present invention are used, the same technical means as the table searching method described in the drawings are adopted, and the same technical effects can be produced, which is not described herein again.
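As an illustration of what the similarity calculation module 104 might do, the sketch below compares a first expression vector against the second expression vectors of several tables and selects the closest one. This section does not name the similarity measure, so cosine similarity is an assumption, and all names and vectors are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_matching_table(
    first_vector: list[float], table_vectors: dict[str, list[float]]
) -> str:
    """Return the id of the table whose vector is most similar to the input vector."""
    return max(
        table_vectors,
        key=lambda table_id: cosine_similarity(first_vector, table_vectors[table_id]),
    )


table_vectors = {"A": [0.0, 1.0], "B": [2.0, 0.1]}
match = select_matching_table([1.0, 0.0], table_vectors)
```

Cosine similarity ignores vector length, which suits averaged word vectors whose magnitude depends on how many fields or entities were combined.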
Fig. 5 is a schematic structural diagram of an electronic device implementing a table search method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a table search program, stored in the memory 11 and executable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., executing a table search program, etc.) stored in the memory 11 and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a table search program, etc., but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit such as a keyboard, and optionally may also be a standard wired interface or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 5 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The table search program stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions, and when executed in the processor 10, can realize:
acquiring user input, performing type recognition on the user input, and judging whether the user input is text input or table input according to a type recognition result;
when the user input is text input, performing data cleaning on the user input to obtain cleaning data, and extracting entities from the cleaning data to obtain input entities;
performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
similarity calculation is carried out on the first expression vector and the second expression vector, and a matching table is selected from the table database according to the result of the similarity calculation;
when the user input is table input, performing table column name recognition and content recognition on the user input and each table in the table database respectively, and calculating the table column name relevancy and the content relevancy of each table according to the table column name recognition result and the content recognition result;
and performing comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and selecting a matching table from the table database according to the comprehensive relevancy.
Specifically, the specific implementation method of the instruction by the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to the drawings, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring user input, performing type recognition on the user input, and judging whether the user input is text input or table input according to a type recognition result;
when the user input is text input, cleaning the data of the user input to obtain cleaned data, and extracting entities from the cleaned data to obtain input entities;
performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
similarity calculation is carried out on the first expression vector and the second expression vector, and a matching table is selected from the table database according to the result of the similarity calculation;
when the user input is table input, performing table column name recognition and content recognition on the user input and each table in the table database respectively, and calculating the table column name relevancy and the content relevancy of each table according to the table column name recognition result and the content recognition result;
and performing comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and selecting a matching table from the table database according to the comprehensive relevancy.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of table searching, the method comprising:
acquiring user input, performing type recognition on the user input, and judging whether the user input is text input or table input according to a type recognition result;
when the user input is text input, performing data cleaning on the user input to obtain cleaning data, and extracting entities from the cleaning data to obtain input entities;
performing vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
similarity calculation is carried out on the first expression vector and the second expression vector, and a matching table is selected from the table database according to the result of the similarity calculation;
when the user input is table input, performing table column name recognition and content recognition on the user input and each table in the table database respectively, and calculating the table column name relevancy and the content relevancy of each table according to the table column name recognition result and the content recognition result;
and performing comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and selecting a matching table from the table database according to the comprehensive relevancy.
2. The table searching method of claim 1, wherein the performing type recognition on the user input and judging whether the user input is text input or table input according to a type recognition result comprises:
extracting the data format of the user input to obtain a target data format;
performing a similarity search with the target data format in a preset text data format set and a preset table data format set to obtain a matching type;
if the matching type belongs to the text data format set, judging that the user input is text input;
and if the matching type belongs to the table data format set, judging that the user input is table input.
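The format-matching steps of claim 2 can be sketched as follows. The claim does not define how a data format is extracted or how the similarity search works, so this illustration assumes that the format is a file-name suffix and that the search is fuzzy string matching with the standard-library `difflib.get_close_matches`; the two format sets and all function names are hypothetical.

```python
import difflib
from pathlib import PurePath

# Hypothetical preset format sets.
TEXT_FORMATS = {"plain", "txt", "md"}
TABLE_FORMATS = {"csv", "tsv", "xls", "xlsx"}


def extract_format(user_input: str) -> str:
    """Use the file-name suffix as the target data format; no suffix means plain text."""
    suffix = PurePath(user_input).suffix.lower().lstrip(".")
    return suffix or "plain"


def classify_input(user_input: str) -> str:
    """Return 'table' or 'text' according to the closest known data format."""
    target = extract_format(user_input)
    candidates = sorted(TEXT_FORMATS | TABLE_FORMATS)
    matches = difflib.get_close_matches(target, candidates, n=1, cutoff=0.6)
    if matches and matches[0] in TABLE_FORMATS:
        return "table"
    # Unknown formats fall back to text input in this sketch.
    return "text"


kind_a = classify_input("holdings.xlsx")
kind_b = classify_input("bond fund net value")
```

Fuzzy matching lets a near-miss format such as `xlsm` still resolve to the table set, which is one plausible reading of "similarity search" over the preset sets.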
3. The table searching method of claim 1, wherein the performing data cleaning on the user input to obtain cleaning data comprises:
performing syntactic analysis on the user input according to a preset text rule to obtain interference data;
and filtering and correcting the interference data to obtain cleaning data.
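A minimal sketch of the cleaning steps of claim 3 is given below. The preset text rule is not specified in the claim, so the regular expression (HTML-like tags and stray symbols treated as interference data) and the use of Unicode NFKC normalization as the correction step are assumptions, and the names are hypothetical.

```python
import re
import unicodedata

# Hypothetical preset text rule: HTML-like tags, or any character that is not
# a word character, whitespace, hyphen, period or percent sign.
INTERFERENCE = re.compile(r"<[^>]+>|[^\w\s\-.%]")


def clean_input(text: str) -> str:
    """Correct width variants, filter interference data, and collapse whitespace."""
    # Correction: fold full-width and other compatibility forms to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Filtering: replace everything the rule flags as interference with a space.
    text = INTERFERENCE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_input("  <b>bond</b>   fund!!  net\uff0dvalue ")
```

Here the full-width hyphen is corrected to an ASCII hyphen before filtering, so `net-value` survives as a single token.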
4. The table searching method of claim 1, wherein the extracting entities from the cleaning data to obtain input entities comprises:
performing part-of-speech analysis and word segmentation on the cleaning data to obtain input word segments and their corresponding parts of speech;
acquiring preset stop part-of-speech tags, and screening the input word segments according to the stop part-of-speech tags and the parts of speech of the input word segments to obtain standard word segments;
and searching a preset entity database with the standard word segments, and taking the standard word segments found in the entity database as input entities.
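The three steps of claim 4 can be illustrated with a toy example. A real system would use a word segmenter and part-of-speech tagger; to stay self-contained, this sketch assumes whitespace segmentation and a small hand-written tag lexicon. The tag names, lexicon, stop-tag set and entity database are all hypothetical.

```python
# Hypothetical stop part-of-speech tags: particle, preposition, conjunction.
STOP_POS = {"u", "p", "c"}
# Toy tag lexicon standing in for a real part-of-speech tagger.
POS_LEXICON = {
    "show": "v", "the": "u", "of": "p", "and": "c",
    "yield": "n", "bond": "n", "fund": "n",
}
# Hypothetical preset entity database.
ENTITY_DB = {"yield", "bond", "fund"}


def extract_entities(cleaned: str) -> list[str]:
    """Segment, tag, drop stop parts of speech, then keep segments found in the entity database."""
    segments = cleaned.lower().split()
    tagged = [(seg, POS_LEXICON.get(seg, "x")) for seg in segments]
    # Screening: discard segments whose part of speech is a stop tag.
    standard = [seg for seg, pos in tagged if pos not in STOP_POS]
    # Lookup: only standard segments present in the entity database become entities.
    return [seg for seg in standard if seg in ENTITY_DB]


entities = extract_entities("show the yield of bond fund")
```

The verb "show" passes the part-of-speech screen but is dropped at the lookup step because it is not in the entity database, which shows why the claim applies both filters.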
5. The table searching method of claim 1, wherein the performing vector calculation on the input entity and the table in the preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database comprises:
performing word vector conversion on the input entity to obtain a word vector corresponding to the input entity;
carrying out weighted average on the word vectors to obtain a first expression vector of the input entity;
obtaining a table field corresponding to a table in the table database, and performing vector conversion on the table field to obtain a table field vector corresponding to the table field;
and generating a weight coefficient corresponding to the table field according to the word frequency and the table frequency of the table field, and performing vector comprehensive calculation according to the table field vector and the weight coefficient to obtain a second expression vector of the table in the table database.
6. The table searching method of claim 5, wherein the performing vector comprehensive calculation according to the table field vector and the weight coefficient to obtain a second expression vector of the table in the table database comprises:
performing the vector comprehensive calculation according to the table field vector and the weight coefficient by using the following formula:
Figure FDA0003872089110000021
wherein the symbol shown in Figure FDA0003872089110000022 corresponds to the j-th table field of the table in the table database, j = 1, 2, 3, …, N (N is a natural number); w(t_j) is the weight coefficient of the j-th table field; and the symbol shown in Figure FDA0003872089110000023 is the second expression vector of the table in the table database.
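Claims 5 and 6 combine table field vectors using weights derived from word frequency and table frequency. The formula is available only as an image, so the sketch below makes two assumptions: the weight is a smoothed TF-IDF-style value, and the combination is a weighted average normalized by the sum of the weights. The function names are hypothetical.

```python
import math


def field_weight(word_freq: int, n_tables: int, table_freq: int) -> float:
    """Assumed TF-IDF-style weight: word frequency times a smoothed inverse table frequency."""
    return word_freq * (math.log((1 + n_tables) / (1 + table_freq)) + 1.0)


def table_vector(field_vectors: list[list[float]], weights: list[float]) -> list[float]:
    """Weighted average of the field vectors (assumed form of the combination)."""
    if not field_vectors or len(field_vectors) != len(weights):
        raise ValueError("need one weight per field vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(field_vectors[0])
    return [
        sum(w * vec[k] for vec, w in zip(field_vectors, weights)) / total
        for k in range(dim)
    ]


second_vector = table_vector([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

Under this weighting, a field that appears in nearly every table receives a weight close to its raw word frequency, while a field found in few tables is weighted up and contributes more to the table's expression vector.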
7. The table searching method according to any one of claims 1 to 6, wherein the calculating the table column name relevancy and the content relevancy of each table according to the table column name recognition result and the content recognition result comprises:
calculating the table column name relevancy of each table from the table column name recognition result using the following formula:
Figure FDA0003872089110000031
wherein H is the identifier denoting table column name recognition; the symbol shown in Figure FDA0003872089110000032 is the table column name data of the user input in the table column name recognition result; the symbol shown in Figure FDA0003872089110000033 is the table column name data of the i-th table of the table database in the table column name recognition result; and the symbol shown in Figure FDA0003872089110000034 is the table column name relevancy of the i-th table in the table database;
and calculating the content relevancy of each table from the content recognition result using the following formula:
Figure FDA0003872089110000035
wherein C is the identifier denoting content recognition; the symbol shown in Figure FDA0003872089110000036 is the content data of the user input in the content recognition result; the symbol shown in Figure FDA0003872089110000037 is the content data of the i-th table of the table database in the content recognition result; and the symbol shown in Figure FDA0003872089110000038 is the content relevancy of the i-th table in the table database.
8. A table search apparatus, the apparatus comprising:
the type identification module is used for acquiring user input, performing type identification on the user input, and judging whether the user input is text input or table input according to a type identification result;
the input entity generating module is used for cleaning the data of the user input to obtain cleaning data and extracting entities from the cleaning data to obtain input entities when the user input is text input;
the vector calculation module is used for carrying out vector calculation on the input entity and a table in a preset table database to obtain a first expression vector of the input entity and a second expression vector of the table in the table database;
the similarity calculation module is used for calculating the similarity of the first expression vector and the second expression vector and selecting a matching table from the table database according to the result of the similarity calculation;
the relevancy generation module is used for performing table column name recognition and content recognition on the user input and each table in the table database respectively when the user input is table input, and calculating the table column name relevancy and the content relevancy of each table according to the table column name recognition result and the content recognition result;
and the comprehensive scoring module is used for performing comprehensive scoring according to the table column name relevancy and the content relevancy to obtain a comprehensive relevancy, and selecting a matching table from the table database according to the comprehensive relevancy.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the table search method of any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a table search method according to any one of claims 1 to 7.
CN202211201173.0A 2022-09-29 2022-09-29 Table searching method, device, equipment and storage medium Pending CN115438048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211201173.0A CN115438048A (en) 2022-09-29 2022-09-29 Table searching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211201173.0A CN115438048A (en) 2022-09-29 2022-09-29 Table searching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115438048A true CN115438048A (en) 2022-12-06

Family

ID=84250247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211201173.0A Pending CN115438048A (en) 2022-09-29 2022-09-29 Table searching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115438048A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116049267A (en) * 2022-12-26 2023-05-02 上海朗晖慧科技术有限公司 Multi-dimensional intelligent identification chemical article searching and displaying method
CN116049267B (en) * 2022-12-26 2023-07-18 上海朗晖慧科技术有限公司 Multi-dimensional intelligent identification chemical article searching and displaying method

Similar Documents

Publication Publication Date Title
CA2777520C (en) System and method for phrase identification
CN112541338A (en) Similar text matching method and device, electronic equipment and computer storage medium
CN111460797B (en) Keyword extraction method and device, electronic equipment and readable storage medium
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
CN111639486A (en) Paragraph searching method and device, electronic equipment and storage medium
CN112380859A (en) Public opinion information recommendation method and device, electronic equipment and computer storage medium
CN113033198B (en) Similar text pushing method and device, electronic equipment and computer storage medium
CN113449187A (en) Product recommendation method, device and equipment based on double portraits and storage medium
CN112883730B (en) Similar text matching method and device, electronic equipment and storage medium
CN111625621B (en) Document retrieval method and device, electronic equipment and storage medium
CN112287682B (en) Method, device and equipment for extracting subject term and storage medium
CN113886708A (en) Product recommendation method, device, equipment and storage medium based on user information
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN114138784A (en) Information tracing method and device based on storage library, electronic equipment and medium
CN112667775A (en) Keyword prompt-based retrieval method and device, electronic equipment and storage medium
CN115438048A (en) Table searching method, device, equipment and storage medium
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
Constantin Automatic structure and keyphrase analysis of scientific publications
CN115525761A (en) Method, device, equipment and storage medium for article keyword screening category
CN114385815A (en) News screening method, device, equipment and storage medium based on business requirements
CN114676307A (en) Ranking model training method, device, equipment and medium based on user retrieval
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium
CN114708073A (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium
TW201822031A (en) Method of creating chart index with text information and its computer program product capable of generating a virtual chart message catalog and schema index information to facilitate data searching
CN112364068A (en) Course label generation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination