CN116340518A - Text association matrix establishment method and device, electronic equipment and storage medium - Google Patents

Text association matrix establishment method and device, electronic equipment and storage medium

Publication number
CN116340518A
Authority
CN
China
Prior art keywords
text
information
association value
title
text set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310317594.8A
Other languages
Chinese (zh)
Inventor
谭明
高帅超
刘心哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202310317594.8A
Publication of CN116340518A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for establishing a text association matrix, electronic equipment and a storage medium. The method for establishing the text association matrix comprises the following steps: acquiring a text set to be associated, and acquiring catalog information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated; dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set, wherein each organization text set comprises a main text set and at least one sub text set; for each organization text set, determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information; and determining a text association matrix corresponding to each organization text set based on the target association value. Based on the technical scheme of the embodiment of the invention, the efficiency and the accuracy of establishing the text association matrix can be improved.

Description

Text association matrix establishment method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a method and apparatus for establishing a text association matrix, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of information technology in the big data era, the volume of information generated by enterprises has grown exponentially, and the focus of commercial bank knowledge base construction has gradually shifted from how to digitize entity data and turn it into knowledge to how to avoid information overload and improve knowledge output capability. Whether existing resources can be fully utilized and knowledge association relations can be established accurately and quickly is one of the important criteria for evaluating the construction of an enterprise-level knowledge base.
Compared with those of other enterprises, a commercial bank knowledge base is characterized by users with complex organizational lines and by weak association between knowledge items. The head-office knowledge in a commercial bank knowledge base is accompanied by a large amount of branch-specific knowledge, and because of regional differences, users have a greater need to query branch-specific knowledge. In the prior art, the association relations between head-office knowledge and branch knowledge are established manually by persons skilled in the relevant field. This consumes considerable manpower, is inefficient, and often produces erroneous association relations, so both the accuracy and the efficiency of establishing association relations are low.
Disclosure of Invention
The invention provides a method, a device, electronic equipment and a storage medium for establishing a text association matrix, to solve the technical problem that association relations are established with low accuracy and low efficiency.
According to one aspect of the invention, a method for establishing a text association matrix is provided, wherein the method comprises the following steps:
acquiring a text set to be associated, and acquiring catalog information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated;
dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set, wherein each organization text set comprises a main text set and at least one sub text set corresponding to the main text set;
determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information;
and determining a text association matrix corresponding to each organization text set based on the target association value.
According to another aspect of the present invention, there is provided an apparatus for establishing a text association matrix, wherein the apparatus includes:
the information acquisition module is used for acquiring a text set to be associated and acquiring catalog information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated;
the set dividing module is used for dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set, wherein each organization text set comprises a main text set and at least one sub text set corresponding to the main text set;
the association value determining module is used for determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information;
and the matrix establishment module is used for determining a text association matrix corresponding to each organization text set based on the target association value.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for creating a text association matrix according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the method for creating a text association matrix according to any of the embodiments of the present invention when executed.
According to the technical scheme of the embodiments of the invention, a text set to be associated is acquired, together with the catalog information, organization information, title information, tag array information and text information of each text to be associated in that set. The text set to be associated is divided based on the catalog information and the organization information to obtain at least one organization text set, each comprising a main text set and at least one sub text set corresponding to the main text set; limiting the range over which the text association matrix is established in this way effectively saves computation cost and improves the efficiency of computing the text association matrix. For each organization text set, a target association value between each text to be associated in the main text set and each text to be associated in each sub text set is determined according to the title information, the tag array information and the text information; combining these three kinds of information improves the accuracy of the determined target association value. A text association matrix corresponding to each organization text set is then determined based on the target association value, which improves both the efficiency and the accuracy of establishing the text association matrix.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for establishing a text association matrix according to a first embodiment of the present invention;
fig. 2 is a flowchart of a method for establishing a text association matrix according to a second embodiment of the present invention;
fig. 3 is an overall flowchart of a method for establishing a text association matrix according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a device for establishing a text association matrix according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device implementing a method for establishing a text association matrix according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a method for establishing a text association matrix according to an embodiment of the present invention, where the method may be performed by a device for establishing a text association matrix, the device for establishing a text association matrix may be implemented in hardware and/or software, and the device for establishing a text association matrix may be configured in computer software. As shown in fig. 1, the method includes:
S110, acquiring a text set to be associated, and acquiring catalog information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated.
The text set to be associated can be understood as a text set whose association relations are to be determined. In the embodiment of the present invention, the text set to be associated may be obtained according to the application scenario, which is not specifically limited herein. Optionally, in a financial application scenario, the text set to be associated may be a commercial bank knowledge base text set. In a medical application scenario, the text set to be associated may be a medical institution knowledge base text set.
The text to be associated can be understood as a single knowledge text in the text set to be associated whose association relations are to be established. In the embodiment of the invention, the text to be associated depends on the text set to be associated. Optionally, in a financial application scenario, the text to be associated may be a commercial bank knowledge text. In a medical application scenario, the text to be associated may be a medical institution knowledge text.
The catalog information may be understood as information characterizing the catalog to which the text to be associated belongs.
The organization information may be understood as information characterizing the organization to which the text to be associated belongs. In the embodiment of the invention, the organization information depends on the application scenario of the text set to be associated. Optionally, in a financial application scenario, the organization information may be a bank head office, a bank branch, or the like. In a medical application scenario, the organization information may be a main hospital, a branch hospital, or the like.
The title information may be understood as information corresponding to the title of the text to be associated. In the embodiment of the present invention, the title information may be preset according to scenario requirements, which is not specifically limited herein. Optionally, the title information may include the number of title characters, the title content, and the like.
The tag array information can be understood as information corresponding to the tag array of the text to be associated. In the embodiment of the present invention, the tag array information may be preset according to scenario requirements, which is not specifically limited herein. Specifically, the tag array information may be information corresponding to the tags manually labeled for each text to be associated. Optionally, the tag array information may include the number of tags, the tag content, and the like.
The text information may be understood as information about the body of the text to be associated. In the embodiment of the present invention, the text information may be preset according to scenario requirements, which is not specifically limited herein. Optionally, the text information may include the number of text characters, the text content, keyword information, and the like.
Specifically, in the embodiment of the present invention, optionally, the catalog information may be acquired by a catalog acquirer; the organization information may be acquired by an organization acquirer; the title information may be acquired by a title acquirer; the tag array information may be acquired by a tag array acquirer; and the text information may be acquired by a text acquirer.
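The five kinds of information acquired in S110 can be pictured as one record per text to be associated. The following is a minimal sketch only; the class name `TextRecord`, its field names and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextRecord:
    """One text to be associated (hypothetical container for the S110 fields)."""
    catalog: str                                    # catalog information
    organization: str                               # organization information
    title: str                                      # title information (content)
    tags: List[str] = field(default_factory=list)   # tag array information
    body: str = ""                                  # text information (content)

    @property
    def title_length(self) -> int:
        # Number of title characters, one of the optional title attributes.
        return len(self.title)


# Illustrative sample record; all values are made up.
record = TextRecord(
    catalog="loans",
    organization="head_office",
    title="Personal loan guide",
    tags=["loan", "retail"],
    body="How to apply for a personal loan.",
)
```

Keeping all five fields on one record makes the later division and scoring steps simple functions over lists of records.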
S120, dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set.
Each organization text set comprises a main text set and at least one sub text set corresponding to the main text set.
The organization text set can be understood as a text set formed from a main text set and at least one sub text set.
The main text set may be understood as the set of texts to be associated that corresponds to the head organization. In the embodiment of the invention, the main text set depends on the application scenario. Optionally, in a financial application scenario, the main text set may be the set of texts to be associated corresponding to a bank head office. In a medical application scenario, the main text set may be the set of texts to be associated corresponding to a main hospital.
Correspondingly, the sub text set can be understood as the set of texts to be associated that corresponds to a branch organization. It will be appreciated that one head organization may correspond to one or more branch organizations; for example, a bank head office may correspond to multiple bank branches, and a main hospital may correspond to multiple branch hospitals. The sub text set therefore depends on both the application scenario and the main text set. Optionally, in a financial application scenario, the sub text set may be the set of texts to be associated corresponding to one or more bank branches of a bank head office. In a medical application scenario, the sub text set may be the set of texts to be associated corresponding to one or more branch hospitals of a main hospital.
Optionally, the dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set includes:
dividing the text set to be associated based on the catalog information to obtain at least one catalog text set;
and for each catalog text set, dividing the catalog text set based on the organization information to obtain at least one organization text set.
The catalog text set may be understood as a text set obtained by dividing the text set to be associated based on the catalog information. It will be appreciated that each catalog text set may include one or more organization text sets.
In the embodiment of the invention, the text set to be associated is divided based on the catalog information and the organization information to obtain at least one organization text set, and the text association matrix corresponding to each organization text set is then determined, so that computation cost is effectively saved and the efficiency of computing the text association matrix is improved.
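The two-stage division of S120 can be sketched as a grouping by catalog followed by a split by organization. This is an illustrative sketch under simplifying assumptions: each text is a plain dict, there is a single head organization identified by the hypothetical marker `HEAD`, and each catalog therefore yields exactly one organization text set. The patent itself allows one or more organization text sets per catalog.

```python
from collections import defaultdict

HEAD = "head_office"  # hypothetical marker for the head organization


def partition(texts):
    """Group texts by catalog, then split each group into main and sub text sets.

    texts: iterable of dicts with "catalog" and "organization" keys.
    Returns {catalog: {"main": [...], "subs": {branch_name: [...]}}}.
    """
    by_catalog = defaultdict(list)
    for text in texts:
        by_catalog[text["catalog"]].append(text)

    organization_sets = {}
    for catalog, group in by_catalog.items():
        main = [t for t in group if t["organization"] == HEAD]
        subs = defaultdict(list)
        for t in group:
            if t["organization"] != HEAD:
                subs[t["organization"]].append(t)
        organization_sets[catalog] = {"main": main, "subs": dict(subs)}
    return organization_sets


sample = [
    {"catalog": "loans", "organization": "head_office", "title": "Loan guide"},
    {"catalog": "loans", "organization": "beijing", "title": "Beijing loan guide"},
    {"catalog": "cards", "organization": "head_office", "title": "Card fees"},
]
sets_by_catalog = partition(sample)
```

Because association values are only computed inside one organization text set, texts from different catalogs are never compared, which is where the saving in computation comes from.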
S130, determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information for each organization text set.
The target association value is a numerical value, determined according to the title information, the tag array information and the text information, that represents the association between each text to be associated in the main text set and each text to be associated in each sub text set.
In the embodiment of the invention, the range for establishing the text association matrix is limited based on the catalog information and the organization information, and then the numerical value which can characterize the association of each text to be associated in the main text set and each text to be associated in each sub text set, namely the target association value, is determined by combining the title information, the tag array information and the text information at the same time so as to establish the text association matrix corresponding to each organization text set, thereby improving the efficiency and the accuracy for establishing the text association matrix.
S140, determining a text association matrix corresponding to each organization text set based on the target association value.
The text association matrix can be understood as a three-dimensional matrix that represents, within each organization text set, the association between each text to be associated in the main text set and each text to be associated in each sub text set. Specifically, the text association matrix may be established as follows:
[Figure BDA0004156125400000081: formula of the text association matrix]
wherein S represents the text set to be associated, k represents the set of branch organizations, and b represents a target association value.
In the embodiment of the invention, specifically, based on the established text association matrix, each text to be associated of the head organization can correspond, for each branch organization under the same catalog, to at most one text to be associated of that branch organization, namely the one with the largest target association value. Therefore, based on the technical scheme of the embodiment of the invention, associations between the main text set of the head organization and the sub text sets of the branch organizations can be established, which makes associated texts more convenient to consult.
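One way to picture the three-dimensional structure and the "at most one text per branch" selection is sketched below. The exact matrix layout in the patent's formula is not recoverable here, so the nested list/dict layout, the function names and the rule that a branch with no positive value gets no match are all assumptions made for illustration.

```python
def build_association_matrix(main_texts, sub_sets, score):
    """matrix[i][branch][j] = score(main_texts[i], sub_sets[branch][j]).

    The three axes are: main text, branch organization, branch text.
    """
    return [
        {branch: [score(m, s) for s in subs] for branch, subs in sub_sets.items()}
        for m in main_texts
    ]


def best_matches(matrix):
    """For each main text and branch, pick the index of the branch text with
    the largest value, or None when no value is positive (assumed rule)."""
    result = []
    for row in matrix:
        picks = {}
        for branch, values in row.items():
            if values and max(values) > 0:
                picks[branch] = max(range(len(values)), key=values.__getitem__)
            else:
                picks[branch] = None
        result.append(picks)
    return result


def shared_words(a, b):
    # Toy scoring function used only for this demonstration.
    return len(set(a.split()) & set(b.split()))


matrix = build_association_matrix(
    ["loan rate"],
    {"beijing": ["loan rate table", "card fee"], "tianjin": ["deposit"]},
    shared_words,
)
matches = best_matches(matrix)
```

In a fuller pipeline the `score` argument would be the target association value rather than the toy `shared_words` function.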
According to the technical scheme of this embodiment, a text set to be associated is acquired, together with the catalog information, organization information, title information, tag array information and text information of each text to be associated in that set. The text set to be associated is divided based on the catalog information and the organization information to obtain at least one organization text set, each comprising a main text set and at least one sub text set corresponding to the main text set; limiting the range over which the text association matrix is established in this way effectively saves computation cost and improves the efficiency of computing the text association matrix. For each organization text set, a target association value between each text to be associated in the main text set and each text to be associated in each sub text set is determined according to the title information, the tag array information and the text information; combining these three kinds of information improves the accuracy of the determined target association value. A text association matrix corresponding to each organization text set is then determined based on the target association value, which improves both the efficiency and the accuracy of establishing the text association matrix.
Example 2
Fig. 2 is a flowchart of a method for establishing a text association matrix according to a second embodiment of the present invention. This embodiment refines the step of determining the target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information. As shown in fig. 2, the method includes:
S210, acquiring a text set to be associated, and acquiring catalog information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated.
S220, dividing the text set to be associated based on the catalog information and the organization information to obtain at least one organization text set.
S230, for each organization text set, respectively determining a title association value, a tag array association value and a text association value according to the title information, the tag array information and the text information.
The title association value may be understood as a numerical value determined based on the title information, where the numerical value may represent the association of the title between each text to be associated in the main text set and each text to be associated in the respective sub text sets. The tag array association value may be understood as a numerical value that is determined based on the tag array information and may characterize the association of the tag array between each text to be associated in the main text set and each text to be associated in the respective sub text sets. The text association value may be understood as a numerical value that may be determined based on the text information and may characterize the association of each text to be associated in the main text set with each text to be associated in the respective sub text sets.
Optionally, the determining the title association value and the tag array association value based on the title information and the tag array information includes:
based on the title information, determining the same title character number between each text to be associated in a main text set and each text to be associated in each sub text set and the total number of title characters corresponding to the main text set, and determining the title association value according to the same title character number and the total number of title characters;
and based on the tag array information, determining the same tag number between each text to be associated in the main text set and each text to be associated in each sub text set and the total number of tags corresponding to the main text set, and determining a tag array association value according to the same tag number and the total number of tags.
The same title character number, that is, the number of identical title characters, can be understood as the number of title characters shared between a text to be associated in the main text set and a text to be associated in a sub text set. The total number of title characters may be understood as the total number of title characters of the text to be associated in the main text set.
Specifically, the formula for determining the title association value according to the number of identical title characters and the total number of title characters may be: title association value = number of identical title characters / total number of title characters.
The same tag number, that is, the number of identical tags, may be understood as the number of tags shared between a text to be associated in the main text set and a text to be associated in a sub text set. The total number of tags may be understood as the total number of tags of the text to be associated in the main text set.
Specifically, the formula for determining the tag array association value according to the number of identical tags and the total number of tags may be: tag array association value = number of identical tags / total number of tags.
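The two ratios above can be sketched as follows. The text does not say how repeated characters or repeated tags are counted, so this sketch makes two assumptions: shared title characters are counted as a multiset intersection, and tags are treated as a set. The function names are hypothetical, and empty titles or tag arrays are mapped to 0.0 to avoid division by zero.

```python
from collections import Counter


def title_association(main_title: str, sub_title: str) -> float:
    """Shared title characters divided by the main text's title length."""
    if not main_title:
        return 0.0
    shared = sum((Counter(main_title) & Counter(sub_title)).values())
    return shared / len(main_title)


def tag_association(main_tags, sub_tags) -> float:
    """Shared tags divided by the number of distinct tags of the main text."""
    main_set = set(main_tags)
    if not main_set:
        return 0.0
    return len(main_set & set(sub_tags)) / len(main_set)


title_value = title_association("abcd", "abxy")                        # 2 shared of 4
tag_value = tag_association(["loan", "retail"], ["loan", "card"])      # 1 shared of 2
```

Both values fall between 0 and 1 because the denominator is always taken from the main text, which also makes the measure asymmetric between main and sub texts.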
Optionally, in an embodiment of the present invention, specifically, the title association value may be determined by a title association calculator; the tag array association value may be determined by a tag array association calculator.
Optionally, the determining a text association value based on the text information includes:
performing vocabulary cleaning on the text information based on a preset cleaning vocabulary, and extracting keywords from the cleaned text information to obtain keyword information;
extracting word frequency of the keyword information to obtain a keyword word frequency vector;
and determining a text association value between each text to be associated in the main text set and each text to be associated in each sub text set based on the keyword word frequency vector.
The preset cleaning vocabulary can be understood as the words that need to be removed from the text information before keyword extraction. In the embodiment of the present invention, the preset cleaning vocabulary may be preset according to scenario requirements, which is not specifically limited herein. Optionally, the preset cleaning vocabulary may consist of high-frequency words that contribute nothing to the subsequent establishment of the text association matrix. For example, in a financial application scenario, the preset cleaning vocabulary may include "territory", "institution", "headquarters", "branch", "Beijing", "Tianjin", and the like. It can be appreciated that cleaning such words from the text information improves the accuracy and efficiency of establishing the text association matrix.
The keyword information can be understood as the information obtained by extracting keywords from the cleaned text information. In the embodiment of the present invention, the keyword information may be preset according to scenario requirements, which is not specifically limited herein. Optionally, the keyword information may include the number of keyword characters and the keyword content. The keyword word frequency vector can be understood as the word frequency vector obtained by extracting word frequencies from the keyword information.
In the embodiment of the present invention, the specific manner of determining the text association value between each text to be associated in the main text set and each text to be associated in each sub text set based on the keyword word frequency vector may be preset according to scenario requirements, which is not limited herein. Optionally, the keyword word frequency vector of each text to be associated in the main text set is multiplied by the keyword word frequency vector of each text to be associated in each sub text set to obtain the text association value.
Optionally, in an embodiment of the present invention, specifically, the keyword information may be extracted by a keyword extractor; the keyword word frequency vector can be extracted by a word frequency extractor; further, the text association value may be determined by a text relevance calculator.
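The cleaning, keyword extraction, word frequency and multiplication steps can be sketched as below. Several simplifications are assumed: the input is space-delimited English (Chinese text would first need a word segmentation step), every token left after cleaning is treated as a keyword, the vector multiplication is read as a dot product, and `STOP_WORDS` is only an illustrative subset of a preset cleaning vocabulary. All names are hypothetical.

```python
import re
from collections import Counter

# Illustrative subset of a preset cleaning vocabulary (assumption).
STOP_WORDS = {"territory", "institution", "headquarters", "branch", "beijing", "tianjin"}


def keyword_counts(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize, drop cleaning-vocabulary words, count the rest."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def text_association(main_text, sub_text):
    """Dot product of the two keyword word frequency vectors."""
    a, b = keyword_counts(main_text), keyword_counts(sub_text)
    return sum(a[w] * b[w] for w in a.keys() & b.keys())


text_value = text_association("Beijing branch loan rate loan", "Tianjin loan rate")
```

An unnormalized dot product grows with text length; dividing by the vector norms (cosine similarity) would be one alternative, but the source does not specify any normalization.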
S240, determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the tag array association value and the text association value.
The target association value is determined according to the title association value, the tag array association value and the text association value, and may represent a value of association between each text to be associated in the main text set and each text to be associated in each sub text set.
Optionally, the determining, according to the title association value, the tag array association value and the text association value, a target association value between each text to be associated in the main text set and each text to be associated in each sub text set includes:
respectively determining the title weight corresponding to the title association value, the tag array weight corresponding to the tag array association value and the text weight corresponding to the text association value;
and determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the title weight, the tag array association value, the tag array weight, the text association value and the text weight.
The title weight may be understood as the weight corresponding to the title association value when the target association value is determined. The tag array weight may be understood as the weight corresponding to the tag array association value when the target association value is determined. The text weight may be understood as the weight corresponding to the text association value when the target association value is determined. In the embodiment of the present invention, the title weight, the tag array weight, and the text weight may be preset according to scene requirements, which is not specifically limited herein. Optionally, the title weight, the tag array weight, and the text weight may each be 1/2, 1, 2, or the like.
In the embodiment of the invention, the tag array is a vocabulary summary provided by the knowledge maintainer and contributes strongly to determining the relevance of the texts to be associated; therefore, optionally, the tag array weight may be greater than the title weight and the text weight. Specifically, the calculation formula for determining the target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the title weight, the tag array association value, the tag array weight, the text association value and the text weight may be: target association value = tag array association value + title association value/2 + text association value/2.
Optionally, in an embodiment of the present invention, the target association value may be determined by a knowledge relevance calculator.
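The weighted combination above can be sketched in Python. The default weights reproduce the example formula from the text (tag array weight 1, title weight 1/2, text weight 1/2); the function name and parameter names are hypothetical.

```python
def target_association_value(title_value, tag_value, text_value,
                             title_weight=0.5, tag_weight=1.0, text_weight=0.5):
    """Weighted sum of the three association values.

    With the default weights this equals:
    tag array association value + title association value/2 + text association value/2
    """
    return (tag_weight * tag_value
            + title_weight * title_value
            + text_weight * text_value)


# Assumed association values for one (main text, sub text) pair.
print(target_association_value(title_value=0.5, tag_value=0.75, text_value=4))  # 3.0
```

Keeping the weights as parameters reflects the statement that they may be preset according to scene requirements.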
S250, determining a text association matrix corresponding to each organization text set based on the target association value.
According to the technical scheme of this embodiment, the title association value, the tag array association value and the text association value are respectively determined based on the title information, the tag array information and the text information, and a target association value between each text to be associated in the main text set and each text to be associated in each sub text set is then determined according to these three association values. Because the target association value combines the title information, the tag array information and the text information, the accuracy of the determined target association value is improved.
Optionally, fig. 3 is an overall flowchart of a method for establishing a text association matrix according to an embodiment of the present invention. As shown in fig. 3, the overall flow of the method for establishing the text association matrix may be:
1. Extracting knowledge information from the text set to be associated. The directory information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated are extracted.
2. Dividing the directory text sets. The knowledge information is retrieved by directory, and a knowledge set under each directory, namely a directory text set, is built.
3. Dividing the organization text sets. Each directory text set is divided by organization into a main text set, namely the head-office knowledge, and the branch knowledge of each branch organization, namely the sub text sets.
4. Extracting features of the head-office knowledge and the branch knowledge respectively. Features are extracted from the head-office knowledge and the branch knowledge to obtain a title association value, a tag array association value and a text association value between each text to be associated in the main text set and each text to be associated in each sub text set.
5. Determining the target association value between each text to be associated in the main text set and each text to be associated in each sub text set. The knowledge relevance between each piece of head-office knowledge and the branch knowledge of a given branch organization is calculated, and, for each branch, the branch knowledge with the greatest relevance to the head-office knowledge is selected as the branch characteristic knowledge of that branch corresponding to the head-office knowledge.
6. Constructing the text association matrix. The association relationship between the head-office knowledge and the branch knowledge is represented by a text association matrix.
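Steps 5 and 6 of the flow above can be sketched as follows. The layout of the matrix is an assumption made only for illustration: one row per text in the main text set, one column per sub text set, and each cell holding the identifier of the sub text with the largest target association value. The function name, the identifiers and the toy scores are hypothetical.

```python
def build_association_matrix(main_ids, sub_sets, score):
    """Assumed layout: rows are main texts, columns are sub text sets.

    Each cell holds the id of the sub text with the largest target association
    value for that main text, or None when the sub text set is empty.
    """
    matrix = []
    for main_id in main_ids:
        row = []
        for sub_ids in sub_sets.values():
            best = max(sub_ids, key=lambda sub_id: score(main_id, sub_id), default=None)
            row.append(best)
        matrix.append(row)
    return matrix


# Toy target association values for every (main text, sub text) pair.
scores = {
    ("M1", "A1"): 0.2, ("M1", "A2"): 0.9, ("M1", "B1"): 0.4,
    ("M2", "A1"): 0.7, ("M2", "A2"): 0.1, ("M2", "B1"): 0.3,
}
sub_sets = {"branch_A": ["A1", "A2"], "branch_B": ["B1"]}

matrix = build_association_matrix(["M1", "M2"], sub_sets, lambda m, s: scores[(m, s)])
print(matrix)  # [['A2', 'B1'], ['A1', 'B1']]
```

Storing the selected identifier rather than the raw score is one possible representation; storing the score itself would work equally well with the same loop.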
According to the technical scheme provided by the embodiment of the invention, based on the characteristics of head-office knowledge and branch characteristic knowledge in a commercial bank knowledge base, the scope within which relationships between head-office knowledge and branch knowledge are established is limited by the organization and the directory to which the knowledge belongs, so that the calculation cost can be effectively reduced and the efficiency of constructing the text association matrix is ensured. Meanwhile, the overall relevance is determined by combining the relevance of the knowledge titles, the relevance of the knowledge tag arrays and the relevance of the knowledge text, which ensures the accuracy of the constructed text association matrix. In addition, the existing information of the commercial bank knowledge base is fully utilized to automatically establish the association relationship between head-office knowledge and branch knowledge, namely the text association matrix, in a targeted manner, which strengthens knowledge association and makes cross-referencing between head-office knowledge and branch knowledge more convenient.
Example III
Fig. 4 is a schematic structural diagram of a device for establishing a text association matrix according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes: an information acquisition module 310, a set dividing module 320, an association value determining module 330, and a matrix establishment module 340.
The information acquisition module 310 is configured to acquire a text set to be associated, and to acquire directory information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated; the set dividing module 320 is configured to divide the text set to be associated based on the directory information and the organization information to obtain at least one organization text set, where each organization text set includes a main text set and at least one sub text set corresponding to the main text set; the association value determining module 330 is configured to determine, for each organization text set, a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information; the matrix establishment module 340 is configured to determine a text association matrix corresponding to each organization text set based on the target association value.
According to the technical scheme of this embodiment, a text set to be associated is acquired, together with the directory information, organization information, title information, tag array information and text information of each text to be associated in the set. The text set to be associated is divided based on the directory information and the organization information to obtain at least one organization text set, each of which includes a main text set and at least one sub text set corresponding to the main text set. Because the directory information and the organization information limit the range over which the text association matrix is established, the calculation cost is reduced and the calculation efficiency of the text association matrix is improved. For each organization text set, a target association value between each text to be associated in the main text set and each text to be associated in each sub text set is determined according to the title information, the tag array information and the text information; combining these three kinds of information improves the accuracy of the determined target association value. Finally, a text association matrix corresponding to each organization text set is determined based on the target association value, so that both the efficiency and the accuracy of establishing the text association matrix are improved.
Optionally, the set dividing module 320 is configured to:
dividing the text set to be associated based on the directory information to obtain at least one directory text set;
and dividing the directory text sets based on the organization information for each directory text set to obtain at least one organization text set.
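The two-stage division performed by the set dividing module can be sketched as below. The sketch assumes that each text is a dictionary with hypothetical `directory` and `organization` fields, that the main text set is the set of texts belonging to a head office, and that one organization text set is produced per directory; none of these details is fixed by the embodiment.

```python
from collections import defaultdict


def partition_texts(texts, head_office="head_office"):
    """Divide texts by directory, then by organization within each directory."""
    # Stage 1: directory text sets.
    by_directory = defaultdict(list)
    for text in texts:
        by_directory[text["directory"]].append(text)

    # Stage 2: within each directory, split into a main text set and sub text sets.
    organization_sets = {}
    for directory, members in by_directory.items():
        main_set = [t for t in members if t["organization"] == head_office]
        sub_sets = defaultdict(list)
        for t in members:
            if t["organization"] != head_office:
                sub_sets[t["organization"]].append(t)
        organization_sets[directory] = {"main": main_set, "subs": dict(sub_sets)}
    return organization_sets


texts = [
    {"id": "M1", "directory": "loans", "organization": "head_office"},
    {"id": "A1", "directory": "loans", "organization": "branch_A"},
    {"id": "B1", "directory": "loans", "organization": "branch_B"},
    {"id": "M2", "directory": "cards", "organization": "head_office"},
]
result = partition_texts(texts)
print(sorted(result["loans"]["subs"]))  # ['branch_A', 'branch_B']
```

Because association values are later computed only within one organization text set, texts from different directories are never compared, which is the source of the cost saving described in this embodiment.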
Optionally, the association value determining module 330 includes: a preliminary association value determining submodule and a target association value determining submodule.
The preliminary association value determining submodule is used for respectively determining a title association value, a tag array association value and a text association value based on the title information, the tag array information and the text information;
the target association value determining submodule is used for determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the tag array association value and the text association value.
Optionally, the preliminary association value determining submodule includes: a title association value determining unit and a tag array association value determining unit.
The title association value determining unit is used for determining, based on the title information, the number of identical title characters between each text to be associated in the main text set and each text to be associated in each sub text set, as well as the total number of title characters corresponding to the main text set, and for determining the title association value according to the number of identical title characters and the total number of title characters;
the tag array association value determining unit is used for determining, based on the tag array information, the number of identical tags between each text to be associated in the main text set and each text to be associated in each sub text set, as well as the total number of tags corresponding to the main text set, and for determining the tag array association value according to the number of identical tags and the total number of tags.
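The two units above can be sketched in Python. The text states only that each value is determined "according to" the number of identical items and the total number of items of the main text; the sketch assumes the value is their ratio, counts identical title characters as a multiset intersection, and treats tags as a set. These choices and the function names are hypothetical.

```python
from collections import Counter


def title_association_value(main_title, sub_title):
    """Assumed ratio: identical title characters / total title characters of the main text."""
    if not main_title:
        return 0.0
    # Multiset intersection keeps the smaller count of each shared character.
    shared = sum((Counter(main_title) & Counter(sub_title)).values())
    return shared / len(main_title)


def tag_association_value(main_tags, sub_tags):
    """Assumed ratio: identical tags / total tags of the main text."""
    main_set = set(main_tags)
    if not main_set:
        return 0.0
    return len(main_set & set(sub_tags)) / len(main_set)


print(title_association_value("abcd", "abxy"))  # 0.5
print(tag_association_value(["loan", "rate", "fee", "card"], ["loan", "rate", "fee"]))  # 0.75
```

Dividing by the totals of the main text keeps both values in the range 0 to 1, which makes them comparable before weighting.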
Optionally, the preliminary association value determining submodule includes: a keyword information determining unit, a word frequency extracting unit and a text association value determining unit.
The keyword information determining unit is used for carrying out vocabulary cleaning on the text information based on a preset cleaning vocabulary, and extracting keywords from the cleaned text information to obtain keyword information;
the word frequency extraction unit is used for extracting word frequency of the keyword information to obtain a keyword word frequency vector;
the text association value determining unit is used for determining text association values between each text to be associated in the main text set and each text to be associated in each sub text set based on the keyword word frequency vector.
Optionally, the target association value determining submodule is configured to:
Respectively determining the title weight corresponding to the title association value, the tag array weight corresponding to the tag array association value and the text weight corresponding to the text association value;
and determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the title weight, the tag array association value, the tag array weight, the text association value and the text weight.
Optionally, the tag array weight is greater than the title weight and the text weight.
The text association matrix establishing device provided by the embodiment of the invention can execute the text association matrix establishing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Example IV
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, for example, the method of establishing the text association matrix.
In some embodiments, the method of establishing a text association matrix may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the above-described text-association matrix establishment method may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method of establishing the text association matrix in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, which is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. The method for establishing the text association matrix is characterized by comprising the following steps of:
acquiring a text set to be associated, and acquiring directory information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated;
dividing the text set to be associated based on the directory information and the organization information to obtain at least one organization text set, wherein each organization text set comprises a main text set and at least one sub text set corresponding to the main text set;
determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information;
and determining a text association matrix corresponding to each organization text set based on the target association value.
2. The method of claim 1, wherein the dividing the set of text to be associated based on the directory information and the organization information to obtain at least one set of organization text comprises:
dividing the text set to be associated based on the directory information to obtain at least one directory text set;
and dividing the directory text sets based on the organization information for each directory text set to obtain at least one organization text set.
3. The method of claim 1, wherein the determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information comprises:
determining a title association value, a tag array association value and a text association value based on the title information, the tag array information and the text information, respectively;
and determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the tag array association value and the text association value.
4. The method of claim 3, wherein the determining a title association value and a tag array association value based on the title information and the tag array information comprises:
based on the title information, determining the same title character number between each text to be associated in a main text set and each text to be associated in each sub text set and the total number of title characters corresponding to the main text set, and determining the title association value according to the same title character number and the total number of title characters;
and based on the tag array information, determining the same tag number between each text to be associated in the main text set and each text to be associated in each sub text set and the total number of tags corresponding to the main text set, and determining a tag array association value according to the same tag number and the total number of tags.
5. The method of claim 3, wherein the determining a text association value based on the text information comprises:
performing vocabulary cleaning on the text information based on a preset cleaning vocabulary, and extracting keywords from the cleaned text information to obtain keyword information;
extracting word frequency of the keyword information to obtain a keyword word frequency vector;
and determining a text association value between each text to be associated in the main text set and each text to be associated in each sub text set based on the keyword word frequency vector.
6. The method of claim 3, wherein the determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the tag array association value and the text association value comprises:
respectively determining the title weight corresponding to the title association value, the tag array weight corresponding to the tag array association value and the text weight corresponding to the text association value;
and determining a target association value between each text to be associated in the main text set and each text to be associated in each sub text set according to the title association value, the title weight, the tag array association value, the tag array weight, the text association value and the text weight.
7. The method of claim 6, wherein the tag array weight is greater than the title weight and the text weight.
8. A text association matrix establishing device, comprising:
the information acquisition module is used for acquiring a text set to be associated and acquiring directory information, organization information, title information, tag array information and text information of each text to be associated in the text set to be associated;
the set dividing module is used for dividing the text set to be associated based on the directory information and the organization information to obtain at least one organization text set, wherein each organization text set comprises a main text set and at least one sub text set corresponding to the main text set;
the association value determining module is used for determining a target association value of each text to be associated in the main text set and each text to be associated in each sub text set according to the title information, the tag array information and the text information;
and the matrix establishment module is used for determining a text association matrix corresponding to each organization text set based on the target association value.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of establishing a text association matrix as claimed in any one of claims 1 to 7.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the method of establishing a text association matrix according to any one of claims 1-7 when executed.
CN202310317594.8A 2023-03-29 2023-03-29 Text association matrix establishment method and device, electronic equipment and storage medium Pending CN116340518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310317594.8A CN116340518A (en) 2023-03-29 2023-03-29 Text association matrix establishment method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310317594.8A CN116340518A (en) 2023-03-29 2023-03-29 Text association matrix establishment method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116340518A true CN116340518A (en) 2023-06-27

Family

ID=86880263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310317594.8A Pending CN116340518A (en) 2023-03-29 2023-03-29 Text association matrix establishment method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116340518A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194615A (en) * 2023-11-02 2023-12-08 国网浙江省电力有限公司 Enterprise compliance data processing method and platform
CN117194615B (en) * 2023-11-02 2024-02-20 国网浙江省电力有限公司 Enterprise compliance data processing method and platform

Similar Documents

Publication Publication Date Title
CN113590645B (en) Searching method, searching device, electronic equipment and storage medium
CN112559631B (en) Data processing method and device of distributed graph database and electronic equipment
CN113988157B (en) Semantic retrieval network training method and device, electronic equipment and storage medium
CN112989235A (en) Knowledge base-based internal link construction method, device, equipment and storage medium
CN116340518A (en) Text association matrix establishment method and device, electronic equipment and storage medium
CN115145924A (en) Data processing method, device, equipment and storage medium
CN113191145B (en) Keyword processing method and device, electronic equipment and medium
CN113408280A (en) Negative example construction method, device, equipment and storage medium
CN116340831B (en) Information classification method and device, electronic equipment and storage medium
CN112887426B (en) Information stream pushing method and device, electronic equipment and storage medium
CN112528644B (en) Entity mounting method, device, equipment and storage medium
CN116628167B (en) Response determination method and device, electronic equipment and storage medium
CN114422584B (en) Method, device and storage medium for pushing resources
CN115511014B (en) Information matching method, device, equipment and storage medium
CN117271840B (en) Data query method and device of graph database and electronic equipment
CN116089459B (en) Data retrieval method, device, electronic equipment and storage medium
CN113377922B (en) Method, device, electronic equipment and medium for matching information
CN118132550A (en) Structured large field data query method and device and electronic equipment
CN118012936A (en) Data extraction method, device, equipment and storage medium
CN117435686A (en) Negative example sample construction method, commodity searching method, device and electronic equipment
CN116306964A (en) Sample data generation method and device
CN114328855A (en) Document query method and device, electronic equipment and readable storage medium
CN118035445A (en) Work order classification method and device, electronic equipment and storage medium
CN116431809A (en) Text labeling method, device and storage medium based on bank customer service scene
CN117611290A (en) Method, device, equipment and storage medium for ordering merchant nodes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination