CN117851373A - Knowledge document hierarchical management method, storage medium and management system - Google Patents


Info

Publication number
CN117851373A
Authority
CN
China
Prior art keywords: text, knowledge, document, training, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410266293.1A
Other languages
Chinese (zh)
Other versions
CN117851373B (en)
Inventor
张晓亮
苏贤
曹荣来
贲余刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Shuce Information Technology Co ltd
Original Assignee
Nanjing Shuce Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Shuce Information Technology Co ltd filed Critical Nanjing Shuce Information Technology Co ltd
Priority to CN202410266293.1A
Publication of CN117851373A
Application granted
Publication of CN117851373B
Legal status: Active


Classifications

    • G06F — Electric digital data processing (G Physics › G06 Computing; calculating or counting)
        • G06F16/00 Information retrieval; database structures therefor; file system structures therefor
            • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
            • G06F16/113 Details of archiving (under G06F16/11 File system administration)
            • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
            • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
        • G06F40/166 Editing, e.g. inserting or deleting (under G06F40/00 Handling natural language data › G06F40/10 Text processing)
    • G06N — Computing arrangements based on specific computational models
        • G06N20/00 Machine learning
        • G06N3/02 Neural networks; G06N3/08 Learning methods
        • G06N5/04 Inference or reasoning models


Abstract

The application provides a knowledge document hierarchical management method, a storage medium, and a management system. A document management coding feature is generated from an initial knowledge document and constraint information; a dimension reduction operation is then performed on the initial knowledge document to obtain a mapped knowledge text. A confocal text corresponding to the initial knowledge document is generated from the document management coding feature, the mapped knowledge text, and a target hierarchy mask, and the mapped knowledge text is content-adjusted according to the document management coding feature and the confocal text to obtain an adjusted knowledge document. Because the adjusted knowledge document fuses the target-level features of the enterprise knowledge base in the initial knowledge document with the constraint information, the enterprise knowledge base content is well maintained while the target-level information disclosure features of the enterprise knowledge base are realized under the constraint information.

Description

Knowledge document hierarchical management method, storage medium and management system
Technical Field
The present application relates to the field of natural language processing and machine learning, and in particular, to a knowledge document hierarchical management method, a storage medium, and a management system.
Background
With the rapid development of information technology and the growing scale of enterprises, knowledge documents, as important enterprise assets, play a vital role in enterprise management. These documents usually contain key information such as an enterprise's core knowledge, business processes, product designs, market analyses, and management policies, and are an integral part of enterprise operation. However, conventional knowledge document management methods face a number of challenges. First, existing document management systems often lack careful consideration of the document hierarchy, making it difficult to exploit the hierarchical features of a document during storage, retrieval, and use. This not only reduces the efficiency of document utilization but may also lead to the omission or misuse of critical information. Second, conventional methods tend to ignore the importance of constraint information when processing documents. Constraint information refers to specific rules or conditions used to limit or guide the disclosure of document information. Lacking an effective mechanism for processing constraint information, existing document management methods struggle to ensure the security and consistency of documents during sharing and use. In addition, as enterprise knowledge bases keep growing, efficiently processing and maintaining large numbers of documents is itself an open problem: traditional methods are inefficient on large-scale document sets and can hardly meet the practical needs of enterprises. A novel knowledge document management method is therefore urgently needed to improve document utilization efficiency, ensure information security, and meet practical enterprise demands.
Disclosure of Invention
The invention aims to provide a knowledge document hierarchical management method, a storage medium and a management system. The embodiment of the application is realized in the following way:
in a first aspect, an embodiment of the present application provides a knowledge document hierarchical management method, applied to an enterprise management system, where the method includes: generating a document management coding feature according to an initial knowledge document and constraint information, wherein the document management coding feature comprises a description vector of the initial knowledge document and a description vector of the constraint information; the initial knowledge document is a text containing target-level features of the enterprise knowledge base, and the constraint information is information for constraining the target-level information disclosure features of the enterprise knowledge base; performing a dimension reduction operation on the initial knowledge document to obtain a mapped knowledge text, wherein the mapped knowledge text is a text with dimensions smaller than the initial knowledge document that maintains the target-level features of the enterprise knowledge base; generating a confocal text corresponding to the initial knowledge document according to the mapped knowledge text, the document management coding feature, and a target hierarchy mask, wherein the target hierarchy mask is used for distinguishing the target-level text in the initial knowledge document from the other-level texts, and the confocal text is used for generating, in the target-level text, disclosure information corresponding to the target-level information disclosure features of the enterprise knowledge base; and performing content adjustment on the mapped knowledge text according to the document management coding feature and the confocal text to obtain an adjusted knowledge document, wherein the adjusted knowledge document includes the target-level features of the initial knowledge document and also includes the target-level information disclosure features of the enterprise knowledge base constrained by the constraint information.
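The four steps of the first aspect can be sketched as follows. All callables (`encode`, `reduce_dim`, `gen_confocal`, `adjust`) are hypothetical stand-ins for the trained sub-networks described in the text, and the vectors are placeholders for text representations — a sketch of the data flow, not an implementation.

```python
import numpy as np

def hierarchical_manage(doc_vec, constraint_vec, target_mask,
                        encode, reduce_dim, gen_confocal, adjust):
    """Sketch of the four-step method of the first aspect.  Each callable
    stands in for a trained sub-network; the interfaces are hypothetical."""
    # Step 1: document management coding feature (description vectors of
    # the initial knowledge document and the constraint information).
    coding_feature = encode(doc_vec, constraint_vec)
    # Step 2: dimension reduction, keeping the target-level features.
    mapped_text = reduce_dim(doc_vec)
    # Step 3: confocal text, with non-target levels masked out by the
    # target hierarchy mask (1 = target level, 0 = other levels).
    confocal = gen_confocal(mapped_text, coding_feature) * target_mask
    # Step 4: content adjustment conditioned on the coding feature and
    # the confocal text.
    return adjust(mapped_text, coding_feature, confocal)
```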
Optionally, the performing content adjustment on the mapped knowledge text according to the document management coding feature and the confocal text to obtain an adjusted knowledge document includes: giving s rounds of disturbance information to the mapped knowledge text based on a first machine learning model to obtain a disturbed text, where s ≥ 1; based on the first machine learning model, performing s rounds of disturbance information inference and disturbance cleaning on the disturbed text according to the document management coding feature and the confocal text to obtain a cleaned text; and restoring the cleaned text to the dimensions of the initial knowledge document to generate the adjusted knowledge document.
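The s-round disturbance step resembles the forward pass of a denoising scheme. A minimal sketch, assuming additive Gaussian noise (an assumption — the text does not fix the form of the disturbance information):

```python
import numpy as np

def perturb(mapped_text, s, rng):
    """Give s rounds of disturbance information to the mapped knowledge
    text, modelled here as additive Gaussian noise (assumption)."""
    x = mapped_text.copy()
    noise_log = []                 # keep each round's noise for training
    for _ in range(s):
        eps = rng.standard_normal(x.shape)
        noise_log.append(eps)
        x = x + eps                # give the round's disturbance
    return x, noise_log
```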
Optionally, the performing s rounds of disturbance information inference and disturbance cleaning on the disturbed text based on the first machine learning model according to the document management coding feature and the confocal text to obtain a cleaned text includes: based on the first machine learning model, inferring the disturbance information given in the i-th round to the mapped knowledge text according to the document management coding feature and the confocal text to obtain i-th inferred disturbance information, where 1 ≤ i ≤ s; clearing the i-th inferred disturbance information from the current text with disturbance information to obtain an adjusted text with disturbance information, the initial text with disturbance information being the disturbed text; when i < s, taking i+1 as the adjusted i and iterating the operation of inferring the disturbance information given in the i-th round; and when i = s, taking the adjusted text with disturbance information as the cleaned text.
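The iterative inference-and-cleaning loop can be sketched as follows; `infer_noise` is a hypothetical interface standing in for the first machine learning model, conditioned on the coding feature and the confocal text as described:

```python
import numpy as np

def clean(perturbed_text, s, infer_noise, coding_feature, confocal_text):
    """Iterative disturbance inference and cleaning, rounds i = 1..s.
    `infer_noise` is a hypothetical stand-in for the first ML model."""
    x = perturbed_text
    for i in range(1, s + 1):
        # infer the disturbance information given in the i-th round
        eps_hat = infer_noise(x, i, coding_feature, confocal_text)
        # clear the i-th inferred disturbance from the text
        x = x - eps_hat
    return x  # the cleaned text once i == s
```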
Optionally, the generating the document management coding feature according to the initial knowledge document and the constraint information includes: extracting the target-level features of the enterprise knowledge base contained in the initial knowledge document based on a knowledge base detection network; performing embedded mapping on the constraint information based on a constraint information network to obtain a coding result; and generating the document management coding feature according to the target-level features of the enterprise knowledge base and the coding result. The constraint information comprises one or more constraint items, and the coding result comprises coding vectors respectively corresponding to the one or more constraint items; the generating the document management coding feature according to the target-level features of the enterprise knowledge base and the coding result includes: combining the coding vector corresponding to a first constraint item among the one or more constraint items with the target-level features of the enterprise knowledge base to obtain an adjusted coding vector corresponding to the first constraint item, the first constraint item being a preset constraint item used for representing the enterprise knowledge base; processing the adjusted coding vector corresponding to the first constraint item with a feedforward neural network to obtain a target coding vector corresponding to the first constraint item, the target coding vector and the coding vector corresponding to the first constraint item having consistent dimensions; and replacing, in the coding result, the coding vector corresponding to the first constraint item with the target coding vector corresponding to the first constraint item to generate the document management coding feature.
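The constraint-item replacement step can be sketched as follows, assuming vector concatenation for "combining" and a one-layer feedforward projection (`W` and `b` are hypothetical parameters); the output dimension matches the original coding vector, as the text requires:

```python
import numpy as np

def build_coding_feature(kb_feature, constraint_vecs, first_idx, W, b):
    """Combine the first constraint item's coding vector with the
    knowledge-base target-level features, project it with a one-layer
    feedforward network, and write the result back in place.
    W, b are hypothetical feedforward parameters (assumption)."""
    d = constraint_vecs[first_idx].shape[0]
    combined = np.concatenate([constraint_vecs[first_idx], kb_feature])
    target_vec = np.tanh(W @ combined + b)   # feedforward projection
    assert target_vec.shape[0] == d          # consistent dimensions
    out = [v.copy() for v in constraint_vecs]
    out[first_idx] = target_vec              # replace in the coding result
    return out
```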
Optionally, the extracting, based on the knowledge base detection network, the target-level features of the enterprise knowledge base contained in the initial knowledge document includes: extracting the target-level text from the initial knowledge document to obtain the target-level text corresponding to the initial knowledge document; and extracting the enterprise knowledge base target-level features from the target-level text based on the knowledge base detection network.
Optionally, the generating the confocal text corresponding to the initial knowledge document according to the mapped knowledge text, the document management coding feature, and the target hierarchy mask includes: generating an initial confocal text corresponding to the initial knowledge document according to the mapped knowledge text and the document management coding feature based on a first machine learning model; and shielding the other-level texts except the target-level text in the initial confocal text according to the target hierarchy mask to obtain the confocal text corresponding to the initial knowledge document.
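The shielding operation, read as element-wise gating by the target hierarchy mask (an assumption — the text only says "shielding"), can be illustrated as:

```python
import numpy as np

def shield(initial_confocal, target_mask):
    """Shield (zero out) every position outside the target hierarchy.
    Element-wise gating is one plausible reading of "shielding"."""
    return initial_confocal * target_mask

# 1 marks target-level positions, 0 the other hierarchy levels.
mask = np.array([1.0, 1.0, 0.0, 0.0])
confocal = shield(np.array([0.9, 0.4, 0.7, 0.2]), mask)
# only the target-level positions survive in the confocal text
```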
Optionally, the method is performed by a document management neural network, and the method further includes a debugging process of the document management neural network, including: acquiring one or more pieces of debugging data of the document management neural network, wherein each piece of debugging data comprises a pair of an associated training knowledge document and training constraint information; the training knowledge document is a text containing target-level features of an enterprise knowledge base, and the training constraint information is information for constraining the target-level information disclosure features of the enterprise knowledge base in the training knowledge document; generating a training document management coding feature based on the document management neural network according to the training knowledge document and the training constraint information, wherein the training document management coding feature comprises a description vector of the training knowledge document and a description vector of the training constraint information; performing a dimension reduction operation on the training knowledge document based on the document management neural network to obtain a training mapped knowledge text, wherein the training mapped knowledge text is a text with dimensions smaller than the training knowledge document that maintains the target-level features of the enterprise knowledge base; generating a confocal text corresponding to the training knowledge document based on the document management neural network according to the training mapped knowledge text, the training document management coding feature, and a training target hierarchy mask, wherein the training target hierarchy mask is used for distinguishing the target-level text in the training knowledge document from the other-level texts, and the confocal text is used for generating, in the target-level text, disclosure information corresponding to the target-level information disclosure features of the enterprise knowledge base; performing content adjustment on the training mapped knowledge text based on the document management neural network according to the training document management coding feature and the confocal text to generate an adjusted knowledge document corresponding to the training knowledge document; and optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain a debugged document management neural network.
Optionally, the performing content adjustment on the training mapped knowledge text based on the document management neural network according to the training document management coding feature and the confocal text to generate an adjusted knowledge document corresponding to the training knowledge document includes: giving s rounds of disturbance information to the training mapped knowledge text based on a first machine learning model included in the document management neural network to obtain a training disturbed text, where s ≥ 1; based on the first machine learning model, performing s rounds of disturbance information inference and disturbance cleaning on the training disturbed text according to the training document management coding feature and the confocal text to obtain a training cleaned text; and restoring the training cleaned text to the dimensions of the training knowledge document to generate the adjusted knowledge document.
Optionally, the performing s rounds of disturbance information inference and disturbance cleaning on the training disturbed text based on the first machine learning model according to the training document management coding feature and the confocal text to obtain a training cleaned text includes: based on the first machine learning model, inferring the disturbance information given in the i-th round to the training mapped knowledge text according to the training document management coding feature and the confocal text to obtain i-th inferred disturbance information, where 1 ≤ i ≤ s; clearing the i-th inferred disturbance information from the current training text with disturbance information to obtain an adjusted training text with disturbance information, the initial training text with disturbance information being the training disturbed text; when i < s, taking i+1 as the adjusted i and iterating the operation of inferring the disturbance information given in the i-th round; and when i = s, taking the adjusted training text with disturbance information as the training cleaned text.
Optionally, the generating a training document management coding feature based on the document management neural network according to the training knowledge document and the training constraint information includes: extracting the target-level features of the enterprise knowledge base contained in the training knowledge document based on a knowledge base detection network; embedding and mapping the training constraint information based on a constraint information network included in the document management neural network to obtain a training coding result; and generating the training document management coding feature according to the target-level features of the enterprise knowledge base and the training coding result. The training constraint information comprises one or more constraint items, and the training coding result comprises coding vectors respectively corresponding to the one or more constraint items; the generating the training document management coding feature according to the target-level features of the enterprise knowledge base and the training coding result includes: combining the coding vector corresponding to a first constraint item among the one or more constraint items with the target-level features of the enterprise knowledge base to obtain an adjusted coding vector corresponding to the first constraint item, the first constraint item being a preset constraint item used for representing the enterprise knowledge base; processing the adjusted coding vector corresponding to the first constraint item with a feedforward neural network to obtain a target coding vector corresponding to the first constraint item, the target coding vector and the coding vector corresponding to the first constraint item having consistent dimensions; and replacing, in the training coding result, the coding vector corresponding to the first constraint item with the target coding vector corresponding to the first constraint item to generate the training document management coding feature.
Optionally, the generating, based on the document management neural network, the confocal text corresponding to the training knowledge document according to the training mapped knowledge text, the training document management coding feature, and the training target hierarchy mask includes: generating an initial confocal text corresponding to the training knowledge document according to the training mapped knowledge text and the training document management coding feature based on a first machine learning model included in the document management neural network; and shielding the other-level texts except the target-level text in the initial confocal text according to the training target hierarchy mask to obtain the confocal text corresponding to the training knowledge document. The optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain a debugged document management neural network includes: determining a fusion error according to the training knowledge document and the adjusted knowledge document, wherein the fusion error comprises a disturbance information error, a document content error, and a hierarchical text distribution error; the disturbance information error indicates the error between the given disturbance information and the inferred disturbance information, the document content error indicates the error between the description vector representations of the training knowledge document and the adjusted knowledge document in the target-level text, and the hierarchical text distribution error indicates the adjusting effect of the confocal text in the target-level text; and optimizing the network configuration variables of the document management neural network according to the fusion error to obtain the debugged document management neural network.
Optionally, the determining a fusion error according to the training knowledge document and the adjusted knowledge document includes: determining the disturbance information error according to the disturbance information given during the content adjustment of the training mapped knowledge text and the inferred disturbance information; determining the document content error according to the training knowledge document and the adjusted knowledge document; determining the hierarchical text distribution error according to the training target hierarchy mask and the confocal text; and determining the fusion error according to the disturbance information error, the document content error, and the hierarchical text distribution error. The determining the document content error according to the training knowledge document and the adjusted knowledge document includes: extracting the target-level features of the training knowledge document based on a knowledge base detection network to obtain first target-level features; extracting the target-level features of the adjusted knowledge document based on the knowledge base detection network to obtain second target-level features; and determining the document content error according to the commonality measurement result between the first target-level features and the second target-level features.
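One plausible reading of the fusion error is a weighted sum of the three named terms, with the document content error derived from a cosine commonality measurement between the two feature vectors; the per-term formulas and the weights below are assumptions, since the text only names the terms:

```python
import numpy as np

def cosine(a, b):
    """Commonality measurement between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fusion_error(given_noise, inferred_noise,
                 feat_train, feat_adjusted,
                 mask, confocal, w=(1.0, 1.0, 1.0)):
    """Weighted sum of the three error terms (weights are assumptions)."""
    # disturbance information error: gap between given and inferred noise
    e_noise = float(np.mean((given_noise - inferred_noise) ** 2))
    # document content error: 1 - commonality between the target-level
    # features of the training and adjusted documents
    e_content = 1.0 - cosine(feat_train, feat_adjusted)
    # hierarchical text distribution error: deviation of the confocal
    # text from the target hierarchy mask
    e_dist = float(np.mean((confocal - mask) ** 2))
    return w[0] * e_noise + w[1] * e_content + w[2] * e_dist
```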
In a second aspect, the present application provides a computer readable storage medium having a computer program stored thereon, which when run on a processor causes the processor to perform the method described above.
In a third aspect, the present application provides an enterprise management system comprising: one or more processors; a memory; one or more computer programs; wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs, when executed by the processors, implement the methods as described above.
The application has at least the following beneficial effects. According to the knowledge document hierarchical management method, storage medium, and management system, a document management coding feature is generated based on the initial knowledge document and the constraint information; a dimension reduction operation is then performed on the initial knowledge document to obtain a mapped knowledge text; a confocal text corresponding to the initial knowledge document is generated according to the document management coding feature, the mapped knowledge text, and the target hierarchy mask; and the mapped knowledge text is content-adjusted according to the document management coding feature and the confocal text to obtain an adjusted knowledge document. The target-level information disclosure features of the enterprise knowledge base are thereby realized on the basis of fusing the target-level features of the enterprise knowledge base in the initial knowledge document with the constraint information, so that the commonality measurement result between the enterprise knowledge base in the generated adjusted knowledge document and that in the initial knowledge document is high, i.e., the knowledge base content is well maintained through the adjustment.
In the following description, other features will be partially set forth. Upon review of the ensuing disclosure and the accompanying figures, those skilled in the art will in part discover these features or will be able to ascertain them through production or use thereof. The features of the present application may be implemented and obtained by practicing or using the various aspects of the methods, tools, and combinations that are set forth in the detailed examples described below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a flowchart of a knowledge document hierarchical management method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of an enterprise management system according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. The terminology used in the description of the embodiments of the application is for the purpose of describing particular embodiments of the application only and is not intended to be limiting of the application.
In the embodiment of the present application, the executing body of the knowledge document hierarchical management method is a computer device (i.e., an enterprise management system), which may be, but is not limited to, a server, a personal computer, a notebook computer, a tablet computer, a smart phone, and the like. Computer devices include user devices and network devices. User devices include, but are not limited to, computers, smart phones, tablets, etc.; network devices include, but are not limited to, a single network server, a server group of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a collection of loosely coupled computers. The computer device can implement the application by running alone, or by accessing a network and interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
The embodiment of the application provides a knowledge document hierarchical management method, which is applied to an enterprise management system, as shown in fig. 1, and comprises the following steps:
step S10: generating a document management coding feature according to the initial knowledge document and the constraint information, wherein the document management coding feature comprises a description vector of the initial knowledge document and a description vector of the constraint information; the initial knowledge document is a text containing target level characteristics of the enterprise knowledge base, and the constraint information is information for constraining the target level information disclosure characteristics of the enterprise knowledge base.
Here, the initial knowledge document refers to an original text file containing specific-level information in the enterprise knowledge base. It may contain various types of data, such as text, numbers, and charts, reflecting the knowledge content of the enterprise at a certain level. For example, assume that an enterprise's knowledge base stores detailed information about its different product lines; one initial knowledge document may be a detailed description of the "smartphone product line," including the various models, functional features, target markets, and so on. Constraint information is a set of rules or conditions used to limit or guide how information is disclosed. It may relate to the confidentiality, sensitivity, and privacy protection of information, ensuring that enterprise policies or regulations are not violated when information is shared or distributed. For example, in the smartphone product line document, the constraint information may indicate that certain technical details are proprietary and cannot be disclosed externally, or that some sales data is sensitive and can only be shared among internal senior staff. Document management coding features are a set of features extracted from a document by a specific algorithm (e.g., a neural network model such as BERT) for classification, retrieval, and management of the document. These features represent the content and structural information of the document in vector form. For example, for the smartphone product line document, the document management coding features may include keywords mentioned in the document (e.g., "smartphone," "camera resolution"), as well as the length, format, and creation time of the document. Such features help the system quickly identify and locate documents. A description vector is the result of converting text or other data into a mathematically operable vector form.
Description vectors are commonly used in natural language processing and machine learning tasks so that algorithms can understand and process text information. In step S10, both the initial knowledge document and the constraint information may be converted into description vectors. For example, for the phrase "the smartphone has a high-definition camera," the description vector may include numerical representations of words such as "smartphone," "high-definition," and "camera," reflecting the importance or relevance of these words in the text.

Target level features refer to information features present at a particular level in the enterprise knowledge base. Different levels of information may differ in importance, level of detail, access rights, and so on. For example, in a multi-level organization, higher-level management may focus on macro information (high-level features) such as strategic direction and financial performance, while base-level employees may focus more on detailed information (low-level features) such as specific operational flows and task assignments. In the smartphone product line document, the target level features may indicate the information presentation emphasis for a particular user group (e.g., technology enthusiasts, general consumers, etc.).

An information disclosure feature refers to a characteristic or attribute that information has when it is disclosed or shared. It may relate to the accuracy, integrity, and timeliness of the information, as well as how the information is presented to different audiences. For example, in a smartphone product line document, the disclosure features may include determining which information may be disclosed to consumers (e.g., product specifications, prices, etc.) and which information must be kept internal (e.g., production costs, supply chain details, etc.). These decisions are based on factors such as the enterprise's marketing strategy, competitive situation, and regulatory requirements.
Step S10 involves processing the initial knowledge document and constraint information to generate document management coding features. First, the enterprise management system receives the initial knowledge document, which is text containing the target level features of the enterprise knowledge base. The target level features may include the categories, importance, security levels, etc. of documents, which are critical to subsequent document management and layering. The content of the initial knowledge document may encompass various information of the enterprise, such as policy documents, project reports, technical documents, and the like. At the same time, the enterprise management system also needs to receive constraint information, which is used to limit the disclosure features of the enterprise knowledge base target level information. Constraint information may be rules, conditions or labels provided in text form to specify which information may be disclosed, to what extent, and what form of hiding or desensitization is required. For example, constraint information may require some sensitive information to be obscured or completely hidden when disclosed. Upon receiving the initial knowledge document and constraint information, the enterprise management system will process the information using a particular algorithm or model. The purpose of the processing is to convert the text information into a computable vector representation for subsequent document management and retrieval operations. Such vector representations can capture the semantic and structural features of documents, such that similar documents are close to each other in vector space.
In particular, the enterprise management system may use, for example, word embedding models (e.g., Word2Vec, GloVe, etc.) or deep learning models (e.g., convolutional neural network CNN, recurrent neural network RNN, etc.) to convert the initial knowledge document and constraint information into description vectors. These models can map words or phrases in text into a high-dimensional vector space, capturing their semantic information. By training these models, the enterprise management system can learn the underlying patterns and regularities of documents and constraint information. Finally, step S10 outputs document management coding features including a description vector of the initial knowledge document and a description vector of the constraint information. These features will be used as inputs for subsequent steps to guide operations such as dimension reduction and content adjustment of the document, ensuring that the finally generated knowledge document meets both the hierarchical management requirements of the enterprise and the constraints of information disclosure.
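As a minimal illustrative sketch, and not the trained embedding models named above, the following Python code stands in for Word2Vec/GloVe by hashing each word into a fixed-length bag-of-words vector, so that similar wording yields similar description vectors. The `describe` function and the example phrases are hypothetical:

```python
import hashlib

def describe(text: str, dims: int = 8) -> list[float]:
    """Map a text to a fixed-length description vector.

    Stand-in for a trained embedding model (e.g. Word2Vec/GloVe):
    each word is hashed into one of `dims` buckets and the counts are
    L1-normalised, so texts with similar wording get similar vectors.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

# Description vectors for a document and a constraint item, then their
# concatenation as a toy "document management coding feature".
doc_vec = describe("smart phone has a high definition camera")
constraint_vec = describe("technical details are proprietary, do not disclose")
coding_feature = doc_vec + constraint_vec
```

In a real deployment the two description vectors would come from a trained model; the concatenation at the end mirrors how step S10 combines the document vector and the constraint vector into one coding feature.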
Step S20: and performing dimension reduction operation on the initial knowledge document to obtain a mapped knowledge text, wherein the mapped knowledge text is a text with dimension smaller than that of the initial knowledge document, and simultaneously maintaining the target level characteristics of the enterprise knowledge base.
Step S20 involves performing a dimension reduction operation on the initial knowledge document to obtain a mapped knowledge text. When the enterprise management system performs step S20, the initial knowledge document is processed to reduce its dimension. Dimension reduction is a commonly used data processing technique that aims to reduce the complexity of data while retaining its critical information. In this process, the enterprise management system will map the initial knowledge document from the original high-dimensional space to a low-dimensional space using a specific algorithm or model to obtain mapped knowledge text. The mapping knowledge text is a text representation in a low-dimensional space obtained by dimension reduction processing. It is the result of mapping the original text data to a particular vector space, which typically has fewer dimensions, but is still able to retain the critical information of the original text. For example, in processing a document for a smartphone, the mapped knowledge text may be a vector representation composed of several key features (e.g., brand, operating system, screen size, etc.) that can concisely describe the primary content of the document.
This mapping process is based on a specific vector space and probability distribution. The vector space is a mathematical space formed by vectors and used to represent text information in a document. The probability distribution describes the distribution of these vectors in space, reflecting the association and similarity between text messages. By mapping, the enterprise management system can represent the information in the initial knowledge document in a more compact, easier to process form.
In the dimension reduction operation, the enterprise management system may employ methods such as principal component analysis (PCA), singular value decomposition (SVD), or an autoencoder in deep learning. These methods can identify and extract key features in the document and remove redundant information, thereby achieving the purpose of dimension reduction. For example, an autoencoder is a neural network model that learns to encode input data into a low-dimensional representation and is capable of reconstructing the original data from that low-dimensional representation. In step S20, the autoencoder may be used to encode the initial knowledge document into the mapped knowledge text.
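To make the idea concrete, the sketch below reduces a document vector's dimension with a seeded random projection, used here as a lightweight stand-in for PCA/SVD/an autoencoder (random projections approximately preserve pairwise distances, so nearby documents stay nearby). The `reduce_dim` function and the toy vector are illustrative assumptions:

```python
import random

def reduce_dim(vector: list[float], out_dim: int, seed: int = 0) -> list[float]:
    """Project a high-dimensional document vector into `out_dim` dimensions.

    A seeded Gaussian random projection: each output coordinate is a fixed
    random linear combination of the input coordinates. Seeding makes the
    mapping deterministic, so all documents share the same projection.
    """
    rng = random.Random(seed)
    matrix = [[rng.gauss(0, 1) for _ in vector] for _ in range(out_dim)]
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

initial_doc = [0.2, 0.0, 0.5, 0.1, 0.0, 0.9, 0.3, 0.4]  # toy 8-d document vector
mapped_text = reduce_dim(initial_doc, out_dim=3)          # 3-d mapped knowledge text
```

A production system would fit the projection (or autoencoder) on the corpus rather than drawing it at random, but the interface is the same: high-dimensional document in, compact mapped knowledge text out.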
It is important to ensure that the mapped knowledge text, while reduced in dimension, still maintains the target level features of the enterprise knowledge base. This means that information critical to document classification, retrieval and management needs to be retained during the dimension reduction process. By selecting an appropriate dimension reduction method and parameter settings, the enterprise management system can ensure that the mapped knowledge text does not lose key information while reducing the data volume.
In summary, step S20 obtains the mapped knowledge text by performing a dimension reduction operation on the initial knowledge document, providing a more efficient and convenient foundation for subsequent document management and retrieval. Implementation of this step may involve the selection and application of various algorithms and models, depending on the nature of the data and the processing requirements.
Step S30: generating a confocal text corresponding to the initial knowledge document according to the mapped knowledge text, the document management coding features and the target hierarchy mask, wherein the target hierarchy mask is used for distinguishing the target hierarchy text from the remaining hierarchy texts other than the target hierarchy text in the initial knowledge document, and the confocal text is used for generating, in the target hierarchy text, disclosure information corresponding to the target level information disclosure features of the enterprise knowledge base.
Step S30 involves generating a confocal text (i.e., a cross-attention text) using the mapped knowledge text, the document management coding features, and the target hierarchy mask. First, the enterprise management system obtains the mapped knowledge text, which is the text representation in the low-dimensional space obtained after the dimension reduction operation and which maintains the key features of the target hierarchy of the enterprise knowledge base. The mapped knowledge text, as input information, provides a basis for generating the confocal text.
Second, document management coding features are also used for this step. These features are extracted from the initial knowledge document by specific algorithms for classification, retrieval and management of the document. They represent the content and structural information of the document in the form of vectors that help the enterprise management system understand and process the text data.
In addition, the target hierarchy mask plays a key role in step S30. The mask is used to distinguish between the target hierarchy text and other hierarchy texts in the initial knowledge document. By applying the target hierarchy mask, the enterprise management system can focus on processing information related to the target hierarchy while ignoring other unrelated hierarchy texts. This masking mechanism ensures that the generated confocal text corresponds to a particular target hierarchy.
In generating the confocal text, the enterprise management system can employ a cross-attention mechanism. This is a technique commonly used in machine learning, particularly in the field of natural language processing. The cross-attention mechanism allows the model to focus on information from different sources simultaneously and combine them effectively as the text is processed. In this step, the mapped knowledge text, document management coding features, and target level mask are input into the cross-attention model as different sources of information.
For example, assume that the enterprise knowledge base contains multiple levels of information, such as product, technology, and market levels. When it is desired to generate disclosure information related to the market level, the target hierarchy mask will be set to focus only on the text of the market level. The enterprise management system will combine the mapped knowledge text (providing reduced-dimension key information) and the document management coding features (providing document structure and content information) to generate a confocal text through the cross-attention mechanism. This confocal text focuses on market-level information and generates corresponding disclosure information based on the target level information disclosure features of the enterprise knowledge base.
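The masked cross-attention described above can be sketched as follows. This is a single-head, scaled dot-product attention in plain Python, not the patent's trained model; the toy query/key/value vectors and the three levels (product, technology, market) are illustrative assumptions:

```python
import math

def cross_attention(query, keys, values, mask):
    """Single-head cross-attention with a target-level mask.

    query: vector for the mapped knowledge text; keys/values: one row per
    hierarchy level from the coding features; mask[i] = 1 keeps level i,
    mask[i] = 0 hides it (its score is forced to -inf before the softmax,
    so it receives zero attention weight).
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if m else float("-inf")
        for key, m in zip(keys, mask)
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Levels: product, technology, market -- the mask keeps only market (index 2).
out = cross_attention(
    query=[1.0, 0.0],
    keys=[[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    values=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    mask=[0, 0, 1],
)
```

Because the mask zeroes out the product and technology levels, the output is exactly the market level's value vector, mirroring how the confocal text attends only to the target hierarchy.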
Step S30 generates a confocal text corresponding to a specific target level by comprehensively utilizing the mapping knowledge text, the document management coding feature, and the target level mask. This process may involve machine learning techniques such as cross-attention to ensure that the generated text is accurate, relevant and meets the information disclosure requirements of the business.
Step S40: content adjustment is carried out on the mapped knowledge text according to the document management coding characteristics and the confocal text, and an adjusted knowledge document is obtained; the adjusted knowledge document comprises target level characteristics of the initial knowledge document and target level information disclosure characteristics of the enterprise knowledge base constrained by constraint information.
In step S40, the enterprise management system performs content adjustment on the mapped knowledge text according to the document management coding features and the confocal text. The document management coding features contain the structure and key information of the document, which assist the enterprise management system in understanding the content and context of the document. The confocal text is generated in step S30; it focuses on the information of the target hierarchy and ignores the contents of other hierarchies. By adjusting the mapped knowledge text, the enterprise management system can ensure that the adjusted knowledge document not only contains the target level features of the initial knowledge document, but also meets the constraint requirements of the enterprise knowledge base target level information disclosure.
In the process of content adjustment, the enterprise management system may employ various natural language processing techniques and machine learning algorithms. For example, it may use a text generation model (e.g., a recurrent neural network RNN, a long short-term memory network LSTM, or a Transformer, etc.) to generate text content related to the target hierarchy. These models are able to generate coherent, context-conforming text from given input information, such as the document management coding features and the confocal text. In addition, to ensure that the adjusted knowledge document meets the constraint requirements of the enterprise knowledge base, the enterprise management system may also screen and modify the generated text using a series of rules, templates, or predefined constraints. These constraints may include format requirements of the document, sensitivity requirements of information disclosure, compliance requirements of a particular industry, and the like. By applying these constraints, the enterprise management system can ensure that the adjusted knowledge document contains not only the key information of the target hierarchy, but also meets the standards and specifications of the enterprise.
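The rule-based screening step can be sketched as a simple redaction pass over generated text. The sensitive-phrase list, the `screen` function, and the example sentence are all hypothetical stand-ins for the enterprise's predefined constraints:

```python
import re

# Hypothetical constraint items: phrases that must not be disclosed externally.
SENSITIVE = ["production cost", "supply chain"]

def screen(text: str, sensitive=SENSITIVE, redaction="[REDACTED]") -> str:
    """Apply predefined disclosure constraints to generated text.

    Any phrase listed in `sensitive` is replaced by `redaction` (matched
    case-insensitively) -- a simple stand-in for the rule/template
    screening described above.
    """
    for phrase in sensitive:
        text = re.sub(re.escape(phrase), redaction, text, flags=re.IGNORECASE)
    return text

adjusted = screen("Retail price is $499; production cost is $180.")
```

A real system would likely combine such hard rules with model-based sensitivity classification, but the contract is the same: generated text in, constraint-compliant text out.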
For example, assume that an enterprise knowledge base has a document regarding product introduction that contains multiple levels of information about the product (e.g., basic attributes, technical details, market positioning, etc.). Now, the enterprise wants to generate an adjusted knowledge document containing only basic attributes and market positioning information for external partner reference. In this case, the enterprise management system would identify the basic attributes and market positioning information in the original knowledge document based on the document management coding features, and perform content adjustment on the mapped knowledge text in combination with the confocal text (which focuses on information at these two levels). The finally generated adjusted knowledge document contains only relevant information on basic attributes and market positioning, and meets the information disclosure constraint requirements of the enterprise.
In summary, step S40 generates an adjusted knowledge document meeting specific requirements by performing content adjustment on the mapped knowledge text and combining the document management coding feature and information of the confocal text. This process may involve the application of various natural language processing techniques and machine learning algorithms to ensure that the generated document is both accurate and meets the standards and specifications of the enterprise.
In one embodiment, in step S40, content adjustment is performed on the mapped knowledge text according to the document management coding features and the confocal text to obtain an adjusted knowledge document, which may include: Step S41: assigning s rounds of disturbance information to the mapped knowledge text based on the first machine learning model to obtain a disturbance text, wherein s is greater than or equal to 1. Step S42: performing s rounds of disturbance information reasoning and disturbance cleaning processing on the disturbance text according to the document management coding features and the confocal text based on the first machine learning model to obtain a cleaned text. Step S43: restoring the cleaned text to the dimension corresponding to the initial knowledge document, and generating the adjusted knowledge document.
In step S41, the enterprise management system processes the mapped knowledge text using an algorithm of the first machine learning model. The task of this model is to introduce a certain amount of disturbance information in the text, i.e. to make some form of slight modification or adjustment to the original text. The purpose of this is to enhance the generalization ability of the model, making it more robust in handling various variants or noise.
Specifically, the enterprise management system performs s rounds (s is an integer greater than or equal to 1) of perturbation processing on the mapped knowledge text based on the first machine learning model. Each round of perturbation may make minor changes to some parts of the text, such as replacing vocabulary, adjusting sentence structure, etc. Such changes are controllable and are intended to simulate various situations that may occur in practical applications.
For example, assume that the mapped knowledge text is a sentence describing a product property: "This product has the characteristics of high efficiency and low energy consumption." The model may replace "high efficiency" with "excellent performance" in the first round of disturbance, and adjust "low energy consumption" to "energy-saving and environmentally friendly" in the second round. Thus, after s rounds of disturbance, the obtained disturbance text is a variant of the original text that retains the original information while introducing a certain diversity.
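The s rounds of controlled perturbation can be sketched as repeated synonym substitution. The substitution table, the `perturb` function, and the hyphenated example terms are illustrative assumptions, not the patent's learned model:

```python
import random

# Hypothetical controlled substitution table (stands in for the model's edits).
SYNONYMS = {
    "high-efficiency": "excellent-performance",
    "low-energy": "energy-saving",
}

def perturb(text: str, rounds: int, seed: int = 0) -> str:
    """Apply `rounds` >= 1 rounds of controlled perturbation.

    Each round substitutes at most one known term with its synonym,
    mimicking the slight, controllable edits described for step S41.
    A seeded RNG keeps the perturbation reproducible.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        candidates = [w for w in SYNONYMS if w in text]
        if not candidates:
            break
        choice = rng.choice(candidates)
        text = text.replace(choice, SYNONYMS[choice])
    return text

perturbed = perturb("This product is high-efficiency and low-energy.", rounds=2)
```

With two rounds over this sentence, both known terms end up substituted; a learned model would instead sample edits that preserve meaning while varying surface form.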
In step S42, the enterprise management system continues to process the perturbed text generated in step S41 using the first machine learning model. The goal this time is to perform disturbance information reasoning and cleaning processing, i.e., to identify and remove unnecessary or redundant information introduced during the perturbation process, in order to restore the original clarity and accuracy of the text. Specifically, the model may combine the document management coding features and the information of the confocal text to carefully analyze the perturbed text. It will try to infer which perturbations are beneficial (i.e., enhance the expressive power of the text or adapt it to a particular application scenario) and which are not (i.e., introduce noise or reduce the readability of the text). The model may then perform a cleaning process on the unwanted perturbations, i.e., remove them from the text or correct them into a more suitable form. This process may take a number of rounds (equal to the number of disturbance rounds s in step S41) to ensure that all unnecessary disturbances are removed as far as possible. The finally obtained cleaned text is an optimized version that retains the key information of the mapped knowledge text while adapting to the specific application scenario and constraint conditions.
For example, assume that the disturbance text generated in step S41 is: "This product has the characteristics of excellent performance, energy conservation and environmental protection, and excellent stability and durability." In step S42, if the model determines that the information "excellent stability and durability" is redundant or irrelevant to the target application scenario, it may remove that information or modify it into a more compact form. The resulting cleaned text may then be: "This product has the characteristics of excellent performance, energy conservation and environmental protection." Such text is both concise and well-defined and meets the needs of the specific application.
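A crude, rule-based stand-in for this relevance judgement is sketched below: clauses that mention no target-level keyword are dropped. The `clean` function, the keyword list, and the paraphrased example sentence are illustrative assumptions, not the model-based inference of step S42:

```python
def clean(text: str, target_keywords) -> str:
    """Drop comma-separated clauses unrelated to the target level.

    A clause is kept only if it mentions at least one target keyword --
    a crude stand-in for the model-based relevance judgement in step S42.
    """
    clauses = [c.strip() for c in text.rstrip(".").split(",")]
    kept = [c for c in clauses if any(k in c for k in target_keywords)]
    return ", ".join(kept) + "."

noisy = ("The product offers excellent performance, energy-saving operation, "
         "excellent stability and durability.")
cleaned = clean(noisy, target_keywords=["performance", "energy-saving"])
```

Here the "stability and durability" clause mentions no target keyword, so it is cleaned away, leaving only the clauses relevant to the target level.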
Step S43 is an important step in the knowledge document processing process, and involves restoring the text subjected to the multiple rounds of perturbation and cleaning (i.e., the cleaned text) to the dimensions corresponding to the original knowledge document to generate the final adjusted knowledge document.
In the previous steps, the enterprise management system has made content adjustments to the mapped knowledge text, including assigning disturbance information and performing disturbance cleaning processing. These operations may cause the dimensions of the text to change, for example, by mapping the text to a low-dimensional space for processing through the dimension reduction operation, or by introducing additional features or information during the perturbation process. Therefore, the dimensions of the cleaned text need to be restored to be consistent with the original knowledge document before the final adjusted knowledge document is generated. To achieve this goal, the enterprise management system employs specific algorithms or models to process the cleaned text. These algorithms or models may include inverse dimension reduction techniques, feature mapping, or information fusion, among others. Specifically, if the initial knowledge document was subjected to a dimension reduction operation prior to processing, then in this step the enterprise management system may restore the cleaned text to the original high-dimensional space using a corresponding inverse dimension reduction technique. If additional features or information are included in the cleaned text, the enterprise management system may align and integrate those features with the features of the initial knowledge document using feature mapping, information fusion, or the like.
For example, assume that the initial knowledge document is a high-dimensional collection of feature vectors, and that the mapped knowledge text and the cleaned text were processed and generated in a low-dimensional space. In step S43, the enterprise management system maps the cleaned text from the low-dimensional space back to the high-dimensional space using a particular algorithm or model (e.g., the inverse transform of principal component analysis (PCA), the decoder of an autoencoder, etc.), or fuses the additionally introduced features with the features of the initial knowledge document (e.g., by feature stitching, weighted summation, etc.). In this way, the generated adjusted knowledge document will have the same dimensions as the original knowledge document, while containing the optimized and adjusted information content.
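The dimension restoration plus weighted fusion can be sketched with a toy coordinate-selection "reduction" and its inverse: coordinates dropped at reduction time are filled back from the initial document's features via weighted summation. The `project`/`restore` functions, the index list, and all vectors are illustrative assumptions (a real system would invert its actual PCA/autoencoder):

```python
def project(vec, idx):
    """Keep only the coordinates listed in `idx` (a toy 'dimension reduction')."""
    return [vec[i] for i in idx]

def restore(low, idx, reference, alpha=0.5):
    """Map a low-dimensional cleaned text back to the document's dimension.

    Retained coordinates are blended with the initial document's values by
    weighted summation (`alpha` weighting the cleaned value); dropped
    coordinates are filled from `reference` -- a simple stand-in for an
    inverse PCA transform or a decoder network.
    """
    out = list(reference)
    for value, i in zip(low, idx):
        out[i] = alpha * value + (1 - alpha) * reference[i]
    return out

initial = [0.9, 0.1, 0.8, 0.0, 0.4]   # initial knowledge document features
kept = [0, 2, 4]                      # indices retained at reduction time
cleaned_low = [1.0, 0.6, 0.4]         # cleaned text in the low-dim space
adjusted_doc = restore(cleaned_low, kept, initial)
```

The restored vector has the initial document's dimension but carries the adjusted values where the cleaned text changed them.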
It should be noted that the enterprise management system also needs to ensure the quality and accuracy of the adjusted knowledge document when restoring the clean text to the dimension corresponding to the original knowledge document. This may require some evaluation index or verification method to be employed to quality check and verify the generated document. For example, similarity metrics (e.g., cosine similarity, euclidean distance, etc.) may be used to compare the similarity between the adjusted knowledge document and the initial knowledge document; or the generated document is evaluated for accuracy and credibility by using manual annotation data or domain expert knowledge. Step S43 generates a final adjusted knowledge document by restoring the cleaned text to the dimension corresponding to the initial knowledge document. This process may involve the application of various algorithms and models to ensure that the generated document both meets specific dimensional requirements and contains optimized and adjusted information content.
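The cosine-similarity check mentioned above is straightforward to implement; the toy document vectors here are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two document vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

initial_doc = [0.9, 0.1, 0.8, 0.0, 0.4]    # initial knowledge document features
adjusted_doc = [0.95, 0.1, 0.7, 0.0, 0.4]  # adjusted knowledge document features
score = cosine_similarity(initial_doc, adjusted_doc)
```

A score close to 1.0 indicates the adjusted document has preserved the initial document's key information; a low score would flag the adjustment for manual review.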
Specifically, step S42, in which s rounds of disturbance information reasoning and disturbance cleaning processing are performed on the disturbance text according to the document management coding features and the confocal text based on the first machine learning model to obtain a cleaned text, may include:
step S421: based on the first machine learning model, according to the document management coding characteristics and the confocal text, the disturbance information endowed by the ith round in the mapping knowledge text is inferred, so that the ith inference disturbance information is obtained, i is more than or equal to 1 and less than or equal to s.
Step S422: clearing the i-th reasoning disturbance information from the text with disturbance information to obtain an adjusted text with disturbance information; the text with disturbance information is initialized as the disturbance text.
When i is less than s, i+1 is taken as the adjusted i, and the operation of inferring, based on the first machine learning model and according to the document management coding features and the confocal text, the disturbance information endowed by the i-th round in the mapped knowledge text to obtain the i-th inference disturbance information is iterated; when i = s, the adjusted text with disturbance information is taken as the cleaned text.
In step S421, the enterprise management system uses the first machine learning model, in combination with the document management coding features and the information of the confocal text, to infer the disturbance information given in each round to the mapped knowledge text. The reasoning process here refers to the model attempting to identify and understand the specific effects that each round of perturbation has on the original text, as well as the association between these effects and the document management coding features and the confocal text. Specifically, for the disturbance information of the i-th round (where i ranges from 1 to s), the model performs inference analysis based on existing knowledge and learned rules. It comprehensively considers the document structure and hierarchical relations reflected by the document management coding features, as well as the target hierarchy information emphasized by the confocal text, to deduce the influence of the disturbance on the text content and semantics. In this way, the model can generate an inference result regarding the i-th round of disturbance information, i.e., the i-th inference disturbance information.
For example, assume that there is a sentence in the mapped knowledge text: "This product has the characteristics of high efficiency and low energy consumption." In the first round of perturbation, "high efficiency" may be replaced with "excellent performance." When inferring the disturbance information, the model can combine the document management coding features and the information of the confocal text to analyze the influence of this replacement on the overall content and semantics of the text. If the substitution matches the target hierarchy information and does not disrupt the structure and hierarchy of the document, the model considers this to be a valid perturbation and saves it as the 1st inference disturbance information.
In step S422, the enterprise management system performs a cleaning process on the text with the disturbance information according to the reasoning result of the first machine learning model. The cleaning process herein refers to removing or correcting disturbance information that is judged by the model to be unnecessary or harmful to restore the original sharpness and accuracy of the text. Specifically, for the ith inferential perturbation information, if the model considers that it has a negative impact on the overall quality and readability of the text, or does not match the target level information, it is removed or corrected from the text with the perturbation information. The text thus processed is the adjusted text with the disturbance information.
It should be noted that during the cleaning process, the enterprise management system needs to ensure that the information structure and hierarchical relations reflected by the document management coding features and the confocal text are not destroyed. Thus, the cleaning process needs to be carefully performed to avoid introducing new errors or inconsistencies. In addition, step S42 further includes an iterative process: when i is less than s, i+1 is taken as the adjusted i, and the reasoning and cleaning operations continue to be performed based on the first machine learning model according to the document management coding features and the confocal text; when i = s, the iteration ends, and the adjusted text with disturbance information is output as the final cleaned text. Through this iterative approach, the enterprise management system can gradually optimize and refine the cleaning of the text until a satisfactory result is achieved.
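The i = 1..s loop of steps S421/S422 can be sketched as the skeleton below. The per-round inference is mocked with explicit `<p1>`, `<p2>`, … markers standing in for the model's inferred perturbations; both helper functions are hypothetical:

```python
def infer_perturbation(text: str, round_idx: int):
    """Hypothetical stand-in for step S421: infer round i's perturbation.

    Here each round's perturbation is assumed to be tagged with an explicit
    marker like <p1>; a real system would infer it with the first model.
    """
    marker = f"<p{round_idx}>"
    return marker if marker in text else None

def denoise(perturbed: str, s: int) -> str:
    """Iterate i = 1..s: infer round i's perturbation, then clear it (S422)."""
    text = perturbed  # the text with disturbance information, initialized
    for i in range(1, s + 1):
        inferred = infer_perturbation(text, i)
        if inferred is not None:
            text = text.replace(inferred, "")  # clearing step S422
    return text  # after i == s, the adjusted text is the cleaned text

cleaned = denoise("Efficient<p1> and reliable<p2> product.", s=2)
```

Each iteration removes exactly the perturbation attributed to that round, so after s iterations the text is free of all inferred disturbances.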
In one embodiment, in step S10, the generating a document management code feature according to the initial knowledge document and constraint information may specifically include:
step S11: extracting enterprise knowledge base target level characteristics contained in the initial knowledge document based on the knowledge base detection network; embedding and mapping constraint information based on a constraint information network to obtain a coding result;
Step S12: and generating document management coding features according to the target level features of the enterprise knowledge base and the coding results.
In step S11, the enterprise management system first processes the initial knowledge document using the knowledge base detection network. The knowledge base detection network is a pre-trained machine learning model, and the main task of the knowledge base detection network is to identify and extract target level features related to an enterprise knowledge base in a document. These features may include knowledge categories to which the document belongs, locations in the hierarchy, relationships with other documents, and so on. For example, if the initial knowledge document is an article about a product introduction, the knowledge base detection network may extract features such as the product category to which the article belongs, the location of the product in the hierarchy (e.g., which product line belongs, which sub-category is the product under, etc.), and the relationship with other related products. Meanwhile, the enterprise management system can also use the constraint information network to carry out embedded mapping on constraint information. The constraint information network is another machine learning model, and the function of the constraint information network is to convert constraint information (such as format requirements, content limitation and the like of a document) into a coding result, and the coding result can be conveniently understood and used in a subsequent processing process. For example, if the constraint information specifies that the document must use a particular title format and paragraph structure, the constraint information network may translate these requirements into an encoded form for consideration in the subsequent generation of document management encoding features.
In step S12, the enterprise management system generates document management code features in combination with the target hierarchical features of the enterprise knowledge base and the code results extracted in step S11. This process can be understood as fusing together the information from two sources (i.e., the content and constraints of the document) to form a unified coded representation. This unified coded representation will serve as the basis for subsequent processing steps such as storage, retrieval, updating of documents, etc. In particular, the enterprise management system may employ feature fusion techniques (e.g., feature stitching, weighted summation, etc.) to combine the enterprise knowledge base target level features and encoding results into a new feature vector or matrix. The new feature vector or matrix is the document management coding feature, integrates the information of the content and the constraint of the document, and can describe the characteristics and the requirements of the document more comprehensively. For example, in an actual document management system, the document management coding feature may include information on aspects of the document's topic classification, keyword list, hierarchical structure location, access rights level, etc. This information will be used for subsequent document processing and management operations.
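The feature fusion of step S12 can be sketched as feature stitching combined with weighted summation: the target-level feature vector is concatenated with a weighted sum of the per-constraint encoding vectors. The `fuse` function and the toy vectors are illustrative assumptions:

```python
def fuse(kb_features, constraint_codes, weights=None):
    """Fuse knowledge-base target-level features with constraint encodings.

    Concatenates the target-level feature vector with a weighted sum of the
    per-constraint-item encoding vectors (feature stitching + weighted
    summation). With no weights given, constraints are averaged equally.
    """
    if weights is None:
        weights = [1.0 / len(constraint_codes)] * len(constraint_codes)
    dim = len(constraint_codes[0])
    summed = [
        sum(w * code[d] for w, code in zip(weights, constraint_codes))
        for d in range(dim)
    ]
    return kb_features + summed  # the document management coding feature

kb = [0.7, 0.2]                              # toy target-level features
codes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one encoding per constraint item
feature = fuse(kb, codes)
```

Non-uniform weights would let the system emphasize particular constraint items (such as the first constraint item of step S121) over the others.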
As one implementation, the constraint information comprises one or more constraint items, and the encoding result comprises encoding vectors respectively corresponding to the one or more constraint items. On this basis, step S12, generating the document management coding features according to the enterprise knowledge base target hierarchy features and the encoding result, may specifically include:
step S121: combining the coding vector corresponding to the first constraint item in the one or more constraint items with the target level characteristics of the enterprise knowledge base to obtain an adjustment coding vector corresponding to the first constraint item; the first constraint item is a preset constraint item used for representing an enterprise knowledge base.
In step S121, the enterprise management system first processes the one or more constraint items in the constraint information. These constraint items may take the form of text segments, constraint tags, and the like, and they impose specific restrictions and requirements on the content or format of the document. The encoding result comprises the encoding vector corresponding to each constraint item; each constraint item is converted in some way (e.g., embedding mapping) into a numerical form that a computer can understand and process.
Next, the enterprise management system focuses on a particular constraint item, referred to as the "first constraint item". This first constraint item is preset to characterize some important or special constraint of the enterprise knowledge base. For example, in a product document, the first constraint item may be a constraint on the product category that specifies the category to which the document belongs. The enterprise management system combines the encoding vector corresponding to the first constraint item with the enterprise knowledge base target hierarchy features. The combination here may be vector concatenation, weighted summation, or another form of feature fusion. Vector concatenation is a common operation that joins two vectors end to end, in a given order, to form a longer vector. In this way, the encoding vector of the first constraint item and the enterprise knowledge base target hierarchy features are integrated into a new vector, referred to as the adjustment encoding vector.
For example, assume that the enterprise knowledge base target hierarchy feature is a vector representation of the subject matter of the document, and the first constraint is a constraint on the document format (e.g., whether or not a picture, table, etc. is included). The enterprise management system first generates encoded representations of the two vectors separately and then combines them into a longer adjusted encoded vector by way of vector concatenation. The adjustment encoding vector contains both information about the subject of the document and constraint information about the format of the document. Step S121 generates an adjusted code vector containing more information by combining the code vector of the first constraint item and the target hierarchy feature of the enterprise knowledge base. This adjustment encoding vector will be further processed in subsequent steps to generate the final document management encoding feature.
Step S122: processing the adjustment encoding vector corresponding to the first constraint item with the feedforward neural network to obtain a target encoding vector corresponding to the first constraint item; the target encoding vector corresponding to the first constraint item and the encoding vector corresponding to the first constraint item have the same dimensionality.
In step S122, the enterprise management system processes the adjustment encoding vector corresponding to the first constraint item using the feedforward neural network. A feedforward neural network is a basic neural network structure that organizes neurons in layers; information flows in only one direction through the network, from the input layer to the output layer, without forming any cycles. Such a network is particularly suited to applying nonlinear transformations to input data in order to extract higher-level features. Specifically, the enterprise management system takes the adjustment encoding vector corresponding to the first constraint item as the input to the feedforward neural network. This adjustment encoding vector was obtained in step S121 by combining the encoding vector of the first constraint item and the enterprise knowledge base target hierarchy features. The feedforward network then processes this input vector layer by layer through a series of linear transformations and nonlinear activation functions. Each layer of neurons performs a weighted summation of its inputs, adds a bias term, and then introduces nonlinearity through an activation function (e.g., ReLU, Sigmoid) to produce that layer's output. This process is repeated until the output layer of the network is reached.
Finally, the output layer of the feedforward neural network generates a new vector, which is called the target encoding vector. The target coding vector is obtained by processing an adjustment coding vector corresponding to the first constraint term through a neural network, and contains higher-level and more abstract characteristic information. And, the dimension of the target coding vector is consistent with the original coding vector corresponding to the first constraint term, so that the vector replacement and comparison can be conveniently carried out in the subsequent steps.
For example, assuming that the first constraint item is a constraint on the format of the document (e.g., whether pictures, tables, etc. are included), the corresponding adjustment encoding vector may be a high-dimensional vector in which each element represents the presence or absence of a particular format feature. Through the processing of the feedforward neural network, this adjustment encoding vector is converted into a target encoding vector of the same dimension as the original encoding vector, but with element values that have changed to reflect the network's understanding and abstraction of the complex relationships between format features. In this way, the enterprise management system can use the target encoding vector to more accurately represent and identify the document's format constraint information during subsequent document management.
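Steps S121 and S122 can be sketched as follows: concatenate the two inputs, then pass the result through a two-layer feedforward network whose output dimension equals the original constraint vector's dimension. The random weights stand in for trained parameters, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d_constraint, d_hierarchy, d_hidden = 8, 16, 32  # assumed dimensions

# Random weights as placeholders for a trained feedforward network.
W1 = rng.normal(scale=0.1, size=(d_constraint + d_hierarchy, d_hidden))
b1 = np.zeros(d_hidden)
W2 = rng.normal(scale=0.1, size=(d_hidden, d_constraint))
b2 = np.zeros(d_constraint)

def feed_forward(adjusted_vec):
    """Two-layer feedforward pass (linear -> ReLU -> linear) that maps the
    adjustment encoding vector back to the constraint vector's dimension."""
    h = np.maximum(0.0, adjusted_vec @ W1 + b1)  # ReLU activation
    return h @ W2 + b2

constraint_vec = rng.normal(size=d_constraint)
hierarchy_feat = rng.normal(size=d_hierarchy)
adjusted = np.concatenate([constraint_vec, hierarchy_feat])   # step S121
target_vec = feed_forward(adjusted)                           # step S122
assert target_vec.shape == constraint_vec.shape  # dimensionality is preserved
```

The output layer's width is what enforces the "same dimensionality" requirement, which is what makes the later in-place replacement of step S123 possible.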
Step S123: and replacing the coding vector corresponding to the first constraint item in the coding result with the target coding vector corresponding to the first constraint item to generate the document management coding feature.
In step S123, the enterprise management system first obtains the encoding result, which includes the encoding vector corresponding to each constraint item. These encoding vectors are derived from the constraint information through embedding mapping and represent the constraint items in feature space. The enterprise management system then focuses on the encoding vector corresponding to the first constraint item.
Next, the enterprise management system replaces the original encoding vector corresponding to the first constraint term with the target encoding vector processed through the feedforward neural network in step S122. This replacement process ensures that the neural network deep processed features can be integrated into the final document management coding feature. Since the target encoding vector has a consistent dimension with the original encoding vector, this replacement process is straightforward and does not require any additional dimensional transformations or matching. After the replacement is completed, the enterprise management system obtains an updated encoding result, and the encoding result now contains the target encoding vector corresponding to the first constraint item and the original encoding vector corresponding to the other constraint items. The updated encoding result is the final document management encoding feature which will be used in subsequent document management tasks such as document storage, retrieval, classification, etc.
For example, assume that in a product document, the first constraint item is a constraint on the product category. The initial encoding result may include encoding vectors for a plurality of constraint items, such as product category, document format, and release date. In step S122, the encoding vector of the product category is processed through the feedforward neural network to obtain a target encoding vector. Then, in step S123, the enterprise management system replaces the encoding vector at the corresponding position in the encoding result with this target encoding vector, thereby obtaining a document management coding feature that includes the higher-level product category feature. In this way, during subsequent document management, the enterprise management system can more accurately store, retrieve, and classify documents according to product category.
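The replacement of step S123 can be sketched directly. The constraint-item names and the dict-of-vectors layout of the encoding result are assumptions for illustration; the essential point is the dimension check and the in-place swap.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical encoding result: one 8-dim encoding vector per constraint item.
coding_result = {
    "product_category": rng.normal(size=8),
    "document_format": rng.normal(size=8),
    "release_date": rng.normal(size=8),
}
target_vector = rng.normal(size=8)  # stand-in for the step S122 network output

def replace_first_constraint(result, first_item, target_vec):
    """Step S123 sketch: swap the first constraint item's original encoding
    vector for its network-processed target vector; other items are untouched."""
    assert target_vec.shape == result[first_item].shape  # same dimensionality
    updated = dict(result)
    updated[first_item] = target_vec
    return updated  # the updated result is the document management coding feature

doc_feature = replace_first_constraint(coding_result, "product_category", target_vector)
```

Because the target vector has the same shape as the vector it replaces, no reshaping or padding is needed, which is exactly why step S122 constrains the output dimension.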
In one embodiment, in step S11, the extracting, based on the knowledge base detection network, the target hierarchical feature of the enterprise knowledge base included in the initial knowledge document may include:
step S111: and extracting a target level text in the initial knowledge document to obtain the target level text corresponding to the initial knowledge document.
Step S112: the knowledge base detection network extracts enterprise knowledge base target hierarchy features from the target hierarchy text.
In step S111, during document processing and analysis, the enterprise management system first focuses on a particular portion of the initial knowledge document, namely the target hierarchy text. These target hierarchy texts are the key contents in the document related to the hierarchical structure of the enterprise knowledge base, and may include important features such as the document's classification information, the hierarchy it belongs to, and its relationships with other documents. To extract these target hierarchy texts, the enterprise management system may utilize natural language processing techniques such as word segmentation, named entity recognition, or regular expression matching. For example, if the initial knowledge document is an article introducing a product, the target hierarchy text may be a particular paragraph or sentence in the article that explicitly indicates the category, sub-category, and location of the product within the overall product hierarchy. After the extraction is complete, the enterprise management system obtains one or more target hierarchy texts, which serve as input for the next processing step.
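As a concrete illustration of the regular-expression route mentioned above, the sketch below pulls hierarchy-bearing lines out of a document. The `Category:` field format and the sample text are invented for this example; a real system would use whatever markers its documents actually contain.

```python
import re

# Illustrative document: category lines follow an assumed "Category: ..." pattern.
document = """Product Introduction
Category: Networking > Switches > Access Switches
This article describes the X-2000 access switch ...
Related: see the X-1000 introduction."""

def extract_target_hierarchy_text(doc):
    """Step S111 sketch: pull out lines that state the document's place
    in the hierarchy using a simple regular-expression match."""
    return re.findall(r"^Category:\s*(.+)$", doc, flags=re.MULTILINE)

print(extract_target_hierarchy_text(document))
# ['Networking > Switches > Access Switches']
```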
In step S112, after successfully extracting the target hierarchy text, the enterprise management system processes the text using the pre-trained knowledge base detection network. The knowledge base detection network is a deep learning model that has been trained on a large number of related documents to learn how to identify and extract, from text, features related to the hierarchical structure of the enterprise knowledge base. This network may be a complex structure composed of multiple neural network layers, such as a convolutional neural network (CNN) for capturing local features in text, or a recurrent neural network (RNN) or long short-term memory network (LSTM) for processing sequence data and capturing long-range dependencies in text. Through the combination of these network layers, the knowledge base detection network is able to extract deep, abstract feature representations from the target hierarchy text. Specifically, the enterprise management system inputs the target hierarchy text into the knowledge base detection network, which performs feature extraction and conversion on the text layer by layer. Finally, the output layer of the network generates a feature vector or set of feature values representing the enterprise knowledge base target hierarchy features extracted from the target hierarchy text. These features are used in subsequent document management and processing tasks such as classification, retrieval, and recommendation of documents.
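A heavily simplified stand-in for the detection network is shown below: embed tokens, mean-pool, and project to a small feature vector. The vocabulary, dimensions, and untrained random weights are all assumptions; a real detection network would be a trained CNN/RNN/LSTM stack as described above.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {"networking": 0, "switches": 1, "access": 2, "<unk>": 3}
emb = rng.normal(size=(len(vocab), 16))       # token embedding table (untrained)
W_out = rng.normal(scale=0.1, size=(16, 6))   # projection to a 6-dim hierarchy feature

def detect_hierarchy_features(target_level_text):
    """Minimal stand-in for the knowledge base detection network (step S112):
    embed tokens, mean-pool them, and project to a hierarchy feature vector."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in target_level_text.lower().split()]
    pooled = emb[ids].mean(axis=0)            # text-level representation
    return pooled @ W_out                     # enterprise knowledge base target hierarchy features

feat = detect_hierarchy_features("Networking Switches Access")
assert feat.shape == (6,)
```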
In one embodiment, in step S30, generating the confocal text corresponding to the initial knowledge document according to the mapped knowledge text, the document management coding features and the target hierarchy mask specifically includes:
step S31: and generating an initial confocal text corresponding to the initial knowledge document according to the mapped knowledge text and the document management coding characteristic based on the first machine learning model.
Step S32: and shielding the rest level texts except the target level text in the initial confocal text according to the target level mask, so as to obtain the corresponding confocal text of the initial knowledge document.
In step S31, the enterprise management system processes the mapped knowledge text and document management coding features using a first machine learning model. The first machine learning model may be a deep learning model, such as a Recurrent Neural Network (RNN), long short term memory network (LSTM), or Transformer, which perform well in processing sequence data and text generation tasks.
Specifically, the enterprise management system provides the mapped knowledge text and document management coding features as input data to the first machine learning model. The mapped knowledge text is the text of the original knowledge document after being converted by a certain mapping relation, and can contain key information or specific structures in the document. The document management coding feature is a group of feature vectors extracted from the document in the coding process, and the feature vectors can reflect the content, structure, hierarchy and other information of the document. The first machine learning model uses the input data to generate an initial confocal text corresponding to the initial knowledge document via an internal computing and learning mechanism. This initial confocal text is a transformation or reconstruction of the original document that may highlight some important information or hierarchy in the document for subsequent text processing and analysis.
After generating the initial confocal text, the enterprise management system further processes it using the target hierarchy mask. The target hierarchy mask is a mask matrix or vector that identifies the text of different hierarchies in the document; it helps the enterprise management system accurately locate the target hierarchy text within the initial confocal text.
Specifically, the enterprise management system blocks or masks, as indicated by the target hierarchy mask, the remaining hierarchy text in the initial confocal text other than the target hierarchy text. This may be accomplished by setting the portions of the initial confocal text corresponding to non-target hierarchies to zero, deleting them, or replacing them with specific marks. By doing so, the enterprise management system obtains a confocal text that contains only the target hierarchy text. This confocal text plays an important role in subsequent document processing and analysis tasks. For example, when calculating document similarity, classifying documents, or extracting key document information, the enterprise management system can focus only on the target hierarchy text portion of the confocal text, thereby improving the accuracy and efficiency of processing.
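The masking of step S32 can be sketched at token level. The token list, the 0/1 mask, and the `<mask>` placeholder are illustrative assumptions; the patent leaves open whether masked positions are zeroed, deleted, or replaced with marks, and this sketch shows the replacement variant.

```python
import numpy as np

# Token-level sketch: 1 marks target-hierarchy tokens, 0 the rest.
tokens = ["Intro", "Category", "Switches", "History", "Contact"]
target_mask = np.array([0, 1, 1, 0, 0])

def apply_hierarchy_mask(tokens, mask, blank="<mask>"):
    """Step S32 sketch: keep target-hierarchy tokens and replace the
    remaining-hierarchy text with a placeholder mark."""
    return [tok if keep else blank for tok, keep in zip(tokens, mask)]

print(apply_hierarchy_mask(tokens, target_mask))
# ['<mask>', 'Category', 'Switches', '<mask>', '<mask>']
```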
In the embodiment of the application, the hierarchical management method of the knowledge document can be executed through a document management neural network, and then the method provided by the embodiment of the application can further comprise a debugging process of the document management neural network. Specifically, the method may include the steps of:
step S110: acquiring one or more debugging data of a document management neural network, each debugging data comprising a pair of associated training knowledge documents and training constraint information; the training knowledge document is a text containing target level characteristics of the enterprise knowledge base, and the training constraint information is information for constraining the target level information disclosure characteristics of the enterprise knowledge base in the training knowledge document.
Step S120: based on the training knowledge document and training constraint information, the training document management coding feature is generated, wherein the training document management coding feature comprises a description vector of the training knowledge document and a description vector of the training constraint information.
Step S130: performing dimension reduction operation on the training knowledge document based on the document management neural network to obtain a training mapping knowledge text; the training mapping knowledge text is text with dimensions smaller than those of the training knowledge document, and meanwhile, target level characteristics of the enterprise knowledge base are maintained.
Step S140: based on the document management neural network, generating a confocal text corresponding to the training knowledge document according to the training mapping knowledge text, the training document management coding feature and the training target hierarchy mask, wherein the training target hierarchy mask is used for distinguishing the target hierarchy text in the training knowledge document from the rest hierarchy texts except the target hierarchy text, and the confocal text is used for generating the disclosure information corresponding to the target hierarchy information disclosure feature of the enterprise knowledge base in the target hierarchy text.
Step S150: and based on the document management neural network, according to the training document management coding characteristics and the confocal text, performing content adjustment on the training mapping knowledge text, and generating an adjusted knowledge document corresponding to the training knowledge document.
Step S160: and optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain the document management neural network with the debugged document management neural network.
In step S110, the enterprise management system acquires debug data required for the document management neural network. The debug data includes a pair of associated training knowledge documents and training constraint information. Training knowledge documents are text that contains target level features of an enterprise knowledge base, which represent various knowledge documents that may be encountered in practical applications. The training constraint information is information for constraining the object level information disclosure features of the enterprise knowledge base in the training knowledge documents, and defines the types of information that can be disclosed or hidden at different levels.
Next, in step S120, the enterprise management system generates training document management coding features from the training knowledge documents and training constraint information using the document management neural network. These encoded features include descriptive vectors of training knowledge documents and descriptive vectors of training constraint information, which represent representations of the documents and constraint information, respectively, in feature space.
Then, in step S130, the enterprise management system performs a dimension reduction operation on the training knowledge document, again using the document management neural network, to obtain the training mapping knowledge text. This process aims to reduce the dimensionality of the text data while preserving the enterprise knowledge base target hierarchy features it contains. The dimension reduction makes subsequent processing more efficient and reduces the consumption of computing resources.
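A minimal sketch of such a dimension reduction is a learned linear projection. The dimensions and the random projection matrix below are illustrative assumptions; in the actual method the projection would be part of the trained document management neural network.

```python
import numpy as np

rng = np.random.default_rng(4)
d_doc, d_mapped = 128, 32                      # assumed dimensions
W_proj = rng.normal(scale=0.1, size=(d_doc, d_mapped))

def reduce_dimensions(doc_vec):
    """Step S130 sketch: a linear projection mapping the training knowledge
    document representation to a lower-dimensional mapped knowledge text."""
    return doc_vec @ W_proj

doc_repr = rng.normal(size=d_doc)
mapped = reduce_dimensions(doc_repr)
assert mapped.shape == (d_mapped,)             # dimension is smaller than the input
```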
In step S140, the enterprise management system generates the confocal text corresponding to the training knowledge document from the training mapping knowledge text, the training document management coding features, and the training target hierarchy mask using the document management neural network. The training target hierarchy mask is used to distinguish the target hierarchy text from the other hierarchies of text in the training knowledge document. The confocal text is used to generate, on the target hierarchy text, disclosure information corresponding to the enterprise knowledge base target hierarchy information disclosure features, so that the neural network can pay more attention to the disclosure of target-hierarchy information.
Next, in step S150, the enterprise management system uses the document management neural network to perform content adjustment on the training mapping knowledge text according to the training document management coding feature and the confocal text, and generates an adjusted knowledge document corresponding to the training knowledge document. The process aims at modifying and perfecting the original training mapping knowledge text according to the coding characteristics and the information of the confocal text, so that the training mapping knowledge text meets the requirements of practical application.
Finally, in step S160, the enterprise management system optimizes the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document, obtaining the debugged document management neural network. The optimization process compares the differences between the training knowledge document and the adjusted knowledge document and adjusts parameters of the neural network such as weights, biases, and the learning rate, so as to improve the network's performance and accuracy. After this series of debugging steps, the document management neural network is able to better perform the hierarchical management tasks of knowledge documents.
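The optimization of step S160 can be sketched as gradient descent on a reconstruction objective: the "adjusted knowledge document" should match the original training knowledge document. Here a single random matrix `W` stands in for the network configuration variables, and the vectors and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 16
W = rng.normal(scale=0.1, size=(d, d))   # stand-in for network configuration variables

# Training knowledge document representation (unit-normalized, illustrative).
x = rng.normal(size=d)
x /= np.linalg.norm(x)

# Debugging loop: the "adjusted knowledge document" is W @ x here, and the
# loss 0.5 * ||W x - x||^2 compares it with the original document.
for _ in range(200):
    adjusted = W @ x                       # adjusted knowledge document (step S150 output)
    grad_W = np.outer(adjusted - x, x)     # gradient of the reconstruction loss w.r.t. W
    W -= 0.5 * grad_W                      # optimize configuration variables (step S160)

assert np.allclose(W @ x, x, atol=1e-3)    # adjusted document now matches the original
```

A real debugging run would of course use many document/constraint pairs and a richer loss; this only shows the shape of the compare-and-update cycle.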
Step S150, based on the document management neural network, performing content adjustment on the training mapping knowledge text according to the training document management coding features and the confocal text, and generating an adjusted knowledge document corresponding to the training knowledge document, may specifically include:
Step S151: and giving s rounds of disturbance information in the training mapping knowledge text based on a first machine learning model included in the document management neural network to obtain a training disturbance text, wherein s is more than or equal to 1.
Step S152: carrying out s rounds of disturbance information reasoning and disturbance clearing processing on the training disturbance text based on the first machine learning model according to the document management coding features and the confocal text, to obtain the training clear text.
Step S153: and restoring the training clear text to the dimension corresponding to the training knowledge document, and generating the adjusted knowledge document.
In step S151, the enterprise management system processes the training mapping knowledge text using the first machine learning model in the document management neural network. This first machine learning model may be a deep neural network, a convolutional neural network (CNN), a recurrent neural network (RNN), or a Transformer, depending on the requirements of the text-processing task. The enterprise management system generates the training disturbance text by introducing disturbance information into the training mapping knowledge text. The disturbance information here may be small changes made to some vocabulary, sentence structure, or semantics in the text. These changes are intended to simulate the various kinds of noise or variation that may be encountered during actual text processing, so as to enhance the robustness of the model. For example, disturbance information may be introduced by synonym substitution, or by random insertion, deletion, or rearrangement of words in sentences. The parameter s represents the number of rounds of disturbance information introduced; s is at least 1, meaning disturbance processing is performed at least once. Each round of disturbance processing may produce a different training disturbance text, which is used in subsequent steps for training and optimization of the model.
After the training disturbance text is generated, the enterprise management system continues to use the first machine learning model to perform multiple rounds of disturbance information reasoning and disturbance clearing on the training disturbance text according to the document management coding features and the confocal text. This process aims to recover, from the disturbed text, a cleared text that is closer to the original text. Disturbance information reasoning involves, for example, parsing and predicting the vocabulary, sentence structures, or semantics in the disturbed text to infer the expression closest to the original text. The disturbance clearing process may include removing or correcting unreasonable or unnecessary disturbance information that was introduced, so that the cleared text is more accurate and reliable. After s rounds of processing, the enterprise management system obtains the training clear text. This text retains the main information of the original text while incorporating the model's capability and robustness in handling disturbance information.
Finally, the enterprise management system restores the training clear text to the dimension corresponding to the training knowledge document, and generates an adjusted knowledge document. This process may involve operations such as expanding or interpolating the training clean text to conform its dimensions to the original training knowledge document. In this way, the adjusted knowledge document can be used as a basis for model training and optimization and used for subsequent knowledge document hierarchical management tasks.
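The forward half of this process, giving s rounds of perturbation as in step S151, can be sketched on a vector representation. Treating the disturbance as additive noise on a numeric representation is an assumption of this sketch (the patent's disturbances act on vocabulary and structure); the point shown is the round-by-round accumulation.

```python
import numpy as np

rng = np.random.default_rng(6)
s = 4                                  # number of perturbation rounds (s >= 1)
mapped_text = rng.normal(size=12)      # training mapping knowledge text representation

def add_perturbation_rounds(x, rounds, scale=0.1, seed=7):
    """Step S151 sketch: apply s successive rounds of small random
    perturbations, recording each round so it can be cleared later."""
    noise_rng = np.random.default_rng(seed)
    perturbations = []
    for _ in range(rounds):
        noise = noise_rng.normal(scale=scale, size=x.shape)
        perturbations.append(noise)
        x = x + noise
    return x, perturbations

perturbed, noises = add_perturbation_rounds(mapped_text, s)
assert len(noises) == s
assert np.allclose(perturbed, mapped_text + sum(noises))
```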
The above embodiment of step S150 realizes content adjustment and optimization of training mapping knowledge text by introducing disturbance information, performing multiple rounds of disturbance reasoning and cleaning processes, dimension recovery, and the like. The steps jointly improve the performance and accuracy of the document management neural network when processing the actual knowledge document hierarchical management task.
In step S152, s rounds of disturbance information reasoning and disturbance clearing are performed on the training disturbance text based on the first machine learning model according to the document management coding features and the confocal text, to obtain the training clear text; this specifically includes:
Step S1521: based on the first machine learning model, and according to the training document management coding features and the confocal text, reasoning the disturbance information given in the ith round in the training mapping knowledge text to obtain ith reasoning disturbance information, wherein 1 ≤ i ≤ s.
Step S1522: clearing the ith reasoning disturbance information from the training text with disturbance information, to obtain an adjusted training text with disturbance information; the initial training text with disturbance information is the training disturbance text.
When i is less than s, i+1 is taken as the adjusted i, and the operation of step S1521 is iterated, i.e., based on the first machine learning model and according to the training document management coding features and the confocal text, the disturbance information given in the ith round in the training mapping knowledge text is inferred to obtain the ith reasoning disturbance information; when i = s, the adjusted training text with disturbance information is taken as the training clear text.
In the hierarchical management process of knowledge documents, to ensure the accuracy and robustness of the model, the text into which disturbance has been introduced needs to be processed. In step S1521, the enterprise management system uses the first machine learning model to perform an inference operation according to the training document management coding features and the confocal text. Inference here refers to the model predicting a result or drawing a conclusion from existing information (i.e., the document management coding features and the confocal text). Specifically, the model infers the disturbance information given in the ith round in the training mapping knowledge text. This disturbance information was introduced in a previous step in order to simulate noise or variation in actual text processing. Through reasoning, the model can identify this disturbance information to obtain the ith reasoning disturbance information. It should be noted that i is an iteration variable whose value ranges from 1 to s, indicating that a total of s rounds of reasoning and clearing are required.
Taking the text classification task as an example, if a word in the training mapped knowledge text is replaced with a synonym, then the replacement can be regarded as disturbance information. In the inference phase, the model may identify this substitution based on the context information and the coding features and label it as an inferential perturbation.
After the ith reasoning disturbance information is inferred, the enterprise management system performs a clearing operation on the training text with disturbance information. This clearing operation removes or corrects the identified disturbance information, so as to restore the original state of the text or improve its quality. Specifically, the ith reasoning disturbance information is cleared from the current training text with disturbance information, yielding an adjusted training text with disturbance information. This adjusted text serves as the input for the next round of reasoning and clearing. Note that the initial training text with disturbance information is the training disturbance text generated in step S151.
Still taking the text classification task as an example, if disturbance information for a synonym replacement is identified in the inference phase, the model can replace this synonym back to the original vocabulary in the clean-up phase to recover the original representation of the text.
When i is less than s, more rounds of reasoning and clearing remain to be performed. At this point, i+1 is set as the new value of i, and the routine returns to step S1521 to perform the next round of reasoning. When i equals s, all rounds of reasoning and clearing have been completed, and the finally obtained adjusted training text with disturbance information is output as the training clear text. This training clear text is the input to subsequent steps (e.g., step S153) for generating the adjusted knowledge document.
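The iterate-infer-clear loop of steps S1521/S1522 can be sketched as follows. For the sketch, the model's inference is replaced by an oracle that returns the recorded noise of round i; a real model would predict it from the coding features and the confocal text, and the additive-noise representation is itself an assumption.

```python
import numpy as np

rng = np.random.default_rng(8)
s = 4
clean_text = rng.normal(size=12)       # training mapping knowledge text

# Forward pass: give s rounds of perturbation (as in step S151).
noises = [rng.normal(scale=0.1, size=12) for _ in range(s)]
perturbed = clean_text + sum(noises)

def infer_perturbation(i, noises):
    """Oracle stand-in for step S1521: returns the ith round's perturbation;
    a trained model would infer this rather than look it up."""
    return noises[i - 1]

# Steps S1521/S1522: iterate i = 1..s, inferring and clearing each round.
text = perturbed
for i in range(1, s + 1):
    inferred = infer_perturbation(i, noises)
    text = text - inferred              # clear the ith reasoning disturbance

training_clear_text = text
assert np.allclose(training_clear_text, clean_text)
```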
As one embodiment, step S120, generating the training document management coding features from the training knowledge document and the training constraint information based on the document management neural network, includes:
step S121: and extracting target level characteristics of the enterprise knowledge base contained in the training knowledge document based on the knowledge base detection network.
Step S122: and based on a constraint information network included in the document management neural network, embedding and mapping training constraint information to obtain a training coding result.
Step S123: and generating training document management coding features according to the target level features of the enterprise knowledge base and the training coding results.
Steps S121-S123 describe how to generate training document management coding features based on the document management neural network in combination with training knowledge documents and training constraint information. In step S121, the enterprise management system processes the training knowledge document using the knowledge base detection network. The knowledge base detection network is a neural network model specifically designed to identify and extract specific hierarchical features in documents. In this scenario, its goal is to extract the enterprise knowledge base target level features contained in the training knowledge document. The enterprise knowledge base target hierarchical features may include classification information, hierarchies, keywords, etc. of documents that are critical to subsequent document management and retrieval. For example, in an enterprise knowledge base, documents may be categorized by different departments, projects, or topics and form a hierarchy. The knowledge base detection network is capable of automatically identifying and extracting these hierarchical features by learning and analyzing the content of the training knowledge document.
After the enterprise knowledge base target hierarchy features are extracted, step S122 focuses on processing training constraint information. And the constraint information network in the document management neural network is responsible for embedding and mapping training constraint information to obtain a training coding result. Embedding mapping is a technique that converts discrete data (e.g., text, labels, etc.) into a continuous vector representation, and is commonly used in the fields of natural language processing and machine learning. Here, the constraint information network converts training constraint information (which may be rules or conditions regarding document classification, format, access rights, etc.) into a continuous vector representation, i.e., training encoding results. The vector representation can capture the inherent relation and mode in constraint information, so that subsequent feature fusion and model learning are facilitated.
Finally, step S123 fuses the results of the first two steps to generate the training document management coding feature. Specifically, the method integrates the characteristics into a unified representation form through a certain fusion strategy (such as splicing, weighted summation, characteristic crossing and the like) according to the target level characteristics of the enterprise knowledge base and the training coding result. The unified representation, namely the training document management coding feature, not only contains hierarchical structure information in the training knowledge document, but also integrates rules and conditions in training constraint information. Such a feature representation has important instructive significance for subsequent document management tasks (e.g., classification, retrieval, recommendation, etc.), as it takes into account both the content and constraints of the document.
In summary, the derivation scheme in step S120 combines the knowledge base detection network and the constraint information network to effectively process and integrate the training knowledge document and training constraint information, thereby providing powerful feature support for the subsequent document management task.
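The fusion strategies named in step S123 (splicing, weighted summation) can be illustrated as follows. This is a minimal sketch under the assumption that the hierarchy features and coding results are plain vectors; the function name and signature are illustrative, not the patented implementation.

```python
import numpy as np

def fuse_features(hier_feat, code_vecs, strategy="concat", weights=None):
    """Illustrative fusion of enterprise knowledge base target hierarchy
    features with constraint-information coding results (step S123).
    Strategy names mirror the text: "concat" = splicing,
    "weighted_sum" = weighted summation (requires equal dimensions)."""
    if strategy == "concat":
        # Splice hierarchy features and all constraint coding vectors
        return np.concatenate([hier_feat] + list(code_vecs))
    if strategy == "weighted_sum":
        stacked = np.stack([hier_feat] + list(code_vecs))
        w = np.asarray(weights) if weights is not None \
            else np.full(len(stacked), 1.0 / len(stacked))
        return (w[:, None] * stacked).sum(axis=0)
    raise ValueError(f"unknown strategy: {strategy}")
```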
Wherein, optionally, the training constraint information comprises one or more constraint items, and the training coding result comprises coding vectors respectively corresponding to the one or more constraint items; then, based on this, step S123 generates training document management coding features according to the target level features of the enterprise knowledge base and the training coding result, including:
step S1231: combining the coding vector corresponding to the first constraint item in the one or more constraint items with the target level characteristics of the enterprise knowledge base to obtain an adjustment coding vector corresponding to the first constraint item; the first constraint item is a preset constraint item used for representing an enterprise knowledge base.
Step S1232: processing the adjustment coding vector corresponding to the first constraint item through the feedforward neural network to obtain a target coding vector corresponding to the first constraint item; the target coding vector corresponding to the first constraint item and the coding vector corresponding to the first constraint item have the same dimensionality.
Step S1233: and replacing the coding vector corresponding to the first constraint item in the training coding result with the target coding vector corresponding to the first constraint item to generate the training document management coding feature.
Steps S1231-S1233 describe how to combine the target level features of the enterprise knowledge base and the training encoding results to generate training document management encoding features, particularly when the training constraint information includes a plurality of constraint terms.
In step S1231, the enterprise management system first identifies a particular constraint in the training constraint information, referred to as a first constraint. This first constraint is preset to specifically characterize certain important constraints of the enterprise knowledge base. For example, it may relate to a security level, access rights or class labels of the document, etc.
Next, the enterprise management system combines the encoding vector corresponding to this first constraint with the enterprise knowledge base target hierarchy features. The combination may be splicing, adding or combining by some specific fusion mechanism, in order to form a new encoding vector, i.e. the adjusted encoding vector corresponding to the first constraint. The adjustment encoding vector contains information of the first constraint item and integrates hierarchical structure characteristics of the enterprise knowledge base.
After obtaining the adjusted encoding vector corresponding to the first constraint, step S1232 introduces a feed-forward neural network to further process the vector. Feedforward neural networks are a common neural network architecture that learn complex representations of input data through multiple layers of nonlinear transformations.
Here, the objective of the feed-forward neural network is to receive as input the adjustment encoding vector and to output a new vector, i.e. the target encoding vector corresponding to the first constraint. This target encoding vector has the same dimensions as the encoding vector corresponding to the original first constraint, but after processing by the neural network it may contain more information and higher level representation of features.
Finally, step S1233 replaces the coding vector corresponding to the original first constraint term in the training coding result with the target coding vector generated in step S1232. In this way, the training encoding results are updated, which incorporates higher level features that have been processed by the neural network.
The updated training coding result, namely the training document management coding feature, not only considers the hierarchical structure information of the enterprise knowledge base, but also emphasizes the importance of the first constraint item, and improves the feature representation capability through the processing of the neural network. Such coding features will be more instructive for subsequent document management tasks.
By combining specific constraint item information and the processing capacity of the neural network, the quality and the representation capacity of the training document management coding feature are improved, and more powerful feature support is provided for document management tasks.
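Steps S1231–S1233 can be sketched as follows. The two-layer ReLU network, the weight names, and the concatenation-based combination are all assumptions for illustration; the only constraint taken from the text is that the output target coding vector has the same dimensionality as the original coding vector.

```python
import numpy as np

def replace_first_constraint(code_vecs, first_idx, hier_feat, W1, b1, W2, b2):
    """Sketch of steps S1231-S1233: combine the first constraint item's
    coding vector with the hierarchy features, pass the result through a
    hypothetical two-layer feed-forward network, and substitute it back
    into the training coding result."""
    # S1231: combine (here: splice) coding vector with hierarchy features
    adjusted = np.concatenate([code_vecs[first_idx], hier_feat])
    # S1232: feed-forward network; output dim matches the original vector
    hidden = np.maximum(0.0, adjusted @ W1 + b1)
    target = hidden @ W2 + b2
    # S1233: replace the first constraint item's coding vector
    out = list(code_vecs)
    out[first_idx] = target
    return out
```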
Optionally, in step S140, generating, based on the document management neural network, a confocal text corresponding to the training knowledge document according to the training mapping knowledge text, the training document management coding feature and the training target hierarchy mask may specifically include:
step S141: based on a first machine learning model included in the document management neural network, generating an initial confocal text corresponding to the training knowledge document according to the training mapping knowledge text and the document management coding feature.
Step S142: shielding the hierarchical texts other than the target hierarchical text in the initial confocal text according to the training target hierarchy mask, thereby obtaining the confocal text corresponding to the training knowledge document.
Steps S141 and S142 describe how the confocal text corresponding to the training knowledge document is generated through the document management neural network, in combination with the training mapping knowledge text, the training document management coding features, and the training target hierarchy mask.
In step S141, the enterprise management system processes the training mapping knowledge text and the document management coding features using a first machine learning model, which is an important component of the document management neural network. The first machine learning model may be a deep learning model, such as a recurrent neural network (RNN), a convolutional neural network (CNN), or a Transformer, which is trained to learn and understand the structure and content of documents.
Specifically, the enterprise management system provides the training mapping knowledge text and the document management coding features as inputs to the first machine learning model. The model generates an initial confocal text corresponding to the training knowledge document by analyzing and learning patterns and associations in the input information. This initial confocal text is a transformation or reconstruction of the original knowledge document content that may contain key information, hierarchical structures, semantic relationships, etc. in the document. For example, if the training mapping knowledge text is a document about an enterprise organization, the document management coding features may include classification information, hierarchical structure, etc. of the document. The first machine learning model generates an initial confocal text by learning these features, in which information such as key departments, positions, and responsibilities in the organizational structure is highlighted.
After the initial confocal text is generated, step S142 further processes it with the training target hierarchy mask. The training target hierarchy mask is a tool that indicates the importance of different hierarchies in a document; it helps the model focus on information at a specific hierarchy while ignoring other, irrelevant hierarchies.
Specifically, the enterprise management system obscures or masks the remaining hierarchical text of the initial confocal text, excluding the target hierarchical text, as indicated by the training target hierarchical mask. Thus, the model only focuses on the information of the target level in the subsequent processing, and thus the focusing capability of the content of the specific level is improved.
For example, if the training target hierarchy mask indicates that the attention is focused on a higher-level management department in the organization structure, step S142 will block out other hierarchy texts except the higher-level management department in the initial confocal text. The resulting confocal text will contain only information related to the higher-level management department, enabling the model to focus more on this level of content.
In summary, through a processing manner combining the first machine learning model and the training target hierarchy mask, a confocal text corresponding to the training knowledge document is generated. The text not only highlights key information in the document, but also improves focusing capability of specific-level content, and provides powerful support for subsequent document management and processing tasks.
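The masking of step S142 can be illustrated with a minimal sketch. Representing the text as token embeddings and the mask as a 0/1 vector over tokens is an assumption made for illustration; the patent does not fix a concrete representation.

```python
import numpy as np

def apply_hierarchy_mask(token_embeddings, mask):
    """Sketch of step S142: shield (zero out) tokens belonging to
    non-target hierarchy levels. `mask` holds 1 for target-level tokens
    and 0 for the rest; both names are illustrative."""
    mask = np.asarray(mask, dtype=float)
    # Broadcast the per-token mask over the embedding dimension
    return token_embeddings * mask[:, None]
```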
Step S160, optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain a debugged document management neural network, specifically comprises the following steps:
Step S161: determining fusion errors according to the training knowledge document and the adjusted knowledge document, wherein the fusion errors comprise disturbance information errors, document content errors and hierarchical text distribution errors; the disturbance information error is used for indicating the given disturbance information and the error of reasoning disturbance information, the document content error is used for indicating the error between the training knowledge document and the description vector representation of the adjusted knowledge document in the target hierarchical text, and the hierarchical text distribution error is used for indicating the adjusting effect of the confocal text in the target hierarchical text.
Step S162: and optimizing the network configuration variables of the document management neural network according to the fusion error to obtain the document management neural network after debugging.
Steps S161 and S162 describe how to optimize the network configuration variables of the document management neural network based on the training knowledge document and the adjusted knowledge document, finally obtaining the debugged document management neural network.
In step S161, the enterprise management system first determines a fusion error according to the training knowledge document and the adjusted knowledge document. The fusion error is an error index comprehensively considering a plurality of aspects, and mainly comprises disturbance information error, document content error and hierarchical text distribution error.
The perturbation information error is used to indicate the difference between the assigned perturbation information and the inferred perturbation information. In the document processing process, in order to enhance the robustness of the model, some disturbance information is usually introduced. The disturbance information error is the degree of difference between the introduced disturbance information and the disturbance information used in the actual reasoning of the model.
Document content errors are used to indicate errors between the training knowledge document and the description vector representation of the adjusted knowledge document in the target hierarchical text. In short, the consistency degree of the original document and the adjusted document in the content is compared. If the adjusted document has a large difference in content from the original document, the document content error will be large.
The hierarchical text distribution error is used to indicate the adjustment effect of the confocal text at the target hierarchical text. The confocal text is obtained by specific processing of the original document, with the aim of making the model focus more on certain important levels in the document. The hierarchical text distribution error measures how well this processing is done, i.e., whether the confocal text is consistent with the text distribution of the original document at the target level.
The process of determining the fusion error may be accomplished by calculating a weighted sum of the three errors or other suitable combination. This fusion error will serve as an important basis for the subsequent optimization of the document management neural network.
After determining the fusion error, step S162 will optimize the network configuration variables of the document management neural network in accordance with this error. Network configuration variables are some of the adjustable parameters in the neural network, such as weights, biases, etc. By adjusting these parameters, the behavior and performance of the neural network can be changed.
The optimization process can be implemented by various optimization algorithms, such as gradient descent method, random gradient descent method, adam, etc. These algorithms adjust the network configuration variables based on the magnitude and direction of the fusion error so that the output of the neural network is closer to the desired output (i.e., the adjusted knowledge document).
After multiple iterations and optimizations, when the fusion error reaches a preset threshold or no longer drops significantly, the document management neural network can be considered debugged. The resulting neural network will then be able to better process similar knowledge documents and generate more accurate, focused confocal text.
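The stopping criterion described above (error below a preset threshold, or no longer dropping significantly) can be sketched with plain gradient descent. This is a hypothetical illustration: the patent only names gradient descent, SGD, and Adam as candidate optimizers, and the function and parameter names here are assumptions.

```python
def train_until_converged(params, compute_fusion_error, gradient, lr=0.01,
                          threshold=1e-3, max_iters=1000):
    """Hypothetical sketch of step S162: gradient-descent optimization of
    the network configuration variables until the fusion error falls
    below a preset threshold or stops improving."""
    prev = float("inf")
    for _ in range(max_iters):
        err = compute_fusion_error(params)
        # Stop when the error is small enough or no longer drops
        if err < threshold or abs(prev - err) < 1e-9:
            break
        params = [p - lr * g for p, g in zip(params, gradient(params))]
        prev = err
    return params
```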
Step S161, determining a fusion error according to the training knowledge document and the adjusted knowledge document, includes:
Step S1611: determining the disturbance information error according to the disturbance information given in the process of content adjustment of the training mapping knowledge text and the inference disturbance information.
Step S1612: and determining document content errors according to the training knowledge document and the adjusted knowledge document.
Step S1613: and determining the hierarchical text distribution error according to the training target hierarchical mask and the confocal text.
Step S1614: and determining a fusion error according to the disturbance information error, the document content error and the hierarchical text distribution error.
Steps S1611-S1614 describe how to determine a fusion error, which is derived from the combination of errors in multiple aspects, specifically including disturbance information errors, document content errors, and hierarchical text distribution errors.
In step S1611, the enterprise management system needs to determine the disturbance information error according to the disturbance information and the inference disturbance information given in the process of content adjustment of the training map knowledge text. The perturbation information is some small changes introduced to enhance the robustness of the model, which can help the model to better cope with various situations that may occur in practical applications.
Specifically, the enterprise management system first obtains perturbation information given when content adjustment is performed on training mapping knowledge text, and the perturbation information can be an operation of replacing, deleting or inserting certain words in the text. Then, the enterprise management system can acquire the reasoning disturbance information actually used by the model in the reasoning process. Finally, the enterprise management system can determine the disturbance information error by comparing the difference between the disturbance information and the inference disturbance information.
For example, if a word is replaced with a synonym when content adjustment is performed on the training mapping knowledge text, but this replacement is not identified at the time of model inference, a disturbance information error arises. The magnitude of this error can be measured by calculating the similarity or difference of the text before and after the substitution.
Next, the enterprise management system needs to determine document content errors from the training knowledge document and the adjusted knowledge document. This error is mainly used to measure the consistency of the content of the adjusted knowledge document with the original training knowledge document.
Specifically, the enterprise management system may calculate the similarity or difference between two documents by comparing their description vector representations. The description vector representation may be a word vector representation, a sentence vector representation, or a document embedded representation of the document, etc. The magnitude of the document content error can be obtained by calculating cosine similarity, euclidean distance or other similarity index between the vectors.
For example, if the adjusted knowledge document is very similar in content to the original training knowledge document, then the document content error will be small; conversely, if there is a large difference in content between the two, then the document content error will be relatively large.
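The cosine-similarity route mentioned above can be sketched directly. The `1 - S(D, D')` form is an assumption consistent with the text (similar documents should yield a small error); the patent leaves the exact error function open.

```python
import numpy as np

def document_content_error(vec_d, vec_d_prime):
    """Illustrative document content error: 1 minus the cosine similarity
    of the two document description vectors D and D'. The 1 - S(D, D')
    form is an assumption; similar documents yield an error near 0."""
    sim = np.dot(vec_d, vec_d_prime) / (
        np.linalg.norm(vec_d) * np.linalg.norm(vec_d_prime))
    return 1.0 - sim
```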
Finally, the enterprise management system needs to determine hierarchical text distribution errors from the training target hierarchical mask and the confocal text. This error is primarily used to measure the degree of consistency between the text distribution of the confocal text at the target level and the desired distribution.
In particular, the enterprise management system may obtain a desired text distribution at the target hierarchy based on the training target hierarchy mask; this distribution may be derived by counting the frequency of occurrence of target-level text in the training data. The enterprise management system may then calculate the actual text distribution of the confocal text at the target level, which may be obtained by performing word segmentation, word frequency statistics, and similar processing on the confocal text. Finally, by comparing the difference between the desired and actual text distributions, the hierarchical text distribution error can be determined.
For example, if a word in a desired text distribution occurs more frequently, but the word occurs less frequently in an actual text distribution, a hierarchical text distribution error may occur. The magnitude of this error can be measured by calculating the KL divergence, cross entropy, or other difference index between the two distributions.
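The KL-divergence option named above can be sketched as follows. Treating T and T' as normalized word-frequency vectors, and the epsilon smoothing, are assumptions for illustration.

```python
import numpy as np

def hierarchical_text_distribution_error(expected, actual, eps=1e-12):
    """Illustrative KL divergence between the desired distribution T and
    the actual distribution T' of the confocal text at the target level.
    Both inputs are assumed to be (unnormalized) word-frequency vectors;
    eps avoids log(0) and division by zero."""
    t = np.asarray(expected, dtype=float) + eps
    t_prime = np.asarray(actual, dtype=float) + eps
    t, t_prime = t / t.sum(), t_prime / t_prime.sum()
    return float(np.sum(t * np.log(t / t_prime)))
```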
In summary, the fusion error is determined by comprehensively considering the disturbance information error, the document content error and the hierarchical text distribution error, and the fusion error is used as an important basis for optimizing the document management neural network.
In one example, the disturbance information error (Perturbation Information Error) can be calculated with the following formula:

$$E_p = \frac{1}{N}\sum_{i=1}^{N} d(P_i, P'_i)$$

wherein: $E_p$ represents the disturbance information error; $N$ is the amount of disturbance information; $P_i$ is the i-th disturbance information assigned when content adjustment is performed on the training mapping knowledge text; $P'_i$ is the i-th inference disturbance information actually used in model inference; $d(P_i, P'_i)$ is a function used to calculate the difference between the two pieces of disturbance information.
The document content error (Document Content Error) can be calculated by the following formula:

$$E_c = 1 - S(D, D')$$

wherein: $E_c$ represents the document content error; $D$ is the description vector representation of the training knowledge document; $D'$ is the description vector representation of the adjusted knowledge document; $S(D, D')$ is a function used to calculate the similarity of the two document description vectors.
The hierarchical text distribution error (Hierarchical Text Distribution Error) can be calculated with the following formula:

$$E_d = \mathrm{Div}(T, T')$$

wherein: $E_d$ represents the hierarchical text distribution error; $T$ is the desired text distribution at the training target level; $T'$ is the actual text distribution of the confocal text at the target level; $\mathrm{Div}(T, T')$ is a function used to calculate the difference between the two distributions.
The fusion error (Fusion Error) can be calculated with the following formula:

$$E_f = \alpha E_p + \beta E_c + \gamma E_d$$

wherein: $E_f$ represents the fusion error; $\alpha$, $\beta$, $\gamma$ are weight parameters used to balance the importance of the different error terms.
It will be appreciated that the above formulas are merely examples, and an appropriate error function may be selected to perform calculation of the corresponding error according to actual needs, which is not limited in this application.
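The weighted combination of the three error terms is trivially expressed in code. The unit default weights are illustrative; in practice the weights would be chosen per the application's needs.

```python
def fusion_error(e_p, e_c, e_d, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted combination of the disturbance information error, document
    content error, and hierarchical text distribution error into the
    fusion error, following the example formula above."""
    return alpha * e_p + beta * e_c + gamma * e_d
```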
In one embodiment, step S1612, determining the document content error based on the training knowledge document and the adjusted knowledge document may specifically include:
step S16121: and extracting target level features of the training knowledge document based on the knowledge base detection network to obtain first target level features.
Step S16122: and extracting target level features of the adjusted knowledge document based on the knowledge base detection network to obtain second target level features.
Step S16123: and determining document content errors according to the commonality measurement result between the first target level characteristic and the second target level characteristic.
Steps S16121-S16123 illustrate how document content errors are determined, which are calculated by comparing feature differences at the target level of the training knowledge document and the adjusted knowledge document.
In step S16121, the enterprise management system uses the knowledge base detection network to extract target hierarchy features of the training knowledge document, resulting in first target hierarchy features. The knowledge base detection network is a neural network model specifically designed to extract specific hierarchical features from documents. It is trained to recognize and extract key information in documents that is relevant to the target hierarchy.
Specifically, the enterprise management system provides training knowledge documents as input to the knowledge base detection network. The network identifies features associated with the target hierarchy by analyzing and learning the structure and content in the document and extracts features forming a first target hierarchy. These features may be keywords, phrases, sentences or paragraphs, etc. in the document that together constitute a feature representation of the training knowledge document at the target level.
Next, the enterprise management system uses the same knowledge base detection network to extract target hierarchy features of the adjusted knowledge document, resulting in second target hierarchy features. This step is similar to step S16121, except that the input becomes an adjusted knowledge document.
By extracting the target level features of the adjusted knowledge document, the enterprise management system may obtain a representation of the features of the document at the target level. These features may also be keywords, phrases, sentences or paragraphs, etc., that reflect the content and structural characteristics of the adapted knowledge document.
Finally, the enterprise management system determines document content errors based on the commonality metric between the first target tier features and the second target tier features. The commonality measure is a method for measuring similarity or difference between two feature sets, and can help an enterprise management system to judge whether features of training knowledge documents and adjusted knowledge documents on a target level are consistent.
In particular, the enterprise management system may use metrics such as cosine similarity or Euclidean distance to calculate a commonality metric between the first and second target level features. If the similarity between the two feature sets is high, the content of the training knowledge document and that of the adjusted knowledge document at the target level are consistent, and the document content error is small; conversely, if the similarity is low, there is a larger difference between the two, and the document content error is larger.
In summary, document content errors are determined by extracting target level features of training knowledge documents and adjusted knowledge documents using a knowledge base detection network and calculating commonality metrics between them. The method can effectively measure the consistency degree of the two documents in terms of content and structure, and provides important basis for subsequent optimization work.
The embodiment of the present application further provides an enterprise management system. As shown in fig. 2, the enterprise management system 100 includes a processor 101 and a memory 103, where the processor 101 is coupled to the memory 103, for example via a bus 102. Optionally, the enterprise management system 100 may also include a transceiver 104. It should be noted that, in practical applications, the number of transceivers 104 is not limited to one, and the structure of the enterprise management system 100 does not limit the embodiments of the present application. The processor 101 may be a CPU, a general-purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 101 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The bus 102 may include a path to transfer information between the aforementioned components. The bus 102 may be a PCI bus, an EISA bus, or the like, and may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 2, but this does not mean that there is only one bus or one type of bus.
The memory 103 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 103 is used to store application program code for executing the present application, and its execution is controlled by the processor 101. The processor 101 is configured to execute the application code stored in the memory 103 to implement the content shown in any of the method embodiments described above.
The embodiment of the application provides an enterprise management system, which comprises: one or more processors; a memory; one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the one or more processors, implement the methods provided by the embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores a computer program, when the computer program runs on a processor, the computer program can enable the processor to execute corresponding content in the embodiment of the method.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least some sub-steps or stages of other steps. The foregoing is only a partial embodiment of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered within the protection scope of the present application.

Claims (14)

1. A hierarchical knowledge document management method, applied to an enterprise management system, the method comprising:
generating a document management coding feature according to an initial knowledge document and constraint information, wherein the document management coding feature comprises a description vector of the initial knowledge document and a description vector of the constraint information; the initial knowledge document is a text containing target level characteristics of the enterprise knowledge base, and the constraint information is information for constraining the target level information disclosure characteristics of the enterprise knowledge base;
performing a dimension reduction operation on the initial knowledge document to obtain a mapped knowledge text; wherein the mapped knowledge text is a text with dimensions smaller than the initial knowledge document that maintains the target level features of the enterprise knowledge base;
generating a confocal text corresponding to the initial knowledge document according to the mapped knowledge text, the document management coding features and a target hierarchy mask, wherein the target hierarchy mask is used for distinguishing the target hierarchy text in the initial knowledge document from other hierarchy texts except the target hierarchy text, and the confocal text is used for generating disclosure information corresponding to the enterprise knowledge base target hierarchy information disclosure features in the target hierarchy text;
performing content adjustment on the mapped knowledge text according to the document management coding features and the confocal text to obtain an adjusted knowledge document; wherein the adjusted knowledge document comprises the target level features of the initial knowledge document and also comprises the enterprise knowledge base target level information disclosure features constrained by the constraint information.
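As an illustrative, non-limiting sketch of the four steps recited in claim 1, the following toy example runs the encoding, dimension-reduction, confocal-text, and content-adjustment stages on random vectors. All names, dimensions, and the linear stand-ins for the learned components are assumptions for illustration only; the patent does not specify concrete operators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; none of these values come from the patent.
DOC_DIM, MAP_DIM = 64, 16

def encode(document_vec, constraint_vec):
    # Step 1: combine the description vectors of the initial knowledge
    # document and the constraint information into one coding feature.
    return np.concatenate([document_vec, constraint_vec])

def reduce_dim(document_vec, proj):
    # Step 2: project the document into a lower-dimensional
    # "mapped knowledge text" that keeps the target level features.
    return proj @ document_vec

def confocal_text(mapped, coding, mask):
    # Step 3: derive a confocal text conditioned on the coding feature
    # and zero out every position outside the target hierarchy mask.
    return (mapped * coding[: mapped.size]) * mask

def adjust(mapped, focused, proj):
    # Step 4: adjust the mapped text, then restore it to document size.
    return proj.T @ (mapped + focused)

doc = rng.normal(size=DOC_DIM)
constraint = rng.normal(size=DOC_DIM)
proj = rng.normal(size=(MAP_DIM, DOC_DIM)) / np.sqrt(DOC_DIM)
mask = (rng.random(MAP_DIM) < 0.5).astype(float)

coding = encode(doc, constraint)
mapped = reduce_dim(doc, proj)
focused = confocal_text(mapped, coding, mask)
adjusted_doc = adjust(mapped, focused, proj)
print(adjusted_doc.shape)  # restored to the original document dimension
```

The sketch only demonstrates the data flow between the claimed steps; in the patent each stage would be realized by the document management neural network.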
2. The method of claim 1, wherein said performing content adjustment on the mapped knowledge text according to the document management coding features and the confocal text to obtain an adjusted knowledge document comprises:
giving s rounds of disturbance information to the mapped knowledge text based on a first machine learning model to obtain a disturbance text, wherein s is more than or equal to 1;
based on the first machine learning model, performing s rounds of disturbance information reasoning and disturbance cleaning processing on the disturbance text according to the document management coding features and the confocal text to obtain a cleaned text;
restoring the cleaned text to the dimension corresponding to the initial knowledge document to generate the adjusted knowledge document.
3. The method of claim 2, wherein performing s rounds of perturbation information reasoning and perturbation cleaning processing on the perturbation text based on the first machine learning model according to the document management coding feature and the confocal text to obtain a cleaned text comprises:
based on the first machine learning model, reasoning the disturbance information given in the ith round to the mapped knowledge text according to the document management coding features and the confocal text, to obtain ith reasoning disturbance information, wherein 1 ≤ i ≤ s;
clearing the ith reasoning disturbance information from the text with disturbance information to obtain an adjusted text with disturbance information; wherein the text with disturbance information is initialized as the disturbance text;
when i is less than s, taking i+1 as the adjusted i and iterating the operation of reasoning the disturbance information given in the ith round to the mapped knowledge text to obtain the ith reasoning disturbance information;
and when i=s, taking the adjusted text with disturbance information as the cleaned text.
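Claims 2 and 3 together recite an iterative perturb-then-clean procedure reminiscent of diffusion-style denoising: s rounds of disturbance are added, then inferred and removed one round at a time. A minimal numerical sketch follows, with a deterministic stand-in for the first machine learning model; all function names, formulas, and dimensions are illustrative assumptions, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)
S = 4  # number of disturbance rounds, s >= 1

def infer_disturbance(text, coding, confocal, i):
    # Stand-in for the first machine learning model: a real system
    # would use a learned network conditioned on the document
    # management coding features and the confocal text; this simply
    # returns a deterministic pseudo-estimate for round i.
    return 0.1 * (text + coding + confocal) / (i + 1)

mapped = rng.normal(size=8)    # mapped knowledge text
coding = rng.normal(size=8)    # document management coding features
confocal = rng.normal(size=8)  # confocal text

# Claim 2: give s rounds of disturbance information to the mapped
# knowledge text to obtain the disturbance text.
disturbed = mapped.copy()
for _ in range(S):
    disturbed = disturbed + rng.normal(scale=0.1, size=8)

# Claim 3: iterate i = 1..s, inferring and clearing one round of
# disturbance per step; the final text is the cleaned text.
text = disturbed
for i in range(1, S + 1):
    text = text - infer_disturbance(text, coding, confocal, i)
cleaned = text
print(cleaned.shape)
```

In the patent the inference step is learned, so each round removes the disturbance it previously added; here the loop only demonstrates the iteration structure.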
4. The method of claim 1, wherein generating document management coding features based on the initial knowledge document and constraint information comprises:
extracting target level features of the enterprise knowledge base contained in the initial knowledge document based on a knowledge base detection network; performing embedded mapping on constraint information based on a constraint information network to obtain a coding result;
generating the document management coding feature according to the target level feature of the enterprise knowledge base and the coding result;
the constraint information comprises one or more constraint items, and the coding result comprises coding vectors respectively corresponding to the one or more constraint items; the generating the document management coding feature according to the target level feature of the enterprise knowledge base and the coding result comprises the following steps:
combining the coding vector corresponding to a first constraint item in the one or more constraint items with the target level characteristics of the enterprise knowledge base to obtain an adjustment coding vector corresponding to the first constraint item; the first constraint item is a preset constraint item used for representing an enterprise knowledge base;
processing the adjustment coding vector corresponding to the first constraint item with a feedforward neural network to obtain a target coding vector corresponding to the first constraint item; wherein the target coding vector corresponding to the first constraint item and the coding vector corresponding to the first constraint item have consistent dimensions;
and replacing the coding vector corresponding to the first constraint item in the coding result with the target coding vector corresponding to the first constraint item to generate the document management coding feature.
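The replacement recited in claim 4 can be sketched as follows. The concatenation combiner, the two-layer feed-forward network, and all dimensions are illustrative assumptions; the patent only requires that the target coding vector keep the constraint item's dimensionality.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 8  # per-item encoding dimension (arbitrary choice for the sketch)

def feed_forward(x, w1, w2):
    # Two-layer feed-forward network with ReLU; the output dimension
    # equals the constraint item's encoding dimension, as required.
    return w2 @ np.maximum(w1 @ x, 0.0)

# Encoding result: one vector per constraint item. Item 0 plays the
# first constraint item that represents the enterprise knowledge base.
items = rng.normal(size=(3, D))
kb_feature = rng.normal(size=D)  # enterprise knowledge base target level feature
w1 = rng.normal(size=(2 * D, 2 * D)) * 0.1
w2 = rng.normal(size=(D, 2 * D)) * 0.1

# Combine the first item's encoding vector with the knowledge-base
# feature to obtain the adjustment coding vector...
adjusted = np.concatenate([items[0], kb_feature])
# ...map it back to the item dimension with the feed-forward network...
target = feed_forward(adjusted, w1, w2)
# ...and replace the first item's vector in the encoding result.
coding_feature = items.copy()
coding_feature[0] = target
print(coding_feature.shape)
```

Only the first constraint item's vector is replaced; the remaining items pass through unchanged, which is the point of the claimed step.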
5. The method of claim 4, wherein said extracting, based on the knowledge base detection network, the enterprise knowledge base target level features contained in the initial knowledge document comprises:
extracting a target level text in the initial knowledge document to obtain a target level text corresponding to the initial knowledge document; and extracting the enterprise knowledge base target level features from the target level text based on the knowledge base detection network.
6. The method of claim 1, wherein generating the corresponding confocal text of the initial knowledge document based on the mapped knowledge text, the document management encoding features, and a target level mask comprises:
generating an initial confocal text corresponding to the initial knowledge document according to the mapping knowledge text and the document management coding characteristic based on a first machine learning model;
and shielding, according to the target hierarchy mask, the remaining hierarchy texts other than the target hierarchy text in the initial confocal text, so as to obtain the confocal text corresponding to the initial knowledge document.
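The shielding step of claim 6 amounts to elementwise multiplication by a binary target hierarchy mask (1 on target-hierarchy positions, 0 elsewhere), so only the target hierarchy text survives. A minimal sketch, with arbitrary values:

```python
import numpy as np

# Initial confocal text produced by the first machine learning model
# (values are arbitrary toy numbers).
initial_confocal = np.array([0.7, 0.2, 0.5, 0.9, 0.4])
# Target hierarchy mask: 1 marks target-hierarchy positions.
target_mask = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
# Shielding: zero out the remaining hierarchy texts.
confocal = initial_confocal * target_mask
print(confocal)
```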
7. The method according to any one of claims 1 to 6, wherein the method is performed by a document management neural network, the method further comprising a debugging process of the document management neural network, comprising:
acquiring one or more pieces of debugging data of the document management neural network, wherein each piece of debugging data comprises a pair of an associated training knowledge document and training constraint information; the training knowledge document is a text containing target level features of an enterprise knowledge base, and the training constraint information is information for constraining the enterprise knowledge base target level information disclosure features in the training knowledge document;
generating training document management coding features based on the document management neural network according to the training knowledge document and the training constraint information, wherein the training document management coding features comprise description vectors of the training knowledge document and description vectors of the training constraint information;
performing a dimension reduction operation on the training knowledge document based on the document management neural network to obtain a training mapping knowledge text; wherein the training mapping knowledge text is a text with dimensions smaller than the training knowledge document that maintains the target level features of the enterprise knowledge base;
generating a confocal text corresponding to the training knowledge document based on the document management neural network according to the training mapping knowledge text, the training document management coding features, and a training target hierarchy mask, wherein the training target hierarchy mask is used for distinguishing the target hierarchy text in the training knowledge document from the other hierarchy texts except the target hierarchy text, and the confocal text is used for generating disclosure information corresponding to the enterprise knowledge base target level information disclosure features in the target hierarchy text;
based on the document management neural network, performing content adjustment on the training mapping knowledge text according to the training document management coding features and the confocal text to generate an adjusted knowledge document corresponding to the training knowledge document;
and optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain a debugged document management neural network.
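The debugging process of claim 7 can be sketched as a standard training loop: run the forward pass on one piece of debugging data, measure the error between the adjusted and training documents, and update the network configuration variables. The one-layer linear stand-in for the document management neural network, the squared-error proxy for the fusion error, and the gradient-descent step are illustrative assumptions, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(3)

def forward(doc, constraint, params):
    # Stand-in for the document management neural network: the real
    # network encodes, reduces, focuses, and adjusts; here it is
    # collapsed into a single linear map for illustration.
    return params @ np.concatenate([doc, constraint])

# One piece of debugging data: a training knowledge document paired
# with training constraint information (random toy vectors).
train_doc = rng.normal(size=8)
train_constraint = rng.normal(size=8)
params = rng.normal(size=(8, 16)) * 0.1  # network configuration variables

x = np.concatenate([train_doc, train_constraint])
lr = 0.4 / float(x @ x)  # step size chosen so the iteration contracts

# Optimize the configuration variables so the adjusted document stays
# close to the training document (squared error as a simple proxy for
# the patent's fusion error).
losses = []
for _ in range(200):
    adjusted = forward(train_doc, train_constraint, params)
    err = adjusted - train_doc
    losses.append(float(err @ err))
    params -= lr * 2.0 * np.outer(err, x)  # gradient-descent update
print(losses[0], losses[-1])
```

The loss decreases monotonically here because the stand-in model is linear; the patent's actual objective is the fusion error defined in claims 11 and 12.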
8. The method of claim 7, wherein the generating the adjusted knowledge document corresponding to the training knowledge document based on the document management neural network performing content adjustment on the training mapping knowledge text according to the training document management coding feature and the confocal text comprises:
giving s rounds of disturbance information in the training mapping knowledge text based on a first machine learning model included in the document management neural network to obtain a training disturbance text, wherein s is more than or equal to 1;
based on the first machine learning model, performing s rounds of disturbance information reasoning and disturbance cleaning processing on the training disturbance text according to the document management coding features and the confocal text to obtain a training cleaned text;
and restoring the training cleaned text to the dimension corresponding to the training knowledge document to generate the adjusted knowledge document.
9. The method of claim 8, wherein said performing s rounds of disturbance information reasoning and disturbance cleaning processing on the training disturbance text based on the first machine learning model according to the document management coding features and the confocal text to obtain a training cleaned text comprises:
based on the first machine learning model, reasoning the disturbance information given in the ith round to the training mapping knowledge text according to the training document management coding features and the confocal text, to obtain ith reasoning disturbance information, wherein 1 ≤ i ≤ s;
clearing the ith reasoning disturbance information from the training text with disturbance information to obtain an adjusted training text with disturbance information; wherein the training text with disturbance information is initialized as the training disturbance text;
when i is less than s, taking i+1 as the adjusted i and iterating the operation of reasoning the disturbance information given in the ith round to the training mapping knowledge text to obtain the ith reasoning disturbance information;
and when i=s, taking the adjusted training text with disturbance information as the training cleaned text.
10. The method of claim 8, wherein generating training document management encoding features based on the document management neural network from the training knowledge document and the training constraint information comprises:
extracting target level features of the enterprise knowledge base contained in the training knowledge document based on a knowledge base detection network;
based on a constraint information network included in the document management neural network, embedding and mapping the training constraint information to obtain a training coding result;
generating the training document management coding feature according to the target level feature of the enterprise knowledge base and the training coding result;
the training constraint information comprises one or more constraint items, and the training coding result comprises coding vectors corresponding to the one or more constraint items respectively; the generating the training document management coding feature according to the target level feature of the enterprise knowledge base and the training coding result comprises the following steps:
combining the coding vector corresponding to a first constraint item in the one or more constraint items with the target level characteristics of the enterprise knowledge base to obtain an adjustment coding vector corresponding to the first constraint item; the first constraint item is a preset constraint item used for representing an enterprise knowledge base;
processing the adjustment coding vector corresponding to the first constraint item with a feedforward neural network to obtain a target coding vector corresponding to the first constraint item; wherein the target coding vector corresponding to the first constraint item and the coding vector corresponding to the first constraint item have consistent dimensions;
and replacing the coding vector corresponding to the first constraint item in the training coding result with the target coding vector corresponding to the first constraint item to generate the training document management coding feature.
11. The method of claim 7, wherein said generating the confocal text corresponding to the training knowledge document based on the document management neural network according to the training mapping knowledge text, the training document management coding features, and a training target hierarchy mask comprises:
generating an initial confocal text corresponding to the training knowledge document according to the training mapping knowledge text and the document management coding characteristic based on a first machine learning model included in the document management neural network;
shielding, according to the training target hierarchy mask, the remaining hierarchy texts other than the target hierarchy text in the initial confocal text, so as to obtain the confocal text corresponding to the training knowledge document;
wherein said optimizing the network configuration variables of the document management neural network according to the training knowledge document and the adjusted knowledge document to obtain a debugged document management neural network comprises:
determining a fusion error according to the training knowledge document and the adjusted knowledge document, wherein the fusion error comprises a disturbance information error, a document content error, and a hierarchical text distribution error; the disturbance information error indicates the error between the given disturbance information and the reasoning disturbance information, the document content error indicates the error between the description vector representations of the training knowledge document and the adjusted knowledge document in the target hierarchy text, and the hierarchical text distribution error indicates the adjusting effect of the confocal text in the target hierarchy text;
and optimizing the network configuration variables of the document management neural network according to the fusion error to obtain the debugged document management neural network.
12. The method of claim 11, wherein determining a fusion error from the training knowledge document and the adjusted knowledge document comprises:
determining the disturbance information error according to the disturbance information given, and the reasoning disturbance information obtained, in the process of performing content adjustment on the training mapping knowledge text;
determining document content errors according to the training knowledge document and the adjusted knowledge document;
determining a hierarchical text distribution error according to the training target hierarchical mask and the confocal text; determining the fusion error according to the disturbance information error, the document content error and the hierarchical text distribution error;
wherein determining the document content error based on the training knowledge document and the adjusted knowledge document comprises:
extracting target level features of the training knowledge document based on a knowledge base detection network to obtain first target level features;
extracting target level features of the adjusted knowledge document based on the knowledge base detection network to obtain second target level features; and determining the document content error according to the commonality measurement result between the first target level characteristic and the second target level characteristic.
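The fusion error of claims 11 and 12 combines three terms. The sketch below computes plausible stand-ins for each: a mean-squared error between given and inferred disturbance, one minus a cosine similarity as the commonality measurement between the first and second target level features, and a penalty on confocal-text mass outside the target hierarchy mask. The specific metrics and the equal-weight sum are assumptions; the patent names the three terms but not their formulas.

```python
import numpy as np

rng = np.random.default_rng(4)

def cosine(a, b):
    # Commonality measurement between two target level feature
    # vectors; cosine similarity is one plausible choice.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for the quantities named in claim 12.
given_noise = rng.normal(size=8)                          # disturbance given
inferred_noise = given_noise + rng.normal(scale=0.05, size=8)
first_feat = rng.normal(size=8)                           # training document features
second_feat = first_feat + rng.normal(scale=0.1, size=8)  # adjusted document features
target_mask = (rng.random(8) < 0.5).astype(float)         # training target hierarchy mask
focused = rng.normal(size=8)                              # unmasked confocal activations

# The three error terms, then the fused training objective.
perturbation_error = float(np.mean((given_noise - inferred_noise) ** 2))
content_error = 1.0 - cosine(first_feat, second_feat)
# Distribution error: penalize confocal mass outside the target mask.
distribution_error = float(np.mean(((1.0 - target_mask) * focused) ** 2))
fusion_error = perturbation_error + content_error + distribution_error
print(fusion_error)
```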
13. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when run on a processor, causes the processor to perform the method of any of claims 1-12.
14. An enterprise management system, comprising:
one or more processors;
a memory;
one or more computer programs; wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and, when executed by the one or more processors, implement the method of any of claims 1-12.
CN202410266293.1A 2024-03-08 2024-03-08 Knowledge document hierarchical management method, storage medium and management system Active CN117851373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410266293.1A CN117851373B (en) 2024-03-08 2024-03-08 Knowledge document hierarchical management method, storage medium and management system


Publications (2)

Publication Number Publication Date
CN117851373A true CN117851373A (en) 2024-04-09
CN117851373B CN117851373B (en) 2024-06-11

Family

ID=90535077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410266293.1A Active CN117851373B (en) 2024-03-08 2024-03-08 Knowledge document hierarchical management method, storage medium and management system

Country Status (1)

Country Link
CN (1) CN117851373B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220358145A1 (en) * 2021-05-05 2022-11-10 Business Objects Software Ltd Managing Structured Documents Based On Document Profiles
CN115587175A (en) * 2022-12-08 2023-01-10 阿里巴巴达摩院(杭州)科技有限公司 Man-machine conversation and pre-training language model training method and system and electronic equipment
CN116187163A (en) * 2022-12-20 2023-05-30 北京知呱呱科技服务有限公司 Construction method and system of pre-training model for patent document processing
CN116611443A (en) * 2023-04-23 2023-08-18 中国人民解放军战略支援部队信息工程大学 Knowledge interaction graph guided event causal relationship identification system and method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Zhongyi; Zhou Jie; Huang Jing: "Creation and Publication of Multi-Granularity Linked Data in Digital Libraries", Journal of the China Society for Scientific and Technical Information, no. 08, 24 August 2016 (2016-08-24), pages 103-114 *

Also Published As

Publication number Publication date
CN117851373B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
Li et al. DCT-GAN: dilated convolutional transformer-based GAN for time series anomaly detection
Wen et al. Neural attention model for recommendation based on factorization machines
Jiang et al. Ontology matching with knowledge rules
Patel et al. Representing joint hierarchies with box embeddings
Loyola et al. UNSL at eRisk 2021: A Comparison of Three Early Alert Policies for Early Risk Detection.
Irissappane et al. Leveraging GPT-2 for classifying spam reviews with limited labeled data via adversarial training
CN116228383A (en) Risk prediction method and device, storage medium and electronic equipment
Yu et al. Use of deep learning model with attention mechanism for software fault prediction
Jagdish et al. Identification of End‐User Economical Relationship Graph Using Lightweight Blockchain‐Based BERT Model
Geist et al. Leveraging machine learning for software redocumentation—A comprehensive comparison of methods in practice
CN117851373B (en) Knowledge document hierarchical management method, storage medium and management system
Alshamsan et al. Machine learning algorithms for privacy policy classification: A comparative study
Li et al. A deep learning approach of financial distress recognition combining text
Mahfoodh et al. Word2vec duplicate bug records identification prediction using tensorflow
Cao et al. A new skeleton-neural DAG learning approach
Chen Realization of Inter-Model Connections: Linking Requirements and Computer-Aided Design
Tavares et al. How COVID-19 Impacted Data Science: a Topic Retrieval and Analysis from GitHub Projects’ Descriptions
Li et al. Ptr4BERT: Automatic Semisupervised Chinese Government Message Text Classification Method Based on Transformer‐Based Pointer Generator Network
He et al. Counterfactual Explanations for Sequential Recommendation with Temporal Dependencies
Li et al. TEBC-net: An effective relation extraction approach for simple question answering over knowledge graphs
Zaqiyah et al. Text Generation with Content and Structure-Based Preprocessing in Imbalanced Data of Product Review.
Xu Applications of Modern NLP Techniques for Predictive Modeling in Actuarial Science
CN117149999B (en) Class case recommendation method and device based on legal element hierarchical network and text characteristics
KR102666388B1 (en) Apparatus and method for generating predictive information on development possibility of promising technology
CN118228318B (en) Battery carbon footprint distributed computing method and system for protecting carbon data privacy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant