CN111881695A - Audit knowledge retrieval method and device - Google Patents

Info

Publication number
CN111881695A
Authority
CN
China
Prior art keywords
question
word
audit knowledge
type
audit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010536953.5A
Other languages
Chinese (zh)
Inventor
侯本忠
张永强
胡璟懿
匡尧
张志斌
张海超
王郁
贾昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Hubei Electric Power Co Ltd
Beijing Guodiantong Network Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Hubei Electric Power Co Ltd
Beijing Guodiantong Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Hubei Electric Power Co Ltd, Beijing Guodiantong Network Technology Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010536953.5A priority Critical patent/CN111881695A/en
Publication of CN111881695A publication Critical patent/CN111881695A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12Accounting
    • G06Q40/125Finance or payroll

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Computer Interaction (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of the present specification provide a method and an apparatus for retrieving audit knowledge. The method includes: determining an audit knowledge question to be retrieved; analyzing the audit knowledge question to determine a question type, a user emotion color type and a semantic expression; retrieving a question-answer model matching the semantic expression; generating an answer to the audit knowledge question according to the question type, the user emotion color type and the question-answer model; and outputting the answer, thereby improving the efficiency of retrieving audit knowledge.

Description

Audit knowledge retrieval method and device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of audit work, and in particular, to a method and an apparatus for retrieving audit knowledge.
Background
Auditing, as an economic supervision activity, is a systematic process of objectively collecting and evaluating evidence in order to ascertain the degree of correspondence between assertions about economic activities and economic phenomena and established standards, and of communicating the results to interested users.
Audit work is characterized by a large knowledge system, complex content and a high degree of specialization. It places high demands on the knowledge of audit personnel, who therefore frequently need to look up relevant material and retrieve relevant knowledge during their work.
At present, audit work platforms mainly manage and maintain audit knowledge by uploading it as attachments. As a result, when searching for and viewing audit knowledge and results, audit personnel can only search by the file name of an audit knowledge file and cannot search the content of the audit knowledge or results. What is retrieved is a set of related regulation files rather than integrated information, so retrieval efficiency is low.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a method and an apparatus for retrieving audit knowledge, so as to solve the problem of low efficiency of retrieving audit knowledge.
In view of the above, one or more embodiments of the present specification provide a method for retrieving audit knowledge, including:
determining an audit knowledge problem to be retrieved;
analyzing the audit knowledge problem, and determining a problem type, a user emotion color type and a semantic expression;
retrieving question-answering models matched with the semantic expressions;
generating answers of audit knowledge questions according to the question types, the user emotion color types and the question-answer model;
and outputting the answer of the audit knowledge question.
Optionally, determining the audit knowledge problem to be retrieved includes:
receiving audit knowledge problem information input by a user;
if the received audit knowledge problem information is text information, the text information is used as the audit knowledge problem to be retrieved;
and if the received audit knowledge problem information is voice information, converting the voice information into text information, and taking the converted text information as the audit knowledge problem to be retrieved.
Optionally, analyzing the audit knowledge problem and determining the problem type includes:
splitting an audit knowledge problem into word sequences;
comparing the word sequence with a preset question type word library;
and if the words matched with the preset problem type word bank exist in the word sequence, determining whether the problem type is closed or open according to the problem type corresponding to the words.
Optionally, analyzing the audit knowledge problem and determining the emotional color type of the user includes:
splitting an audit knowledge problem into word sequences;
comparing the word sequence with a preset emotional color word library;
and if the words matched with the preset emotional color word library exist in the word sequence, determining whether the emotional color type of the user is a positive emotion type or a negative emotion type according to the emotional color corresponding to the words.
Optionally, analyzing the audit knowledge problem and determining the semantic expression includes:
splitting an audit knowledge problem into word sequences;
in the word sequence, identifying by adopting a CRF technology to obtain entity words;
in the entity words, a TF-IDF technology is adopted to identify and obtain key words;
expanding the keywords to generate a keyword set;
and generating a semantic expression according to the keyword set.
Optionally, generating an answer to the audit knowledge question according to the question type, the user emotion color type and the question-answer model, including:
extracting text information in the question-answering model;
obtaining syntactic components forming an answer according to the question type and the emotional color type of the user;
the syntactic components are converted into answers.
Based on the same inventive concept, one or more embodiments of the present specification provide an apparatus for retrieving audit knowledge, comprising:
the audit knowledge problem determining module is used for determining the audit knowledge problem to be retrieved;
the audit knowledge problem analysis module is used for analyzing the audit knowledge problem and determining a problem type, a user emotion color type and a semantic expression;
the audit knowledge question processing module is used for retrieving question and answer models matched with the semantic expressions;
the audit knowledge answer generating module is used for generating an answer of the audit knowledge question according to the question type, the emotion color type of the user and the question-answer model;
and the audit knowledge answer output module is used for outputting the answer of the audit knowledge question.
Optionally, the audit knowledge problem analysis module includes:
the problem type analysis submodule is used for splitting the audit knowledge problem into word sequences; comparing the word sequence with a preset question type word library; if the word sequence has a word matched with a preset question type word library, determining whether the problem type is closed or open according to the problem type corresponding to the word;
the user emotion color type analysis submodule is used for splitting an audit knowledge problem into word sequences; comparing the word sequence with a preset emotional color word library; if the words matched with a preset emotional color word library exist in the word sequence, determining whether the emotional color type of the user is a positive emotion type or a negative emotion type according to the emotional color corresponding to the words;
the semantic expression generation submodule is used for splitting the audit knowledge problem into word sequences; in the word sequence, identifying to obtain entity words; identifying and obtaining key words in the entity words; expanding the keywords to generate a keyword set; and generating a semantic expression according to the keyword set.
Optionally, the semantic expression generation sub-module includes:
the word segmentation unit is used for segmenting the audit knowledge problem into word sequences;
the entity word recognition unit is used for recognizing and obtaining entity words in the word sequence by adopting a CRF (conditional random field) technology;
the keyword identification unit is used for identifying and obtaining keywords in the entity words by adopting a TF-IDF technology;
a keyword set generating unit for expanding the keywords and generating a keyword set;
and the semantic expression generating unit is used for generating a semantic expression according to the keyword set.
Optionally, the audit knowledge answer generating module includes:
the audit knowledge answer extraction submodule is used for extracting text information in the question-answer model;
a syntax component obtaining submodule for obtaining syntax components constituting an answer according to the question type and the user emotion color type;
and the audit knowledge answer synthesis submodule is used for converting the syntactic components into answers.
As can be seen from the above, the audit knowledge retrieval method and apparatus provided in one or more embodiments of the present specification determine the audit knowledge problem to be retrieved; analyzing the audit knowledge problem, and determining a problem type, a user emotion color type and a semantic expression; retrieving question-answering models matched with the semantic expressions; generating answers of audit knowledge questions according to the question types, the user emotion color types and the question-answer model; and the answers of the audit knowledge questions are output, so that the efficiency of retrieving audit knowledge is improved.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a schematic flow diagram of a method for audit knowledge retrieval as provided in one or more embodiments of the present description;
FIG. 2 is a schematic flow diagram of generating a semantic expression provided by one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram of an apparatus for audit knowledge retrieval according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic structural diagram of a semantic expression generation submodule provided in one or more embodiments of the present specification.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
To achieve the above object, one or more embodiments of the present disclosure provide a method and an apparatus for retrieving audit knowledge, which may be applied to various electronic devices, for example a device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the disclosure is not limited in this respect. The method for retrieving audit knowledge is described in detail first.
FIG. 1 is a schematic flowchart of a method for retrieving audit knowledge according to one or more embodiments of the present disclosure. Referring to FIG. 1, the method for retrieving audit knowledge includes:
S101, determining an audit knowledge question to be retrieved.
In one embodiment, S101 may include:
receiving audit knowledge question information input by a user as the audit knowledge question to be retrieved.
In one case, the audit knowledge question entered by the user may be in text form. The present disclosure may provide an interface for entering text, such as a question-and-answer window or a search window, through which the user enters the audit knowledge question to be asked in text form.
In another case, the audit knowledge question entered by the user may be in speech form. The present disclosure may provide a speech input module, such as an audio input module, for receiving the speech entered by the user. If the received audit knowledge question information is speech, the speech is recognized and converted into the corresponding text, and the recognition result, namely the corresponding text, is taken as the audit knowledge question to be retrieved.
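As an illustration only, the branching in S101 might be sketched as follows. The function name `determine_question`, the use of raw bytes for voice input and the injected `transcribe` recognizer are assumptions made for this sketch; the specification does not name a speech recognition component.

```python
from typing import Callable, Union


def determine_question(user_input: Union[str, bytes],
                       transcribe: Callable[[bytes], str]) -> str:
    """Return the text of the audit knowledge question to be retrieved.

    Text input is used directly. Voice input (raw audio bytes here, an
    assumption) is first converted to text by the injected recognizer.
    """
    if isinstance(user_input, str):
        text = user_input
    else:
        text = transcribe(user_input)
    return text.strip()
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech-to-text engine.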
S102, analyzing the audit knowledge problem, and determining a problem type, a user emotion color type and a semantic expression.
In one embodiment, S102 may include the following.
In one case, analyzing the audit knowledge question to determine the question type includes:
splitting the audit knowledge question into a word sequence;
comparing the word sequence with a preset question type word library; and
if the word sequence contains a word that matches the preset question type word library, determining whether the question type is closed or open according to the question type corresponding to that word.
Splitting the audit knowledge question into a word sequence, that is, word segmentation, refers to recombining a continuous character sequence into a word sequence according to a certain specification. Chinese word segmentation is a step of Chinese text processing and can be performed with a dictionary-based word segmentation algorithm.
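One common dictionary-based approach is forward maximum matching; the specification does not say which dictionary algorithm is used, so the following is only a minimal sketch of that one technique. The function name `segment`, the sample dictionary and the unspaced Latin input are hypothetical stand-ins for Chinese text.

```python
from typing import List, Optional, Set


def segment(text: str, dictionary: Set[str],
            max_len: Optional[int] = None) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

For instance, with the dictionary `{"audit", "scope"}` the unspaced string `"auditxscope"` is split into `["audit", "x", "scope"]`, the unknown character being emitted on its own.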
Determining the question type means determining whether the question is closed or open.
A closed question is one that can be answered with "yes" or "no", whereas an open question is one that cannot be answered with "yes" or "no".
Specifically, for example, the words in the closed question word library include: "is it or not", "right or not", "needed or not", "has or not", "can or cannot", "willing or not"; the words in the open question word library include: "what", "how", "why".
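The word library comparison can be sketched as a dictionary lookup over the word sequence. In this sketch the open entries come from the example above, while the closed entry `"whether"` is a hypothetical single-token placeholder, and returning the type of the first matching word is an assumption, since the specification does not say how multiple matches are resolved.

```python
from typing import Dict, Iterable, Optional

# Word -> question type. "whether" is a hypothetical closed-question entry.
QUESTION_TYPE_LIBRARY: Dict[str, str] = {
    "what": "open",
    "how": "open",
    "why": "open",
    "whether": "closed",
}


def classify_question_type(words: Iterable[str],
                           library: Dict[str, str] = QUESTION_TYPE_LIBRARY
                           ) -> Optional[str]:
    """Return "open" or "closed" for the first word found in the library,
    or None when no word of the sequence matches."""
    for word in words:
        if word in library:
            return library[word]
    return None
```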
In one case, analyzing the audit knowledge question to determine the user emotion color type includes:
splitting the audit knowledge question into a word sequence;
comparing the word sequence with a preset emotional color word library; and
if the word sequence contains a word that matches the preset emotional color word library, determining whether the user emotion color type is the positive emotion type or the negative emotion type according to the emotional color corresponding to that word.
Determining the emotional color type of the user means determining whether the text entered by the user carries negative emotion or positive emotion. Negative emotion mainly refers to low moods such as dejection, sadness and anger; positive emotion mainly refers to pleasant moods such as happiness and excitement.
Specifically, the text entered by the user in the text edit box is obtained.
A word library reflecting emotional colors is preset; it contains a plurality of words expressing emotional colors and, for each word, a mark of its emotional color.
The text is compared with the preset word library.
If the text contains a word that matches the preset word library, the emotional color type of the user is determined according to the emotional color corresponding to that word. A match indicates that the text carries emotional color: if the matched word is a positive emotion word, the emotional color type is judged to be the positive emotion type, and if the matched word is a negative emotion word, the emotional color type is judged to be the negative emotion type.
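A minimal sketch of the emotional color lookup follows. The lexicon entries are taken from the moods named above; the majority vote over matched words, with ties resolved by the first matched word, is an assumption of this sketch, because the specification does not describe how conflicting matches are handled.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Word -> emotional color mark (illustrative entries only).
EMOTION_LIBRARY: Dict[str, str] = {
    "happy": "positive",
    "excited": "positive",
    "angry": "negative",
    "hurt": "negative",
}


def classify_emotion(words: Iterable[str],
                     library: Dict[str, str] = EMOTION_LIBRARY
                     ) -> Optional[str]:
    """Return "positive", "negative", or None if no word carries emotion."""
    matched = [library[word] for word in words if word in library]
    if not matched:
        return None
    counts = Counter(matched)
    if counts["positive"] != counts["negative"]:
        return "positive" if counts["positive"] > counts["negative"] else "negative"
    # Tie: fall back to the first matched word (an assumption).
    return matched[0]
```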
Referring to FIG. 2, in one case, analyzing the audit knowledge question to determine the semantic expression includes:
S201, splitting the audit knowledge question into a word sequence.
S202, identifying entity words in the word sequence by adopting a CRF technology.
Identifying entity words in the word sequence includes identifying the entity words based on an entity word list and a CRF model. Specifically:
Features are constructed and then combined; the format must conform to the CRF++ training format. CRF++ is one implementation of the CRF algorithm.
Part-of-speech features, word boundary features, entity indicator word features, feature word features, common word features and label features are constructed and combined, with the feature columns in the order: character, part of speech, boundary, indicator word, feature word, common word, label.
A corresponding feature template is written according to the character, part-of-speech, boundary, indicator word, feature word, common word and label features.
The model is trained with CRF++.
The model is tested with CRF++ and entity words are identified: each word is traversed to judge whether it is in the entity word list.
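To make the data preparation concrete, the sketch below writes one sentence in the CRF++ training layout (one token per line, whitespace-separated columns with the label last, a blank line after each sentence) and generates unigram feature template lines of the form `U00:%x[row,col]`. The column values, the label `B-ENT`, the window of offsets and the helper names are hypothetical; the specification only fixes the column order.

```python
from typing import Dict, List, Sequence

# Column order stated in the description; the label comes last.
COLUMNS = ("char", "pos", "boundary", "indicator", "feature", "common", "label")


def to_crfpp_block(sentence: List[Dict[str, str]]) -> str:
    """Format one sentence as CRF++ training rows followed by a blank line."""
    rows = ["\t".join(token[column] for column in COLUMNS) for token in sentence]
    return "\n".join(rows) + "\n\n"


def unigram_templates(n_features: int = 6,
                      window: Sequence[int] = (-1, 0, 1)) -> List[str]:
    """Build unigram template lines over the six non-label columns."""
    templates: List[str] = []
    for column in range(n_features):
        for offset in window:
            templates.append(f"U{len(templates):02d}:%x[{offset},{column}]")
    return templates
```

The resulting files would then be passed to the CRF++ training and testing tools; that step is outside this sketch.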
S203, identifying keywords among the entity words by adopting a TF-IDF technology.
Identifying keywords among the entity words includes:
calculating the weight of each entity word, removing entity words with low weight, and taking entity words with high weight as keywords. Specifically, the TF-IDF (term frequency-inverse document frequency) weighting technique may be used to identify keywords among the entity words. TF (term frequency) reflects how often an entity word appears in the current document, and IDF (inverse document frequency) can be obtained by dividing the total number of files by the number of files containing the entity word and taking the base-10 logarithm of the quotient.
Term Frequency (TF) refers to the number of times a given word appears in the document. The formula is as follows:
$$ tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} $$
where $n_{i,j}$ is the number of occurrences of the word $t_i$ in document $d_j$, and
$\sum_k n_{k,j}$ is the sum of the numbers of occurrences of all words in document $d_j$.
The main idea of inverse document frequency (IDF) is: the fewer the documents that contain the term $t_i$, the larger the IDF, and the better the term distinguishes between categories. The IDF of a particular term may be obtained by dividing the total number of documents by the number of documents that contain the term and taking the logarithm of the quotient. The formula is as follows:
$$ idf_i = \log \frac{|D|}{|\{ j : t_i \in d_j \}|} $$
where $|D|$ is the total number of files in the corpus, and
$|\{ j : t_i \in d_j \}|$ is the number of files containing the word $t_i$ (i.e., the number of files for which $n_{i,j} \neq 0$). If the word does not appear in the corpus, the divisor is zero, so $1 + |\{ j : t_i \in d_j \}|$ is typically used as the denominator.
A high word frequency within a particular document, and a low document frequency for that word across the document collection, may result in a high-weighted TF-IDF. Therefore, TF-IDF tends to filter out common words, preserving important words. The formula is as follows:
$$ tfidf_{i,j} = tf_{i,j} \times idf_i $$
where $tfidf_{i,j}$ is the term frequency-inverse document frequency (TF-IDF),
$tf_{i,j}$ is the term frequency (TF), and
$idf_i$ is the inverse document frequency (IDF).
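The three formulas can be checked with a short sketch. It uses the base-10 logarithm and the 1 + document-count denominator mentioned in the description; the function names, the weight threshold and the toy corpus in the usage note are hypothetical, and a non-empty corpus is assumed.

```python
import math
from collections import Counter
from typing import List, Sequence


def term_frequency(term: str, document: Sequence[str]) -> float:
    """Occurrences of the term divided by all word occurrences in the document."""
    if not document:
        return 0.0
    return Counter(document)[term] / len(document)


def inverse_document_frequency(term: str,
                               corpus: Sequence[Sequence[str]]) -> float:
    """log10(|D| / (1 + number of documents containing the term)).
    Assumes a non-empty corpus."""
    containing = sum(1 for document in corpus if term in document)
    return math.log10(len(corpus) / (1 + containing))


def tf_idf(term: str, document: Sequence[str],
           corpus: Sequence[Sequence[str]]) -> float:
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


def select_keywords(entity_words: Sequence[str], document: Sequence[str],
                    corpus: Sequence[Sequence[str]],
                    threshold: float) -> List[str]:
    """Keep entity words whose TF-IDF weight reaches the (hypothetical) threshold."""
    return [word for word in entity_words
            if tf_idf(word, document, corpus) >= threshold]
```

With a four-document toy corpus in which "authority" occurs in one document, its IDF is log10(4 / 2), about 0.301. Note that with the smoothed denominator a word present in every document receives a slightly negative weight.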
S204, expanding the keywords to generate a keyword set.
Expanding the keywords to generate a keyword set includes:
determining synonyms and related words of the keywords and taking them as expansion words; calculating the weight of each expansion word by using the TF-IDF method; removing expansion words with low weight; and taking the expansion words with high weight together with the keywords as the keyword set.
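The expansion step might be sketched as below, assuming the synonyms and related words come from a lookup table and their TF-IDF weights have already been computed. The table contents, the weights, the threshold and the name `expand_keywords` are all hypothetical.

```python
from typing import Dict, List, Sequence


def expand_keywords(keywords: Sequence[str],
                    related_words: Dict[str, List[str]],
                    weights: Dict[str, float],
                    threshold: float) -> Dict[str, List[str]]:
    """For each keyword, keep the keyword itself plus those expansion words
    whose precomputed weight reaches the threshold."""
    keyword_set: Dict[str, List[str]] = {}
    for keyword in keywords:
        kept = [word for word in related_words.get(keyword, [])
                if weights.get(word, 0.0) >= threshold]
        keyword_set[keyword] = [keyword] + kept
    return keyword_set
```

Grouping the surviving expansion words under their keyword keeps the alternatives together for the next step.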
S205, generating a semantic expression according to the keyword set.
The semantic expression represents the user's intention. Specifically, in audit work, the semantic expression may relate to risk issues, laws and regulations, enterprise systems, audit cases, audit talent, item types and the like, which is not specifically limited by the present disclosure.
For example, in one case, the audit knowledge question entered by the user is "What are the authorities of the audit organization?", so the audit knowledge question to be retrieved is determined to be "What are the authorities of the audit organization?". The question is segmented into a word sequence, and the entity words "audit organization" and "authority" are identified based on the entity word list and the CRF model. Keywords are identified among the entity words by using the TF-IDF method, and synonyms and related words of the keywords are determined as expansion words, for "audit organization" and for "authority" respectively; the synonyms and related words of "authority" may include "function scope", "right", "power" and the like. Expansion words with low weight are removed by using the TF-IDF method, and the expansion words with high weight together with the keywords are taken as the keyword set. The semantics of the keywords are looked up in a preset audit semantic database, and a semantic expression of the audit knowledge question is generated, such as:
{ [ (Audit) or (Audit) ] and [ (Authority) or (function scope) ] }
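An expression of this shape is an "and" of groups, each group being an "or" over a keyword and its expansion words. The sketch below builds the textual form and evaluates it against a set of index keywords; the function names and the sample terms are hypothetical, and the lookup in the audit semantic database is not modeled.

```python
from typing import Iterable, List, Sequence


def build_expression(groups: Sequence[Sequence[str]]) -> str:
    """Render keyword groups as { [ (a) or (b) ] and [ (c) or (d) ] }."""
    parts = ["[ " + " or ".join(f"({term})" for term in terms) + " ]"
             for terms in groups]
    return "{ " + " and ".join(parts) + " }"


def matches(groups: Sequence[Sequence[str]],
            indexed_keywords: Iterable[str]) -> bool:
    """True when every group has at least one term among the index keywords."""
    indexed = set(indexed_keywords)
    return all(any(term in indexed for term in terms) for terms in groups)
```

Keeping the groups as lists alongside the rendered string lets later retrieval code evaluate the expression without parsing it back.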
S103, retrieving a question-answer model matching the semantic expression.
The matching question-answer model is searched for in a preset question-answer library.
Keyword indexes can be set for the questions in the question-answer library, and all matching question-answer models are retrieved by the keywords in the semantic expression to obtain candidate question-answer models. The correlation between a candidate question-answer model and the user's question can be calculated as a matching probability, namely the similarity between the question of each candidate question-answer model and the question of the user; in one case, the matching probability can be calculated from the degree of literal similarity between the two questions.
For example, according to the semantic expression, the candidate question-answer model "authority of the audit organization" is found in the question-answer library, and the correlation between this candidate and the user's question is calculated; if the correlation reaches a preset threshold, the question-answer model is selected.
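The retrieval step can be sketched in two stages: filter the library by the keyword index, then score the candidates against the user's question and keep those that reach the threshold. Here `difflib.SequenceMatcher` from the Python standard library stands in for the literal similarity measure, which the specification does not define; the record layout, the default threshold and the name `retrieve` are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence


def retrieve(question: str,
             groups: Sequence[Sequence[str]],
             qa_library: List[Dict],
             threshold: float = 0.5) -> List[Dict]:
    """Return question-answer records matching the keyword groups whose
    question text is similar enough to the user's question, best first."""
    # Stage 1: keyword index lookup (every group needs at least one hit).
    candidates = [
        record for record in qa_library
        if all(any(term in record["keywords"] for term in terms)
               for terms in groups)
    ]
    # Stage 2: character-level similarity as a stand-in matching probability.
    scored = [(SequenceMatcher(None, question, record["question"]).ratio(), record)
              for record in candidates]
    selected = [(score, record) for score, record in scored if score >= threshold]
    selected.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in selected]
```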
S104, generating an answer to the audit knowledge question according to the question type, the user emotion color type and the question-answer model.
In one embodiment, S104 may include:
extracting the answer text in the question-answer model;
obtaining the syntactic components that form the answer according to the question type and the user emotion color type; and
converting the syntactic components into the answer.
Answers are generated for the candidate question-answer models in the answer form corresponding to the question type and the user emotion color type. For example, if the question type is determined to be open and the user emotion color type is determined to be the positive emotion type, the answer is generated in an open, positive-emotion answer form.
When the syntactic components forming the whole sentence are obtained, the output is a tree structure, so the tree structure needs to be converted into a linear sentence that is convenient for the user to read.
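Converting the tree of syntactic components into a linear sentence amounts to a depth-first, left-to-right traversal of its leaves. In the sketch below a tree node is a `(label, children)` tuple and a leaf is a plain string; this representation, the opening phrases keyed by question type and emotion, and the function names are all hypothetical, since the specification does not describe the answer forms in detail.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical opening phrases per (question type, emotional color type).
ANSWER_FORMS: Dict[Tuple[str, str], str] = {
    ("open", "positive"): "Here is what I found:",
    ("open", "negative"): "Sorry for the trouble. Here is what I found:",
    ("closed", "positive"): "In short:",
    ("closed", "negative"): "Sorry for the trouble. In short:",
}


def build_answer_tree(answer_words: Sequence[str],
                      question_type: str, emotion: str):
    """Assemble the syntactic components of the answer as a small tree."""
    children: List[object] = []
    opening = ANSWER_FORMS.get((question_type, emotion))
    if opening is not None:
        children.append(("OPENING", [opening]))
    children.append(("BODY", list(answer_words)))
    return ("ANSWER", children)


def linearize(node) -> List[str]:
    """Depth-first, left-to-right traversal collecting the leaf words."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    words: List[str] = []
    for child in children:
        words.extend(linearize(child))
    return words
```

Joining the words returned by `linearize` with spaces yields the sentence shown to the user.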
S105, outputting the answer to the audit knowledge question.
The method and apparatus for retrieving audit knowledge provided by one or more embodiments of this specification determine the audit knowledge question to be retrieved; analyze the audit knowledge question to determine a question type, a user emotion color type and a semantic expression; retrieve a question-answer model matching the semantic expression; generate an answer to the audit knowledge question according to the question type, the user emotion color type and the question-answer model; and output the answer, thereby improving the efficiency of retrieving audit knowledge.
Fig. 3 is a schematic structural diagram of an audit knowledge retrieval apparatus provided in one or more embodiments of the present specification, and referring to fig. 3, the audit knowledge retrieval apparatus includes:
and the audit knowledge problem determining module 31 is used for determining the audit knowledge problem to be retrieved.
And the audit knowledge problem analysis module 32 is used for analyzing the audit knowledge problem and determining the problem type, the user emotion color type and the semantic expression.
Wherein, audit knowledge problem analysis module includes:
the problem type analysis submodule is used for splitting the audit knowledge problem into word sequences and comparing the word sequences with a preset problem type word library; and if the words matched with the preset problem type word bank exist in the word sequence, determining whether the problem type is closed or open according to the problem type corresponding to the words.
The user emotion color type analysis submodule is used for splitting the audit knowledge problem into word sequences and comparing the word sequences with a preset emotion color word library; and if the words matched with the preset emotional color word library exist in the word sequence, determining whether the emotional color type of the user is a positive emotion type or a negative emotion type according to the emotional color corresponding to the words.
The semantic expression generation submodule is used for splitting the audit knowledge problem into word sequences, and identifying entity words in the word sequences; identifying and obtaining key words in the entity words; expanding the keywords to generate a keyword set; and generating a semantic expression according to the keyword set.
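The two lexicon-matching submodules described above (problem type and user emotion color type) can be sketched as follows. This is only an illustrative sketch: the word libraries, their contents, the whitespace tokenizer, and the function names `split_words` and `match_lexicon` are assumptions made for the example and do not come from the specification, which does not list the contents of either library or name a word segmenter.

```python
from typing import Dict, List, Optional

# Hypothetical word libraries; the specification does not give their contents.
QUESTION_TYPE_LEXICON: Dict[str, str] = {
    "whether": "closed",
    "can": "closed",
    "how": "open",
    "why": "open",
    "what": "open",
}
EMOTION_LEXICON: Dict[str, str] = {
    "please": "positive",
    "thanks": "positive",
    "useless": "negative",
    "annoying": "negative",
}


def split_words(question: str) -> List[str]:
    """Naive whitespace tokenizer standing in for a real word segmenter."""
    stripped = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in stripped if w]


def match_lexicon(words: List[str], lexicon: Dict[str, str]) -> Optional[str]:
    """Return the label of the first word found in the lexicon, else None."""
    for word in words:
        if word in lexicon:
            return lexicon[word]
    return None


words = split_words("How is depreciation audited? This manual is useless!")
question_type = match_lexicon(words, QUESTION_TYPE_LEXICON)
emotion_type = match_lexicon(words, EMOTION_LEXICON)
```

Returning `None` when no word matches leaves the fallback behaviour to the caller, since the text above only describes what happens when a match exists.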
Referring to fig. 4, the semantic expression generation submodule includes:
The word segmentation unit 41 is used for splitting the audit knowledge problem into a word sequence.
The entity word recognition unit 42 is used for recognizing entity words in the word sequence by adopting a conditional random field (CRF) technique.
The keyword identification unit 43 is used for identifying keywords among the entity words by adopting the TF-IDF technique.
The keyword set generating unit 44 is configured to expand the keywords and generate a keyword set.
The semantic expression generating unit 45 is configured to generate a semantic expression according to the keyword set.
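The chain of units 41 to 45 can be sketched roughly as below. Several parts are assumptions and not taken from the specification: a dictionary lookup stands in for the CRF entity recognizer (a trained sequence labeler would be needed in practice), the smoothed TF-IDF formula is one common variant, the reference corpus and synonym table are invented, and rendering the semantic expression as AND-joined groups of OR-joined synonyms is just one plausible form.

```python
import math
from typing import Dict, List

# Hypothetical resources, invented for this example.
ENTITY_VOCAB = {"audit", "contract", "invoice", "procurement"}
SYNONYMS: Dict[str, List[str]] = {
    "invoice": ["bill", "receipt"],
    "procurement": ["purchasing"],
}
CORPUS: List[List[str]] = [
    ["audit", "invoice", "approval"],
    ["audit", "contract", "review"],
    ["audit", "procurement", "contract"],
]


def segment(question: str) -> List[str]:
    """Unit 41: naive whitespace segmentation."""
    stripped = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in stripped if w]


def recognize_entities(words: List[str]) -> List[str]:
    """Unit 42: dictionary lookup used as a stand-in for a CRF labeler."""
    return [w for w in words if w in ENTITY_VOCAB]


def tf_idf(term: str, words: List[str], corpus: List[List[str]]) -> float:
    """Term frequency in the question times a smoothed inverse document frequency."""
    tf = words.count(term) / len(words)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def pick_keywords(entities: List[str], words: List[str],
                  corpus: List[List[str]], top_k: int = 2) -> List[str]:
    """Unit 43: keep the top_k entity words by TF-IDF (ties broken alphabetically)."""
    ranked = sorted(set(entities), key=lambda t: (-tf_idf(t, words, corpus), t))
    return ranked[:top_k]


def expand(keywords: List[str]) -> Dict[str, List[str]]:
    """Unit 44: expand each keyword with its synonyms."""
    return {k: sorted([k] + SYNONYMS.get(k, [])) for k in keywords}


def to_expression(groups: Dict[str, List[str]]) -> str:
    """Unit 45: render the keyword set as a boolean-style expression."""
    return " AND ".join("(" + " OR ".join(g) + ")" for g in groups.values())


words = segment("How do I audit a procurement invoice?")
keywords = pick_keywords(recognize_entities(words), words, CORPUS)
expression = to_expression(expand(keywords))
```

In this toy corpus "audit" appears in every document, so its inverse document frequency is lowest and it is dropped in favour of the rarer entity words.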
The audit knowledge question processing module 33 is used for retrieving the question-answer model matched with the semantic expression.
The audit knowledge answer generating module 34 is used for generating an answer to the audit knowledge question according to the question type, the user emotion color type and the question-answer model.
The audit knowledge answer output module 35 is used for outputting the answer to the audit knowledge question.
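As a rough illustration of the retrieval step performed by module 33, the sketch below matches a keyword set against stored question-answer models. The specification does not define the structure of a question-answer model or the matching criterion, so the `QAModel` record, the Jaccard similarity measure, and the `threshold` value are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class QAModel:
    """Hypothetical stored question-answer entry."""
    keywords: FrozenSet[str]
    answer_text: str


def jaccard(a: Set[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(keyword_set: Set[str], models: List[QAModel],
             threshold: float = 0.2) -> Optional[QAModel]:
    """Return the best-overlapping model, or None if nothing is similar enough."""
    best = max(models, key=lambda m: jaccard(keyword_set, m.keywords), default=None)
    if best is None or jaccard(keyword_set, best.keywords) < threshold:
        return None
    return best


models = [
    QAModel(frozenset({"invoice", "approval"}),
            "Invoices above the limit need two approvals."),
    QAModel(frozenset({"contract", "review"}),
            "Contracts are reviewed by the legal department."),
]
match = retrieve({"invoice", "bill", "receipt"}, models)
```

A similarity threshold is included so that an unrelated question yields no model at all instead of an arbitrary one.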
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
For convenience of description, the above devices are described as being divided into various modules by function, and the modules are described separately. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A retrieval method of audit knowledge is characterized by comprising the following steps:
determining an audit knowledge problem to be retrieved;
analyzing the audit knowledge problem, and determining a problem type, a user emotion color type and a semantic expression;
retrieving question-answering models matched with the semantic expressions;
generating answers of the audit knowledge questions according to the question types, the user emotion color types and the question-answer models;
and outputting the answer of the audit knowledge question.
2. The method of claim 1, wherein determining audit knowledge questions to retrieve comprises:
receiving audit knowledge problem information input by a user;
if the received audit knowledge problem information is text information, taking the text information as an audit knowledge problem to be retrieved;
and if the received audit knowledge problem information is voice information, converting the voice information into text information, and taking the converted text information as the audit knowledge problem to be retrieved.
3. The method of claim 1, wherein analyzing the audit knowledge problem to determine a problem type comprises:
splitting the audit knowledge problem into word sequences;
comparing the word sequence with a preset question type word library;
and if the word sequence has a word matched with the preset question type word library, determining whether the question type is closed or open according to the question type corresponding to the word.
4. The method of claim 1, wherein analyzing the audit knowledge problem to determine a user emotional color type comprises:
splitting the audit knowledge problem into word sequences;
comparing the word sequence with a preset emotional color word library;
and if the words matched with the preset emotional color word library exist in the word sequence, determining whether the emotional color type of the user is a positive emotion type or a negative emotion type according to the emotional color corresponding to the words.
5. The method of claim 1, wherein analyzing the audit knowledge problem to determine semantic expressions comprises:
splitting the audit knowledge problem into word sequences;
in the word sequence, identifying by adopting a CRF technology to obtain entity words;
identifying and obtaining key words in the entity words by adopting a TF-IDF technology;
expanding the keywords to generate a keyword set;
and generating the semantic expression according to the keyword set.
6. The method of claim 1, wherein generating the answer to the audit knowledge question based on the question type, the user emotional color type, and a question-and-answer model comprises:
extracting text information in the question-answering model;
obtaining syntactic components forming answers according to the question types and the emotional color types of the users;
converting the syntactic component into the answer.
7. An audit knowledge retrieval apparatus, comprising:
the audit knowledge problem determining module is used for determining the audit knowledge problem to be retrieved;
the audit knowledge problem analysis module is used for analyzing the audit knowledge problem and determining a problem type, a user emotion color type and a semantic expression;
the audit knowledge question processing module is used for retrieving the question-answer model matched with the semantic expression;
the audit knowledge answer generating module is used for generating the answer of the audit knowledge question according to the question type, the emotion color type of the user and the question-answer model;
and the audit knowledge answer output module is used for outputting the answer of the audit knowledge question.
8. The apparatus of claim 7, wherein the audit knowledge problem analysis module comprises:
the problem type analysis submodule is used for splitting the audit knowledge problem into word sequences; comparing the word sequence with a preset question type word library; if the word sequence has a word matched with the preset question type word library, determining whether the question type is closed or open according to the question type corresponding to the word;
the user emotion color type analysis submodule is used for splitting the audit knowledge problem into word sequences; comparing the word sequence with a preset emotional color word library; if the words matched with the preset emotional color word library exist in the word sequence, determining whether the emotional color type of the user is a positive emotion type or a negative emotion type according to the emotional color corresponding to the words;
a semantic expression generation submodule for splitting the audit knowledge problem into word sequences; in the word sequence, identifying to obtain entity words; identifying and obtaining key words in the entity words; expanding the keywords to generate a keyword set; and generating the semantic expression according to the keyword set.
9. The apparatus of claim 7, wherein the semantic expression generation submodule comprises:
the word segmentation unit is used for segmenting the audit knowledge problem into word sequences;
the entity word recognition unit is used for recognizing and obtaining entity words in the word sequence by adopting a CRF technology;
the keyword identification unit is used for identifying and obtaining keywords in the entity words by adopting a TF-IDF technology;
a keyword set generating unit, configured to expand the keywords and generate a keyword set;
and the semantic expression generating unit is used for generating the semantic expression according to the keyword set.
10. The apparatus of claim 7, wherein the audit knowledge answer generation module comprises:
the audit knowledge answer extraction submodule is used for extracting text information in the question-answer model;
a syntax component obtaining submodule for obtaining syntax components constituting an answer according to the question type and the user emotion color type;
and the audit knowledge answer synthesis submodule is used for converting the syntactic components into the answers.
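The three answer-generation submodules recited in claim 10 (extraction, syntactic component acquisition, synthesis) can be illustrated with the sketch below. The claims do not say what the syntactic components are or how the question type and emotion color type shape them, so the rules here (an apology prefix for negative emotion, a courtesy suffix for positive emotion, a one-sentence reply for closed questions) and all function names are hypothetical.

```python
from typing import Dict, List


def extract_text(qa_model: Dict[str, str]) -> str:
    """Extraction submodule: pull the stored answer text out of the model."""
    return qa_model["answer_text"].strip()


def syntactic_components(text: str, question_type: str,
                         emotion_type: str) -> List[str]:
    """Component submodule: choose the pieces of the reply (hypothetical rules)."""
    components: List[str] = []
    if emotion_type == "negative":
        components.append("Sorry for the trouble.")
    if question_type == "closed":
        # Closed questions get only the first sentence as a direct reply.
        components.append(text.split(". ")[0].rstrip(".") + ".")
    else:
        components.append(text)
    if emotion_type == "positive":
        components.append("Glad to help.")
    return components


def synthesize(components: List[str]) -> str:
    """Synthesis submodule: join the components into the final answer."""
    return " ".join(components)


qa_model = {
    "answer_text": "Invoices above the limit need two approvals. "
                   "Keep the approval record for audit.",
}
text = extract_text(qa_model)
closed_answer = synthesize(syntactic_components(text, "closed", "negative"))
open_answer = synthesize(syntactic_components(text, "open", "positive"))
```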
CN202010536953.5A 2020-06-12 2020-06-12 Audit knowledge retrieval method and device Pending CN111881695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010536953.5A CN111881695A (en) 2020-06-12 2020-06-12 Audit knowledge retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010536953.5A CN111881695A (en) 2020-06-12 2020-06-12 Audit knowledge retrieval method and device

Publications (1)

Publication Number Publication Date
CN111881695A true CN111881695A (en) 2020-11-03

Family

ID=73158287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010536953.5A Pending CN111881695A (en) 2020-06-12 2020-06-12 Audit knowledge retrieval method and device

Country Status (1)

Country Link
CN (1) CN111881695A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672720A (en) * 2021-09-14 2021-11-19 国网天津市电力公司 Power audit question and answer method based on knowledge graph and semantic similarity
CN116756178A (en) * 2023-08-22 2023-09-15 北京至臻云智能科技有限公司 Audit method, system and audit robot based on large language generation model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598445A (en) * 2013-11-01 2015-05-06 腾讯科技(深圳)有限公司 Automatic question-answering system and method
CN105893344A (en) * 2016-03-28 2016-08-24 北京京东尚科信息技术有限公司 User semantic sentiment analysis-based response method and device
CN107301168A (en) * 2017-06-01 2017-10-27 深圳市朗空亿科科技有限公司 Intelligent robot and its mood exchange method, system
CN107688608A (en) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 Intelligent sound answering method, device, computer equipment and readable storage medium storing program for executing
CN110750616A (en) * 2019-10-16 2020-02-04 网易(杭州)网络有限公司 Retrieval type chatting method and device and computer equipment
CN111078837A (en) * 2019-12-11 2020-04-28 腾讯科技(深圳)有限公司 Intelligent question and answer information processing method, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination