CN111737499B - Data searching method based on natural language processing and related equipment - Google Patents

Data searching method based on natural language processing and related equipment

Info

Publication number
CN111737499B
Authority
CN
China
Prior art keywords
information
answer
government affair
type
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010727532.0A
Other languages
Chinese (zh)
Other versions
CN111737499A (en
Inventor
袁小力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010727532.0A priority Critical patent/CN111737499B/en
Publication of CN111737499A publication Critical patent/CN111737499A/en
Application granted granted Critical
Publication of CN111737499B publication Critical patent/CN111737499B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The invention relates to the technical field of artificial intelligence and provides a data searching method based on natural language processing, comprising: receiving government affair search information; acquiring the information type of the government affair search information; preprocessing the government affair search information according to the information type to obtain preprocessed information; identifying the preprocessed information and determining the question type; based on the question type, searching the preprocessed information through a government affair knowledge graph to obtain a first answer; judging whether the first answer is valid according to the update state of the government affair knowledge graph; if the first answer is valid, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the plurality of associated dimensions to obtain a second answer; and outputting the first answer and the second answer. The invention also relates to blockchain technology: the first answer and the second answer can be uploaded to a blockchain. The method is applicable to smart government affair scenarios and thereby promotes the development of smart cities.

Description

Data searching method based on natural language processing and related equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data searching method based on natural language processing and related equipment.
Background
Currently, searching the internet has become a common way for people to acquire information. In practice, however, a search usually returns a list of related web pages, and because the amount of data on the internet is huge and the data is complex, the user has to open the pages one by one to look for the answer. Sometimes a keyword error makes several searches necessary. This wastes considerable time and energy, so both search efficiency and search accuracy are low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data searching method and related apparatus based on natural language processing, which can improve the searching efficiency and the searching accuracy.
The first aspect of the present invention provides a data search method based on natural language processing, including:
receiving input government affair search information;
acquiring the information type of the government affair search information;
preprocessing the government affair search information according to the information type to obtain preprocessed information;
identifying the preprocessed information through Natural Language Processing (NLP), and determining a question type corresponding to the preprocessed information;
based on the question type, searching the preprocessed information through a pre-established government affair knowledge graph to obtain a first answer;
judging whether the first answer is valid according to the update state of the government affair knowledge graph;
if the first answer is valid, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the associated dimensions to obtain a second answer;
and outputting the first answer and the second answer.
In a possible implementation manner, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a text type, judging whether erroneous characters exist in the government affair search information;
if the government affair search information contains erroneous characters, searching a preset word bank, according to an edit distance algorithm, for a first character with high similarity to the erroneous characters;
and replacing the erroneous characters with the first character, and determining the government affair search information after replacement as the preprocessed information.
In a possible implementation manner, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a picture type, performing picture processing on the government affair search information by adopting an image deblurring algorithm to obtain a first picture;
and if the first picture contains irrelevant information at its edges, deleting the irrelevant edge information, and determining the first picture after deletion as the preprocessed information.
In a possible implementation manner, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a voice type, identifying the regional accent type to which the government affair search information belongs;
and performing voice correction processing on the government affair search information based on the regional accent type, and determining the processed government affair search information as the preprocessed information.
In a possible implementation manner, after the preprocessed information is identified through Natural Language Processing (NLP) and the question type corresponding to the preprocessed information is determined, the data searching method based on natural language processing further includes:
acquiring the confidentiality level of the question type and acquiring the user level of the current input user;
judging whether the user level matches the confidentiality level;
and if the user level matches the confidentiality level, searching the preprocessed information through a pre-established government affair knowledge graph based on the question type to obtain a first answer.
In one possible implementation manner, the data search method based on natural language processing further includes:
acquiring government affair information from each government affair website through a web crawler technology;
determining a plurality of entities from the government affair information, and analyzing the association relationships among the entities based on the labels of the entities;
and establishing a government affair knowledge graph according to the entities and the association relationships.
In one possible implementation, the outputting the first answer and the second answer includes:
acquiring an output mode matched with the information type;
if the output mode is a text mode, acquiring text attributes of the government affair search information, and outputting the first answer and the second answer by adopting the text attributes; or
if the output mode is a picture mode, acquiring a graphic template matched with the answer type, and performing visual display on the first answer and the second answer by using the graphic template; or
if the output mode is a voice mode, converting the first answer and the second answer into a second voice matched with the accent of the government affair search information, and outputting the second voice.
A second aspect of the present invention provides a data search apparatus comprising:
the receiving module is used for receiving the input government affair search information;
the acquisition module is used for acquiring the information type of the government affair search information;
the processing module is used for preprocessing the government affair search information according to the information type to obtain preprocessed information;
the determining module is used for identifying the preprocessed information through Natural Language Processing (NLP) and determining the question type corresponding to the preprocessed information;
the searching module is used for searching the preprocessed information through a pre-established government affair knowledge graph based on the question type to obtain a first answer;
the judging module is used for judging whether the first answer is valid according to the update state of the government affair knowledge graph;
the determining module is further configured to determine, if the first answer is valid, a plurality of association dimensions of the first answer according to an answer type of the first answer, and expand the first answer through the government affair knowledge graph based on the plurality of association dimensions to obtain a second answer;
and the output module is used for outputting the first answer and the second answer.
A third aspect of the present invention provides an electronic device comprising a processor and a memory, the processor being configured to implement the data search method based on natural language processing when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the natural language processing-based data search method.
In the above method and apparatus, the government affair search information is preprocessed according to its information type, and semantic recognition is performed through NLP to obtain the real intention of the user. The answer the user wants is searched for based on the government affair knowledge graph, and an associated search is performed on that answer so that more related information can be provided to the user. This improves search efficiency and search accuracy, as well as user satisfaction and user experience.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a data search method based on natural language processing according to the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of a data search apparatus according to the present disclosure.
FIG. 3 is a schematic structural diagram of an electronic device implementing a data search method based on natural language processing according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers. The user device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), or the like.
Referring to fig. 1, fig. 1 is a flowchart illustrating a data searching method based on natural language processing according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted.
S11, receiving the input government affair search information.
S12, acquiring the information type of the government affair search information.
The information type may be a text type, a picture type, or a voice type, which is not limited in the embodiments of the present invention.
S13, preprocessing the government affair search information according to the information type to obtain preprocessed information.
Specifically, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a text type, judging whether erroneous characters exist in the government affair search information;
if the government affair search information contains erroneous characters, searching a preset word bank, according to an edit distance algorithm, for a first character with high similarity to the erroneous characters;
and replacing the erroneous characters with the first character, and determining the government affair search information after replacement as the preprocessed information.
In this alternative embodiment, the search system can automatically detect and correct errors when the user enters government affair search information containing misspellings or homophones. The system first searches a preset word bank (usually a word bank for the same industry field) for words with high character similarity and pinyin similarity to serve as candidate words, which narrows the range of the edit distance calculation. It then calculates the edit distance between each candidate word and the word to be corrected, takes the candidate word with the minimum edit distance, and returns that candidate word as the result if the edit distance value exceeds a set error correction threshold. In this way, the government affair search information is corrected, more accurate search information is obtained, and the accuracy of identifying the user's intention improves.
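The edit distance step described above can be illustrated with a minimal sketch. It assumes plain Levenshtein distance, and the names `edit_distance`, `correct_word`, and `max_distance` are hypothetical. The candidate narrowing by character and pinyin similarity is not shown, and the sketch accepts a candidate when its distance is at most `max_distance`, which is only one possible reading of the threshold test (the comparison may instead be defined on a similarity score).

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def correct_word(word: str, candidates: list[str], max_distance: int = 2) -> str:
    """Return the closest candidate, or the word unchanged if none is close enough.

    Assumption: a candidate is accepted when its distance is <= max_distance.
    """
    if not candidates:
        return word
    best = min(candidates, key=lambda c: edit_distance(word, c))
    return best if edit_distance(word, best) <= max_distance else word
```

In a full system the `candidates` list would be the already narrowed set of word bank entries rather than the whole word bank.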
Specifically, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a picture type, performing picture processing on the government affair search information by adopting an image deblurring algorithm to obtain a first picture;
and if the first picture contains irrelevant information at its edges, deleting the irrelevant edge information, and determining the first picture after deletion as the preprocessed information.
In this optional embodiment, when the information type is a picture type, processing the picture and deleting the irrelevant edge information improves the definition of the picture and reduces its redundant information, so that less information has to be identified and the accuracy of identifying the user's intention improves.
Specifically, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a voice type, identifying the regional accent type to which the government affair search information belongs;
and performing voice correction processing on the government affair search information based on the regional accent type, and determining the processed government affair search information as the preprocessed information.
In this optional embodiment, users in different regions may speak with a regional accent when entering voice input, and such accents can make speech recognition difficult. Performing voice correction on the government affair search information based on the regional accent type yields standard Mandarin, from which the user's intention can be recognized more accurately.
S14, identifying the preprocessed information through Natural Language Processing (NLP), and determining the question type corresponding to the preprocessed information.
NLP identifies the preprocessed information and obtains the question type corresponding to it.
The preprocessed information may be classified into several groups of question types.
Question type 1: "person attributes-title-hierarchy-school-label-organization-department-geography"; for example, which captains a XXX organization has, or who the captain of the XXX department of the XXX organization is.
Question type 2: "institution-geography-time-budget value"; for example, how much the XXX institution budgeted in 2018, or how much the institution with the largest budget budgeted.
Optionally, after the preprocessed information is identified through NLP and the question type corresponding to the preprocessed information is determined, the method further includes:
acquiring the confidentiality level of the question type and acquiring the user level of the current input user;
judging whether the user level matches the confidentiality level;
and if the user level matches the confidentiality level, searching the preprocessed information through a pre-established government affair knowledge graph based on the question type to obtain a first answer.
In this alternative embodiment, the user is searching for government-related information, which is typically confidential, so not everyone is allowed to search it. Matching the user level against the confidentiality level verifies the identity of the input user: if the two match, the input user is a legal user and has search authority. Verifying the user's identity in this way ensures the security of the information.
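The level check above can be sketched as follows. The text does not define what "matches" means, so this sketch assumes ordered levels and treats a user level at or above the confidentiality level as a match. The `Level` values, the `QUESTION_TYPE_LEVELS` mapping, and the function name `may_search` are all hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical ordered levels; higher means more restricted.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical mapping from question type to confidentiality level.
QUESTION_TYPE_LEVELS = {
    "person": Level.INTERNAL,
    "budget": Level.CONFIDENTIAL,
    "opening_hours": Level.PUBLIC,
}


def may_search(user_level: Level, question_type: str) -> bool:
    """Assumed rule: the user level must be at least the confidentiality level.

    Unknown question types default to the most restricted level, so the
    check fails closed.
    """
    required = QUESTION_TYPE_LEVELS.get(question_type, Level.CONFIDENTIAL)
    return user_level >= required
```

Only when `may_search` returns `True` would the search in step S15 be carried out.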
S15, searching the preprocessed information through a pre-established government affair knowledge graph based on the question type to obtain a first answer.
Optionally, the method further includes:
acquiring government affair information from each government affair website through a web crawler technology;
determining a plurality of entities from the government affair information, and analyzing the association relationships among the entities based on the labels of the entities;
and establishing a government affair knowledge graph according to the entities and the association relationships.
In this alternative embodiment, information can be crawled on a schedule, by means of a large number of rules and web crawler technology, from relevant websites across the country, such as the official government websites of provinces and cities, the official websites of finance departments, and government bidding websites.
For example, suppose that the mayor of city A and the mayor of city B both studied at Peking University. Peking University is then an entity, and one of its relations is "studied at", which links it to two other entities: the mayor of city A and the mayor of city B. Because of the particularity of the government field, the person holding a leadership post may change while the post itself always exists, so both the post and the person currently holding it are recorded. Each entity has a specific label, such as organization, geography, function, or attribute. By analogy, a large-scale government affair knowledge graph with a large number of entities and relations can be constructed.
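The entity and relation structure in this example can be sketched as a small labeled graph. This is only an illustration under assumptions: the class `KnowledgeGraph`, the relation names `held_by` and `studied_at`, the label `person`, and the placeholder names "Person X" and "Person Y" are hypothetical. A production knowledge graph would normally live in a dedicated graph store.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy graph: labeled entities plus (relation, tail) edges per head entity."""

    def __init__(self):
        self.labels = {}                 # entity name -> label
        self.edges = defaultdict(set)    # head -> {(relation, tail), ...}

    def add_entity(self, name, label):
        self.labels[name] = label

    def add_relation(self, head, relation, tail):
        # Both endpoints must already be registered as entities.
        for entity in (head, tail):
            if entity not in self.labels:
                raise KeyError(f"unknown entity: {entity}")
        self.edges[head].add((relation, tail))

    def query(self, head, relation):
        """Return, in sorted order, all tails linked to head by relation."""
        return sorted(t for r, t in self.edges.get(head, ()) if r == relation)


kg = KnowledgeGraph()
kg.add_entity("Peking University", "organization")
kg.add_entity("Mayor of city A", "function")   # the post, which persists
kg.add_entity("Mayor of city B", "function")
kg.add_entity("Person X", "person")            # the current holder, who may change
kg.add_entity("Person Y", "person")

# Record the post and the person separately, as the text suggests.
kg.add_relation("Mayor of city A", "held_by", "Person X")
kg.add_relation("Mayor of city B", "held_by", "Person Y")
kg.add_relation("Person X", "studied_at", "Peking University")
kg.add_relation("Person Y", "studied_at", "Peking University")
```

When a post changes hands, only the `held_by` edge of the post needs to be replaced; the post entity and its other relations stay in place.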
S16, judging whether the first answer is valid according to the update state of the government affair knowledge graph.
The first answer may be determined to be valid if the update state of the government affair knowledge graph indicates that all data in the graph is up to date. Conversely, the first answer may be determined to be invalid if the update state indicates that the data is not up to date, that is, the government affair knowledge graph has not been updated for a long time.
In this way, the first answer found is guaranteed both to match the preprocessed information and to be a currently valid answer to the user's query.
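One way to picture the validity check is a freshness test on the graph's last update time. The text does not say how the update state is represented, so the timestamp comparison, the 30-day default, and the name `is_answer_valid` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_answer_valid(last_updated: datetime,
                    now: datetime,
                    max_age: timedelta = timedelta(days=30)) -> bool:
    """Treat an answer as valid only if the graph was updated recently enough.

    Assumption: "not updated for a long time" means older than max_age.
    """
    return now - last_updated <= max_age
```

Passing `now` explicitly keeps the check deterministic and easy to test.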
S17, if the first answer is valid, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the associated dimensions to obtain a second answer.
The associated dimensions may include, but are not limited to, time, place, area, and people.
For example, if the answer type of the first answer is a person, the associated dimensions of the first answer may be determined to be time and area; based on these dimensions, the person's year and month of birth, place of birth, current place of residence, and so on can be found.
For another example, if the answer type of the first answer is a region, the associated dimension of the first answer may be determined to be time, and the budget of the region in recent years can be found based on that dimension.
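The expansion step can be sketched as a lookup from answer type to associated dimensions, followed by a lookup of facts along those dimensions. The mapping `ASSOCIATED_DIMENSIONS`, the dictionary `TOY_GRAPH` that stands in for the government affair knowledge graph, and all of the facts in it are hypothetical placeholders.

```python
# Hypothetical mapping from answer type to associated dimensions,
# mirroring the two examples in the text.
ASSOCIATED_DIMENSIONS = {
    "person": ("time", "area"),
    "region": ("time",),
}

# Toy stand-in for the knowledge graph: entity -> dimension -> facts.
TOY_GRAPH = {
    "Person X": {
        "time": ["born 1970-01"],
        "area": ["born in city A", "lives in city B"],
    },
    "City A": {
        "time": ["budget 2018", "budget 2019"],
        "people": ["Person X"],
    },
}


def expand_answer(first_answer, answer_type, graph=TOY_GRAPH):
    """Collect facts about first_answer along the dimensions of its answer type."""
    dimensions = ASSOCIATED_DIMENSIONS.get(answer_type, ())
    facts = graph.get(first_answer, {})
    return {dim: facts[dim] for dim in dimensions if dim in facts}
```

The returned dictionary plays the role of the second answer; dimensions that are not associated with the answer type (here, `people` for a region) are left out.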
Optionally, if the first answer is valid, the method further includes:
acquiring user information of a current input user;
judging whether the input user has the authority of expanding answers or not according to the user information;
if the input user has the authority of expanding answers, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the associated dimensions to obtain a second answer.
In this optional implementation, extension authority can be preset for users: some users can obtain only the single answer when searching, while others can obtain the answer and related answers as well. Setting the extension authority prevents illegal users (that is, users without authority) from obtaining deeper information and thereby avoids information leakage.
S18, outputting the first answer and the second answer.
Specifically, outputting the first answer and the second answer includes:
acquiring an output mode matched with the information type;
if the output mode is a text mode, acquiring text attributes of the government affair search information, and outputting the first answer and the second answer by adopting the text attributes; or
if the output mode is a picture mode, acquiring a graphic template matched with the answer type, and performing visual display on the first answer and the second answer by using the graphic template; or
if the output mode is a voice mode, converting the first answer and the second answer into a second voice matched with the accent of the government affair search information, and outputting the second voice.
In this embodiment, the output mode follows the information type: if the information type is text, the output mode is text; if it is a picture, the output mode is a picture; and if it is voice, the output mode is voice.
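The rule that the output mode follows the information type can be sketched as a simple dispatch. The enum `InfoType`, the function names, and the shape of the returned dictionaries are hypothetical; real text styling, chart rendering, and speech synthesis are outside the scope of this sketch and are represented only by placeholders.

```python
from enum import Enum


class InfoType(Enum):
    TEXT = "text"
    PICTURE = "picture"
    VOICE = "voice"


def select_output_mode(info_type: InfoType) -> InfoType:
    # Per the embodiment, the output mode mirrors the input information type.
    return info_type


def describe_output(info_type: InfoType, first_answer: str, second_answer: str) -> dict:
    """Return a placeholder description of how the two answers would be output."""
    mode = select_output_mode(info_type)
    answers = [first_answer, second_answer]
    if mode is InfoType.TEXT:
        return {"mode": "text", "body": "\n".join(answers)}
    if mode is InfoType.PICTURE:
        # A graphic template matched to the answer type would be chosen here.
        return {"mode": "picture", "data": answers}
    # Voice: the text below would be passed to accent-matched speech synthesis.
    return {"mode": "voice", "script": " ".join(answers)}
```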
In the method flow described in fig. 1, the government affair search information is preprocessed according to its information type, and semantic recognition is performed through NLP to obtain the real intention of the user. The answer the user wants is searched for based on the government affair knowledge graph, and an associated search is performed on that answer so that more related information can be provided to the user. This improves search efficiency and search accuracy, as well as user satisfaction and user experience.
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and it will be apparent to those skilled in the art that modifications may be made without departing from the inventive concept of the present invention, and these modifications are within the scope of the present invention.
Referring to fig. 2, fig. 2 is a functional block diagram of a preferred embodiment of a data search apparatus according to the present invention.
In some embodiments, the data search apparatus operates in an electronic device. The data search means may comprise a plurality of functional modules consisting of program code segments. Program codes of respective program segments in the data search apparatus may be stored in the memory and executed by the at least one processor to perform some or all of the steps of the natural language processing-based data search method described in fig. 1.
In this embodiment, the data search apparatus may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a receiving module 201, an obtaining module 202, a processing module 203, a determining module 204, a searching module 205, a judging module 206, and an output module 207. A module as referred to herein is a series of computer program segments that is stored in the memory, can be executed by at least one processor, and performs a fixed function.
The receiving module 201 is configured to receive input government affair search information.
The obtaining module 202 is configured to obtain the information type of the government affair search information.
The information type may be a text type, a picture type, or a voice type, which is not limited in the embodiments of the present invention.
And the processing module 203 is configured to perform preprocessing on the government affair search information according to the information type to obtain preprocessed information.
Specifically, the preprocessing the government affair search information according to the information type, and the obtaining of the preprocessed information includes:
if the information type is a text type, judging whether error characters exist in the government affair searching information or not;
if the government affair search information contains wrong characters, searching a first character with high similarity to the wrong characters from a preset word bank according to an editing distance algorithm;
and replacing the error characters by the first characters, and determining the replaced government affair search information as the preprocessing information.
In this optional embodiment, when the user enters government affair search information containing misspellings or homophone errors, the search system can recognize and correct them automatically. Words with high character similarity and pinyin similarity are first retrieved from a preset lexicon (usually a lexicon for the same industry field) as candidate words, which narrows the range over which edit distances must be computed. The edit distance between each candidate word and the word to be corrected is then calculated, the candidate word with the minimum edit distance is taken, and that candidate word is returned as the result if the edit distance value exceeds a set error-correction threshold. In this way, the government affair search information can be corrected, more accurate search information can be obtained, and the accuracy of identifying the user's intention can be improved.
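The candidate-narrowing and edit-distance steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the length-based pre-filter (a stand-in for the word and pinyin similarity step) and the `max_distance` parameter are hypothetical, and the sketch assumes that a candidate is accepted only when its edit distance stays within the threshold, a reading the text itself does not settle.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def correct_word(word: str, lexicon: list[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    if word in lexicon:
        return word
    # Hypothetical pre-filter standing in for the word/pinyin similarity step:
    # keep only entries of similar length to narrow the search range.
    candidates = [w for w in lexicon if abs(len(w) - len(word)) <= max_distance]
    if not candidates:
        return word
    best = min(candidates, key=lambda w: edit_distance(word, w))
    # Assumption: accept the candidate only if it is within the threshold.
    return best if edit_distance(word, best) <= max_distance else word
```

For instance, `correct_word("budgte", ["budget", "bureau", "mayor"])` selects `"budget"`, while a word with no sufficiently close lexicon entry is returned unchanged.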
Specifically, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a picture type, processing the government affair search information with an image deblurring algorithm to obtain a first picture;
and if the first picture contains irrelevant information at its edges, deleting that edge information, and determining the first picture after deletion as the preprocessed information.
In this optional embodiment, when the information type is a picture type, processing the government affair search information and deleting the irrelevant edge information both improves the definition of the picture and removes redundant picture content. This reduces the amount of information to be recognized, so the accuracy of identifying the user's intention can be improved.
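As a rough illustration of deleting irrelevant edge information, the sketch below crops uniform margins from a grayscale image stored as a list of pixel rows. It assumes, purely for illustration, that the irrelevant edge content is a blank border of a known background value; the function name and the `background` parameter are hypothetical, and the deblurring step is not shown.

```python
def crop_blank_border(pixels: list[list[int]], background: int = 255) -> list[list[int]]:
    """Remove outer rows and columns that contain only the background value."""
    # Rows that hold at least one non-background pixel.
    rows = [r for r, row in enumerate(pixels) if any(p != background for p in row)]
    if not rows:
        return pixels  # nothing but background: leave the picture untouched
    # Columns that hold at least one non-background pixel.
    cols = [c for c in range(len(pixels[0]))
            if any(row[c] != background for row in pixels)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

Only the outer margins are removed; background pixels inside the retained region are kept so that the content is not distorted.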
Specifically, preprocessing the government affair search information according to the information type to obtain the preprocessed information includes:
if the information type is a voice type, identifying the regional accent type to which the government affair search information belongs;
and performing voice correction processing on the government affair search information based on the regional accent type, and determining the processed government affair search information as the preprocessed information.
In this optional embodiment, users in different regions may speak with a local accent when entering voice input, and regional accents can make speech recognition difficult. By performing voice correction processing on the government affair search information based on the regional accent type, standard Mandarin can be obtained, so the user's intention can be recognized more accurately.
The determining module 204 is configured to identify the preprocessed information through natural language processing (NLP) and determine a question type corresponding to the preprocessed information.
NLP (Natural Language Processing) can be used to identify the preprocessed information and obtain the question type corresponding to it.
The preprocessed information may be classified into several groups of question types.
Question type 1: "person attributes-title-hierarchy-school-label-organization-department-geography". For example: which captains does XXX organization have, or who is the captain of the XXX department of XXX organization.
Question type 2: "institution-geography-time-budget value". For example: how much did XXX institution budget in 2018, or how much did the institution with the largest budget budget.
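To make the idea of question types concrete, the sketch below assigns a query to the type whose slot set best overlaps the slots detected in the query. It is a toy keyword matcher under stated assumptions, not the NLP recognition the embodiment relies on: the type names, the slot subsets and the `SLOT_LEXICON` entries are all hypothetical.

```python
# Hypothetical slot sets, loosely following the two question types above.
QUESTION_TYPES = {
    "person": {"person", "title", "organization", "department", "geography"},
    "budget": {"organization", "geography", "time", "budget"},
}

# Hypothetical keyword-to-slot lexicon used only for this illustration.
SLOT_LEXICON = {
    "director": "title",
    "department": "department",
    "bureau": "organization",
    "budget": "budget",
    "2018": "time",
}


def classify(query: str):
    """Return the question type with the largest slot overlap, or None."""
    tokens = query.lower().replace("?", " ").split()
    slots = {SLOT_LEXICON[t] for t in tokens if t in SLOT_LEXICON}
    scores = {name: len(slots & wanted) for name, wanted in QUESTION_TYPES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A production system would replace the keyword lexicon with entity recognition and intent classification; the sketch only shows how a question type can be treated as a set of expected slots.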
The searching module 205 is configured to search the pre-processing information through a pre-established government affair knowledge graph based on the question type to obtain a first answer.
The judging module 206 is configured to judge whether the first answer is valid according to the update state of the government affair knowledge graph.
If the update state of the government affair knowledge graph indicates that all data in the graph is the latest data, the first answer may be determined to be valid. Conversely, if the update state indicates that the data in the graph is not the latest data, that is, the graph has not been updated for a long time, the first answer may be determined to be invalid.
In this way, the first answer found is guaranteed both to match the preprocessed information and to be a valid answer to the user's current query.
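One way to read the update-state check is as a freshness test on the time of the last graph update. The sketch below assumes this reading; the function name and the 30-day `max_age` default are hypothetical, since the text does not say how long counts as "a long time".

```python
from datetime import datetime, timedelta


def is_answer_valid(last_updated: datetime, now: datetime,
                    max_age: timedelta = timedelta(days=30)) -> bool:
    """Treat the first answer as valid only if the graph was updated recently."""
    # Assumption: "not updated for a long time" means older than max_age.
    return now - last_updated <= max_age
```

Passing `now` explicitly keeps the check deterministic and easy to test.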
The determining module 204 is further configured to determine, if the first answer is valid, a plurality of association dimensions of the first answer according to an answer type of the first answer, and expand the first answer through the government affair knowledge graph based on the plurality of association dimensions to obtain a second answer.
Wherein the associated dimensions may include, but are not limited to, time, place, area, people, and the like.
For example, if the answer type of the first answer is a person, the associated dimensions of the first answer may be determined to be time and area. Based on these dimensions, the person's year and month of birth, place of birth, current place of residence and so on may be found.
For another example, if the answer type of the first answer is a certain region, it may be determined that the association dimension of the first answer may be time, and the budget conditions of the region in recent years may be found based on the association dimension.
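The expansion step can be pictured as a lookup from answer type to associated dimensions, and from each dimension to relations in the knowledge graph. The following is a minimal sketch, assuming the graph is a plain dictionary of entity facts; the table contents and relation names such as `birth_date` are hypothetical.

```python
# Hypothetical mapping from answer type to its associated dimensions.
ASSOCIATION_DIMENSIONS = {"person": ["time", "area"], "region": ["time"]}

# Hypothetical mapping from (answer type, dimension) to graph relations.
RELATIONS_BY_DIMENSION = {
    ("person", "time"): ["birth_date"],
    ("person", "area"): ["birthplace", "residence"],
    ("region", "time"): ["recent_budgets"],
}


def expand_answer(graph: dict, entity: str, answer_type: str) -> dict:
    """Collect the related facts (the second answer) for the first answer."""
    facts = graph.get(entity, {})
    expanded = {}
    for dimension in ASSOCIATION_DIMENSIONS.get(answer_type, []):
        for relation in RELATIONS_BY_DIMENSION.get((answer_type, dimension), []):
            if relation in facts:  # skip relations the graph does not hold
                expanded[relation] = facts[relation]
    return expanded
```

Relations missing from the graph are skipped, so the second answer contains only facts that are actually stored.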
The output module 207 is configured to output the first answer and the second answer.
Specifically, outputting the first answer and the second answer includes:
acquiring an output mode matching the information type;
if the output mode is a text mode, acquiring text attributes of the government affair search information, and outputting the first answer and the second answer using the text attributes; or
if the output mode is a picture mode, acquiring a graphic template matching the answer type, and displaying the first answer and the second answer visually using the graphic template; or
if the output mode is a voice mode, converting the first answer and the second answer into a second voice matching the accent of the government affair search information, and outputting the second voice.
In this embodiment, if the information type is text, the output mode is also text; if the information type is a picture, the output mode is also a picture; and if the information type is voice, the output mode is also voice.
Optionally, the obtaining module 202 is further configured to obtain a confidentiality level of the question type and a user level of the current input user after the determining module 204 identifies the preprocessed information through natural language processing (NLP) and determines the question type corresponding to the preprocessed information;
the judging module 206 is further configured to judge whether the user level matches the confidentiality level;
and, if the user level matches the confidentiality level, the preprocessed information is searched through the pre-established government affair knowledge graph based on the question type to obtain the first answer.
In this optional embodiment, the user is searching for government-related information, which is usually confidential, so not everyone may search for it. The identity of the input user can be verified by checking whether the user level matches the confidentiality level. If they match, the input user is a legitimate user with search permission. Verifying the user's identity in this way helps ensure the security of the information.
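The level check can be sketched as a comparison between the user level and the confidentiality level attached to the question type. The text only says the two must "match"; the sketch below assumes that a match means the user level is at least the required level, and the level table and function name are hypothetical.

```python
# Hypothetical confidentiality levels per question type (higher = more restricted).
CONFIDENTIALITY_BY_QUESTION_TYPE = {"person": 1, "budget": 2}


def may_search(question_type: str, user_level: int) -> bool:
    """Return True if the user is allowed to search this question type."""
    required = CONFIDENTIALITY_BY_QUESTION_TYPE.get(question_type)
    if required is None:
        return False  # assumption: unknown question types are denied by default
    # Assumption: "matches" is read as "user level is at least the required level".
    return user_level >= required
```

Denying unknown question types by default is a conservative design choice for confidential data, not something the text prescribes.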
Optionally, the obtaining module 202 is further configured to obtain government affair information from each government affair website through a web crawler technology;
the determining module 204 is further configured to determine a plurality of entities from the government affair information, and analyze association relationships of the plurality of entities based on tags of the plurality of entities;
the data search apparatus further includes:
and the establishing module is configured to establish the government affair knowledge graph according to the entities and the association relationships.
In this optional embodiment, relevant websites such as the official government websites, official finance department websites and government bidding websites of provinces and cities across the country can be crawled for information on a schedule, using a large number of rules together with crawler technology.
For example, suppose that the chief of city A and the chief of city B both studied at Peking University. Peking University is then an entity, and one of its relations is "studied at", which links it to two other entities, namely the chief of city A and the chief of city B. Because of the particular nature of the government field, the person holding a leadership post may change while the post itself always exists, so both the post and the person currently holding it are recorded. Each entity carries specific labels such as organization, geography, function and attribute. Proceeding in this way, a government affair knowledge graph with a large number of entities and relations can be constructed.
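The entity-and-relation structure in this example can be sketched with labeled entities and (head, relation, tail) triples. This is a minimal in-memory illustration; the function name, the label values and the relation names `held_by` and `studied_at` are hypothetical, and a real deployment would more likely use a graph database.

```python
from collections import defaultdict


def build_graph(entities, relations):
    """Build a labeled adjacency structure from entities and relation triples.

    entities:  iterable of (name, label) pairs
    relations: iterable of (head, relation, tail) triples
    """
    labels = dict(entities)
    graph = defaultdict(list)
    for head, relation, tail in relations:
        # Keep only relations whose endpoints are both known entities.
        if head in labels and tail in labels:
            graph[head].append((relation, tail))
    return labels, dict(graph)


# The post is stored separately from the person who currently holds it,
# so a change of office holder only replaces the "held_by" edge.
entities = [
    ("Chief of City A", "post"),
    ("Person P", "person"),
    ("Peking University", "organization"),
]
relations = [
    ("Chief of City A", "held_by", "Person P"),
    ("Person P", "studied_at", "Peking University"),
]
labels, graph = build_graph(entities, relations)
```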
Optionally, if the first answer is valid, the obtaining module 202 is further configured to obtain user information of a currently input user;
the judging module 206 is further configured to judge, according to the user information, whether the input user has the authority to expand the answer;
the determining module 204 is further configured to determine, if the input user has the right to expand an answer, a plurality of association dimensions of the first answer according to the answer type of the first answer, and expand the first answer through the government affair knowledge graph based on the plurality of association dimensions to obtain a second answer.
In this optional embodiment, the expansion authority of each user can be preset, so that some users obtain only a single answer when searching, while others obtain both the answer and related answers. Setting the expansion authority prevents unauthorized users from obtaining deeper information and thus avoids information leakage.
In the data search apparatus described in fig. 2, the government affair search information is preprocessed according to its information type, semantic recognition is performed through NLP to obtain the user's real intention, the answer the user wants is searched for based on the government affair knowledge graph, and an associated search is performed on that answer. More related information can therefore be provided to the user, which improves search efficiency and accuracy as well as user satisfaction and the overall user experience.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing a data search method based on natural language processing. The electronic device 3 comprises a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.
Those skilled in the art will appreciate that the schematic diagram shown in fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation on it. The electronic device 3 may include more or fewer components than those shown, may combine some components, or may use different components; for example, the electronic device 3 may further include an input/output device, a network access device, and the like.
The at least one Processor 32 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The processor 32 may be a microprocessor or the processor 32 may be any conventional processor or the like, and the processor 32 is a control center of the electronic device 3 and connects various parts of the whole electronic device 3 by various interfaces and lines.
The memory 31 may be used to store the computer program 33 and/or the modules/units, and the processor 32 implements the various functions of the electronic device 3 by running or executing the computer program and/or the modules/units stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, the application programs required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created through use of the electronic device 3 (such as audio data), and the like. Further, the memory 31 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid state storage device.
With reference to fig. 1, the memory 31 in the electronic device 3 stores a plurality of instructions to implement a data search method based on natural language processing, and the processor 32 can execute the plurality of instructions to implement:
receiving input government affair search information;
acquiring the information type of the government affair searching information;
preprocessing the government affair search information according to the information type to obtain preprocessed information;
identifying the preprocessed information through Natural Language Processing (NLP), and determining a question type corresponding to the preprocessed information;
based on the question type, searching the pre-processing information through a pre-established government affair knowledge map to obtain a first answer;
judging whether the first answer is valid or not according to the updating state of the government affair knowledge map;
if the first answer is valid, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the associated dimensions to obtain a second answer;
and outputting the first answer and the second answer.
For the specific way in which the processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
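The instruction sequence listed above can be sketched as a single function that chains the steps together. This is an illustrative outline under stated assumptions, not the implementation: each step is passed in as a hypothetical callable, and returning no answer when the knowledge graph is stale is an assumption, since the text does not specify what is output in that case.

```python
from typing import Any, Callable, Optional


def search(query: Any, info_type: str, *,
           preprocess: Callable[[Any, str], str],
           classify: Callable[[str], str],
           lookup: Callable[[str, str], Optional[Any]],
           graph_is_current: Callable[[], bool],
           expand: Callable[[Any], Any]) -> dict:
    """Run the search steps in order and return the first and second answers."""
    preprocessed = preprocess(query, info_type)    # preprocess by information type
    question_type = classify(preprocessed)         # NLP question type
    first = lookup(question_type, preprocessed)    # knowledge graph search
    # Assumption: an answer from a stale graph is treated as invalid and dropped.
    if first is None or not graph_is_current():
        return {"first": None, "second": None}
    second = expand(first)                         # expand along associated dimensions
    return {"first": first, "second": second}
```

Injecting the steps as callables keeps the outline independent of any particular NLP model or graph store.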
In the electronic device 3 described in fig. 3, the government affair search information is preprocessed according to its information type, semantic recognition is performed through NLP to obtain the user's real intention, the answer the user wants is searched for based on the government affair knowledge graph, and an associated search is performed on that answer. More related information can therefore be provided to the user, which improves search efficiency and accuracy as well as user satisfaction and the overall user experience.
The integrated modules/units of the electronic device 3 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only one logical division of functions, and other divisions are possible in an actual implementation.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. The units or means recited in the system claims may also be implemented by software or hardware.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (9)

1. A data search method based on natural language processing is characterized in that the data search method based on natural language processing comprises the following steps:
receiving input government affair search information;
acquiring the information type of the government affair searching information;
preprocessing the government affair search information according to the information type to obtain preprocessed information;
identifying the preprocessed information through Natural Language Processing (NLP), and determining a question type corresponding to the preprocessed information;
acquiring the confidentiality level of the question type and acquiring the user level of the current input user;
judging whether the user level is matched with the secret level;
if the user level is matched with the secret level, based on the question type, searching the pre-processing information through a pre-established government affair knowledge map to obtain a first answer;
judging whether the first answer is valid or not according to the updating state of the government affair knowledge map;
when the first answer is valid, acquiring user information of the input user, and judging whether the input user has the authority of expanding the answer or not according to the user information;
when the input user has the right of expanding answers, determining a plurality of associated dimensions of the first answer according to the answer type of the first answer, and expanding the first answer through the government affair knowledge graph based on the associated dimensions to obtain a second answer;
and outputting the first answer and the second answer.
2. The data search method based on natural language processing according to claim 1, wherein the preprocessing the government affairs search information according to the information type, and obtaining the preprocessed information comprises:
if the information type is a text type, judging whether error characters exist in the government affair searching information or not;
if the government affair search information contains wrong characters, searching a first character with high similarity to the wrong characters from a preset word bank according to an editing distance algorithm;
and replacing the error characters by the first characters, and determining the replaced government affair search information as the preprocessing information.
3. The data search method based on natural language processing according to claim 1, wherein the preprocessing the government affairs search information according to the information type, and obtaining the preprocessed information comprises:
if the information type is a picture type, performing picture processing on the government affair search information by adopting an image deblurring algorithm to obtain a first picture;
and if the first picture has the edge irrelevant information, deleting the edge irrelevant information, and determining the deleted first picture as the preprocessing information.
4. The data search method based on natural language processing according to claim 1, wherein the preprocessing the government affairs search information according to the information type, and obtaining the preprocessed information comprises:
if the information type is a voice type, identifying the regional accent type to which the government affair search information belongs;
and performing voice correction processing on the government affair search information based on the regional accent type, and determining the processed government affair search information as preprocessing information.
5. The natural language processing based data search method of claim 1, wherein the natural language processing based data search method further comprises:
acquiring government affair information from each government affair website through a web crawler technology;
determining a plurality of entities from the government affair information, and analyzing the incidence relation of the entities based on the labels of the entities;
and establishing a government affair knowledge map according to the entities and the incidence relation.
6. The data search method based on natural language processing according to claim 1, wherein the outputting the first answer and the second answer comprises:
acquiring an output mode matched with the information type;
if the output mode is a text mode, acquiring text attributes of the government affair search information, and outputting the first answer and the second answer by adopting the text attributes; or
if the output mode is a picture mode, acquiring a graphic template matched with the answer type, and performing visual display on the first answer and the second answer by using the graphic template; or
if the output mode is a voice mode, converting the first answer and the second answer into a second voice matched with the accent of the government affair search information, and outputting the second voice.
7. A data search apparatus, characterized in that the data search apparatus comprises:
the receiving module is used for receiving the input government affair searching information;
the acquisition module is used for acquiring the information type of the government affair search information;
the processing module is used for preprocessing the government affair searching information according to the information type to obtain preprocessed information;
the determining module is used for identifying the preprocessed information through Natural Language Processing (NLP), determining a problem type corresponding to the preprocessed information, acquiring the confidentiality level of the problem type and acquiring the user level of the current input user; judging whether the user level is matched with the secret level;
the searching module is used for searching the preprocessing information through a pre-established government affair knowledge map based on the question type to obtain a first answer if the user level is matched with the secret level;
the judging module is used for judging whether the first answer is valid according to the updating state of the government affair knowledge map, acquiring the user information of the input user when the first answer is valid, and judging whether the input user has the authority of expanding the answer according to the user information;
the determining module is further configured to determine multiple association dimensions of the first answer according to an answer type of the first answer when the input user has a right to expand the answer, and expand the first answer through the government affair knowledge graph based on the multiple association dimensions to obtain a second answer; and the output module is used for outputting the first answer and the second answer.
8. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the natural language processing based data search method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements a natural language processing-based data search method according to any one of claims 1 to 6.
CN202010727532.0A 2020-07-27 2020-07-27 Data searching method based on natural language processing and related equipment Active CN111737499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010727532.0A CN111737499B (en) 2020-07-27 2020-07-27 Data searching method based on natural language processing and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010727532.0A CN111737499B (en) 2020-07-27 2020-07-27 Data searching method based on natural language processing and related equipment

Publications (2)

Publication Number Publication Date
CN111737499A CN111737499A (en) 2020-10-02
CN111737499B true CN111737499B (en) 2020-11-27

Family

ID=72657781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010727532.0A Active CN111737499B (en) 2020-07-27 2020-07-27 Data searching method based on natural language processing and related equipment

Country Status (1)

Country Link
CN (1) CN111737499B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214613A (en) * 2020-10-15 2021-01-12 平安国际智慧城市科技股份有限公司 Artificial intelligence-based medication recommendation method and device, electronic equipment and medium
CN112417885A (en) * 2020-11-17 2021-02-26 平安科技(深圳)有限公司 Answer generation method and device based on artificial intelligence, computer equipment and medium
CN112507095A (en) * 2020-12-15 2021-03-16 平安国际智慧城市科技股份有限公司 Information identification method based on weak supervised learning and related equipment
CN113204644B (en) * 2021-01-07 2022-08-30 合肥工业大学 Government affair encyclopedia construction method based on knowledge graph
CN112860865A (en) * 2021-02-10 2021-05-28 达而观信息科技(上海)有限公司 Method, device, equipment and storage medium for realizing intelligent question answering
CN113239146B (en) * 2021-05-12 2023-07-28 平安科技(深圳)有限公司 Response analysis method, device, equipment and storage medium
CN113505262B (en) * 2021-08-17 2022-03-29 深圳华声医疗技术股份有限公司 Ultrasonic image searching method and device, ultrasonic equipment and storage medium
CN114881675A (en) * 2022-07-11 2022-08-09 广东电网有限责任公司 Intelligent customer service method and system based on power grid service
CN116863935B (en) * 2023-09-04 2023-11-24 深圳有咖互动科技有限公司 Speech recognition method, device, electronic equipment and computer readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102713865A (en) * 2009-10-20 2012-10-03 汤森路透环球资源公司 Entitled data cache management
CN107480551A (en) * 2017-07-06 2017-12-15 网易(杭州)网络有限公司 A kind of file management method and device
CN110008234A (en) * 2019-04-11 2019-07-12 北京百度网讯科技有限公司 A kind of business datum searching method, device and electronic equipment
CN110727930A (en) * 2019-10-12 2020-01-24 北京推想科技有限公司 Authority control method and device
CN111416789A (en) * 2019-01-04 2020-07-14 腾讯科技(深圳)有限公司 Method, apparatus and computer-readable storage medium for assigning usage rights to a user

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102083066B (en) * 2009-11-26 2014-04-09 中兴通讯股份有限公司 Unified safety authentication method and system
US9037529B2 (en) * 2011-06-15 2015-05-19 Ceresis, Llc Method for generating visual mapping of knowledge information from parsing of text inputs for subjects and predicates
CN203368574U (en) * 2013-07-31 2013-12-25 湖南大学 Mobile communication device with voice and text output selection function
CN105675008A (en) * 2016-01-08 2016-06-15 北京乐驾科技有限公司 Navigation display method and system
CN107992545A (en) * 2017-11-27 2018-05-04 珠海市魅族科技有限公司 A kind of searching method, device, terminal and readable storage medium storing program for executing
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map
CN109410664B (en) * 2018-12-12 2021-01-26 广东小天才科技有限公司 Pronunciation correction method and electronic equipment
CN111191105A (en) * 2019-10-31 2020-05-22 腾讯科技(深圳)有限公司 Method, device, system, equipment and storage medium for searching government affair information
CN111159230A (en) * 2019-11-29 2020-05-15 上海数据交易中心有限公司 Data resource map construction method and device, storage medium and terminal
CN111078897A (en) * 2019-12-26 2020-04-28 国衡智慧城市科技研究院(北京)有限公司 System for generating six-dimensional knowledge map

Also Published As

Publication number Publication date
CN111737499A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111737499B (en) Data searching method based on natural language processing and related equipment
WO2022105122A1 (en) Answer generation method and apparatus based on artificial intelligence, and computer device and medium
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN110765770A (en) Automatic contract generation method and device
CN108959559B (en) Question and answer pair generation method and device
CN111046221A (en) Song recommendation method and device, terminal equipment and storage medium
CN112287914B (en) PPT video segment extraction method, device, equipment and medium
CN112686036B (en) Risk text recognition method and device, computer equipment and storage medium
WO2021196825A1 (en) Abstract generation method and apparatus, and electronic device and medium
CN111538816B (en) Question-answering method, device, electronic equipment and medium based on AI identification
CN110209721A (en) Judgment document retrieval method, apparatus, server and storage medium
CN110909120A (en) Resume searching/delivering method, device and system and electronic equipment
CN111723870B (en) Artificial intelligence-based data set acquisition method, apparatus, device and medium
CN112651236A (en) Method and device for extracting text information, computer equipment and storage medium
CN112395391A (en) Concept graph construction method and device, computer equipment and storage medium
CN111552865A (en) User interest portrait method and related equipment
CN113626704A (en) Method, device and equipment for recommending information based on word2vec model
WO2021139242A1 (en) Presentation file generation method, apparatus, and device and storage medium
Liang et al. Detecting novel business blogs
CN116402166A (en) Training method and device of prediction model, electronic equipment and storage medium
CN113627186B (en) Entity relation detection method based on artificial intelligence and related equipment
CN112989820B (en) Legal document positioning method, device, equipment and storage medium
CN110909538B (en) Question and answer content identification method and device, terminal equipment and medium
CN113434631A (en) Emotion analysis method and device based on event, computer equipment and storage medium
CN113887191A (en) Method and device for detecting similarity of articles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant