CN115169336A

CN115169336A - Knowledge retrieval method, device and storage medium based on artificial intelligence

Info

Publication number: CN115169336A
Application number: CN202210786107.8A
Authority: CN
Inventors: 李田
Original assignee: Second Research Institute Of Casic
Current assignee: Second Research Institute Of Casic
Priority date: 2022-07-04
Filing date: 2022-07-04
Publication date: 2022-10-11

Abstract

The invention discloses a knowledge retrieval method, a knowledge retrieval device and a storage medium based on artificial intelligence, which are used for improving precision ratio and recall ratio of knowledge retrieval and meeting information retrieval requirements of users. The method comprises the following steps: receiving a retrieval request submitted by a user, wherein the retrieval request carries a retrieval statement; performing knowledge extraction based on the retrieval statement to obtain entity information in the retrieval statement, wherein the entity information comprises at least one of the following items: entities, entity relationships, and entity attributes; carrying out concept establishment according to the obtained entity information to obtain a query vocabulary; searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource expression table, wherein the literature resource expression table comprises data records of mapping address information of the key vocabulary and original literature resources to which the key vocabulary belongs; and extracting and presenting the data record containing the target key vocabulary from the literature resource representation table.

Description

Knowledge retrieval method, device and storage medium based on artificial intelligence

Technical Field

The invention relates to the technical field of artificial intelligence technology application, in particular to a knowledge retrieval method, a knowledge retrieval device and a storage medium based on artificial intelligence.

Background

Currently, people mainly retrieve information through search engines using keyword-based retrieval. Although this approach can return a large amount of information to the user in a short time, the returned results are noisy and contain much useless information, so the user must spend effort analyzing, identifying and selecting from a large amount of feedback to obtain the information actually needed. In addition, a user often does not know which keyword to input and only knows what difficulty has been encountered, so the input keyword is inaccurate; in this case, the results returned by keyword search cannot meet the user's real needs. Moreover, searching only on the keywords submitted by the user is inherently limited, which results in incomplete coverage of the returned search results.

Therefore, how to improve the precision ratio and the recall ratio of search results of a search engine and meet the requirement of user information retrieval becomes one of the technical problems to be solved urgently in the prior art.

Disclosure of Invention

The invention provides a knowledge retrieval method, a knowledge retrieval device and a storage medium based on artificial intelligence, which are used for improving the precision ratio and the recall ratio of knowledge retrieval and meeting the information retrieval requirements of users.

In a first aspect, a knowledge retrieval method based on artificial intelligence is provided, which includes:

receiving a retrieval request submitted by a user, wherein the retrieval request carries a retrieval statement;

performing knowledge extraction based on the retrieval statement to obtain entity information in the retrieval statement, wherein the entity information comprises at least one of the following items: entities, entity relationships, and entity attributes;

carrying out concept establishment according to the obtained entity information to obtain a query vocabulary;

searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource expression table, wherein the literature resource expression table comprises data records of mapping address information of the key vocabulary and original literature resources to which the key vocabulary belongs;

and extracting and presenting the data record containing the target key vocabulary from the literature resource representation table.

In one embodiment, the literature resource representation table further includes membership degrees corresponding to the key vocabularies;

the literature resource expression table is established according to the following procedures:

extracting knowledge of each original document resource to obtain a document vocabulary;

determining the membership degree of each document vocabulary according to a first frequency of the document vocabulary appearing in the document resource and a second frequency of the document vocabulary modified or connected by other document vocabularies in the document resource;

aiming at each literature resource, determining a key vocabulary of the literature resource according to the membership degree of a document vocabulary contained in the literature resource;

aiming at each key vocabulary, establishing the corresponding relation between the key vocabulary and the corresponding membership degree and the mapping address of the original document resource to which the key vocabulary belongs to obtain the data record corresponding to the key vocabulary;

and determining data records corresponding to the key vocabulary of each original document resource to form the document resource representation table.

In one embodiment, for each document vocabulary, determining the degree of membership of the document vocabulary according to a first frequency of occurrence of the document vocabulary in the document resource and a second frequency of modification or concatenation of the document vocabulary by other document vocabularies in the document resource specifically comprises:

respectively counting the times of appearance of the document vocabulary in the document to which the document vocabulary belongs and the times of modification or connection of the document vocabulary in the document resources to which the document vocabulary belongs by other vocabularies aiming at each document vocabulary;

determining a first frequency of the document vocabulary appearing in the document resource according to the frequency of the document vocabulary appearing in the document and the total number of the document vocabulary contained in the document resource;

determining a second frequency of modification or connection of the document vocabulary by other document vocabularies in the document resources according to the times of modification or connection of the document vocabulary by other vocabularies in the document resources and the total number of the document vocabularies contained in the document resources;

and determining the membership degree of the document vocabulary according to the weighting result of the first frequency and the second frequency based on the weight values corresponding to the first frequency and the second frequency.

In one embodiment, before presenting the data record containing the target keyword, the method further comprises:

for each query vocabulary, determining membership degrees corresponding to the query vocabularies based on the retrieval sentences to form query vocabulary vectors; and

respectively extracting the membership degree of each target key vocabulary matched with each query vocabulary from the literature resource expression table to form a key vocabulary vector, and forming a key vocabulary matrix by the key vocabulary vector corresponding to each query vocabulary;

respectively determining the distance between the query vocabulary vector and each target key vocabulary vector;

sequencing each target key vocabulary vector contained in the key vocabulary matrix according to the determined distance; and

extracting and presenting data records containing the target key words, wherein the data records specifically comprise:

and according to the sequencing result of each target key word vector, extracting and presenting data records containing each target key word corresponding to each target key word vector.

In one embodiment, for each target key word corresponding to the target key word vector, extracting and presenting the data record of each target key word according to the following method:

extracting a data record containing each target key vocabulary corresponding to each target key vocabulary vector aiming at each target key vocabulary vector;

and sequencing and presenting the extracted data records according to the membership degree of each target keyword.
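The distance-based ordering described above can be illustrated with a minimal Python sketch. The text does not name a distance measure or a sort direction, so Euclidean distance and closest-first ordering are assumptions here, as are the function name `rank_by_distance`, the row labels, and the sample membership values. Each row follows the convention that a component is the membership degree of a matched key vocabulary and 0 where nothing matched.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_distance(query_vector: Sequence[float],
                     key_matrix: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Order the rows of a key vocabulary matrix by distance to the query vector.

    ASSUMPTION: Euclidean distance, smallest distance first.
    """
    scored = [(label, math.dist(query_vector, row)) for label, row in key_matrix.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical membership degrees of three query vocabularies.
query_vector = [0.5, 0.3, 0.2]

# Hypothetical key vocabulary vectors; 0.0 marks "no matching key vocabulary".
key_matrix = {
    "row_a": [0.4, 0.0, 0.2],
    "row_b": [0.0, 0.3, 0.0],
    "row_c": [0.5, 0.3, 0.1],
}

ranking = rank_by_distance(query_vector, key_matrix)
for label, distance in ranking:
    print(label, round(distance, 3))
```

Data records belonging to the same vector could then be ordered by their own membership degrees, as the last embodiment above describes.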

In one embodiment, the original document resource comprises a video resource;

before knowledge extraction is carried out on each original document resource to obtain a document vocabulary, the method further comprises the following steps:

extracting the characteristics of the video resources, and performing semantic description according to the extracted characteristics to obtain video resource description information; and

the method for extracting knowledge of each original document resource to obtain a document vocabulary specifically comprises the following steps:

and aiming at video resources, carrying out knowledge extraction on video resource description information corresponding to the video resources to obtain document vocabularies.

In one embodiment, the original document resource comprises a speech resource;

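before knowledge extraction is carried out on each original document resource to obtain a document vocabulary, the method further comprises: carrying out voice recognition on the voice resource to obtain text data; and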

knowledge extraction is carried out on each original document resource to obtain a document vocabulary, and the method specifically comprises the following steps:

and aiming at the voice resource, carrying out knowledge extraction on text data corresponding to the voice resource to obtain a document vocabulary.

In one embodiment, the original document resources include original document resources of different languages; and

after the concept establishment is carried out according to the obtained entity information to obtain the query vocabulary, the method further comprises the following steps:

converting the query vocabulary into query vocabularies of different languages by adopting machine translation; and

searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource representation table, which specifically comprises the following steps:

and searching target keyword vocabularies of corresponding languages matched with the query vocabularies of different languages from a pre-established literature resource representation table.

In a second aspect, an artificial intelligence based knowledge retrieval apparatus is provided, comprising:

a receiving unit, configured to receive a retrieval request submitted by a user, wherein the retrieval request carries a retrieval statement;

an obtaining unit, configured to perform knowledge extraction based on the search statement, and obtain entity information in the search statement, where the entity information includes at least one of the following: entities, entity relationships, and entity attributes;

the concept establishing unit is used for establishing concepts according to the obtained entity information to obtain query words;

the matching unit is used for searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource representation table, and the literature resource representation table contains data records of mapping address information of the key vocabulary and original literature resources to which the key vocabulary belongs;

and the presentation unit is used for extracting the data record containing the target key vocabulary from the literature resource representation table and presenting the data record.

In one embodiment, the apparatus further comprises:

the mapping unit is used for carrying out knowledge extraction on each original document resource to obtain a document vocabulary; determining the membership degree of each document vocabulary according to a first frequency of the document vocabulary appearing in the document resource and a second frequency of the document vocabulary modified or connected by other document vocabularies in the document resource; aiming at each literature resource, determining a key vocabulary of the literature resource according to the membership degree of a document vocabulary contained in the literature resource; aiming at each key vocabulary, establishing the corresponding relation between the key vocabulary and the corresponding membership degree and the mapping address of the original document resource to which the key vocabulary belongs to obtain the data record corresponding to the key vocabulary; and determining data records corresponding to the key vocabulary of each original document resource to form the document resource representation table.

In one embodiment, the mapping unit is specifically configured to count, for each document vocabulary, the number of times that the document vocabulary appears in the document to which the document vocabulary belongs and the number of times that the document vocabulary is modified or connected by other vocabularies in the document resource to which the document vocabulary belongs, respectively; determining a first frequency of the document vocabulary appearing in the document resource according to the frequency of the document vocabulary appearing in the document and the total number of the document vocabulary contained in the document resource; determining a second frequency of modification or connection of the document vocabulary by other document vocabularies in the document resources according to the times of modification or connection of the document vocabulary by other vocabularies in the document resources and the total number of the document vocabularies contained in the document resources; and determining the membership degree of the document vocabulary according to the weighting result of the first frequency and the second frequency based on the weight values corresponding to the first frequency and the second frequency.

In one embodiment, the apparatus further comprises a sorting unit, wherein:

the sorting unit is configured to, before the presentation unit presents the data records containing the target key vocabulary, establish, for each query vocabulary, a target key vocabulary vector corresponding to the query vocabulary according to the membership degree of the query vocabulary and of each target key vocabulary matched in the document resource representation table, the target key vocabulary vectors corresponding to the query vocabularies forming a key vocabulary matrix, wherein, in a target key vocabulary vector, the component corresponding to a target key vocabulary matched with the query vocabulary is the membership degree of that key vocabulary, and the component corresponding to a key vocabulary not matched with the query vocabulary is 0;

respectively determining the distance between the query vocabulary vector and the corresponding target key vocabulary vector;

sequencing each target key vocabulary vector contained in the key vocabulary matrix according to the determined distance;

and the presentation unit is specifically used for extracting and presenting data records containing all target key words corresponding to all target key word vectors according to the sequencing result of all target key word vectors.

In an embodiment, the presenting unit is further configured to, for each target key vocabulary vector, extract a data record including each target key vocabulary corresponding to the target key vocabulary vector; and sequencing and presenting the extracted data records according to the membership degree of each target keyword.

In one embodiment, the original document resource comprises a video resource;

the apparatus further comprises a feature extraction unit, wherein:

the feature extraction unit is used for extracting features of the video resources and performing semantic description according to the extracted features to obtain video resource description information;

the mapping unit is specifically configured to perform knowledge extraction on video resource description information corresponding to video resources to obtain a document vocabulary.

In one embodiment, the original document resource comprises a voice resource;

the apparatus further comprises a speech recognition unit, wherein:

the voice recognition unit is used for performing voice recognition on voice resources to obtain text data;

the mapping unit is specifically configured to, for a voice resource, perform knowledge extraction on text data corresponding to the voice resource to obtain a document vocabulary.

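In one embodiment, the original document resources include original document resources of different languages;

the apparatus further comprises a translation unit, wherein: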

the translation unit is used for converting the query vocabulary into query vocabularies of different languages by adopting machine translation;

the matching unit is specifically configured to search target keyword vocabularies of corresponding languages, which are matched with query vocabularies of different languages, from a pre-established literature resource expression table.

In a third aspect, a computing device is provided, which includes at least one processor and at least one memory, wherein the memory stores a computer program, and the processor is configured to read the computer program in the memory and execute any one of the steps of the artificial intelligence based knowledge retrieval method provided in the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, which stores computer-executable instructions for causing a computer to perform any of the steps of the artificial intelligence based knowledge retrieval method provided in the first aspect.

According to the artificial intelligence based knowledge retrieval method, device and storage medium provided by the invention, key vocabularies capable of representing the meaning of each original document resource are first extracted from the original document resources and used to express them, yielding a document resource representation table. On this basis, knowledge extraction is performed on the retrieval statement submitted by the user to obtain entity information, which comprises entities, entity relationships and entity attributes. Concept establishment is then performed on the entity information to obtain the corresponding query vocabularies. The query vocabularies obtained in this way include not only the vocabularies contained in the retrieval statement but also extended vocabularies obtained by semantically understanding those vocabularies. As a result, the target key vocabularies matched by the query vocabularies in the document resource representation table can accurately express the meaning of the retrieval statement, and the coverage of the key vocabularies is broadened, which improves the precision and recall of knowledge retrieval and meets the user's information retrieval requirements.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow diagram illustrating the process of creating a document resource representation according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a process for determining membership of the vocabulary of documents according to an embodiment of the present invention;

FIG. 3 is a flow chart of a knowledge retrieval method based on artificial intelligence according to an embodiment of the invention;

FIG. 4 is a schematic diagram illustrating a process of sorting and displaying search results according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a knowledge retrieval device based on artificial intelligence according to an embodiment of the present invention.

Detailed Description

In order to improve recall ratio and precision ratio of knowledge retrieval and meet information retrieval requirements of users, the embodiment of the invention provides a knowledge retrieval method, a knowledge retrieval device and a storage medium based on artificial intelligence.

It should be noted that the terms "first," "second," and the like in the description and claims of the embodiments of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, it being understood that the preferred embodiments described herein are for purposes of illustration and explanation only and are not intended to be limiting of the present invention, and that the embodiments and features of the embodiments may be combined with each other without conflict.

Example one

The knowledge retrieval method based on artificial intelligence provided by the embodiment of the invention is not only suitable for retrieving document resources, but also suitable for retrieving multimedia resources, such as video resources and voice resources.

In specific implementation, for document resources, in order to improve precision and recall of knowledge retrieval, in an embodiment of the present invention, a document resource representation table is established for the document resources according to a flow shown in fig. 1, including the following steps:

S11, performing knowledge extraction on the original document resources to obtain document vocabularies.

In order to reduce redundancy in the document resource library and save storage space, in specific implementation the document resources are first cleaned to handle incomplete, erroneous and duplicate data, and are then stored in a native document resource library.

For each original document resource contained in the native document resource library, the document vocabulary contained in the document resource is obtained through knowledge extraction.

S12, for each document vocabulary, determining the membership degree of the document vocabulary according to a first frequency with which the document vocabulary appears in the document resource and a second frequency with which the document vocabulary is modified or connected by other document vocabularies in the document resource.

The higher the membership degree is, the more the document vocabulary can represent the document resources.

Based on each document vocabulary obtained in step S11, the degree of membership of the document vocabulary may be determined in step S12 according to the flow shown in fig. 2:

S121, for each document vocabulary, separately counting the number of times the document vocabulary appears in the document to which it belongs and the number of times the document vocabulary is modified or connected by other vocabularies in the document resource to which it belongs.

For convenience of description, for each document vocabulary, the number of times the document vocabulary appears in the document to which it belongs is denoted T_o, and the number of times the document vocabulary is modified or connected by other vocabularies in that document resource is denoted T_c.

S122, determining a first frequency with which the document vocabulary appears in the document resource according to the number of times the document vocabulary appears in the document and the total number of document vocabularies contained in the document resource.

Denote the first frequency with which the document vocabulary appears in the document resource to which it belongs as F_o. In specific implementation, the first frequency F_o may be determined according to the following equation:

F_o = T_o / T

where T is the total number of document vocabularies contained in the document to which the document vocabulary belongs.

S123, determining a second frequency of the document vocabulary modified or connected by other document vocabularies in the document resources according to the times of the document vocabulary modified or connected by other vocabularies in the document resources and the total number of the document vocabularies contained in the document resources.

Denote the second frequency with which the document vocabulary is modified or connected by other document vocabularies in the document resource to which it belongs as F_c. In specific implementation, the second frequency F_c may be determined according to the following equation:

F_c = T_c / T

S124, determining the membership degree of the document vocabulary according to the weighting result of the first frequency and the second frequency based on the weight values corresponding to the first frequency and the second frequency.

In the embodiment of the invention, in order to improve the accuracy of the membership calculation, a manual correction value, denoted V ∈ [0, 1], is introduced into the membership calculation; its specific value can be set according to manual experience.

In this step, for any document vocabulary, the membership degree L of the document vocabulary may be determined according to the following formula:

where ω_1 is the weight of the first frequency with which the document vocabulary appears in the document resource, and ω_2 is the weight of the second frequency with which the document vocabulary is modified or connected by other vocabularies in the document resource.
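Steps S121 to S124 can be sketched in Python as follows. This is only an illustration under stated assumptions: the two frequencies are taken as simple ratios of the counts to the total number of document vocabularies, and the manual correction value V is applied as a multiplicative factor on the weighted sum, which is one possible reading and not confirmed by the text. The names `TermCounts` and `membership`, the default weights, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermCounts:
    occurrences: int   # T_o: times the vocabulary appears in the document
    connections: int   # T_c: times it is modified or connected by other vocabularies


def membership(counts: TermCounts, total_terms: int,
               w1: float = 0.5, w2: float = 0.5, v: float = 1.0) -> float:
    """Weighted membership degree of one document vocabulary.

    ASSUMPTIONS: F_o = T_o / T, F_c = T_c / T, and the manual correction
    value V scales the weighted sum of the two frequencies.
    """
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must lie in [0, 1]")
    f_o = counts.occurrences / total_terms   # first frequency
    f_c = counts.connections / total_terms   # second frequency
    return v * (w1 * f_o + w2 * f_c)


# Hypothetical vocabulary: appears 6 times and is connected 3 times
# in a document containing 60 document vocabularies.
example = membership(TermCounts(occurrences=6, connections=3), total_terms=60,
                     w1=0.6, w2=0.4, v=1.0)
print(round(example, 4))
```

Under these assumptions, a vocabulary that appears often and is frequently connected to other vocabularies receives a higher membership degree, matching the statement that a higher membership degree means the vocabulary better represents the document resource.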

S13, for each document resource, determining the key vocabularies of the document resource according to the membership degrees of the document vocabularies contained in the document resource.

In this step, for each document resource, a plurality of document vocabularies capable of representing the document resource may be selected as key vocabularies in the order from high membership to low membership of the document vocabularies included in the document resource.

S14, for each key vocabulary, establishing a correspondence among the key vocabulary, its membership degree, and the mapping address of the original document resource to which the key vocabulary belongs, to obtain the data record corresponding to the key vocabulary.

In specific implementation, the data record corresponding to each key word may be expressed as:

A = {e_i l_i L(e_i), e_i ∈ U, i = 1, 2, ...}, where e_i l_i L(e_i) represents a composite element of the document resource representation table; e_i is a key vocabulary, extracted from the document vocabulary domain U, that can represent the meaning of the whole document resource; l_i is the mapping address of the key vocabulary e_i in the native document resource library; and L(e_i) is the membership degree of the key vocabulary e_i.

S15, collecting the data records corresponding to the key vocabularies of each original document resource to form the document resource representation table.

In specific implementation, the document resource representation table may further include fields such as description information of document resources. For document-type document resources, the descriptive information may be a document title.

In one embodiment, the document resource representation table structure is shown in Table 1:

TABLE 1

Key vocabulary	Mapping address of the source document resource	Document resource description information	Membership degree
Key1	URL1	D1	L1
Key2	URL1	D1	L2
Key3	URL2	D2	L3
……	……	……	……
Keyn	URLn	Dn	Ln
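A minimal Python sketch of how records shaped like the rows of Table 1 might be assembled from per-document membership degrees (steps S13 to S15). The `Record` class, the `build_records` helper, the `top_k` cut-off, and the sample vocabularies and scores are illustrative assumptions; the text only says that several vocabularies are selected in descending order of membership.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    key: str            # key vocabulary e_i
    address: str        # mapping address l_i of the original resource
    description: str    # description information, e.g. the document title
    membership: float   # membership degree L(e_i)


def build_records(address: str, description: str,
                  memberships: Dict[str, float], top_k: int = 2) -> List[Record]:
    """Keep the top_k vocabularies by membership as key vocabularies of one resource."""
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    return [Record(term, address, description, score) for term, score in ranked[:top_k]]


# Hypothetical resources and membership degrees.
table: List[Record] = []
table += build_records("URL1", "D1",
                       {"microwave oven": 0.30, "turntable": 0.22, "the": 0.01})
table += build_records("URL2", "D2", {"rotate": 0.27, "kitchen": 0.05}, top_k=1)

for rec in table:
    print(rec.key, rec.address, rec.description, rec.membership)
```

As in Table 1, several records may share one mapping address when a resource contributes more than one key vocabulary.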

For video resources, in the embodiment of the invention, features of the video resource are first extracted, and semantic description is performed according to the extracted features to obtain video resource description information. It should be noted that, in order to reduce redundancy in the video resource description information and reduce storage overhead, in specific implementation feature extraction may be performed on the video resource shot by shot, and the semantic description is generated from the extracted features, so that the resulting video resource description information represents the specific content of the video resource. After the video resource description information is obtained, a document resource representation table corresponding to the video resource is established based on it through the process shown in fig. 2; for the specific implementation, refer to fig. 2, which is not repeated here.

For picture resources, a picture resource representation table can be established by the same method as that for video resources, that is, feature extraction is performed on the picture resources first, semantic description is performed according to the extracted features to obtain picture resource description information, and then a document resource representation table corresponding to the picture resources is established based on the picture resource description information through the process shown in fig. 2.

For voice resources, voice recognition is first performed on the voice resource to obtain text data, and a document resource representation table corresponding to the voice resource is then established from that text data through the process shown in fig. 2; for the specific implementation, refer to fig. 2, which is not repeated here.

In some embodiments, the document resource representation table may also contain fields for resource types to satisfy the user's requirements for different types of resource retrieval.

Through the above process, a document resource representation table covering document resources, video resources and voice resources can be established. To improve the response speed of retrieval, in the embodiment of the invention the document resource representation table may further be classified by resource type, so that retrieval for a retrieval statement submitted by the user is performed within the corresponding document resource representation classification table, which narrows the retrieval range and speeds up the response.
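One way to picture the classification by resource type is to partition the records once and search only the relevant partition. The sketch below is an assumption-laden illustration: the `TypedRecord` fields, the type labels "document", "video" and "voice", and the sample rows are hypothetical and not taken from the text.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TypedRecord(NamedTuple):
    key: str
    address: str
    resource_type: str   # hypothetical labels: "document", "video", "voice"
    membership: float


def classify(records: List[TypedRecord]) -> Dict[str, List[TypedRecord]]:
    """Split the representation table into one sub-table per resource type."""
    by_type: Dict[str, List[TypedRecord]] = defaultdict(list)
    for rec in records:
        by_type[rec.resource_type].append(rec)
    return dict(by_type)


records = [
    TypedRecord("rotate", "URL1", "document", 0.30),
    TypedRecord("rotate", "URL2", "video", 0.25),
    TypedRecord("microwave oven", "URL3", "voice", 0.40),
]
classified = classify(records)

# Searching only the "video" sub-table narrows the retrieval range.
video_hits = [r for r in classified.get("video", []) if r.key == "rotate"]
print(video_hits)
```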

Based on the above-mentioned established literature resource representation table, the embodiment of the present invention provides a knowledge retrieval method based on artificial intelligence, as shown in fig. 3, including the following steps:

S31, receiving a retrieval request submitted by a user.

The retrieval request carries a retrieval statement. When the user has a retrieval need, the user can submit a retrieval request according to the problem to be solved. For ease of understanding, the following description takes as an example a user entering the retrieval statement "why the microwave oven does not need to be rotated" in the search engine.

S32, performing knowledge extraction based on the retrieval statement to obtain entity information in the retrieval statement.

Wherein the entity information comprises at least one of: entities, entity relationships, and entity attributes.

For the retrieval statement submitted by the user in step S31, the entity information extracted by the knowledge extraction method is: "microwave", "no need" and "rotate". Through this step, the retrieval statement is decomposed into a plurality of vocabularies.

S33, carrying out concept establishment according to the obtained entity information to obtain a query vocabulary.

In this step, concept establishment is performed on each vocabulary obtained in step S32 to obtain its near-synonyms or synonyms; for example, "rotate" may be expanded to "turn", "flip", "twist", etc. The original vocabularies obtained in step S32 and the expanded vocabularies obtained in step S33 are together used as the query vocabulary. Continuing with the above example, the query vocabulary may be expressed as "microwave", "no need", "rotate", "turn", "flip" and "twist".

S34, searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource representation table.

In specific implementation, this step uses the query vocabulary obtained in step S33 to search Table 1, or the document resource representation classification table, for the target key vocabulary matched with the query vocabulary. In this example, the target key vocabularies matched by the query vocabulary in Table 1 are taken to be "Key1", "Key3", …….

S35, extracting and presenting the data record containing the target key vocabulary from the document resource representation table.

In this step, the data records containing "Key1", "Key3", …… are extracted from Table 1 and presented to the user, as shown in Table 2:

TABLE 2

Key vocabulary	Mapping address of the document resource to which it belongs	Document resource description information	Degree of membership
Key1	URL1	D1	L1
Key3	URL2	D2	L3
……	……	……	……
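Steps S33 to S35 can be sketched as follows. This is a minimal illustration under stated assumptions: the synonym dictionary `SYNONYMS` stands in for whatever concept-establishment technique is actually used, the table contents and membership values are invented for the example, and the entity information of step S32 is supplied by hand rather than extracted.

```python
# Hypothetical synonym dictionary standing in for concept establishment (S33).
SYNONYMS = {"rotate": ["turn", "flip", "twist"]}

# Hypothetical representation table:
# key vocabulary -> list of (url, description, membership).
TABLE = {
    "microwave": [("URL1", "D1", 0.7)],
    "turn": [("URL2", "D2", 0.4)],
    "oven": [("URL3", "D3", 0.9)],
}


def build_query_vocabulary(entities):
    """S33: original vocabularies plus their expansions, in first-seen order."""
    query = []
    for word in entities:
        for candidate in [word, *SYNONYMS.get(word, [])]:
            if candidate not in query:
                query.append(candidate)
    return query


def retrieve(entities):
    """S34 and S35: match the query vocabulary and collect the data records."""
    records = []
    for word in build_query_vocabulary(entities):
        for url, description, membership in TABLE.get(word, []):
            records.append((word, url, description, membership))
    return records
```

With the entity information of the running example, the record keyed by "turn" is returned even though "turn" does not occur in the retrieval statement, which is the effect the concept expansion is meant to achieve.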

The first embodiment provides a concept-based abstract retrieval method. Unlike the existing keyword retrieval method based on literal meaning, it can retrieve document data that solves the user's problem even when that data does not contain the keywords in the retrieval statement. In addition, whereas the existing retrieval method can only retrieve structured and semi-structured text data and cannot retrieve unstructured multimedia data such as pictures, videos and sounds, the retrieval method provided by the embodiment of the invention can also provide multimedia document resources, so that the precision ratio and recall ratio of knowledge retrieval are improved.

Example two

In specific implementation, when the retrieval results obtained in the first embodiment are presented, results closer to the user's retrieval intention should be displayed first, so as to reduce the time the user spends looking for document resources that can solve the problem. To this end, on the basis of the retrieval results obtained in the first embodiment, the second embodiment sorts and displays the retrieved document resources according to the distance between each document resource and the user's retrieval intention. In the embodiment of the present invention, the user's retrieval intention may be represented by the membership degrees of the query vocabulary obtained in step S33, and the main content of each document resource may be represented by the membership degrees of its key vocabulary.

The membership degree of the key vocabulary of a document resource was introduced in step S124. The membership degree of a query vocabulary is calculated on the same principle: as shown in fig. 4, for each query vocabulary, the membership degree can be calculated according to the following formula:

$$L = \omega_3 F_a + \omega_4 F_b$$

wherein $F_a$ represents a third frequency with which the query vocabulary occurs in the retrieval statement, and $F_b$ represents a fourth frequency with which the query vocabulary is modified or connected by other vocabularies in the retrieval statement. $\omega_3$ and $\omega_4$ respectively represent the weights of $F_a$ and $F_b$, which in implementation may be set according to actual needs or according to empirical values. It should be noted that, in the embodiment of the present invention, the query vocabulary includes the expanded vocabularies of the original vocabularies extracted from the retrieval statement. Therefore, when the third frequency is calculated, the number of occurrences of a query vocabulary may be counted as the number of occurrences of the original vocabulary extracted from the retrieval statement together with its expanded vocabularies; similarly, when the fourth frequency is calculated, the number of times the query vocabulary is modified or connected by other vocabularies may be counted as the number of times the original vocabulary and its expanded vocabularies contained in the retrieval statement are modified or connected.
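The weighted sum above can be sketched as follows. The normalisation of both counts by the total number of vocabularies in the retrieval statement, the default weights of 0.5, and the function name `query_membership` are assumptions made for this example; the specification only states that the weights are set according to need or experience.

```python
def query_membership(occurrences, modifications, total_words, w3=0.5, w4=0.5):
    """Membership L = w3 * F_a + w4 * F_b for one query vocabulary.

    occurrences:   times the original vocabulary or its expansions occur
    modifications: times they are modified or connected by other vocabularies
    total_words:   number of vocabularies in the retrieval statement (assumed
                   normaliser, by analogy with the document-side frequencies)
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    f_a = occurrences / total_words    # third frequency
    f_b = modifications / total_words  # fourth frequency
    return w3 * f_a + w4 * f_b
```

For example, a vocabulary occurring twice and modified once in a ten-vocabulary statement has $F_a = 0.2$ and $F_b = 0.1$, giving a membership of about 0.15 with equal weights.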

By the above method, the membership degrees of the key vocabulary and of the query vocabulary can be obtained. Based on these membership degrees, in the embodiment of the invention the retrieval results can be sorted and displayed according to the flow shown in fig. 4:

S41, for each query vocabulary, determining the membership degree corresponding to the query vocabulary based on the retrieval statement, the membership degrees forming a query vocabulary vector.

In particular, the query vocabulary vector may be represented as $q = (L(q_1), L(q_2), L(q_3), \ldots, L(q_n))^T$, where $q_i$, $i = 1, 2, \ldots, n$, is a query vocabulary and $L(q_i)$ is the membership degree of $q_i$.

S42, for each query vocabulary, establishing a target key vocabulary vector corresponding to the query vocabulary according to the membership degrees of the target key vocabularies matched with the query vocabulary in the literature resource representation table, the target key vocabulary vectors corresponding to the query vocabularies forming a key vocabulary matrix.

In the target key vocabulary vector, the vector component value corresponding to the target key vocabulary matched with the query vocabulary is the membership degree of the key vocabulary, and the vector component value corresponding to the key vocabulary not matched with the query vocabulary is 0.

For example, for query vocabulary $q_1$, the corresponding target key vocabulary vector may be represented as $[L(r_{11}), 0, L(r_{13}), 0, 0, 0, \ldots, L(r_{1m})]$, where m represents the number of key vocabularies contained in the document resource representation table. In specific implementation, the query vocabulary $q_i$ is matched against the key vocabularies in the document resource representation table: where query vocabulary $q_i$ matches key vocabulary $r_m$, the corresponding component of the key vocabulary vector is the membership degree of the matched key vocabulary; where they do not match, the corresponding component is 0. Accordingly, the key vocabulary matrix may be represented as a matrix R whose entries are $L(r_{ij})$,

where $r_{ij}$ is the j-th key vocabulary corresponding to query vocabulary $q_i$, $L(r_{ij})$ is the membership degree of key vocabulary $r_{ij}$, and n is the number of query vocabularies. One column in R is the key vocabulary vector formed by the membership degrees of the key vocabularies matched with a given query vocabulary.
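The zero-filled target key vocabulary vectors of step S42 can be sketched as follows. The table layout (one `(key_vocabulary, membership)` pair per record), exact string equality as the matching rule, and the function names are assumptions made for this example. The matrix is stored here as one list per query vocabulary; whether those lists are read as rows or columns of R is a matter of convention.

```python
def target_vector(query_word, table):
    """One component per table record: its membership if it matches, else 0.0."""
    return [
        membership if key == query_word else 0.0
        for key, membership in table
    ]


def key_vocabulary_matrix(query_vocabulary, table):
    """Stack the target key vocabulary vectors, one per query vocabulary."""
    return [target_vector(word, table) for word in query_vocabulary]
```

For a table `[("turn", 0.4), ("flip", 0.5), ("turn", 0.7)]`, the vector for "turn" is `[0.4, 0.0, 0.7]`, mirroring the pattern of non-zero and zero components in the example above.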

In one embodiment, $r_{ij}$ may be selected according to the following principle:

$$R = \{\, r_{ij} \mid L(q_i) \le L(r_{ij}),\ r_{ij} \in L \text{ and } r_{ij} = q_i \,\},\quad q_i \in q$$

where $L(q_i)$ is the membership degree of the query vocabulary, $L(r_{ij})$ is the membership degree of the key vocabulary, L is the document resource representation table, and the key vocabulary matrix R is the query matching result.
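The selection principle can be read as a filter: a key vocabulary is kept only if it equals a query vocabulary and its membership degree is at least that of the query vocabulary. A minimal sketch of that reading, with hypothetical dictionary inputs and a hypothetical function name `select_matches`:

```python
def select_matches(query_memberships, table_memberships):
    """Keep table memberships L(r) with r == q and L(q) <= L(r).

    query_memberships: {query vocabulary: L(q)}
    table_memberships: {key vocabulary: [L(r) for each record with that key]}
    """
    selected = {}
    for word, l_q in query_memberships.items():
        kept = [l_r for l_r in table_memberships.get(word, []) if l_q <= l_r]
        if kept:
            selected[word] = kept
    return selected
```

Under this reading, records in which the vocabulary is less central to the document than it is to the retrieval statement are filtered out before ranking.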

For example, the key vocabulary vector may be represented as:

S43, respectively determining the distance between the query vocabulary vector and each target key vocabulary vector.

In one embodiment, the distance between the query vocabulary vector and the target key vocabulary vector may be characterized by a cosine distance. Let $r_j$, $j = 1, 2, \ldots, n$, denote the j-th row of R, that is, the key vocabulary vector formed by the membership degrees of the key vocabularies matched with the j-th query vocabulary. In the embodiment of the invention, the cosine distance between the query vocabulary vector and the target key vocabulary vector can be calculated according to the following formula:

wherein $q_k$ is the k-th component of the query vocabulary vector q, $r_{kj}$ is the j-th component of the key vocabulary vector $r_k$, $k \in [1, n]$, $j \in [1, m]$, and $\theta_j$ is the vector angle.
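With the symbols defined above, and assuming that the standard cosine-similarity definition is intended, the distance of step S43 could be written as the following sketch. The summation range over k is inferred from $k \in [1, n]$ and is an assumption, not a quotation of the specification.

```latex
\cos(\theta_j) =
  \frac{\sum_{k=1}^{n} q_k \, r_{kj}}
       {\sqrt{\sum_{k=1}^{n} q_k^{2}} \; \sqrt{\sum_{k=1}^{n} r_{kj}^{2}}}
```

A larger value of $\cos(\theta_j)$ corresponds to a smaller angle, that is, to a key vocabulary vector that lies closer to the query vocabulary vector.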

S44, sorting each target key vocabulary vector contained in the key vocabulary matrix according to the determined distance.

In this step, the key vocabulary vectors $r_k$ are arranged in descending order of $\cos(\theta_j)$.
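Steps S43 and S44 can be sketched as follows. This is a minimal illustration that assumes the query vocabulary vector and every key vocabulary vector have the same length, and that an all-zero vector is given a cosine of 0.0; neither convention is stated in the specification, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if one is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_vectors(q, vectors):
    """Indices of `vectors`, ordered by descending cosine with q."""
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda j: scores[j], reverse=True)
```

Returning indices rather than the vectors themselves makes it easy to reorder the associated data records in the same way.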

For the elements contained in each key vocabulary vector $r_k$, in the embodiment of the present invention they may be arranged in descending order of $L(r_{ij})$ to obtain a new vector r', which is the ordering result.

It should be noted that ranking the elements of the key vocabulary vector $r_k$ by membership degree is only one implementation of the embodiment of the present invention. In specific implementation, the ranking may also be performed according to the access count, the publishing time, and the like of the corresponding document resource, which is not limited in the embodiment of the present invention.

Based on the obtained ranking result, in specific implementation, in step S35 the data records containing the target key vocabularies corresponding to each target key vocabulary vector may be extracted and presented according to the ranking result of the target key vocabulary vectors.

Take as an example the case where the key vocabularies matched with query vocabulary 1 are Key1, Key2, Key3, Key4 and Key5, and the key vocabularies matched with query vocabulary 2 are Key6, Key7, Key8, Key9 and Key10. In specific implementation, if the cosine distances calculated in step S43 are $\cos(\theta_1)$ and $\cos(\theta_2)$ respectively, and $\cos(\theta_1)$ is less than $\cos(\theta_2)$, then the data records extracted according to the sorting result obtained in step S44 are as shown in Table 3:

TABLE 3

Key vocabulary	Mapping address of the document resource to which it belongs	Document resource description information	Degree of membership
Key6	URL5	D4	L6
Key7	URL4	D4	L7
Key8	URL6	D3	L8
Key9	URL9	D6	L9
Key10	URL9	D6	L10
Key1	URL1	D1	L1
Key2	URL1	D1	L2
Key3	URL2	D2	L3
Key4	URL3	D5	L4
Key5	URL7	D7	L5

It should be noted that, if the extracted data records contain the same URL address, the duplicate records are deleted and one record is reserved.
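The removal of duplicate URLs can be sketched as follows. Keeping the first record in presentation order is an assumption made for this example; the text only states that one record is retained. The record layout and the function name `dedupe_by_url` are likewise hypothetical.

```python
def dedupe_by_url(records):
    """Keep the first record for each URL, preserving presentation order.

    Each record is (key_vocabulary, url, description, membership).
    """
    seen = set()
    unique = []
    for record in records:
        url = record[1]
        if url not in seen:
            seen.add(url)
            unique.append(record)
    return unique
```

Applied to Table 3, this would keep the Key9 record for URL9 and the Key1 record for URL1, and drop the Key10 and Key2 records.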

Further, if the key words matching the query word 2 have a membership ranking of L8 < L6 < L7 < L9 < L10 and the key words matching the query word 1 have a membership ranking of L5 < L1 < L4 < L2 < L3, the extracted data records are shown in Table 4:

TABLE 4

Key vocabulary	Mapping address of the document resource to which it belongs	Document resource description information	Degree of membership
Key10	URL9	D6	L10
Key9	URL9	D6	L9
Key7	URL4	D4	L7
Key6	URL5	D4	L6
Key8	URL6	D3	L8
Key3	URL2	D2	L3
Key2	URL1	D1	L2
Key4	URL3	D5	L4
Key1	URL1	D1	L1
Key5	URL7	D7	L5
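The two-level ordering behind Table 4 (groups by descending cosine, then records within a group by descending membership degree) can be sketched as follows. The numeric cosines and membership values, the record layout and the function name `order_records` are assumptions made for this example.

```python
def order_records(groups):
    """Order records by group cosine, then by membership, both descending.

    groups: list of (cosine, records), one entry per query vocabulary;
    each record is (key_vocabulary, url, description, membership).
    """
    ordered = []
    for _, records in sorted(groups, key=lambda g: g[0], reverse=True):
        ordered.extend(sorted(records, key=lambda r: r[3], reverse=True))
    return ordered
```

Because the group with the larger cosine is emitted first, all records for query vocabulary 2 precede those for query vocabulary 1, as in Table 4.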

Based on the presented search results, the user can view the original document resource through the URL.

According to the artificial intelligence-based retrieval method provided by the second embodiment, the retrieval results obtained by the first embodiment can be displayed in sorted order according to the real intention of the user, so that the time for the user to locate the required document resources is reduced and the retrieval efficiency is improved.

Example three

In order to meet the user's need for multi-language literature resource retrieval, the third embodiment builds on the first and second embodiments: after the query vocabulary is obtained, it is translated into query vocabularies of different languages by machine translation, and the target key vocabularies of the corresponding languages matched with those query vocabularies are then searched for in the pre-established literature resource expression table. This realizes cross-language knowledge retrieval, meets the user's retrieval needs across languages, and further improves the recall ratio of knowledge retrieval.
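The translation step can be sketched as follows. The bilingual dictionary `TRANSLATIONS` is a hypothetical stand-in for a machine translation system, and the language codes and function name are assumptions made for this example.

```python
# Hypothetical bilingual dictionary standing in for a machine translation system.
TRANSLATIONS = {"rotate": {"zh": "旋转", "fr": "tourner"}}


def translate_query_vocabulary(query_vocabulary, languages):
    """Return {language: vocabularies}; the originals are kept under "source"."""
    expanded = {"source": list(query_vocabulary)}
    for lang in languages:
        expanded[lang] = [
            TRANSLATIONS[word][lang]
            for word in query_vocabulary
            if lang in TRANSLATIONS.get(word, {})
        ]
    return expanded
```

Each per-language list can then be matched against the key vocabularies of the same language in the representation table, in the same way as in step S34.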

Based on the same technical concept, the embodiment of the application also provides a knowledge retrieval device based on artificial intelligence, and because the principle of solving the problems of the device is similar to the knowledge retrieval method based on artificial intelligence, the implementation of the device can be referred to the implementation of the method, and repeated parts are not repeated.

As shown in fig. 5, which is a schematic structural diagram of a knowledge retrieval apparatus based on artificial intelligence according to an embodiment of the present invention, the apparatus includes:

a receiving unit 51, configured to receive a retrieval request submitted by a user, where the retrieval request carries a retrieval statement;

an obtaining unit 52, configured to perform knowledge extraction based on the search statement, and obtain entity information in the search statement, where the entity information includes at least one of the following: entities, entity relationships, and entity attributes;

a concept establishing unit 53, configured to perform concept establishment according to the obtained entity information to obtain a query vocabulary;

a matching unit 54, configured to search a target key vocabulary matched with the query vocabulary from a pre-established literature resource expression table, where the literature resource expression table includes data records of mapping address information of the key vocabulary and the original literature resource to which the key vocabulary belongs;

and the presentation unit 55 is used for extracting the data record containing the target key vocabulary from the literature resource representation table and presenting the data record.

The apparatus further includes:

In one embodiment, the mapping unit is specifically configured to count, for each document vocabulary, the number of times that the document vocabulary appears in the corresponding document and the number of times that the document vocabulary is modified or connected by other vocabularies in the corresponding document resource; determining a first frequency of the document vocabulary appearing in the document resource according to the frequency of the document vocabulary appearing in the document and the total number of the document vocabulary contained in the document resource; determining a second frequency of modification or connection of the document vocabulary by other document vocabularies in the document resources according to the times of modification or connection of the document vocabulary by other vocabularies in the document resources and the total number of the document vocabularies contained in the document resources; and determining the membership degree of the document vocabulary according to the weighting results of the first frequency and the second frequency based on the weight values corresponding to the first frequency and the second frequency.
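The membership calculation performed by the mapping unit can be sketched as follows. Taking the "total number of document vocabularies" to be the number of vocabulary occurrences in the document, supplying the modification counts as a ready-made dictionary, and using weights of 0.5 are all assumptions made for this example; the function name is hypothetical.

```python
from collections import Counter


def document_memberships(words, modified_counts, w1=0.5, w2=0.5):
    """Membership of every vocabulary in one document resource.

    words:           vocabularies extracted from the document, with repeats
    modified_counts: {vocabulary: times it is modified or connected by others}
    """
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {
        word: w1 * (count / total)                      # first frequency
        + w2 * (modified_counts.get(word, 0) / total)   # second frequency
        for word, count in counts.items()
    }
```

A vocabulary that both occurs often and is frequently modified by other vocabularies receives a higher membership degree than one that merely occurs.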

In one embodiment, the apparatus further comprises a sorting unit, wherein:

the sequencing unit is used for determining the membership degree corresponding to each query vocabulary to form a query vocabulary vector based on the retrieval sentences for each query vocabulary before the presentation unit presents the data records containing the target keywords; respectively extracting the membership degree of each target key vocabulary matched with each query vocabulary from the literature resource expression table to form a key vocabulary vector, and forming a key vocabulary matrix by the key vocabulary vector corresponding to each query vocabulary; respectively determining the distance between the query vocabulary vector and each target key vocabulary vector; sequencing each target key vocabulary vector contained in the key vocabulary matrix according to the determined distance;

In one embodiment, the original document resource comprises a video resource;

the apparatus further comprises a feature extraction unit, wherein:

the mapping unit is specifically configured to, for a video resource, perform knowledge extraction on video resource description information corresponding to the video resource to obtain a document vocabulary.

In one embodiment, the original document resource comprises a speech resource;

the apparatus further comprises a speech recognition unit, wherein:

the voice recognition unit is used for carrying out voice recognition on voice resources to obtain text data;

the apparatus further comprises a translation unit, wherein:

For convenience of description, the above parts are described separately by dividing the functional modules into modules (or units). Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware in the practice of the invention.

Having described the artificial intelligence based knowledge retrieval method and apparatus of an exemplary embodiment of the present invention, a computing apparatus according to another exemplary embodiment of the present invention is next described.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module" or "system." In some possible embodiments, a computing device according to the present invention may include at least one processor and at least one memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the artificial intelligence based knowledge retrieval method according to the various exemplary embodiments of the present invention described above in this specification. For example, the processor may execute step S31 shown in fig. 3, receiving a retrieval request submitted by a user; step S32, performing knowledge extraction based on the retrieval statement to obtain entity information in the retrieval statement; step S33, carrying out concept establishment according to the obtained entity information to obtain a query vocabulary; step S34, searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource expression table; and step S35, extracting and presenting the data record containing the target key vocabulary from the literature resource representation table.

In some possible embodiments, aspects of the artificial intelligence based knowledge retrieval method provided by the present invention can also be implemented in the form of a program product including program code which, when the program product runs on a computer device, causes the computer device to execute the steps of the artificial intelligence based knowledge retrieval method according to the various exemplary embodiments of the present invention described above in this specification. For example, the computer device can execute step S31 shown in fig. 3, receiving a retrieval request submitted by a user; step S32, performing knowledge extraction based on the retrieval statement to obtain entity information in the retrieval statement; step S33, carrying out concept establishment according to the obtained entity information to obtain a query vocabulary; step S34, searching a target key vocabulary matched with the query vocabulary from a pre-established literature resource expression table; and step S35, extracting and presenting the data record containing the target key vocabulary from the literature resource representation table.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A program product for artificial intelligence based knowledge retrieval in embodiments of the present invention may employ a portable compact disc read only memory (CD-ROM), include program code, and be run on a computing device. However, the program product of the present invention is not limited in this regard; in the present document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A knowledge retrieval method based on artificial intelligence is characterized by comprising the following steps:

2. The method of claim 1, wherein the document resource representation further includes membership degrees corresponding to each key vocabulary;

the literature resource representation table is established according to the following procedures:

extracting knowledge from each original document resource to obtain a document vocabulary;

3. The method of claim 2, wherein determining, for each document vocabulary, a degree of membership of the document vocabulary based on a first frequency of occurrence of the document vocabulary in the document resource and a second frequency of modification or concatenation of the document vocabulary by other document vocabularies in the document resource comprises:

determining a second frequency of the document vocabulary modified or connected by other document vocabularies in the affiliated document resource according to the times of the document vocabulary modified or connected by other vocabularies in the affiliated document resource and the total number of the document vocabularies contained in the affiliated document resource;

4. The method of claim 1,2 or 3, further comprising, prior to presenting the data records containing the target keyword:

for each query vocabulary, determining membership degrees corresponding to the query vocabularies to form query vocabulary vectors based on the retrieval sentences; and

aiming at each query vocabulary, establishing a target key vocabulary vector corresponding to the query vocabulary according to the membership degree of the query vocabulary and each target key vocabulary matched in the literature resource expression table, wherein the target key vocabulary vector corresponding to each query vocabulary forms a key vocabulary matrix, the vector component value corresponding to the target key vocabulary matched with the query vocabulary in the target key vocabulary vector is the membership degree of the key vocabulary, and the vector component value corresponding to the key vocabulary unmatched with the query vocabulary is 0;

5. The method of claim 4, wherein for each target key word corresponding to the target key word vector, extracting and presenting data records of each target key word according to the following method:

6. The method of claim 2, wherein the original document resource comprises a video resource;

7. The method of claim 2, wherein the original document resource comprises a voice resource;

carrying out voice recognition on voice resources to obtain text data; and

8. The method of claim 1, wherein the original document resources comprise original document resources of different languages; and

9. A knowledge retrieval apparatus based on artificial intelligence, comprising:

and the presentation unit is used for extracting and presenting the data record containing the target key vocabulary from the literature resource representation table.

10. A computing device comprising at least one processor and at least one memory, wherein the memory stores a computer program and the processor, when reading the computer program from the memory, performs the method of any one of claims 1 to 8.

11. A computer-readable storage medium having computer-executable instructions stored thereon for causing a computer to perform the method of any one of claims 1 to 8.