WO2021043088A1

WO2021043088A1 - File query method and device, and computer device and storage medium

Info

Publication number: WO2021043088A1
Application number: PCT/CN2020/112336
Authority: WO
Inventors: 钱克功; 沈网中
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-09-03
Filing date: 2020-08-30
Publication date: 2021-03-11
Also published as: CN110674087A

Abstract

A file query method, which comprises: acquiring a collection file set of a client, creating a service description of the collection file set in a file system, and storing the collection file set with the created service description into a cloud storage (S1); performing keyword extraction on the service description through a keyword extraction algorithm to obtain keywords of the service description, converting the keyword into a word vector and then storing the word vector (S2); receiving query content input by a user, and calculating a similarity between the query content and the word vector (S3); and selecting a corresponding service description according to the similarity, querying the collection file in the cloud storage in a multi-policy retrieval mode, and returning a query result to the user (S4). The method realizes accurate file query.

Description

File query method, device, computer equipment and storage medium

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on September 3, 2019, the application number is CN201910829794.5, and the invention title is "File query method, device and computer readable storage medium". The entire content of the patent application is approved The reference is incorporated in this application.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a file query method, device, computer equipment and storage medium.

Background technique

With the development of technology, the amount of information has exploded, and more and more files need to be stored in the user's computer. The computer's file system is responsible for creating files for users, and controlling file access by storing, reading, modifying, and dumping files. When users no longer use files, they can revoke, delete files, etc., so the file system of the computer can support the storage of massive files. However, the inventor realizes that for users, facing a large number of files, it takes a certain amount of time and energy to retrieve the target file. At present, there is no related technology or product in the industry that can perform fast file query.

Summary of the invention

This application provides a file query method, device, computer equipment and storage medium.

A document query method provided by this application includes:

Acquire the collection file set of the client, create a service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage;

Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors;

Receiving the query content input by the user, and calculating the similarity between the query content and the word vector;

Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.

In addition, this application also provides a file query device, which includes:

The service description creation module is used to obtain the collection file set of the client, create the service description of the collection file set in the file system, and store the collection file set after the service description is created in the cloud storage;

The keyword extraction module is configured to perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, and convert the keywords into word vectors and then store the word vectors;

The similarity calculation module is used to receive the query content input by the user, and calculate the similarity between the query content and the word vector;

The query module is configured to select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.

In addition, the present application also provides a computer device that includes a memory and a processor, the memory stores a file query program that can be run on the processor, and the file query program is executed by the processor. When implementing the following steps:

In addition, this application also provides a computer-readable storage medium having a file query program stored on the computer-readable storage medium, and the file query program can be executed by one or more processors to implement the following steps:

Description of the drawings

FIG. 1 is a schematic flowchart of a file query method provided by an embodiment of this application;

2 is a schematic diagram of the internal structure of a computer device provided by an embodiment of the application;

FIG. 3 is a schematic diagram of modules of a file query device provided by an embodiment of the application.

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

detailed description

It should be understood that the specific embodiments described here are only used to explain the application, and not used to limit the application.

This application provides a file query method. Referring to FIG. 1, it is a schematic flowchart of a file query method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.

In this embodiment, the file query method includes:

S1. Acquire the collection file set of the client, create a service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage.

In a preferred embodiment of the present application, the client is also called a client, which refers to a program that corresponds to a server and provides local services to the client. The collection of the client’s collection of files is obtained in the following two ways: mode one, traversing and searching from the client’s local disk to obtain the collection of collections; mode two, using keywords from the search engine according to the needs of the user The collection of files is obtained by searching.

The cloud storage refers to a mode of online online storage (Cloud storage), that is, storing data on multiple virtual servers usually hosted by a third party instead of a dedicated server.

Preferably, the file system in this application is Hadoop Distributed File System (HDFS). The HDFS has high fault tolerance and can be deployed on low-cost hardware. At the same time, the HDFS relaxes the requirement for a portable operating system interface so that it can access file data in the form of streams, thereby providing high throughput. Access to application data is suitable for applications with large data sets.

In detail, the HDFS is composed of a NameNode (master node) and n DataNodes (slave nodes). The NameNode is mainly responsible for managing the file namespace and the master server for client access, and the DataNode is responsible for storing files. To manage. The preferred embodiment of the present application creates the service description of the collection file set in the master node of the HDFS file system.

Further, the business description refers to a brief summary of the content of the collection file set, and can also be expressed as the name of the collection file set. In the preferred embodiment of the present application, a plurality of different files are established in the master node of the Hadoop. Service description, and set up several slave nodes under the master node to store the corresponding collection files of the service description, so the corresponding collection files of the service description can be realized through the retrieval of the service description Query.

S2. Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors.

In a preferred embodiment of the present application, the performing keyword extraction on the business description through a keyword extraction algorithm includes:

Perform word segmentation operations on the business description;

Calculate the dependency correlation degree of any two words W _i and W _{j in the business description:}

Among them, Dep(W _i , W _j ) represents the degree of dependency relationship between _{the words W i} and W _j _{, len(W i} , W _j ) represents the length of the dependency path between _{the words W i} and W _{j, and b is} Hyperparameter

Calculate the gravitational forces of the words W _i and W _j:

Among them, f _grav (W _i , W _j ) represents the _{gravitational forces of the words W i} and W _j , tfidf(W _i ) represents the TF-IDF value of the _{word W i} _{, and tfidf(W j} ) represents _{the TF-IDF of the word W j} IDF value, TF means word frequency, IDF means inverse document frequency index, d is the Euclidean distance between the word vectors _{of words W i} and W _j;

According to the calculated dependency correlation degree and the gravity, the correlation strength between _{the words W i} and W _{j is:}

weight(W _i ,W _j )=Dep(W _i ,W _j )*f _grav (W _i ,W _j )

Binding strength of the association degree of importance of the word W _i is calculated scores:

among them,

W _i is associated with a set of vertices, η is the damping coefficient;

Preferably, the present application selects t words with the highest scores as keywords for the business description according to the importance score of the word.

Further, this application uses a one-hot algorithm to convert keywords into word vectors for representation. The one-hot representation algorithm is a basic method of vector representation of words. It is similar to the idea of bag-of-words model. A dictionary is constructed by extracting all the words in the corpus, and each word in the dictionary is represented by a word vector. The dimension of the word vector is equal to the dictionary scale, and only the value of the dimension corresponding to the current word in the vector is 1, and the values of the other dimensions are all 0. Therefore, this application converts the dimension of the keyword of the business description to 1. The dimension of the remaining words is 0, so that the keyword can be converted into a word vector representation.

S3. Receive the query content input by the user, and calculate the similarity between the query content and the word vector.

The preferred embodiment of the present application calculates the similarity between the query content and the word vector by using the cosin method (cosine similarity). The cosine similarity is to use the cosine value of the angle between two vectors in the vector space as a measure of the size of the difference between two individuals. When the cosine value of the cosine similarity is closer to 1, it indicates that the two vectors are The closer the angle between them is to 0 degrees, the more similar the two vectors are. Wherein, the calculation formula of the cosine similarity is as follows:

Wherein, X represents the word vector, Y represents the query content, and the similarity range of the cosine value of the cosine similarity is -1 to 1: when the cosine value is -1, it means that the query content is The direction pointed by the word vector is exactly opposite, indicating that the similarity between the query content and the word vector is 0, and when the cosine value is 1, it means that the query content and the direction pointed by the word vector are exactly the same , It means that the similarity between the query content and the word vector is 100%, and when the cosine value is 0, it means that the query content and the word vector are independent, indicating that the query There is moderate similarity or dissimilarity between the content and the word vector. This application obtains the similarity between the query content and the word vector according to the cosine value.

S4. Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.

The multi-strategy search method in the preferred embodiment of the present application includes Levenstein Distance (LD). When the user enters the query content, the similarity calculation method is used to compare with the business description of the favorite file in the cloud storage to determine whether it matches. If there is a match, return the favorite file to the user directly; if it does not match, calculate the similarity between the query content entered by the user and the keywords of the business description in the favorite file, and the preset threshold is 0.8, and the similarity result is greater than The collection file corresponding to the service description with the preset threshold is used as a query result and returned to the user.

Further, when none of the similarity results are greater than a preset threshold, this application uses the LD to calculate the similarity between the query content input by the user and the character string in the service description of the favorite file. In detail, this application presets that the original character string in the query content input by the user is m, the service description target character string of the collection file is n, and it is necessary to record that the original character string m is transformed into the target character string n The number of edits L for deleting, inserting, and replacing operations, and the L of the two strings m and n is recorded as lev _m,n (|m|,|n|), where |m|,|n| are characters respectively The length of string m, n is apart. Wherein, when L is larger, the similarity of character strings is lower. Therefore, this application selects the corresponding collection file with the smallest L value as the query result and returns it to the user.

The invention also provides a computer device. Referring to FIG. 2, it is a schematic diagram of the internal structure of a computer device provided by an embodiment of this application.

In this embodiment, the computer device 1 may be a PC (Personal Computer, personal computer), or a terminal device such as a smart phone, a tablet computer, or a portable computer, or a server. The computer device 1 at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.

The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like. The memory 11 may be an internal storage unit of the computer device 1 in some embodiments, such as a hard disk of the computer device 1. In other embodiments, the memory 11 may also be an external storage device of the computer device 1, such as a plug-in hard disk, a smart media card (SMC), and a secure digital (SD) equipped on the computer device 1. Card, Flash Card, etc. Further, the memory 11 may also include both an internal storage unit of the computer device 1 and an external storage device. The memory 11 can be used not only to store application software and various data installed in the computer device 1, such as the code of the file query program 01, etc., but also to temporarily store data that has been output or will be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, for running program codes or processing stored in the memory 11 Data, for example, execute file query program 01, etc.

The communication bus 13 is used to realize the connection and communication between these components.

The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the computer device 1 and other electronic devices.

Optionally, the computer device 1 may also include a user interface. The user interface may include a display (Display) and an input unit such as a keyboard (Keyboard). The optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc. Among them, the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the computer device 1 and to display a visualized user interface.

Figure 2 only shows the computer device 1 with the components 11-14 and the file query program 01. Those skilled in the art can understand that the structure shown in Figure 1 does not constitute a limitation on the computer device 1, and may include a comparison chart. Show fewer or more components, or combinations of certain components, or different component arrangements.

In the embodiment of the computer device 1 shown in FIG. 2, the file query program 01 is stored in the memory 11; when the processor 12 executes the file query program 01 stored in the memory 11, the following steps are implemented:

Step 1: Obtain the collection of collection files of the client, create a service description of the collection of files in the file system, and store the collection of collection files after the service description is created in cloud storage.

In the preferred embodiment of the present application, the client is also called the client, which refers to the program corresponding to the server and providing local services to the client. The collection of the client’s collection of files is obtained in the following two ways: mode one, traversing and searching from the client’s local disk to obtain the collection of collections; mode two, using keywords from the search engine according to the needs of the user The collection of files is obtained by searching.

Step 2: Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors.

The service description for word operations; dependence of the degree of association calculated service described in any two words W _i and W _j of:

Calculate the gravitational forces of the words W _i and W _j:

weight(W _i ,W _j )=Dep(W _i ,W _j )*f _grav (W _i ,W _j )

among them,

W _i is associated with a set of vertices, η is the damping coefficient;

Step 3: Receive the query content input by the user, and calculate the similarity between the query content and the word vector.

Step 4: Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.

The multi-strategy search method in the preferred embodiment of the present application includes Levenstein Distance (LD). When the user enters the query content, the similarity calculation method is used to compare with the service description of the favorite file in the cloud storage to determine whether it matches. If there is a match, return the favorite file to the user directly; if it does not match, calculate the similarity between the query content entered by the user and the keywords of the business description in the favorite file, and the preset threshold is 0.8, and the similarity result is greater than The collection file corresponding to the service description with the preset threshold is used as a query result and returned to the user.

Further, when none of the similarity results are greater than a preset threshold, this application uses the LD to calculate the similarity between the query content input by the user and the character string in the service description of the favorite file. In detail, this application presets that the original character string in the query content entered by the user is m, the service description target character string of the collection file is n, and it is necessary to record that the original character string m is transformed into the target character string n The number of edits L for deleting, inserting, and replacing operations, and the L of the two strings m and n is recorded as lev _m,n (|m|,|n|), where |m|,|n| are characters respectively The length of string m, n is apart. Wherein, when L is larger, the similarity of character strings is lower. Therefore, this application selects the corresponding collection file with the smallest L value as the query result and returns it to the user.

3, a schematic diagram of modules in an embodiment of the document query device of this application. In this embodiment, the document query program includes a business description creation module 10, a keyword extraction module 20, a similarity calculation module 30, and a query The module 40 exemplarily:

The service description creation module 10 is used to obtain the collection file set of the client, create the service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage.

The keyword extraction module 20 is configured to: perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, and convert the keywords into word vectors and then store the word vectors .

The similarity calculation module 30 is configured to receive the query content input by the user, and calculate the similarity between the query content and the word vector.

The query module 40 is configured to select a corresponding business description according to the similarity, query the cloud storage for favorite files through a multi-strategy retrieval method, and return the query result to the user.

The functions or operation steps implemented when the program modules such as the text service description creation module 10, the keyword extraction module 20, the similarity calculation module 30, and the query module 40 are executed are substantially the same as those in the foregoing embodiment, and will not be repeated here.

In addition, the embodiment of the present application also proposes a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile, and a file query program is stored on the computer-readable storage medium. The file query program can be executed by one or more processors to achieve the following operations:

The specific implementation of the computer-readable storage medium of this application is basically the same as the above-mentioned file query device and method embodiments, and will not be repeated here.

It should be noted that the serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments. And the terms "include", "include" or any other variants thereof in this article are intended to cover non-exclusive inclusion, so that a process, device, article or method including a series of elements not only includes those elements, but also includes those elements that are not explicitly included. The other elements listed may also include elements inherent to the process, device, article, or method. If there are no more restrictions, the element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.

Through the description of the above implementation manners, those skilled in the art can clearly understand that the above-mentioned embodiment method can be implemented by means of software plus the necessary general hardware platform, of course, it can also be implemented by hardware, but in many cases the former is better.的实施方式。 Based on this understanding, the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM) as described above. , Magnetic disk, optical disk), including a number of instructions to make a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) execute the method described in each embodiment of the present application.

The above are only the preferred embodiments of the application, and do not limit the scope of the patent for this application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the application, or directly or indirectly applied to other related technical fields , The same reason is included in the scope of patent protection of this application.

Claims

A file query method, wherein the method includes:

Acquire the collection file set of the client, create a service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage;

Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors;

Receiving the query content input by the user, and calculating the similarity between the query content and the word vector;

Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.
5. The file query method according to claim 1, wherein said obtaining the collection of files of the client terminal comprises:

Traverse and retrieve from the local disk of the client to obtain the collection of files; or

According to the needs of the user, the collection of documents is obtained by searching from the search engine by using keywords.
The file query method according to claim 1, wherein said performing keyword extraction on said business description through a keyword extraction algorithm comprises:

Perform word segmentation operations on the business description;

Calculate the dependency correlation degree of any two words W i and W j in the business description:

Among them, Dep(W i , W j ) represents the degree of dependency relationship between the words W i and W j , len(W i , W j ) represents the length of the dependency path between the words W i and W j, and b is Hyperparameter

Calculate the gravitational forces of the words W i and W j:

Among them, f grav (W i , W j ) represents the gravitational forces of the words W i and W j , tfidf(W i ) represents the TF-IDF value of the word W i , and tfidf(W j ) represents the TF-IDF of the word W j IDF value, TF means word frequency, IDF means inverse document frequency index, d is the Euclidean distance between the word vectors of words W i and W j;

According to the calculated dependency correlation degree and the gravity, the correlation strength between the words W i and W j is:

weight(W i ,W j )=Dep(W i ,W j )*f grav (W i ,W j )

Binding strength of the association degree of importance of the word W i is calculated scores:

among them,
W i is associated with a set of vertices, η is the damping coefficient;

The degree of importance of the selected word W i t score highest-scoring word as a keyword in the service description.
3. The file query method according to claim 1, wherein the calculation formula for the similarity between the query content and the word vector is:

Wherein, X represents the word vector, and Y represents the query content.
The file query method according to any one of claims 1 to 4, wherein the querying of the collection file from the cloud storage in a multi-strategy search mode includes:

Preset that the original character string in the query content input by the user is m, and the business description target character string of the collection file is n;

Recording the number of edit times L of deletion, insertion, and replacement operations required to transform the original character string m into the target character string n;

The corresponding favorite file with the smallest L value is selected as the query result and returned to the user.
The file query method according to claim 1, wherein said converting said keywords into word vectors comprises:

The one-hot representation algorithm is used to convert the keywords into word vectors for representation.
The file query method according to claim 1, wherein the file system is a Hadoop file system.
A computer device, wherein the device includes a memory and a processor, the memory stores a file query program that can be run on the processor, and when the file query program is executed by the processor, the following steps are implemented :

Acquire the collection file set of the client, create a service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage;

Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors;

Receiving the query content input by the user, and calculating the similarity between the query content and the word vector;

Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.
8. The computer device according to claim 8, wherein said acquiring the collection of collection files of the client terminal comprises:

Traverse and retrieve from the local disk of the client to obtain the collection of files; or

According to the needs of the user, the collection of documents is obtained from the search engine by using keywords.
8. The computer device according to claim 8, wherein said performing keyword extraction on said business description through a keyword extraction algorithm comprises:

Perform word segmentation operations on the business description;

Calculate the dependency correlation degree of any two words W i and W j in the business description:

Among them, Dep(W i , W j ) represents the degree of dependency relationship between the words W i and W j , len(W i , W j ) represents the length of the dependency path between the words W i and W j, and b is Hyperparameter

Calculate the gravitational forces of the words W i and W j:

Among them, f grav (W i , W j ) represents the gravitational forces of the words W i and W j , tfidf(W i ) represents the TF-IDF value of the word W i , and tfidf(W j ) represents the TF-IDF of the word W j IDF value, TF means word frequency, IDF means inverse document frequency index, d is the Euclidean distance between the word vectors of words W i and W j;

According to the calculated dependency correlation degree and the gravity, the correlation strength between the words W i and W j is:

weight(W i ,W j )=Dep(W i ,W j )*f grav (W i ,W j )

Binding strength of the association degree of importance of the word W i is calculated scores:

among them,
W i is associated with a set of vertices, η is the damping coefficient;

The degree of importance of the selected word W i t score highest-scoring word as a keyword in the service description.
8. The computer device according to claim 8, wherein the formula for calculating the similarity between the query content and the word vector is:

Wherein, X represents the word vector, and Y represents the query content.
11. The computer device according to any one of claims 8 to 11, wherein said querying said cloud storage for collection files in a multi-strategy search mode comprises:

Preset that the original character string in the query content input by the user is m, and the business description target character string of the collection file is n;

Record the number of edits L of the deletion, insertion, and replacement operations required to transform the original character string m into the target character string n;

The corresponding favorite file with the smallest L value is selected as the query result and returned to the user.
8. The computer device of claim 8, wherein said converting said keyword into a word vector comprises:

The one-hot representation algorithm is used to convert the keywords into word vectors for representation.
A file query device, wherein the device includes:

The service description creation module is used to obtain the collection file set of the client, create the service description of the collection file set in the file system, and store the collection file set after the service description is created in the cloud storage;

The keyword extraction module is configured to perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, and convert the keywords into word vectors and then store the word vectors;

The similarity calculation module is used to receive the query content input by the user, and calculate the similarity between the query content and the word vector;

The query module is configured to select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.
A computer-readable storage medium, wherein a file query program is stored on the computer-readable storage medium, and the file query program can be executed by one or more processors to implement the steps of the file query method described below :

Acquire the collection file set of the client, create a service description of the collection file set in the file system, and store the collection file set after the service description is created in cloud storage;

Perform keyword extraction on the business description through a keyword extraction algorithm to obtain keywords of the business description, convert the keywords into word vectors, and store the word vectors;

Receiving the query content input by the user, and calculating the similarity between the query content and the word vector;

Select the corresponding business description according to the similarity, query the cloud storage for the favorite files through a multi-strategy retrieval method, and return the query result to the user.
15. The computer-readable storage medium according to claim 15, wherein said acquiring the collection of collection files of the client comprises:

Traverse and retrieve from the local disk of the client to obtain the collection of files; or

According to the needs of the user, the collection of documents is obtained from the search engine by using keywords.
15. The computer-readable storage medium of claim 15, wherein said performing keyword extraction on said business description by a keyword extraction algorithm comprises:

Perform word segmentation operations on the business description;

Calculate the dependency correlation degree of any two words W i and W j in the business description:

Among them, Dep(W i , W j ) represents the degree of dependency relationship between the words W i and W j , len(W i , W j ) represents the length of the dependency path between the words W i and W j, and b is Hyperparameter

Calculate the gravitational forces of the words W i and W j:

Among them, f grav (W i , W j ) represents the gravitational forces of the words W i and W j , tfidf(W i ) represents the TF-IDF value of the word W i , and tfidf(W j ) represents the TF-IDF of the word W j IDF value, TF means word frequency, IDF means inverse document frequency index, d is the Euclidean distance between the word vectors of words W i and W j;

According to the calculated dependency correlation degree and the gravity, the correlation strength between the words W i and W j is:

weight(W i ,W j )=Dep(W i ,W j )*f grav (W i ,W j )

Binding strength of the association degree of importance of the word W i is calculated scores:

among them,
W i is associated with a set of vertices, η is the damping coefficient;

The degree of importance of the selected word W i t score highest-scoring word as a keyword in the service description.
15. The computer device according to claim 15, wherein the formula for calculating the similarity between the query content and the word vector is:

Wherein, X represents the word vector, and Y represents the query content.
18. The computer-readable storage medium according to any one of claims 15 to 18, wherein the querying of the collection file from the cloud storage in a multi-strategy retrieval manner comprises:

Preset that the original character string in the query content input by the user is m, and the business description target character string of the collection file is n;

Record the number of edits L of the deletion, insertion, and replacement operations required to transform the original character string m into the target character string n;

The corresponding favorite file with the smallest L value is selected as the query result and returned to the user.
15. The computer-readable storage medium of claim 15, wherein said converting said keywords into word vectors comprises:

The one-hot representation algorithm is used to convert the keywords into word vectors for representation.