CN116340395B - Equipment information retrieval method and system based on optimized retrieval conditions - Google Patents

Equipment information retrieval method and system based on optimized retrieval conditions

Publication number
CN116340395B
Authority
CN
China
Prior art keywords
keywords
category
scores
score
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310611896.6A
Other languages
Chinese (zh)
Other versions
CN116340395A (en
Inventor
高飞
徐伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Plaything Technology Co ltd
Original Assignee
Shenzhen Plaything Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Plaything Technology Co ltd filed Critical Shenzhen Plaything Technology Co ltd
Priority to CN202310611896.6A priority Critical patent/CN116340395B/en
Publication of CN116340395A publication Critical patent/CN116340395A/en
Application granted granted Critical
Publication of CN116340395B publication Critical patent/CN116340395B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
                    • G06F16/2462 Approximate or statistical queries
                  • G06F16/2453 Query optimisation
                    • G06F16/24534 Query rewriting; Transformation
                      • G06F16/2454 Optimisation of common expressions
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/216 Parsing using statistical methods
              • G06F40/279 Recognition of textual entities
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the technical field of information retrieval and discloses a device information retrieval method and system based on optimized retrieval conditions. The method comprises: performing word segmentation on a plurality of pieces of device information and removing duplicates to obtain M keywords; dividing the M keywords into N categories; setting a score for each keyword in the N categories, with different keywords receiving different scores; setting a score for each piece of device information as the sum of the scores of all keywords it contains; receiving a search condition for querying the plurality of pieces of device information; performing word segmentation on the search condition to obtain m keywords; calculating the sum of the scores of the m keywords as the score of the search condition; and querying, from the plurality of pieces of device information, the device information whose score equals that of the search condition, and generating the corresponding search result. Compared with the prior art, the method can greatly reduce the complexity of device information retrieval and improve retrieval efficiency.

Description

Equipment information retrieval method and system based on optimized retrieval conditions
Technical Field
The present invention relates to the field of information retrieval technology, and more particularly to a device information retrieval method and system based on optimized retrieval conditions.
Background
Internet service providers (ISPs) are telecom operators that provide internet access, information, and value-added services to a broad user base. They are also major equipment providers in the internet application service industry chain and supply large-scale network service equipment, which means that staff must manage and maintain a large amount of device information.
Depending on business needs, staff often have to search through a large amount of device information. For example, to find the devices belonging to "Guangdong Telecom", the device information must first be classified by operator into three types (China Mobile, China Unicom, and China Telecom) and the devices belonging to Telecom retrieved; the devices belonging to Guangdong are then found among them. Because there are two conditions, two searches are required, as shown in FIG. 1. The more search conditions there are, the more searches are needed to obtain the result, so retrieval complexity is too high and retrieval efficiency is too low.
Disclosure of Invention
To solve the above technical problem, the present application provides a device information retrieval method and system based on optimized retrieval conditions, which can reduce the complexity of device information retrieval and improve retrieval efficiency.
In a first aspect, the present invention provides a device information retrieval method based on optimized retrieval conditions, including: performing word segmentation on a plurality of pieces of device information and then removing duplicates to obtain M keywords; dividing the M keywords into N categories; and setting a corresponding score for each keyword in the N categories of keywords, wherein the scores of different keywords are different:
[Formula not reproduced in the source text; it consists of two alternative expressions joined by "or".]
The quantities defined for the formula are, in order: the fixed difference between the scores of two adjacent keywords of the i-th category after the scores of that category are ranked from low to high; the score of the j-th keyword of the i-th category; the score of the (j+1)-th keyword of the i-th category; the maximum value in the i-th category; the minimum value in the i-th category; and the number of keywords in the i-th category. The method further includes: setting a score for each piece of device information according to the sum of the scores of all keywords it contains; receiving a search condition for querying the plurality of pieces of device information; performing word segmentation on the search condition to obtain m keywords; calculating the sum of the scores of the m keywords as the score of the search condition; querying, from the plurality of pieces of device information, the device information whose score equals the score of the search condition; and generating a search result corresponding to the search condition from that device information.
Preferably, in the foregoing method, the step of performing word segmentation on the plurality of pieces of device information and then removing duplicates to obtain M keywords includes: training a word bonding strength recognition model based on a preset device file:
[Formula not reproduced in the source text.]
The quantities defined for the formula are, in order: the word bonding strength between any two adjacent words x and y in the device file; the frequency with which word x occurs adjacent to word y in the device file; the frequency with which word x occurs in the device file; and the frequency with which word y occurs in the device file. The step further includes: calculating, according to the word bonding strength recognition model, the word bonding strength of any two adjacent words in each piece of device information; and, when the calculated word bonding strength exceeds a preset threshold, extracting the corresponding two adjacent words as a keyword.
Preferably, before the step of setting a corresponding score for each keyword in the N categories of keywords, the foregoing method further includes: calculating, for the j-th keyword of the i-th category in each of the N categories, the frequency with which that keyword occurs in historical search conditions (the symbols for the frequency and the keyword are not reproduced in the source text); and setting the score order of the keywords within each category according to those frequencies. The step of setting a corresponding score for each keyword in the N categories of keywords then includes: setting the scores of the different keywords in each category according to the score order so determined.
Preferably, before the step of calculating the sum of the scores of the m keywords, the foregoing method further includes: looking up the m keywords among the M keywords and, if p of the m keywords are found among the M keywords, setting the scores of the p keywords to the scores already set for the matching keywords among the M keywords.
Preferably, the foregoing method further includes: if q of the m keywords are not among the M keywords, calculating the similarity between the glyph features of the q keywords and the glyph features of each of the M keywords; finding, according to the calculated similarity, the group of keywords among the M keywords that is most similar to the q keywords; and setting the scores of the q keywords according to the scores of that group of keywords.
Preferably, in the foregoing method, the step of calculating the similarity between the glyph features of the q keywords and the glyph features of each of the M keywords includes: for the two words whose similarity is to be calculated, extracting a glyph feature set for each word, and calculating the similarity between the two glyph feature sets:
[Formula not reproduced in the source text.]
The quantities defined for the formula are, in order: the similarity between the two glyph feature sets; the i-th feature of the first glyph feature set; and the i-th feature of the second glyph feature set.
Preferably, in the foregoing method, the step of generating a search result corresponding to the search condition includes: when several pieces of device information have the same score as the search condition, extracting from them the device information that contains the m keywords as the search result.
In a second aspect, the present invention provides a device information retrieval system based on optimized retrieval conditions, including: a first word segmentation module for performing word segmentation on a plurality of pieces of device information and then removing duplicates to obtain M keywords; a classification module for dividing the M keywords into N categories; and a first scoring module for setting a corresponding score for each keyword in the N categories of keywords, wherein the scores of different keywords are different:
[Formula not reproduced in the source text; it consists of two alternative expressions joined by "or".]
The quantities defined for the formula are, in order: the fixed difference between the scores of two adjacent keywords of the i-th category after the scores of that category are ranked from low to high; the score of the j-th keyword of the i-th category; the score of the (j+1)-th keyword of the i-th category; the maximum value in the i-th category; the minimum value in the i-th category; and the number of keywords in the i-th category. The system further includes: a second scoring module for setting a score for each piece of device information according to the sum of the scores of all keywords it contains; a search condition receiving module for receiving a search condition for querying the plurality of pieces of device information; a second word segmentation module for performing word segmentation on the search condition to obtain m keywords; a third scoring module for calculating the sum of the scores of the m keywords as the score of the search condition; a query module for querying, from the plurality of pieces of device information, the device information whose score equals the score of the search condition; and a result generation module for generating a search result corresponding to the search condition from that device information.
Preferably, the foregoing system further includes a word bonding strength module that trains a word bonding strength recognition model based on a preset device file:
[Formula not reproduced in the source text.]
The quantities defined for the formula are, in order: the word bonding strength between any two adjacent words x and y in the device file; the frequency with which word x occurs adjacent to word y in the device file; the frequency with which word x occurs in the device file; and the frequency with which word y occurs in the device file. The word bonding strength module calculates, according to the model, the word bonding strength of any two adjacent words in each piece of device information; when the calculated word bonding strength exceeds a preset threshold, the first word segmentation module extracts the corresponding two adjacent words as a keyword.
Preferably, the foregoing system further includes: a frequency calculation module for calculating, for the j-th keyword of the i-th category in each of the N categories, the frequency with which that keyword occurs in historical search conditions (the symbols for the frequency and the keyword are not reproduced in the source text); and an ordering module for setting the score order of the keywords within each category according to those frequencies. The first scoring module sets the scores of the different keywords in each category according to the score order so determined.
The technical scheme provided by the invention has at least one or more of the following beneficial effects:
Unlike the prior art, the technical scheme of the invention does not search directly on the keywords in the search condition. Instead, it segments the pieces of device information into keywords, classifies the keywords, and scores the keywords in each category such that any two keywords with adjacent scores in the same category differ by the same amount and, for any two categories, the maximum score of one category does not exceed the score difference of the other. Once the overall score of each piece of device information has been determined from the scores of the keywords it contains, the scores of different pieces of device information therefore differ from one another. At retrieval time the search condition is segmented in the same way, its overall score is determined from the scores of the resulting keywords, and the device information with the same score is looked up by that score.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate the application and not constitute a limitation to the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a schematic diagram of information retrieval of a prior art solution;
FIG. 2 is a flow chart of a device information retrieval method based on optimized retrieval conditions according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a device information retrieval method based on optimized retrieval conditions according to an embodiment of the present application;
FIG. 4 is a flow chart of a device information retrieval method based on optimized retrieval conditions according to an embodiment of the present application;
FIG. 5 is a flow chart of a device information retrieval method based on optimized retrieval conditions according to an embodiment of the present application;
FIG. 6 is a block diagram of a device information retrieval system based on optimized retrieval conditions according to an embodiment of the present application.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
As shown in FIG. 2, one embodiment of the present invention provides a device information retrieval method based on optimized retrieval conditions, including:
Step S210, performing word segmentation on the plurality of pieces of device information, and then removing duplicates to obtain M keywords.
In this embodiment, the device type is not limited; it may be, for example, a network device provided by an ISP, or a mobile terminal such as a mobile phone.
Step S220, dividing M keywords into N categories.
In this embodiment, the classification method is not limited; for example, keywords such as "Mobile", "Unicom", and "Telecom" may be classified into the "operator" category, and "Guangdong", "Shandong", and "Zhejiang" into the "home" category.
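Steps S210 and S220 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the whitespace tokenizer, the `CATEGORY_OF` lookup table, and the function names are assumptions introduced here, since the embodiment leaves the classification method open.

```python
from collections import defaultdict

# Hypothetical lookup table mapping each keyword to its category (assumption).
CATEGORY_OF = {
    "Mobile": "operator", "Unicom": "operator", "Telecom": "operator",
    "Guangdong": "home", "Shandong": "home", "Zhejiang": "home",
}

def extract_keywords(device_infos):
    """Split every record on whitespace and drop duplicates, keeping first-seen order."""
    seen = {}
    for info in device_infos:
        for word in info.split():
            seen.setdefault(word, None)
    return list(seen)

def categorize(keywords, category_of=CATEGORY_OF):
    """Group the M keywords into N categories; unknown words go to 'other'."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[category_of.get(kw, "other")].append(kw)
    return dict(groups)

records = ["Guangdong Telecom", "Guangdong Unicom", "Zhejiang Mobile"]
keywords = extract_keywords(records)   # M = 5 distinct keywords
categories = categorize(keywords)      # N = 2 categories
```

Here the three records yield five distinct keywords, split into an "operator" group and a "home" group.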
Step S230, setting a corresponding score for each keyword in the N categories of keywords, wherein the scores of different keywords are different:
[Formula not reproduced in the source text; it consists of two alternative expressions joined by "or".]
The quantities defined for the formula are, in order: the fixed difference between the scores of two adjacent keywords of the i-th category after the scores of that category are ranked from low to high; the score of the j-th keyword of the i-th category; the score of the (j+1)-th keyword of the i-th category; the maximum value in the i-th category; the minimum value in the i-th category; and the number of keywords in the i-th category.
In this embodiment, keyword scores are set according to the above formula so that any two keywords with adjacent scores in the same category differ by the same amount and, for any two categories, the maximum score of one category does not exceed the score difference of the other. As a result, not only do different keywords have different scores, but the scores of different pieces of device information composed of various keywords also differ from one another.
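One way to obtain scores with these properties is sketched below. Because the patent's formula is not legible in this text, the scheme is an assumption and not the claimed formula: each category uses a fixed step, and that step is chosen larger than the largest total that all previously processed categories can contribute. The function name `assign_scores` and its return values are hypothetical.

```python
def assign_scores(categories):
    """Assign keyword scores with a fixed step per category.

    `categories` maps a category name to its ordered keyword list.
    Assumed scheme (not the patent's formula): the step of a category is
    one more than the sum of the maximum scores of all earlier categories,
    so every keyword score and every sum of per-category picks is unique.
    """
    scores, steps = {}, {}
    total_max = 0  # sum of the maximum scores of the categories handled so far
    for name, keywords in categories.items():
        step = total_max + 1
        steps[name] = step
        for j, kw in enumerate(keywords, start=1):
            scores[kw] = j * step
        total_max += len(keywords) * step
    return scores, steps

categories = {
    "home": ["Guangdong", "Guangxi"],
    "operator": ["Telecom", "Unicom", "Mobile"],
}
scores, steps = assign_scores(categories)
```

With this input the sketch gives the "home" keywords the scores 1 and 2 and the "operator" keywords the scores 3, 6, and 9, so the six possible home/operator sums are all distinct. The values differ from those in FIG. 3, which appear to have been chosen by hand.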
Step S240, setting a score for each piece of equipment information according to the sum of scores of all keywords contained in each piece of equipment information.
In this embodiment, the scores of different device information are made different from each other based on the foregoing keyword score setting manner.
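The uniqueness of the record scores can be checked with a short argument. The notation and both stated conditions below are assumptions introduced for illustration, because the patent's own formula is not legible in this text.

```latex
% Assumed notation: s_{k,j} is the score of the j-th keyword of category k,
% d_k the fixed step of category k, and S(A) the score of record A.
% Assumption 1: every record holds exactly one keyword per category
%               (record A holds keyword a_k of category k).
% Assumption 2: the categories are ordered so that
%               d_i > \sum_{k>i} (\max_k - \min_k) for every i.
\begin{align*}
S(A) - S(B) &= \sum_{k=1}^{N} \bigl(s_{k,a_k} - s_{k,b_k}\bigr)
  && \text{for records } A \neq B \\
\bigl|s_{i,a_i} - s_{i,b_i}\bigr| &\geq d_i
  && i = \text{first category with } a_i \neq b_i \\
\Bigl|\sum_{k>i} \bigl(s_{k,a_k} - s_{k,b_k}\bigr)\Bigr|
  &\leq \sum_{k>i} \bigl(\max\nolimits_k - \min\nolimits_k\bigr) < d_i \\
\Rightarrow\quad \bigl|S(A) - S(B)\bigr| &> 0
\end{align*}
```

With the FIG. 3 values, the "operator" step is 100 and the "home" range is 11 - 10 = 1, so the assumed condition holds for that example.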
Step S250, receiving a search condition for querying a plurality of device information.
Step S260, word segmentation is carried out on the search condition, and m keywords are obtained.
Step S270, calculating the sum of the scores of the m keywords as the score of the search condition.
In this embodiment, the scores of the m keywords may be set with reference to the scores of the M keywords.
Step S280, querying, from the plurality of pieces of device information, the device information whose score equals the score of the search condition.
Step S290, generating the search result corresponding to the search condition according to the device information whose score equals the score of the search condition.
Further, when several pieces of device information have the same score as the search condition, for example because some device information is in a non-standard format, the device information containing the m keywords is extracted from them as the search result.
In this embodiment, when the score lookup does not return a unique result, the candidates can still be filtered by keyword content; because the candidate set contains only a few pieces of device information, this filtering is efficient.
According to the technical solution of this embodiment, the scores of the pieces of device information are calculated in advance; when a search condition is received, only the score of the search condition needs to be calculated, and the search result is obtained by a single query on that score. Specifically, as shown in FIG. 3, for the keywords of the "operator" category the score of "Telecom" is set to 0, the score of "Unicom" to 100, and the score of "Mobile" to 200; for the keywords of the "home" category the score of "Guangdong" is set to 10 and the score of "Guangxi" to 11. The score of the device information "Guangdong Telecom" is then 0 + 10 = 10, and the score of the device information "Guangdong Unicom" is 100 + 10 = 110.
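The FIG. 3 example can be traced end to end with a small sketch. The keyword scores are the ones given in the text; the whitespace tokenizer, the dictionary index, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Keyword scores taken from the FIG. 3 example in the text.
KEYWORD_SCORES = {
    "Telecom": 0, "Unicom": 100, "Mobile": 200,
    "Guangdong": 10, "Guangxi": 11,
}

def score_of(text, keyword_scores=KEYWORD_SCORES):
    """Sum the scores of the known keywords in a whitespace-separated text."""
    return sum(keyword_scores[w] for w in text.split() if w in keyword_scores)

def build_index(device_infos):
    """Precompute each record's score once and group records by score."""
    index = defaultdict(list)
    for info in device_infos:
        index[score_of(info)].append(info)
    return index

def search(index, condition):
    """Look up records by score, then keep those containing every query keyword."""
    wanted = set(condition.split())
    candidates = index.get(score_of(condition), [])
    return [info for info in candidates if wanted <= set(info.split())]

devices = ["Guangdong Telecom", "Guangdong Unicom", "Guangxi Mobile"]
index = build_index(devices)
result = search(index, "Guangdong Telecom")
```

The query scores 10, a single dictionary lookup returns the one record stored under that score, and the final keyword check covers the case where several records share a score.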
As shown in FIG. 4, another embodiment of the present invention provides a device information retrieval method based on optimized retrieval conditions in which, compared with the previous embodiment, step S210 includes:
Step S410, training a word bonding strength recognition model based on a preset device file:
[Formula not reproduced in the source text.]
The quantities defined for the formula are, in order: the word bonding strength between any two adjacent words x and y in the device file; the frequency with which word x occurs adjacent to word y in the device file; the frequency with which word x occurs in the device file; and the frequency with which word y occurs in the device file.
In this embodiment, device information often contains a large number of technical terms that occur rarely in natural language, so prior-art word segmentation methods designed for natural language are not suitable for device information. This embodiment therefore takes the device file as a sample and analyzes, for any two adjacent words in the file, how often they occur together and how often they occur in total. A high word bonding strength calculated from the formula indicates that the two words mostly occur adjacent to each other and rarely occur on their own, so the pair can be selected as a keyword.
Step S420, calculating the word bonding strength of any two adjacent words in each piece of equipment information in the plurality of pieces of equipment information according to the word bonding strength recognition model.
Step S430, when the calculated word bonding strength exceeds a preset threshold, extracting the corresponding two adjacent words as a keyword.
According to the technical scheme of this embodiment, keywords consisting of two words can be identified accurately and effectively.
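Steps S410 to S430 can be sketched as below. The strength formula is not legible in this text, so the sketch assumes a ratio of adjacent co-occurrences to total occurrences, f(x,y) / (f(x) + f(y) - f(x,y)), which matches the qualitative description but may differ from the patent's formula. The sample device file, the threshold of 0.5, and all function names are likewise hypothetical.

```python
from collections import Counter

def train_bonding_model(corpus_lines):
    """Count single-word and adjacent-pair frequencies in a device file."""
    word_freq, pair_freq = Counter(), Counter()
    for line in corpus_lines:
        words = line.split()
        word_freq.update(words)
        pair_freq.update(zip(words, words[1:]))
    return word_freq, pair_freq

def bonding_strength(x, y, word_freq, pair_freq):
    """Assumed formula: f(x,y) / (f(x) + f(y) - f(x,y)); 0.0 if never adjacent."""
    together = pair_freq[(x, y)]
    if together == 0:
        return 0.0
    return together / (word_freq[x] + word_freq[y] - together)

def extract_pairs(text, word_freq, pair_freq, threshold=0.5):
    """Return adjacent word pairs whose strength exceeds the threshold."""
    words = text.split()
    return [
        f"{x} {y}"
        for x, y in zip(words, words[1:])
        if bonding_strength(x, y, word_freq, pair_freq) > threshold
    ]

device_file = [
    "core router installed",
    "core router replaced",
    "edge switch installed",
]
word_freq, pair_freq = train_bonding_model(device_file)
pairs = extract_pairs("core router installed", word_freq, pair_freq)
```

In this toy file "core" and "router" always occur together, so their strength is 1.0 and the pair is kept, while "router installed" scores 1/3 and is dropped.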
Another embodiment of the present invention provides a device information retrieval method based on optimized retrieval conditions, which further includes, before step S230:
calculating, for the j-th keyword of the i-th category in each of the N categories, the frequency with which that keyword occurs in historical search conditions (the symbols for the frequency and the keyword are not reproduced in the source text); and
setting the score order of the keywords within each category according to those frequencies.
Step S230 then specifically includes: setting the scores of the different keywords in each category according to the score order so determined.
In the technical scheme of the embodiment, the keywords of each category are scored according to the occurrence frequency of the keywords in the history retrieval record, so that the keywords with higher occurrence frequency obtain higher scores, and further, the equipment information which is more frequently retrieved obtains higher scores, and the equipment information is ranked according to the scores from high to low, so that the equipment information which is more frequently retrieved is retrieved more quickly.
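The frequency-based ordering can be sketched as follows. The search history, the `base` and `step` parameters, and the function names are assumptions made for illustration; the text only requires that keywords searched more often receive higher scores.

```python
from collections import Counter

def order_by_history(category_keywords, history):
    """Sort a category's keywords from least to most frequently searched."""
    freq = Counter(w for query in history for w in query.split())
    # sorted() is stable, so keywords with equal frequency keep their input order.
    return sorted(category_keywords, key=lambda kw: freq[kw])

def score_in_order(ordered_keywords, base, step):
    """Assign base, base + step, ... so more frequent keywords score higher."""
    return {kw: base + j * step for j, kw in enumerate(ordered_keywords)}

history = [
    "Guangdong Telecom",
    "Guangdong Mobile",
    "Zhejiang Mobile",
    "Guangdong Mobile",
]
ordered = order_by_history(["Telecom", "Unicom", "Mobile"], history)
scores = score_in_order(ordered, base=0, step=100)
```

In this history "Mobile" appears three times, "Telecom" once, and "Unicom" never, so "Mobile" ends up with the highest score in its category.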
As shown in FIG. 5, another embodiment of the present invention provides a device information retrieval method based on optimized retrieval conditions which, compared with the previous embodiment, further includes before step S270:
Step S510, looking up the m keywords among the M keywords and, if p of the m keywords are found among the M keywords, setting the scores of the p keywords to the scores already set for the matching keywords among the M keywords.
In this embodiment, if the p keywords can be directly queried in the M keywords, the scores of the p keywords may be directly set according to the scores of the M keywords.
Step S520, if q of the m keywords are not among the M keywords, calculating the similarity between the glyph features of the q keywords and the glyph features of each of the M keywords.
Specifically, for the two words whose similarity is to be calculated, a glyph feature set is extracted for each word, and the similarity between the two glyph feature sets is calculated:
[Formula not reproduced in the source text.]
The quantities defined for the formula are, in order: the similarity between the two glyph feature sets; the i-th feature of the first glyph feature set; and the i-th feature of the second glyph feature set.
In this embodiment, the similarity calculation between keywords is converted into a similarity calculation between glyph feature sets. Experiments show that when the keywords are, for example, the letter and number sequences of product models, the most similar sequence can be found accurately with the above formula.
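A feature-set comparison of this kind can be sketched as below. Both the feature definition and the similarity measure are assumptions, since the patent's formula and its glyph features are not legible in this text: the sketch uses per-character counts as a stand-in feature set and cosine similarity as the measure. The model names are made up.

```python
import math
from collections import Counter

def glyph_features(word, alphabet):
    """Hypothetical feature set: how often each alphabet character occurs in the word."""
    counts = Counter(word)
    return [counts[ch] for ch in alphabet]

def similarity(a, b):
    """Assumed measure: cosine similarity between the two words' feature vectors."""
    alphabet = sorted(set(a) | set(b))
    fa, fb = glyph_features(a, alphabet), glyph_features(b, alphabet)
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    return dot / norm if norm else 0.0

def most_similar(word, vocabulary):
    """Return the vocabulary entry with the highest similarity to the word."""
    return max(vocabulary, key=lambda kw: similarity(word, kw))

best = most_similar("RT-5000X", ["RT-5000", "SW-2400", "AP-310"])
```

Character counts ignore character order, so this stand-in cannot tell anagrams apart; a real glyph feature set would need to capture more than that.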
Step S530, finding, according to the calculated similarity, the group of keywords among the M keywords that is most similar to the q keywords.
Step S540, setting the scores of the q keywords according to the scores of that group of keywords.
In this embodiment, if the q keywords cannot be found directly among the M keywords, the keywords most similar to them are looked up among the M keywords, and the scores of the q keywords are set according to the scores of the group of keywords so found.
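Steps S510 to S540 together amount to a two-branch lookup, sketched below. The sketch substitutes the standard library's `difflib.SequenceMatcher` ratio for the patent's glyph-feature similarity, which is an assumption made only to keep the example short; the misspelled query keyword and the function name are also hypothetical.

```python
from difflib import SequenceMatcher

def resolve_scores(query_keywords, keyword_scores):
    """Map each query keyword to a score: exact match first, nearest match otherwise."""
    resolved = {}
    for kw in query_keywords:
        if kw in keyword_scores:
            # One of the p keywords found directly among the M keywords.
            resolved[kw] = keyword_scores[kw]
        else:
            # One of the q keywords not found: borrow the most similar keyword's score.
            nearest = max(
                keyword_scores,
                key=lambda known: SequenceMatcher(None, kw, known).ratio(),
            )
            resolved[kw] = keyword_scores[nearest]
    return resolved

keyword_scores = {
    "Telecom": 0, "Unicom": 100, "Mobile": 200,
    "Guangdong": 10, "Guangxi": 11,
}
resolved = resolve_scores(["Guangdong", "Telecomm"], keyword_scores)
condition_score = sum(resolved.values())
```

The unknown keyword "Telecomm" borrows the score of "Telecom", so the search condition still scores 10 and matches the record for "Guangdong Telecom".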
As shown in FIG. 6, one embodiment of the present invention provides a device information retrieval system based on optimized retrieval conditions, including:
The first word segmentation module 610 performs word segmentation on the plurality of pieces of device information and then removes duplicates to obtain M keywords.
In this embodiment, the device type is not limited; it may be, for example, a network device provided by an ISP, or a mobile terminal such as a mobile phone.
The classification module 620 classifies the M keywords into N categories.
In this embodiment, the classification method is not limited; for example, keywords such as "Mobile", "Unicom", and "Telecom" may be classified into the "operator" category, and "Guangdong", "Shandong", and "Zhejiang" into the "home" category.
The first scoring module 630 sets a corresponding score for each keyword in the N categories of keywords, the scores of different keywords being different:
[Formula not reproduced in the source text; it consists of two alternative expressions joined by "or".]
The quantities defined for the formula are, in order: the fixed difference between the scores of two adjacent keywords of the i-th category after the scores of that category are ranked from low to high; the score of the j-th keyword of the i-th category; the score of the (j+1)-th keyword of the i-th category; the maximum value in the i-th category; the minimum value in the i-th category; and the number of keywords in the i-th category.
In this embodiment, keyword scores are set according to the above formula so that any two keywords with adjacent scores in the same category differ by the same amount and, for any two categories, the maximum score of one category does not exceed the score difference of the other. As a result, not only do different keywords have different scores, but the scores of different pieces of device information composed of various keywords also differ from one another.
The second scoring module 640 sets a score for each piece of device information according to the sum of the scores of all keywords contained in that piece of device information.
In this embodiment, the foregoing keyword score setting makes the scores of different pieces of device information differ from one another.
The search condition receiving module 650 receives a search condition for querying a plurality of device information.
The second word segmentation module 660 performs word segmentation on the search condition to obtain m keywords.
The third scoring module 670 calculates the sum of the scores of the m keywords as the score of the search condition.
In this embodiment, the scores of the m keywords may be set with reference to the scores of the M keywords.
The query module 680 queries, from the plurality of pieces of device information, the device information whose score is the same as the score of the search condition.
The result generation module 690 generates a search result corresponding to the search condition based on the device information whose score is the same as the score of the search condition.
Further, when a plurality of pieces of device information have the same score as the search condition, for example because some device information is in an irregular format, the device information containing the m keywords is extracted from those pieces of device information as the search result.
In this embodiment, when the device information matched by score is not unique, it can still be filtered according to keyword content; because the matched set contains few pieces of device information, this filtering is efficient.
According to the technical solution of this embodiment, the scores of the plurality of pieces of device information are calculated in advance; when a search condition is received, only the score of the search condition needs to be calculated, and the search result can be obtained by querying according to that score. Specifically, as shown in Fig. 3, for keywords of the "operator" category, the score of "Telecom" is set to 0, the score of "Unicom" is set to 100, and the score of "Mobile" is set to 200; for keywords of the "home" category, the score of "Guangdong" is set to 10 and the score of "Guangxi" is set to 11. The score of the device information "Guangdong Telecom" is then 0+10=10, and the score of the device information "Guangdong Unicom" is 100+10=110.
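The retrieval flow of this embodiment can be sketched as follows, using the keyword scores of the Fig. 3 example (operator names abbreviated). The whitespace-based `segment` function is a hypothetical stand-in for the word segmentation modules, and the names `build_index` and `search` are illustrative assumptions rather than part of the disclosed system.

```python
from collections import defaultdict

# Keyword scores taken from the Fig. 3 example.
KEYWORD_SCORES = {
    "Telecom": 0, "Unicom": 100, "Mobile": 200,  # "operator" category
    "Guangdong": 10, "Guangxi": 11,              # "home" category
}


def segment(text):
    # Hypothetical stand-in for word segmentation: split on whitespace.
    return text.split()


def score(text):
    # Score of device information or of a search condition: sum of keyword scores.
    return sum(KEYWORD_SCORES[keyword] for keyword in segment(text))


def build_index(devices):
    # Pre-compute the score of every piece of device information once.
    index = defaultdict(list)
    for device in devices:
        index[score(device)].append(device)
    return index


def search(index, condition):
    # Look up by score, then keep only entries containing all query keywords
    # in case several pieces of device information share the same score.
    wanted = set(segment(condition))
    candidates = index.get(score(condition), [])
    return [d for d in candidates if wanted <= set(segment(d))]


index = build_index(["Guangdong Telecom", "Guangdong Unicom", "Guangxi Mobile"])
result = search(index, "Unicom Guangdong")
```

Because the lookup is keyed on a single number, the keyword comparison only runs over the few entries that share the score of the search condition.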
In another embodiment of the present invention, there is provided a device information retrieval system based on an optimized retrieval condition, further including, compared to the previous embodiment:
The word bonding strength module trains a word bonding strength recognition model based on a preset device file:
[formula not recoverable from the source text]
where the formula gives the word bonding strength between any two adjacent words x and y in the device file in terms of the frequency with which word x occurs adjacent to word y in the device file, the frequency with which word x occurs in the device file, and the frequency with which word y occurs in the device file.
In this embodiment, device information often includes a large number of technical terms, and such terms occur less frequently in natural language, so prior-art word segmentation methods designed for natural language are not suitable for device information. In this embodiment, the device file is taken as a sample, and the probability of co-occurrence and the total probability of occurrence of any two adjacent words in the file are analyzed. A high word bonding strength calculated from the formula indicates that the two adjacent words mostly occur together and seldom occur independently, in which case the two words can be selected together as a keyword.
The word bonding strength of any two adjacent words in each piece of device information in the plurality of pieces of device information is then calculated according to the word bonding strength recognition model.
When the calculated word bonding strength exceeds the preset threshold, the first word segmentation module 610 extracts the corresponding two adjacent words as a keyword.
According to the technical scheme of the embodiment, keywords with the length of two words can be accurately and effectively identified.
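A minimal sketch of two-word keyword extraction follows. The exact formula of this embodiment is not available in the text, so the sketch assumes a Jaccard-style ratio: the count of x immediately followed by y, divided by the count of occurrences of x or y that are not already part of that pair. The function names, the sample token lists, and the threshold of 0.5 are all hypothetical.

```python
from collections import Counter


def train_bond_model(token_lists):
    # Count single words and adjacent word pairs in the device file sample.
    unigram, bigram = Counter(), Counter()
    for tokens in token_lists:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))
    return unigram, bigram


def bond_strength(model, x, y):
    # Assumed measure: pair count over the count of x or y occurring at all.
    unigram, bigram = model
    together = bigram[(x, y)]
    total = unigram[x] + unigram[y] - together
    return together / total if total else 0.0


def extract_pairs(model, tokens, threshold):
    # Keep adjacent pairs whose bonding strength exceeds the threshold.
    return [(x, y) for x, y in zip(tokens, tokens[1:])
            if bond_strength(model, x, y) > threshold]


model = train_bond_model([
    ["optical", "modem", "router"],
    ["optical", "modem", "switch"],
    ["router", "switch"],
])
pairs = extract_pairs(model, ["optical", "modem", "router"], threshold=0.5)
```

In this sample "optical" and "modem" always occur together, so their strength is 1.0, while "modem" and "router" co-occur once out of three relevant occurrences and fall below the threshold.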
In another embodiment of the present invention, there is provided a device information retrieval system based on an optimized retrieval condition, further including, compared to the previous embodiment:
A frequency calculation module calculates the frequency with which each keyword in each of the N categories of keywords occurs in the historical retrieval conditions, the frequency being calculated for the j-th keyword among the keywords of the i-th category.
The ordering module sets the high-to-low order of the keyword scores within each category according to the occurrence frequency of the keywords in that category.
The first scoring module 630 sets the scores of the different keywords in each category according to the score order thus set.
In the technical solution of this embodiment, the keywords of each category are scored according to their occurrence frequency in the historical retrieval records, so that keywords with a higher occurrence frequency obtain higher scores and, in turn, device information that is retrieved more frequently obtains a higher score. The device information is then sorted by score from high to low, so that device information that is retrieved more frequently is retrieved more quickly.
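The frequency-based ordering can be sketched as below. The sketch assumes that the historical retrieval conditions are already segmented into keyword lists, and that scores within a category start at a chosen lowest value and increase by a fixed step; the function names, the history data, and the values 0 and 100 are illustrative assumptions.

```python
from collections import Counter


def order_by_history(category_keywords, history):
    # Sort the keywords of one category from least to most frequently searched.
    counts = Counter(
        keyword
        for condition in history
        for keyword in condition
        if keyword in category_keywords
    )
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(category_keywords, key=lambda kw: (counts[kw], kw))


def scores_with_fixed_step(ordered_keywords, lowest, step):
    # Adjacent keywords in the order differ by the same fixed step.
    return {kw: lowest + j * step for j, kw in enumerate(ordered_keywords)}


history = [
    ["Mobile", "Guangdong"],
    ["Mobile"],
    ["Unicom"],
    ["Mobile", "Guangxi"],
    ["Unicom"],
]
ordered = order_by_history(["Telecom", "Unicom", "Mobile"], history)
operator_scores = scores_with_fixed_step(ordered, lowest=0, step=100)
```

Here "Mobile" is searched most often and therefore receives the highest score in its category, while a keyword that never appears in the history receives the lowest.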
In another embodiment of the present invention, there is provided a device information retrieval method based on an optimized retrieval condition in which, compared to the foregoing embodiment:
The third scoring module 670 looks up the m keywords among the M keywords. If p keywords of the m keywords are among the M keywords, the scores of the p keywords are set according to the scores already set for the M keywords. If q keywords of the m keywords are not among the M keywords, the similarity between the glyph features of the q keywords and the glyph features of each of the M keywords is calculated, a group of keywords with the highest similarity to the q keywords is found among the M keywords according to the calculated similarity, and the scores of the q keywords are set according to the scores of that group of keywords.
Specifically, for the two words whose similarity is to be calculated, a glyph feature set is extracted from each word, and the similarity between the two glyph feature sets is calculated:
[formula not recoverable from the source text]
where the formula gives the similarity between the two glyph feature sets in terms of the i-th feature of the first glyph feature set and the i-th feature of the second glyph feature set.
In this embodiment, the similarity calculation between keywords is converted into a calculation of the similarity between glyph feature sets. Experiments show that when the keywords are, for example, letter and number sequences of a product model, the most similar sequence can be accurately found based on the above formula.
In this embodiment, if the p keywords can be found directly among the M keywords, their scores may be set directly according to the scores of the M keywords. If the q keywords cannot be found directly among the M keywords, the keywords most similar to the q keywords may be searched for among the M keywords, and the scores of the q keywords may be set according to the scores of the group of keywords found.
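The fallback for keywords that are not among the M keywords can be sketched as follows. Both the feature extraction and the similarity measure are assumptions: `glyph_features` simply counts characters as a stand-in for real visual glyph features, and cosine similarity is used in place of the formula of this embodiment, which is not available in the text. The model names and scores are made up for illustration.

```python
import math


def glyph_features(word):
    # Hypothetical stand-in: a real system would extract visual glyph features.
    # Here each character contributes its count to a sparse feature vector.
    features = {}
    for ch in word:
        features[ch] = features.get(ch, 0) + 1
    return features


def cosine(a, b):
    # Assumed similarity measure between two sparse feature vectors.
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_for(keyword, known_scores):
    # Known keyword: reuse the score already set for it.
    if keyword in known_scores:
        return known_scores[keyword]
    # Unknown keyword: borrow the score of the most similar known keyword.
    features = glyph_features(keyword)
    best = max(known_scores, key=lambda k: cosine(features, glyph_features(k)))
    return known_scores[best]


known_scores = {"RT-AC68U": 300, "HG8245H": 400}
exact = score_for("RT-AC68U", known_scores)
fallback = score_for("RT-AC68", known_scores)
```

A truncated model string such as "RT-AC68" shares almost all of its characters with "RT-AC68U", so it inherits that keyword's score rather than failing the lookup.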
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not intended to be limited to the details disclosed herein as such.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, the devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent solutions of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A device information retrieval method based on optimized retrieval conditions, comprising:
performing word segmentation on the plurality of pieces of equipment information, and then removing duplication to obtain M keywords;
dividing the M keywords into N categories;
setting a corresponding score for each keyword in the N categories of keywords, wherein the scores of different keywords are different:
[two alternative formulas joined by "or"; not recoverable from the source text]
wherein, after the scores of the keywords in the i-th category of the N categories are sorted from low to high, the scores of any two adjacent keywords in the i-th category differ by a fixed value, the formulas being expressed in terms of the score of the j-th keyword in the i-th category, the score of the (j+1)-th keyword in the i-th category, the maximum value in the i-th category, the minimum value in the i-th category, and the number of keywords in the i-th category;
setting a score for each piece of equipment information according to the sum of scores of all keywords contained in each piece of equipment information;
receiving search conditions for inquiring the plurality of device information;
word segmentation is carried out on the search conditions to obtain m keywords;
calculating the sum of the scores of the m keywords as the score of the search condition;
querying the device information which is the same as the retrieval condition score from the plurality of device information;
and generating a search result corresponding to the search condition according to the equipment information which is the same as the score of the search condition.
2. The apparatus information retrieval method based on the optimized retrieval condition according to claim 1, wherein the step of performing word segmentation on the plurality of apparatus information and then performing duplication elimination to obtain M keywords includes:
training a word bonding strength recognition model based on a preset device file:
[formula not recoverable from the source text]
wherein the formula gives the word bonding strength between any two adjacent words x and y in the device file in terms of the frequency with which word x occurs adjacent to word y in the device file, the frequency with which word x occurs in the device file, and the frequency with which word y occurs in the device file;
calculating, according to the word bonding strength recognition model, the word bonding strength of any two adjacent words in each piece of device information in the plurality of pieces of device information;
and when the calculated word bonding strength exceeds a preset threshold, extracting the corresponding two adjacent words as a keyword.
3. The apparatus information retrieval method based on the optimized retrieval condition according to claim 1, further comprising, before the step of setting a corresponding score for each of the N category keywords:
calculating the frequency with which each keyword in each of the N categories of keywords occurs in the historical retrieval conditions, the frequency being calculated for the j-th keyword among the keywords of the i-th category;
setting the score order of the keywords in each category of keywords according to the occurrence frequency of the keywords in each category of keywords;
the step of setting a corresponding score for each of the N category keywords includes:
setting the scores of different keywords in the keywords of each category according to the calculated order of the scores of the keywords.
4. The apparatus information retrieval method based on the optimized retrieval condition according to claim 1, further comprising, before the step of "calculating the sum of the scores of the m keywords":
looking up the m keywords among the M keywords, and if p keywords of the m keywords are among the M keywords, setting the scores of the p keywords according to the scores set for the M keywords.
5. The apparatus information retrieval method based on the optimized retrieval condition according to claim 4, further comprising:
if q keywords of the m keywords are not among the M keywords, calculating the similarity between the glyph features of the q keywords and the glyph features of each keyword in the M keywords;
according to the calculated similarity, a group of keywords with highest similarity with the q keywords is found out from the M keywords;
and setting the scores of the q keywords according to the scores of the group of keywords.
6. The apparatus information retrieval method based on the optimized retrieval condition according to claim 5, wherein the step of calculating the similarity between the glyph feature of the q keywords and the glyph feature of each of the M keywords includes:
for the two words whose similarity is to be calculated, extracting a glyph feature set from each word, and calculating the similarity between the two glyph feature sets:
[formula not recoverable from the source text]
wherein the formula gives the similarity between the two glyph feature sets in terms of the i-th feature of the first glyph feature set and the i-th feature of the second glyph feature set.
7. The apparatus information search method based on the optimized search condition according to claim 1, wherein the step of generating a search result corresponding to the search condition based on the same apparatus information as the search condition score includes:
when a plurality of pieces of device information have the same score as the search condition, extracting, from those pieces of device information, the device information containing the m keywords as the search result.
8. A device information retrieval system based on optimized retrieval conditions, comprising:
the first word segmentation module is used for carrying out word segmentation on the plurality of equipment information and then removing duplication to obtain M keywords;
the classification module is used for classifying the M keywords into N categories;
the first scoring module is used for setting corresponding scores for each keyword in the N categories of keywords, and the scores of different keywords are different:
[two alternative formulas joined by "or"; not recoverable from the source text]
wherein, after the scores of the keywords in the i-th category of the N categories are sorted from low to high, the scores of any two adjacent keywords in the i-th category differ by a fixed value, the formulas being expressed in terms of the score of the j-th keyword in the i-th category, the score of the (j+1)-th keyword in the i-th category, the maximum value in the i-th category, the minimum value in the i-th category, and the number of keywords in the i-th category;
a second scoring module, configured to set a score for each piece of equipment information according to a sum of scores of all keywords included in each piece of equipment information;
a search condition receiving module for receiving a search condition for inquiring the plurality of device information;
the second word segmentation module is used for segmenting the search condition to obtain m keywords;
a third scoring module for calculating the sum of the scores of the m keywords as the score of the search condition;
a query module configured to query, from the plurality of device information, device information that is the same as the search condition score;
and a result generation module for generating a search result corresponding to the search condition according to the equipment information with the same score as the search condition.
9. The device information retrieval system based on the optimized retrieval condition as recited in claim 8, further comprising:
a word bonding strength module for training a word bonding strength recognition model based on a preset device file:
[formula not recoverable from the source text]
wherein the formula gives the word bonding strength between any two adjacent words x and y in the device file in terms of the frequency with which word x occurs adjacent to word y in the device file, the frequency with which word x occurs in the device file, and the frequency with which word y occurs in the device file;
and for calculating, according to the word bonding strength recognition model, the word bonding strength of any two adjacent words in each piece of device information in the plurality of pieces of device information;
wherein, when the calculated word bonding strength exceeds a preset threshold, the first word segmentation module extracts the corresponding two adjacent words as a keyword.
10. The device information retrieval system based on the optimized retrieval condition as recited in claim 8, further comprising:
a frequency calculation module for calculating the frequency with which each keyword in each of the N categories of keywords occurs in the historical retrieval conditions, the frequency being calculated for the j-th keyword among the keywords of the i-th category;
an ordering module for setting the score order of the keywords in each category of keywords according to the occurrence frequency of the keywords in that category;
wherein the first scoring module sets the scores of the different keywords in each category of keywords according to the score order thus set.
CN202310611896.6A 2023-05-29 2023-05-29 Equipment information retrieval method and system based on optimized retrieval conditions Active CN116340395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310611896.6A CN116340395B (en) 2023-05-29 2023-05-29 Equipment information retrieval method and system based on optimized retrieval conditions

Publications (2)

Publication Number Publication Date
CN116340395A CN116340395A (en) 2023-06-27
CN116340395B true CN116340395B (en) 2023-07-28

Family

ID=86876214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310611896.6A Active CN116340395B (en) 2023-05-29 2023-05-29 Equipment information retrieval method and system based on optimized retrieval conditions

Country Status (1)

Country Link
CN (1) CN116340395B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002269146A (en) * 2001-03-08 2002-09-20 Fujitsu Ltd Word spotting information retrieving device, and method and program for realizing word spotting information retrieving device
CN102486781A (en) * 2010-12-03 2012-06-06 阿里巴巴集团控股有限公司 Method and device for sorting searches
CN110597957A (en) * 2019-09-11 2019-12-20 腾讯科技(深圳)有限公司 Text information retrieval method and related device
CN111553762A (en) * 2020-04-24 2020-08-18 广州探途网络技术有限公司 Method, system and terminal equipment for improving search quality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant