CN105528411A - Full-text retrieval device and method for interactive electronic technical manual of shipping equipment - Google Patents


Info

Publication number
CN105528411A
Authority
CN
China
Prior art keywords
module
abbreviation
database
character string
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510884252.XA
Other languages
Chinese (zh)
Other versions
CN105528411B (en)
Inventor
马良荔
覃基伟
苏凯
许国鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA
Priority to CN201510884252.XA
Publication of CN105528411A
Application granted
Publication of CN105528411B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3325 Reformulation based on results of preceding query
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The present invention discloses a full-text retrieval device for an interactive electronic technical manual of shipping equipment. The device comprises a common source database, a specialized vocabulary extraction module, an abbreviation extraction module, a first segmentation module, a technical information term database, an equipment part name database, an abbreviation database, a general vocabulary database, a retrieval record database, a user retrieval command communication module, a retrieval module, a second segmentation module, an index database and an index module. The device combines the element tag features of data module documents with their content, queries with specialized vocabulary, and increases the weight of specialized vocabulary in both documents and retrieval keywords. The system can therefore query at a certain semantic level, the returned results are closer to the user's retrieval intention, and a high recall rate and accuracy of the retrieval system are ensured.

Description

Full-text retrieval device and method for interactive electronic technical manual of shipping equipment
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a full-text retrieval device and method for an interactive electronic technical manual of shipping equipment.
Technical background
At present, most technical information for shipping equipment exists on paper. As a result, managing it grows ever more burdensome, data duplication and redundancy increase, the information is hard to update, and data interoperability, real-time transmission and sharing are difficult. To solve these problems, an interactive electronic technical manual (IETM, Interactive Electronic Technical Manual) is usually created to manage the technical information. An IETM is a technical publication that follows a standard digital format and authoring specification, uses text, graphics, tables, audio and video, and presents content such as the equipment's basic principles, operation and use, and technical support through human-computer interaction. Because an IETM system covers a large amount of varied information, users usually rely on the information retrieval function to locate the required content quickly, and full-text retrieval is one of the most commonly used methods. Existing IETM text retrieval methods mostly adopt general-domain retrieval schemes that do not take full account of the characteristics of technical information in a specialized domain, so the retrieval results are unsatisfactory.
Full-text retrieval is a retrieval method that matches the retrieval keywords against the entire text of a document. Chinese text has no spaces or other explicit separators between words, so a Chinese character string must be cut into individual words according to some specification before a computer can recognize the meaning of a sentence and match document text against retrieval keywords. Chinese word segmentation is therefore a core technology of Chinese full-text retrieval. Among the segmentation methods in common use, string-matching segmentation is the most widely applied: the string to be segmented is matched against a dictionary according to some strategy to obtain the segmentation result. In a specialized domain, however, string-matching segmentation cannot achieve a satisfactory result if the dictionary lacks specialized vocabulary, and the amount of specialized vocabulary in the dictionary directly affects segmentation accuracy.
In the field of shipping equipment IETMs there are two main classes of specialized vocabulary. One class is shipping equipment part names, such as "SMR-7200 marine radar" and "05106 current mode screw propeller anemoscope". The other class is technical information terms, such as "tactical and technical norms", "amplitude-comprised direction-finding principle" and "maintenance envelope diagram". Acquiring these two classes of specialized vocabulary is therefore the first problem that IETM full-text retrieval must solve. Only when data module (DM, Data Module) documents are segmented and matched with both specialized and general vocabulary can users quickly find the equipment technical information they need.
The full names of shipping equipment have complex structures and often mix several character types, such as digits, symbols and letters, so users usually substitute abbreviations for full names. For example, for the equipment name "H1604A 'Ilyushin Coase dignity' number bulk goods wheel", users usually write "H1604A bulk goods wheel" or "Ilyushin Coase dignity". A dictionary that contains only the full names of equipment is therefore insufficient, and abbreviation handling is a problem that segmentation and matching in the shipping equipment IETM field cannot avoid. For equipment names, there are two main ways of forming an abbreviation from the full name: contraction and truncation. Contraction cuts the full name into several parts and combines the characters or words from each part that can represent the original meaning, as in "H1604A bulk goods wheel" above. Truncation takes one continuous substring of the full name as the abbreviation, as in "Ilyushin Coase dignity" above.
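The distinction between the two abbreviation forms can be illustrated with a minimal Python sketch. It is not part of the patent: the function names are hypothetical, and the contraction test is a loose character-level approximation (the abbreviation's characters must appear in the full name in the same order), whereas the patent describes contraction in terms of parts of the full name.

```python
def is_subsequence(short: str, full: str) -> bool:
    """True if the characters of `short` occur in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters can only match later positions.
    return all(ch in remaining for ch in short)


def classify_abbreviation(full_name: str, abbreviation: str) -> str:
    """Return 'truncation', 'contraction' or 'unrelated' (hypothetical labels)."""
    if not abbreviation or abbreviation == full_name:
        return "unrelated"
    if abbreviation in full_name:
        # One continuous substring of the full name.
        return "truncation"
    if is_subsequence(abbreviation, full_name):
        # Pieces of the full name kept in their original order.
        return "contraction"
    return "unrelated"


# Example strings taken from the paragraph above.
full = "H1604A 'Ilyushin Coase dignity' number bulk goods wheel"
print(classify_abbreviation(full, "Ilyushin Coase dignity"))   # truncation
print(classify_abbreviation(full, "H1604A bulk goods wheel"))  # contraction
```

A production system would more likely compare words or name components rather than single characters, since a character-level subsequence test accepts many strings that no user would write.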
Once specialized vocabulary has been acquired, existing segmentation methods still do not match on the features of that vocabulary, so segmentation quality suffers. A segmentation method specific to this field, designed around the features of the extracted vocabulary, is therefore needed to obtain the best matching result.
After the required information has been retrieved, how to rank multiple retrieval results is another key problem for a full-text retrieval device and method. A data module document contains many kinds of elements of differing importance, documents themselves differ in importance, and retrieval keywords differ in importance as well. A well-designed ranking method must take these three factors into account to produce results that satisfy users.
In summary, specialized vocabulary acquisition, abbreviation acquisition, word segmentation and result ranking are the four major problems that a full-text retrieval device and method for shipping equipment IETMs currently needs to solve.
Summary of the invention
The object of the present invention is to provide a full-text retrieval device and method for an interactive electronic technical manual of shipping equipment that lets users find the required shipping equipment technical information quickly and accurately.
To achieve this object, the full-text retrieval device for an interactive electronic technical manual of shipping equipment designed by the present invention comprises databases and functional modules. The databases comprise a common source database, a technical information term database, an equipment part name database, an abbreviation database, a general vocabulary database, a retrieval record database and an index database. The functional modules comprise a specialized vocabulary extraction module, an abbreviation extraction module, a first segmentation module, a user retrieval command communication module, a retrieval module, a second segmentation module and an index module. The common source database provides the vocabulary retrieval source for the specialized vocabulary extraction module and the abbreviation extraction module, and provides the content to be segmented for the first segmentation module. The specialized vocabulary extraction module extracts vocabulary and stores it in the technical information term database and the equipment part name database. The abbreviation extraction module extracts vocabulary and stores it in the abbreviation database. The first segmentation module passes the segmented content to the index module for processing.
The index module builds the index and stores it in the index database. The index database receives the query content segmented by the second segmentation module, performs matching, and returns the matched result set to the retrieval module for ranking. The retrieval module sends the user's query content to the second segmentation module for segmentation; it also receives the retrieval command from the user retrieval command communication module and returns the ranked result set to that module. The user retrieval command communication module sends the user's retrieval command to the retrieval record database, which provides a vocabulary retrieval source for the abbreviation extraction module.
The technical information term database, the equipment part name database, the abbreviation database and the general vocabulary database each provide the matching word set used by the first segmentation module and the second segmentation module during segmentation.
A retrieval method using the above full-text retrieval device for an interactive electronic technical manual of shipping equipment comprises the following steps:
Step 1: Data module documents authored according to the selected interactive electronic technical manual authoring standard (namely the S1000D standard) are imported into the common source database. Following the requirements of that standard, the specialized vocabulary extraction module extracts two classes of specialized vocabulary, technical information terms and equipment part names, from the data module documents in the common source database, establishes mapping relations between this vocabulary and the data module code information of the corresponding data module documents, and stores the two classes of vocabulary together with the mapping relations in the technical information term database and the equipment part name database respectively.
Step 2: The abbreviation extraction module extracts the characteristic quantity of the corresponding abbreviation from each equipment part name in the common source database. The characteristic quantity is the numeric designation or the common-name part of the equipment part name.
Step 3: The abbreviation extraction module matches the characteristic quantity against the data module documents in the common source database and the user retrieval records in the retrieval record database, and determines the exact position of the characteristic quantity in each element of the data module documents and in the user retrieval records.
Step 4: The abbreviation extraction module determines the leading and trailing strings of the abbreviation that contains the characteristic quantity and identifies the boundary fragments of that abbreviation, so that the identified abbreviation is complete. This complete abbreviation is taken as a candidate abbreviation.
Step 5: The abbreviation extraction module calculates the weight of each candidate abbreviation by formula (1):
$$W_a = \frac{n_{mic}}{n_{all}} \times \lg \frac{D_{all}}{D_{mic}} \qquad (1)$$
where $n_{mic}$ is the number of times the candidate abbreviation occurs in the specific content, the specific content being the data module document content whose model identification code is identical to that of the equipment part name, together with the retrieval keywords in the retrieval records for that data module document content; $n_{all}$ is the total number of times the candidate abbreviation occurs in all data module documents and in all retrieval records in the retrieval record database; $D_{all}$ is the sum of the total number of data module documents and the total number of retrieval records; $D_{mic}$ is the sum of the number of data module documents containing the candidate abbreviation and the number of retrieval records containing the candidate abbreviation; and $W_a$ is the weight of the candidate abbreviation, which measures its ability to represent the subject. The threshold of $W_a$ is a given value. When the weight of a candidate abbreviation is greater than or equal to the threshold, the candidate is regarded as a formal abbreviation and is stored in the abbreviation database. When the weight is less than the threshold, the candidate is not processed.
Step 6: The first segmentation module and the second segmentation module segment the data module documents and the user retrieval keywords supplied by the retrieval module, respectively. The segmentation procedure is as follows.
Let the string to be segmented be $S_1 = w_1 w_2 w_3 \ldots w_i \ldots w_n$, where $S_1$ is the user retrieval keyword string or a piece of content in a data module document, $w_i$ is a single character of $S_1$, $n \ge 1$ is the length of the string, and $i$ is the character index between 1 and $n$.
The abbreviation database is used to scan $S_1$. Whenever an abbreviation is hit, the matched substring of $S_1$ is restored to the corresponding full name, until all of $S_1$ has been scanned. The result is the string $S_2 = u_1 u_2 \ldots u_i \ldots u_m$, where $u_i$ is a single character of $S_2$ and $m$ is the length of the string.
Using $S_2$, the first segmentation module and the second segmentation module build a directed acyclic graph $G$ with $m+1$ nodes, numbered $v_0, v_1, v_2, \ldots, v_m$. A directed edge $\langle v_k, v_{k+1} \rangle$ is created between each pair of adjacent nodes $v_k$ and $v_{k+1}$ ($k = 0, 1, 2, \ldots, m-1$), and the word corresponding to this edge is $u_{k+1}$. If a directed edge directly connects two nodes of $G$, the distance between those two nodes is taken to be 1. If a substring $h_1 = u_p u_{p+1} \ldots u_q$ ($1 \le p < q$) of $S_2$ is a full name restored from an abbreviation, a directed edge $\langle v_{p-1}, v_q \rangle$ is created with $v_{p-1}$ as start node and $v_q$ as end node, and the word corresponding to this edge is $h_1$.
The technical information term database and the equipment part name database are then each used to match $S_2$. If there is a matched substring of maximum word length $h_2 = u_a u_{a+1} \ldots u_b$ ($1 \le a < b$), no directed edge $\langle v_{a-1}, v_b \rangle$ exists yet between $v_{a-1}$ and $v_b$, and $a \ge p+1$ or $b \le q-1$ holds, a directed edge $\langle v_{a-1}, v_b \rangle$ is created with $v_{a-1}$ as start node and $v_b$ as end node, and the word corresponding to this edge is $h_2$.
The general vocabulary database is then used to match $S_2$. If there is a matched string $h_3 = u_c u_{c+1} \ldots u_d$ ($1 \le c < d$) and no directed edge $\langle v_{c-1}, v_d \rangle$ exists yet between $v_{c-1}$ and $v_d$, a directed edge $\langle v_{c-1}, v_d \rangle$ is created with $v_{c-1}$ as start node and $v_d$ as end node, and the word corresponding to this edge is $h_3$. If a directed edge $\langle v_{c-1}, v_d \rangle$ already exists and its string type is $h_2$, the substring $h_2$ also exists in the general vocabulary database, so the type of the edge is changed from $h_2$ to $h_4$.
After the directed edges have been generated, the first $N$ paths from $v_0$ to $v_m$ in $G$ are computed in order from shortest to longest, with $N = 3$. The shortest path considers directed edges of every type. The second-shortest and third-shortest paths ignore all directed edges whose string type is $h_1$ or $h_2$ and consider only directed edges whose words are of type $h_3$ or $h_4$; that is, the non-optimal paths consider only general-dictionary matches. Duplicate directed edges among the three paths are removed, the words corresponding to the remaining directed edges of each path are output, and the resulting set is the final segmentation result.
Step 7: The first segmentation module stores the final segmentation results in the respective fields of the index file in the index database and sets a weight for each field. The fields of the index file comprise a title field, a path field, a link text field, a subtitle field and a text field.
Step 8: The weight of each index file in the index database is set, and multiple index files are combined into segments that finally form the index. The index file weight consists of a standard coding system code weight and an information code weight, which are set for the different standard coding system codes and information codes according to the coding features of the data module document. The standard coding system code weight follows the rule that the lower the equipment hierarchy level of the code, the higher the corresponding weight factor. The information code weight follows the rule that a subcategory information code receives a higher weight than its main category. The standard coding system code weight and the information code weight are then multiplied to obtain the weight of the index file.
Step 9: The retrieval module provides full-text retrieval to the user. It receives the user's retrieval request and calls the query procedure: the user's retrieval keywords are segmented as in Step 6, the result is matched against the segmented content of each field of the documents in the index database formed in Step 7, and all matching documents are collected as the result set.
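The word-segmentation procedure of Step 6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: all function names are hypothetical; the domain-dictionary pass keeps only the longest match at each start position and omits the position condition relating $h_2$ to $h_1$; single-character edges are assumed to remain available to every path; and the second and third paths are taken to be the shortest paths of the graph restricted to general-dictionary edges. Because Chinese text has no spaces, the example uses a space-free Latin string as a stand-in.

```python
def restore_abbreviations(text, abbreviations):
    """Replace abbreviation hits by full names; return (S2, restored spans)."""
    out, spans, i = [], [], 0
    keys = sorted(abbreviations, key=len, reverse=True)  # prefer longest hit
    while i < len(text):
        hit = next((k for k in keys if text.startswith(k, i)), None)
        if hit:
            start = sum(len(part) for part in out)
            full = abbreviations[hit]
            out.append(full)
            spans.append((start, start + len(full)))
            i += len(hit)
        else:
            out.append(text[i])
            i += 1
    return "".join(out), spans


def build_edges(s2, restored, domain_terms, general_terms):
    """Map (start, end) -> edge type: 'char', 'h1', 'h2', 'h3' or 'h4'."""
    m = len(s2)
    edges = {(k, k + 1): "char" for k in range(m)}
    for span in restored:                      # restored full names
        if span[1] - span[0] > 1:
            edges[span] = "h1"
    for a in range(m):                         # longest domain term per start
        for b in range(m, a + 1, -1):
            if s2[a:b] in domain_terms:
                edges.setdefault((a, b), "h2")
                break
    for c in range(m):                         # every general-dictionary match
        for d in range(c + 2, m + 1):
            if s2[c:d] in general_terms:
                if edges.get((c, d)) == "h2":
                    edges[(c, d)] = "h4"       # domain term also in general dictionary
                else:
                    edges.setdefault((c, d), "h3")
    return edges


def k_shortest_paths(m, edges, k, allowed):
    """k shortest v0 -> vm paths (fewest edges) using only `allowed` types."""
    best = {0: [()]}
    for j in range(1, m + 1):
        candidates = [p + ((i, end),)
                      for (i, end), kind in edges.items()
                      if end == j and kind in allowed
                      for p in best[i]]
        best[j] = sorted(candidates, key=lambda p: (len(p), p))[:k]
    return best[m]


def segment(text, abbreviations, domain_terms, general_terms, n=3):
    s2, restored = restore_abbreviations(text, abbreviations)
    edges = build_edges(s2, restored, domain_terms, general_terms)
    m = len(s2)
    paths = k_shortest_paths(m, edges, 1, {"char", "h1", "h2", "h3", "h4"})
    for p in k_shortest_paths(m, edges, n, {"char", "h3", "h4"}):
        if p not in paths and len(paths) < n:
            paths.append(p)
    seen, result = set(), []
    for p in paths:                            # drop edges already output
        words = []
        for edge in p:
            if edge not in seen:
                seen.add(edge)
                words.append(s2[edge[0]:edge[1]])
        result.append(words)
    return result


print(segment("radartestdevice", {}, {"radartestdevice"},
              {"radar", "test", "device", "testdevice"}))
# [['radartestdevice'], ['radar', 'testdevice'], ['test', 'device']]
```

Keeping the full compound as the best path while the other paths expose its general-vocabulary components is what lets a query for a component word still match a document that contains only the compound.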
Existing full-text retrieval devices and methods, when applied to the specialized domain of shipping equipment interactive electronic technical manuals, lack specialized vocabulary and its abbreviations, lack an adapted segmentation algorithm, and do not optimize the ranking of retrieval results. To address these problems, the present invention analyses the data module document structure and the specific element tag features of the selected interactive electronic technical manual authoring standard (namely the S1000D standard), combines them with the types and features of the specialized vocabulary that occurs in shipping equipment technical information, and extracts the specialized vocabulary and its abbreviations. A segmentation algorithm is designed specifically around the features of the several vocabulary classes, the segmented data module document content is stored in an index so that information can be located quickly, and weight values for the various factors are set to solve the result ranking problem, which completes the construction of the full-text retrieval device and method. The device and method combine the element tag features of data module documents with their content, query with specialized vocabulary, and increase the weight of specialized vocabulary in both documents and retrieval keywords. The system can therefore query at a certain semantic level and the returned results are closer to the user's retrieval intention, which ensures a high recall rate and accuracy for the retrieval device.
Brief description of the drawings
Fig. 1 is a structural schematic of the full-text retrieval device for an interactive electronic technical manual of shipping equipment of the present invention.
In the figure: 1 - common source database, 2 - specialized vocabulary extraction module, 3 - abbreviation extraction module, 4 - first segmentation module, 5 - technical information term database, 6 - equipment part name database, 7 - abbreviation database, 8 - general vocabulary database, 9 - retrieval record database, 10 - user retrieval command communication module, 11 - retrieval module, 12 - second segmentation module, 13 - index database, 14 - index module.
Detailed description
The present invention is described in further detail below with reference to the drawing and specific embodiments:
The full-text retrieval device for an interactive electronic technical manual of shipping equipment shown in Fig. 1 comprises databases and functional modules. The databases comprise a common source database 1, a technical information term database 5, an equipment part name database 6, an abbreviation database 7, a general vocabulary database 8, a retrieval record database 9 and an index database 13. The functional modules comprise a specialized vocabulary extraction module 2, an abbreviation extraction module 3, a first segmentation module 4, a user retrieval command communication module 10, a retrieval module 11, a second segmentation module 12 and an index module 14. The common source database 1 provides the vocabulary retrieval source for the specialized vocabulary extraction module 2 and the abbreviation extraction module 3, and provides the content to be segmented for the first segmentation module 4. The specialized vocabulary extraction module 2 extracts vocabulary and stores it in the technical information term database 5 and the equipment part name database 6. The abbreviation extraction module 3 extracts vocabulary and stores it in the abbreviation database 7. The first segmentation module 4 passes the segmented content to the index module 14 for processing.
The index module 14 builds the index and stores it in the index database 13. The index database 13 receives the query content segmented by the second segmentation module 12, performs matching, and returns the matched result set to the retrieval module 11 for ranking. The retrieval module 11 sends the user's query content to the second segmentation module 12 for segmentation; it also receives the retrieval command from the user retrieval command communication module 10 and returns the ranked result set to the user retrieval command communication module 10 for viewing. The user retrieval command communication module 10 sends the user's retrieval command to the retrieval record database 9, which provides a vocabulary retrieval source for the abbreviation extraction module 3.
The technical information term database 5, the equipment part name database 6, the abbreviation database 7 and the general vocabulary database 8 each provide the matching word set used by the first segmentation module 4 and the second segmentation module 12 during segmentation.
A retrieval method using the above full-text retrieval device for an interactive electronic technical manual of shipping equipment comprises the following steps:
Step 1: Data module documents authored according to the selected interactive electronic technical manual authoring standard (S1000D in this embodiment) are imported into the common source database 1. Following the requirements of that standard, the specialized vocabulary extraction module 2 extracts two classes of specialized vocabulary, technical information terms and equipment part names, from the data module (DM, Data Module) documents in the common source database 1, establishes mapping relations between this vocabulary and the data module code information of the corresponding data module documents, and stores the two classes of vocabulary together with the mapping relations in the technical information term database 5 and the equipment part name database 6 respectively.
Step 2: The abbreviation extraction module 3 extracts the characteristic quantity of the corresponding abbreviation from each equipment part name (full name) in the common source database 1. The characteristic quantity is the numeric designation or the common-name part of the equipment part name. For example, for the full equipment name "H1604A 'Ilyushin Coase dignity' number bulk goods wheel", any abbreviation must contain the numeric designation "1604", the common name "Ilyushin Coase dignity", or both. Such characteristic quantities can therefore be used to locate the positions where an abbreviation may occur. The other strings of the full equipment name are then matched against the strings before and after the characteristic quantity to identify the boundary fragments of the abbreviation, so that the identified abbreviation covers the longest word. The weight of the abbreviation is calculated and compared with the threshold, and the mapping between the full equipment name and the abbreviation is built and stored in the abbreviation dictionary, which completes abbreviation extraction.
The abbreviation extraction module 3 extracts the characteristic quantity from the equipment part name (full name) in the common source database 1 as follows. Because each class of shipping equipment has a fixed naming rule, the rule can be used to determine the name type and to cut the name into its constituent parts, which yields the characteristic quantity. Let the full name of the shipping equipment be $W_0 = w_1 w_2 \ldots w_n$, where $w_i$ is the $i$-th character of the full name. A grammar tool such as JAPE (a Java Annotation Patterns Engine) is first used to write regular expressions for the naming rules of each class of equipment. These regular expressions determine the name type of each $W_0$ in the equipment part name dictionary formed in Step 1, and $W_0$ is cut according to the rule that is hit, giving the abbreviation characteristic quantity $W_1 = w_p \ldots w_q$, $1 \le p < q \le n$.
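As an illustration of rule-based feature extraction, the following minimal sketch uses Python's standard `re` module in place of JAPE. The two naming rules, their names ("ship", "device") and the function name are hypothetical and were written only to fit the example names quoted in this document; real naming rules would come from the equipment naming conventions.

```python
import re

# Hypothetical naming rules: (rule name, pattern, groups used as features).
NAMING_RULES = [
    # e.g. "H1604A 'Ilyushin Coase dignity' number bulk goods wheel":
    # a letter, a numeric designation, an optional letter, a quoted common name.
    ("ship",
     re.compile(r"^[A-Z](?P<number>\d+)[A-Z]?\s+'(?P<common>[^']+)'\s+.+$"),
     ("number", "common")),
    # e.g. "SMR-7200 marine radar": series letters, hyphen, numeric designation.
    ("device",
     re.compile(r"^[A-Z]+-(?P<number>\d+)\s+.+$"),
     ("number",)),
]


def extract_features(full_name):
    """Return (rule name, characteristic quantities) for the first rule hit."""
    for rule_name, pattern, groups in NAMING_RULES:
        match = pattern.match(full_name)
        if match:
            return rule_name, [match.group(g) for g in groups]
    return None, []  # no naming rule matched


print(extract_features("H1604A 'Ilyushin Coase dignity' number bulk goods wheel"))
# ('ship', ['1604', 'Ilyushin Coase dignity'])
print(extract_features("SMR-7200 marine radar"))
# ('device', ['7200'])
```

Ordering the rules from most to least specific keeps a general pattern from claiming a name that a stricter rule would cut more precisely.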
Step 3: The abbreviation extraction module 3 matches the characteristic quantity against the data module documents in the common source database 1 and the user retrieval records in the retrieval record database 9, and determines the exact position of the characteristic quantity in each element of the data module documents and in the user retrieval records. Specifically, let the hit string be $W_2$, so that $W_2 = W_1$. To prevent strings unrelated to the IETM system from becoming abbreviation candidates, the model identification code (MIC) of the data module (DM) document that contains $W_2$, or of the access link corresponding to the retrieval record, must be identical to the MIC mapped to the full name $W_0$ that corresponds to $W_1$.
Step 4: The abbreviation extraction module 3 determines the leading and trailing strings of the abbreviation that contains the characteristic quantity and identifies the boundary fragments of that abbreviation, so that the identified abbreviation is complete. This complete abbreviation is taken as a candidate abbreviation. For example, in the phrase "HMZ-360 radar identification target", "360" is the characteristic quantity and "HMZ-360 radar" is the longest word of the abbreviation; recognizing only "HMZ-360" or "360 radar" would be incomplete.
Step 5: abbreviation extraction module 3 calculates the weights of above-mentioned candidate's abbreviation by following formula 1:
$$W_a = \frac{n_{mic}}{n_{all}} \cdot \lg\frac{D_{all}}{D_{mic}} \qquad (1)$$
In the formula, n_mic is the number of times the candidate abbreviation occurs in the specific content, which comprises the data module documents whose model identification code (MIC) is the same as that of the equipment part name, together with the search keywords in the search records associated with those data module documents; n_all is the number of occurrences of the candidate abbreviation in all data module documents plus its occurrences in all search records in the search record database 9 (the quotient of the two measures the term frequency of the candidate abbreviation; the higher the value, the more often the candidate occurs within the specific IETM system); D_all is the total number of data module documents plus the total number of search records; D_mic is the number of data module documents containing the candidate abbreviation plus the number of search records containing it (the logarithm measures how widespread the candidate abbreviation is; the higher the value, the more the candidate is concentrated in a small number of data module documents); W_a is the weight of the candidate abbreviation and measures its ability to represent the topic. The threshold of W_a is a preset value, set here to 2. When the weight of a candidate abbreviation is greater than or equal to the threshold (indicating a high topical relevance to the IETM system of the specific equipment), the candidate is treated as a formal abbreviation and stored in the abbreviation database 7; when the weight is below the threshold, the candidate is not processed further;
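The weighting in formula (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, "lg" is assumed to denote the base-10 logarithm, and the counts are assumed to have been gathered beforehand from the data module documents and search records.

```python
import math

W_A_THRESHOLD = 2.0  # threshold value stated in the description


def abbreviation_weight(n_mic, n_all, d_all, d_mic):
    """Formula (1): W_a = (n_mic / n_all) * lg(D_all / D_mic).

    Assumes lg is the base-10 logarithm and that all counts are positive.
    """
    if n_all <= 0 or d_mic <= 0 or d_all <= 0:
        raise ValueError("counts must be positive")
    return (n_mic / n_all) * math.log10(d_all / d_mic)


def is_formal_abbreviation(n_mic, n_all, d_all, d_mic):
    """A candidate is kept when its weight reaches the threshold."""
    return abbreviation_weight(n_mic, n_all, d_all, d_mic) >= W_A_THRESHOLD
```

With these definitions, a candidate that occurs 90 times out of 100 inside the matching MIC and appears in 10 of 10000 documents and records has a weight of 0.9 × 3 = 2.7 and would be stored, whereas a candidate with a ratio of 0.5 that appears in 100 of 1000 has a weight of 0.5 and would be discarded.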
Step 6: The first word segmentation module 4 and the second word segmentation module 12 perform word segmentation on the data module documents and on the user search keywords supplied by the retrieval module 11, respectively. The vocabularies produced by the specialized vocabulary extraction module 2 and the abbreviation extraction module 3 contain compound words made up of several simple words, and such words admit several correct cuts against a dictionary; for example, the equipment name "radar testing device" can be cut further into "radar / testing / device". If only a single cut were kept for such compound words, many correct matches would be rejected and the segmentation result would not meet users' retrieval needs. The present invention therefore builds on the original N-shortest-path segmentation method and combines the characteristics of the generated specialized vocabulary dictionaries with an existing general vocabulary dictionary, performing three dictionary matching passes during segmentation. First, the abbreviation dictionary obtained in step 2 is used to scan the technical information for abbreviations and restore them to the corresponding full equipment part names. Second, the technical information term dictionary and the equipment part name dictionary obtained in step 1 are matched against the text content that has not yet been hit. Third, the general dictionary is matched against all of the text content after restoration. After matching, the N qualifying paths are output, and the result set formed by these paths is the final segmentation result. The detailed segmentation procedure is as follows:
Let the character string to be segmented be S_1 = w_1 w_2 w_3 ... w_i ... w_n, where S_1 is the character string of a user search keyword or of a content item in a data module document, w_i is a single character of S_1, n ≥ 1 is the length of the string, and i is the character index from 1 to n;
The abbreviation database 7 is used to scan S_1. Whenever an abbreviation is hit, the matching substring of S_1 is restored to its corresponding full name, until the whole of S_1 has been scanned. The result is the character string S_2 = u_1 u_2 ... u_i ... u_m, where u_i is a single character of S_2 and m is the length of S_2;
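The scan that turns S_1 into S_2 might look like the following sketch. The function name and the dictionary contents are hypothetical, and preferring the longest abbreviation at each position is an assumption made here, since the text does not say how overlapping hits are resolved.

```python
def restore_abbreviations(s1, abbreviations):
    """Scan s1 left to right and replace each hit abbreviation by its full name.

    `abbreviations` maps abbreviation -> full name (hypothetical contents).
    Assumption: when several abbreviations match at one position, the longest wins.
    """
    keys = sorted((k for k in abbreviations if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(s1):
        for abbr in keys:
            if s1.startswith(abbr, i):
                out.append(abbreviations[abbr])  # restore to the full name
                i += len(abbr)
                break
        else:
            out.append(s1[i])  # no abbreviation starts here; copy the character
            i += 1
    return "".join(out)
```

The returned string plays the role of S_2 and is what the later graph-building steps operate on.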
In the first word segmentation module 4 and the second word segmentation module 12, the character string S_2 is used to build a directed acyclic graph G with m+1 nodes, numbered v_0, v_1, v_2, ..., v_m, where m is the length of the string. A directed edge <v_k, v_{k+1}> is created between each pair of adjacent nodes v_k and v_{k+1} (k = 0, 1, 2, ..., m-1); the word corresponding to this edge is u_{k+1}. Any two nodes of G that are joined directly by a directed edge are taken to be at distance 1. If a substring h_1 = u_p u_{p+1} ... u_q (1 ≤ p < q) of S_2 is a full name restored from an abbreviation, a directed edge <v_{p-1}, v_q> is created with v_{p-1} as its start node and v_q as its end node; the word corresponding to this edge is the substring h_1;
The technical information term database 5 and the equipment part name database 6 are each matched against S_2. If there is a matching maximum-length substring h_2 = u_a u_{a+1} ... u_b (1 ≤ a < b), no directed edge <v_{a-1}, v_b> yet exists between nodes v_{a-1} and v_b, and a ≥ p+1 or b ≤ q-1 holds, then a directed edge <v_{a-1}, v_b> is created with v_{a-1} as its start node and v_b as its end node; the word corresponding to this edge is the maximum-length substring h_2;
The general vocabulary database 8 is matched against S_2. If there is a matching string h_3 = u_c u_{c+1} ... u_d (1 ≤ c < d) and no directed edge <v_{c-1}, v_d> exists between nodes v_{c-1} and v_d, a directed edge <v_{c-1}, v_d> is created with v_{c-1} as its start node and v_d as its end node; the word corresponding to this edge is h_3. If a directed edge <v_{c-1}, v_d> already exists and its string type is h_2, then the h_2 substring also exists in the general vocabulary database 8, so the type of that edge is changed from h_2 to h_4 to simplify later output processing;
After the directed edges have been generated, the first N paths from v_0 to v_m in G are enumerated from shortest to longest, with N = 3. The shortest path considers directed edges of all types, whereas the second and third shortest paths ignore edges of types h_1 and h_2 and consider only edges whose word type is h_3 or h_4; that is, the non-optimal paths use only the matching results of the general dictionary. (This covers the case in which three cuts by the plain N-shortest-path method would still not meet retrieval needs, and avoids requiring a large N to reach a good segmentation granularity.) Duplicate directed edges across the three paths are removed, the words corresponding to the remaining edges on each path are output, and the resulting set is the final segmentation result;
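The path selection described above can be sketched as follows. This is one possible reading, not the patented implementation: the span table (which substrings matched which dictionary) is assumed to be supplied by the earlier matching passes, single-character edges are assumed to remain available to the second and third paths, and ties between equally short paths are broken lexicographically. The function names are hypothetical.

```python
def k_shortest_paths(m, edges, k):
    """Return up to k paths from v_0 to v_m with the fewest edges.

    `edges` is any collection of (start, end) pairs with start < end.
    Nodes are already in topological order, so a forward pass suffices.
    """
    best = {0: [[]]}  # best[v]: up to k shortest partial paths ending at v
    for v in range(1, m + 1):
        cands = [p + [(a, b)] for (a, b) in edges if b == v and a in best
                 for p in best[a]]
        cands.sort(key=lambda p: (len(p), p))
        if cands:
            best[v] = cands[:k]
    return best.get(m, [])


def segment(s2, spans, n=3):
    """`spans` maps (start, end) -> "h1" | "h2" | "h3" | "h4" (dictionary hit type)."""
    m = len(s2)
    edges = {(i, i + 1): "char" for i in range(m)}  # edges between adjacent nodes
    edges.update(spans)
    shortest = k_shortest_paths(m, edges, 1)        # uses every edge type
    general = {e for e, t in edges.items() if t in ("char", "h3", "h4")}
    others = [p for p in k_shortest_paths(m, general, n) if p not in shortest]
    words, seen = [], set()
    for path in shortest + others[: n - 1]:
        for a, b in path:
            if (a, b) not in seen:                  # drop edges repeated across paths
                seen.add((a, b))
                words.append(s2[a:b])
    return words


# "雷达测试装置" is "radar testing device"; the whole string is an equipment
# part name (h2) and each two-character word is in the general dictionary (h3).
s2 = "雷达测试装置"
spans = {(0, 6): "h2", (0, 2): "h3", (2, 4): "h3", (4, 6): "h3"}
print(segment(s2, spans))
```

Keeping only the best few partial paths per node is enough here because path length is simply the edge count, so any of the N shortest complete paths extends one of the N shortest paths to an earlier node.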
Step 7: The first word segmentation module 4 stores the final segmentation results obtained above in the corresponding fields of an index document in the index database 13 and sets a weight for each field, which provides parameters for ranking the final retrieval results. Multiple documents form a segment, and segments finally form the index, which is stored on disk or in memory. The fields of an index document are the title field, path field, link text field, subtitle field and body text field;
Step 8: The weight of each index document in the index database 13 is set, multiple index documents are grouped into segments that finally form the index, and the index is stored on disk or in memory. The index document weight consists of a Standard Numbering System (SNS) code weight and an information code weight, both set according to the coding characteristics of data module documents. The SNS code weight follows the rule that the lower the equipment hierarchy level encoded by the SNS code, the higher the weight factor; the information code weight follows the rule that a subcategory information code receives a higher weight than a main category code. The SNS code weight and the information code weight are then multiplied to obtain the weight of the index document;
Step 9: The retrieval module 11 provides full-text retrieval to the user. It receives the user's retrieval request and invokes the query procedure, which is as follows: the user's search keywords are segmented as in step 6, the result is matched against the segmented content of each field of the documents in the index built in step 7, and all matching documents are collected as the result set.
In step 7 of the above technical solution, the fields of an index document and the basis for their weights are as follows:
The title field stores the segmentation result of the data module title <dmtitle>. Terms appearing in the title field reflect the topic of the whole data module document, so the weight of the title field is set to 10;
The path field identifies the document access path; it stores the data module code information to implement path identification. The path field takes no part in segmentation or retrieval and needs no weight;
The link text field stores the segmentation result of the text restored from data module code links. (As in a web page, data module content contains links, which appear in the form of data module codes, and the user can reach other data modules by clicking them. Step 1 builds a mapping between data module codes and vocabulary; here that mapping is used to restore each code to vocabulary, which is then segmented.) The field also supports retrieval over link anchor text: when a search keyword hits the link text field, the data module document that the link points to may contain the content the user is looking for. The weight of the link text field is set to 3;
The subtitle field stores the segmentation result of the local topic information <title> (the tag that holds local topic content); the weight of the subtitle field is set to 5;
The body text field stores the segmentation result of the remaining technical information in the data module document (the body content other than subtitles and link information); the weight of the body text field is set to 1.
Step 1 of the above technical solution specifically comprises the following steps:
Step 101: From the text content of specific elements, extract two classes of specialized vocabulary: equipment part names and technical information terms. The specific elements are the technical name <techname> and the information name <infoname>. In a data module title, <techname> describes the equipment part name and <infoname> describes the technical information term, so extracting the text of these two elements completes the extraction of the specialized vocabulary;
Step 102: Establish the mapping between the specialized vocabulary and the corresponding data module code (Data Module Code, DMC) information, namely the mapping between the Standard Numbering System (SNS) code and the equipment part name, and between the information code <incode> and the technical information term. Link access information is an important resource in retrieval, but links in data module documents do not supply anchor text; they are realized by referencing data module codes, so the code information has to be restored to text before it can be searched. The SNS sub-element of the data module code describes the hierarchical position, within the whole equipment, of the component described by the current data module document, so it can be mapped to the equipment part name given by <techname>, which allows the SNS code to be retrieved through the equipment part name. Likewise, a mapping is established between the information code sub-element <incode> of the DMC and the information name <infoname>, which allows the information code to be retrieved through the technical information term. Because the same technical information or equipment part name may correspond to different codes in the interactive electronic technical manual (IETM) systems of different ship equipment, the corresponding model identification code (MIC) is attached to each information code and SNS code to prevent inconsistent mappings. The MIC defines the equipment name and model and is a code, assigned by an authoritative body, that uniquely identifies the equipment;
Step 103: Store the extracted vocabulary together with its code information in the equipment part name dictionary and the technical information term dictionary, respectively. The equipment part name dictionary holds equipment or part names and their corresponding SNS codes; the technical information term dictionary holds technical information terms and their corresponding information codes.
In step 4 of the above technical solution, ship equipment abbreviations take two forms, contraction and truncation, so every character of an abbreviation must be a character of the full name (the full name to which the abbreviation corresponds), and the characters of the abbreviation must keep the same relative order as in the full name. Read the character immediately to the left or right of W_2 and call this candidate character w_c. Check whether w_c exists in W_0 and whether the order of W_2 extended by w_c is unchanged relative to W_0. If both conditions hold, w_c is a boundary character of the candidate abbreviation, and W_2 becomes w_c W_2 or W_2 w_c. If not, w_c is not a character of the abbreviation, extension in that direction stops, and that boundary is fixed. The process is repeated until extension has stopped in both directions; W_2 is then the final candidate abbreviation.
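The boundary extension can be illustrated with a short sketch. It assumes that "order unchanged relative to W_0" means the extended string is still a subsequence of the full name, and it extends fully to the left before extending to the right, which is only one of the orderings the text allows. The full name used in the example is hypothetical.

```python
def is_subsequence(short, full):
    """True if the characters of `short` appear in `full` in the same order."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def extend_abbreviation(text, start, end, full_name):
    """Grow the hit text[start:end] (W_2) while it stays a subsequence of W_0."""
    while start > 0 and is_subsequence(text[start - 1:end], full_name):
        start -= 1   # the left neighbour is a boundary character
    while end < len(text) and is_subsequence(text[start:end + 1], full_name):
        end += 1     # the right neighbour is a boundary character
    return text[start:end]


text = "the HMZ-360 radar identifies the target"
hit = text.index("360")
# "HMZ-360 navigation radar" is a made-up full name used only for illustration.
print(extend_abbreviation(text, hit, hit + 3, "HMZ-360 navigation radar"))
```

Starting from the feature quantity "360", the sketch grows the hit to "HMZ-360 radar", which matches the complete abbreviation given as the example in step 4.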
In step 7 of the above technical solution, the index is used to locate the required text quickly and so avoids a large number of read and write operations during retrieval; it relies on specific data structures to locate terms quickly. The present invention builds on the general-purpose full-text search toolkit Lucene and designs an index structure suited to the IETM full-text retrieval device and method. The Lucene index structure has five levels, from high to low: index, segment, document, field and term. A term is the basic unit of the index and holds one character string produced by segmentation. A field holds one separately indexed kind of information within a single document, such as the title, body text or links; fields can be designed by the user to support retrieval over different types of documents. A document is the basic unit for building the index; in the present invention, one index document holds the processed information of one data module document. A segment consists of multiple documents and can be regarded as a small index; multiple segments finally form the index.
In step 8 of the above technical solution, the Standard Numbering System (SNS) code weight is determined by the equipment hierarchy level that the SNS code represents. The digits of the SNS code describe the hierarchy level of the equipment part covered by the current data module. The SNS codes 00-00-00, 0a-00-00, 0a-b0-00, 0a-bd-00 and 0a-bd-fg (a ≠ 0, b ≠ 0, d ≠ 0, f ≠ 0 or g ≠ 0) describe, respectively, equipment parts at the equipment level, system level, subsystem level, sub-subsystem level and the lower equipment-unit level of the hierarchy. When a search keyword hits a document, a data module document with a higher-level SNS code may have only part of its content related to what the user needs, whereas in a data module document with a lower-level SNS code the needed information makes up a larger share of the content. Therefore, the lower the hierarchy level of the SNS code, the higher the weight factor of the document: the SNS code weights for the equipment level, system level, subsystem level, sub-subsystem level and lower equipment-unit level are set to 1, 2, 3, 4 and 5, respectively;
The information code weight is determined by the breadth of the information category that the information code describes. Information codes a00 and abc (b ≠ 0, c ≠ 0) describe a main category and a subcategory of technical information, respectively. When a search keyword hits a document, a finer-grained information code is more likely to be relevant to what the user needs, so a subcategory information code receives a higher weight than a main category code. The present invention sets the main category weight to 1 and the subcategory weight to 2.
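The two weight rules and their product (step 8) can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the leading digit of the SNS code is ignored because every pattern in the text starts with 0, and any information code whose last two characters are not "00" is treated as a subcategory.

```python
def sns_weight(sns):
    """Map an SNS code such as "0a-bd-fg" to a weight from 1 to 5."""
    digits = sns.replace("-", "")
    if len(digits) != 6:
        raise ValueError("expected an SNS code of the form XX-XX-XX")
    a, b, d, fg = digits[1], digits[2], digits[3], digits[4:]
    if fg != "00":
        return 5  # lower equipment-unit level
    if d != "0":
        return 4  # sub-subsystem level
    if b != "0":
        return 3  # subsystem level
    if a != "0":
        return 2  # system level
    return 1      # equipment level


def info_code_weight(info_code):
    """Main category (a00) -> 1, subcategory -> 2 (assumption: anything not ending in 00)."""
    return 1 if info_code[1:] == "00" else 2


def document_weight(sns, info_code):
    """Index document weight is the product of the two weights."""
    return sns_weight(sns) * info_code_weight(info_code)
```

Under these rules a data module coded 03-21-05 with information code 521 receives the maximum weight of 10, while one coded 00-00-00 with information code 500 receives the minimum weight of 1.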
In step 9 of the above technical solution, the result set is ranked using the vector space model (VSM), with the following formulas:
$$
\begin{aligned}
\mathrm{coord}(q,d) &= Num_{dt} / Num_{qt} \\
S_{qd} &= \mathrm{coord}(q,d) \cdot \mathrm{querytnorm}(q) \cdot S_{dt} \\
S_{dt} &= \sum_{i=1}^{n} \left( \mathrm{tf}(t_i,d) \cdot \mathrm{idf}(t_i)^2 \cdot Boost_{t_i} \cdot \mathrm{norm}(t_i,d) \right) \\
\mathrm{norm}(t,d) &= Boost_d \cdot \prod Boost_f \,/\, Num_{term}
\end{aligned}
\qquad (2)
$$
Let d be a document in the index and q the user's search keywords, and let the segmentation of q be t_1 / t_2 / ... / t_n (the sum in S_dt runs over i from 1 to n and therefore includes t_n), where n ≥ 1 is the total number of terms after segmentation, t_i is a single keyword term, and i is the term index from 1 to n. S_qd is the score of index document d for the search keywords q and is the ranking factor: the higher its value, the earlier the document appears in the result set. coord(q, d) measures the number of distinct terms in index document d; it is the quotient of Num_dt, the number of distinct terms occurring in index document d, and Num_qt, the number of distinct terms in the search keywords q. querytnorm(q) is a normalizing factor that does not affect the ranking; it can be set to scale the scores as a whole. S_dt is the sum of the scores of all single keyword terms t_i hit in index document d. tf(t_i, d) is the term frequency score of t_i in index document d. idf(t_i) reflects how many documents t_i occurs in: the higher the value, the fewer documents contain t_i and the more strongly t_i correlates with a particular topic. Boost_ti is the weight of the single keyword term t_i, determined by the dictionary that t_i matched during segmentation. norm(t, d) combines the weights of index document d with a length factor: Boost_d is the weight of index document d, and Boost_f is the weight of the field of index document d in which t_i is hit; both are determined by the weight settings of the index module described in step 7. Num_term is the total number of segmented terms in index document d; the larger it is, the lower the value of norm(t, d);
The weight of each search keyword term is determined by the type of dictionary it matched during segmentation, on the following basis:
(1) Terms hit in the abbreviation dictionary, the technical information term dictionary or the equipment part name dictionary strongly reflect the user's search intent; their weight is set to 5.
(2) Terms matched in the general dictionary reflect the user's search intent only partially; their weight is set to 2.
(3) Single characters produced during segmentation are too fine-grained and introduce too much noise during retrieval; their weight is set to 1.
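Formula (2) and the term weights listed above can be combined into a small scoring sketch. It is illustrative only: the class and function names are hypothetical, tf and idf are taken as precomputed inputs because the text does not define their exact form, and Num_dt is read as the number of distinct query terms found in the document.

```python
from dataclasses import dataclass
from math import prod

# Term weights by matched dictionary type, as listed in items (1) to (3).
TERM_BOOST = {"specialized": 5, "general": 2, "single_char": 1}


@dataclass
class Hit:
    tf: float            # tf(t_i, d)
    idf: float           # idf(t_i)
    term_boost: float    # Boost_ti
    field_boosts: tuple  # Boost_f of each field in which the term is hit


def norm(doc_boost, field_boosts, num_doc_terms):
    """norm(t, d) = Boost_d * prod(Boost_f) / Num_term."""
    return doc_boost * prod(field_boosts) / num_doc_terms


def score(hits, num_query_terms, doc_boost, num_doc_terms, query_norm=1.0):
    """S_qd = coord(q, d) * querytnorm(q) * S_dt.

    Assumption: `hits` holds one entry per distinct query term found in d.
    """
    coord = len(hits) / num_query_terms
    s_dt = sum(
        h.tf * h.idf ** 2 * h.term_boost
        * norm(doc_boost, h.field_boosts, num_doc_terms)
        for h in hits
    )
    return coord * query_norm * s_dt
```

For example, a document of weight 10 with 100 terms that matches a specialized term in its title field (tf 2.0, idf 1.5, field weight 10) and a general term in its body text (tf 1.0, idf 1.0, field weight 1) scores 22.5 + 0.2 = 22.7 for a two-term query.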
After ranking, the retrieval module outputs the ranked result set in a fixed format. Each results page returns ten results. Each result shows the text snippet containing the hit terms, with the hit terms highlighted in red, together with the title and the data module code (Data Module Code, DMC) of the hit document; the user opens the original data module document by clicking the hyperlinked title.
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (7)

1. A full-text retrieval device for an interactive electronic technical manual of ship equipment, characterized in that it comprises databases and functional modules, wherein the databases comprise a common source database (1), a technical information term database (5), an equipment part name database (6), an abbreviation database (7), a general vocabulary database (8), a search record database (9) and an index database (13); the functional modules comprise a specialized vocabulary extraction module (2), an abbreviation extraction module (3), a first word segmentation module (4), a user search command communication module (10), a retrieval module (11), a second word segmentation module (12) and an index module (14); the common source database (1) provides the vocabulary retrieval source for the specialized vocabulary extraction module (2) and the abbreviation extraction module (3) and provides the content to be segmented for the first word segmentation module (4); the specialized vocabulary extraction module (2) extracts vocabulary and stores it in the technical information term database (5) and the equipment part name database (6); the abbreviation extraction module (3) extracts vocabulary and stores it in the abbreviation database (7); the first word segmentation module (4) passes the segmented content to the index module (14) for processing;
the index module (14) builds the index and stores it in the index database (13); the index database (13) receives the search content segmented by the second word segmentation module (12), performs matching, and returns the matched result set to the retrieval module (11) for ranking; the retrieval module (11) sends the user's search content to the second word segmentation module (12) for segmentation, and also receives the search command from the user search command communication module (10) and returns the ranked result set to the user search command communication module (10); the user search command communication module (10) sends the user's search command to the search record database (9); the search record database (9) provides a vocabulary retrieval source for the abbreviation extraction module (3);
the technical information term database (5), the equipment part name database (6), the abbreviation database (7) and the general vocabulary database (8) each provide matching word sets for the first word segmentation module (4) and the second word segmentation module (12) during segmentation.
2. A retrieval method using the full-text retrieval device for an interactive electronic technical manual of ship equipment according to claim 1, characterized in that it comprises the following steps:
Step 1: data module documents prepared in accordance with a selected interactive electronic technical manual authoring standard are imported into the common source database (1); the specialized vocabulary extraction module (2) extracts, as required by the selected authoring standard, two classes of specialized vocabulary, technical information terms and equipment part names, from the data module documents in the common source database (1), establishes the mapping between this vocabulary and the data module code information in the corresponding data module documents, and stores the two classes of specialized vocabulary and their mappings in the corresponding technical information term database (5) and equipment part name database (6);
Step 2: the abbreviation extraction module (3) extracts the feature quantity of the corresponding abbreviation from each equipment part name in the common source database (1), the feature quantity being the numeric designation or the common-name part of the equipment part name;
Step 3: the abbreviation extraction module (3) matches the feature quantity against the data module documents in the common source database (1) and the user search records in the search record database (9), and determines the exact position of each feature-quantity element in the data module documents and user search records;
Step 4: the abbreviation extraction module (3) determines the leading and trailing characters of the abbreviation that contains the feature quantity and identifies the boundaries of that abbreviation, so that the identified abbreviation is complete; this complete abbreviation is taken as a candidate abbreviation;
Step 5: the abbreviation extraction module (3) calculates the weight of each candidate abbreviation using formula (1):
$$W_a = \frac{n_{mic}}{n_{all}} \cdot \lg\frac{D_{all}}{D_{mic}} \qquad (1)$$
In the formula, n_mic is the number of times the candidate abbreviation occurs in the specific content, which comprises the data module documents whose model identification code is the same as that of the equipment part name, together with the search keywords in the search records associated with those data module documents; n_all is the number of occurrences of the candidate abbreviation in all data module documents plus its occurrences in all search records in the search record database (9); D_all is the total number of data module documents plus the total number of search records; D_mic is the number of data module documents containing the candidate abbreviation plus the number of search records containing it; W_a is the weight of the candidate abbreviation and measures its ability to represent the topic; the threshold of W_a is a preset value; when the weight of a candidate abbreviation is greater than or equal to the threshold, the candidate is treated as a formal abbreviation and stored in the abbreviation database (7); when the weight is below the threshold, the candidate is not processed further;
Step 6: the first word segmentation module (4) and the second word segmentation module (12) perform word segmentation on the data module documents and on the user search keywords supplied by the retrieval module (11), respectively; the detailed segmentation procedure is:
let the character string to be segmented be S_1 = w_1 w_2 w_3 ... w_i ... w_n, where S_1 is the character string of a user search keyword or of a content item in a data module document, w_i is a single character of S_1, n ≥ 1 is the length of the string, and i is the character index from 1 to n;
the abbreviation database (7) is used to scan S_1; whenever an abbreviation is hit, the matching substring of S_1 is restored to its corresponding full name, until the whole of S_1 has been scanned, giving the character string S_2 = u_1 u_2 ... u_i ... u_m, where u_i is a single character of S_2 and m is the length of S_2;
in the first word segmentation module (4) and the second word segmentation module (12), S_2 is used to build a directed acyclic graph G with m+1 nodes, numbered v_0, v_1, v_2, ..., v_m, where m is the length of the string; a directed edge <v_k, v_{k+1}> is created between each pair of adjacent nodes v_k and v_{k+1} (k = 0, 1, 2, ..., m-1), the word corresponding to this edge being u_{k+1}; any two nodes of G joined directly by a directed edge are taken to be at distance 1; if a substring h_1 = u_p u_{p+1} ... u_q (1 ≤ p < q) of S_2 is a full name restored from an abbreviation, a directed edge <v_{p-1}, v_q> is created with v_{p-1} as its start node and v_q as its end node, the word corresponding to this edge being the substring h_1;
the technical information term database (5) and the equipment part name database (6) are each matched against S_2; if there is a matching maximum-length substring h_2 = u_a u_{a+1} ... u_b (1 ≤ a < b), no directed edge <v_{a-1}, v_b> exists between nodes v_{a-1} and v_b, and a ≥ p+1 or b ≤ q-1 holds, a directed edge <v_{a-1}, v_b> is created with v_{a-1} as its start node and v_b as its end node, the word corresponding to this edge being the maximum-length substring h_2;
the general vocabulary database (8) is matched against S_2; if there is a matching string h_3 = u_c u_{c+1} ... u_d (1 ≤ c < d) and no directed edge <v_{c-1}, v_d> exists between nodes v_{c-1} and v_d, a directed edge <v_{c-1}, v_d> is created with v_{c-1} as its start node and v_d as its end node, the word corresponding to this edge being h_3; if a directed edge <v_{c-1}, v_d> already exists and its string type is h_2, the h_2 substring also exists in the general vocabulary database (8), so the type of that edge is changed from h_2 to h_4;
after the directed edges have been generated, the first N paths from v_0 to v_m in G are enumerated from shortest to longest, with N = 3; the shortest path considers directed edges of all types, whereas the second and third shortest paths ignore edges of types h_1 and h_2 and consider only edges whose word type is h_3 or h_4, that is, the non-optimal paths use only the matching results of the general dictionary; duplicate directed edges across the three paths are removed, the words corresponding to the remaining edges on each path are output, and the resulting set is the final segmentation result;
Step 7: the first word segmentation module (4) stores each field of the final word segmentation result obtained above in an index file in the index database (13) and sets a weight for each field; the fields of the index file comprise a title field, a path field, a link text field, a subtitle field and a text field;
Step 8: the weights of the index files in the index database (13) are set, and multiple index files are combined into segments, which finally form the index file. The index file weight comprises a standard encoding system code weight and an information code weight, each set according to the coding characteristics of data module documents. The standard encoding system code weight follows the rule that the lower the equipment hierarchy level of the code, the higher the corresponding weight factor; the information code weight follows the rule that a subcategory information code receives a higher weight than a main category information code. The standard encoding system code weight and the information code weight are then multiplied to obtain the weight of the index file;
Step 9: the retrieval module (11) provides full-text retrieval to the user. The retrieval module (11) receives the user's retrieval request and invokes the following query procedure: the user's search keywords are segmented by invoking step 6, the resulting words are matched against the segmented content of each field of the documents in the index database formed in step 7, and all matching documents are collected as the result set.
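The multiplication rule of step 8 can be sketched as follows. The claim fixes only the direction of each rule and the final product; the function names, the linear form of the hierarchy factor and every numeric value below are hypothetical, and "lower hierarchy level" is assumed to mean a deeper position in the equipment breakdown.

```python
def sns_weight(hierarchy_depth, base=1.0, step=0.2):
    """Standard encoding system code factor (hypothetical linear rule).

    hierarchy_depth: 0 for the top equipment level, larger for lower levels.
    """
    return base + step * hierarchy_depth


def info_code_weight(is_subcategory, main=1.0, sub=1.5):
    """Information code factor: subcategory codes outweigh main categories."""
    return sub if is_subcategory else main


def index_file_weight(hierarchy_depth, is_subcategory):
    """Weight of one index file: product of the two factors, as in step 8."""
    return sns_weight(hierarchy_depth) * info_code_weight(is_subcategory)
```

Under these assumed values, a data module coded two levels down with a subcategory information code receives 1.4 × 1.5 = 2.1, compared with 1.0 for a top-level module with a main category code.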
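The query procedure of step 9 can be sketched with a small in-memory inverted index. This is an illustration under stated assumptions, not the retrieval module (11) itself: `build_index`, `retrieve`, the document identifiers and the field names are hypothetical, and whitespace splitting stands in for the word segmentation of step 6.

```python
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: {field: [tokens]}} -> {token: {(doc_id, field), ...}}"""
    inverted = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, tokens in fields.items():
            for tok in tokens:
                inverted[tok].add((doc_id, field))
    return inverted


def retrieve(query, inverted, segment=str.split):
    """Segment the query, then collect every document with a matching field."""
    hits = defaultdict(set)
    for tok in segment(query):  # stand-in for the step 6 segmenter
        for doc_id, field in inverted.get(tok, ()):
            hits[doc_id].add(field)
    return dict(hits)  # {doc_id: set of fields in which a query word matched}


docs = {
    "DM-001": {"title": ["oil", "pump"], "text": ["replace", "filter"]},
    "DM-002": {"title": ["fuel", "filter"], "text": ["oil"]},
}
inverted = build_index(docs)
pump_hits = retrieve("pump", inverted)
oil_filter_hits = retrieve("oil filter", inverted)
```

Recording the fields in which each word matched, rather than only the document identifier, keeps the information needed to apply per-field weights afterwards.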
3. The search method according to claim 2, characterized in that: in step 7, the title field stores the word segmentation result of the data module title; terms appearing in the title field reflect the subject of the entire data module document, and the weight of the title field is set to 10.
4. The search method according to claim 2, characterized in that: in step 7, the path field identifies the document access path; it stores the data module code information to implement path identification, does not participate in word segmentation or retrieval, and requires no weight.
5. The search method according to claim 2, characterized in that: in step 7, the link text field stores the word segmentation result of the text content restored from data module code links and is also used to retrieve link anchor text; when a search keyword hits in the link text field, the data module document to which the link points may be the content the user is searching for; the weight of the link text field is set to 3.
6. The search method according to claim 2, characterized in that: in step 7, the subtitle field stores the word segmentation result of labels that reflect local topic information, and the weight of the subtitle field is set to 5.
7. The search method according to claim 2, characterized in that: in step 7, the text field stores the word segmentation result of the other technical information in the data module document, and the weight of the text field is set to 1.
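The field weights stated in claims 3 to 7 can be collected in a small Python table. The weights 10, 5, 3 and 1 come from the claims; how they are combined into a ranking score is not specified there, so the sum over hit fields, the optional multiplication by an index file weight, and the names `FIELD_WEIGHTS`, `score` and `rank` are assumptions for illustration only.

```python
# Weights from claims 3, 5, 6 and 7; the path field (claim 4) has no weight
# and is not searched, so it is deliberately absent.
FIELD_WEIGHTS = {"title": 10, "subtitle": 5, "link_text": 3, "text": 1}


def score(fields_hit, doc_weight=1.0):
    """Assumed score: index file weight times the summed weights of hit fields."""
    return doc_weight * sum(FIELD_WEIGHTS.get(f, 0) for f in fields_hit)


def rank(hits, doc_weights=None):
    """hits: {doc_id: set of hit fields}; returns doc ids, best score first."""
    doc_weights = doc_weights or {}
    return sorted(
        hits,
        key=lambda d: (-score(hits[d], doc_weights.get(d, 1.0)), d),
    )


order = rank({"A": {"text"}, "B": {"title"}, "C": {"subtitle", "link_text"}})
```

In this example a title hit (10) outranks a combined subtitle and link text hit (8), which in turn outranks a hit in the text field alone (1).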
CN201510884252.XA 2015-12-03 2015-12-03 Full-text retrieval device and method for interactive electronic technical manual of shipping equipment Active CN105528411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510884252.XA CN105528411B (en) 2015-12-03 2015-12-03 Full-text retrieval device and method for interactive electronic technical manual of shipping equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510884252.XA CN105528411B (en) 2015-12-03 2015-12-03 Full-text retrieval device and method for interactive electronic technical manual of shipping equipment

Publications (2)

Publication Number Publication Date
CN105528411A true CN105528411A (en) 2016-04-27
CN105528411B CN105528411B (en) 2019-08-20

Family

ID=55770634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510884252.XA Active CN105528411B (en) 2015-12-03 2015-12-03 Full-text retrieval device and method for interactive electronic technical manual of shipping equipment

Country Status (1)

Country Link
CN (1) CN105528411B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562716A (en) * 2017-07-18 2018-01-09 阿里巴巴集团控股有限公司 Term vector processing method, device and electronic equipment
CN107844472A (en) * 2017-07-18 2018-03-27 阿里巴巴集团控股有限公司 Term vector processing method, device and electronic equipment
CN110851692A (en) * 2018-07-27 2020-02-28 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN110990663A (en) * 2019-12-04 2020-04-10 中船黄埔文冲船舶有限公司 Ship process knowledge management method, device and system
CN111339244A (en) * 2020-02-29 2020-06-26 山东浪潮通软信息科技有限公司 Tax policy and regulation inquiry method, computer equipment and storage medium
CN112084290A (en) * 2019-06-13 2020-12-15 北京沃东天骏信息技术有限公司 Data retrieval method, device, equipment and storage medium
CN115329086A (en) * 2022-08-29 2022-11-11 中铁四局集团电气化工程有限公司 Rail transit document retrieval system and method based on classified coding
CN115688690A (en) * 2022-11-16 2023-02-03 金航数码科技有限责任公司 Dynamic conversion method for converting Word document content into XML fragment conforming to S1000D standard
CN116227488A (en) * 2023-05-09 2023-06-06 北京拓普丰联信息科技股份有限公司 Text word segmentation method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102023989A (en) * 2009-09-23 2011-04-20 阿里巴巴集团控股有限公司 Information retrieval method and system thereof
CN102810096A (en) * 2011-06-02 2012-12-05 阿里巴巴集团控股有限公司 Retrieval method and device based on separate character indexing system
CN103823799A (en) * 2012-11-16 2014-05-28 镇江诺尼基智能技术有限公司 New-generation industry knowledge full-text search method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107844472B (en) * 2017-07-18 2021-08-24 创新先进技术有限公司 Word vector processing method and device and electronic equipment
CN107844472A (en) * 2017-07-18 2018-03-27 阿里巴巴集团控股有限公司 Term vector processing method, device and electronic equipment
CN107562716A (en) * 2017-07-18 2018-01-09 阿里巴巴集团控股有限公司 Term vector processing method, device and electronic equipment
CN110851692A (en) * 2018-07-27 2020-02-28 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN112084290B (en) * 2019-06-13 2024-04-05 北京沃东天骏信息技术有限公司 Data retrieval method, device, equipment and storage medium
CN112084290A (en) * 2019-06-13 2020-12-15 北京沃东天骏信息技术有限公司 Data retrieval method, device, equipment and storage medium
CN110990663A (en) * 2019-12-04 2020-04-10 中船黄埔文冲船舶有限公司 Ship process knowledge management method, device and system
CN110990663B (en) * 2019-12-04 2023-03-24 中船黄埔文冲船舶有限公司 Ship process knowledge management method, device and system
CN111339244A (en) * 2020-02-29 2020-06-26 山东浪潮通软信息科技有限公司 Tax policy and regulation inquiry method, computer equipment and storage medium
CN115329086A (en) * 2022-08-29 2022-11-11 中铁四局集团电气化工程有限公司 Rail transit document retrieval system and method based on classified coding
CN115329086B (en) * 2022-08-29 2024-04-16 中铁四局集团电气化工程有限公司 Track traffic document retrieval system and method based on classification coding
CN115688690A (en) * 2022-11-16 2023-02-03 金航数码科技有限责任公司 Dynamic conversion method for converting Word document content into XML fragment conforming to S1000D standard
CN115688690B (en) * 2022-11-16 2023-10-03 金航数码科技有限责任公司 Dynamic conversion method for converting Word document content into XML fragment conforming to S1000D standard
CN116227488A (en) * 2023-05-09 2023-06-06 北京拓普丰联信息科技股份有限公司 Text word segmentation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN105528411B (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN105528411A (en) Full-text retrieval device and method for interactive electronic technical manual of shipping equipment
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN100465954C (en) Reinforced clustering of multi-type data objects for search term suggestion
US8190556B2 (en) Intellegent data search engine
US7783629B2 (en) Training a ranking component
CN101542475B (en) System and method for searching and matching data having ideogrammatic content
CN103136352B (en) Text retrieval system based on double-deck semantic analysis
US7636714B1 (en) Determining query term synonyms within query context
CN105045875B (en) Personalized search and device
EP2045731A1 (en) Automatic generation of ontologies using word affinities
CN107122413A (en) A kind of keyword extracting method and device based on graph model
CN104915449B (en) A kind of facet searching system and method based on water conservancy object classification label
JP2005526317A (en) Method and system for automatically searching a concept hierarchy from a document corpus
CN103593410A (en) System for search recommendation by means of replacing conceptual terms
CN101566998A (en) Chinese question-answering system based on neural network
CN102023995A (en) Speech retrieval apparatus and speech retrieval method
EP2577521A2 (en) Detection of junk in search result ranking
CN109597895B (en) Knowledge graph-based official document searching method
CN103714118B (en) Book cross-reading method
CN101751439A (en) Image retrieval method based on hierarchical clustering
CN114911917B (en) Asset meta-information searching method and device, computer equipment and readable storage medium
CN105808739A (en) Search result ranking method based on Borda algorithm
CN114997288A (en) Design resource association method
CN115905489A (en) Method for providing bid and bid information search service
CN109165331A (en) A kind of index establishing method and its querying method and device of English place name

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant