Enquiring method and device based on vertical search, computer equipment and storage medium

Info

Publication number
CA3177671A1
CA3177671A1 CA3177671A CA3177671A CA3177671A1 CA 3177671 A1 CA3177671 A1 CA 3177671A1 CA 3177671 A CA3177671 A CA 3177671A CA 3177671 A CA3177671 A CA 3177671A CA 3177671 A1 CA3177671 A1 CA 3177671A1
Authority
CA
Canada
Prior art keywords
data
statement
search
search engine
fields
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3177671A
Other languages
French (fr)
Inventor
Jiaqing LI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd

Classifications

    • G06F16/319 Inverted lists
    • G06F16/24564 Applying rules; Deductive queries
    • G06F16/313 Selection or weighting of terms for indexing
    • G06F16/3331 Query processing
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/355 Class or cluster creation or modification
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The present invention discloses a system for constructing a classification model for a vertical search engine, together with a corresponding device, method, electronic equipment, and memory. The system comprises a memory, a system bus, a network, and a processor configured to: prepare and extract scenario data for constructing the classification model; determine a search dimension target, wherein search fields are determined; and extract features and train the classification model, wherein the classification model classifies a keyword to obtain an attribute category of the keyword.

Description

ENQUIRING METHOD AND DEVICE BASED ON VERTICAL SEARCH, COMPUTER EQUIPMENT AND STORAGE MEDIUM
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the field of data processing technology, and more particularly to an enquiring method based on vertical search, and corresponding device, computer equipment and storage medium.
Description of Related Art
[0002] Search technology is currently widely applied in various specialized fields. As the volume of information data keeps growing, enterprises that possess the corresponding resources and capabilities usually prefer to create vertical search engines targeted at specific application scenarios, in order to utilize internal data resources more efficiently and to provide internal and external clients with high-quality information search services.
[0003] The vertical search engine receives a keyword input by a user, looks it up in an inverted index document, calculates relevancies between the indexed contents and the input keyword, sorts by relevancy, and finally produces search results in decreasing order of relevancy. On one hand, the internal data of an enterprise often has multiple dimensions, so a query usually has to be searched across multiple dimensions, and it is not infrequent for the user's input to contain keywords of several attribute dimensions at the same time. On the other hand, a good vertical search engine is required not only to supply a data enquiring function, but also to support data query and retrieval over multiple dimensions from a single input, so as to enhance the precision of the query result. Accordingly, the vertical search engine is required to smartly recognize the query keyword input by the user and the data attribute field to which it pertains, so as to support further optimization of search query statements and thereby enhance the precision of the search result and the searching experience.
Date Reçue/Date Received 2022-09-29
SUMMARY OF THE INVENTION
[0004] In order to overcome the problems pending in the state of the art, embodiments of the present invention provide an enquiring method based on vertical search, and corresponding device, computer equipment and storage medium that support intention recognition of multi-fields search of search engines in the vertical search field, to thereby enhance precision of the search result and the searching experience.
[0005] To solve one or more of the aforementioned technical problem(s), the present invention employs the following technical solutions.
According to the first aspect, there is provided an enquiring method based on vertical search, and the method comprises the following steps:
[0006] performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0007] preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords;
[0008] generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0009] invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0010] In some embodiments, the step of preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement includes:
[0011] performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement, and obtaining a word segmentation result; and
[0012] determining the keywords of the second statement according to the word segmentation result and a preset rule.
[0013] In some embodiments, the method comprises, prior to the step of performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement:
[0014] denoising the second statement, and removing noise characters from the second statement.
[0015] In some embodiments, the step of generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category comprises:
[0016] generating data pairs on the basis of the first statement and the corresponding first attribute category, the keywords and the corresponding second attribute category;
[0017] generating the target query statement according to the data pairs and a preset search engine indexing rule.
[0018] In some embodiments, the method further comprises a process of training a classifying model, including:
[0019] obtaining training data according to a business scenario; and
[0020] employing the training data to train a preset classifier, and obtaining a trained classifying model.
[0021] In some embodiments, the preset classifier includes a logistic regression classifier or a support vector machine classifier.
[0022] According to the second aspect, there is provided an enquiring device based on vertical search, and the device comprises:
[0023] a matching module, for performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0024] an obtaining module, for preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement;
[0025] a classifying module, for performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords;
[0026] a generating module, for generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0027] an enquiring module, for invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0028] According to the third aspect, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
[0029] performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0030] preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords;
[0031] generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0032] invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0033] According to the fourth aspect, there is provided a computer-readable storage medium storing a computer program thereon, and the following steps are realized when the computer program is executed by a processor:
[0034] performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0035] preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords;
[0036] generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0037] invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0038] The technical solutions provided by the embodiments of the present invention bring about the following advantageous effects:
[0039] In the enquiring method based on vertical search, and the corresponding device, computer equipment and storage medium provided by the embodiments of the present invention, search intention recognition of query statements for a vertical search engine is realized, and enquiring efficiency and user experience are enhanced, by: performing regex matching to a received initial query statement, obtaining a first statement that satisfies a matching rule in the initial query statement, and determining a first attribute category corresponding to the first statement; preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, and obtaining a second attribute category of each of the keywords; generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and invoking a preset search engine interface, and matching out a query result according to the target query statement.
BRIEF DESCRIPTION OF THE DRAWINGS
[0040] To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required to illustrate the embodiments are briefly introduced below. The drawings introduced below are directed merely to some embodiments of the present invention, and persons ordinarily skilled in the art may acquire other drawings on the basis of these drawings without creative effort.
[0041] Fig. 1 is a view illustrating equipment composition of a search intention recognizing system according to an exemplary embodiment;
[0042] Fig. 2 is a flowchart illustrating text model training according to an exemplary embodiment;
[0043] Fig. 3 is a flowchart illustrating the recognition of an attribute category of a keyword according to an exemplary embodiment;
[0044] Fig. 4 is a flowchart illustrating an enquiring method based on vertical search according to an exemplary embodiment;
[0045] Fig. 5 is a view schematically illustrating the structure of an enquiring device based on vertical search according to an exemplary embodiment; and
[0046] Fig. 6 is a view schematically illustrating the internal structure of a computer equipment according to an exemplary embodiment.
[0047] DETAILED DESCRIPTION OF THE INVENTION
[0048] To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and comprehensively below with reference to the accompanying drawings. The embodiments described are merely some, rather than all, of the embodiments of the present invention. Any other embodiments obtainable by persons ordinarily skilled in the art on the basis of the embodiments in the present invention without creative effort shall fall within the protection scope of the present invention.
[0049] Embodiment 1
[0050] As noted in the Description of Related Art, in specific fields such as the vertical search engine field, data of the vertical field, such as structured business data containing a considerable amount of text data, usually has to be built into a vertical search engine in order to better provide query and search services for the business data. By constructing a search engine, its highly efficient indexing technique can be used to provide the business with an enquiring function over the business data. A good vertical search engine is usually required not only to supply a data enquiring function, but also to support data query and retrieval over multiple dimensions from a single input. The vertical search engine is therefore required to smartly recognize the query keyword input by the user and the data attribute field to which it pertains, so as to support further optimization of search query statements.
[0051] In order to solve the above problems, an enquiring method based on vertical search is proposed in the embodiments of the present invention. The method includes a text recognizing method for a vertical search engine, whereby cleaned structured and non-structured business scenario data of the vertical field is used to train a multi-category text classification model over the field attributes searchable by the user. The model recognizes the attributes of the short texts formed by the keywords input by the user, so that the search engine can search with respect to different fields. Taking an enterprise information search engine as an example, the enterprise information includes text information such as enterprise name and corporate name, character string information such as registration number and unified credit code, numerical information such as registered capital, and some other information. In order to find a certain enterprise, the vertical search engine supports text search over the enterprise name or corporate information, and also supports precise string matching of the registration number and the unified credit code.
[0052] Fig. 1 is a view illustrating the framework of a search intention recognizing system according to an exemplary embodiment. With reference to Fig. 1, the system at least consists of a memory, a system bus, a processor and a network, of which the memory can be formed by a plurality of storage medium RAMs, while the specific attributes of the memory are not restricted in this context.
[0053] Specifically, the above solution can be realized by the following steps:
[0054] Step A - constructing a multi-category text classification model for a vertical search engine based on business scenario data. Specifically, in an embodiment of the present invention, this step includes the following processes:
[0055] Preparation and extraction of business scenario data
[0056] Specifically, the business data from which the vertical search engine is to be constructed is first analyzed in conjunction with the business scenario requirements of the vertical search engine, and data is extracted from the relevant database to obtain structured data, both in preparation for creating the indexing data of the vertical search engine and to provide training data for model training.
[0057] Determination of search dimension target
[0058] Search fields of the vertical search engine are determined, and the search fields are fields desired to be provided to the user for search and query when the vertical search engine is being created. A labeling process is performed on the data according to the search fields, for instance, with respect to a search engine system that enquires real estate information, the enquiring function should be provided to enquire plural fields relevant to the housing community name information, housing intermediary name information, etc. All data under these fields are extracted according to the structured data obtained in the previous step, the corresponding field data are labeled with these fields, to form labeled data of multiple categories.
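The labeling process described above can be illustrated with a minimal Python sketch. The record layout and the field names `community_name` and `agency_name` are hypothetical, chosen only to mirror the real estate example; the source does not specify a schema.

```python
# Hypothetical structured records extracted from a business database.
records = [
    {"community_name": "Sunshine Garden", "agency_name": "Happy Home Realty"},
    {"community_name": "Riverside Court", "agency_name": "Golden Key Agency"},
]

# Search fields the engine should expose to users (assumed names).
SEARCH_FIELDS = ["community_name", "agency_name"]


def label_by_field(rows, fields):
    """Return (text, label) pairs, labeling each value with its field name."""
    labeled = []
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value:  # skip missing or empty values
                labeled.append((value, field))
    return labeled


labeled_data = label_by_field(records, SEARCH_FIELDS)
```

Each resulting pair carries the text and the search field it came from, which is the multi-category labeled form the later training step consumes.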
[0059] Processes of feature extraction and model training, etc.
[0060] Fig. 2 is a flowchart illustrating text model training according to an exemplary embodiment. With reference to Fig. 2, features of corresponding fields are selected according to the above data form. During specific implementation, it is possible to perform a word segmentation process by words or by phrases on the relevant text content fields, their features are extracted, and corresponding feature vectors are generated to serve as data for model training, for instance, TF-IDF feature vectors are generated with respect to TF-IDF features. The mode of training the classification model can be to select a classifier on the basis of the scikit-learn machine learning library, such as a logistic regression classifier, or a support vector machine classifier, and it is also possible to construct any other classifier.
[0061] Take as an example the construction of a classification model for enterprise names and personal names. Such a classification model is mainly used to distinguish queries for company names from queries for natural persons, legal persons, actual controllers, company high-ranking executives, etc., and to select among different query logics (such as enterprise name search, legal person search, high-ranking executive search, or combined search). The process of constructing the classification model is as follows:
[0062] Training data is firstly extracted from a business database or a search engine, the data is a list of enterprise names and a list of personal names, labels are "enterprise name", "personal name", and the form is as follows:
enterprise name: [
    'Beijing ** Furniture Company Ltd.',
    'Shenzhen ** Software Company Ltd.',
    'Hebei ** Machinery Factory'
]
personal name: [
    'Ren **', 'Zhang **', 'Chen **', 'Li **',
]
A data set is secondly created, and is exemplified below:
('Beijing ** Furniture Company Ltd.', 'enterprise name')
('Shenzhen ** Software Company Ltd.', 'enterprise name')
... ...
('Hebei ** Machinery Factory', 'enterprise name')
('Ren **', 'personal name')
('Zhang **', 'personal name')
... ...
('Li **', 'personal name')
A word segmentation process by words is thereafter performed on the data set, and the processing result is as follows:
('Beijing * * Furniture Company Ltd.', 'enterprise name')
('Shenzhen * * Software Company Ltd.', 'enterprise name')
... ...
('Zhang * *', 'personal name')
... ...
('Li * *', 'personal name')
[0063] Subsequently, the data set is shuffled and split into a training set and a testing set, in a proportion of 4:1 for example. The scikit-learn machine learning library is employed to perform TF-IDF text vector extraction and generate a TF-IDF matrix of the training set, and a classifier is selected (from naive Bayes, logistic regression and support vector machine classifiers, for example) and trained to obtain the trained classifier.
[0064] Finally, the prediction capability of the classifier is tested and appraised, the testing set generated in the previous step is used to perform model appraisal on the classifier, and the practicality of the classifier is hence appraised.
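The training and appraisal steps above could look roughly like the following scikit-learn sketch. The sample names are invented placeholders (the source masks its real examples with "**"), and the use of character n-grams in place of a Chinese word segmenter is an assumption made so the example runs on English text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented placeholder names, not data from the source.
enterprise_names = [
    "Beijing Alpha Furniture Company Ltd.",
    "Shenzhen Beta Software Company Ltd.",
    "Hebei Gamma Machinery Factory",
    "Shanghai Delta Trading Company Ltd.",
    "Nanjing Epsilon Technology Company Ltd.",
]
personal_names = ["Ren Wei", "Zhang San", "Chen Jie", "Li Si", "Wang Wu"]

texts = enterprise_names + personal_names
labels = (["enterprise name"] * len(enterprise_names)
          + ["personal name"] * len(personal_names))

# 4:1 split in random order; stratify keeps both labels in each subset.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# Character n-grams stand in for per-character segmentation (an assumption).
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Appraise the classifier on the held-out testing set.
accuracy = model.score(X_test, y_test)
predicted = model.predict(["Future Technology Company Ltd."])[0]
```

With so little data the accuracy figure is not meaningful; the sketch only shows how the split, the TF-IDF extraction, the training and the appraisal fit together. A support vector machine or naive Bayes classifier could be swapped in as the final pipeline step.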
[0065] Step B - preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords.
[0066] Specifically, Fig. 3 is a flowchart illustrating the recognition of an attribute category of a keyword according to an exemplary embodiment. With reference to Fig. 3, in the embodiments of the present invention, regex matching is first performed on the input initial query statement. The matched result directly supports keyword search, while the unmatched result, for example in the form of text, is then purified by removing noise characters from the text, for instance useless characters and punctuation marks; a Chinese word segmentation process is performed on it, and a list of the keywords contained in the initial query statement is extracted.
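The purification and segmentation of the unmatched text can be sketched as follows. The definition of "noise" used here (anything that is not a letter, digit or whitespace) is an assumption, and the whitespace tokenizer is only a stand-in for a real Chinese word segmenter, which the source does not name.

```python
import re

# Assumption: noise is any character that is not a letter, digit or whitespace.
# \w is Unicode-aware in Python 3, so CJK characters are kept.
NOISE_PATTERN = re.compile(r"[^\w\s]|_")


def denoise(text):
    """Replace noise characters with spaces and collapse repeated whitespace."""
    cleaned = NOISE_PATTERN.sub(" ", text)
    return " ".join(cleaned.split())


def segment(text):
    """Whitespace tokenizer standing in for a real Chinese word segmenter."""
    return text.split()


keywords = segment(denoise("  future, technology!! (Zhang San) "))
```

Replacing noise with a space rather than deleting it avoids gluing two words together when punctuation is the only separator between them.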
[0067] Secondly, the classification model obtained in the previous step is invoked to classify each keyword and obtain an attribute category of each keyword. The combination of (one or more) dimension(s) of the keywords serves as a basis for judging the search intention of the user; based on the search intention, further processing such as correcting and associating is performed on the search word, and a data pair (keyword, attribute) is output.
[0068] In the embodiments of the present invention, the vertical search engine accepts character inputs of any random form, so it is required to preprocess the query input character strings (namely initial query statements), to judge different inputs, to judge the attributes of the input character strings and to output the same.
[0069] An example is taken below:
[0070] Step 201 - after receiving the initial query statement, performing regex matching on it and judging whether it conforms to a code format such as a registration code or an enterprise credit code; if yes, labeling the character string with the corresponding code attribute and outputting the same; otherwise continuing to step 202.
[0071] For instance:
(1) Input "91320000608950986L", and the output will be "social unified credit code"
(2) Input "future technology", the next processing is entered.
[0072] Step 202 - inputting the preprocessed initial query statement into a text classifier, and outputting the predicted attribute category.
[0073] For instance:
Input "future technology", and the classifier outputs "enterprise name"
Input "Zhang San", and the classifier outputs "personal name".
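Steps 201 and 202 amount to a regex check followed by a classifier fallback, which might be sketched as below. The two patterns are assumed formats used only for illustration (the source gives no regular expressions), and `toy_classifier` is a hypothetical stand-in for the trained text classifier.

```python
import re

# Assumed code formats, for illustration only; the source gives no patterns.
CODE_PATTERNS = {
    "social unified credit code": re.compile(r"[0-9A-Z]{18}"),
    "registration number": re.compile(r"\d{15}"),
}


def recognize_attribute(query, classify):
    """Step 201: try the code formats; step 202: fall back to the classifier."""
    text = query.strip()
    for attribute, pattern in CODE_PATTERNS.items():
        if pattern.fullmatch(text):
            return attribute
    return classify(text)


def toy_classifier(text):
    """Hypothetical stand-in for the trained text classifier."""
    return "enterprise name" if "technology" in text.lower() else "personal name"


code_attribute = recognize_attribute("91320000608950986L", toy_classifier)
text_attribute = recognize_attribute("future technology", toy_classifier)
```

Using `fullmatch` means a string is labeled as a code only when the whole input has the code format, so free text that merely contains digits still reaches the classifier.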
[0074] Step C - constructing a target query statement, invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0075] Specifically, based on the keyword attribute pair obtained in the previous step, a query statement (namely a target query statement) adapted to the data index of the underlying search engine is constructed, and a unified interface of the search engine is invoked, to obtain the enquired data result.
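As an illustration of how (keyword, attribute) pairs could be turned into a target query, the sketch below assumes an Elasticsearch-style query body. The source does not name the underlying search engine, and the index field names in `INDEX_RULES` are hypothetical.

```python
# Hypothetical mapping from attribute category to (index field, query type).
INDEX_RULES = {
    "social unified credit code": ("credit_code", "term"),  # exact match
    "registration number": ("reg_no", "term"),              # exact match
    "enterprise name": ("company_name", "match"),           # full-text match
    "personal name": ("person_name", "match"),              # full-text match
}


def build_target_query(pairs):
    """Turn (keyword, attribute) pairs into one boolean query body."""
    clauses = []
    for keyword, attribute in pairs:
        field, query_type = INDEX_RULES[attribute]
        clauses.append({query_type: {field: keyword}})
    return {"query": {"bool": {"must": clauses}}}


target_query = build_target_query([
    ("91320000608950986L", "social unified credit code"),
    ("Zhang San", "personal name"),
])
```

Codes are routed to exact-match clauses and names to full-text clauses, which reflects the distinction the description draws between precise string matching and text information search.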
[0076] As a preferred mode of execution in the embodiments of the present invention, it is further possible to construct search intention recognizing system and device for searching enterprise information in advance based on a search intention recognizing module, so as to support query input of multiple attributes in the search of enterprise information, and to adapt to the retrieval of different attribute information according to attribute categories returned by the search intention recognizing module.
[0077] Embodiment 2
[0078] Fig. 4 is a flowchart illustrating an enquiring method based on vertical search according to an exemplary embodiment. With reference to Fig. 4, the method comprises the following steps:
[0079] S1 - performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement.
[0080] S2 - preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing to each of the keywords with a pretrained classifying model, obtaining a second attribute category of each of the keywords.
[0081] Specifically, in the embodiments of the present invention, character input of any random form is accepted, that is to say, the initial query statement is not restricted, so it is required to preprocess the query input character string, to judge different inputs, to judge the attribute of the input character string and to output the same.
[0082] Specifically, in order to enhance precision of the search query and the enquiring efficiency, the search intention of the user is firstly recognized according to the received initial query statement in the embodiments of the present invention, during specific implementation, it is possible to firstly extract keywords contained in the second statement, subsequently employ a pre-trained classification model to classify each keyword, and obtain the attribute category of each keyword.
[0083] S3 - generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category.
[0084] Specifically, a query statement adapted to the data index of the underlying search engine is constructed on the basis of the first statement, the first attribute category, the keywords and the corresponding second attribute category obtained in the previous step.
[0085] S4 - invoking a preset search engine interface, and matching out a query result according to the target query statement.
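The split between the first statement and the second statement in S1 and S2 can be sketched for a mixed input. The matching rule below (an 18-character run of digits and capital letters) is an assumption for illustration, not a rule given in the source.

```python
import re

# Assumed matching rule: an 18-character code of digits and capital letters.
CODE_RULE = re.compile(r"\b[0-9A-Z]{18}\b")


def split_statement(initial_query):
    """Return (first_statements, second_statement) for a mixed query."""
    first = CODE_RULE.findall(initial_query)
    # Whatever the rule did not match is left for keyword extraction.
    second = " ".join(CODE_RULE.sub(" ", initial_query).split())
    return first, second


first, second = split_statement("future technology 91320000608950986L Zhang San")
```

Here `first` holds the parts that satisfy the matching rule and can be labeled directly, while `second` is the remaining text that goes on to preprocessing and classification.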
[0086] As a preferred mode of execution in the embodiments of the present invention, the step of preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement includes:
[0087] performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement, and obtaining a word segmentation result; and
[0088] determining the keywords of the second statement according to the word segmentation result and a preset rule.
[0089] Specifically, in the embodiments of the present invention, a keyword matching rule is predefined, the word segmentation result is matched according to this keyword matching rule, and the word segmentation result that conforms to the requirement is obtained as a keyword.
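A predefined keyword matching rule might look like the following sketch. The rule itself (a minimum token length plus a small stop-word list) is hypothetical; the source only states that some preset rule is applied to the word segmentation result.

```python
# Hypothetical preset rule: keep tokens that are at least two characters long
# and are not in a small stop-word list.
STOP_WORDS = {"the", "of", "and", "search", "query"}
MIN_LENGTH = 2


def select_keywords(tokens):
    """Filter a word segmentation result down to keywords, preserving order."""
    seen = set()
    keywords = []
    for token in tokens:
        normalized = token.lower()
        if (len(token) < MIN_LENGTH
                or normalized in STOP_WORDS
                or normalized in seen):
            continue  # too short, a stop word, or a duplicate
        seen.add(normalized)
        keywords.append(token)
    return keywords


selected = select_keywords(["search", "the", "Future", "Technology", "a", "future"])
```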
[0090] As a preferred mode of execution in the embodiments of the present invention, the method comprises, prior to the step of performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement:
[0091] denoising the second statement, and removing noise characters from the second statement.
[0092] Specifically, in order to enhance enquiring efficiency and precision of the query, it is also possible in the embodiments of the present invention to denoise the second statement that does not satisfy the matching rule in the initial query statement, removing noise characters from it, for instance useless characters and punctuation marks.
[0093] As a preferred mode of execution in the embodiments of the present invention, the step of generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category comprises:
[0094] generating data pairs on the basis of the first statement and the corresponding first attribute category, the keywords and the corresponding second attribute category;
[0095] generating the target query statement according to the data pairs and a preset search engine indexing rule.
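The two sub-steps above, forming data pairs and then rendering them under an indexing rule, might look like the following sketch. The specification does not name the search engine; the output here follows the shape of an Elasticsearch-style `bool`/`must`/`match` query purely as an assumption, and `FIELD_BY_CATEGORY` with its field names is a hypothetical indexing rule.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical indexing rule: attribute category -> indexed field name.
FIELD_BY_CATEGORY: Dict[str, str] = {
    "order_id": "order_no",
    "brand": "brand_name",
    "category": "category_name",
}


def build_data_pairs(
    regex_hits: List[Tuple[str, str]],
    classified_keywords: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Combine (first statement, first category) and (keyword, second category) pairs."""
    return list(regex_hits) + list(classified_keywords)


def build_target_query(pairs: List[Tuple[str, str]]) -> dict:
    """Render the data pairs as a query body (assumed Elasticsearch-like shape)."""
    clauses = []
    for text, category in pairs:
        field = FIELD_BY_CATEGORY.get(category)
        if field is None:
            # No indexed field is known for this category, so the pair is skipped.
            continue
        clauses.append({"match": {field: text}})
    return {"query": {"bool": {"must": clauses}}}


pairs = build_data_pairs(
    [("ORD123456", "order_id")],
    [("nike", "brand"), ("sneakers", "unknown")],
)
print(json.dumps(build_target_query(pairs), indent=2))
```

Keeping the category-to-field mapping in a table, rather than in code, lets the same generator serve a different index by swapping the mapping.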
[0096] As a preferred mode of execution in the embodiments of the present invention, the method further comprises a process of training a classification model, including:
[0097] obtaining training data according to a business scenario; and
[0098] employing the training data to train a preset classifier, and obtaining a trained classification model.
[0099] As a preferred mode of execution in the embodiments of the present invention, the preset classifier includes a logistic regression classifier or a support vector machine classifier.
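As a rough illustration of the training process, the sketch below fits a one-vs-rest logistic regression classifier on labelled keywords using only the standard library. The training examples, the character-bigram features and the hyperparameters are assumptions made for this example; in practice an established library implementation of logistic regression or a support vector machine would more likely be used.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical labelled keywords; real training data comes from the business scenario.
TRAINING_DATA: List[Tuple[str, str]] = [
    ("nike", "brand"), ("adidas", "brand"), ("puma", "brand"),
    ("shoes", "category"), ("jacket", "category"), ("socks", "category"),
    ("red", "color"), ("blue", "color"), ("black", "color"),
]


def features(word: str, n: int = 2) -> List[str]:
    """Character n-grams with boundary markers, plus a constant bias feature."""
    padded = f"^{word.lower()}$"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams + ["__bias__"]


class OneVsRestLogistic:
    """One binary logistic regression per category, fitted with plain SGD."""

    def __init__(self, learning_rate: float = 0.5, epochs: int = 200) -> None:
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights: Dict[str, Dict[str, float]] = {}

    def _probability(self, label: str, feats: List[str]) -> float:
        w = self.weights[label]
        z = sum(w.get(f, 0.0) for f in feats)
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, data: List[Tuple[str, str]]) -> "OneVsRestLogistic":
        for label in sorted({y for _, y in data}):
            w = self.weights[label] = {}
            for _ in range(self.epochs):
                for word, y in data:
                    feats = features(word)
                    target = 1.0 if y == label else 0.0
                    # Gradient step on the log-likelihood for this single example.
                    error = target - self._probability(label, feats)
                    for f in feats:
                        w[f] = w.get(f, 0.0) + self.learning_rate * error
        return self

    def predict(self, word: str) -> str:
        feats = features(word)
        # Pick the category whose binary model is most confident.
        return max(self.weights, key=lambda label: self._probability(label, feats))


model = OneVsRestLogistic().fit(TRAINING_DATA)
print(model.predict("nike"), model.predict("shoes"), model.predict("blue"))
```

Character n-grams are used here so that the model can score keywords it has not seen verbatim; with so few examples, predictions on unseen words should not be relied upon.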
[0100] Fig. 5 is a view schematically illustrating the structure of an enquiring device based on vertical search according to an exemplary embodiment, and the device comprises:
[0101] a matching module, for performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0102] an obtaining module, for preprocessing a second statement that does not satisfy the matching rule in the initial query statement, and obtaining keywords corresponding to the second statement;
[0103] a classifying module, for performing classifying processing on each of the keywords with a pretrained classification model, and obtaining a second attribute category of each of the keywords;
[0104] a generating module, for generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0105] an enquiring module, for invoking a preset search engine interface, and matching out a query result according to the target query statement.
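The five modules listed above can be pictured as a chain of functions, as in the sketch below. Everything concrete in it is assumed for illustration: the regex rules in `MATCHING_RULES`, the whitespace-style tokenizer, the query shape, and the `classify` and `search` callables, which stand in for the pretrained classification model and the preset search engine interface respectively.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple

Pair = Tuple[str, str]  # (text, attribute category)

# Hypothetical matching rules: attribute category -> regular expression.
MATCHING_RULES: Dict[str, Pattern[str]] = {
    "order_id": re.compile(r"\bORD\d{6}\b"),
    "phone": re.compile(r"\b1\d{10}\b"),
}


def matching_module(query: str) -> Tuple[List[Pair], str]:
    """Return regex hits with their first attribute category, plus the unmatched rest."""
    hits: List[Pair] = []
    remainder = query
    for category, pattern in MATCHING_RULES.items():
        hits.extend((m.group(), category) for m in pattern.finditer(query))
        remainder = pattern.sub(" ", remainder)
    return hits, remainder


def obtaining_module(second_statement: str) -> List[str]:
    """Preprocess the unmatched part into keywords (stand-in segmentation)."""
    return re.findall(r"\w+", second_statement.lower())


def classifying_module(keywords: List[str], classify: Callable[[str], str]) -> List[Pair]:
    """Attach a second attribute category to each keyword."""
    return [(keyword, classify(keyword)) for keyword in keywords]


def generating_module(hits: List[Pair], classified: List[Pair]) -> dict:
    """Build the target query statement (assumed bool/must/match shape)."""
    clauses = [{"match": {category: text}} for text, category in hits + classified]
    return {"query": {"bool": {"must": clauses}}}


def enquiring_module(target_query: dict, search: Callable[[dict], list]) -> list:
    """Invoke the search engine interface with the target query statement."""
    return search(target_query)


def enquire(query: str, classify: Callable[[str], str], search: Callable[[dict], list]) -> list:
    hits, remainder = matching_module(query)
    classified = classifying_module(obtaining_module(remainder), classify)
    return enquiring_module(generating_module(hits, classified), search)


# Stubs standing in for the trained model and the search engine client.
labels = {"nike": "brand", "shoes": "category"}
results = enquire("ORD123456 nike shoes", lambda kw: labels.get(kw, "unknown"), lambda q: [q])
print(results[0])
```

Passing the classifier and the search client in as callables keeps the module boundaries visible and makes each stage testable without a running search engine.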
[0106] As a preferred mode of execution in the embodiments of the present invention, the obtaining module includes:
[0107] a word segmenting unit, for performing a word segmentation process on the second statement that does not satisfy the matching rule in the initial query statement, and obtaining a word segmentation result; and
[0108] a matching unit, for determining the keywords of the second statement according to the word segmentation result and a preset rule.

[0109] As a preferred mode of execution in the embodiments of the present invention, the device further comprises:
[0110] a denoising module, for denoising the second statement, and removing noise characters from the second statement.
[0111] As a preferred mode of execution in the embodiments of the present invention, the generating module is specifically employed for:
[0112] generating data pairs on the basis of the first statement and the corresponding first attribute category, the keywords and the corresponding second attribute category;
[0113] generating the target query statement according to the data pairs and a preset search engine indexing rule.
[0114] As a preferred mode of execution in the embodiments of the present invention, the device further comprises:
[0115] a training module, for obtaining training data according to a business scenario, employing the training data to train a preset classifier, and obtaining a trained classification model.
[0116] As a preferred mode of execution in the embodiments of the present invention, the preset classifier includes a logistic regression classifier or a support vector machine classifier.
[0117] Fig. 6 is a view schematically illustrating the internal structure of a computer equipment according to an exemplary embodiment. With reference to Fig. 6, the computer equipment comprises a processor, a memory, and a network interface connected to each other via a system bus. The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium, and an internal memory. The nonvolatile storage medium stores therein an operating system, a computer program and a database. The internal memory provides environment for the running of the operating system and the computer program in the nonvolatile storage medium. The network interface of the computer equipment is employed to connect to an external terminal via network for communication.
The computer program realizes the enquiring method based on vertical search when it is executed by the processor.
[0118] As understandable to persons skilled in the art, the structure illustrated in Fig. 6 is merely a block diagram of partial structure relevant to the solution of the present invention, and does not constitute any restriction to the computer equipment on which the solution of the present invention is applied, as the specific computer equipment may comprise component parts that are more than or less than those illustrated in Fig. 6, or may combine certain component parts, or may have different layout of component parts.
[0119] As a preferred mode of execution in the embodiments of the present invention, the computer equipment comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
[0120] performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0121] preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing on each of the keywords with a pretrained classification model, and obtaining a second attribute category of each of the keywords;
[0122] generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0123] invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0124] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the processor executes the computer program:
[0125] performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement, and obtaining a word segmentation result; and
[0126] determining the keywords of the second statement according to the word segmentation result and a preset rule.
[0127] As a preferred mode of execution in the embodiments of the present invention, the following step is further realized when the processor executes the computer program:
[0128] denoising the second statement, and removing noise characters from the second statement.
[0129] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the processor executes the computer program:
[0130] generating data pairs on the basis of the first statement and the corresponding first attribute category, the keywords and the corresponding second attribute category;
[0131] generating the target query statement according to the data pairs and a preset search engine indexing rule.
[0132] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the processor executes the computer program:
[0133] obtaining training data according to a business scenario; and
[0134] employing the training data to train a preset classifier, and obtaining a trained classification model.
[0135] In the embodiments of the present invention, there is further provided a computer-readable storage medium storing a computer program thereon, and the following steps are realized when the computer program is executed by a processor:
[0136] performing regex matching to a received initial query statement, and obtaining a first statement that satisfies a matching rule in the initial query statement, determining a first attribute category corresponding to the first statement;
[0137] preprocessing a second statement that does not satisfy the matching rule in the initial query statement, obtaining keywords corresponding to the second statement, performing classifying processing on each of the keywords with a pretrained classification model, and obtaining a second attribute category of each of the keywords;
[0138] generating a target query statement according to the first statement, the first attribute category, the keywords and the second attribute category; and
[0139] invoking a preset search engine interface, and matching out a query result according to the target query statement.
[0140] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the computer program is executed by a processor:
[0141] performing a word segmentation process to the second statement that does not satisfy the matching rule in the initial query statement, and obtaining a word segmentation result; and
[0142] determining the keywords of the second statement according to the word segmentation result and a preset rule.
[0143] As a preferred mode of execution in the embodiments of the present invention, the following step is further realized when the computer program is executed by a processor:
[0144] denoising the second statement, and removing noise characters from the second statement.
[0145] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the computer program is executed by a processor:
[0146] generating data pairs on the basis of the first statement and the corresponding first attribute category, the keywords and the corresponding second attribute category;
[0147] generating the target query statement according to the data pairs and a preset search engine indexing rule.
[0148] As a preferred mode of execution in the embodiments of the present invention, the following steps are further realized when the computer program is executed by a processor:
[0149] obtaining training data according to a business scenario; and
[0150] employing the training data to train a preset classifier, and obtaining a trained classification model.
[0151] To sum it up, the technical solutions provided by the embodiments of the present invention bring about the following advantageous effects:
[0152] In the enquiring method based on vertical search, and the corresponding device, computer equipment and storage medium provided by the embodiments of the present invention, regex matching is performed on a received initial query statement to obtain a first statement that satisfies a matching rule in the initial query statement, and a first attribute category corresponding to the first statement is determined; a second statement that does not satisfy the matching rule in the initial query statement is preprocessed to obtain keywords corresponding to the second statement, and each of the keywords is classified with a pretrained classification model to obtain a second attribute category of each of the keywords; a target query statement is generated according to the first statement, the first attribute category, the keywords and the second attribute category; and a preset search engine interface is invoked to match out a query result according to the target query statement. Search intention recognition of query statements for a vertical search engine is thereby realized, and enquiring efficiency and user experience are enhanced.
[0153] As should be noted, when the enquiring device based on vertical search provided by the aforementioned embodiment triggers an enquiring business, the division into the aforementioned functional modules is merely by way of example. In actual application, the functions can be assigned to different functional modules as required; that is to say, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the enquiring device based on vertical search provided by this embodiment pertains to the same conception as the enquiring method based on vertical search provided by the method embodiment, that is to say, the device is based on the enquiring method based on vertical search; see the corresponding method embodiment for its specific realization process, which is not repeated here.
[0154] As understandable by persons ordinarily skilled in the art, realization of the entire or partial steps of the aforementioned embodiments can be completed by hardware, or by a program instructing relevant hardware, the program can be stored in a computer-readable storage medium, and the storage medium can be a read-only memory, a magnetic disk, or an optical disk, etc.
[0155] What is described above is merely directed to preferred embodiments of the present invention, and is not meant to restrict the present invention. Any amendment, equivalent replacement or improvement made within the spirit and scope of the present invention shall be covered within the protection scope of the present invention.


Claims (7)

What is claimed is:
1. A system for constructing a multi-category text classification model for a vertical search engine, the system comprising:
a memory, wherein the memory is formed by a plurality of storage medium RAMs, wherein specific attributes of the memory are not restricted in this context;
a system bus;
a network;
a processor configured to:
prepare and extract scenario data;
determine search dimension target; and
process extraction and model training.
2. The system of claim 1, wherein prepare and extract scenario data comprises:
analyzing the data needed to construct the vertical search engine in conjunction with a scenario requirement of the vertical search engine; and
extracting data from a database to obtain structured data, in preparation for creating vertical search engine indexing data, and for providing training data for model training.
3. The system of any one of claims 1 to 2, wherein determine search dimension target comprises:
determining search fields of the vertical search engine, wherein the search fields are fields desired to be provided to a user for search and query when the vertical search engine is being created;
performing a labeling process on the data according to the search fields; and
extracting all the data under the search fields according to the structured data, wherein the corresponding field data are labeled with the search fields to form labeled data of multiple categories.
4. The system of any one of claims 1 to 3, wherein a word segmentation process is performed by words or by phrases on relevant text content fields, wherein their features are extracted, and corresponding feature vectors are generated to serve as data for model training.
5. The system of any one of claims 1 to 4, wherein the data is segmented into the training data and testing data by a random order, in a proportion of 4:1, wherein a classifier is selected to perform model training to obtain the classifier.
6. The system of any one of claims 1 to 5, wherein prediction capability of the classifier is tested and appraised, wherein the testing set is used to perform model appraisal on the classifier.
7. The system of any one of claims 1 to 6, wherein the system accepts character inputs of any random form.

CA3177671A 2020-11-06 2021-11-08 Enquiring method and device based on vertical search, computer equipment and storage medium Pending CA3177671A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202011229548.5A CN112035599B (en) 2020-11-06 2020-11-06 Query method and device based on vertical search, computer equipment and storage medium
CN202011229548.5 2020-11-06
CA3138556A CA3138556A1 (en) 2020-11-06 2021-11-08 Apparatuses, storage medium and method of querying data based on vertical search

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA3138556A Division CA3138556A1 (en) 2020-11-06 2021-11-08 Apparatuses, storage medium and method of querying data based on vertical search

Publications (1)

Publication Number Publication Date
CA3177671A1 true CA3177671A1 (en) 2022-05-06

Family

ID=73572806

Family Applications (2)

Application Number Title Priority Date Filing Date
CA3177671A Pending CA3177671A1 (en) 2020-11-06 2021-11-08 Enquiring method and device based on vertical search, computer equipment and storage medium
CA3138556A Pending CA3138556A1 (en) 2020-11-06 2021-11-08 Apparatuses, storage medium and method of querying data based on vertical search

Family Applications After (1)

Application Number Title Priority Date Filing Date
CA3138556A Pending CA3138556A1 (en) 2020-11-06 2021-11-08 Apparatuses, storage medium and method of querying data based on vertical search

Country Status (2)

Country Link
CN (1) CN112035599B (en)
CA (2) CA3177671A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818005B (en) * 2021-02-03 2024-02-02 北京清科慧盈科技有限公司 Structured data searching method, device, equipment and storage medium
CN113254587B (en) * 2021-05-31 2023-10-13 北京奇艺世纪科技有限公司 Search text recognition method and device, computer equipment and storage medium
CN113590919A (en) * 2021-07-29 2021-11-02 小船出海教育科技(北京)有限公司 Search request processing method and device, electronic equipment and computer readable medium
CN114943234B (en) * 2022-06-27 2024-03-19 企查查科技股份有限公司 Enterprise name linking method, enterprise name linking device, computer equipment and storage medium
CN115563167B (en) * 2022-12-02 2023-03-31 浙江大华技术股份有限公司 Data query method, electronic device and computer-readable storage medium
CN117519702B (en) * 2023-12-29 2024-03-19 冠骋信息技术(苏州)有限公司 Search page design method and system based on low code collocation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915380A (en) * 2012-11-19 2013-02-06 北京奇虎科技有限公司 Method and system for carrying out searching on data
CN110020063B (en) * 2017-07-18 2021-09-03 北京京东尚科信息技术有限公司 Vertical search method and system
CN107577755B (en) * 2017-08-31 2020-06-19 江西博瑞彤芸科技有限公司 Searching method
CN107958406A (en) * 2017-11-30 2018-04-24 北京小度信息科技有限公司 Inquire about acquisition methods, device and the terminal of data

Also Published As

Publication number Publication date
CN112035599B (en) 2021-08-27
CA3138556A1 (en) 2022-05-06
CN112035599A (en) 2020-12-04


Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20220929
