CN112925967A - Method, device and equipment for generating expanded query words and storage medium - Google Patents
- Publication number
- CN112925967A, CN202110168746.3A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9532—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention discloses a method, a device, equipment and a storage medium for generating expanded query words. The method comprises the following steps: acquiring real-time search engine data according to a query word input by a user; preprocessing the search engine data to obtain a word frequency matrix; and classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words. With this method, query words can be expanded in real time by acquiring real-time search engine data, and the expanded words can be classified, which better meets user requirements and improves search effectiveness. The trend of the generated expanded query words can also be calculated, and the trend result can be used as input data for subsequent applications.
Description
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating an expanded query term.
Background
The rapid development of the internet has brought explosive, exponential growth of internet data; in the new era of the digital surge, about 370 EB of data is generated every day. With this exponential growth, the problem is how to retrieve information accurately and efficiently from massive internet data and intelligently recommend relevant information to users.
In the current information retrieval field, academia focuses on exploring various linguistic methods to improve retrieval effectiveness, while industry focuses on making full use of historical data and recommending relevant information to users by statistical methods. However, a model based on linguistic analysis needs a relatively complete training data set and an additionally maintained dictionary, and its transferability is relatively weak; a statistics-based model relies on the historical behavior data of search users; and none of the methods in the prior art can predict the trend of the associated expanded words.
Disclosure of Invention
The embodiment of the disclosure provides a method, a device, equipment and a storage medium for generating an expanded query term. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present disclosure provides a method for generating an expanded query term, including:
acquiring real-time search engine data according to a query word input by a user;
preprocessing search engine data to obtain a word frequency matrix;
classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words.
In an optional embodiment, obtaining real-time search engine data according to a query word input by a user includes:
acquiring a query word input by a user;
acquiring search engine data in real time by adopting a crawler technology;
and parsing and combining the data from a plurality of search engines to form text data to be processed.
In an optional embodiment, the preprocessing the search engine data to obtain a word frequency matrix includes:
performing word segmentation on text data to be processed, and labeling the part of speech of each word;
and generating a word frequency matrix according to the occurrence frequency of each word in each article.
In an optional embodiment, classifying the words in the word frequency matrix includes:
and dividing the words in the word frequency matrix into four categories, namely a person name category, a verb category, an adjective/adverb category and an organization name category, according to the labeled part of speech.
In an optional embodiment, after selecting a preset number of words in each category as the generated classified expanded query words, the method further includes:
and determining the trend of the expanded query words according to the trend threshold of the expanded query words and the categories of the expanded query words.
In an alternative embodiment, determining a trend of the expanded query term based on a trend threshold of the expanded query term and its category includes:
when the TF-IDF value of the expanded query word is greater than or equal to the trend threshold value of the category of the expanded query word, determining the expanded query word to be in an ascending trend;
and when the TF-IDF value of the expanded query word is smaller than the trend threshold value of the category of the expanded query word, determining the expanded query word to be in a descending trend.
In an optional embodiment, after determining the trend of the expanded query term according to the trend threshold of the expanded query term and the category thereof, the method further includes:
and determining the planned topics of the news department according to the expanded query words in the rising trend.
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating an expanded query term, including:
the acquisition module is used for acquiring real-time search engine data according to the query words input by the user;
the preprocessing module is used for preprocessing the search engine data to obtain a word frequency matrix;
and the expanded query word generation module is used for classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words.
In a third aspect, the disclosed embodiment provides an expanded query term generation device, including a processor and a memory storing program instructions, where the processor is configured to execute the expanded query term generation method provided in the foregoing embodiment when executing the program instructions.
In a fourth aspect, the present disclosure provides a computer-readable medium, on which computer-readable instructions are stored, where the computer-readable instructions are executable by a processor to implement the expanded query term generation method provided in the foregoing embodiments.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
According to the method for generating expanded query words provided by the embodiments of the disclosure, acquiring search engine data in real time ensures the timeliness of the expanded query words, solves the problem that query words cannot be expanded in real time, and avoids the dependence of statistical methods on the historical behavior data of search users. Classifying the expanded query words by part of speech provides the user with more query words that meet the user's requirements and improves search effectiveness. The trend of the generated expanded query words can also be calculated, and the trend result can be used as input data for subsequent applications; the method can be applied to fields such as real-time portraits of persons, intelligent recommendation in search engines, and association of similar words.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart illustrating a method of generating expanded query terms in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of generating expanded query terms in accordance with an exemplary embodiment;
FIG. 3 is a diagram illustrating a method for obtaining search engine data in real-time according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating a data pre-processing according to an exemplary embodiment;
FIG. 5 is a diagram illustrating a word frequency matrix in accordance with an exemplary embodiment;
FIG. 6 is a diagram illustrating the generation of expanded query terms and their trends in accordance with an illustrative embodiment;
FIG. 7 is a block diagram illustrating an apparatus for generating expanded query terms in accordance with an illustrative embodiment;
FIG. 8 is a block diagram illustrating an apparatus for generating expanded query terms in accordance with an illustrative embodiment;
FIG. 9 is a schematic diagram illustrating a computer storage medium in accordance with an exemplary embodiment.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The method for generating the expanded query term according to the embodiment of the present application will be described in detail below with reference to fig. 1 to 6. Fig. 1 is a flowchart illustrating a method for generating an expanded query term according to an exemplary embodiment, and referring to fig. 1, the method specifically includes the following steps.
S101, obtaining real-time search engine data according to the query words input by the user.
In one possible implementation, the user inputs the query term through a search platform, for example a network platform such as Baidu, 360, or Sogou, and the resulting search data is obtained. Preferably, the input query term is a noun or a verb.
Further, a search engine interface is called and search engine data is acquired in real time through a crawler technology. The obtained search engine result data is parsed: the titles and abstracts in the result list are extracted, html tags are removed, advertisements are removed, and so on. The data from a plurality of search engines is then combined to form text data to be processed.
Fig. 3 is a schematic diagram illustrating a method for acquiring search engine data in real time according to an exemplary embodiment, where as shown in fig. 3, a query word is first input, then a plurality of search engine interfaces are called in real time to obtain a plurality of result data sets, the obtained plurality of result data sets are parsed, for example, titles and abstracts in a list are extracted, html tags are removed, advertisements are removed, and finally, a plurality of search engine data are combined to form text data to be processed.
By acquiring the data of the search engine in real time, the problem that the query words cannot be expanded in real time is solved, and the problem that the statistical method depends on the historical behavior data of the search user is also solved.
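The parsing and merging step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the crawler has already returned each engine's results as a list of dictionaries, and the keys `title`, `abstract` and `is_ad` as well as the function names are hypothetical.

```python
import re
from html import unescape

# Matches any HTML tag such as <em> or </b>.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(fragment: str) -> str:
    """Remove HTML tags, decode entities and collapse whitespace."""
    text = unescape(TAG_RE.sub(" ", fragment))
    return " ".join(text.split())


def merge_results(result_sets):
    """Combine the parsed results of several search engines into one list of texts.

    Each result is a dict with the hypothetical keys 'title', 'abstract'
    and an optional boolean 'is_ad' (assumed to be set by the crawler).
    """
    documents = []
    for results in result_sets:
        for item in results:
            if item.get("is_ad"):
                continue  # drop advertisements
            title = strip_html(item.get("title", ""))
            abstract = strip_html(item.get("abstract", ""))
            text = f"{title} {abstract}".strip()
            if text:
                documents.append(text)
    return documents
```

Each returned string corresponds to one "article" in the later word frequency matrix; how advertisements are actually detected is left open here.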
S102, preprocessing the search engine data to obtain a word frequency matrix.
Fig. 4 is a schematic diagram of data preprocessing according to an exemplary embodiment, and as shown in fig. 4, preprocessing text data to be processed includes first obtaining the text data to be processed in step S101, then performing word segmentation, part-of-speech tagging, stop word removal, and the like on the text data to be processed, and finally generating a word frequency matrix according to a word segmentation tagging result.
Specifically, the text data to be processed is first segmented into words, and each word is then labeled according to its part of speech: a word that is a person's name is labeled as a person name, a verb is labeled as a verb, an adjective or adverb is labeled as an adjective or adverb, and an organization name is labeled as an organization name. Meaningless words such as stop words are deleted from the text data set.
Furthermore, according to the word segmentation labeling result, word frequency is counted, a word frequency matrix is generated, and the number in the word frequency matrix represents the number of times that the corresponding word appears in the corresponding article.
Fig. 5 is a schematic diagram of a word frequency matrix according to an exemplary embodiment. As shown in fig. 5, assuming there are two articles, the number 0 in the first row and first column indicates that the word "any president" appears 0 times in article 1, and the number 1 in the second row and first column indicates that the word "any president" appears 1 time in article 2. The number of times each word appears in each article can thus be read from the numbers in the word frequency matrix.
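A word frequency matrix of this shape (one row per article, one column per word) can be built as in the sketch below. It assumes segmentation and part-of-speech tagging have already been done by some external tool, so each article arrives as a list of `(word, pos)` pairs; the stop-word set, the tag strings and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would load a fuller one.
STOP_WORDS = {"the", "of", "a"}


def build_word_frequency_matrix(tagged_docs, stop_words=STOP_WORDS):
    """Build a word frequency matrix from segmented, POS-tagged articles.

    tagged_docs: list of articles, each a list of (word, pos) pairs.
    Returns (vocab, pos_of, matrix) where matrix[i][j] is the number of
    times vocab[j] appears in article i, and pos_of maps word -> POS tag.
    """
    counts = []
    pos_of = {}
    for doc in tagged_docs:
        counter = Counter()
        for word, pos in doc:
            if word in stop_words:
                continue  # stop words are removed before counting
            counter[word] += 1
            pos_of.setdefault(word, pos)  # keep the first tag seen
        counts.append(counter)
    vocab = sorted(pos_of)
    matrix = [[counter[word] for word in vocab] for counter in counts]
    return vocab, pos_of, matrix
```

Keeping the part-of-speech tag next to each vocabulary entry lets the later classification step group the columns by category without re-tagging.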
S103, classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words.
In one possible implementation, the TF-IDF value of each word in the word frequency matrix is calculated. TF-IDF (term frequency-inverse document frequency) is a statistical measure: if a word or phrase occurs with a high frequency (TF) in one article and rarely occurs in other articles, it is considered to have good category-distinguishing capability and to be suitable for classification. TF represents the frequency with which the term t occurs in the document d. The main idea of IDF is that the fewer documents contain the term t, the larger its IDF, and the better the term t distinguishes between categories. A high term frequency within a particular document, combined with a low document frequency of that term across the document collection, yields a high TF-IDF weight; TF-IDF therefore tends to filter out common words and retain important ones.
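The text does not state which TF-IDF variant is used. For orientation, one common textbook form is given below; the symbols n_{t,d} (count of term t in document d), N (number of documents) and the unsmoothed logarithm are assumptions of this sketch, not taken from the source.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{k} n_{k,d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{\lvert\{\, d : n_{t,d} > 0 \,\}\rvert}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

Under this form, a term that appears in every document has idf(t) = log 1 = 0, which is how common words are filtered out.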
In the disclosed embodiment, the higher the TF-IDF value of a term, the higher the association between the term and the input query term.
Further, the words in the word frequency matrix are divided into four categories, namely person names, verbs, adjectives/adverbs and organization names, according to the labeled part of speech. The words in each category are then sorted in descending order of their TF-IDF values, and a preset number of words in each category is selected as the generated classified expanded query words.
In a possible implementation, the 10 words with the highest TF-IDF values are selected from each category as the expanded query words of that category, yielding a person name expanded query word set, a verb expanded query word set, an adjective/adverb expanded query word set and an organization name expanded query word set. The number of expanded query words in each set may be the same or different, and can be set by a person skilled in the art as needed.
By classifying the expanded query words according to the parts of speech, the requirements of users can be better met, and the searching effectiveness is improved.
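The scoring, classification and selection of step S103 might look like the sketch below. Two points are assumptions rather than statements of the source: the unsmoothed TF-IDF form used here, and the reduction of per-article TF-IDF values to a single score per word by taking the maximum over articles. The function names and category labels are hypothetical.

```python
import math
from collections import defaultdict


def tfidf_scores(vocab, matrix):
    """Return one TF-IDF score per word.

    matrix[i][j] is the count of vocab[j] in article i. The per-word score
    is the maximum TF-IDF over all articles (an assumed aggregation).
    """
    n_docs = len(matrix)
    doc_lengths = [sum(row) for row in matrix]
    scores = {}
    for j, word in enumerate(vocab):
        df = sum(1 for row in matrix if row[j] > 0)  # document frequency
        if df == 0:
            scores[word] = 0.0
            continue
        idf = math.log(n_docs / df)
        scores[word] = max(
            (row[j] / length) * idf
            for row, length in zip(matrix, doc_lengths)
            if length > 0
        )
    return scores


def top_k_per_category(scores, pos_of, k=10):
    """Group words by category, sort by descending score, keep the first k."""
    by_category = defaultdict(list)
    for word, score in scores.items():
        by_category[pos_of[word]].append((word, score))
    return {
        category: sorted(items, key=lambda ws: ws[1], reverse=True)[:k]
        for category, items in by_category.items()
    }
```

The default `k=10` mirrors the "first 10 terms" example in the text; a per-category `k` could be passed instead, since the text allows the set sizes to differ.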
In an optional embodiment, after a preset number of words in each category is selected as the generated classified expanded query words, the trend of each expanded query word is determined according to the expanded query word and the trend threshold of its category.
Specifically, when the TF-IDF value of the expanded query word is greater than or equal to the trend threshold of the category of the expanded query word, the expanded query word is determined to be in an ascending trend, and when the TF-IDF value of the expanded query word is smaller than the trend threshold of the category of the expanded query word, the expanded query word is determined to be in a descending trend.
The trend threshold of each category is obtained by analyzing historical data. In an exemplary scenario, the person name trend threshold is 0.35, the verb trend threshold is 0.43, the adjective/adverb trend threshold is 0.28, and the organization name trend threshold is 0.24. If the TF-IDF value of an expanded query word is smaller than the trend threshold of its category, the expanded query word is determined to be in a descending trend, that is, it is expected to be in a descending trend over a subsequent period of time.
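The threshold comparison can be sketched in a few lines. The threshold values are the exemplary ones quoted above; the category keys, function names and the `(word, score)` input layout are hypothetical.

```python
# Exemplary per-category trend thresholds from the text; keys are hypothetical.
TREND_THRESHOLDS = {
    "person_name": 0.35,
    "verb": 0.43,
    "adjective_adverb": 0.28,
    "organization_name": 0.24,
}


def trend_of(score, category, thresholds=TREND_THRESHOLDS):
    """Ascending if the TF-IDF score reaches the category threshold, else descending."""
    return "ascending" if score >= thresholds[category] else "descending"


def label_trends(expanded, thresholds=TREND_THRESHOLDS):
    """Attach a trend label to every expanded query word.

    expanded: {category: [(word, score), ...]}
    returns:  {category: [(word, score, trend), ...]}
    """
    return {
        category: [(w, s, trend_of(s, category, thresholds)) for w, s in items]
        for category, items in expanded.items()
    }
```

Note that a score exactly equal to the threshold counts as ascending, matching the "greater than or equal to" wording of the text.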
Furthermore, once the expanded query words of each category and their trends are obtained, they can be used as input data for various subsequent applications. In one exemplary scenario, when a news department is planning topics, expanded query words that are highly related and in an ascending trend may be selected from the expanded query word list as the basis for a topic. In another exemplary scenario, when a knowledge graph is being supplemented and refined, expanded query words that are highly related and in an ascending trend may be selected from the expanded query word list as node data of the knowledge graph. In a further exemplary scenario, when a real-time portrait of a person is generated, expanded query words that are highly related and in an ascending trend may also be selected as input data.
Calculating the trend of the expanded query words of each category can greatly improve the accuracy of subsequent applications such as intelligent recommendation, related word association, and topic planning.
Fig. 6 is a schematic diagram of generating expanded query words and trends according to an exemplary embodiment. As shown in fig. 6, the generated word frequency matrix is first obtained, and the TF-IDF value of each word in the word frequency matrix is calculated. The words are divided into four categories, namely a person name category, a verb category, an adjective/adverb category and an organization name category, according to their part of speech. The words in each category are then sorted in descending order of their TF-IDF values, and a preset number of words is selected from each category as the generated classified expanded query words.
Further, each expanded query word is compared with the trend threshold of its category. When the TF-IDF value of the expanded query word is greater than or equal to the trend threshold of its category, the expanded query word is determined to be in an ascending trend; when the TF-IDF value is smaller than the trend threshold of its category, the expanded query word is determined to be in a descending trend. Finally, the character expanded query words and trends, the attribute expanded query words and trends, the state expanded query words and trends, and the organization expanded query words and trends are obtained. The relevance threshold in fig. 6 refers to the trend threshold, and may be obtained by analyzing historical data.
In order to facilitate understanding of the method for generating the expanded query term provided in the embodiment of the present application, the following description is made with reference to fig. 2. As shown in fig. 2, the method includes:
firstly, real-time data of a search engine is acquired according to a query word input by a user, and preprocessing of operations such as word segmentation, labeling and word frequency matrix generation is performed on the acquired data.
Further, the TF-IDF value of each word in the word frequency matrix is calculated, and the words in the word frequency matrix are divided into four categories, namely a person name category, a verb category, an adjective/adverb category and an organization name category, according to the labeled part of speech. The words in each category are then sorted in descending order of their TF-IDF values, and a preset number of words in each category is selected as the generated classified expanded query words.
Further, the trend of the expanded query word is judged according to the trend threshold of the expanded query word and the category of the expanded query word, when the TF-IDF value of the expanded query word is larger than or equal to the trend threshold of the category of the expanded query word, the expanded query word is determined to be in an ascending trend, and when the TF-IDF value of the expanded query word is smaller than the trend threshold of the category of the expanded query word, the expanded query word is determined to be in a descending trend.
Finally, the generated expanded query words and their trends are used as input data for various subsequent applications, for example generating a portrait of a person, performing intelligent recommendation, or refining a knowledge graph.
According to the method for generating the expanded query words, the query words can be expanded in real time by acquiring real-time search engine data, the expanded words can be classified, user requirements can be better met, search effectiveness is improved, the trend of the generated expanded query words can be calculated, and the trend result is used as input data of subsequent application.
The embodiment of the present disclosure further provides a device for generating an expanded query term, where the device is configured to execute the method for generating an expanded query term in the foregoing embodiment, and as shown in fig. 7, the device includes:
an obtaining module 701, configured to obtain real-time search engine data according to a query word input by a user;
a preprocessing module 702, configured to preprocess search engine data to obtain a word frequency matrix;
the expanded query term generating module 703 is configured to classify the terms in the word-frequency matrix, rank the terms in each category according to the TF-IDF values of the terms, and select a preset number of terms in each category as the generated classified expanded query terms.
It should be noted that, when the apparatus for generating expanded query words provided in the foregoing embodiment executes the method for generating expanded query words, the above division into functional modules is merely an example. In practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and the method for generating expanded query words provided by the above embodiments belong to the same concept; the implementation process is detailed in the method embodiment and is not described here again.
The embodiment of the present disclosure further provides an electronic device corresponding to the method for generating an expanded query term provided in the foregoing embodiment, so as to execute the method for generating an expanded query term.
Referring to fig. 8, a schematic diagram of an electronic device provided in some embodiments of the present application is shown. As shown in fig. 8, the electronic apparatus includes: a processor 800, a memory 801, a bus 802 and a communication interface 803, the processor 800, the communication interface 803 and the memory 801 being connected by the bus 802; the memory 801 stores a computer program that can be executed on the processor 800, and the processor 800 executes the method for generating the expanded query term provided in any of the foregoing embodiments of the present application when executing the computer program.
The memory 801 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of this system and at least one other network element is realized through at least one communication interface 803 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used.
The processor 800 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 800 or by instructions in the form of software. The processor 800 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or a register. The storage medium is located in the memory 801, and the processor 800 reads the information in the memory 801 and completes the steps of the method in combination with its hardware.
The electronic device provided by this embodiment of the application is based on the same inventive concept as the method for generating the expanded query term provided by the embodiments of the application, and has the same beneficial effects as the method it adopts, runs or implements.
Referring to fig. 9, the computer-readable storage medium is shown as an optical disc 900 on which a computer program (i.e., a program product) is stored; when executed by a processor, the computer program performs the method for generating the expanded query term provided in any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above embodiment of the present application has the same beneficial effects as the method for generating the expanded query term that is adopted, run, or implemented by the application program stored on it.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above examples show only some embodiments of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (10)
1. A method for generating an expanded query term, comprising:
acquiring real-time search engine data according to a query word input by a user;
preprocessing the search engine data to obtain a word frequency matrix;
and classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words.
2. The method of claim 1, wherein obtaining real-time search engine data based on a query term input by a user comprises:
acquiring a query word input by a user;
acquiring search engine data in real time by adopting a crawler technology;
and analyzing and combining a plurality of search engine data to form text data to be processed.
3. The method of claim 2, wherein preprocessing the search engine data to obtain a word frequency matrix comprises:
performing word segmentation on the text data to be processed, and labeling the part of speech of each word;
and generating a word frequency matrix according to the occurrence frequency of each word in each article.
4. The method of claim 3, wherein classifying the words in the word frequency matrix comprises:
and dividing the words in the word frequency matrix into four categories, namely a noun category, a verb category, an adjective category and an organization name category, according to the labeled part of speech.
5. The method of claim 1, further comprising, after selecting a preset number of words in each category as the generated classified expanded query words:
and determining the trend of the expanded query term according to the trend threshold of the expanded query term and the category of the expanded query term.
6. The method of claim 5, wherein determining the trend of the expanded query term based on a trend threshold of the expanded query term and its category comprises:
when the TF-IDF value of the expanded query word is larger than or equal to the trend threshold value of the category of the expanded query word, determining that the expanded query word is in an ascending trend;
and when the TF-IDF value of the expanded query word is smaller than the trend threshold value of the category of the expanded query word, determining that the expanded query word is in a descending trend.
7. The method of claim 5, wherein after determining the trend of the expanded query term based on the trend threshold of the expanded query term and its category, further comprising:
and determining the planned topics of the news department according to the expanded query words in the ascending trend.
8. An apparatus for generating an expanded query term, comprising:
the acquisition module is used for acquiring real-time search engine data according to the query words input by the user;
the preprocessing module is used for preprocessing the search engine data to obtain a word frequency matrix;
and the expanded query word generation module is used for classifying the words in the word frequency matrix, sorting the words in each category according to their TF-IDF values, and selecting a preset number of words in each category as the generated classified expanded query words.
9. An expanded query term generation apparatus comprising a processor and a memory storing program instructions, the processor being configured to execute the expanded query term generation method according to any one of claims 1 to 7 when executing the program instructions.
10. A computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement an expanded query term generation method as claimed in any one of claims 1 to 7.
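As an illustration only, the ranking and trend steps recited in claims 1, 3, 4 and 6 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the input is assumed to be already segmented and part-of-speech tagged, the tag names (`n`, `v`, `a`, `nt`), the function names and the threshold values are hypothetical, and the claims do not fix a TF-IDF formula, so a common smoothed variant is used here. The crawling step of claim 2 is not covered.

```python
import math
from collections import defaultdict

# Hypothetical mapping from part-of-speech tags to the four categories of claim 4.
CATEGORY_OF_TAG = {"n": "noun", "v": "verb", "a": "adjective", "nt": "organization"}


def build_word_frequency_matrix(docs):
    """Return ({word: [count per article]}, {word: category}).

    `docs` is a list of articles, each a list of (word, tag) pairs.
    Words whose tag is not in CATEGORY_OF_TAG are ignored; a word keeps
    the category of its first tagged occurrence (a simplification).
    """
    matrix = defaultdict(lambda: [0] * len(docs))
    category = {}
    for i, doc in enumerate(docs):
        for word, tag in doc:
            if tag in CATEGORY_OF_TAG:
                matrix[word][i] += 1
                category.setdefault(word, CATEGORY_OF_TAG[tag])
    return dict(matrix), category


def tf_idf_scores(matrix, n_docs):
    """Score each word as summed relative frequency times a smoothed IDF.

    The formula is an assumption; article length counts only the words
    that were kept in the matrix.
    """
    doc_lengths = [sum(row[i] for row in matrix.values()) for i in range(n_docs)]
    scores = {}
    for word, row in matrix.items():
        tf = sum(c / doc_lengths[i] for i, c in enumerate(row) if doc_lengths[i])
        df = sum(1 for c in row if c > 0)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = tf * idf
    return scores


def expand_query(docs, top_k, trend_threshold):
    """Return {category: [(word, score, trend), ...]} with at most top_k words each."""
    matrix, category = build_word_frequency_matrix(docs)
    scores = tf_idf_scores(matrix, len(docs))
    by_category = defaultdict(list)
    for word, score in scores.items():
        by_category[category[word]].append((word, score))
    result = {}
    for cat, items in by_category.items():
        # Highest TF-IDF first; ties broken alphabetically for determinism.
        items.sort(key=lambda ws: (-ws[1], ws[0]))
        result[cat] = [
            (word, score, "ascending" if score >= trend_threshold[cat] else "descending")
            for word, score in items[:top_k]
        ]
    return result


if __name__ == "__main__":
    sample_docs = [
        [("vaccine", "n"), ("approve", "v"), ("WHO", "nt"), ("vaccine", "n")],
        [("vaccine", "n"), ("safe", "a"), ("trial", "n")],
    ]
    thresholds = {"noun": 0.4, "verb": 0.4, "adjective": 0.4, "organization": 0.4}
    for cat, words in expand_query(sample_docs, 1, thresholds).items():
        print(cat, words)
```

Keeping one threshold per category mirrors claim 6, which compares each word's TF-IDF value against the trend threshold of its own category rather than against a single global value.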
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110168746.3A CN112925967A (en) | 2021-02-07 | 2021-02-07 | Method, device and equipment for generating expanded query words and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110168746.3A CN112925967A (en) | 2021-02-07 | 2021-02-07 | Method, device and equipment for generating expanded query words and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112925967A (en) | 2021-06-08 |
Family
ID=76171131
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110168746.3A Pending CN112925967A (en) | 2021-02-07 | 2021-02-07 | Method, device and equipment for generating expanded query words and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112925967A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101295319A (en) * | 2008-06-24 | 2008-10-29 | 北京搜狗科技发展有限公司 | Method and device for expanding query, search engine system |
CN102902806A (en) * | 2012-10-17 | 2013-01-30 | 深圳市宜搜科技发展有限公司 | Method and system for performing inquiry expansion by using search engine |
CN106294868A (en) * | 2016-08-23 | 2017-01-04 | 达而观信息科技(上海)有限公司 | A kind of personalized recommendation method based on search engine and system |
CN107291914A (en) * | 2017-06-27 | 2017-10-24 | 达而观信息科技(上海)有限公司 | A kind of method and system for generating search engine inquiry expansion word |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2017097231A1 (en) | Topic processing method and device | |
CN104573054A (en) | Information pushing method and equipment | |
CN108520007B (en) | Web page information extracting method, storage medium and computer equipment | |
TW201923629A (en) | Data processing method and apparatus | |
CN108228612B (en) | Method and device for extracting network event keywords and emotional tendency | |
CN115374781A (en) | Text data information mining method, device and equipment | |
CN111639255A (en) | Search keyword recommendation method and device, storage medium and electronic equipment | |
CN112307336A (en) | Hotspot information mining and previewing method and device, computer equipment and storage medium | |
CN107977420A (en) | The abstract extraction method, apparatus and readable storage medium storing program for executing of a kind of evolved document | |
JP7395377B2 (en) | Content search methods, devices, equipment, and storage media | |
CN114328983A (en) | Document fragmenting method, data retrieval device and electronic equipment | |
WO2015084757A1 (en) | Systems and methods for processing data stored in a database | |
Alruqimi et al. | Bridging the Gap between the Social and Semantic Web: Extracting domain-specific ontology from folksonomy | |
Darmawiguna et al. | The development of integrated Bali tourism information portal using web scrapping and clustering methods | |
CN114090877A (en) | Position information recommendation method and device, electronic equipment and storage medium | |
CN106372123B (en) | Tag-based related content recommendation method and system | |
Zhang et al. | Informing the curious negotiator: Automatic news extraction from the internet | |
Lee et al. | Web document classification using topic modeling based document ranking | |
CN112528021B (en) | Model training method, model training device and intelligent equipment | |
CN111310017A (en) | Method and device for generating timeliness scene content | |
CN112925967A (en) | Method, device and equipment for generating expanded query words and storage medium | |
CN114328895A (en) | News abstract generation method and device and computer equipment | |
CN110727850B (en) | Network information filtering method, computer readable storage medium and mobile terminal | |
CN113468339A (en) | Label extraction method, system, electronic device and medium based on knowledge graph | |
US10552459B2 (en) | Classifying a document using patterns |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |