CN114706978A - Information retrieval method and system for vehicle machine - Google Patents

Information retrieval method and system for vehicle machine

Info

Publication number
CN114706978A
Authority
CN
China
Prior art keywords
topic
word
user
keyword
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210219091.2A
Other languages
Chinese (zh)
Inventor
李旭婕
詹修泓
罗凡
张焕期
张敬伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongfeng Motor Corp
Original Assignee
Dongfeng Motor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongfeng Motor Corp
Priority to CN202210219091.2A
Publication of CN114706978A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information retrieval method and system for a vehicle machine. The method comprises the following steps: step S1, obtaining a user topic keyword; step S2, obtaining texts related to the topic keyword and constructing an LDA model of those texts; step S3, obtaining a preliminary topic word list related to the topic keyword; step S4, acquiring related information comprising structured data and unstructured document data; and step S5, obtaining a vehicle machine retrieval result. The method expands the retrieval range while presenting high-quality retrieval results on the vehicle machine system, and addresses the problem of fast and accurate information retrieval in the automotive vertical domain.

Description

Information retrieval method and system for vehicle machine
Technical Field
The invention relates to the technical field of information retrieval, in particular to an information retrieval method and system for a vehicle machine.
Background
With the rapid development of computer technology, the era of the connected vehicle has quietly arrived. From automation to intelligence, technology has evolved from improving efficiency to improving quality of life, and the demand for connecting vehicles to the internet keeps growing. When using a car, people increasingly prefer to retrieve information through the vehicle machine system. However, the surge of massive information on the internet makes information retrieval difficult for the vehicle machine system. Accurately retrieving, from large amounts of data, the knowledge relevant to every aspect of a user's voice instruction is becoming more and more important for the vehicle machine system.
At present, the main channel through which the vehicle machine responds to a user's search instruction is an application program for the corresponding topic, such as a weather application for a temperature query or a catering application for a food query. When the retrieval instead goes through a search engine, on the one hand the sporadic or single keywords provided by the user make it difficult for the results to cover all aspects of the topic or to reflect the user's implicit semantics behind the keywords; on the other hand, a great deal of information irrelevant to the topic is retrieved.
The problem to be solved is therefore how to provide, on a vehicle machine system, a method that expands the retrieval range based on the user's intention while presenting high-quality retrieval results, so as to achieve fast and accurate information retrieval in the automotive vertical domain.
Disclosure of Invention
The invention aims to overcome the defects of the background technology and provides an information retrieval method and an information retrieval system for a vehicle machine.
In a first aspect, the present invention provides an information retrieval method for a vehicle machine, including the following steps:
step S1, obtaining a user topic keyword;
step S2, obtaining texts related to the topic keyword according to the obtained user topic keyword, constructing an LDA model of the texts related to the topic keyword, and obtaining a word list related to the topic keyword;
step S3, taking the word list related to the topic keyword as input, filtering out weakly topic-related words through a TF-IDF model, and obtaining a preliminary topic word list related to the topic keyword;
step S4, with the user topic keyword as the primary query and the preliminary topic word list as the auxiliary query, sequentially retrieving and capturing information on the related words, and acquiring related information comprising structured data and unstructured document data;
and step S5, fusing the structured data and the unstructured data to obtain a vehicle machine retrieval result.
According to the first aspect, in a first possible implementation manner of the first aspect, the step S1 specifically includes the following steps:
step S11, acquiring a user instruction;
step S12, judging the vocabulary type of the user instruction;
step S131, when the vocabulary type of the user instruction is a word, taking the word as the topic keyword;
step S132, when the vocabulary type of the user instruction is a short sentence, analyzing the short sentence and acquiring the topic keyword related to it.
According to the first aspect, in a second possible implementation manner of the first aspect, the step S2 specifically includes the following steps:
step S21, capturing, by a search engine, texts related to the topic keyword according to the obtained user topic keyword;
step S22, constructing an LDA model of the texts related to the topic keyword, and obtaining a word list related to the topic keyword.
According to the first aspect, in a third possible implementation manner of the first aspect, the step S3 specifically includes the following steps:
constructing a matrix [Figure BDA0003536231030000031], in which the vector [Figure BDA0003536231030000032] represents the capacity of the vocabulary and the vector [Figure BDA0003536231030000033] represents the capacity of the text set;
obtaining, for each word in the word list related to the topic keyword, the term frequency TF, the inverse document frequency IDF derived from the reciprocal of the number of texts containing the word, and the product of TF and IDF, and assigning the product to the corresponding position in the matrix [Figure BDA0003536231030000034];
obtaining the IDF value, in the text set, of each word in the word list related to the topic keyword;
comparing the IDF value of each word with an IDF threshold;
and filtering out the words whose IDF value is lower than the IDF threshold in the influence factor distribution result, and acquiring a preliminary topic word list related to the topic keyword.
According to the first aspect, in a fourth possible implementation manner of the first aspect, the step S4 specifically includes the following steps:
step S41, constructing a policy network;
and step S42, taking the user theme key words as the main part and the preliminary theme word list as the auxiliary part, sequentially searching and capturing relevant word information through a policy network, and acquiring relevant information comprising structured data and unstructured document data.
According to a fourth possible implementation form of the first aspect, in a fifth possible implementation form of the first aspect,
the step S41 specifically includes the following steps:
step S411, obtaining the topic relevance of the text content of the service-class webpages to be crawled;
step S412, obtaining the link authority of the service-class webpages to be crawled;
step S413, fusing the obtained topic relevance and link authority to obtain a resource ranking of high-quality webpages;
and step S414, constructing a policy network comprising the priority of the URLs to be crawled according to the obtained resource ranking of high-quality webpages.
According to a fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, the policy network in step S414 includes an input layer, a convolution layer, a Softmax layer, and an output layer, where the input layer is configured to read a feature matrix of a set of service-class webpages to be crawled, the convolution layer is configured to evaluate quality of each service-class webpage to be crawled, the Softmax layer is configured to obtain a click probability of each service-class webpage to be crawled, and the output layer is configured to output a click probability distribution of the set of service-class webpages to be crawled.
In a second aspect, the present invention provides an information retrieval system for a vehicle machine, including:
the theme key word acquisition module is used for acquiring a user theme key word;
the topic keyword related word list acquisition module is in communication connection with the topic keyword acquisition module and is used for acquiring a topic keyword related text according to the acquired user topic keyword, constructing an LDA model of the topic keyword related text and acquiring a topic keyword related word list;
a preliminary topic word list acquisition module which is in communication connection with the topic keyword acquisition module and the topic keyword related word list acquisition module and is used for filtering weakly related topic words through a TF-IDF model by taking the topic keyword related word list as input to acquire a preliminary topic word list related to the topic keyword;
the related information acquisition module is in communication connection with the theme keyword acquisition module and the preliminary theme vocabulary acquisition module, and is used for sequentially searching and capturing related word information by taking the user theme keywords as a main part and the preliminary theme vocabulary as an auxiliary part to acquire related information comprising structured data and unstructured document data;
and the vehicle machine retrieval result acquisition module is in communication connection with the related information acquisition module and is used for fusing the structured data and the unstructured data to acquire a vehicle machine retrieval result.
According to the second aspect, in a first possible implementation manner of the second aspect, the topic keyword obtaining module further includes:
the user instruction acquisition sub-module is used for acquiring a user instruction;
the vocabulary type acquisition sub-module is in communication connection with the user instruction acquisition sub-module and is used for judging the vocabulary type of the user instruction;
the first topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for taking the word as the topic keyword when the vocabulary type of the user instruction is a word;
and the second topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for acquiring the topic keyword related to the short sentence instruction when the vocabulary type of the user instruction is a short sentence.
According to the second aspect, in a second possible implementation manner of the second aspect, the topic keyword related vocabulary acquiring module further includes:
the related text acquisition sub-module is in communication connection with the topic keyword acquisition module and is used for capturing, by a search engine, texts related to the topic keyword according to the acquired user topic keyword;
and the LDA model acquisition sub-module is in communication connection with the related text acquisition sub-module and is used for constructing an LDA model of the text related to the subject key words.
According to the information retrieval method for the vehicle machine, an LDA model of the texts related to the topic keyword is constructed, weakly topic-related words are filtered out by a TF-IDF model to obtain a preliminary topic word list related to the topic keyword, and related information comprising structured data and unstructured document data is retrieved to obtain the vehicle machine retrieval result. The retrieval range is thereby expanded while high-quality retrieval results are presented on the vehicle machine system, which addresses the problem of fast and accurate information retrieval in the automotive vertical domain.
Drawings
Fig. 1 is a flowchart of an information retrieval method for a vehicle machine according to an embodiment of the present invention;
Fig. 2 is another flowchart of the information retrieval method for a vehicle machine according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an LDA model according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a policy network;
Fig. 5 is a functional block diagram of an information retrieval system for a vehicle machine according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the specific embodiments, it will be understood that they are not intended to limit the invention to the embodiments described. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. It should be noted that the method steps described herein may be implemented by any functional block or functional arrangement, and that any functional block or functional arrangement may be implemented as a physical entity or a logical entity, or a combination of both.
In order that those skilled in the art will better understand the invention, further details are provided below in conjunction with the accompanying drawings and the detailed description of the invention.
Note that: the example to be described next is only a specific example, and does not limit the embodiments of the present invention necessarily to the following specific steps, values, conditions, data, orders, and the like. Those skilled in the art can, upon reading this specification, utilize the concepts of the present invention to construct more embodiments than those specifically described herein.
At present, the main channel through which the vehicle machine responds to a user's search instruction is an application program for the corresponding topic, such as a weather application or a catering application. On the one hand, a search based only on the single, sporadic keyword provided by the user can hardly cover all the information on the topic or reflect the user's implicit semantics behind the keyword; on the other hand, a great amount of information irrelevant to the topic is retrieved.
In view of the above, referring to fig. 1, the present invention provides an information retrieval method for a vehicle machine, including the following steps:
step S1, obtaining a user topic keyword;
step S2, in order to expand the range of the topic-keyword-oriented retrieval, show the user information in all dimensions, and mine the user's latent semantics and intention, obtaining texts related to the topic keyword according to the obtained user topic keyword, constructing an LDA model of those texts, and obtaining a word list related to the topic keyword, which supports the subsequent expanded retrieval;
step S3, in order to avoid retrieving information irrelevant to the topic keyword, taking the word list related to the topic keyword as input, and filtering out weakly topic-related words through a TF-IDF model to obtain a preliminary topic word list related to the topic keyword;
step S4, with the user topic keyword as the primary query and the preliminary topic word list as the auxiliary query, sequentially retrieving and capturing information on the related words, and acquiring related information comprising structured data and unstructured document data;
and step S5, fusing the structured data and the unstructured data to obtain a vehicle machine retrieval result.
According to the information retrieval method for the vehicle machine, an LDA model of the texts related to the topic keyword is constructed, weakly topic-related words are filtered out by a TF-IDF model to obtain a preliminary topic word list related to the topic keyword, and related information comprising structured data and unstructured document data is retrieved to obtain the vehicle machine retrieval result. The retrieval range is thereby expanded while high-quality retrieval results are presented on the vehicle machine system, which addresses the problem of fast and accurate information retrieval in the automotive vertical domain.
In an embodiment, the step S1 specifically includes the following steps:
step S11, acquiring a user instruction, which may be the user's spoken content captured by a voice recognition system or the text entered by the user in the vehicle machine search box;
step S12, judging the vocabulary type of the user instruction, wherein the vocabulary type is either a word or a short sentence;
step S131, when the vocabulary type of the user instruction is a word, taking the word as the topic keyword;
step S132, when the vocabulary type of the user instruction is a short sentence, applying natural language processing to perform word segmentation, part-of-speech tagging and named entity recognition on the short sentence, and taking the topic words that users generally care about as the topic keywords used for subsequent retrieval.
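Steps S11 to S132 can be illustrated with a deliberately simplified sketch. It is not the patent's implementation: plain stop-word filtering stands in for word segmentation, part-of-speech tagging and named entity recognition, the input is assumed to be English text, and the names `STOP_WORDS`, `tokenize`, `vocabulary_type` and `topic_keywords` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would rely on POS tags and NER instead.
STOP_WORDS = {"i", "want", "to", "find", "a", "an", "the", "please", "search",
              "for", "me", "show", "near", "nearby", "some", "is", "what", "how"}


def tokenize(instruction: str) -> list[str]:
    """Lower-case the instruction and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", instruction.lower())


def vocabulary_type(instruction: str) -> str:
    """S12: classify the instruction as a single 'word' or a 'phrase'."""
    return "word" if len(tokenize(instruction)) <= 1 else "phrase"


def topic_keywords(instruction: str) -> list[str]:
    """S131/S132: return the topic keyword(s) of a user instruction."""
    tokens = tokenize(instruction)
    if vocabulary_type(instruction) == "word":
        return tokens  # S131: the word itself is the keyword
    # S132 (simplified): keep content words, drop stop words, preserve order
    return [t for t in tokens if t not in STOP_WORDS]


print(topic_keywords("weather"))
print(topic_keywords("Please find a parking lot nearby"))
```

For Chinese input, the regular-expression tokenizer would have to be replaced by a proper word segmenter, since words are not separated by spaces.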
In an embodiment, referring to fig. 2, the step S2 specifically includes the following steps:
step S21, capturing, by a search engine, texts related to the topic keyword according to the obtained user topic keyword;
step S22, constructing an LDA model of the texts related to the topic keyword, and obtaining a word list related to the topic keyword.
In one embodiment, the capturing of the texts related to the topic keyword by the search engine in step S21 is mainly performed by a crawler program. The search engine can be chosen according to the actual situation, for example Baidu, Bing or Sogou. For keywords with a definite topic, vertical-domain information sources can be added, such as Moji Weather for weather queries, Dianping for food queries, and Amap for parking lot queries.
In an embodiment, in step S22, the basic idea of building the LDA model of the texts related to the topic keyword is to treat each related document as a bag of words without context, so that word order is ignored. A document contains several topics, and each word is generated by one of those topics. The LDA model is a directed graph, and the generative model is shown in fig. 3:
In the figure, the symbols in circles represent variables, the arrows represent conditional relationships between the variables, each box indicates that the variables inside it are sampled repeatedly, and the letters k, m and n in the boxes indicate the number of repetitions. [Figure BDA0003536231030000081] and [Figure BDA0003536231030000082] are two hyperparameters; [Figure BDA0003536231030000083] represents the k "topic-word" distributions in total; [Figure BDA0003536231030000084] represents the m "document-topic" distributions; $w_{m,n}$ denotes the words of document m, of which there are n in each document; and $z_{m,n}$ denotes the topic corresponding to $w_{m,n}$. The joint distribution of the visible and hidden variables in the model can be represented by equation (2-1):
[Equation (2-1): formula images BDA0003536231030000085 and BDA0003536231030000091 in the original publication]
The maximum likelihood estimate of each word distribution in the final document can then be calculated as shown in equation (2-2):
[Equation (2-2): formula image BDA0003536231030000092 in the original publication]
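For orientation, the widely used textbook form of the LDA joint distribution and of the per-word likelihood is given below. It is not a transcription of the patent's equations (2-1) and (2-2), which survive only as images; the symbols $\alpha$, $\beta$, $\theta_m$, $\Phi = \{\varphi_k\}$, $N_m$ and $K$ follow the usual notation and are assumptions here.

```latex
% Joint distribution of the words, topic assignments and parameters of document m
p(\mathbf{w}_m, \mathbf{z}_m, \theta_m, \Phi \mid \alpha, \beta)
  = \prod_{n=1}^{N_m} p(w_{m,n} \mid \varphi_{z_{m,n}})\, p(z_{m,n} \mid \theta_m)
    \cdot p(\theta_m \mid \alpha) \cdot p(\Phi \mid \beta)

% Probability that word position n of document m is the term t
p(w_{m,n} = t \mid \theta_m, \Phi)
  = \sum_{k=1}^{K} p(w_{m,n} = t \mid \varphi_k)\, p(z_{m,n} = k \mid \theta_m)
```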
Estimation according to $p(w_i \mid \alpha, \beta)$ can be carried out, for example, with the commonly used Gibbs sampling.
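As an illustration of the Gibbs sampling mentioned above, the following is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a sketch under assumptions, not the patent's procedure: the symmetric priors `alpha` and `beta`, the iteration count, the toy documents and the function names `lda_gibbs` and `top_words` are all hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, topic_word_counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # words of doc m in topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times term t is in topic k
    topic_total = [0] * num_topics                     # words assigned to topic k
    assignments = []                                   # topic of each word position

    # Random initial topic assignment for every word position.
    for m, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[m][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                t, k_old = word_id[w], assignments[m][n]
                # Remove the current assignment from the counts.
                doc_topic[m][k_old] -= 1
                topic_word[k_old][t] -= 1
                topic_total[k_old] -= 1
                # Full conditional p(z = k | everything else), up to a constant.
                weights = [
                    (doc_topic[m][k] + alpha)
                    * (topic_word[k][t] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[m][n] = k_new
                doc_topic[m][k_new] += 1
                topic_word[k_new][t] += 1
                topic_total[k_new] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


docs = [
    "rain wind temperature forecast rain".split(),
    "restaurant noodles price restaurant dishes".split(),
    "forecast temperature wind rain".split(),
    "dishes price noodles restaurant".split(),
]
vocab, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(k, words)
```

The top words of each topic play the role of the word list related to the topic keyword that step S22 hands to step S3.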
In an embodiment, after the word list related to the topic keyword is obtained in step S22, a TF-IDF (Term Frequency-Inverse Document Frequency) model is used in step S3 to filter out the weakly topic-related words in that word list. TF-IDF evaluates how important a word is to one document in a document set or corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus.
In an embodiment, the step S3 specifically includes the following steps:
constructing a matrix [Figure BDA0003536231030000093], in which the vector [Figure BDA0003536231030000094] represents the capacity of the vocabulary and the vector [Figure BDA0003536231030000095] represents the capacity of the text set;
obtaining, for each word in the word list related to the topic keyword, the term frequency TF, the inverse document frequency IDF derived from the reciprocal of the number of texts containing the word, and the product of TF and IDF, and assigning the product to the corresponding position in the matrix [Figure BDA0003536231030000096];
obtaining the IDF value, in the text set, of each word in the word list related to the topic keyword, and obtaining the distribution result of the influence factor of each word in the text set;
comparing the IDF value of each word with an IDF threshold;
and filtering out the words whose IDF value is lower than the IDF threshold in the influence factor distribution result, and acquiring a preliminary topic word list related to the topic keyword.
In an embodiment, the step of obtaining the IDF value of each word in the text set in the word list related to the topic keyword specifically includes the following steps:
According to the calculation idea of the TF-IDF model, for an entry w, the more frequently the entry appears in a certain text d (i.e. the larger the TF value) and the fewer texts in the whole text set contain it (i.e. the larger the IDF value), the higher the influence factor of the entry w with respect to the text d. Wherein:
[Equation (2-3), definition of TF: formula image BDA0003536231030000101 in the original publication]
[Equation (2-4), definition of IDF: formula image BDA0003536231030000102 in the original publication]
TF-IDF = TF × IDF (2-5)
In the formulas, x represents the number of occurrences of the entry w in the text d, X represents the total number of words in the text, y is the number of documents containing the word w, and Y represents the total number of documents in the document set. TF-IDF in formula (2-5) is the influence factor. After calculation with formulas (2-3) to (2-5), the distribution of the influence factor of the word w over the text set is obtained through simple normalization.
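The TF-IDF filtering of step S3 can be sketched as follows. Because the patent's own TF and IDF formulas are only available as images, the sketch assumes the common definitions TF = (occurrences of the word in the text) / (words in the text) and IDF = log(total texts / texts containing the word); the function names, the toy texts and the threshold value are hypothetical.

```python
import math


def tf_idf_matrix(vocabulary, texts):
    """Return (matrix, idf), where matrix[i][j] is the TF-IDF of word i in text j."""
    total_texts = len(texts)
    idf = {}
    for word in vocabulary:
        containing = sum(1 for text in texts if word in text)
        # Assumed IDF form: log(total texts / texts containing the word).
        idf[word] = math.log(total_texts / containing) if containing else 0.0
    matrix = []
    for word in vocabulary:
        row = []
        for text in texts:
            tf = text.count(word) / len(text) if text else 0.0
            row.append(tf * idf[word])
        matrix.append(row)
    return matrix, idf


def filter_vocabulary(vocabulary, idf, idf_threshold):
    """Keep only the words whose IDF value reaches the threshold."""
    return [w for w in vocabulary if idf[w] >= idf_threshold]


texts = [
    ["today", "rain", "wind"],
    ["today", "forecast", "temperature"],
    ["today", "rain", "humidity"],
]
related_words = ["today", "rain", "wind", "forecast"]  # e.g. output of the LDA step
matrix, idf = tf_idf_matrix(related_words, texts)
print(filter_vocabulary(related_words, idf, idf_threshold=0.3))
```

In this toy corpus "today" occurs in every text, so its IDF is zero and it is dropped, while the rarer weather terms remain in the preliminary topic word list.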
In an embodiment, the step S4 specifically includes the following steps:
step S41, constructing a policy network, as shown in fig. 4;
and step S42, with the user topic keyword as the primary query and the preliminary topic word list as the auxiliary query, sequentially retrieving and capturing information on the related words through the policy network, and acquiring related information comprising structured data and unstructured document data. Structured data is mainly data presented in tabular form on a website, and higher-quality websites tend to have a higher degree of structure. Unstructured data is scattered through internet content. Taking a food search as an example, some sites, such as Dianping, explicitly mark structured information such as the storefront address, average price and recommended dishes, while others, such as Xiaohongshu, carry unstructured information in which the recommendations are embedded in the body of an article.
In an embodiment, the step S41 specifically includes the following steps:
step S411, extracting the text content of the service-class webpages to be crawled, establishing a vector space model of the topic keyword and of the text content of the webpages to be crawled, and calculating a topic relevance score between the text content and the topic keyword using cosine similarity, so as to obtain the topic relevance of the text content of the service-class webpages to be crawled;
step S412, calculating the link authority of the service-class webpages to be crawled from SEO (Search Engine Optimization) indicators;
step S413, fusing the obtained topic relevance and link authority to obtain a resource ranking of high-quality webpages, thereby determining the priority of the URLs to be crawled and realizing the selection of the information sources to be fused;
and step S414, constructing a policy network comprising the priority of the URLs to be crawled according to the obtained resource ranking of high-quality webpages.
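Steps S411 to S413 can be sketched as below. The patent does not state how topic relevance and link authority are fused, so the weighted sum, the weight `relevance_weight`, the assumption that authority is already scaled to [0, 1], and the example URLs are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_urls(keyword_tokens, pages, relevance_weight=0.6):
    """Order URLs by a weighted sum of topic relevance and link authority.

    pages maps url -> (page_tokens, authority), authority assumed to lie in [0, 1].
    """
    scores = {}
    for url, (tokens, authority) in pages.items():
        relevance = cosine_similarity(keyword_tokens, tokens)
        scores[url] = relevance_weight * relevance + (1 - relevance_weight) * authority
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "https://example.com/a": (["parking", "lot", "price", "parking"], 0.5),
    "https://example.com/b": (["movie", "reviews"], 0.9),
    "https://example.com/c": (["parking", "lot"], 0.2),
}
print(rank_urls(["parking", "lot"], pages))
```

The resulting order is the crawl priority of the URLs; a page with high authority but no topical overlap is ranked behind relevant pages when relevance carries the larger weight.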
In an embodiment, before the step S414, the method further includes the following steps:
step S4140, simulating the user's selection of websites, modelling that selection as a sequence of consecutive clicks and page transfers, setting a click selection threshold for the websites selected by the user (for example, websites reached through consecutive clicks and page transfers), and acquiring the effective website set that satisfies the selection threshold.
Then, the resource ranking of the high-quality webpages in the effective website set is used in step S414 to construct the policy network comprising the priority of the URLs to be crawled, which avoids redundantly adding information from invalid websites.
In one embodiment, the policy network in step S414 is essentially a neural network: it takes a certain environment state as input and, through forward propagation, outputs a probability distribution over all executable actions in that state, where each action corresponds to selecting a different website link to click. As shown in fig. 4, the policy network provided by this application comprises an input layer, a convolutional layer, a Softmax layer and an output layer. The input layer reads the feature matrix of the set of service-class webpages to be crawled, the convolutional layer evaluates the quality of each service-class webpage to be crawled, the Softmax layer obtains the click probability of each service-class webpage to be crawled, and the output layer outputs the click probability distribution over the set.
For each website that needs to be evaluated, a click is made at random according to the policy network.
(1) Input layer
The input layer is used for reading a feature matrix of the website set.
(2) Convolutional layer
The convolutional layer evaluates the quality of each website resource by performing a convolution operation on the input feature matrix to obtain a vector:
[Equation (2-6): formula image BDA0003536231030000121 in the original publication]
where $h_k$ is the k-th output of the convolutional layer, $\omega_n$ is a convolution kernel weight vector, $b_n$ is a bias term, and ReLU is the activation function.
(3) Softmax layer
The Softmax layer converts the vector from the convolutional layer into the probability of selecting each website to click, indicating how likely it is that selecting a given website resource yields a better result. For the k-th website, the click probability $P_k$ is calculated as follows:
[Equation (2-7): formula image BDA0003536231030000122 in the original publication]
(4) Output layer
The output layer outputs the click probability distribution given by formula (2-7), i.e. the click probability of each website:
$P = (p_1, p_2, \ldots, p_m)$ (2-8)
where each component is the probability that the corresponding website is clicked; the better a website is evaluated, the larger its probability value.
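The forward pass through the four layers can be sketched as follows. This is an assumption-laden illustration rather than the patent's network: a single convolution kernel spanning one feature row is used, the standard softmax definition is assumed for the click probability, and the feature values, `kernel` and `bias` are hypothetical.

```python
import math


def policy_forward(features, kernel, bias):
    """Return the click-probability distribution over the candidate websites.

    features holds one feature row per website; kernel is a weight vector of
    the same length as a row.
    """
    # Convolutional layer (one kernel spanning a full row) followed by ReLU.
    h = [max(0.0, sum(w * x for w, x in zip(kernel, row)) + bias) for row in features]
    # Softmax layer, shifted by max(h) for numerical stability.
    peak = max(h)
    exps = [math.exp(v - peak) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


features = [
    [0.9, 0.8],  # hypothetical (topic relevance, link authority) of website 1
    [0.2, 0.9],
    [0.1, 0.1],
]
P = policy_forward(features, kernel=[1.0, 0.5], bias=0.0)
print(P)
```

The returned list corresponds to the distribution P in (2-8): its entries sum to one, and websites with a higher convolutional score receive a higher click probability.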
In one embodiment, the training process of the policy network is as follows:
Figure BDA0003536231030000123
Figure BDA0003536231030000131
Therefore, the information resources retrieved for the keywords given by the user can be selectively screened: high-quality structured and unstructured data are screened out and stored, to be presented to the user in the next step, and the data can be stored in the cloud, a local database, and the like.
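The training procedure is given in the source only as figures, so its exact steps are not recoverable here. As a hedged illustration of how a click policy of this kind can be trained by reinforcement learning, the sketch below uses a generic REINFORCE-style policy-gradient update on a linear softmax policy. This is a stand-in technique, not the procedure of the figures; the reward function, learning rate, episode count and one-hot features are all assumptions.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, features):
    """Click probabilities from linear scores theta . x_k (ReLU omitted for simplicity)."""
    return softmax([sum(t * x for t, x in zip(theta, row)) for row in features])


def reinforce_step(theta, features, reward_fn, rng, lr=0.1):
    """Sample one click, observe its reward, and move theta along reward * grad log p."""
    probs = policy(theta, features)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = reward_fn(action)
    # For a linear softmax policy: grad log p_action = x_action - sum_k p_k * x_k
    expected = [sum(p * row[i] for p, row in zip(probs, features))
                for i in range(len(theta))]
    return [t + lr * reward * (features[action][i] - expected[i])
            for i, t in enumerate(theta)]


def train(features, reward_fn, episodes=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    for _ in range(episodes):
        theta = reinforce_step(theta, features, reward_fn, rng)
    return theta


# Hypothetical setup: three websites with one-hot features; only website 0 is "high quality".
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reward_fn = lambda a: 1.0 if a == 0 else 0.0

before = policy([0.0, 0.0, 0.0], features)
after = policy(train(features, reward_fn), features)
```

Under these assumptions the probability of clicking the rewarded website grows during training, which is the screening behaviour the paragraph above describes.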
According to the method, an LDA model is established to expand the retrieval range of the keywords given by the user, so that comprehensive information in every dimension is displayed to the user and the user's latent semantics and intentions are mined; the TF-IDF model filters out words weakly related to the topic, so that the information related to the topic keywords of interest to the user is better represented and the subsequent extended retrieval is better supported; and reinforcement learning based on a policy network selectively screens the information resources retrieved for the topic keywords given by the user, presenting high-quality information from massive data, reducing the interference of irrelevant data with the user's browsing, and providing high-quality, fast and accurate vehicle-machine retrieval results related to the user's topic keywords.
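The TF-IDF filtering step can be sketched as follows. The sketch assumes the usual definitions, with TF as the relative frequency of a word in a text and IDF as log(N / df), and it drops words whose IDF falls below a threshold. The function names, the toy texts and the threshold value are hypothetical, and the exact IDF formula used in the application may differ.

```python
import math
from collections import Counter


def tf_idf_matrix(vocab, texts):
    """Return (matrix, idf), where matrix[word][d] = TF(word, text d) * IDF(word).

    texts is a list of token lists; vocab is the topic-keyword related word list.
    """
    n_texts = len(texts)
    counts = [Counter(tokens) for tokens in texts]
    idf, matrix = {}, {}
    for word in vocab:
        df = sum(1 for c in counts if c[word] > 0)  # number of texts containing the word
        # Words absent from every text get IDF 0 here so that they are filtered out.
        idf[word] = math.log(n_texts / df) if df else 0.0
        matrix[word] = [
            (c[word] / len(tokens) if tokens else 0.0) * idf[word]
            for c, tokens in zip(counts, texts)
        ]
    return matrix, idf


def filter_weak_words(vocab, idf, threshold):
    """Keep only words whose IDF reaches the threshold (weakly related words are dropped)."""
    return [w for w in vocab if idf[w] >= threshold]


# Hypothetical text set and word list.
texts = [["car", "engine", "oil"], ["car", "tire"], ["car", "engine", "price"]]
vocab = ["car", "engine", "tire"]

matrix, idf = tf_idf_matrix(vocab, texts)
kept = filter_weak_words(vocab, idf, threshold=0.3)
```

In this toy example "car" occurs in every text, so its IDF is zero and it is removed, while the more specific words remain in the preliminary topic word list.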
Based on the same inventive concept, please refer to fig. 5, the invention provides an information retrieval system for a vehicle machine, comprising:
a topic keyword obtaining module 100, configured to obtain a user topic keyword;
a topic keyword related vocabulary acquiring module 200, communicatively connected to the topic keyword acquiring module 100, for acquiring a topic keyword related text according to the acquired user topic keyword, constructing an LDA model of the topic keyword related text, and acquiring a topic keyword related vocabulary;
a preliminary topic vocabulary acquiring module 300, communicatively connected to the topic keyword acquiring module 100 and the topic keyword related vocabulary acquiring module 200, for filtering weakly related topic vocabularies via a TF-IDF model by using the topic keyword related vocabulary as input to acquire a preliminary topic vocabulary related to the topic keyword;
a related information obtaining module 400, communicatively connected to the topic keyword obtaining module 100 and the preliminary topic vocabulary obtaining module 300, for sequentially retrieving and capturing relevant information of words and phrases by using the user topic keyword as a main part and the preliminary topic vocabulary as an auxiliary part, and obtaining related information including structured data and unstructured document data;
and the vehicle-mounted device search result acquisition module 500 is in communication connection with the related information acquisition module 400, and is used for fusing the structured data and the unstructured data to acquire a vehicle-mounted device search result.
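The way the five modules hand data to one another can be sketched as a small pipeline. This is only a structural sketch: the class name, field names and stub functions are hypothetical, and each stage stands in for the corresponding module (100 to 500) rather than implementing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RetrievalPipeline:
    """Hypothetical wiring of modules 100-500; each field is a pluggable stage."""
    get_keyword: Callable[[str], str]                        # module 100
    get_related_vocab: Callable[[str], List[str]]            # module 200 (LDA)
    filter_vocab: Callable[[str, List[str]], List[str]]      # module 300 (TF-IDF)
    fetch_info: Callable[[str, List[str]], Dict[str, list]]  # module 400
    fuse: Callable[[list, list], list]                       # module 500

    def run(self, instruction: str) -> list:
        keyword = self.get_keyword(instruction)
        related = self.get_related_vocab(keyword)
        preliminary = self.filter_vocab(keyword, related)
        info = self.fetch_info(keyword, preliminary)
        return self.fuse(info["structured"], info["unstructured"])


# Stub stages, for illustration only.
pipeline = RetrievalPipeline(
    get_keyword=lambda s: s.strip(),
    get_related_vocab=lambda k: [k + " price", k + " range", "weather"],
    filter_vocab=lambda k, vocab: [w for w in vocab if k in w],
    fetch_info=lambda k, vocab: {
        "structured": [{"query": k}],
        "unstructured": ["doc about " + w for w in vocab],
    },
    fuse=lambda structured, unstructured: structured + unstructured,
)
result = pipeline.run(" EV ")
```

Passing the stages in as callables keeps the sketch independent of any particular LDA, TF-IDF or crawler implementation.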
In an embodiment, the topic keyword obtaining module further includes:
the user instruction acquisition submodule is used for acquiring a user instruction;
the vocabulary type acquisition submodule is in communication connection with the user instruction acquisition submodule and is used for judging the vocabulary type of the user instruction;
the first topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for taking the word instruction as a topic keyword when the vocabulary type of the user instruction is a word;
and the second topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for acquiring the topic keywords related to the short sentence instruction when the vocabulary type of the user instruction is a short sentence.
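The branching between a word instruction and a short sentence instruction can be sketched as below. The sketch assumes whitespace-separated text and a small hypothetical stop-word list; Chinese input would need a word segmenter instead, and the application does not specify how the short sentence is recognised.

```python
# Hypothetical stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "for", "to", "me", "show", "find", "please", "what", "is"}


def instruction_type(instruction: str) -> str:
    """Classify the instruction as a single 'word' or a 'phrase' (short sentence)."""
    return "word" if len(instruction.split()) == 1 else "phrase"


def topic_keywords(instruction: str) -> list:
    """Word: use it directly as the topic keyword. Phrase: keep the non-stop-word tokens."""
    tokens = instruction.lower().split()
    if len(tokens) == 1:
        return tokens
    return [t for t in tokens if t not in STOPWORDS]


single = topic_keywords("charging")
phrase = topic_keywords("find the nearest charging station")
```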
In an embodiment, the topic keyword related vocabulary acquiring module further includes:
the related text acquisition sub-module is in communication connection with the topic keyword acquisition module and is used for capturing, through a search engine, texts related to the topic keywords according to the acquired user topic keywords;
and the LDA model acquisition sub-module is in communication connection with the related text acquisition sub-module and is used for constructing an LDA model of the text related to the subject keywords.
According to the information retrieval method for the vehicle machine, an LDA model of the text related to the topic keywords is constructed, the TF-IDF model is used to filter out weakly related topic words to obtain the preliminary topic vocabulary related to the topic keywords, and related information comprising structured data and unstructured document data is obtained through retrieval to obtain the vehicle-machine retrieval result. The retrieval range is thereby expanded, the retrieval result can be presented with high quality and deployed on a vehicle-machine system, and the problem of fast and accurate information retrieval in the vertical field of automobiles is solved.
Based on the same inventive concept, the embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements all or part of the method steps of the above method.
All or part of the processes of the above methods may also be implemented by a computer program instructing related hardware, where the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program running on the processor, and the processor executes the computer program to implement all or part of the method steps in the method.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor or the like; the processor is the control center of the computer device and connects the various parts of the overall computer device through various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory, as well as by invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (e.g., a sound playing function, an image playing function, etc.); the storage data area may store data (e.g., audio data, video data, etc.) created according to the use of the cellular phone. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), servers and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. The information retrieval method for the vehicle machine is characterized by comprising the following steps of:
step S1, obtaining a user theme keyword;
step S2, according to the obtained user topic keywords, obtaining relevant texts of the topic keywords, constructing an LDA model of the relevant texts of the topic keywords, and obtaining relevant word lists of the topic keywords;
step S3, filtering weakly-relevant words of the subject through a TF-IDF model by taking the relevant word list of the subject key words as input, and acquiring a primary subject word list relevant to the subject key words;
step S4, taking the user theme key words as the main part and the preliminary theme word list as the auxiliary part, sequentially retrieving and capturing relevant word information, and acquiring relevant information comprising structured data and unstructured document data;
and S5, fusing the structured data and the unstructured data to obtain a vehicle-mounted device retrieval result.
2. The information retrieval method for the vehicle machine according to claim 1, wherein the step S1 specifically includes the following steps:
s11, acquiring a user instruction;
s12, judging the vocabulary type of the user instruction;
s131, when the vocabulary type of the user instruction is a word, taking the word instruction as a topic keyword;
s132, when the vocabulary type of the user instruction is a short sentence, identifying the short sentence instruction, and acquiring the topic key words related to the short sentence instruction.
3. The information retrieval method for the vehicle machine according to claim 1, wherein the step S2 specifically includes the following steps:
s21, according to the obtained user topic keywords, the search engine captures texts related to the topic keywords;
s22, constructing an LDA model of the text related to the subject key words, and obtaining a word list related to the subject key words.
4. The information retrieval method for the vehicle machine according to claim 1, wherein the step S3 specifically includes the following steps:
constructing a [Figure FDA0003536231020000021] matrix, in which the vector [Figure FDA0003536231020000022] represents the capacity of the vocabulary and the vector [Figure FDA0003536231020000023] represents the capacity of the text set;
obtaining the frequency TF of each word in the related word list of the subject key words, the reciprocal IDF of the number of all texts containing the word, and the product of TF and IDF, and assigning the product to the corresponding position in the [Figure FDA0003536231020000024] matrix;
obtaining an IDF value of each word in a text set in a word list related to the subject key words;
comparing the IDF value of each word with the IDF threshold value;
and filtering out words with the IDF value lower than the IDF threshold value in the influence factor distribution result, and acquiring a preliminary subject word list related to the subject key words.
5. The information retrieval method for the vehicle machine according to claim 1, wherein the step S4 specifically includes the steps of:
step S41, constructing a policy network;
and step S42, taking the user theme key words as the main part and taking the preliminary theme word list as the auxiliary part, sequentially searching and capturing relevant word information through a strategy network, and acquiring relevant information comprising structured data and unstructured document data.
6. The information retrieval method for the vehicle machine according to claim 5, wherein the step S41 specifically includes the following steps:
s411, obtaining the theme relevance of the text content of the service webpage to be crawled;
step S412, obtaining the link authority of the service type web page to be crawled;
step S413, fusing the obtained theme correlation and the link authority to obtain resource sequencing of the high-quality webpage;
and S414, constructing a policy network comprising the priority of the URL to be crawled according to the resource sequencing of the acquired high-quality webpage.
7. The information retrieval method for the vehicle machine as claimed in claim 6, wherein the policy network in step S414 includes an input layer, a convolution layer, a Softmax layer and an output layer, the input layer is configured to read the feature matrix of the set of the service-class web pages to be crawled, the convolution layer is configured to evaluate the quality of each service-class web page to be crawled, the Softmax layer is configured to obtain the click probability of each service-class web page to be crawled, and the output layer is configured to output the click probability distribution of the set of service-class web pages to be crawled.
8. An information retrieval system for a vehicle machine, characterized by comprising:
the theme key word acquisition module is used for acquiring a user theme key word;
the topic keyword related word list acquisition module is in communication connection with the topic keyword acquisition module and is used for acquiring a topic keyword related text according to the acquired user topic keyword, constructing an LDA model of the topic keyword related text and acquiring a topic keyword related word list;
a preliminary topic word list acquisition module which is in communication connection with the topic keyword acquisition module and the topic keyword related word list acquisition module and is used for filtering weakly related topic words through a TF-IDF model by taking the topic keyword related word list as input to acquire a preliminary topic word list related to the topic keyword;
the related information acquisition module is in communication connection with the theme keyword acquisition module and the preliminary theme vocabulary acquisition module, and is used for sequentially searching and capturing related word information by taking the user theme keywords as a main part and the preliminary theme vocabulary as an auxiliary part to acquire related information comprising structured data and unstructured document data;
and the vehicle machine retrieval result acquisition module is in communication connection with the related information acquisition module and is used for fusing the structured data and the unstructured data to acquire a vehicle machine retrieval result.
9. The information retrieval system for the in-vehicle machine as recited in claim 8, wherein the topic keyword obtaining module further comprises:
the user instruction acquisition submodule is used for acquiring a user instruction;
the vocabulary type acquisition submodule is in communication connection with the user instruction acquisition submodule and is used for judging the vocabulary type of the user instruction;
the first topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for taking the word instruction as a topic keyword when the vocabulary type of the user instruction is a word;
and the second topic keyword acquisition sub-module is in communication connection with the user instruction acquisition sub-module and the vocabulary type acquisition sub-module and is used for acquiring the topic keywords related to the short sentence instruction when the vocabulary type of the user instruction is a short sentence.
10. The information retrieval system for the in-vehicle machine as recited in claim 8, wherein the topic keyword related vocabulary obtaining module further comprises:
the related text acquisition sub-module is in communication connection with the topic keyword acquisition module and is used for capturing, through a search engine, texts related to the topic keywords according to the acquired user topic keywords;
and the LDA model acquisition sub-module is in communication connection with the related text acquisition sub-module and is used for constructing an LDA model of the text related to the subject key words.
CN202210219091.2A 2022-03-08 2022-03-08 Information retrieval method and system for vehicle machine Pending CN114706978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210219091.2A CN114706978A (en) 2022-03-08 2022-03-08 Information retrieval method and system for vehicle machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210219091.2A CN114706978A (en) 2022-03-08 2022-03-08 Information retrieval method and system for vehicle machine

Publications (1)

Publication Number Publication Date
CN114706978A true CN114706978A (en) 2022-07-05

Family

ID=82168772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210219091.2A Pending CN114706978A (en) 2022-03-08 2022-03-08 Information retrieval method and system for vehicle machine

Country Status (1)

Country Link
CN (1) CN114706978A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078895A1 (en) * 2010-09-24 2012-03-29 International Business Machines Corporation Source expansion for information retrieval and information extraction
CN104765862A (en) * 2015-04-22 2015-07-08 百度在线网络技术(北京)有限公司 Document retrieval method and device
CN110555154A (en) * 2019-08-30 2019-12-10 北京科技大学 theme-oriented information retrieval method
CN110807326A (en) * 2019-10-24 2020-02-18 江汉大学 Short text keyword extraction method combining GPU-DMM and text features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
张敬伟等: "基于非结构化文本检索模型综述", 《计算机应用研究》, 15 June 2017 (2017-06-15) *
李旭婕等: "基于开放信息源的实体挖掘方法研究", 《情报科学》, 29 July 2019 (2019-07-29) *
邓丁朋: "文本主题建模技术研究与实现", 《信息科技》, 9 June 2020 (2020-06-09) *
郑健珍: "定题爬虫搜索策略研究", 中国优秀硕士学位论文全文数据库, no. 07, 1 April 2007 (2007-04-01) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination