CN111368036A - Method and apparatus for searching information - Google Patents


Info

Publication number
CN111368036A
Authority
CN
China
Prior art keywords
novel
name
search request
existing
search
Legal status
Granted
Application number
CN202010147266.4A
Other languages
Chinese (zh)
Other versions
CN111368036B (en)
Inventor
郎添娇
赵旭
郭宣佑
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010147266.4A
Publication of CN111368036A
Application granted
Publication of CN111368036B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a method and an apparatus for searching information. One embodiment of the method includes: receiving a search request input by a user; determining whether the search type corresponding to the search request is the novel search type; if it is, inputting the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, where the analytical model is used to identify at least one of a novel name, an author name, and a hero name; and performing novel retrieval based on the retrieval expression to obtain a novel to be pushed, and pushing the novel to the user. This embodiment makes it possible to recall novels by retrieving on at least one of the novel name, the author name, and the hero name, which broadens the application scenarios and improves the recall rate of novels that match the user's reading needs.

Description

Method and apparatus for searching information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for searching information.
Background
Vertical search provides industry-oriented search systems for individual vertical domains. It can meet user needs at lower cost with more accurate, specialized results, precisely matching users with high-quality vertical resources. Novels form a vertical with a large audience and rich resources, and novel vertical search technology serves the range of reading needs that novel readers express through search terms (queries).
At present, the novel database stores key information such as novel names, author names, and brief descriptions. A novel can be recalled only if the user enters its name, or its name together with its author's name. If the user has forgotten both the novel name and the author name and searches using other information about the novel, the corresponding novel cannot be recalled.
Disclosure of Invention
The embodiment of the application provides a method and a device for searching information.
In a first aspect, an embodiment of the present application provides a method for searching information, including: receiving a search request input by a user; determining whether the search type corresponding to the search request is the novel search type; if it is, inputting the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, where the analytical model is used to identify at least one of a novel name, an author name, and a hero name; and performing novel retrieval based on the retrieval expression to obtain a novel to be pushed, and pushing the novel to the user.
In some embodiments, determining whether the search type corresponding to the search request is the novel search type includes: inputting the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, where the trigger model is used to identify the search type based on at least one of the novel name, the author name, and the hero name.
In some embodiments, performing novel retrieval based on the retrieval expression to obtain a novel to be pushed includes: searching a pre-generated set of novel summary information based on the retrieval expression to determine the novel to be pushed, where the novel summary information includes a novel name, an author name, and a hero name.
In some embodiments, searching the pre-generated set of novel summary information based on the retrieval expression to determine the novel to be pushed includes: calculating the relevance between the retrieval expression and each item of novel summary information in the set, and determining a candidate novel set; calculating the relevance between the search request and each candidate novel in the candidate novel set, and determining a set of novels to be selected; and ranking and de-duplicating the set of novels to be selected based on their popularity and their relevance to the search request, and determining the novel to be pushed.
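The final ranking and de-duplication step can be illustrated with a minimal Python sketch. The patent does not say how popularity ("heat") and relevance are combined or what counts as a duplicate, so the weighted sum, the weights `w_pop` and `w_rel`, and the use of the (novel name, author name) pair as the de-duplication key are all assumptions; `Candidate` and `rank_and_dedupe` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A novel in the set of novels to be selected (hypothetical structure)."""
    novel_id: str
    name: str
    author: str
    popularity: float  # "heat", assumed normalized to [0, 1]
    relevance: float   # relevance to the original search request, assumed in [0, 1]


def rank_and_dedupe(candidates: List[Candidate], w_pop: float = 0.3,
                    w_rel: float = 0.7, top_n: int = 10) -> List[Candidate]:
    """Sort by an assumed weighted score, then keep one entry per (name, author)."""
    ordered = sorted(candidates,
                     key=lambda c: w_pop * c.popularity + w_rel * c.relevance,
                     reverse=True)
    seen = set()
    result = []
    for c in ordered:
        key = (c.name, c.author)  # assumed duplicate criterion
        if key in seen:
            continue
        seen.add(key)
        result.append(c)
        if len(result) == top_n:
            break
    return result
```

Because the list is sorted before de-duplication, the copy that survives for each duplicate group is the highest-scoring one.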
In some embodiments, generating the novel summary information includes: acquiring the existing chapters of a novel; performing word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; selecting the set of person names of the existing chapters from the word segmentation result based on the part-of-speech analysis result; determining the set of hero names of the existing chapters from that set of person names; and generating the novel summary information based on the set of hero names of the existing chapters.
In some embodiments, performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model includes: first segmenting the content of the existing chapters with the NLP shallow lexical analysis model to obtain a vocabulary set; then recombining the vocabulary set to obtain a word sequence whose semantics satisfy a preset condition; and determining the part of speech of each word in the word sequence, where the preset condition includes at least one of the following: the semantics are reasonable, and the semantics are complete.
In some embodiments, determining the set of hero names of the existing chapters from the set of person names of the existing chapters includes: merging similar names in the set of person names to generate a merged name set; filtering the merged name set against a pre-generated stop word list to generate a set of role names of the existing chapters; and counting the frequency of each role name in the set of role names and selecting the set of hero names of the existing chapters from it.
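The three sub-steps (merging, stop-word filtering, frequency counting) can be sketched in Python as follows. The sketch assumes the shallow lexical analysis model has already produced `(word, part_of_speech)` pairs and that person names carry the tag `"PER"`; the rule "merge a name into the longest other name that contains it" is a stand-in for the unspecified similarity merge, and `merge_similar` and `hero_names` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def merge_similar(names: List[str]) -> List[str]:
    """Map each name to the longest other name containing it (assumed merge rule)."""
    unique = sorted(set(names), key=len, reverse=True)  # longest names first
    canonical = {}
    for n in unique:
        # First match in length-descending order is the longest containing name.
        canonical[n] = next((m for m in unique if len(m) > len(n) and n in m), n)
    return [canonical[n] for n in names]


def hero_names(tagged: Iterable[Tuple[str, str]], stop_words: Set[str],
               top_k: int = 3, person_tag: str = "PER") -> List[str]:
    """Select person names, merge, drop stop words, and keep the top_k most frequent."""
    people = [word for word, pos in tagged if pos == person_tag]
    roles = [n for n in merge_similar(people) if n not in stop_words]
    return [name for name, _ in Counter(roles).most_common(top_k)]
```

For example, "Harry" and "Harry Potter" are counted as one role, while a generic term such as "Mom" placed in the stop word list never becomes a hero name.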
In some embodiments, generating the novel summary information further includes: if the novel has an updated chapter, acquiring the updated chapter; determining the set of hero names of the updated chapter; and updating the novel summary information based on that set of hero names.
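One way to support incremental updates, sketched below, is to keep running role-name frequencies per novel and recompute the hero names after each new chapter. Storing counts in a `Counter` and taking the `top_k` most frequent names are assumptions; `NovelSummary` and `add_chapter` are hypothetical names, and the role names passed in are assumed to be already merged and stop-word filtered.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class NovelSummary:
    """Hypothetical in-memory novel summary record."""
    name: str
    author: str
    top_k: int = 3
    role_counts: Counter = field(default_factory=Counter)

    @property
    def hero_names(self) -> List[str]:
        # Hero names are recomputed from the accumulated frequencies.
        return [n for n, _ in self.role_counts.most_common(self.top_k)]

    def add_chapter(self, role_names: Iterable[str]) -> None:
        """Fold the role names found in an updated chapter into the summary."""
        self.role_counts.update(role_names)
```

Keeping counts rather than only the current hero set lets a character who becomes prominent in later chapters overtake earlier ones without reprocessing the whole novel.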
In some embodiments, training the trigger model includes: acquiring a first training sample set, where each first training sample in the set includes a first sample search request and a corresponding first sample search type label; and, for each first training sample in the first training sample set, taking the first sample search request as input and the first sample search type label as output, and training to obtain the trigger model.
In some embodiments, training the analytical model includes: acquiring a second training sample set, where each second training sample in the set includes a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression includes at least one of a novel name, an author name, and a hero name; and, for each second training sample in the second training sample set, taking the second sample search request as input and the second sample retrieval expression as output, and training to obtain the analytical model.
In a second aspect, an embodiment of the present application provides an apparatus for searching information, including: a receiving unit configured to receive a search request input by a user; a determination unit configured to determine whether the search type corresponding to the search request is the novel search type; an analysis unit configured to, if the search type corresponding to the search request is the novel search type, input the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, where the analytical model is used to identify at least one of a novel name, an author name, and a hero name; and a retrieval unit configured to perform novel retrieval based on the retrieval expression to obtain a novel to be pushed, and push the novel to the user.
In some embodiments, the determination unit includes: a trigger subunit configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, where the trigger model is used to identify the search type based on at least one of the novel name, the author name, and the hero name.
In some embodiments, the retrieval unit includes: a search subunit configured to search a pre-generated set of novel summary information based on the retrieval expression and determine the novel to be pushed, where the novel summary information includes a novel name, an author name, and a hero name.
In some embodiments, the search subunit includes: a first calculation module configured to calculate the relevance between the retrieval expression and each item of novel summary information in the set, and determine a candidate novel set; a second calculation module configured to calculate the relevance between the search request and each candidate novel in the candidate novel set, and determine a set of novels to be selected; and a sorting and de-duplication module configured to rank and de-duplicate the set of novels to be selected based on their popularity and their relevance to the search request, and determine the novel to be pushed.
In some embodiments, generating the novel summary information includes: acquiring the existing chapters of a novel; performing word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; selecting the set of person names of the existing chapters from the word segmentation result based on the part-of-speech analysis result; determining the set of hero names of the existing chapters from that set of person names; and generating the novel summary information based on the set of hero names of the existing chapters.
In some embodiments, performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model includes: first segmenting the content of the existing chapters with the NLP shallow lexical analysis model to obtain a vocabulary set; then recombining the vocabulary set to obtain a word sequence whose semantics satisfy a preset condition; and determining the part of speech of each word in the word sequence, where the preset condition includes at least one of the following: the semantics are reasonable, and the semantics are complete.
In some embodiments, determining the set of hero names of the existing chapters from the set of person names of the existing chapters includes: merging similar names in the set of person names to generate a merged name set; filtering the merged name set against a pre-generated stop word list to generate a set of role names of the existing chapters; and counting the frequency of each role name in the set of role names and selecting the set of hero names of the existing chapters from it.
In some embodiments, generating the novel summary information further includes: if the novel has an updated chapter, acquiring the updated chapter; determining the set of hero names of the updated chapter; and updating the novel summary information based on that set of hero names.
In some embodiments, training the trigger model includes: acquiring a first training sample set, where each first training sample in the set includes a first sample search request and a corresponding first sample search type label; and, for each first training sample in the first training sample set, taking the first sample search request as input and the first sample search type label as output, and training to obtain the trigger model.
In some embodiments, training the analytical model includes: acquiring a second training sample set, where each second training sample in the set includes a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression includes at least one of a novel name, an author name, and a hero name; and, for each second training sample in the second training sample set, taking the second sample search request as input and the second sample retrieval expression as output, and training to obtain the analytical model.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
The method and apparatus for searching information provided by the embodiments of the present application first determine whether the search type corresponding to a search request input by a user is the novel search type; then, if it is, input the search request into the analytical model to obtain a retrieval expression corresponding to the search request; and finally, perform novel retrieval based on the retrieval expression to obtain a novel to be pushed and push it to the user. Because the analytical model can identify at least one of the novel name, the author name, and the hero name in a search request of the novel search type, it solves the prior-art problem that a hero name in a search request either cannot be identified or is misidentified as an author name. Retrieving on at least one of these three names broadens the application scenarios and improves the recall rate of novels that match the user's reading needs.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for searching information according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method for searching information according to the present application;
FIG. 4 is a flow diagram of one embodiment of a novel summary information generation method according to the present application;
FIG. 5 is a flow chart of an application scenario of a method for searching information according to the present application;
FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for searching information according to the present application;
FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for searching information or the apparatus for searching information of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as a reading application, may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any electronic device that supports information search, including but not limited to a smartphone, a tablet computer, a laptop computer, and a desktop computer. When the terminal device 101 is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited herein.
The server 103 may be a server providing various services, for example, a background server of a reading application. The background server may analyze and process data such as a search request received from the terminal device 101 and feed the processing result (for example, a novel to be pushed) back to the terminal device 101.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.
It should be noted that the method for searching for information provided in the embodiment of the present application is generally executed by the server 103, and accordingly, the apparatus for searching for information is generally disposed in the server 103.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for searching information in accordance with the present application is shown. The method for searching information includes the steps of:
Step 201, receiving a search request input by a user.
In this embodiment, the execution subject of the method for searching information (for example, the server 103 shown in FIG. 1) may receive a search request input by a user from a terminal device (for example, the terminal device 101 shown in FIG. 1). The search request may include search information (a query) input by the user, which describes the user's reading need. For example, if the user wants to read a particular novel, the search information typically includes at least one of that novel's name, author name, and hero name. Specifically, the user may open a reading application installed on the terminal device, enter search information containing at least one of a novel name, an author name, and a hero name in the input box, and click the search button, whereupon the terminal device sends the search request to the execution subject.
Step 202, determining whether the search type corresponding to the search request is a novel search type.
In this embodiment, the execution subject may determine whether the search type corresponding to the search request is the novel search type. If so, step 203 is performed.
Step 203, if the search type corresponding to the search request is the novel search type, inputting the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request.
In this embodiment, if the search type corresponding to the search request is the novel search type, the execution subject may input the search request into the analytical model to obtain the retrieval expression corresponding to the search request. The analytical model may be used to identify at least one of a novel name, an author name, and a hero name, and the retrieval expression may be spliced together from at least one of these names. Specifically, the analytical model may perform word segmentation and constituent analysis on the search request, identify at least one of the novel name, the author name, and the hero name, and splice them into the retrieval expression.
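The splicing step can be sketched as below, assuming the analytical model has already filled three optional slots. The `slot:value` terms joined by `AND` are a hypothetical expression syntax chosen for illustration; the patent does not specify the format, and `ParsedQuery` and `splice_expression` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedQuery:
    """Slots that the analytical model is assumed to fill from a search request."""
    novel_name: Optional[str] = None
    author_name: Optional[str] = None
    hero_name: Optional[str] = None


def splice_expression(parsed: ParsedQuery) -> str:
    """Join the recognized names into a retrieval expression (assumed syntax)."""
    slots = (("novel", parsed.novel_name),
             ("author", parsed.author_name),
             ("hero", parsed.hero_name))
    # Only slots the model actually recognized contribute a term.
    return " AND ".join(f"{slot}:{value}" for slot, value in slots if value)
```

For a query naming only a novel and its hero, `splice_expression(ParsedQuery(novel_name="Novel A", hero_name="Hero A"))` returns `"novel:Novel A AND hero:Hero A"`; labeling the hero name explicitly is what keeps it from being treated as an author name.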
In some optional implementations of this embodiment, the analytical model may be obtained by training as follows:
First, a second training sample set is obtained.
Each second training sample in the second training sample set may include a second sample search request and a corresponding second sample retrieval expression. The search type of the second sample search request may be the novel search type, and the corresponding second sample retrieval expression may be spliced together from at least one of a novel name, an author name, and a hero name.
In addition, to address the technical problem that a search request containing only a novel name and a hero name, or only a hero name, cannot be identified and recalled, the execution subject may also obtain the hero name set of a novel and construct second sample search requests based on it, thereby obtaining a plurality of second training samples.
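Constructing extra second training samples from a novel's hero name set might look like the following. The two query templates (hero name alone, and novel name followed by hero name) and the `slot:value AND ...` expression syntax are assumptions made for illustration; `build_parser_samples` is a hypothetical helper.

```python
from typing import List, Tuple


def build_parser_samples(novel_name: str,
                         hero_names: List[str]) -> List[Tuple[str, str]]:
    """Return (sample search request, sample retrieval expression) pairs."""
    samples = []
    for hero in hero_names:
        # Query containing only the hero name.
        samples.append((hero, f"hero:{hero}"))
        # Query containing the novel name followed by the hero name.
        samples.append((f"{novel_name} {hero}",
                        f"novel:{novel_name} AND hero:{hero}"))
    return samples
```

These two templates correspond to the two query shapes the paragraph above identifies as previously unrecallable.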
Then, for each second training sample in the second training sample set, the second sample search request is taken as input and the second sample retrieval expression as output, and the analytical model is obtained by training.
Generally, the execution subject may train an RNN (Recurrent Neural Network) model with the second training samples, optimize the model according to its accuracy, and finally generate the analytical model.
Step 204, performing novel retrieval based on the retrieval expression to obtain the novel to be pushed, and pushing the novel to the user.
In this embodiment, the execution subject may perform novel retrieval based on the retrieval expression, obtain the novel to be pushed, and push it to the user. For example, the execution subject may retrieve novels that contain the words in the retrieval expression and push them to the user as novels to be pushed.
In some optional implementations of this embodiment, the execution subject may search a pre-generated set of novel summary information based on the retrieval expression to determine the novel to be pushed. For example, the execution subject may retrieve, from the set, the novel summary information that contains the words in the retrieval expression, and push the corresponding novels to the user as the novels to be pushed. The novel summary information may be in XML format, and its content includes, but is not limited to, the novel name, author name, hero name, category, tags, and novel ID.
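A minimal sketch of matching retrieval terms against XML summary records is shown below, using Python's standard `xml.etree.ElementTree`. The element names (`novel`, `name`, `author`, `hero`, `category`), the `id` attribute, the sample records, and the "any term appears" matching rule are assumptions; the patent only lists the kinds of fields the XML contains.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical summary records; element and attribute names are assumptions.
SUMMARY_XML = """<novels>
  <novel id="1001">
    <name>Novel A</name><author>Author A</author><hero>Hero A</hero><category>fantasy</category>
  </novel>
  <novel id="1002">
    <name>Novel B</name><author>Author B</author><hero>Hero B</hero><category>romance</category>
  </novel>
</novels>"""


def retrieve(xml_text: str, terms: List[str]) -> List[str]:
    """Return IDs of novels whose summary fields contain any of the terms."""
    root = ET.fromstring(xml_text)
    hits = []
    for novel in root.iter("novel"):
        # Concatenate the text of the direct child elements (name, author, hero, ...).
        text = " ".join(child.text or "" for child in novel)
        if any(term in text for term in terms):
            hits.append(novel.get("id"))
    return hits
```

Searching the summary records rather than full chapter text is what makes a hero name searchable at all, since the hero names are extracted into the summary ahead of time.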
In the method for searching information provided by this embodiment of the application, it is first determined whether the search type corresponding to the search request input by the user is the novel search type; then, if it is, the search request is input into the analytical model to obtain the corresponding retrieval expression; finally, novel retrieval is performed based on the retrieval expression to obtain a novel to be pushed, which is pushed to the user. Because the analytical model can identify at least one of the novel name, the author name, and the hero name in a search request of the novel search type, it solves the prior-art problem that a hero name in a search request either cannot be identified or is misidentified as an author name. Retrieving on at least one of these names broadens the application scenarios and improves the recall rate of novels that match the user's reading needs.
With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for searching information according to the present application is shown. The method for searching information includes the steps of:
Step 301, receiving a search request input by a user.
In this embodiment, the specific operation of step 301 has been described in detail in step 201 in the embodiment shown in fig. 2, and is not described herein again.
Step 302, inputting the search request to a pre-trained trigger model to obtain a search type corresponding to the search request.
In this embodiment, the execution subject of the method for searching information (for example, the server 103 shown in FIG. 1) may input the search request into the trigger model to obtain the search type corresponding to the search request. The trigger model may be used to identify the search type based on at least one of the novel name, the author name, and the hero name. Specifically, the trigger model may analyze the search request and determine the probability that it belongs to each of N preset search types (N being a positive integer). The N preset search types may include, but are not limited to, a novel search type, a news event search type, an encyclopedia search type, and a weather search type.
In some optional implementations of this embodiment, when the novel search type is one of the preset types, the trigger model can identify and recall search requests that describe a reading need for a novel. Generally, the trigger model identifies the probability that a search request belongs to the novel search type. If the probability is greater than a preset probability threshold, the search type corresponding to the search request is the novel search type; otherwise, it is a non-novel search type.
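The threshold rule can be expressed as a small Python function. The dictionary of per-type probabilities, the type key `"novel"`, and the default threshold of 0.5 are assumptions; only the strict "greater than the threshold" comparison comes from the text above.

```python
from typing import Dict


def is_novel_search(type_probs: Dict[str, float], threshold: float = 0.5) -> bool:
    """True if the trigger model's novel-type probability exceeds the threshold.

    `type_probs` maps each of the N preset search types to the probability
    the trigger model assigns to it (hypothetical representation).
    """
    # A probability equal to the threshold is treated as non-novel.
    return type_probs.get("novel", 0.0) > threshold
```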
In some optional implementations of this embodiment, the trigger model may be trained by:
First, a first training sample set is obtained.
A first training sample in the first training sample set may include a first sample search request and a corresponding first sample search type label. If the first sample search request contains at least one of a novel name, an author name, and a hero name, its search type is the novel search type, its first sample search type label is 1, and the first training sample is a positive sample. If the first sample search request contains none of the novel name, the author name, and the hero name, its search type is a non-novel search type, its label is 0, and the first training sample is a negative sample.
In addition, to solve the technical problem that search requests containing only a novel name and a hero name, or only a hero name, cannot be identified and recalled, the execution body may also obtain the hero name sets of novels and construct first sample search requests from them, thereby obtaining additional first training samples.
Second, for each first training sample in the first training sample set, the first sample search request is taken as input and the first sample search type label as the expected output, and the trigger model is obtained through training.
Generally, the execution body may train a binary classification model with the first training samples to obtain the trigger model. In addition, manual rules may be set alongside model training, including a whitelist (a request that matches the list is recalled) and a blacklist (a request that matches the list is not recalled), among other strategies. Depending on the training effect, a certain number of negative samples are added and the model is iteratively optimized, so that the false recall rate is reduced while the required recall rate is maintained.
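The labeling rule and the extra samples built from hero name sets can be sketched as below. This assumes plain substring matching against known name lists (a simplification; a production system would likely use entity recognition), and the query templates "novel name + hero name" and "hero name alone" are hypothetical choices that mirror the two failure cases named in the text.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_first_samples(
    queries: Iterable[str],
    novel_names: Set[str],
    author_names: Set[str],
    hero_sets_by_novel: Dict[str, List[str]],  # hypothetical: novel name -> its hero names
) -> List[Tuple[str, int]]:
    """Return (first sample search request, label) pairs; 1 = novel search type, 0 = not."""
    hero_names = {h for heroes in hero_sets_by_novel.values() for h in heroes}
    known = set(novel_names) | set(author_names) | hero_names
    # Label logged queries: positive if any known name occurs in the query.
    samples = [(q, 1 if any(name in q for name in known) else 0) for q in queries]
    # Add positives covering "novel name + hero name" and "hero name only" requests.
    for novel, heroes in hero_sets_by_novel.items():
        for hero in heroes:
            samples.append((f"{novel} {hero}", 1))
            samples.append((hero, 1))
    return samples
```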
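One way to layer the manual whitelist and blacklist rules over the trained binary classifier is sketched below. The exact-match comparison after trimming whitespace, the precedence of the blacklist over the whitelist, and the 0.5 threshold are all assumptions; the text does not say how conflicts between the lists are resolved.

```python
from typing import Callable, Set


def trigger_with_rules(
    query: str,
    model_prob: Callable[[str], float],  # hypothetical: returns P(novel search type)
    whitelist: Set[str],
    blacklist: Set[str],
    threshold: float = 0.5,
) -> bool:
    """Decide whether to recall a query as a novel search, letting manual rules override the model."""
    normalized = query.strip()
    if normalized in blacklist:   # matches the blacklist: never recalled
        return False
    if normalized in whitelist:   # matches the whitelist: always recalled
        return True
    return model_prob(normalized) > threshold  # otherwise defer to the classifier
```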
Step 303, if the search type corresponding to the search request is the novel search type, inputting the search request to a pre-trained analysis model to obtain a retrieval expression corresponding to the search request.
In this embodiment, the specific operation of step 303 has been described in detail in step 203 in the embodiment shown in fig. 2, and is not described herein again.
Step 304: calculate the relevance between the retrieval expression and each piece of novel summary information in the novel summary information set, and determine a candidate novel set.
In this embodiment, the execution body may calculate the relevance between the retrieval expression and the novel summary information in the novel summary information set, and determine the candidate novel set. For example, the execution body may select the novel summary information with the highest relevance ranks (e.g., the top 20) from the set and use the corresponding novels as candidate novels to generate the candidate novel set. The retrieval expression may be formed by concatenating at least one of a novel name, an author name, and a hero name. The more words of the retrieval expression a piece of novel summary information contains, the higher its relevance to the retrieval expression.
Step 305: calculate the relevance between the search request and each candidate novel in the candidate novel set, and determine the novel set to be selected.
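The term-overlap relevance and top-20 selection described above can be sketched as follows. It assumes each novel's summary information has been flattened into a set of terms (novel name, author name, hero names); the helper names and the choice to drop summaries sharing no term with the expression are hypothetical.

```python
from typing import Dict, List, Optional, Set


def build_expression(novel: Optional[str], author: Optional[str], hero: Optional[str]) -> List[str]:
    """Concatenate whichever parsed slots are present into a retrieval expression."""
    return [t for t in (novel, author, hero) if t]


def relevance(expression: List[str], summary_terms: Set[str]) -> int:
    """Count how many expression terms appear in a novel's summary information."""
    return sum(1 for term in expression if term in summary_terms)


def candidate_novels(expression: List[str], summaries: Dict[str, Set[str]], k: int = 20) -> List[str]:
    scored = [(relevance(expression, terms), novel) for novel, terms in summaries.items()]
    scored = [s for s in scored if s[0] > 0]  # assumption: ignore summaries with no overlap
    scored.sort(key=lambda s: -s[0])          # stable sort keeps input order on ties
    return [novel for _, novel in scored[:k]]
```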
In this embodiment, the execution body may calculate the relevance between the search request and each candidate novel in the candidate novel set, and determine the novel set to be selected. For example, for each candidate novel in the candidate novel set, the execution body may calculate signals such as the edit distance, closeness, and BM25 relevance between the search request and the candidate novel, and combine them into a relevance score. The execution body may then compare the relevance with a preset relevance threshold and, if the relevance is greater than the threshold, add the candidate novel to the novel set to be selected.
Step 306: sort and deduplicate the novel set to be selected based on the popularity of each novel in it and its relevance to the search request, determine the novel to be pushed, and push it to the user.
In this embodiment, the execution body may sort and deduplicate the novel set to be selected based on the popularity of each novel in it and its relevance to the search request, and determine the novel to be pushed. For example, the execution body may sort the novel set to be selected by combining popularity and relevance to the search request, then perform deduplication to produce a comprehensive ranking, and finally push the highest-ranked novels to the user as the novels to be pushed.
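A sketch of combining edit distance and BM25 into one relevance score with a threshold is shown below. The equal weights, the 0.5 threshold, normalizing BM25 by the best candidate's score, and comparing the query against the novel name for edit distance are all assumptions; the closeness signal is omitted because the text does not define it. BM25 uses the common k1 = 1.2, b = 0.75 defaults.

```python
import math
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bm25_scores(query_terms: List[str], docs: Dict[str, List[str]],
                k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 score of the query against each candidate's term list."""
    if not docs:
        return {}
    n = len(docs)
    avgdl = sum(len(t) for t in docs.values()) / n or 1.0
    scores = {}
    for doc_id, terms in docs.items():
        score = 0.0
        for q in set(query_terms):
            df = sum(1 for t in docs.values() if q in t)
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = terms.count(q)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(terms) / avgdl))
        scores[doc_id] = score
    return scores


def select_novels(query: str, query_terms: List[str], candidates: Dict[str, List[str]],
                  threshold: float = 0.5, w_edit: float = 0.5, w_bm25: float = 0.5) -> Dict[str, float]:
    """Keep candidates (novel name -> term list) whose combined relevance exceeds the threshold."""
    if not candidates:
        return {}
    bm25 = bm25_scores(query_terms, candidates)
    top = max(bm25.values()) or 1.0  # scale BM25 into [0, 1]
    selected = {}
    for name in candidates:
        edit_sim = 1 - edit_distance(query, name) / max(len(query), len(name), 1)
        rel = w_edit * edit_sim + w_bm25 * bm25[name] / top
        if rel > threshold:
            selected[name] = rel
    return selected
```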
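The sort-then-deduplicate step can be sketched as below. The linear weighting of popularity and relevance (0.3 and 0.7), the assumption that both are pre-normalized to [0, 1], and treating two entries with the same novel name and author as duplicates are hypothetical; the text does not give the combination formula or the duplicate criterion.

```python
from typing import List, NamedTuple


class Shortlisted(NamedTuple):
    novel: str
    author: str
    popularity: float  # assumed normalized to [0, 1]
    relevance: float   # assumed normalized to [0, 1]


def rank_and_dedupe(items: List[Shortlisted], w_pop: float = 0.3,
                    w_rel: float = 0.7, top_n: int = 1) -> List[Shortlisted]:
    """Sort by a weighted score, then keep only the best-scored copy of each (novel, author)."""
    ranked = sorted(items, key=lambda x: w_pop * x.popularity + w_rel * x.relevance, reverse=True)
    seen = set()
    result = []
    for item in ranked:
        key = (item.novel, item.author)
        if key in seen:
            continue  # a higher-scored copy was already kept
        seen.add(key)
        result.append(item)
    return result[:top_n]
```

Deduplicating after sorting guarantees that the surviving copy of each novel is its highest-scored one.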
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the method for searching for information in this embodiment highlights the triggering step and the retrieval step. In the scheme described in this embodiment, the trigger model identifies a search request containing at least one of a novel name, an author name, and a hero name as the novel search type. This solves the prior-art problem that search requests containing only a novel name and a hero name, or only a hero name, cannot be identified, and improves the trigger model's recall of novel-type search requests. In addition, the scheme filters layer by layer using the relevance between the retrieval expression and the novel summary information, the relevance between the search request and each novel, and the popularity of each novel, which improves how well the selected novels match the user's reading needs and thereby increases the click-through rate of the pushed novels.
With further reference to fig. 4, a flow 400 of an embodiment of a method for generating novel summary information according to the present application is shown. The method comprises the following steps:
Step 401: acquire the existing chapters of the novel.
In this embodiment, the execution body of the method for generating novel summary information (e.g., the server 103 shown in fig. 1) may acquire the existing chapters of the novel. Typically, the execution body retrieves the existing chapters of a novel from databases. For example, two databases may be preset: one storing the existing chapter directories of a large number of novels, and the other storing the existing chapters themselves. To acquire the existing chapters of a novel, the execution body may first look up the novel's existing chapter directory in the directory database, using the novel name as the index, and then look up the existing chapters in the chapter database, using the chapter directory as the index.
It should be understood that the execution body may acquire all existing chapters of the novel or only some of them. For example, if a novel has more than 1000 existing chapters, the execution body may acquire only its first 1000 chapters.
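The two-stage lookup with the 1000-chapter cap can be sketched as follows, with in-memory dictionaries standing in for the two databases. The dictionary layouts (novel name to ordered chapter IDs, chapter ID to text) and the function name are hypothetical.

```python
from typing import Dict, List

MAX_CHAPTERS = 1000  # cap taken from the example in the text


def fetch_existing_chapters(
    novel_name: str,
    directory_db: Dict[str, List[str]],  # hypothetical: novel name -> ordered chapter IDs
    chapter_db: Dict[str, str],          # hypothetical: chapter ID -> chapter text
    max_chapters: int = MAX_CHAPTERS,
) -> List[str]:
    """Look up the chapter directory by novel name, then fetch chapter texts by directory entry."""
    chapter_ids = directory_db.get(novel_name, [])[:max_chapters]
    return [chapter_db[cid] for cid in chapter_ids if cid in chapter_db]
```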
Step 402: perform word segmentation and part-of-speech analysis on the content of the existing chapters using a Natural Language Processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result.
In this embodiment, the execution body may perform word segmentation and part-of-speech analysis on the content of the existing chapters using an NLP (Natural Language Processing) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result. The NLP shallow lexical analysis model is built on massive internet data and uses a hybrid architecture combining a structured perceptron and a deep neural network to perform Chinese word segmentation and part-of-speech tagging.
In some optional implementations of this embodiment, the execution body may use the NLP shallow lexical analysis model to segment the content of the existing chapters into a vocabulary set, recombine the vocabulary set into a word sequence whose semantics satisfy a preset condition, and determine the part of speech of each word in the sequence. The words in the vocabulary set are basic-granularity words of the chapter content. The preset condition may include, but is not limited to, at least one of the following: the semantics are reasonable, the semantics are complete, and the like. The model therefore processes input text as a joint optimization of word granularity and part of speech. Compared with a pipeline that first segments words and then tags parts of speech, the joint approach lets the two tasks share features. This alleviates error propagation between the stages and avoids having to introduce various morpheme-level parts of speech to handle dispersion, which makes the part-of-speech tagging results more expressive.
Step 403: select the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result.
In this embodiment, the execution body may select the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result. For example, the execution body may select the words whose part of speech is person name from the word segmentation result to generate the person name set.
In addition, the execution body may count the frequency of each name in the person name set and store the counts as a dictionary with the structure { gid: { name: freq } }, where gid denotes the novel name, name denotes a person name, and freq denotes its frequency.
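Selecting person names from the tagged output and building the { gid: { name: freq } } dictionary might look like this. It assumes the lexical analyzer returns (word, tag) pairs and that "PER" marks person names; the real tag set depends on the analyzer used, so treat the constant as a placeholder.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

PERSON_TAG = "PER"  # assumed person-name tag; depends on the lexical analyzer


def person_name_freq(gid: str, tagged_tokens: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Keep tokens tagged as person names and count them under the novel key gid."""
    counts = Counter(word for word, tag in tagged_tokens if tag == PERSON_TAG)
    return {gid: dict(counts)}
```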
Step 404: determine the hero name set of the existing chapters from their person name set.
In this embodiment, the execution body may determine the hero name set of the existing chapters from their person name set. For example, the execution body may select the most frequent names (e.g., the top 5) from the person name set to generate the hero name set.
In some optional implementations of this embodiment, the execution body may determine the hero name set as follows:
First, similar names in the person name set of the existing chapters are merged to generate a merged person name set.
In general, the person name set contains some similar names. For example, the same person may appear both as a full name (surname plus given name) and as the given name alone. Both forms contain the person's given name, so they are similar names; because they refer to the same person, they can be merged. For example, a { name: freq } dictionary is maintained and updated during person name recognition, and a check is applied whenever the NLP shallow lexical analysis model recognizes a new name. Specifically, if the new name overlaps a name in the dictionary and the overlapping string is longer than 4 (counted in GBK-encoded bytes), that is, the new name contains the dictionary name or the dictionary name contains the new name, the execution body may merge the two and keep the longer name.
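The merge check can be sketched as below, taking the "longer than 4 GBK bytes" condition literally. Since common Chinese characters occupy two bytes in GBK, a purely Chinese overlap must be at least three characters long to trigger a merge. Adding the two frequencies together on merge is an assumption; the text only says the longer name is kept.

```python
from typing import Dict


def add_name(freqs: Dict[str, int], new_name: str, count: int = 1) -> None:
    """Insert new_name into the {name: freq} dict, merging it with a containing or contained name."""
    for existing in list(freqs):
        shorter, longer = sorted((existing, new_name), key=len)
        # Merge only if one name contains the other and the overlap exceeds 4 GBK bytes.
        if shorter in longer and len(shorter.encode("gbk")) > 4:
            merged = freqs.pop(existing) + count  # summing frequencies is an assumption
            freqs[longer] = freqs.get(longer, 0) + merged
            return
    freqs[new_name] = freqs.get(new_name, 0) + count
```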
Then, the merged person name set is filtered using a pre-generated stop word list to generate the character name set of the existing chapters.
In general, some words tagged as person names in the segmentation result are not actually names, such as reduplicated words like "haha" and forms of address like "master". It is therefore necessary to build and maintain a stop word list and a filtering strategy that remove such proper nouns and interfering words, wrongly identified as names, from the merged person name set. For example, a stop word list is built and maintained, and its entries are filtered out of the merged person name set. The stop word list may be formulated as follows: first, randomly sample a number of test data items; then, annotate the list of main character names; next, find the entries that were tagged incorrectly to form an entry list; finally, aggregate the entry list and count word frequencies to generate the stop word list.
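The four-step formulation of the stop word list can be sketched as follows. Each sample is assumed to pair the names the model tagged with the annotator's set of correct names for that sample; anything tagged but not confirmed is treated as a tagging error. The min_count cutoff is a hypothetical knob, since the text only says word frequencies are counted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def build_stop_list(
    samples: Iterable[Tuple[List[str], Set[str]]],  # (names tagged by the model, annotated names)
    min_count: int = 1,  # hypothetical cutoff
) -> Dict[str, int]:
    """Collect wrongly tagged entries across samples and keep those seen at least min_count times."""
    errors = Counter()
    for predicted, gold in samples:
        errors.update(name for name in predicted if name not in gold)
    return {word: c for word, c in errors.items() if c >= min_count}
```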
Finally, the frequencies of the character names in the character name set are counted, and the hero name set of the existing chapters is selected from the character name set.
For example, the most frequent character names (e.g., the top 5) are selected from the character name set to generate the hero name set.
Step 405: generate novel summary information based on the hero name set of the existing chapters.
In this embodiment, the execution body may generate novel summary information based on the hero name set of the existing chapters. For example, the novel summary information is generated by adding other descriptive information about the novel to the hero name set. The descriptive information may include, but is not limited to, the novel name, author name, chapter count, category, tags, novel number, and the like. The generated novel summary information may thus be in XML format and contain not only the novel name and author name but also the hero names. Its storage structure may be { gid: { name1: freq1, name2: freq2, name3: freq3 … }, chapter_num: n1 }, where gid is the novel name; name1, name2, name3, etc. are the hero names (GBK-encoded); freq1, freq2, freq3, etc. are their corresponding frequencies; chapter_num is the chapter-count field; and n1 is its value.
In some optional implementations of this embodiment, if the novel has chapter updates, the execution body may acquire the updated chapters, determine their hero name set by performing steps 402 to 404 again, and update the novel summary information based on that hero name set. Generally, the novel summary information is updated only when the number of updated chapters is not less than a preset value (e.g., 50). Further, when the total number of chapters of the updated novel exceeds 1000, the execution body may use only the first 1000 of the updated chapters to update the novel summary information.
In the method for generating novel summary information, word segmentation and part-of-speech analysis are first performed on the content of the existing chapters of the novel using an NLP shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; the person name set of the existing chapters is then selected from the word segmentation result based on the part-of-speech analysis result; the hero name set is next determined from the person name set; and the novel summary information is finally generated based on the hero name set. Using the NLP shallow lexical analysis model to recognize names in the existing chapters improves name recognition accuracy. In addition, adding the hero names to the novel summary information enriches its content and improves the recall of novels that match users' reading needs.
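The stop word filtering and top-5 frequency selection can be sketched with collections.Counter. The input is assumed to be the merged { name: freq } dictionary for one novel; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def hero_name_set(merged_freqs: Dict[str, int], stop_words: Set[str], top_k: int = 5) -> List[str]:
    """Drop stop words to get the character name set, then keep the top_k most frequent names."""
    character_freqs = Counter({n: f for n, f in merged_freqs.items() if n not in stop_words})
    return [name for name, _ in character_freqs.most_common(top_k)]
```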
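Both representations mentioned above can be sketched with the standard library: a dictionary mirroring the stated storage structure, and an XML rendering via xml.etree.ElementTree. The XML element and attribute names (novel, name, author, chapter_num, heroes, hero, freq) are hypothetical, as the text does not define a schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_summary(gid: str, hero_freqs: Dict[str, int], chapter_num: int) -> dict:
    """Mirror the storage structure {gid: {name1: freq1, ...}, chapter_num: n1}."""
    return {gid: dict(hero_freqs), "chapter_num": chapter_num}


def summary_to_xml(novel: str, author: str, hero_freqs: Dict[str, int], chapter_num: int) -> str:
    root = ET.Element("novel")  # hypothetical element names throughout
    ET.SubElement(root, "name").text = novel
    ET.SubElement(root, "author").text = author
    ET.SubElement(root, "chapter_num").text = str(chapter_num)
    heroes = ET.SubElement(root, "heroes")
    for name, freq in hero_freqs.items():
        ET.SubElement(heroes, "hero", freq=str(freq)).text = name
    return ET.tostring(root, encoding="unicode")
```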
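The update trigger can be sketched as below under one reading of the text: the summary is refreshed only once at least 50 new chapters have accumulated, and at most 1000 of those updated chapters are re-analyzed. Tracking progress with a count of already-summarized chapters is an assumption.

```python
from typing import List, Optional, Sequence

UPDATE_THRESHOLD = 50  # example value from the text
MAX_CHAPTERS = 1000    # cap from the text


def chapters_for_update(all_chapters: Sequence[str], summarized_count: int,
                        threshold: int = UPDATE_THRESHOLD,
                        cap: int = MAX_CHAPTERS) -> Optional[List[str]]:
    """Return the updated chapters to re-analyze, or None if too few new chapters exist."""
    updated = list(all_chapters[summarized_count:])
    if len(updated) < threshold:  # update only when not less than the threshold
        return None
    return updated[:cap]
```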
With further reference to fig. 5, a flow chart of an application scenario of the method for searching for information is shown. As shown in fig. 5, the application scenario includes an offline part and an online part. The offline part comprises steps 501-508, and the online part comprises steps 509-514, as follows:
Step 501: crawl the content of novel chapters.
Step 502: perform word segmentation and lexical analysis with the NLP shallow lexical analysis model, and select the words whose part of speech is person name.
Step 503: filter with, and maintain, the stop word list.
Step 504: merge person names.
Step 505: generate hero name data.
Step 506: when the number of updated chapters exceeds 50, return to step 501.
Step 507: train the novel query trigger model.
Step 508: train the novel query parsing model.
Step 509: receive a query from a mobile phone or computer.
Step 510: call the trigger model and the parsing model to recall novel queries, and concatenate the results into a retrieval expression.
Step 511: query the offline novel data schema, and generate a candidate list by calculating the relevance between the novel data and the query.
Step 512: score relevance and popularity with the online model.
Step 513: rerank according to the scores.
Step 514: recall the highest-ranked novel, generate a novel card, and push it to the user.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for searching for information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus 600 for searching information of this embodiment may include a receiving unit 601, a determining unit 602, a parsing unit 603, and a retrieval unit 604. The receiving unit 601 is configured to receive a search request input by a user; the determining unit 602 is configured to determine whether the search type corresponding to the search request is the novel search type; the parsing unit 603 is configured to, if the search type is the novel search type, input the search request into a pre-trained parsing model to obtain the retrieval expression corresponding to the search request, where the parsing model is used to identify at least one of a novel name, an author name, and a hero name; and the retrieval unit 604 is configured to perform novel retrieval based on the retrieval expression to obtain the novel to be pushed and push it to the user.
In this embodiment, for the detailed processing of the receiving unit 601, the determining unit 602, the parsing unit 603, and the retrieval unit 604 of the apparatus 600 for searching information, and for their technical effects, refer to the descriptions of steps 201 to 204 in the embodiment corresponding to fig. 2; they are not repeated here.
In some optional implementations of this embodiment, the determining unit 602 includes a trigger subunit (not shown in the figure) configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, where the trigger model is used to identify the search type based on at least one of a novel name, an author name, and a hero name.
In some optional implementations of this embodiment, the retrieval unit 604 includes a retrieval subunit (not shown in the figure) configured to search a pre-generated novel summary information set based on the retrieval expression and determine the novel to be pushed, where the novel summary information includes a novel name, an author name, and a hero name.
In some optional implementations of this embodiment, the retrieval subunit includes: a first calculation module (not shown in the figure) configured to calculate the relevance between the retrieval expression and the novel summary information in the novel summary information set and determine a candidate novel set; a second calculation module (not shown in the figure) configured to calculate the relevance between the search request and the candidate novels in the candidate novel set and determine a novel set to be selected; and a sorting and deduplication module (not shown in the figure) configured to sort and deduplicate the novel set to be selected based on the popularity of its novels and their relevance to the search request, and determine the novel to be pushed.
In some optional implementations of this embodiment, the step of generating the novel summary information includes: acquiring the existing chapters of the novel; performing word segmentation and part-of-speech analysis on the content of the existing chapters using a natural language processing (NLP) shallow lexical analysis model to obtain a word segmentation result and a part-of-speech analysis result; selecting the person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result; determining the hero name set of the existing chapters from the person name set; and generating the novel summary information based on the hero name set.
In some optional implementations of this embodiment, performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model to obtain the word segmentation result and the part-of-speech analysis result includes: segmenting the content of the existing chapters with the NLP shallow lexical analysis model to obtain a vocabulary set, recombining the vocabulary set to obtain a word sequence whose semantics satisfy a preset condition, and determining the part of speech of each word in the sequence, where the preset condition includes at least one of the following: the semantics are reasonable; the semantics are complete.
In some optional implementations of this embodiment, determining the hero name set of the existing chapters from their person name set includes: merging similar names in the person name set to generate a merged person name set; filtering the merged person name set based on a pre-generated stop word list to generate a character name set; and counting the frequencies of the character names and selecting the hero name set of the existing chapters from the character name set.
In some optional implementations of this embodiment, the step of generating the novel summary information further includes: if the novel has chapter updates, acquiring the updated chapters; determining the hero name set of the updated chapters; and updating the novel summary information based on that hero name set.
In some optional implementations of this embodiment, the training step of the trigger model includes: acquiring a first training sample set, where a first training sample in the first training sample set includes a first sample search request and a corresponding first sample search type label; and, for each first training sample in the first training sample set, taking the first sample search request as input and the first sample search type label as the expected output, and training to obtain the trigger model.
In some optional implementations of this embodiment, the training step of the parsing model includes: acquiring a second training sample set, where a second training sample in the second training sample set includes a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression includes at least one of a novel name, an author name, and a hero name; and, for each second training sample in the second training sample set, taking the second sample search request as input and the second sample retrieval expression as the expected output, and training to obtain the parsing model.
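The unit structure of apparatus 600 can be pictured as a small pipeline object. This is a sketch under stated assumptions: each unit is modeled as an injected callable, the class and field names are hypothetical, and returning None for non-novel requests (rather than routing them elsewhere) is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SearchApparatus:
    is_novel_search: Callable[[str], bool]      # role of determining unit 602
    parse: Callable[[str], List[str]]           # role of parsing unit 603 -> retrieval expression
    retrieve: Callable[[List[str]], List[str]]  # role of retrieval unit 604 -> novels to push

    def handle(self, query: str) -> Optional[List[str]]:
        """Play the receiving unit 601: accept a query and run it through the other units."""
        if not self.is_novel_search(query):
            return None  # assumption: non-novel requests are handled by other search flows
        expression = self.parse(query)
        return self.retrieve(expression)
```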
Referring now to FIG. 7, a block diagram of a computer system 700 suitable for use in implementing an electronic device (e.g., server 103 shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the Central Processing Unit (CPU) 701, performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a receiving unit, a determining unit, a parsing unit, and a retrieving unit. Where the names of the units do not constitute a limitation on the units themselves in this case, for example, a receiving unit may also be described as a "unit that receives a search request input by a user".
As another aspect, the present application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a search request input by a user; determine whether the search type corresponding to the search request is a novel search type; if it is, input the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, wherein the analytical model is used to identify at least one of a novel name, an author name and a protagonist name; and perform novel retrieval based on the retrieval expression to obtain a novel to be pushed, and push the novel to the user.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (22)

1. A method for searching information, comprising:
receiving a search request input by a user;
determining whether the search type corresponding to the search request is a novel search type;
if the search type corresponding to the search request is a novel search type, inputting the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, wherein the analytical model is used to identify at least one of a novel name, an author name and a protagonist name; and
performing novel retrieval based on the retrieval expression to obtain a novel to be pushed, and pushing the novel to the user.
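The four steps of claim 1 can be sketched as a small control flow. This is a minimal illustration only: the trigger model, the analytical (parsing) model and the retrieval backend are passed in as plain functions, and the toy stand-ins (`toy_trigger`, `toy_parser`, `toy_retriever`, the `KNOWN_PROTAGONISTS` table, "Lin Feng" and "Novel A") are hypothetical, not the trained models described in the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetrievalExpression:
    # Fields the analytical model is said to identify; any of them may be absent.
    novel_name: Optional[str] = None
    author_name: Optional[str] = None
    protagonist_name: Optional[str] = None


def search(query: str,
           is_novel_search: Callable[[str], bool],
           parse: Callable[[str], RetrievalExpression],
           retrieve: Callable[[RetrievalExpression], list[str]]) -> list[str]:
    """Claim 1 flow: classify the request, parse it, retrieve novels to push."""
    if not is_novel_search(query):
        return []  # non-novel requests would go to ordinary search (not shown)
    expression = parse(query)
    return retrieve(expression)


# Hypothetical toy stand-ins for the trained models and the novel index.
KNOWN_PROTAGONISTS = {"Lin Feng": "Novel A"}


def toy_trigger(query: str) -> bool:
    return "novel" in query.lower() or any(n in query for n in KNOWN_PROTAGONISTS)


def toy_parser(query: str) -> RetrievalExpression:
    for name in KNOWN_PROTAGONISTS:
        if name in query:
            return RetrievalExpression(protagonist_name=name)
    return RetrievalExpression()


def toy_retriever(expr: RetrievalExpression) -> list[str]:
    if expr.protagonist_name in KNOWN_PROTAGONISTS:
        return [KNOWN_PROTAGONISTS[expr.protagonist_name]]
    return []


print(search("novel whose hero is Lin Feng", toy_trigger, toy_parser, toy_retriever))
# prints ['Novel A']
```

Injecting the three components as functions keeps the control flow of the claim separate from whichever models are actually trained.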
2. The method of claim 1, wherein determining whether the search type corresponding to the search request is a novel search type comprises:
inputting the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used to identify the search type based on at least one of the novel name, the author name and the protagonist name.
3. The method of claim 1, wherein performing novel retrieval based on the retrieval expression to obtain the novel to be pushed comprises:
retrieving from a pre-generated novel summary information set based on the retrieval expression to determine the novel to be pushed, wherein the novel summary information comprises a novel name, an author name and a protagonist name.
4. The method of claim 3, wherein retrieving from the pre-generated novel summary information set based on the retrieval expression to determine the novel to be pushed comprises:
calculating a relevance between the retrieval expression and the novel summary information in the novel summary information set, and determining a candidate novel set;
calculating a relevance between the search request and the candidate novels in the candidate novel set, and determining a to-be-selected novel set; and
sorting and de-duplicating the to-be-selected novel set based on the popularity of the to-be-selected novels and their relevance to the search request, and determining the novel to be pushed.
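Claim 4 describes a two-stage relevance filter followed by a popularity-aware sort and de-duplication. The sketch below assumes several things the claim leaves open: relevance is approximated by Jaccard token overlap, the retrieval expression is reduced to a set of terms, the cut-offs `k1`, `k2` and `min_rel` are hypothetical parameters, and duplicates are identified by (novel name, author name). The catalog entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NovelSummary:
    novel_id: str
    name: str
    author: str
    protagonists: tuple[str, ...]
    popularity: float  # hypothetical popularity score; higher means more popular


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def relevance(a: set[str], b: set[str]) -> float:
    # Jaccard overlap: a stand-in for the unspecified relevance measure.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summary_tokens(s: NovelSummary) -> set[str]:
    return tokens(" ".join((s.name, s.author) + s.protagonists))


def rank_novels(expression_terms, query, summaries, k1=50, k2=10, min_rel=0.0):
    # Stage 1: retrieval expression vs. summary information -> candidate novel set.
    scored = sorted(((relevance(expression_terms, summary_tokens(s)), s) for s in summaries),
                    key=lambda x: x[0], reverse=True)
    candidates = [s for r, s in scored[:k1] if r > min_rel]
    # Stage 2: original search request vs. candidates -> to-be-selected novel set.
    q = tokens(query)
    selected = sorted(((relevance(q, summary_tokens(s)), s) for s in candidates),
                      key=lambda x: x[0], reverse=True)[:k2]
    # Sort by popularity, then by query relevance; keep the first of each (name, author).
    selected.sort(key=lambda x: (x[1].popularity, x[0]), reverse=True)
    seen, result = set(), []
    for _, s in selected:
        key = (s.name.lower(), s.author.lower())
        if key not in seen:
            seen.add(key)
            result.append(s.novel_id)
    return result


catalog = [
    NovelSummary("n1", "Sky Sword", "Wang", ("Lin Feng",), 0.9),
    NovelSummary("n2", "Sky Sword", "Wang", ("Lin Feng",), 0.5),  # duplicate listing
    NovelSummary("n3", "Sea Song", "Li", ("Mo Yu",), 0.99),
]
print(rank_novels({"lin", "feng"}, "novel with hero lin feng", catalog))  # ['n1']
```

Sorting on the `(popularity, relevance)` tuple is only one way to combine the two signals; a weighted score would be an equally plausible reading of the claim.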
5. The method of claim 1, wherein generating the novel summary information comprises:
acquiring the existing chapters of the novel;
performing word segmentation and part-of-speech analysis on the content of the existing chapters using a Natural Language Processing (NLP) shallow lexical analysis model, to obtain a word segmentation result and a part-of-speech analysis result;
selecting a person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result;
determining a protagonist name set of the existing chapters from the person name set of the existing chapters; and
generating the novel summary information based on the protagonist name set of the existing chapters.
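The name-selection step of claim 5 can be sketched as follows, assuming the shallow lexical analysis model returns `(word, tag)` pairs and marks person names with a tag such as `PER` or `nr` (the tag set is an assumption; the claim does not specify one). The tagger and the protagonist-selection rule are injected as functions; `toy_tag` is a hypothetical stand-in that treats capitalised English words as names, and `most_frequent` is a deliberately simple placeholder for the selection rule of claim 7.

```python
from collections import Counter
from typing import Callable, Iterable

Tagged = list[tuple[str, str]]   # (word, part-of-speech tag) pairs
PERSON_TAGS = {"PER", "nr"}      # assumption: tags that mark person names


def person_names(tagged: Tagged) -> list[str]:
    # Keep repeats: later steps count how often each name occurs.
    return [word for word, tag in tagged if tag in PERSON_TAGS]


def build_summary(novel_name: str, author_name: str, chapters: Iterable[str],
                  tag: Callable[[str], Tagged],
                  pick_protagonists: Callable[[list[str]], list[str]]) -> dict:
    names: list[str] = []
    for text in chapters:
        names.extend(person_names(tag(text)))
    return {"novel_name": novel_name,
            "author_name": author_name,
            "protagonist_names": pick_protagonists(names)}


def toy_tag(text: str) -> Tagged:
    # Hypothetical tagger: capitalised words are treated as person names.
    return [(w, "PER" if w.istitle() else "n") for w in text.split()]


def most_frequent(names: list[str]) -> list[str]:
    return [n for n, _ in Counter(names).most_common(1)]


summary = build_summary("Novel A", "Author B",
                        ["Lin met the elder", "Lin left and Su stayed"],
                        toy_tag, most_frequent)
print(summary["protagonist_names"])  # ['Lin']
```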
6. The method of claim 5, wherein performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model to obtain the word segmentation result and the part-of-speech analysis result comprises:
segmenting the content of the existing chapters into words using the NLP shallow lexical analysis model to obtain a vocabulary set, then recombining the vocabulary set to obtain a vocabulary sequence whose semantics satisfy preset conditions, and determining the part of speech of each word in the vocabulary sequence, wherein the preset conditions comprise at least one of the following: the semantics are reasonable, and the semantics are complete.
7. The method of claim 5, wherein determining the protagonist name set of the existing chapters from the person name set of the existing chapters comprises:
merging similar names in the person name set of the existing chapters to generate a merged name set of the existing chapters;
filtering the merged name set of the existing chapters based on a pre-generated stop word list to generate a character name set of the existing chapters; and
counting the frequency of each character name in the character name set of the existing chapters, and selecting the protagonist name set of the existing chapters from the character name set.
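Claim 7's three steps (merge similar names, filter with a stop word list, count frequencies and select) can be sketched as below. The similarity rule is an assumption: a name is merged into the most frequent longer name that contains it (for example a given name into the full name), which suits pairs like "Feng" and "Lin Feng" but would mis-merge a surname shared by several characters. `top_k` and `min_count` are hypothetical selection parameters, and the names are invented.

```python
from collections import Counter


def merge_similar(names: list[str]) -> list[str]:
    """Map each name to the most frequent longer name containing it (assumed rule)."""
    counts = Counter(names)
    canonical = {}
    for name in counts:
        longer = [other for other in counts if other != name and name in other]
        canonical[name] = max(longer, key=lambda m: (counts[m], m)) if longer else name
    return [canonical[n] for n in names]


def select_protagonists(names: list[str], stop_words: set[str],
                        top_k: int = 3, min_count: int = 2) -> list[str]:
    merged = merge_similar(names)                                   # step 1: merge
    characters = [n for n in merged if n not in stop_words]         # step 2: stop words
    freq = Counter(characters)                                      # step 3: frequency
    return [n for n, c in freq.most_common(top_k) if c >= min_count]


names = ["Lin Feng", "Feng", "Lin Feng", "Elder", "Su Yue",
         "Elder", "Su Yue", "Feng", "Wang"]
print(select_protagonists(names, stop_words={"Elder"}))  # ['Lin Feng', 'Su Yue']
```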
8. The method of claim 5, wherein generating the novel summary information further comprises:
if the novel has a chapter update, acquiring the updated chapter; and
determining a protagonist name set of the updated chapter, and updating the novel summary information based on the protagonist name set of the updated chapter.
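For the incremental update in claim 8, one possible design is to keep, in the summary record, a count of how many chapters each name was a protagonist of, and to fold each updated chapter's protagonist name set into that count. The `chapter_votes` and `seen_chapters` fields and the `top_k` cut-off are assumptions made for this sketch; the claim only requires that the summary be updated from the updated chapter's protagonist name set.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SummaryRecord:
    novel_name: str
    author_name: str
    protagonist_names: list = field(default_factory=list)
    chapter_votes: Counter = field(default_factory=Counter)  # hypothetical: chapters per protagonist
    seen_chapters: set = field(default_factory=set)          # hypothetical: guards against replays


def apply_chapter_update(record: SummaryRecord, chapter_id, chapter_protagonists,
                         top_k: int = 3) -> SummaryRecord:
    if chapter_id in record.seen_chapters:
        return record  # the same update delivered twice must not be double-counted
    record.seen_chapters.add(chapter_id)
    record.chapter_votes.update(set(chapter_protagonists))  # one vote per chapter per name
    record.protagonist_names = [n for n, _ in record.chapter_votes.most_common(top_k)]
    return record


rec = SummaryRecord("Novel A", "Author B")
apply_chapter_update(rec, 1, ["Lin Feng", "Su Yue"])
apply_chapter_update(rec, 2, ["Lin Feng"])
apply_chapter_update(rec, 2, ["Lin Feng"])  # duplicate delivery, ignored
print(rec.protagonist_names)  # ['Lin Feng', 'Su Yue']
```

Only the updated chapter is analysed, so the cost of an update does not grow with the number of chapters already processed.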
9. The method of any one of claims 1 to 8, wherein training the trigger model comprises:
obtaining a first training sample set, wherein a first training sample in the first training sample set comprises a first sample search request and a corresponding first sample search type label; and
for a first training sample in the first training sample set, taking the first sample search request in the first training sample as input and the first sample search type label in the first training sample as output, and training to obtain the trigger model.
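Claim 9 only requires a supervised model trained on (search request, search type label) pairs; the application does not say which model family is used. As a stand-in, the sketch trains a multinomial naive Bayes classifier with Laplace smoothing over whitespace tokens. The model family, the tokenisation, the labels `"novel"`/`"other"` and the training queries are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTrigger:
    """Multinomial naive Bayes with add-one smoothing; a stand-in trigger model."""

    def fit(self, samples):
        # samples: iterable of (search_request, search_type_label) pairs
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for query, label in samples:
            toks = query.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(toks)
            self.vocab.update(toks)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, query):
        # Tokens never seen in training are ignored; each label is scored by log-probability.
        toks = [t for t in query.lower().split() if t in self.vocab]
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)
            denom = sum(self.token_counts[label].values()) + v
            for t in toks:
                score += math.log((self.token_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [
    ("novel where hero is lin feng", "novel"),
    ("read sky sword novel online", "novel"),
    ("author wang latest novel", "novel"),
    ("weather in beijing tomorrow", "other"),
    ("train tickets to shanghai", "other"),
    ("beijing restaurant reviews", "other"),
]
model = NaiveBayesTrigger().fit(training)
print(model.predict("lin feng novel"))   # novel
print(model.predict("beijing weather"))  # other
```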
10. The method of any one of claims 1 to 8, wherein training the analytical model comprises:
acquiring a second training sample set, wherein a second training sample in the second training sample set comprises a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression comprises at least one of a novel name, an author name and a protagonist name; and
for a second training sample in the second training sample set, taking the second sample search request in the second training sample as input and the second sample retrieval expression in the second training sample as output, and training to obtain the analytical model.
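Claim 10 pairs a search request with a target retrieval expression. If the analytical model were implemented as a character-level sequence labeller (an assumption; the claim does not name the model type), each training pair could be converted into BIO tags as sketched below. The field names such as `PROTAGONIST` and `AUTHOR` are hypothetical, and the sketch only handles values that appear verbatim in the query; overlapping values would overwrite each other's tags.

```python
def to_bio(query: str, expression: dict[str, str]) -> list[tuple[str, str]]:
    """Label each character of the query with B-<FIELD>, I-<FIELD> or O."""
    tags = ["O"] * len(query)
    for field_name, value in expression.items():
        if not value:
            continue
        start = query.find(value)
        if start < 0:
            continue  # value not literally in the query; would need fuzzy alignment
        tags[start] = f"B-{field_name}"
        for i in range(start + 1, start + len(value)):
            tags[i] = f"I-{field_name}"
    return list(zip(query, tags))


sample = to_bio("Lin Feng novel by Wang", {"PROTAGONIST": "Lin Feng", "AUTHOR": "Wang"})
print(sample[18])  # ('W', 'B-AUTHOR')
```

Character-level labels avoid depending on a word segmenter, which matters for Chinese queries where segmentation errors would otherwise split names.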
11. An apparatus for searching information, comprising:
a receiving unit configured to receive a search request input by a user;
a determining unit configured to determine whether a search type corresponding to the search request is a novel search type;
a parsing unit configured to, if the search type corresponding to the search request is a novel search type, input the search request into a pre-trained analytical model to obtain a retrieval expression corresponding to the search request, wherein the analytical model is used to identify at least one of a novel name, an author name and a protagonist name; and
a retrieval unit configured to perform novel retrieval based on the retrieval expression to obtain a novel to be pushed, and push the novel to the user.
12. The apparatus of claim 11, wherein the determining unit comprises:
a triggering subunit configured to input the search request into a pre-trained trigger model to obtain the search type corresponding to the search request, wherein the trigger model is used to identify the search type based on at least one of the novel name, the author name and the protagonist name.
13. The apparatus of claim 11, wherein the retrieving unit comprises:
a retrieval subunit configured to retrieve from a pre-generated novel summary information set based on the retrieval expression and determine the novel to be pushed, wherein the novel summary information comprises a novel name, an author name and a protagonist name.
14. The apparatus of claim 13, wherein the retrieval subunit comprises:
a first calculation module configured to calculate a relevance between the retrieval expression and the novel summary information in the novel summary information set, and determine a candidate novel set;
a second calculation module configured to calculate a relevance between the search request and the candidate novels in the candidate novel set, and determine a to-be-selected novel set; and
a sorting and de-duplication module configured to sort and de-duplicate the to-be-selected novel set based on the popularity of the to-be-selected novels and their relevance to the search request, and determine the novel to be pushed.
15. The apparatus of claim 11, wherein generating the novel summary information comprises:
acquiring the existing chapters of the novel;
performing word segmentation and part-of-speech analysis on the content of the existing chapters using a Natural Language Processing (NLP) shallow lexical analysis model, to obtain a word segmentation result and a part-of-speech analysis result;
selecting a person name set of the existing chapters from the word segmentation result based on the part-of-speech analysis result;
determining a protagonist name set of the existing chapters from the person name set of the existing chapters; and
generating the novel summary information based on the protagonist name set of the existing chapters.
16. The apparatus of claim 15, wherein performing word segmentation and part-of-speech analysis on the content of the existing chapters using the NLP shallow lexical analysis model to obtain the word segmentation result and the part-of-speech analysis result comprises:
segmenting the content of the existing chapters into words using the NLP shallow lexical analysis model to obtain a vocabulary set, then recombining the vocabulary set to obtain a vocabulary sequence whose semantics satisfy preset conditions, and determining the part of speech of each word in the vocabulary sequence, wherein the preset conditions comprise at least one of the following: the semantics are reasonable, and the semantics are complete.
17. The apparatus of claim 15, wherein determining the protagonist name set of the existing chapters from the person name set of the existing chapters comprises:
merging similar names in the person name set of the existing chapters to generate a merged name set of the existing chapters;
filtering the merged name set of the existing chapters based on a pre-generated stop word list to generate a character name set of the existing chapters; and
counting the frequency of each character name in the character name set of the existing chapters, and selecting the protagonist name set of the existing chapters from the character name set.
18. The apparatus of claim 15, wherein generating the novel summary information further comprises:
if the novel has a chapter update, acquiring the updated chapter; and
determining a protagonist name set of the updated chapter, and updating the novel summary information based on the protagonist name set of the updated chapter.
19. The apparatus of any one of claims 11 to 18, wherein training the trigger model comprises:
obtaining a first training sample set, wherein a first training sample in the first training sample set comprises a first sample search request and a corresponding first sample search type label; and
for a first training sample in the first training sample set, taking the first sample search request in the first training sample as input and the first sample search type label in the first training sample as output, and training to obtain the trigger model.
20. The apparatus of any one of claims 11 to 18, wherein training the analytical model comprises:
acquiring a second training sample set, wherein a second training sample in the second training sample set comprises a second sample search request and a corresponding second sample retrieval expression, and the second sample retrieval expression comprises at least one of a novel name, an author name and a protagonist name; and
for a second training sample in the second training sample set, taking the second sample search request in the second training sample as input and the second sample retrieval expression in the second training sample as output, and training to obtain the analytical model.
21. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 10.
22. A computer-readable medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the method according to any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147266.4A CN111368036B (en) 2020-03-05 2020-03-05 Method and device for searching information

Publications (2)

Publication Number Publication Date
CN111368036A 2020-07-03
CN111368036B 2023-09-26

Family

ID=71208640

Country Status (1)

Country Link
CN (1) CN111368036B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763081A (en) * 2020-08-26 2021-12-07 北京沃东天骏信息技术有限公司 Article recall method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090216790A1 (en) * 2008-02-22 2009-08-27 Jeffrey Matthew Dexter Systems and Methods of Searching a Document for Relevant Chunks in Response to a Search Request
US20120059820A1 (en) * 2010-09-07 2012-03-08 International Business Machines Corporation Aggregation, Organization and Provision of Professional and Social Information
CN103902697A (en) * 2014-03-28 2014-07-02 百度在线网络技术(北京)有限公司 Combinatorial search method, client and server
US20150012562A1 (en) * 2013-02-04 2015-01-08 Zola Books Inc. Literary Recommendation Engine
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
US20190065507A1 (en) * 2017-08-22 2019-02-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for information processing
CN109492089A (en) * 2018-10-18 2019-03-19 上海连尚网络科技有限公司 Method and apparatus for output information
US20190108275A1 (en) * 2017-10-06 2019-04-11 Elsevier, Inc. Systems and methods for providing recommendations for academic and research entities
WO2019185689A1 (en) * 2018-03-27 2019-10-03 Nokia Technologies Oy Creation of rich content from textual content
CN110781397A (en) * 2019-10-29 2020-02-11 上海连尚网络科技有限公司 Method and equipment for providing novel information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
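杨州, "Research and Improvement of a Deep Relevance Matching Model Algorithm Based on Text Retrieval", China Master's Theses Full-text Database (Electronic Journal) *
田春虎, "An Ontology-Based Family-Style Return Retrieval Model", Library and Information Service *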

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant