WO2022156446A1 - Method and apparatus for determining summary of search result, and electronic device - Google Patents


Info

Publication number
WO2022156446A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
last sentence
interception
determined
target paragraph
Prior art date
Application number
PCT/CN2021/138921
Other languages
French (fr)
Chinese (zh)
Inventor
李悦
张志凌
李顶圣
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022156446A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • the present application relates to the field of Internet technologies, and in particular, to a method, apparatus and electronic device for determining an abstract of a search result.
  • the search engine can provide the URL and abstract of the corresponding webpage according to the search query submitted by the user, and the abstract is used to describe the general content of the webpage, and the user can determine whether to browse the webpage corresponding to the abstract by reading the abstract.
  • the text of a certain length of characters is usually intercepted from the webpage text as the webpage summary.
  • when the last sentence of an abstract intercepted in this way is cut off, its reference value is very small, and it also leads to poor presentation of the abstract, thereby affecting the user's reading experience.
  • the present application provides a method, apparatus and electronic device for determining an abstract of a search result, which can flexibly adjust the position at which text is intercepted in a webpage, so that the information expressed in the last sentence of the obtained webpage abstract is more complete, thereby improving the presentation effect of the webpage abstract and the user's reading experience.
  • an embodiment of the present application provides a method for determining an abstract of a search result, including:
  • the first interception position of the target paragraph is determined, wherein the first interception position is any one of the following: the end position of the text obtained by supplementing the last sentence of text, the end position of the last sentence of text, and the end position of the text obtained after deleting the last sentence of text;
  • an abstract of the search result is determined from the target paragraph, wherein a character length corresponding to the abstract of the search result is within a preset length range.
  • the method for determining the abstract of the search result determines a preset length of characters from the starting position of the target paragraph, evaluates the last sentence of text of the determined preset length of characters to obtain a first evaluation result, and determines the first interception position according to the first evaluation result. The embodiment of the present application thus determines the first interception position flexibly according to the first evaluation result, so that the last sentence of text of the abstract determined according to the first interception position can be a sentence with relatively complete semantics, which improves the reference value of the abstract for the user in deciding whether to browse the web page.
  • the presentation effect of the abstract is also better, which reduces the amount of invalid reading by the user, improves the user's reading experience, and increases the probability that the user clicks through to the web page corresponding to the abstract.
  • the first interception position determined in the embodiment of the present application is the end position of the text obtained after supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text. Different first evaluation results lead to different first interception positions, so the first interception position is determined more flexibly and the determined abstract has a better presentation effect.
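The three candidate interception positions described above can be sketched as follows. This is a minimal illustration rather than the application's implementation: the function name `candidate_positions`, the set of sentence terminators, and the sample text are all assumptions introduced here.

```python
import re

# Assumed sentence terminators (English and Chinese); the source does not list them.
SENTENCE_END = re.compile(r"[.!?。！？]")

def candidate_positions(paragraph: str, preset_len: int) -> dict:
    """Return the three candidate cut positions for the last sentence of the prefix."""
    body = paragraph[:preset_len].rstrip()
    # Terminators before the final character mark where earlier sentences end.
    ends = [m.end() for m in SENTENCE_END.finditer(body[:-1])]
    delete_pos = ends[-1] if ends else 0   # cut before the last sentence
    keep_pos = len(body)                   # cut at the end of the prefix itself
    # The next terminator at or after the prefix's final character completes the sentence.
    nxt = SENTENCE_END.search(paragraph, max(len(body) - 1, 0))
    supplement_pos = nxt.end() if nxt else len(paragraph)
    return {"delete": delete_pos, "keep": keep_pos, "supplement": supplement_pos}

text = "The cat sat. The dog ran away quickly. Birds fly."
pos = candidate_positions(text, 20)
for name, p in pos.items():
    print(name, repr(text[:p]))
```

Which of the three positions is chosen depends on the evaluation of the last sentence, as the following passages describe.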
  • the first evaluation result is: the end of the last sentence of text is incomplete topic information;
  • the first interception position is: the end position of the text obtained by supplementing the subject information at the end of the last sentence of text.
  • the topic information at the end of the last sentence of text is evaluated, and the obtained first evaluation result is that the end of the last sentence of text is incomplete topic information.
  • subject information such as movie and TV drama titles, book titles, etc.
  • the first evaluation result obtained in this embodiment is that the end of the last sentence of text is incomplete topic information.
  • the first interception position may be determined as a position that can complete the above subject information according to the first evaluation result, so that the user can better understand the text.
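One way to picture completing cut-off subject information is to look for a title mark that opens inside the intercepted text but closes after it. The source does not say how incomplete topic information is detected, so the heuristic below, the mark pairs in `TITLE_MARKS`, and the function name `complete_title_cut` are assumptions made purely for illustration.

```python
# Assumed opening/closing marks for titles. The source only names the kinds of
# subject information (film, TV drama and book titles), not how they are detected.
TITLE_MARKS = {"《": "》", "“": "”"}

def complete_title_cut(paragraph: str, cut: int) -> int:
    """Move the cut forward so that a title opened before it is closed."""
    prefix = paragraph[:cut]
    for open_mark, close_mark in TITLE_MARKS.items():
        last_open = prefix.rfind(open_mark)
        # An opening mark with no closing mark after it in the prefix is unfinished.
        if last_open != -1 and prefix.find(close_mark, last_open) == -1:
            close_at = paragraph.find(close_mark, cut)
            if close_at != -1:
                return close_at + 1
    return cut

text = "The process is described in 《Production Methods of Plastic Products》, Chapter 3."
cut = text.index("Plastic")            # the cut falls in the middle of the title
new_cut = complete_title_cut(text, cut)
print(text[:new_cut])
```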
  • the first evaluation result is: the semantics of the last sentence of text is incomplete;
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented into a complete sentence of text.
  • the semantics of the last sentence of text is evaluated, and the first evaluation result obtained is that the semantics of the last sentence of text is incomplete.
  • the first evaluation result obtained in this embodiment is that the semantics of the last sentence is incomplete.
  • the text of the last sentence can be completed according to the first evaluation result, which is convenient for the user to read the text.
  • the first evaluation result is: the character length corresponding to the last sentence of text is less than the first preset length;
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text.
  • the length of the character corresponding to the last sentence of text is evaluated, and the obtained first evaluation result is that the length of the character corresponding to the last sentence of text is less than the first preset length.
  • the first evaluation result is: the number of words corresponding to the last sentence of text is less than a second preset number
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text.
  • the number of words corresponding to the last sentence of text is evaluated, and the obtained first evaluation result is that the number of words corresponding to the last sentence of text is less than the second preset number. Since the number of words in a sentence of text can usually reflect the semantic integrity of a sentence of text, evaluating the number of words in the last sentence of text can more accurately evaluate the semantic integrity of the last sentence of text and the reference value for users to understand web pages.
  • the first evaluation result is: the proportion of characters other than words and words in the last sentence of text is greater than a preset proportion;
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text.
  • the proportion of characters other than words and words in the text of the last sentence is evaluated, and the obtained first evaluation result is that the proportion of characters other than words and words in the text of the last sentence is greater than the preset proportion.
  • Users usually understand abstracts more fully through words and phrases that have actual semantics. Characters other than words and words in abstracts are usually less helpful for users to understand abstract content. Therefore, the words and phrases in a text are usually the more effective information for understanding the abstract.
  • the embodiment of the present application can thus estimate that the proportion of effective information in the last sentence of text is small, and when determining the first interception position, it can quickly decide to delete the last sentence of text of the preset length of characters.
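A rough sketch of the proportion check follows. It assumes that "characters other than words" means visible characters that are neither letters, digits, nor Chinese characters, and that the proportion is taken over non-space characters. Both the interpretation and the 0.5 threshold are assumptions, as are the function names.

```python
def non_word_ratio(sentence: str) -> float:
    """Share of visible characters that are not letters, digits or CJK characters."""
    visible = [ch for ch in sentence if not ch.isspace()]
    if not visible:
        return 0.0
    non_word = sum(1 for ch in visible if not ch.isalnum())
    return non_word / len(visible)

def should_delete_last_sentence(sentence: str, preset_ratio: float = 0.5) -> bool:
    # preset_ratio stands in for the "preset proportion"; 0.5 is a hypothetical value.
    return non_word_ratio(sentence) > preset_ratio

print(should_delete_last_sentence("★★★ >>> !!! ok"))
print(should_delete_last_sentence("Her songs are"))
```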
  • the first evaluation result is: the end of the last sentence of text is a punctuation mark;
  • the first interception position is: the end position of the text obtained after deleting the punctuation mark at the end of the last sentence of text.
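Removing a punctuation mark at the end of the last sentence amounts to moving the cut back past it. The sketch below assumes that only non-terminal marks (commas, semicolons, colons and their Chinese counterparts) are removed. The source says only "punctuation mark", so the mark set `TRAILING_MARKS` and the function name are hypothetical.

```python
# Assumed set of marks to drop; sentence-final marks such as "." are kept here.
TRAILING_MARKS = ",;:，；：、"

def cut_before_trailing_marks(text: str) -> int:
    """Return the cut position after dropping trailing marks and whitespace."""
    end = len(text)
    while end > 0 and (text[end - 1].isspace() or text[end - 1] in TRAILING_MARKS):
        end -= 1
    return end

sample = "Her songs are always pop music, "
print(repr(sample[:cut_before_trailing_marks(sample)]))
```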
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text;
  • the determining the abstract of the search result from the target paragraph according to the first interception position includes:
  • the second interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • a summary of the search result is determined from the target paragraph based on the second clipping position.
  • when the first interception position is the end position of the text obtained after deleting the last sentence of text, and the character length corresponding to the text in the target paragraph before the first interception position is less than the lower limit of the preset length range, the text before the first interception position is too short.
  • in this case, the abstract of the search result determined according to the first interception position may contain very little information, so that the user cannot obtain the effective information of the search result from the abstract.
  • when the abstract of the search result is determined from the target paragraph according to the second interception position, since the second interception position is the end position of the text obtained after the last sentence of text is supplemented, the character length of the resulting abstract is increased, so that users can learn more about the search result from the abstract.
  • the determining the abstract of the search result from the target paragraph according to the second clipping position includes:
  • the preset length characters are determined as an abstract of the search result.
  • when the character length corresponding to the text before the second interception position in the target paragraph is greater than the upper limit of the preset length range, the text before the second interception position is too long.
  • in this case, the abstract may take up so much space that too few search result entries fit on the page, or the determined abstract cannot be displayed completely, which affects the layout of the page.
  • determining the preset length of characters as the abstract of the search result ensures that the abstract is neither too long nor too short, which is conducive to the layout of the page.
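The fallback from deleting, to supplementing, to the plain preset-length characters can be sketched as below. The function name `pick_abstract` and its parameters are illustrative only. Treating the "too long" check as a comparison against an upper bound of the preset length range is also an assumption of this sketch.

```python
def pick_abstract(paragraph: str, preset_len: int, delete_pos: int,
                  supplement_pos: int, lower: int, upper: int) -> str:
    """Choose the abstract text given the two candidate cuts and the length range."""
    if delete_pos >= lower:
        # Deleting the last sentence still leaves enough text.
        return paragraph[:delete_pos]
    if supplement_pos <= upper:
        # Too short after deletion: use the second (supplemented) position instead.
        return paragraph[:supplement_pos]
    # Supplementing overshoots the range: fall back to the preset-length characters.
    return paragraph[:preset_len]

text = "The cat sat. The dog ran away quickly. Birds fly."
print(pick_abstract(text, 20, 12, 38, lower=15, upper=40))
print(pick_abstract(text, 20, 12, 38, lower=15, upper=30))
```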
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • the determining the abstract of the search result from the target paragraph according to the first interception position includes:
  • the third interception position is: the end position of the text obtained after deleting the last sentence of text;
  • a summary of the search result is determined from the target paragraph based on the third clipping position.
  • when the first interception position is the end position of the text obtained by supplementing the last sentence of text, and the character length corresponding to the text before the first interception position in the target paragraph is greater than the second preset length, the text before the first interception position is too long.
  • in this case, the abstract of the search result determined according to the first interception position may not be displayed completely.
  • when the abstract of the search result is determined from the target paragraph according to the third interception position, since the third interception position is the end position of the text obtained after deleting the last sentence of text, the character length of the resulting abstract is shortened, so that the determined abstract can be fully displayed on the display page.
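The opposite fallback, from a supplemented cut that runs too long back to the cut that deletes the last sentence, might look like this. The function and parameter names are hypothetical, and `second_preset_len` stands in for the second preset length.

```python
def resolve_supplemented_cut(paragraph: str, supplement_pos: int,
                             delete_pos: int, second_preset_len: int) -> str:
    """Use the supplemented cut unless it exceeds the second preset length."""
    if supplement_pos > second_preset_len:
        # Third interception position: drop the last sentence instead.
        return paragraph[:delete_pos]
    return paragraph[:supplement_pos]

text = "The cat sat. The dog ran away quickly. Birds fly."
print(resolve_supplemented_cut(text, 38, 12, second_preset_len=30))
```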
  • the determining the first interception position of the target paragraph according to the first evaluation result includes:
  • the first evaluation result is any of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text is complete, the character length corresponding to the last sentence of text is not less than the first preset length, and the proportion of characters other than words and words in the last sentence of text is not greater than the preset proportion;
  • when the first evaluation result is that the end of the last sentence of text is complete topic information, the semantics of the last sentence of text is complete, the character length corresponding to the last sentence of text is not less than the first preset length, or the proportion of characters other than words and words in the last sentence of text is not greater than the preset proportion, the last sentence of text meets the evaluation requirement, that is, it meets a certain integrity requirement.
  • in this case, to further improve the completeness and presentation effect of the obtained abstract, the last sentence of text can be evaluated on a different aspect to obtain a second evaluation result, and the first interception position can be determined according to the second evaluation result.
  • the determining the abstract of the search result from the target paragraph according to the first interception position includes:
  • a summary of the search result is determined from the target paragraph based on the fourth clipping position.
  • the integrity of the last sentence of the text in the target paragraph before the first interception position may still be low.
  • the last sentence of the text before the first interception position is therefore evaluated, a fourth interception position is determined according to the obtained third evaluation result, and an abstract of the search result is determined from the target paragraph according to the fourth interception position. That is to say, after each interception position is determined in this embodiment, the last sentence of the text in the target paragraph located before the most recently determined interception position is re-evaluated, until that last sentence satisfies the requirement of complete display of the abstract, so that the completeness and display effect of the abstract of the determined search result can be improved.
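The repeated re-evaluation can be pictured as a loop that keeps moving the cut until the last remaining sentence passes the check. The sketch below handles only the deletion case and uses a simple word-count check. The helper name `trim_until_acceptable`, the sentence-splitting regex, and the sample text are assumptions.

```python
import re

def trim_until_acceptable(text: str, is_acceptable) -> str:
    """Drop trailing sentences until the last one passes is_acceptable."""
    # Each chunk keeps its terminator and any leading space, so joining restores the text.
    chunks = re.findall(r"[^.!?]+[.!?]?", text)
    while chunks and not is_acceptable(chunks[-1].strip()):
        chunks.pop()  # move the cut to before the rejected last sentence
    return "".join(chunks)

def has_enough_words(sentence: str) -> bool:
    # Hypothetical check: at least 4 whitespace-separated words.
    return len(sentence.split()) >= 4

sample = "Injection molding is a fast process. It is cheap. See"
print(trim_until_acceptable(sample, has_enough_words))
```

A fuller version would also allow each round to supplement the last sentence rather than delete it, depending on the evaluation result.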
  • an embodiment of the present application also provides a device for determining a summary of a search result, including:
  • the evaluation module is used to determine the preset length characters from the starting position of the target paragraph, and evaluate the last sentence of text of the preset length characters to obtain the first evaluation result, wherein the target paragraph is the search result.
  • a determination module configured to determine a first interception position of the target paragraph according to the first evaluation result, wherein the first interception position is any one of the following: the end position of the text obtained by supplementing the last sentence of text, the end position of the last sentence of text, and the end position of the text obtained after deleting the last sentence of text; and to determine, according to the first interception position, the abstract of the search result from the target paragraph, wherein the character length corresponding to the abstract of the search result is within a preset length range.
  • the first evaluation result is: the end of the last sentence of text is incomplete topic information;
  • the first interception position is: the end position of the text obtained by supplementing the subject information at the end of the last sentence of text.
  • the first evaluation result is: the semantics of the last sentence of text is incomplete;
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented into a complete sentence of text.
  • the first evaluation result is: the character length corresponding to the last sentence of text is less than the first preset length, or the proportion of characters other than words and words in the last sentence of text is greater than the preset proportion;
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text.
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text;
  • the determining module is specifically used for:
  • the second interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • a summary of the search result is determined from the target paragraph based on the second clipping position.
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • the determining module is specifically used for:
  • the third interception position is: the end position of the text obtained after deleting the last sentence of text;
  • a summary of the search result is determined from the target paragraph based on the third clipping position.
  • the determining module is specifically configured to:
  • the first evaluation result is any of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text is complete, the character length corresponding to the last sentence of text is not less than the first preset length, and the proportion of characters other than words and words in the last sentence of text is not greater than the preset proportion;
  • the determining module is specifically used for:
  • a summary of the search result is determined from the target paragraph based on the fourth clipping position.
  • an embodiment of the present application further provides an electronic device, including: a processor, a memory, and an interface;
  • the processor, the memory, and the interface cooperate with each other, and the processor is configured to perform the method of any one of the first aspects.
  • an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the processor executes the method of any one of the first aspects.
  • the embodiments of the present application further provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the method described in any one of the first aspects.
  • FIG. 1a is a schematic diagram of a search page of an example search engine provided by an embodiment of the present application.
  • Figure 1b is a schematic diagram of a browser page
  • FIG. 2 is a schematic diagram of content corresponding to a search result displayed by an example of a browser provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an example of a search system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an example of a method for determining an abstract of a search result provided by an embodiment of the present application
  • FIG. 5 is an example diagram of determining an abstract of a search result from a target paragraph provided by an embodiment of the present application
  • FIG. 6 is another example diagram of determining an abstract of a search result from a target paragraph provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of another example of a method for determining an abstract of a search result provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an example of word segmentation of text provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of another example of word segmentation of text provided by an embodiment of the present application.
  • FIG. 11 is an example diagram of determining an abstract of a search result from a target paragraph provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of another example of a method for determining a summary of a search result provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of displaying on a browser the abstracts of each search result determined using the method for determining the abstract of the search result provided by the embodiment of the present application;
  • FIG. 14 is a schematic flowchart of another example of a method for determining an abstract of a search result provided by an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of an example of determining a target paragraph provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an apparatus for determining a summary of a search result provided by an embodiment of the present application
  • FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the terms “first”, “second” and “third” are only used for descriptive purposes, and should not be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as “first”, “second” or “third” may expressly or implicitly include one or more of that feature.
  • users can enter a search query on the client.
  • the client can be an electronic device such as a mobile phone, tablet computer, smart watch, computer, etc., and a browser can be installed on the client.
  • the application program corresponding to the search engine can be installed.
  • the search query can be one or more keywords, a piece of text or a formula, etc., or other search queries (such as a picture).
  • the terminal displays the received title, hyperlink and summary on the display screen for the user to view.
  • the abstract of the search result can describe the general content of the web page, and can reflect why the web page is relevant to the search query input by the user.
  • FIG. 1a is a schematic diagram of a search engine page of a browser
  • FIG. 1b is a schematic diagram of a browser page
  • FIG. 2 is a schematic diagram of content corresponding to a search result displayed by the browser.
  • the user may enter a search query (e.g., wood floor) in the search box 102 of the search engine page of the browser shown in FIG. 1a; or, as shown in FIG. 1b, the user may enter the search query (e.g., water heater type) in the address bar 101 of the browser. After receiving the search query, the browser can jump to the search engine page and display the search results.
  • FIG. 2 shows the titles and abstracts of each search result displayed by the browser after the user enters the search query of “wooden floor” in the search box 102 in FIG. 1a.
  • the browser also displays the uniform resource locator (URL) of each search result. When there is an image in the web page corresponding to a search result, the browser also displays the image corresponding to that search result.
  • the last sentences of the second, third and fourth abstracts in Figure 2 correspond to "Yiqiang", "otherwise" and "here" respectively. These last sentences do not provide valuable reference information to help the user decide whether to browse the webpage. When such a last sentence is displayed on the browser, the presentation effect of the abstract is poor, which increases the user's invalid reading volume and affects the user experience.
  • the embodiments of the present application provide a method and apparatus for determining an abstract of a search result, which can improve the presentation effect of a webpage abstract and improve the user's reading experience.
  • the method for determining the summary of the search result provided by the embodiment of the present application may be executed by a server, which may be a server of a search engine or a server of another search system. The server can crawl web pages and store the information associated with the web pages in an information database, and can search the information database according to the search query received from the client.
  • the following description takes a server as the execution subject as an example.
  • FIG. 3 is a schematic structural diagram of a search system 300 according to an embodiment of the present application.
  • the method for determining the abstract of a search result provided in this embodiment of the present application may be applied to a search system 300. The search system 300 may include a client 310 and a server 320, and the client 310 and the server 320 are connected through a network 330.
  • the server 320 is configured to receive the search query sent by the client 310 through the network 330, determine the abstract of the search result according to the search query, and send the determined abstract to the client 310 through the network 330.
  • the server 320 can also be used to crawl individual web pages or other information through the network 330.
  • the client 310 is configured to acquire the search query input by the user, send the search query to the server 320 through the network 330, receive the abstract of the search result sent by the server 320, and display the received abstract on the display.
  • Client 310 may be an electronic device.
  • the search system 300 may include one server 320, may include multiple servers 320, may include one client 310, or may include multiple clients 310, which is not limited in this application. In FIG. 3 , the search system 300 includes three servers 320 and two clients 310 .
  • a browser 311 may be installed on the client 310, and the browser 311 may provide an interface of a search engine, so that a user can send a search query to the search engine through the browser 311.
  • the server 320 may be a server 320 corresponding to a search engine.
  • the search engine may search the index database for search results matching the search query input by the user, and may obtain web pages corresponding to the search results selected by the user from the information database.
  • FIG. 4 is a schematic flowchart of a method for determining an abstract of a search result provided by an embodiment of the present application.
  • the abstract of a search result may be determined according to the following steps S401 to S416, which are executed by the server.
  • Step S401: Acquire the text information of the search result, and determine the target paragraph from the text information.
  • Figures 5 and 6 show two examples of determining the abstract of the search result from the target paragraph.
  • the determined paragraph of the abstract may be the following first paragraph 5a: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are always pop music, but there are also countryside music, hip-hop, and rock. She composed many of her own songs.”
  • the determined paragraph of the abstract may also be the following second paragraph 6a: "The process of injection molding is: at a certain temperature, by stirring the completely molten plastic material, injecting it into the mold cavity with high pressure, and cooling and solidifying to obtain a molded product. The advantages of the injection molding method are that the production speed is fast, the efficiency is high, the product integrity is good, the production process can be automated, the output products have various shapes, accurate product dimensions, the products are easy to replace, and can be formed into complex shapes. The specific process of injection molding is based on the content of Chapter 3 of "Production Methods of Plastic Products". The parameters in the injection molding process mainly include injection pressure, injection time, injection temperature, pressure holding time and temperature, and back pressure."
  • first paragraph 5a and second paragraph 6a are examples when the paragraph where the abstract is located is in English and Chinese respectively.
  • the specific content of the search results and the corresponding language are different, and the content and language of the paragraph where the abstract is located are also different.
  • the second paragraph 6a is the paragraph where the two abstracts corresponding to the search results determined for the two different search queries are located.
  • Step S402: Determine a preset length of characters from the start position of the target paragraph.
  • for example, the preset length is 140 characters. For the first paragraph 5a above, the preset length of characters 5b determined by the server is: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are", and the determined preset length of characters are non-Chinese characters. For the second paragraph 6a above, as shown in Figure 6, the preset length of characters 6b determined by the server is: "The process of injection molding is: at a certain temperature, by stirring the completely molten plastic material, injecting it into the mold cavity with high pressure, and cooling and solidifying to obtain a molded product. The advantages of the injection molding method are that the production speed is fast, the efficiency is high, the product integrity is good, the production process can be automated, the output products have various shapes, accurate product dimensions, the products are easy to replace, and can be formed into complex shapes. The specific process of injection molding is based on "Plastic Products", and the determined preset length of characters are Chinese characters.
  • the determined preset length characters include spaces. Since characters such as spaces and punctuation marks will occupy display space when the summary of the search result is displayed on the client, the determined preset length of characters is the actual character length including spaces.
  • Step S403 Evaluate whether the number of words corresponding to the last sentence of text with the preset length of characters is not less than the second preset number.
  • the evaluation result obtained in step S403 is the first evaluation result.
  • the number of words corresponding to the last sentence of text may be the sum of the number of Chinese single-character words and the number of non-Chinese words contained in the last sentence of text.
  • non-Chinese words can include English words, German words, French words, etc.
  • The number of words corresponding to a punctuation mark is zero.
  • A run of one or more consecutive digits counts as one non-Chinese word.
  • A run of consecutive digits and English characters also counts as one non-Chinese word.
  • Here, consecutive means adjacent and not separated by spaces.
  • a single-character word can be understood as a single character.
  • the above-mentioned second preset number may be any number from 2 to 5, or may be another smaller number.
  • For example, the second preset number is 4.
  • The last sentence of text of the preset length characters 5b determined by the server from the starting position of the first paragraph 5a is "Her songs are". The number of words it contains, 3, is less than the second preset number 4, so the server evaluates that the number of words corresponding to the last sentence of text of the preset length characters 5b is less than the second preset number; that is, the evaluation result of step S403 is no.
  • The last sentence of text of the preset length characters 6b determined by the server from the starting position of the second paragraph 6a is "the specific process of injection molding is 《plastic products". The number of words it contains, 15, is greater than the second preset number 4, so the server evaluates that the number of words corresponding to the last sentence of text of the preset length characters 6b is greater than the second preset number; that is, the evaluation result of step S403 is yes.
  • For example, when the last sentence of text is "Rescreened in the premise in 3D version in August 2016",
  • the number of Chinese single characters contained in this sentence is 9 (9 Chinese single-character words).
  • When the last sentence of text is "Mary Ly is the name of the protagonist",
  • the number of Chinese single characters contained in this text is 7 (7 Chinese single-character words),
  • and the number of English words is 2 (2 non-Chinese words).
  • Another example of the last sentence of text is: 《The Production Method of Plastic Products》 was published in 1995.
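  • The counting rule and the comparison in step S403 can be illustrated with a minimal sketch. It is not the implementation of this application: the function names are hypothetical, "Chinese character" is assumed to mean a character in the CJK Unified Ideographs block, and punctuation marks (not only spaces) are assumed to end a run of non-Chinese characters.

```python
def is_chinese(ch: str) -> bool:
    # Assumption: a "Chinese character" is a CJK Unified Ideograph.
    return "\u4e00" <= ch <= "\u9fff"


def count_words(sentence: str) -> int:
    """Count Chinese single-character words plus non-Chinese words."""
    count = 0
    in_run = False  # True while inside a run of non-Chinese letters/digits
    for ch in sentence:
        if is_chinese(ch):
            count += 1          # each Chinese character is one word
            in_run = False
        elif ch.isalnum():
            if not in_run:
                count += 1      # a new run of letters/digits is one word
            in_run = True
        else:
            in_run = False      # spaces and punctuation count as zero
    return count


def passes_s403(last_sentence: str, second_preset_number: int = 4) -> bool:
    """True when the word count is not less than the second preset number."""
    return count_words(last_sentence) >= second_preset_number
```

  • With the second preset number set to 4, `passes_s403("Her songs are")` returns `False`, which corresponds to the evaluation result "no" described for the preset length characters 5b.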
  • Step S403 can also be replaced by step S404.
  • Step S404 Evaluate whether the character length corresponding to the last sentence of text of the preset length characters is not less than the first preset length.
  • the evaluation result obtained in step S404 is the first evaluation result. Step S404 is not shown in the figure.
  • the last sentence of text may contain Chinese characters, English characters, or other non-Chinese characters.
  • The character length corresponding to the last sentence of text can be understood as the sum of the character lengths corresponding to the last sentence of text, that is, the total length of all kinds of characters contained in the last sentence of text.
  • The character length corresponding to the last sentence of text may be the sum of the number of Chinese characters and the number of non-Chinese characters contained in the last sentence of text, where one Chinese character counts as one Chinese character, one English letter counts as one non-Chinese character, one full-width punctuation mark counts as one Chinese character, one half-width punctuation mark counts as one non-Chinese character, one digit counts as one non-Chinese character, and a space counts as zero characters.
  • the above-mentioned first preset length can be any length from 3 to 8 characters, or can be other specific lengths.
  • For target paragraphs in different languages, the value of the first preset length can be different.
  • For example, when the language of the target paragraph is English, the first preset length can be any value from 10 to 15.
  • When the language of the target paragraph is Chinese, as shown in 6a in FIG. 6, the first preset length can be any value from 3 to 8. For the same character length, Chinese text is more semantically complete than English text, so the first preset length corresponding to Chinese is smaller than the first preset length corresponding to English (or other languages that use letters to form words).
  • When the first preset length is set to be longer, a short and incomplete last sentence of text is deleted with greater probability in step S404, so that the end of the abstract of the finally obtained search result is more semantically complete. Since some short sentences may also express complete semantics, setting the first preset length to be shorter reduces the probability of mistakenly deleting, in step S404, a last sentence that is semantically complete or has greater reference value. Those skilled in the art can set the specific value of the first preset length according to the actual scene.
  • For example, the first preset length is 13.
  • The last sentence of text of the preset length characters 5b determined by the server from the starting position of the first paragraph 5a is "Her songs are", and its character length, 11, is less than the first preset length 13. The server therefore evaluates that the character length corresponding to the last sentence of text of the preset length characters 5b is less than the first preset length; that is, the evaluation result of step S404 is no.
  • For another example, the first preset length is 5.
  • The last sentence of text of the preset length characters 6b determined by the server from the starting position of the second paragraph 6a is "The specific process of injection molding is 《Plastic products".
  • Its character length, 15, is greater than the first preset length 5.
  • The server therefore evaluates that the character length corresponding to the last sentence of text of the preset length characters 6b is greater than the first preset length; that is, the evaluation result of step S404 is yes.
  • Another example of the last sentence of text is "Re-released in 3D version in venue China in August 2016".
  • Another example of the last sentence of text is: 《The Production Method of Plastic Products》 was published in 1995.
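  • The character-length rule of step S404 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the thresholds 13 and 5 are the example values given above, and the language of the target paragraph is guessed by a simple majority test on CJK Unified Ideographs, which this application does not prescribe.

```python
def char_length(sentence: str) -> int:
    """Every character counts as one, except spaces, which count as zero."""
    return sum(1 for ch in sentence if not ch.isspace())


def is_mostly_chinese(text: str) -> bool:
    # Assumption: the paragraph is "Chinese" when more than half of its
    # non-space characters are CJK Unified Ideographs.
    chars = [ch for ch in text if not ch.isspace()]
    chinese = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return bool(chars) and chinese * 2 > len(chars)


def first_preset_length(target_paragraph: str,
                        chinese_value: int = 5,
                        other_value: int = 13) -> int:
    """Pick a smaller threshold for Chinese paragraphs than for others."""
    return chinese_value if is_mostly_chinese(target_paragraph) else other_value


def passes_s404(last_sentence: str, target_paragraph: str) -> bool:
    """True when the character length is not less than the first preset length."""
    return char_length(last_sentence) >= first_preset_length(target_paragraph)
```

  • For "Her songs are" the sketch yields a character length of 11, which is less than 13, so the result of step S404 is no, as in the example for 5b.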
  • Steps S403 and S404 can also be understood as evaluating the format of the last sentence of text.
  • For the specific process of determining the last sentence of text of the preset length characters in steps S403 and S404, reference may be made to step S703 in the second embodiment.
  • When the evaluation result of step S403 is no, or the evaluation result of step S404 is no, step S405 is performed; when the evaluation result of step S403 is yes, or the evaluation result of step S404 is yes, step S407 is performed.
  • Step S405 Delete the last sentence of text.
  • For example, the last sentence of text "Her songs are" of the preset length characters 5b determined from the starting position of the above first paragraph 5a can be deleted.
  • The text information 5c obtained after deleting "Her songs are" is: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting."
  • The end position of the text after deleting "Her songs are" is the first interception position, and the length of the text information 5c after deleting "Her songs are" is 127 characters.
  • Step S406 Determine the end position of the text obtained after deleting the last sentence of text as the first interception position, and determine the first interception position as the latest interception position.
  • After step S406, step S408 may be performed.
  • Step S407 Determine the end position of the last sentence of text as the first interception position, and determine the first interception position as the latest interception position.
  • After step S407, step S410 may be executed.
  • Step S408 Determine whether the character length corresponding to the text located before the first interception position in the target paragraph is not less than the lower limit of the preset length range.
  • For example, when the preset length range is 120 to 200 characters, the lower limit is 120, and the length of the text information 5c after deleting "Her songs are", 127 characters, is not less than 120 characters.
  • When the preset length range is 130 to 200 characters, the lower limit is 130, and the length of the text information 5c after deleting "Her songs are", 127 characters, is less than 130 characters.
  • step S408 If the judgment result of step S408 is NO, execute step S409, and if the judgment result of step S408 is YES, execute step S410.
  • Step S409 Supplement the last sentence of text into a complete sentence of text, and update the latest interception position to the end position of the text obtained after the supplement.
  • The latest interception position described above is used to determine the abstract of the search result.
  • After step S409, step S415 may be executed, or step S413 may be executed directly.
  • For example, the text information obtained after the server deletes the last sentence of text of the determined preset length characters 5b is 5c, which includes 127 characters.
  • The preset length range is 130 to 200 characters. Since 127 is less than 130, the last sentence of text of the preset length characters 5b is supplemented into a complete sentence, and the text 5d obtained is: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are always pop music,". The obtained text includes 158 characters, which is greater than the lower limit of 130.
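  • The interplay of steps S405, S406, S408 and S409 for the 5a example can be sketched as follows. The sketch assumes that the evaluation of step S403 or S404 was already "no", that a sentence is completed by extending it to the next first punctuation mark in the target paragraph, and that only the part of paragraph 5a quoted above is available; all names are hypothetical.

```python
# Assumed set of first punctuation marks (half-width and full-width forms).
FIRST_MARKS = frozenset(",.?!;:,。?!;:")


def last_sentence_start(chars: str) -> int:
    """Index at which the last sentence of `chars` begins."""
    i = len(chars) - 1
    while i >= 0 and chars[i] not in FIRST_MARKS:
        i -= 1
    start = i + 1
    while start < len(chars) and chars[start].isspace():
        start += 1  # keep the separating space with the preceding text
    return start


def end_of_sentence(paragraph: str, pos: int) -> int:
    """End position of the sentence that is still open at `pos`."""
    for i in range(pos, len(paragraph)):
        if paragraph[i] in FIRST_MARKS:
            return i + 1
    return len(paragraph)


def cut_after_deletion(paragraph: str, preset_length: int = 140,
                       lower_limit: int = 130) -> tuple[int, int]:
    chars = paragraph[:preset_length]
    first_cut = last_sentence_start(chars)   # S405 and S406
    latest_cut = first_cut
    if first_cut < lower_limit:              # S408: result is "no"
        latest_cut = end_of_sentence(paragraph, preset_length)  # S409
    return first_cut, latest_cut


# Only the portion of paragraph 5a that is quoted in the text.
PARAGRAPH_5A = ("Mary is one of the most popular singers and dancers. "
                "Mary is born in August 4th, 1969. "
                "She is very good at singing and acting. "
                "Her songs are always pop music,")
```

  • For `PARAGRAPH_5A` with a lower limit of 130, the sketch gives a first interception position of 127 and a latest interception position of 158, matching the lengths of 5c and 5d; with a lower limit of 120 the latest interception position stays at 127.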
  • Step S410 Evaluate whether the end of the text before the latest interception position is incomplete topic information.
  • The latest interception position is determined in steps S406 and S407, and the latest interception position is updated in step S409.
  • The server may update and adjust the latest interception position one or more times according to the text before the determined first interception position; for example, the position is updated once in step S409. The server therefore evaluates the text before the latest interception position each time the position is updated.
  • step S410 If the evaluation result of step S410 is yes, then step S411 is executed, and if the evaluation result of step S410 is no, then step S412 is executed.
  • Step S411 Complete the incomplete topic information according to the target paragraph, and update the latest interception position to the end position of the text obtained after the topic information is completed.
  • After step S411, step S412 may be performed.
  • Step S412 Evaluate whether the semantics of the last sentence of the text before the latest interception position are complete.
  • step S412 If the evaluation result of step S412 is yes, step S413 is executed, and if the evaluation result of step S412 is negative, step S414 is executed.
  • Step S413 Determine the text before the latest interception position as the abstract of the search result.
  • The latest interception position may be updated once or multiple times.
  • If the server does not adjust the latest interception position after any evaluation, the latest interception position is the first interception position.
  • Step S414 Complete the semantics of the last sentence of the text before the latest interception position according to the target paragraph, and update the latest interception position to the end position of the text obtained after the semantics are completed.
  • For example, the text before the latest interception position is the text shown in 6c in Figure 6, and its last sentence is "The specific process of injection molding is 《Production Method of Plastic Products》". The server can complete this sentence into "the specific process of injection molding is based on the content of Chapter 3 of 《Production Methods of Plastic Products》", and the obtained text is shown in 6d in Figure 6. After the completion, the semantics of the last sentence are complete.
  • Step S415 Determine whether the character length of the text before the latest interception position is not greater than the upper limit of the preset length range.
  • step S415 If the judgment result of step S415 is yes, then step S413 is executed, and if the judgment result of step S415 is no, then step S416 is executed.
  • Step S416 Update the latest interception position to the first interception position.
  • For example, the text before the latest interception position is the text shown at 6d in Figure 6, which is: "The process of injection molding is: at a certain temperature, the plastic material, completely melted by stirring, is injected into the mold cavity under high pressure, and a molded product is obtained after cooling and solidification.
  • The advantages of the injection molding method are that the production speed is fast, the efficiency is high, the product integrity is good, the production process can be automated, the output products have various shapes and accurate dimensions, the products are easy to replace, and parts with complex shapes can be made.
  • The specific process of injection molding is based on the content of Chapter 3 of 《Production Methods of Plastic Products》."
  • The length of the text shown in 6d is 153 characters.
  • The character length 153 of the obtained text is greater than the upper limit of the preset length range. Therefore, the latest interception position is updated to the end position of 6c (that is, the first interception position), and the obtained text is shown at 6e in Figure 6.
  • After step S416, step S413 is performed.
  • After step S416, the latest interception position is not updated again.
  • For example, the length of the text 6e obtained after the final deletion operation in Figure 6 is 125 characters.
  • If the preset length range is 130 to 160 characters, the lower limit is 130 characters. Even though 125 is less than 130, the latest interception position is no longer updated.
  • If the character length of the abstract exceeds the preset length range, the abstract takes up a larger proportion of the display on the client, which affects the overall display effect of the webpage. Therefore, the abstract should not be too long.
  • Steps S408 to S416 determine the abstract of the search result from the target paragraph according to the first interception position.
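  • The control flow of steps S408 to S416 can be summarized in a short sketch. The three completion steps are passed in as caller-supplied functions because this application leaves their concrete implementation open; the names `choose_abstract`, `Completer` and `no_change` are hypothetical, and the sketch assumes that the upper-limit check of step S415 is applied after every path.

```python
from typing import Callable, Optional

# A completer receives (paragraph, cut) and returns a new interception
# position, or None when the text before `cut` needs no completion.
Completer = Callable[[str, int], Optional[int]]


def no_change(paragraph: str, cut: int) -> Optional[int]:
    return None


def choose_abstract(paragraph: str, first_cut: int, lower: int, upper: int,
                    last_sentence_deleted: bool,
                    complete_sentence: Completer = no_change,
                    complete_topic: Completer = no_change,
                    complete_semantics: Completer = no_change) -> str:
    latest = first_cut
    if last_sentence_deleted and first_cut < lower:
        # S408 is "no": supplement the deleted sentence (S409).
        new = complete_sentence(paragraph, latest)
        if new is not None:
            latest = new
    else:
        # S410/S411 (topic information), then S412/S414 (semantics).
        for completer in (complete_topic, complete_semantics):
            new = completer(paragraph, latest)
            if new is not None:
                latest = new
    if latest > upper:
        # S415 is "no": fall back to the first interception position (S416),
        # which is not updated again even if it is below the lower limit.
        latest = first_cut
    return paragraph[:latest]  # S413
```

  • For instance, with a first interception position of 125, a lower limit of 130 and a completed position of 153, an upper limit of 160 keeps the 153-character text, while a hypothetical upper limit of 150 falls back to the 125-character text.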
  • FIG. 13 is a schematic diagram of displaying on a browser the abstracts of each search result determined by using the method for determining the abstract of the search results provided in the first embodiment.
  • the abstract of each search result is determined using the method for determining the abstract of the search result provided in this embodiment, and the browser displays the content corresponding to each search result as shown in FIG. 13 .
  • In contrast, when the preset length characters are directly intercepted from the starting position of the target paragraph of the webpage text corresponding to each search result, and the intercepted characters are determined as the webpage summary, the content that the browser displays for each search result is as shown in Figure 2.
  • Comparing Fig. 13 with Fig. 2, the last sentence of the abstract of the second search result in Fig. 2 is "to be strong". When "to be strong" is evaluated, the evaluation result is that the character length corresponding to "to be strong" (2 characters) is less than the first preset length (for example, 5 characters), indicating that the last sentence of text is short.
  • In this embodiment, the text before the latest interception position is evaluated multiple times, and the latest interception position is updated and adjusted accordingly, so that the last sentence of text in the abstract of the determined search result is more semantically complete and the abstract is more user-friendly.
  • FIG. 7 is another schematic flowchart of a method for determining a summary of a search result provided by an embodiment of the present application.
  • The execution body of this embodiment may be a server. As shown in FIG. 7, the following steps S701 to S709 may be used to determine the abstract of the search result.
  • Step S701 Acquire the text information of the search result, and determine the target paragraph from the text information.
  • Step S702 From the starting position of the target paragraph, determine the preset length characters.
  • The implementation of steps S701 and S702 is similar to that of steps S401 and S402, and is not repeated here.
  • Step S703 Determine the last sentence of text of the preset length characters.
  • the last sentence of the preset length characters may be determined in the following manner: the text after the last first punctuation mark in the preset length characters is determined as the last sentence of text.
  • The first punctuation mark may be a pause mark, that is, a punctuation mark used to indicate pauses of different lengths in spoken language.
  • The above-mentioned first punctuation mark may be any one of the following pause marks: "." (period), "," (comma), "?" (question mark), "!" (exclamation mark), ";" (semicolon), ":" (colon).
  • The text before a pause mark is usually text with relatively complete semantics, so using any of the above pause marks as the first punctuation mark makes the sentence division more reasonable.
  • FIG. 8 shows a plurality of examples of the last sentence of the above-mentioned preset length characters.
  • For example, when the preset length characters determined from the starting position of the above target paragraph are as shown in a in Figure 8, the last first punctuation mark is the "," after "We can watch the news". Therefore, the text "understanding of domestic and foreign affairs" after this "," is determined as the last sentence of text of the preset length characters. In a in Figure 8, the last sentence of text is framed by a box.
  • When the preset length characters determined from the starting position of the above target paragraph are as shown in b in Figure 8, the last first punctuation mark is the "," after "Therefore". Therefore, the text "bringing the" after this "," is determined as the last sentence of text of the preset length characters. In b in Figure 8, the last sentence of text is framed by a box.
  • When the preset length characters determined from the starting position of the above target paragraph are as shown in c in Figure 8, the last first punctuation mark is the "," after "can read gratifying, watch movies on the Internet". Therefore, the text "medical, shopping" after this "," is determined as the last sentence of text of the preset length characters.
  • In this example, the last punctuation mark, the "," between "medical" and "shopping" (a list separator), does not belong to the first punctuation marks, so it is not used to determine the last sentence of text.
  • In c in Figure 8, the last sentence of text is framed by a box.
  • The server may also determine the last sentence of text of the above-mentioned preset length characters in the following manner: determine that the preset length characters end with a first punctuation mark, and determine the text after the second-to-last first punctuation mark in the preset length characters as the last sentence of text.
  • When the preset length characters end with a first punctuation mark, there is no character after the last first punctuation mark. If the text after the last first punctuation mark were determined as the last sentence of text, the determined last sentence of text would be empty, that is, there would be no last sentence of text. Therefore, the text after the second-to-last first punctuation mark in the preset length characters can be determined as the last sentence of text.
  • For example, when the preset length characters determined from the starting position of the above-mentioned target paragraph are as shown in d in FIG. 8, they end with ",", so the last first punctuation mark is the "," at the end, and the second-to-last first punctuation mark is the "." before "Therefore". In this case, the text "Therefore," after the second-to-last first punctuation mark "." can be determined as the last sentence of text of the preset length characters.
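  • The two ways of determining the last sentence of text in step S703 can be sketched together. This is an illustration only: `last_sentence` and `FIRST_MARKS` are hypothetical names, the mark set is an assumption that covers half-width and full-width forms, and the enumeration comma "、" is deliberately left out of the set so that it never splits a sentence.

```python
# Assumed first punctuation marks; the enumeration comma is not included.
FIRST_MARKS = frozenset(",.?!;:,。?!;:")


def last_sentence(chars: str) -> str:
    """Return the last sentence of text of the preset length characters."""
    end = len(chars.rstrip())
    search_end = end
    if end and chars[end - 1] in FIRST_MARKS:
        # The characters end with a first punctuation mark: skip it and
        # look for the second-to-last first punctuation mark instead.
        search_end = end - 1
    i = search_end - 1
    while i >= 0 and chars[i] not in FIRST_MARKS:
        i -= 1
    return chars[i + 1:end].strip()
```

  • For the 5b example, `last_sentence("She is very good at singing and acting. Her songs are")` returns `"Her songs are"`; for characters that end with a comma, such as `"We are busy. Therefore,"`, it returns `"Therefore,"`.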
  • Step S704 Evaluate whether the end of the last sentence of text is incomplete topic information.
  • step S704 If the evaluation result of step S704 is yes, then step S705 is executed, and if the evaluation result of step S704 is no, then step S706 is executed.
  • the above topic information may include: topic name, URL, and may also include other types of topic information.
  • the subject name may include the title of film and television drama, character name, book title, musical work title, opera title, folk art performance title, etc., and may also include other subject titles.
  • steps S1 to S4 may be used to determine whether the end of the last sentence of text is incomplete topic information.
  • Step S1 Perform word segmentation on the last sentence of text to obtain at least one word segment.
  • A machine algorithm can be used to segment the last sentence of text, for example, a forward maximum matching word segmentation algorithm or a bidirectional maximum matching word segmentation algorithm; other algorithms can also be used. This application does not limit the specific word segmentation method.
  • In step S1, when the last sentence of text contains only one word or one phrase, one word segment is obtained; when the last sentence of text contains multiple words or multiple phrases, multiple word segments are obtained.
  • FIG. 9 and FIG. 10 are schematic diagrams of word segmentation of text in this embodiment, respectively.
  • For example, when the last sentence of text is "the specific process of injection molding is 《plastic products", the result of word segmentation of this text can be as shown in Figure 9.
  • When the last sentence of text is "Technology makes our life faster and more convenient", the result of word segmentation of this text can be as shown in Figure 10.
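  • As one possible illustration of step S1, forward maximum matching can be sketched in a few lines. The dictionary below is hypothetical and far smaller than a real lexicon, and the function name is an assumption; the segmentation shown in Figures 9 and 10 may differ from what this sketch produces.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    max_len = max(map(len, dictionary), default=1)
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical dictionary used only for this example.
DICTIONARY = {"注塑", "成型", "注塑成型", "具体", "过程", "塑料", "制品", "塑料制品"}
```

  • With this dictionary, `forward_max_match("注塑成型的具体过程", DICTIONARY)` yields the word segments 注塑成型, 的, 具体 and 过程; the longest entry 注塑成型 wins over the shorter entries 注塑 and 成型.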
  • Step S2 Determine that topic information containing the last word segment among the word segments exists in the topic information database.
  • The above-mentioned topic information database includes a plurality of pieces of topic information.
  • The topic information database may include topic names and uniform resource locators, and may also include other topic information.
  • The topic information in the topic information database can be updated regularly or irregularly, so that the topic information in the database is more comprehensive.
  • The server may check whether the last word segment is included in the topic information database, and when the last word segment is found, it is determined that topic information containing the last word segment exists in the topic information database.
  • Step S3 Determine that the above-mentioned last word segment is incomplete compared with the topic information containing it.
  • For example, the determined last word segment is "plastic products", and the topic information containing "plastic products" in the topic information database is "production method of plastic products". Since "plastic products" is incomplete compared with "production method of plastic products", it can be determined that the last word segment is incomplete compared with the topic information that contains it.
  • Step S4 It is determined that the end of the text of the last sentence is incomplete topic information.
  • the above steps S1 to S4 can determine that the end of the last sentence of text is incomplete topic information.
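  • Steps S2 to S4 can be illustrated with a minimal lookup sketch. It assumes that the word segments from step S1 are already available, models the topic information database as a plain list of strings, and uses a hypothetical function name; a practical system would likely also filter out very common word segments, which this sketch does not do.

```python
from typing import Optional


def find_incomplete_topic(segments: list[str],
                          topic_base: list[str]) -> Optional[str]:
    """Return the topic that the last word segment is an incomplete part of.

    Returns None when the end of the sentence is not incomplete topic
    information according to the database.
    """
    if not segments:
        return None
    last = segments[-1]
    for topic in topic_base:
        # S2: the topic contains the last word segment.
        # S3: the last word segment is shorter than the topic, so incomplete.
        if last in topic and last != topic:
            return topic  # S4: the end is incomplete topic information
    return None
```

  • For the example above, a last word segment "plastic products" and a database entry "production method of plastic products" make the function return that entry, which can then be used in step S705 to complete the topic information.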
  • Other means may also be used to determine that the end of the last sentence of text is incomplete topic information. For example, when the last sentence of text includes a punctuation mark that indicates the title of a work, if the last sentence includes the left half of the punctuation mark but not the right half, it is determined that the end of the last sentence of text is incomplete topic information. For example, the last sentence of text is "the specific process of injection molding is based on 《plastic products". Since this sentence contains the left half "《" of the book title mark but does not contain the right half "》", it is determined that the end of this sentence is incomplete topic information.
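  • The title-mark check, together with one way of completing the title from the target paragraph, can be sketched as follows. The sketch assumes that the title of a work is enclosed in the book title marks "《" and "》"; the function names and the sample paragraph are hypothetical.

```python
from typing import Optional


def has_unclosed_title_mark(sentence: str, left: str = "《",
                            right: str = "》") -> bool:
    """True when the last left mark is not followed by a right mark."""
    # rfind returns -1 when a mark is absent, so a sentence with a left
    # mark and no right mark is reported as unclosed.
    return sentence.rfind(left) > sentence.rfind(right)


def complete_title(paragraph: str, cut: int, right: str = "》") -> Optional[int]:
    """New interception position just after the closing mark, if any."""
    idx = paragraph.find(right, cut)
    return None if idx == -1 else idx + 1
```

  • For a hypothetical paragraph "The well-received 《Towards the Sun》 is a classic movie." cut after "Towards", the first function reports an unclosed title mark and the second returns the position just after "》", so that the completed text ends with the full title.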
  • FIG. 11 shows an example of determining an abstract of a search result.
  • the determined target paragraph (that is, the paragraph where the abstract is located) 11a is:
  • The preset length characters 11b determined from the starting position of the target paragraph 11a are: "Movies enrich people's daily life, this year is a big year for movies, 《Growing Story》 in June is a classic, and 《From Now on》 in September has also been well received. Many movies have left a deep impression on everyone. The well-received 《Towards".
  • The last sentence of text of the determined preset length characters 11b is: the well-received 《Towards. After this last sentence of text is evaluated, the evaluation result obtained is that the end of the last sentence of text, the well-received 《Towards, is incomplete topic information.
  • Step S705 Complete the incomplete topic information at the end of the last sentence of text, and determine the end position of the text obtained after the topic information is completed as the first interception position.
  • Step S706 Evaluate whether the semantics of the last sentence of text are complete.
  • step S706 If the judgment result of step S706 is yes, then step S708 is executed, and if the judgment result of step S706 is no, then step S709 is executed.
  • Step S708 Determine the end position of the preset length characters as the first interception position.
  • Step S709 Complete the last sentence of text, and determine the end position of the text obtained after the last sentence is completed as the first interception position.
  • After the first interception position is determined, step S707 is executed.
  • step S706 and step S709 For the specific implementation process of step S706 and step S709, reference may be made to step S1203 and step S1204 in the third embodiment, and step S412 and step S414 in the first embodiment, which will not be described in detail here.
  • Step S707 Determine the text in the target paragraph before the first interception position as the abstract of the search result.
  • For example, after the topic information is completed, the last sentence of text of the obtained text 11c is: the well-received 《Towards the Sun》.
  • The abstract of the search result obtained by completing the topic information is shown in 11c in Figure 11.
  • In other words, the last sentence of text can be completed into: the well-received 《Towards the Sun》.
  • This embodiment evaluates whether the topic information at the end of the last sentence of text of the preset length characters is complete, and completes the topic information at the end when it is not.
  • When the topic information at the end of an abstract is incomplete, the user usually wants to read the rest of the topic information to better understand the text.
  • In this embodiment, the topic information is completed, so that the user can read more complete information.
  • FIG. 12 is another schematic flowchart of a method for determining an abstract of a search result provided by an embodiment of the present application.
  • The abstract of the search result may be determined according to the following steps S1201 to S1209.
  • The execution body of this embodiment may be a server.
  • Step S1201 Acquire the text information of the search result, and determine the target paragraph from the text information.
  • Step S1202 From the starting position of the target paragraph, determine the preset length characters.
  • step S1201 and step S1202 are similar to the implementation process of step S401 and step S402, and will not be repeated here.
  • Step S1203 Evaluate whether the semantics of the last sentence of text of the preset length characters are complete.
  • If the evaluation result of step S1203 is no, step S1204 is performed, and if the evaluation result of step S1203 is yes, step S1205 is performed.
  • When it is determined that the last sentence of text lacks any one of a subject, a predicate and an object, it can be determined that the semantics of the last sentence of text are incomplete. For a specific example of evaluating whether the semantics are complete, reference may be made to step S412 of the first embodiment.
  • the semantics of the last sentence of text can also be determined to be incomplete in other ways.
  • The third preset number may be any number from 2 to 6, or may be another number, wherein the words may include single-character words and phrases in Chinese and words in English, and may also include words and phrases in other languages.
  • Step S1204 Supplement the last sentence of text into a complete sentence of text, and determine the end position of the text obtained after the supplement as the first interception position.
  • Step S1205 Determine the end position of the preset length characters as the first interception position.
  • Step S1206 Determine whether the character length of the text before the first interception position is not greater than the upper limit of the preset length range.
  • step S1206 If the judgment result of step S1206 is yes, then step S1207 is executed, and if the judgment result of step S1206 is no, then step S1208 is executed.
  • Step S1207 Determine the text in the target paragraph before the first interception position as the abstract of the search result.
  • For example, the last sentence of text is supplemented into a complete sentence and becomes: the well-received 《Towards the Sun》 is a classic movie. The abstract of the search result obtained in this way is shown in 11d in Figure 11.
  • Step S1208 Delete the last sentence of text, and determine the end position of the text obtained after deleting the last sentence of text as the third interception position.
  • Step S1209 Determine the text in the target paragraph before the third interception position as the abstract of the search result.
  • step S1208 and step S1209 For the specific processes of step S1208 and step S1209, reference may be made to step S415, step S416, and step S413 in the first embodiment, and details are not described here.
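  • The flow of steps S1203 to S1209 can be sketched as follows. Because checking for a missing subject, predicate or object requires a language parser, the semantic check is passed in as a caller-supplied function; that function, the mark set and all names here are assumptions made for illustration, and a sentence is assumed to be completed by extending it to the next first punctuation mark in the target paragraph.

```python
from typing import Callable

# Assumed set of first punctuation marks (half-width and full-width forms).
FIRST_MARKS = frozenset(",.?!;:,。?!;:")


def third_embodiment_abstract(paragraph: str, preset_length: int,
                              upper_limit: int,
                              is_complete: Callable[[str], bool]) -> str:
    chars = paragraph[:preset_length]
    # Locate the start of the last sentence of the preset length characters.
    i = len(chars) - 1
    while i >= 0 and chars[i] not in FIRST_MARKS:
        i -= 1
    start = i + 1
    if is_complete(chars[start:].strip()):            # S1203 is "yes"
        first_cut = len(chars)                        # S1205
    else:                                             # S1204
        first_cut = next((j + 1 for j in range(len(chars), len(paragraph))
                          if paragraph[j] in FIRST_MARKS), len(paragraph))
    if first_cut <= upper_limit:                      # S1206 is "yes"
        return paragraph[:first_cut]                  # S1207
    # S1208 and S1209: delete the last sentence and cut there instead.
    return paragraph[:start]
```

  • In this sketch the third interception position is simply the start of the last sentence, so an abstract that would exceed the upper limit after completion ends at the preceding first punctuation mark.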
  • FIG. 14 is a schematic flowchart of a method for determining an abstract of a search result provided by an embodiment of the present application. As shown in FIG. 14 , the method for determining an abstract of a search result provided by this embodiment of the present application includes the following steps S1410 to S1480.
  • The server in the embodiment shown in FIG. 14 may be the foregoing server 320, and the client may be the foregoing client 310.
  • Step S1410 The server receives the search query sent by the client.
  • the user can input a search query on the client 310, and after receiving the search query input by the user, the client 310 can send the search query to the server 320 through the network 330 (eg, wired and/or wireless network), and the server 320 can receive the search query.
  • For example, the user can input the keyword "wooden floor" in the search box 201 of the search page of the browser. The keyword is the search query, and the search box 201 is the interface that the search engine provides for users to input information.
  • Step S1420 The server searches for at least one search result matching the above search query.
  • the search results may be any machine-readable and storable documents, for example, the search results may be emails, news, blogs, business directories, electronic versions of printed texts, web pages, and the like.
  • the search results are usually web pages.
  • the search results usually include text information, and may also include embedded information such as images, hyperlinks, audio, video, etc., and embedded instructions such as scripting languages.
  • the server may search for at least one search result matching the above search query from the index database of the search engine.
  • the index database stores the index information of each webpage, and according to the index information, webpages that meet specific conditions can be quickly found.
  • The index information can include the URL of the webpage corresponding to each word segment and the keywords of the webpage, and can also include other information about the webpage corresponding to each word segment.
  • The server can store the URL of each web page, the HyperText Markup Language (HTML) code of the web page, the title of the web page and other web page information in the information database.
  • the index information of each web page is established in the index database.
  • the complete information of the webpage is stored in the information database.
  • the server can obtain the complete information of the webpage from the information database, and send the complete information of the webpage to the client, so that the client can display the complete information of the webpage.
  • the server may look up search results after receiving the above search query.
  • the search results may be web pages that match the search query entered by the user.
  • A search result matches the search query when, for example, the title of the search result contains the keyword entered by the user, the title contains a synonym of that keyword, the search result contains all or part of the text entered by the user, or the field to which the search result belongs is the same as or similar to the field to which the search query belongs; other kinds of match are also possible. For example, as shown in FIG. 2, if the keyword input by the user during the search is "wooden floor", the server may determine the web pages whose titles contain "wooden floor" as the search results.
  • Step S1430 The server scores each of the found search results, and sorts each of the found search results in descending order of scores.
  • a higher score indicates a better match between the search result and the search query.
  • the server may score the searched web page according to at least one of the following conditions:
  • the number of occurrences of the search query in the web page (e.g., the more occurrences, the higher the score);
  • the position at which the search query appears in the web page (e.g., a higher score when it appears in the title);
  • the number of times the web page was viewed (e.g., the more views, the higher the score), etc.
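  • The scoring and sorting of step S1430 can be sketched as follows. This is only an illustration under stated assumptions: the `Page` type, the function names and the weights are hypothetical and are not specified by the application, which only lists the conditions that may influence the score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    # Hypothetical container for the web page fields used in scoring.
    title: str
    body: str
    views: int


def score_page(page: Page, query: str) -> float:
    """Combine the three listed conditions into one score (weights are assumed)."""
    q = query.lower()
    occurrences = page.body.lower().count(q)         # more occurrences, higher score
    in_title = 1 if q in page.title.lower() else 0   # query in the title, higher score
    # The weights 1.0, 5.0 and 0.001 are illustrative assumptions only.
    return 1.0 * occurrences + 5.0 * in_title + 0.001 * page.views


def rank_pages(pages: List[Page], query: str) -> List[Page]:
    """Sort the found pages in descending order of score."""
    return sorted(pages, key=lambda p: score_page(p, query), reverse=True)
```

  • With such a ranking, taking the first preset number of search results in step S1440 amounts to slicing the front of the returned list.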
  • Step S1440 The server acquires the text information of each of the first preset number of search results in order from front to back, and determines the target paragraph from the acquired text information.
  • the target paragraph is the paragraph where the abstract of the search result is located.
  • the server determines the target paragraph from the acquired text information, that is, the server determines the target paragraph corresponding to the text information from each acquired text information.
  • The above-mentioned first preset number may be any number, for example 10, 50, 100, 200 or 500, or a larger or smaller number. The larger the first preset number, the more search results the server determines and the more search results are presented to the user; the smaller the first preset number, the faster the server determines the search results.
  • When the server stores the complete information of each crawled webpage in the above-mentioned information database, the server can obtain the text information of the search result from the information database.
  • the server may determine the target paragraph in any one of the following determination manners.
  • Determination method 1 The server segments the text information at the carriage returns in the text information of the search result, scores each paragraph, and selects at least one paragraph in descending order of scores as the target paragraph. Some paragraphs obtained by this segmentation are very short, and a single such paragraph may not meet the requirements of abstract interception; therefore, the target paragraph may include one paragraph, or two or more paragraphs.
  • the target paragraph determined in this way has a high degree of matching with the user's search query.
  • Determination method 2 The server segments the text information at the carriage returns in the text information of the search result and scores each paragraph. The paragraph with the highest score and at least one paragraph adjacent to it are determined as the target paragraph.
  • The target paragraph determined in this way not only has a high degree of matching with the user's search query; when the target paragraph includes two or more paragraphs, the paragraphs are also more coherent with one another, so that the semantic coherence and readability of the determined summary of the search result are better.
  • Determination method 3 The server segments the text information at the carriage returns in the text information of the search result, and determines the starting paragraph of the text information as the target paragraph. This method can quickly and easily determine the paragraph in which the abstract is located.
  • the server may also determine the paragraph in which the abstract is located in other ways, which is not specifically limited in this application.
  • The server can score each paragraph based on factors such as the number of occurrences of the search query in the paragraph (for example, the more occurrences, the higher the score) and the position of the paragraph in the search result (for example, a higher score if the paragraph is the title or the first paragraph). The scoring method is not described in detail here.
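  • Determination method 1 can be sketched as follows. The function names, the first-paragraph bonus, the `min_length` threshold and the choice to join the selected paragraphs in document order are all assumptions made for illustration; the application does not fix them.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Segment text at carriage returns / line feeds, dropping empty paragraphs."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return [p.strip() for p in normalized.split("\n") if p.strip()]


def score_paragraph(paragraph: str, index: int, query: str) -> float:
    """Score by query occurrences plus an assumed bonus for the first paragraph."""
    occurrences = paragraph.lower().count(query.lower())
    position_bonus = 1.0 if index == 0 else 0.0
    return occurrences + position_bonus


def target_paragraph(text: str, query: str, min_length: int = 100) -> str:
    """Pick paragraphs in descending score order until min_length characters are reached."""
    paragraphs = split_paragraphs(text)
    order = sorted(
        range(len(paragraphs)),
        key=lambda i: score_paragraph(paragraphs[i], i, query),
        reverse=True,
    )
    chosen: List[int] = []
    for i in order:
        chosen.append(i)
        if sum(len(paragraphs[j]) for j in chosen) >= min_length:
            break
    # Joining in document order is an assumption made for readability.
    return "\n".join(paragraphs[i] for i in sorted(chosen))
```

  • Determination method 3 corresponds to simply returning `split_paragraphs(text)[0]` when the text is not empty.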
  • Step S1450 The server determines a preset length of characters from the starting position of the above-mentioned target paragraph.
  • The preset length of characters determined by the server may be a preset length of Chinese characters, English characters, Korean characters or characters of another language, or a preset length of characters in a mix of languages.
  • In one implementation, the preset length may be the same for text information of search results in different languages, as in the example in step S402 of the first embodiment, which makes the solution provided in this application more widely applicable.
  • Characters of the same length and the same font size in different languages often occupy different numbers of lines and amounts of space in a web page. Therefore, in another implementation, the preset length can also differ for text information of search results in different languages.
  • the preset length may be, for example, any length from 100 to 200 characters, or may be a length of other numbers of characters.
  • When the preset length is set longer, the determined summary contains more characters, so that users can learn more about the search results. When the preset length is set shorter, the determined summary contains fewer characters and each search result occupies less space when displayed on the client's display, so that the same web page can display more search results.
  • Those skilled in the art can set the specific value of the preset length according to actual needs.
  • Step S1460 The server evaluates the last sentence of text in the determined preset-length characters to obtain a first evaluation result.
  • The last sentence of text in the preset-length characters may be determined according to step S703 of the second embodiment, and then evaluated.
  • Evaluating the last sentence of text in the preset-length characters determined from the starting position of the target paragraph may mean evaluating the number of words in the last sentence, as in step S403 of the first embodiment, or its character length, as in step S404 of the first embodiment. It may also mean evaluating whether the end of the last sentence is incomplete topic information, as in step S704 of the second embodiment, or evaluating the semantic completeness of the last sentence, as in step S1203 of the third embodiment.
  • The first evaluation result may be that the number of words in the last sentence of text is less than the second preset number (step S403 of the first embodiment), that the character length of the last sentence is less than the first preset length (step S404 of the first embodiment), that the end of the last sentence is incomplete topic information (step S704 of the second embodiment), or that the semantics of the last sentence is incomplete (step S1203 of the third embodiment).
  • the obtained first evaluation result may also be other results that can reflect the integrity of the text of the last sentence, which is not specifically limited in this application.
  • Step S1470 The server determines the first clipping position of the target paragraph according to the above-mentioned first evaluation result.
  • the first interception position may be the end position of the text obtained by supplementing the last sentence of text of the target paragraph.
  • the first interception position may also be the end position of the text obtained after deleting the last sentence of text of the target paragraph.
  • the first interception position may be the end position of the text obtained after deleting the last sentence of text in step S405 of the first embodiment.
  • the first clipping position may also be the end position of the last sentence of the target paragraph.
  • For example, the first interception position may be the interception position determined in step S708 of the second embodiment or step S1205 of the third embodiment.
  • The last sentence of text may be supplemented into a complete sentence, or, as in step S705 of the second embodiment, the topic information at the end of the last sentence may be completed.
  • To supplement the last sentence of text into a complete sentence, the text of the target paragraph that follows the last sentence and precedes the next punctuation mark may be appended to the end of the last sentence.
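  • Supplementing the last sentence up to the next punctuation mark can be sketched as follows. The set of punctuation marks and the function name are assumptions for illustration; the sketch works on character indices of the target paragraph and leaves the punctuation mark itself out of the supplemented text.

```python
# Assumed set of punctuation marks that end a clause or sentence.
PUNCTUATION = set(",.;:!?，。；：！？")


def supplement_last_sentence(paragraph: str, cut: int) -> int:
    """Return the interception position obtained by extending `cut` to the next punctuation mark.

    `cut` is the index just past the preset-length characters. The text between
    `cut` and the next punctuation mark is treated as the missing tail of the
    last sentence. If no punctuation mark follows, the end of the paragraph is returned.
    """
    end = cut
    while end < len(paragraph) and paragraph[end] not in PUNCTUATION:
        end += 1
    return end
```

  • The text before the returned position is then `paragraph[:end]`, which ends with a complete clause rather than a word cut in the middle.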
  • After the first interception position is determined, the text located before it can be adjusted and a new interception position re-determined, so that the character length of the text before the new interception position falls within the preset length range. For example, a new interception position can be re-determined according to steps S408 and S409 of the first embodiment.
  • the preset length range may be 80 to 350 characters, and may also be other character length ranges. This application does not specifically limit the preset length range, and those skilled in the art can set it according to actual needs.
  • In this way, the character length of the determined abstract of the search result is kept within a reasonable range: it is neither so long that too few search results can be displayed on the page, nor so short that the summary contains too little valid information.
  • The layout space occupied by the summary of each search result is usually fixed, as are display settings such as the font size, line spacing and word spacing of the summary. If the text located before the interception position in the paragraph where the abstract is located is too long, the determined abstract cannot be fully displayed on the client or other displays, so the displayed abstract is less complete and the display effect is poor.
  • Setting the character length of the text located before the interception position in the target paragraph within the preset length range lets the user better understand the general content of the search result through the abstract, and lets the determined summary be displayed completely on the client or other displays, so that the displayed summary is more complete and the display effect is better.
  • For example, when the evaluation result of step S410 is yes, or when the evaluation result of step S704 of the second embodiment is yes, the last sentence of text can be deleted, as in step S405 of the first embodiment, or supplemented, as in step S705 of the second embodiment and step S1204 of the third embodiment, and the end position of the text obtained after the deletion or supplementation is determined as the first interception position. Otherwise, the end position of the last sentence of text may be determined as the first interception position of the target paragraph, as shown in step S708 of the second embodiment and step S1205 of the third embodiment.
  • The first interception position of the paragraph where the abstract is located can thus be determined flexibly according to the first evaluation result. Those skilled in the art can determine it on the principle that the abstract of the search result should be more complete and better displayed; the present application does not limit the specific manner of determining the first interception position.
  • Step S1480 The server determines the abstract of the search result from the target paragraph according to the above-mentioned first clipping position.
  • the character length corresponding to the abstract of the search result determined in step S1480 is within a preset length range.
  • the server may determine the text in the target paragraph before the first clipping position as the abstract of the search result.
  • step S707 in the second embodiment and step S1207 in the third embodiment are examples of determining the text in the target paragraph before the first interception position as the abstract of the search result.
  • the server may also determine another clipping position according to the text before the first clipping position, and determine the text in the target paragraph before the other clipping position as the abstract of the search result.
  • Steps S405 to S416 in the first embodiment are an example of determining another interception position according to the text before the first interception position and determining the text in the target paragraph before that other interception position as the abstract of the search result; the latest interception position in the first embodiment is the other interception position.
  • When the text in the target paragraph before the first interception position still cannot present the abstract well (for example, the text is too long or too short, or its semantics is still incomplete), another interception position can be further determined according to the text before the first interception position. The text before the other interception position presents the abstract better, which further improves the presentation of the determined abstract.
  • The solution provided in the embodiment of the present application evaluates the last sentence of text in the preset-length characters taken from the starting position of the target paragraph, and determines the first interception position according to the first evaluation result. The first interception position can therefore be determined flexibly according to the first evaluation result, so that the last sentence of the determined abstract is a sentence with relatively complete semantics. This improves the abstract's value as a reference for deciding whether to browse the web page, gives a better presentation effect, reduces the amount of invalid reading by the user, improves the user's reading experience, and increases the probability that the user clicks on the web page corresponding to the abstract.
  • The first interception position determined in the embodiment of the present application is the end position of the text obtained after supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text. Different first evaluation results lead to different first interception positions, which makes the determination more flexible and gives the determined abstract a better presentation effect.
  • In step S1460, the last sentence of text can be evaluated in any one of the following evaluation manners to obtain the first evaluation result.
  • Evaluation method 1 Evaluate whether the character length corresponding to the last sentence of text is less than the first preset length.
  • For the specific evaluation process of evaluation method 1, reference may be made to step S404 in the first embodiment.
  • When the evaluation result based on evaluation method 1 is yes, the first interception position may be: the end position of the text obtained after deleting the last sentence of text, as in steps S405 to S406 of the first embodiment.
  • Evaluation method 1 can quickly establish that the semantic integrity of the last sentence of text is low and that its reference value for the user's understanding of the web page is small, so that, when the first interception position is determined, it can quickly be decided to delete the last sentence of the preset-length characters and a summary that presents the search result better can be determined more quickly.
  • Evaluation method 2 Evaluate whether the number of words corresponding to the last sentence of text is less than the second preset number.
  • For the specific evaluation process of evaluation method 2, reference may be made to step S403 of the first embodiment.
  • When the evaluation result based on evaluation method 2 is yes, the first interception position may be: the end position of the text obtained after deleting the last sentence of text, as in steps S405 to S406 of the first embodiment.
  • Evaluation method 2 can likewise quickly determine a summary that presents the search result better to the user. Because the number of words in a sentence usually reflects its semantic integrity well, evaluating the number of words in the last sentence of text assesses more accurately its semantic integrity and its reference value for the user's understanding of the web page. When that integrity and reference value are low, it can quickly be decided, when determining the first interception position, to delete the last sentence of the preset-length characters.
  • Evaluation method 3 Evaluate whether the proportion of characters other than words in the last sentence of text is greater than the preset proportion.
  • The above-mentioned words may include single-character words and phrases in Chinese, English words, and single-character words and phrases in other languages; they may also include Chinese characters.
  • Characters other than words may include at least one of numbers, punctuation marks, mathematical symbols, currency symbols, collation symbols, lexicon symbols, and phonetic symbols, and may also include other symbols, which is not specifically limited in this application.
  • The above preset proportion may be any proportion from 40% to 60%, or may be another, larger proportion.
  • When the evaluation result based on evaluation method 3 is yes, the first interception position may be: the end position of the text obtained after deleting the last sentence of text.
  • Evaluation method 3 can establish that the proportion of effective information in the last sentence of text is small, so that, when the first interception position is determined, it can quickly be decided to delete the last sentence of the preset-length characters.
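  • Evaluation method 3 can be sketched as follows. As an assumption made only for this illustration, a character counts as part of a word when Python's `str.isalpha()` is true for it (this covers Latin letters and Chinese characters), whitespace is ignored, and the default preset proportion of 50% is picked from the 40% to 60% range mentioned above. The function names are hypothetical.

```python
def non_word_ratio(sentence: str) -> float:
    """Proportion of non-whitespace characters that do not belong to words."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        # An empty sentence carries no effective information (assumed convention).
        return 1.0
    non_word = sum(1 for c in chars if not c.isalpha())
    return non_word / len(chars)


def exceeds_preset_proportion(sentence: str, preset_proportion: float = 0.5) -> bool:
    """True when evaluation method 3 answers yes, i.e. the last sentence should be deleted."""
    return non_word_ratio(sentence) > preset_proportion
```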
  • Evaluation method 4 Evaluate whether the end of the text in the last sentence above is incomplete topic information.
  • For the specific evaluation process of evaluation method 4, reference may be made to step S704 of the second embodiment.
  • When the evaluation result based on evaluation method 4 is yes, the first interception position may be: the end position of the text obtained by supplementing the topic information at the end of the last sentence of text.
  • In this case, evaluating the last sentence of text means evaluating the topic information at its end, and the first evaluation result obtained is that the end of the last sentence of text is incomplete topic information.
  • According to the first evaluation result of evaluation method 4, the first interception position can be determined as a position that completes the above topic information, so that the user can better understand the text.
  • Evaluation method 5 Evaluate whether the semantics of the above last sentence of text is incomplete.
  • For the specific evaluation process of evaluation method 5, reference may be made to step S1203 of the third embodiment.
  • When the evaluation result based on evaluation method 5 is yes, the first interception position may be: the end position of the text obtained after the last sentence of text is supplemented into a complete sentence.
  • In this case, evaluating the last sentence of text means evaluating its semantics, and the first evaluation result obtained is that the semantics of the last sentence of text is incomplete.
  • Evaluation method 6 Evaluate whether the end of the text of the last sentence above is a punctuation mark.
  • The punctuation marks in evaluation method 6 can be dots.
  • When the evaluation result based on evaluation method 6 is yes, the first interception position may be: the end position of the text obtained by deleting the punctuation mark at the end of the last sentence of text.
  • evaluation method 1, evaluation method 2, evaluation method 3 and evaluation method 6 can be summarized as format evaluation.
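  • The character-length, word-count and trailing-punctuation checks of evaluation methods 1, 2 and 6 can be sketched together as follows. The thresholds, the set of trailing marks, the whitespace-based word count (which would need a word segmenter for Chinese) and the returned action names are all assumptions chosen for illustration.

```python
from typing import Optional


def format_evaluation(
    last_sentence: str,
    first_preset_length: int = 5,          # assumed threshold for evaluation method 1
    second_preset_number: int = 2,         # assumed threshold for evaluation method 2
    trailing_marks: str = ",;:、，；：",    # assumed marks for evaluation method 6
) -> Optional[str]:
    """Return the action suggested by evaluation methods 1, 2 and 6, or None if none applies."""
    if len(last_sentence) < first_preset_length:
        return "delete_last_sentence"           # method 1: too few characters
    if len(last_sentence.split()) < second_preset_number:
        return "delete_last_sentence"           # method 2: too few words
    if last_sentence[-1] in trailing_marks:
        return "delete_trailing_punctuation"    # method 6: ends with a punctuation mark
    return None
```

  • A result of `None` here corresponds to the case in which none of these format evaluations gives a reason to delete anything from the last sentence.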
  • the first interception position is determined according to the first evaluation result. If the first evaluation result is different, the determined first interception position may be the same or different.
  • When the character length of the last sentence of text is less than the first preset length, or the number of words in the last sentence is less than the second preset number, or the proportion of characters other than words in the last sentence is greater than the preset proportion, the last sentence is likely to have incomplete semantics or to express no actual semantics, and therefore has little reference value for the user's understanding of the search result.
  • In this case, the last sentence can be deleted. Removing the sentence with little reference value makes the determined abstract more complete and gives the user a better reading experience when the abstract is displayed on the client.
  • Moreover, because the character length or number of words of the last sentence is small, deleting it directly leaves only a small difference between the total character length of the resulting abstract and the above preset length, so the effect on how the abstract is displayed on the client or other displays is also small.
  • the last sentence of text may also be supplemented as a complete sentence of text. In this way, the completeness of the determined digest can also be improved.
  • The above complete sentence of text can be understood as a sentence with complete semantics, for example a sentence that includes a subject, a predicate and an object.
  • When the first evaluation result is that the character length of the last sentence of text is not less than the first preset length, or that the number of words in the last sentence is not less than the second preset number, or that the proportion of characters other than words in the last sentence is not greater than the preset proportion, or that the end of the last sentence is complete topic information, or that the semantics of the last sentence is complete, the first interception position can be the end position of the last sentence of text, as in step S708 of the second embodiment and step S1205 of the third embodiment.
  • the result of the evaluation is that the semantics of the last sentence is complete. In this case, there is no need to delete or supplement the content of the last sentence.
  • the reason for directly determining the end position of the last sentence of text as the clipping position of the paragraph where the abstract is located is that the last sentence of text with a preset length of characters is already text with complete semantics and better display effect.
  • the method for determining the summary of the search result may further include the following step: sending the determined summary of the search result to the client, so that the client displays the received summary on the display.
  • the first clipping position may be: the end position of the text obtained after deleting the last sentence of text.
  • Step S1480 can be implemented by the following steps: determine that the character length of the text located before the first interception position in the target paragraph is less than the second preset length; determine the second interception position, wherein the second interception position is the end position of the text obtained after the last sentence of text is supplemented; according to the second interception position, determine the abstract of the search result from the target paragraph.
  • For the specific process, reference may be made to step S408, step S409 and step S413 of the first embodiment. The latest interception position in step S409 of the first embodiment is the above-mentioned second interception position.
  • the above-mentioned second preset length may be the lower limit of the above-mentioned preset length range.
  • When the first interception position is the end position of the text obtained after deleting the last sentence of text, a character length of the text before the first interception position in the target paragraph that is less than the second preset length means that this text is too short. In this case, the determined summary of the search result may contain very little information, so that the user cannot obtain the effective information of the search result from the summary.
  • The summary of the search result is therefore determined from the target paragraph according to the second interception position. Since the second interception position is the end position of the text obtained after supplementing the last sentence of text, the character length of the resulting summary is increased, so that the user can learn more about the search result from the abstract.
  • Determining the summary of the search result from the target paragraph according to the second interception position may mean determining the text before the second interception position in the target paragraph as the summary, or further determining other interception positions based on the text before the second interception position and determining the abstract according to those positions, until the presentation effect of the determined abstract meets the requirements.
  • the above-mentioned determination of the abstract of the search result from the target paragraph according to the second interception position can be achieved by the following steps: determining the character corresponding to the text in the target paragraph before the second interception position If the length is greater than the third preset length, the determined preset length characters are determined as the abstract of the search result.
  • the third preset length is greater than the second preset length, and the third preset length may be the upper limit of the preset length range.
  • the abstract of the search result may be determined according to steps S409, S415, S416, and S413 of the first embodiment.
  • When the character length of the text before the second interception position in the target paragraph is greater than the third preset length, that text is too long: too few search results can be displayed on the page, or the determined summary cannot be displayed completely, which affects the typesetting of the page. In this case, the determined preset-length characters can be determined as the summary of the search result, which benefits the typesetting of the page.
  • the character length corresponding to the determined abstract is within a preset length range. That is to say, in this embodiment, when the determined length of the digest is short, if further increasing the character length of the digest will cause the digest to be too long, the determined preset length characters are used as the final determined digest.
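  • The flow described above (delete the last sentence; if the result is shorter than the second preset length, supplement instead; if that is longer than the third preset length, fall back to the preset-length characters) can be sketched as follows. The helper names, the set of sentence marks and the index-based representation of interception positions are assumptions for illustration only.

```python
# Assumed set of marks that delimit sentences or clauses.
SENTENCE_MARKS = set(",.;!?，。；！？")


def end_after_deletion(paragraph: str, cut: int) -> int:
    """Interception position after dropping the partial last sentence before `cut`."""
    pos = min(cut, len(paragraph))
    while pos > 0 and paragraph[pos - 1] not in SENTENCE_MARKS:
        pos -= 1
    return pos


def end_after_supplement(paragraph: str, cut: int) -> int:
    """Interception position after extending the last sentence to the next mark."""
    pos = min(cut, len(paragraph))
    while pos < len(paragraph) and paragraph[pos] not in SENTENCE_MARKS:
        pos += 1
    return pos


def summary_after_deletion(paragraph: str, preset_length: int, lower: int, upper: int) -> str:
    """`lower` and `upper` play the roles of the second and third preset lengths."""
    first_cut = end_after_deletion(paragraph, preset_length)
    if first_cut >= lower:
        return paragraph[:first_cut]        # text before the first interception position
    second_cut = end_after_supplement(paragraph, preset_length)
    if second_cut > upper:
        return paragraph[:preset_length]    # fall back to the preset-length characters
    return paragraph[:second_cut]           # text before the second interception position
```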
  • The first interception position is the end position of the text obtained by supplementing the last sentence of text.
  • Step S1480 can be implemented according to the following steps: determining that the character length corresponding to the text located before the first interception position in the target paragraph is greater than the third preset length; determining a third interception position, wherein the third interception position is the end position of the text obtained after deleting the last sentence of text; and determining the text in the target paragraph before the third interception position as the abstract of the search result.
  • The abstract of the search result may be determined according to steps S1206, S1208, and S1209 of the third embodiment.
  • The third preset length may be the upper limit of the above-mentioned preset length range.
  • When the first interception position is the end position of the text obtained by supplementing the last sentence of text, a character length of the text in the target paragraph before the first interception position that is greater than the second preset length means that this text is too long, and the determined summary of the search result may not be displayed completely.
  • When the abstract of the search result is determined from the target paragraph according to the third interception position, because the third interception position is the end position of the text obtained after deleting the last sentence of text, the character length of the resulting abstract is shortened, so that the abstract can be fully displayed in the display page.
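A rough sketch of moving from the first to the third interception position is shown below. It assumes sentences end at one of a small set of terminator characters, which is a simplification introduced here (the source does not specify how sentences are delimited), and the function name is hypothetical.

```python
import re

# Assumption: a sentence ends at one of these terminators.
SENTENCE_END = re.compile(r"[.!?。！？]")

def third_interception_position(paragraph: str, first_pos: int,
                                third_preset_len: int) -> int:
    """Drop the last sentence when the supplemented text is too long."""
    if first_pos <= third_preset_len:
        return first_pos  # short enough, keep the first interception position
    text = paragraph[:first_pos]
    # End offsets of all sentences that finish strictly before first_pos.
    earlier_ends = [m.end() for m in SENTENCE_END.finditer(text)
                    if m.end() < first_pos]
    # If there is no earlier sentence, nothing can be deleted.
    return earlier_ends[-1] if earlier_ends else first_pos
```

For the paragraph `"One. Two is long. Three."` with a first interception position of 17 and a third preset length of 10, the sketch moves the cut back to the end of `"One."`.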
  • Step S1470 may be implemented in the following steps: determining that the first evaluation result is any of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text is complete, the character length corresponding to the last sentence of text is not less than the first preset length, or the proportion of characters other than words and phrases in the last sentence of text is not greater than the preset proportion; evaluating the last sentence of text to obtain a second evaluation result; and determining the first interception position of the target paragraph according to the second evaluation result. The evaluation content corresponding to the second evaluation result differs from that corresponding to the first evaluation result.
  • The summary of the search result can be determined according to steps S706 to S709 of the second embodiment, wherein step S706 is the process of evaluating the last sentence of text to obtain the second evaluation result, and steps S708 and S709 are the process of determining the first interception position based on the second evaluation result.
  • When the first evaluation result is that the end of the last sentence of text is complete topic information, that the semantics of the last sentence of text is complete, that the character length corresponding to the last sentence of text is not less than the first preset length, or that the proportion of characters other than words and phrases in the last sentence of text is not greater than the preset proportion, the last sentence meets the evaluation requirements, that is, it meets certain integrity requirements.
  • To make the abstract more complete and improve its presentation, the last sentence of text can be further evaluated to obtain the second evaluation result.
  • The evaluation content corresponding to the second evaluation result and that corresponding to the first evaluation result may be different.
  • For example, the evaluation content corresponding to the second evaluation result may be whether the semantics of the last sentence is complete; or, when the evaluation content corresponding to the first evaluation result is the character length corresponding to the last sentence, the evaluation content corresponding to the second evaluation result may be whether the end of the last sentence contains incomplete topic information.
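One way to picture successive evaluations with different evaluation content is a list of named checks applied in order, where a later check runs only if the earlier one passes. The sketch below is illustrative only: the function name, the check names, and the two sample checks are assumptions made here, not the evaluations defined in the embodiments.

```python
from typing import Callable, Optional, Sequence, Tuple

Check = Tuple[str, Callable[[str], bool]]

def first_failed_check(last_sentence: str, checks: Sequence[Check]) -> Optional[str]:
    """Run the checks in order and return the name of the first that fails."""
    for name, passes in checks:
        if not passes(last_sentence):
            return name
    return None  # every evaluation was satisfied

# Hypothetical checks: a length test first, then a crude completeness test.
checks = [
    ("too_short", lambda s: len(s) >= 5),
    ("no_terminator", lambda s: s.endswith(".")),
]
```

Here `first_failed_check("Hello there", checks)` passes the length check and then fails the second, different check, which mirrors the first and second evaluation results described above.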
  • Step S1480 may be implemented by the following steps: evaluating the last sentence of the text in the target paragraph before the first interception position to obtain a third evaluation result; determining a fourth interception position according to the third evaluation result; and determining the abstract of the search result from the target paragraph according to the fourth interception position.
  • For details, reference may be made to steps S408 to S416 in the first embodiment and steps S1206 to S1209 in the third embodiment.
  • In the first embodiment, the evaluation result obtained in step S408 is the third evaluation result, steps S409 to S411 determine the fourth interception position, and steps S412 to S416 determine the abstract of the search result according to the fourth interception position.
  • In the third embodiment, the evaluation result obtained in step S1206 is the third evaluation result, the third interception position determined in step S1208 is the fourth interception position, and step S1209 is the process of determining the abstract of the search result according to the fourth interception position.
  • The above-mentioned fourth interception position can be any of the following: the end position of the text obtained after supplementing the last sentence of the text in the target paragraph before the first interception position; the end position of the last sentence of the text in the target paragraph before the first interception position; or the end position of the text obtained by deleting the last sentence of the text in the target paragraph before the first interception position.
  • The integrity of the last sentence of the text in the target paragraph before the first interception position may still be low.
  • In this embodiment, the last sentence of the text before the first interception position is therefore evaluated, a fourth interception position is determined according to the resulting third evaluation result, and an abstract of the search result is determined from the target paragraph according to the fourth interception position. That is to say, each time the first interception position is determined, this embodiment re-evaluates the last sentence of the text in the target paragraph before that position, so that the last sentence before the determined interception position satisfies the requirement of complete display of the abstract, which improves the completeness and display effect of the determined abstract.
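The repeated re-evaluation can be read as a loop that keeps adjusting the interception position until an evaluation leaves it unchanged. The sketch below is one possible interpretation: the helper name, the callback signature, and the round limit are all assumptions introduced here.

```python
from typing import Callable

def refine_position(paragraph: str, first_pos: int,
                    next_position: Callable[[str, int], int],
                    max_rounds: int = 5) -> int:
    """Re-evaluate the text before the cut until the position stops moving."""
    pos = first_pos
    for _ in range(max_rounds):  # assumption: cap the number of re-evaluations
        new_pos = next_position(paragraph, pos)
        if new_pos == pos:
            break  # the last sentence now satisfies the evaluation
        pos = new_pos
    return pos

# Toy re-evaluation: step back over one trailing comma per round.
def drop_trailing_comma(paragraph: str, pos: int) -> int:
    return pos - 1 if pos > 0 and paragraph[pos - 1] == "," else pos
```

The callback stands in for "evaluate the last sentence and determine a new interception position"; any of the supplement, keep, or delete decisions could be plugged in.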
  • the server may determine the target paragraph of the search result as follows.
  • FIG. 15 is a schematic flowchart of determining a target paragraph of a search result according to an embodiment of the present application. As shown in FIG. 15, the target paragraph of the search result can be determined according to the following steps S1501 to S1506.
  • S1501 The server receives the search query sent by the client.
  • S1502 The server searches for multiple web pages matching the search query, and scores each web page in descending order of matching degree.
  • the higher the matching degree the higher the score.
  • S1503 The server determines X webpages from the above-mentioned multiple webpages according to the order of the scores from high to low.
  • X is a positive integer not less than 1.
  • X may be any value from 50 to 300, or may be another specific value.
  • S1504 The server acquires the text information corresponding to the determined webpages, and segments the text information to obtain at least one paragraph corresponding to each of the X webpages.
  • S1505 The server scores each paragraph.
  • S1506 The server selects Y paragraphs in descending order of ratings as target paragraphs of the search result.
  • One paragraph may be selected for a webpage as the paragraph where the abstract is located, in which case the value of Y is 1.
  • Alternatively, the value of Y can be a positive integer greater than 1.
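Steps S1501 to S1506 can be sketched roughly as below. The scoring function (simple query-term counting), the blank-line paragraph segmentation, and all names are assumptions made for illustration; the source does not specify how matching degree or paragraph scores are computed.

```python
def term_overlap(query: str, text: str) -> int:
    """Toy score (assumption): how often the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def select_target_paragraphs(query: str, pages: dict[str, str],
                             x: int = 50, y: int = 1) -> dict[str, list[str]]:
    """Map each of the top-X pages to its Y best-scoring paragraphs."""
    # S1502/S1503: score the pages and keep the X highest-scoring ones.
    top_pages = sorted(pages, key=lambda url: term_overlap(query, pages[url]),
                       reverse=True)[:x]
    targets = {}
    for url in top_pages:
        # S1504: segment the page text (assumption: blank lines separate paragraphs).
        paragraphs = [p.strip() for p in pages[url].split("\n\n") if p.strip()]
        # S1505/S1506: score the paragraphs and keep the Y best.
        ranked = sorted(paragraphs, key=lambda p: term_overlap(query, p), reverse=True)
        targets[url] = ranked[:y]
    return targets
```

With `x=1` and `y=1`, a page containing the query term most often is selected, and its highest-scoring paragraph becomes the target paragraph.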
  • Step S1506 may be implemented in the following steps a to d:
  • Step b The server determines whether the length of the text corresponding to the paragraphs ranked 1 to i is not less than a preset length.
  • If the judgment result of step b is yes, execute step c; if the judgment result of step b is no, execute step d.
  • For the setting method of the preset length in step b, reference may be made to the setting method of the preset length in the foregoing embodiments, which will not be repeated here.
  • Step c Determine the paragraphs ranked 1 to i as Y paragraphs.
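Steps b and c can be illustrated with the sketch below. Steps a and d are not reproduced in this text, so the sketch assumes that i starts at 1 and is incremented by one whenever the length check fails; that assumption, the fallback when the length is never reached, and the function name are all hypothetical.

```python
def choose_y_paragraphs(ranked_paragraphs: list[str], preset_len: int) -> list[str]:
    """Take top-ranked paragraphs until their total length reaches preset_len."""
    total = 0
    # Assumption: i starts at 1 and grows by one per failed check.
    for i, paragraph in enumerate(ranked_paragraphs, start=1):
        total += len(paragraph)
        if total >= preset_len:           # step b: is the length sufficient?
            return ranked_paragraphs[:i]  # step c: paragraphs 1..i are the Y paragraphs
    # Assumption: if the length is never reached, keep every paragraph.
    return ranked_paragraphs
```

For example, with paragraphs of lengths 4, 4, and 2 and a preset length of 6, the first two paragraphs are chosen.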
  • the present application can divide the functional modules of the apparatus for determining the abstract of the search result according to the above method examples.
  • For example, a functional module may be divided for each function, or two or more functions can be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules. It should be noted that the division of modules in this application is schematic, and is only a logical function division, and other division methods may be used in actual implementation.
  • FIG. 16 shows a schematic structural diagram of an apparatus for determining an abstract of a search result provided by the present application.
  • the apparatus includes an evaluation module 1610 and a determination module 1620.
  • the evaluation module is used to determine the preset-length characters from the starting position of the target paragraph, and evaluate the last sentence of text of the preset-length characters to obtain the first evaluation result, wherein the target paragraph is the paragraph where the abstract of the search result is located.
  • a determination module configured to determine a first interception position of the target paragraph according to the first evaluation result, wherein the first interception position is any one of the following: the end position of the text obtained by supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text; and to determine, according to the first interception position, the summary of the search result from the target paragraph, wherein the character length corresponding to the abstract of the search result is within a preset length range.
  • the first evaluation result is: the end of the last sentence of text is incomplete topic information;
  • the first interception position is: the end position of the text obtained by supplementing the subject information at the end of the last sentence of text.
  • the first evaluation result is: the semantics of the last sentence of text is incomplete;
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented into a complete sentence of text.
  • the first evaluation result is: the character length corresponding to the last sentence of text is less than the first preset length, or the proportion of characters other than words and phrases in the last sentence of text is greater than the preset proportion;
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text.
  • the first interception position is: the end position of the text obtained after deleting the last sentence of text;
  • the determining module is specifically used for:
  • the second interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • a summary of the search result is determined from the target paragraph based on the second interception position.
  • the first interception position is: the end position of the text obtained after the last sentence of text is supplemented;
  • the determining module is specifically used for:
  • the third interception position is: the end position of the text obtained after deleting the last sentence of text;
  • a summary of the search result is determined from the target paragraph based on the third interception position.
  • the determining module is specifically used for:
  • the first evaluation result is any of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text is complete, the character length corresponding to the last sentence of text is not less than the first preset length, or the proportion of characters other than words and phrases in the last sentence of text is not greater than the preset proportion;
  • the determining module is specifically used for:
  • a summary of the search result is determined from the target paragraph based on the fourth interception position.
  • FIG. 17 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device provided by the present application includes a processor 1701, an interface 1702, a memory 1703, and a communication bus 1704, wherein the processor 1701, the interface 1702, and the memory 1703 communicate with each other through the communication bus 1704;
  • the memory 1703 is used to store computer programs
  • the processor 1701 is configured to implement the method for determining the abstract of the search result described in any one of the foregoing embodiments when executing the program stored in the memory 1703.
  • the communication bus mentioned in the above electronic device may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the interface is used for communication between the above electronic device and other devices.
  • the memory may include random access memory (RAM), and may also include non-volatile memory (NVM), such as at least one disk memory.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • the above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the processor is caused to execute the method for determining the summary of the search result described in any of the foregoing embodiments.
  • Embodiments of the present application also provide a computer program product, which, when the computer program product runs on a computer, causes the computer to execute the above-mentioned relevant steps, so as to implement the method for determining an abstract of a search result described in any of the above-mentioned embodiments.
  • The electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment is used to execute the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above, which will not be repeated here.
  • the disclosed apparatus and method may be implemented in other manners.
  • The apparatus embodiments described above are only illustrative. For example, the division of modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • The shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • Units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed in multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium.
  • The readable storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods in the various embodiments of the present application.
  • The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present application relates to the technical field of the Internet, and provides a method and an apparatus for determining a summary of a search result and an electronic device, wherein the electronic device may be a server, and the method comprises: determining a preset length of characters starting from a starting position of a target paragraph, evaluating the last sentence of text of the preset length of characters to obtain a first evaluation result and, on the basis of the first evaluation result, determining a first capturing position of the target paragraph, the first capturing position being any one of the following: the end position of text obtained after supplementing the last sentence of text, the end position of the last sentence of text, and the end position of text obtained after deleting the last sentence of text; and, on the basis of the first capturing position, determining a summary of the search result from the target paragraph. The present method can flexibly adjust the position of the text captured in a webpage such that the information expressed in the last sentence of the obtained webpage summary is more complete, thereby improving the presentation effect of the webpage summary and enhancing the reading experience of the user.

Description

Method, Apparatus, and Electronic Device for Determining an Abstract of a Search Result
This application claims priority to Chinese Patent Application No. 202110072051.5, filed with the China National Intellectual Property Administration on January 19, 2021 and entitled "Method, Apparatus, and Electronic Device for Determining an Abstract of a Search Result", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of Internet technologies, and in particular, to a method, an apparatus, and an electronic device for determining an abstract of a search result.
Background
With the rapid development of Internet technology, people increasingly rely on search engines to find the information they need. Based on a search query submitted by a user, a search engine can provide the URLs and abstracts of the corresponding webpages. An abstract describes the outline of a webpage, and by reading it the user can decide whether to browse the corresponding webpage.
In the related art, text of a certain character length is usually intercepted from the webpage text as the webpage abstract. The last sentence of an abstract obtained in this way is sometimes very short and expresses incomplete information, so it has little reference value for the user in deciding whether to browse the webpage. It also degrades the presentation of the abstract and thus the user's reading experience.
Summary
The present application provides a method, an apparatus, and an electronic device for determining an abstract of a search result, which can flexibly adjust the position at which text is intercepted from a webpage, so that the information expressed by the last sentence of the obtained webpage abstract is more complete, thereby improving the presentation of the webpage abstract and the user's reading experience. The specific solutions are as follows:
In a first aspect, an embodiment of the present application provides a method for determining an abstract of a search result, including:
determining a preset length of characters from the starting position of a target paragraph, wherein the target paragraph is the paragraph where the abstract of the search result is located;
evaluating the last sentence of text of the preset length of characters to obtain a first evaluation result, wherein the target paragraph is the paragraph where the abstract of the search result is located;
determining a first interception position of the target paragraph according to the first evaluation result, wherein the first interception position is any one of the following: the end position of the text obtained after supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text;
determining the abstract of the search result from the target paragraph according to the first interception position, wherein the character length corresponding to the abstract of the search result is within a preset length range.
The method for determining an abstract of a search result provided by this embodiment of the present application determines a preset length of characters from the starting position of the target paragraph, evaluates the last sentence of those characters, and determines the first interception position according to the resulting first evaluation result. Because the first interception position is determined flexibly according to the first evaluation result, the last sentence of the abstract determined from that position can be a sentence with relatively complete semantics, which increases its reference value for the user in deciding whether to browse the webpage. When the abstract of the search result is displayed on the client, it presents better, reduces the user's ineffective reading, improves the reading experience, and raises the probability that the user clicks through to the corresponding webpage. In addition, the first interception position is the end position of the text obtained after supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text. The first interception position therefore varies with the first evaluation result, which makes its determination more flexible and further improves the presentation of the determined abstract.
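The overall flow of the first aspect can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: sentences are assumed to end at a small set of terminator characters, the evaluation is reduced to a length test and a "does it end with a terminator" test, the preset length range is not enforced, and all names are hypothetical.

```python
import re
from enum import Enum

class Action(Enum):
    SUPPLEMENT = "supplement"  # extend the cut to the end of the last sentence
    KEEP = "keep"              # cut at the end of the last sentence as it stands
    DELETE = "delete"          # cut before the last sentence

# Assumption: a sentence ends at one of these terminators.
SENTENCE_END = re.compile(r"[.!?。！？]")

def last_sentence_start(text: str) -> int:
    """Index at which the last (possibly unfinished) sentence of text begins."""
    stripped_len = len(text.rstrip())
    # A terminator at the very end closes the last sentence, so ignore it.
    ends = [m.end() for m in SENTENCE_END.finditer(text) if m.end() < stripped_len]
    return ends[-1] if ends else 0

def evaluate(last_sentence: str, min_len: int) -> Action:
    """Toy first evaluation: length first, then a crude completeness test."""
    s = last_sentence.strip()
    if len(s) < min_len:
        return Action.DELETE
    if not SENTENCE_END.search(s[-1:]):
        return Action.SUPPLEMENT
    return Action.KEEP

def first_interception_position(paragraph: str, preset_len: int, min_len: int) -> int:
    head = paragraph[:preset_len]          # preset-length characters
    start = last_sentence_start(head)
    action = evaluate(head[start:], min_len)
    if action is Action.DELETE and start > 0:
        return start                       # end of the text without the last sentence
    if action is Action.SUPPLEMENT:
        match = SENTENCE_END.search(paragraph, preset_len)
        return match.end() if match else len(paragraph)
    return len(head)                       # end of the last sentence as intercepted
```

For the paragraph `"Cats sleep a lot. They also purr when happy. Dogs bark."`, a preset length of 30 cuts mid-sentence and is extended to the end of the second sentence, while a preset length of 21 leaves only a three-character fragment, which is dropped. A production system would need a more robust sentence splitter, since a bare period also appears in abbreviations and decimals.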
In an optional embodiment, the first evaluation result is: the end of the last sentence of text is incomplete topic information;
the first interception position is: the end position of the text obtained after the topic information at the end of the last sentence of text is completed.
This embodiment evaluates the topic information at the end of the last sentence of text, and the resulting first evaluation result is that the end of the last sentence of text is incomplete topic information. When reading text, users usually prefer to see topic information such as film and television titles or book titles in full in order to understand the text better. Because the first evaluation result is that the end of the last sentence is incomplete topic information, the first interception position can be determined, according to that result, as a position at which the topic information is complete, so that the user can better understand the text.
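As a rough illustration of completing topic information, the sketch below assumes that a title is delimited by the book-title marks 《 and 》 and extends the cut to the closing mark when the intercepted text ends inside an unclosed title. Detecting topic information this way is an assumption made here; the embodiment does not prescribe it, and the function name is hypothetical.

```python
def complete_topic_position(paragraph: str, cut: int,
                            open_mark: str = "《", close_mark: str = "》") -> int:
    """Move the cut past the closing mark if it falls inside a title."""
    head = paragraph[:cut]
    # A title is unfinished if the last opening mark has no closing mark after it.
    if head.rfind(open_mark) > head.rfind(close_mark):
        end = paragraph.find(close_mark, cut)
        if end != -1:
            return end + len(close_mark)
    return cut
```

For the paragraph `"I read 《One Hundred Years of Solitude》 last week."`, a cut at position 15 falls inside the title and is moved to just after the closing mark.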
In an optional embodiment, the first evaluation result is: the semantics of the last sentence of text is incomplete;
the first interception position is: the end position of the text obtained after the last sentence of text is supplemented into a complete sentence.
This embodiment evaluates the semantics of the last sentence of text, and the resulting first evaluation result is that the semantics of the last sentence of text is incomplete. When the semantics of a sentence is incomplete, the user usually cannot tell what the sentence specifically means even after reading it. Therefore, when the first evaluation result is that the semantics of the last sentence is incomplete, the last sentence can be completed according to that result when the first interception position is determined, which makes the text easier to read.
In an optional embodiment, the first evaluation result is: the character length corresponding to the last sentence of text is less than the first preset length;
the first interception position is: the end position of the text obtained after deleting the last sentence of text.
This embodiment evaluates the character length of the last sentence of text, and the resulting first evaluation result is that this length is less than the first preset length. In this way it can quickly be established that the last sentence has low semantic completeness and little reference value for understanding the webpage, so that when the first interception position is determined, the last sentence of the preset-length characters can quickly be chosen for deletion, and an abstract that displays better to the user is determined more quickly.
In an optional embodiment, the first evaluation result is: the number of words in the last sentence of text is less than a second preset number;
the first interception position is: the end position of the text obtained after deleting the last sentence of text.
This embodiment evaluates the number of words in the last sentence of text, and the resulting first evaluation result is that this number is less than the second preset number. Because the number of words in a sentence usually reflects its semantic completeness, counting the words in the last sentence gives a more accurate estimate of its semantic completeness and of its reference value for understanding the webpage. In this way it can quickly be established that the last sentence has low semantic completeness and little reference value, so that when the first interception position is determined, the last sentence of the preset-length characters can quickly be chosen for deletion, and an abstract that displays better to the user is determined more quickly.
In an optional embodiment, the first evaluation result is: the proportion of non-word characters (that is, characters other than words) in the last sentence is greater than a preset proportion;
the first truncation position is: the end position of the text obtained after the last sentence is deleted.
This embodiment evaluates the proportion of non-word characters in the last sentence; the first evaluation result is that this proportion is greater than the preset proportion. Users usually rely on words that carry actual meaning to understand a summary, whereas non-word characters usually contribute little. The words in the text are therefore the more useful information for understanding the summary. This embodiment can establish that the last sentence contains a low proportion of useful information, so that when the first truncation position is determined, it can be quickly decided to delete the last sentence of the preset-length characters.
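The three delete conditions described above (character length, word count, and proportion of non-word characters) can be sketched as follows. This is only an illustrative sketch: the threshold values, the function name, and the choice to combine the three conditions in one function are assumptions, since the text names the presets without giving values and describes the conditions as separate optional embodiments.

```python
import re

# Hypothetical thresholds: the text names these presets but gives no values.
FIRST_PRESET_LENGTH = 10   # minimum character length of the last sentence
SECOND_PRESET_NUMBER = 3   # minimum number of words in the last sentence
PRESET_RATIO = 0.5         # maximum share of non-word characters


def should_delete_last_sentence(sentence: str) -> bool:
    """Return True if any of the three delete conditions holds."""
    text = sentence.strip()
    # Condition 1: character length below the first preset length.
    if len(text) < FIRST_PRESET_LENGTH:
        return True
    # Condition 2: word count below the second preset number.
    # Naive word count (assumption); Chinese text would need a word segmenter.
    if len(re.findall(r"\w+", text)) < SECOND_PRESET_NUMBER:
        return True
    # Condition 3: share of non-word characters above the preset proportion.
    visible = [ch for ch in text if not ch.isspace()]
    non_word = sum(1 for ch in visible if not ch.isalnum())
    return non_word / len(visible) > PRESET_RATIO
```

When the function returns True, the first truncation position would be the end of the text that remains after the last sentence is removed.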
In an optional embodiment, the first evaluation result is: the last sentence ends with a punctuation mark;
the first truncation position is: the end position of the text obtained after the punctuation mark at the end of the last sentence is deleted.
When a summary is displayed, a punctuation mark at its end has very little reference value for the user's reading comprehension, and it may also affect how the summary is displayed on the client. Therefore, when the evaluation finds that the last sentence ends with a punctuation mark, that mark can be deleted so that the summary displays better.
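A minimal sketch of this punctuation rule is shown below. The function name is hypothetical, and the decision to strip a whole run of trailing punctuation and whitespace (rather than a single mark) is an assumption; the text only says that the punctuation mark at the end is deleted.

```python
import unicodedata


def cut_before_trailing_punctuation(text: str) -> int:
    """Return the truncation position that excludes trailing punctuation."""
    end = len(text)
    # Walk backwards over whitespace and any Unicode punctuation category (P*).
    while end > 0 and (
        text[end - 1].isspace()
        or unicodedata.category(text[end - 1]).startswith("P")
    ):
        end -= 1
    return end
```

The summary is then `text[:cut_before_trailing_punctuation(text)]`; text that does not end with punctuation is returned unchanged.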
In an optional embodiment, the first truncation position is: the end position of the text obtained after the last sentence is deleted;
determining the summary of the search result from the target paragraph according to the first truncation position includes:
determining that the character length of the text in the target paragraph before the first truncation position is less than the lower limit of the preset length range;
determining a second truncation position, where the second truncation position is: the end position of the text obtained after the last sentence is supplemented; and
determining the summary of the search result from the target paragraph according to the second truncation position.
In this embodiment, the first truncation position is the end position of the text obtained after the last sentence is deleted. If the character length of the text in the target paragraph before the first truncation position is less than the lower limit of the preset length range, that text is too short. The resulting summary might then contain very little information, so the user could not learn anything useful about the search result from it. In this case, the summary is determined from the target paragraph according to the second truncation position. Because the second truncation position is the end position of the text obtained after the last sentence is supplemented, the resulting summary is longer, and the user can learn more about the search result from it.
In an optional embodiment, determining the summary of the search result from the target paragraph according to the second truncation position includes:
determining that the character length of the text in the target paragraph before the second truncation position is greater than the upper limit of the preset length range; and
determining the preset-length characters as the summary of the search result.
If the character length of the text in the target paragraph before the second truncation position is greater than the upper limit of the preset length range, that text is too long. The page might then present too few search results, or the summary might not be displayed completely, which affects the page layout. In this case, the preset-length characters are determined as the summary of the search result, so that the summary is neither too long nor too short, which benefits the page layout.
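The fallback order described in the two embodiments above (delete the last sentence, otherwise supplement it, otherwise keep the preset-length characters) can be sketched as follows. The function and parameter names are hypothetical, and the truncation positions are assumed to have been computed already.

```python
def choose_summary_after_delete(
    paragraph: str,
    preset_length: int,
    delete_cut: int,      # first truncation position (last sentence deleted)
    supplement_cut: int,  # second truncation position (last sentence supplemented)
    lower: int,           # lower limit of the preset length range
    upper: int,           # upper limit of the preset length range
) -> str:
    """Pick the summary following the fallback order in the text."""
    # The text before the first truncation position is long enough: use it.
    if delete_cut >= lower:
        return paragraph[:delete_cut]
    # Too short: try the supplemented text, if it fits under the upper limit.
    if supplement_cut <= upper:
        return paragraph[:supplement_cut]
    # Supplemented text is too long: fall back to the preset-length characters.
    return paragraph[:preset_length]
```

For example, with a 21-character first sentence followed by a 61-character second sentence and a preset length of 40, a lower limit of 30 rejects the 21-character candidate, and the upper limit then decides between the 82-character supplemented text and the 40-character window.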
In an optional embodiment, the first truncation position is: the end position of the text obtained after the last sentence is supplemented;
determining the summary of the search result from the target paragraph according to the first truncation position includes:
determining that the character length of the text in the target paragraph before the first truncation position is greater than the upper limit of the preset length range;
determining a third truncation position, where the third truncation position is: the end position of the text obtained after the last sentence is deleted; and
determining the summary of the search result from the target paragraph according to the third truncation position.
In this embodiment, the first truncation position is the end position of the text obtained after the last sentence is supplemented. If the character length of the text in the target paragraph before the first truncation position is greater than the upper limit of the preset length range, that text is too long. Because the display space for each search result on the display page is usually fixed, the resulting summary might not be displayed completely. In this case, the summary is determined from the target paragraph according to the third truncation position. Because the third truncation position is the end position of the text obtained after the last sentence is deleted, the resulting summary is shorter and can be displayed completely on the display page.
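The switch from a supplemented last sentence to a deleted one can be sketched as below. The sentence-boundary rule (a small set of terminator characters) and all names are assumptions made for illustration; the text does not specify how sentence ends are detected.

```python
import re

# Naive sentence terminators (assumption); "3.5" or "e.g." would be split too.
SENTENCE_END = re.compile(r"[.!?。!?]")


def cut_positions(paragraph: str, preset_length: int) -> tuple[int, int]:
    """Return (delete_cut, supplement_cut) around the preset-length window."""
    ends = [m.end() for m in SENTENCE_END.finditer(paragraph)]
    # End of the last complete sentence inside the window.
    delete_cut = max((e for e in ends if e <= preset_length), default=0)
    # End of the sentence that the window cuts through, once completed.
    supplement_cut = min(
        (e for e in ends if e >= preset_length), default=len(paragraph)
    )
    return delete_cut, supplement_cut


def choose_summary_after_supplement(
    paragraph: str, preset_length: int, upper: int
) -> str:
    """Prefer the supplemented text; fall back to deletion if it is too long."""
    delete_cut, supplement_cut = cut_positions(paragraph, preset_length)
    if supplement_cut > upper:
        return paragraph[:delete_cut]   # third truncation position
    return paragraph[:supplement_cut]   # first truncation position
```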
In an optional embodiment, determining the first truncation position of the target paragraph according to the first evaluation result includes:
determining that the first evaluation result is any one of the following: the last sentence ends with complete topic information, the last sentence is semantically complete, the character length of the last sentence is not less than the first preset length, or the proportion of non-word characters in the last sentence is not greater than the preset proportion;
evaluating the last sentence to obtain a second evaluation result, where the second evaluation result and the first evaluation result correspond to different evaluation content; and
determining the first truncation position of the target paragraph according to the second evaluation result.
When the first evaluation result is that the last sentence ends with complete topic information, that the last sentence is semantically complete, that the character length of the last sentence is not less than the first preset length, or that the proportion of non-word characters in the last sentence is not greater than the preset proportion, the last sentence has met the requirement of that evaluation, that is, it satisfies a certain completeness requirement. In this case, to make the summary more complete and to improve its presentation, the last sentence can be evaluated again on different content to obtain a second evaluation result, and the first truncation position is determined according to the second evaluation result.
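One way to picture this chained evaluation is a list of checks that are tried in turn: a check that passes hands the last sentence on to the next check, and the first check that fails decides how to cut. The sketch below is an assumption-laden illustration; the `Action` names, the two example checks, and the threshold of 10 characters are all hypothetical, and treating "no terminal punctuation" as semantic incompleteness is a simplification.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Action(Enum):
    SUPPLEMENT = "supplement"  # cut at the end of the completed last sentence
    KEEP = "keep"              # cut at the end of the last sentence as it is
    DELETE = "delete"          # cut at the end of the text without the sentence


# A check returns an Action when it fails, or None when the sentence passes.
Check = Callable[[str], Optional[Action]]


def too_short(sentence: str) -> Optional[Action]:
    # Hypothetical first preset length of 10 characters.
    return Action.DELETE if len(sentence.strip()) < 10 else None


def incomplete(sentence: str) -> Optional[Action]:
    # Simplified stand-in for a semantic completeness evaluation.
    return None if sentence.rstrip().endswith((".", "!", "?", "。")) else Action.SUPPLEMENT


def decide(last_sentence: str, checks: Sequence[Check]) -> Action:
    """Run the checks in order; a pass moves on to the next evaluation."""
    for check in checks:
        action = check(last_sentence)
        if action is not None:
            return action
    return Action.KEEP
```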
In an optional embodiment, determining the summary of the search result from the target paragraph according to the first truncation position includes:
evaluating the last sentence of the text in the target paragraph before the first truncation position to obtain a third evaluation result;
determining a fourth truncation position of the target paragraph according to the third evaluation result; and
determining the summary of the search result from the target paragraph according to the fourth truncation position.
In this embodiment, after the first truncation position is determined, the last sentence of the text in the target paragraph before the first truncation position may still have low completeness. In this case, that last sentence can be evaluated further, a fourth truncation position is determined according to the resulting third evaluation result, and the summary of the search result is determined from the target paragraph according to the fourth truncation position. In other words, each time a truncation position is determined, this embodiment re-evaluates the last sentence of the text before the most recently determined truncation position, until that last sentence meets the requirement for displaying the summary completely. In this way, the determined summary is more complete and displays better.
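The repeated re-evaluation can be sketched as a loop that keeps moving the truncation position back by one sentence until the last remaining sentence passes. The acceptance test is passed in as a hypothetical callback, and the sentence-boundary rule is a naive assumption; this sketch only covers the "delete" direction of the re-evaluation.

```python
import re
from typing import Callable

# Naive sentence terminators (assumption).
SENTENCE_END = re.compile(r"[.!?。!?]")


def refine_cut(
    paragraph: str, first_cut: int, is_acceptable: Callable[[str], bool]
) -> int:
    """Move the cut back sentence by sentence until the last sentence passes."""
    cut = first_cut
    while cut > 0:
        head = paragraph[:cut]
        # Ignore the final character so the head's own terminator is skipped
        # and the previous sentence boundary is found instead.
        ends = [m.end() for m in SENTENCE_END.finditer(head[:-1])]
        start = ends[-1] if ends else 0
        if is_acceptable(head[start:]):
            return cut
        cut = start  # drop the failing sentence and evaluate the one before it
    return cut
```

With `is_acceptable=lambda s: len(s.strip()) >= 10`, the text "Good long sentence here. Ok. No." is cut back past both short sentences to position 24.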
According to a second aspect, an embodiment of this application further provides an apparatus for determining a summary of a search result, including:
an evaluation module, configured to determine preset-length characters starting from the start position of a target paragraph, and evaluate the last sentence of the preset-length characters to obtain a first evaluation result, where the target paragraph is the paragraph in which the summary of the search result is located; and
a determination module, configured to determine a first truncation position of the target paragraph according to the first evaluation result, where the first truncation position is any one of the following: the end position of the text obtained after the last sentence is supplemented, the end position of the last sentence, or the end position of the text obtained after the last sentence is deleted; and to determine the summary of the search result from the target paragraph according to the first truncation position, where the character length of the summary of the search result is within a preset length range.
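The first step performed by the evaluation module, taking the preset-length characters from the start of the target paragraph and isolating their last sentence, might look like the sketch below. The function name and the terminator set are assumptions; the sample text in the usage note is borrowed from the first-paragraph example given later in this document.

```python
import re

# Naive sentence terminators (assumption).
SENTENCE_END = re.compile(r"[.!?。!?]")


def last_sentence_of_window(paragraph: str, preset_length: int) -> str:
    """Return the last (possibly incomplete) sentence of the first N characters."""
    window = paragraph[:preset_length]
    # Skip a terminator in the final position, so a window that ends exactly
    # on a sentence boundary still yields that whole sentence.
    ends = [m.end() for m in SENTENCE_END.finditer(window[:-1])]
    start = ends[-1] if ends else 0
    return window[start:]
```

For the text "Mary is one of the most popular singers and dancers.Mary is born in August 4th,1969." and a preset length of 60, the window ends inside the second sentence, so the last sentence handed to the evaluation is the fragment "Mary is ".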
In an optional embodiment, the first evaluation result is: the last sentence ends with incomplete topic information;
the first truncation position is: the end position of the text obtained after the topic information at the end of the last sentence is completed.
In an optional embodiment, the first evaluation result is: the last sentence is semantically incomplete;
the first truncation position is: the end position of the text obtained after the last sentence is supplemented into a complete sentence.
In an optional embodiment, the first evaluation result is: the character length of the last sentence is less than the first preset length, or the proportion of non-word characters in the last sentence is greater than the preset proportion;
the first truncation position is: the end position of the text obtained after the last sentence is deleted.
In an optional embodiment, the first truncation position is: the end position of the text obtained after the last sentence is deleted;
the determination module is specifically configured to:
determine that the character length of the text in the target paragraph before the first truncation position is less than the lower limit of the preset length range;
determine a second truncation position, where the second truncation position is: the end position of the text obtained after the last sentence is supplemented; and
determine the summary of the search result from the target paragraph according to the second truncation position.
In an optional embodiment, the first truncation position is: the end position of the text obtained after the last sentence is supplemented;
the determination module is specifically configured to:
determine that the character length of the text in the target paragraph before the first truncation position is greater than the upper limit of the preset length range;
determine a third truncation position, where the third truncation position is: the end position of the text obtained after the last sentence is deleted; and
determine the summary of the search result from the target paragraph according to the third truncation position.
In an optional embodiment, the determination module is specifically configured to:
determine that the first evaluation result is any one of the following: the last sentence ends with complete topic information, the last sentence is semantically complete, the character length of the last sentence is not less than the first preset length, or the proportion of non-word characters in the last sentence is not greater than the preset proportion;
evaluate the last sentence to obtain a second evaluation result, where the second evaluation result and the first evaluation result correspond to different evaluation content; and
determine the first truncation position of the target paragraph according to the second evaluation result.
In an optional embodiment, the determination module is specifically configured to:
evaluate the last sentence of the text in the target paragraph before the first truncation position to obtain a third evaluation result;
determine a fourth truncation position of the target paragraph according to the third evaluation result; and
determine the summary of the search result from the target paragraph according to the fourth truncation position.
According to a third aspect, an embodiment of this application further provides an electronic device, including a processor, a memory, and an interface;
the processor, the memory, and the interface cooperate with one another, and the processor is configured to perform the method according to any implementation of the first aspect.
According to a fourth aspect, an embodiment of this application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the processor performs the method according to any implementation of the first aspect.
According to a fifth aspect, an embodiment of this application further provides a computer program product; when the computer program product runs on a computer, the computer performs the method according to any implementation of the first aspect.
Brief Description of the Drawings
FIG. 1a is a schematic diagram of an example search page of a search engine according to an embodiment of this application;
FIG. 1b is a schematic diagram of a browser page;
FIG. 2 is a schematic diagram of an example of content corresponding to search results displayed by a browser according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of an example search system according to an embodiment of this application;
FIG. 4 is a schematic flowchart of an example method for determining a summary of a search result according to an embodiment of this application;
FIG. 5 is a diagram of an example of determining a summary of a search result from a target paragraph according to an embodiment of this application;
FIG. 6 is a diagram of another example of determining a summary of a search result from a target paragraph according to an embodiment of this application;
FIG. 7 is a schematic flowchart of another example method for determining a summary of a search result according to an embodiment of this application;
FIG. 8 shows several examples of the last sentence of the preset-length characters according to an embodiment of this application;
FIG. 9 is a schematic diagram of an example of word segmentation of text according to an embodiment of this application;
FIG. 10 is a schematic diagram of another example of word segmentation of text according to an embodiment of this application;
FIG. 11 is a diagram of an example of determining a summary of a search result from a target paragraph according to an embodiment of this application;
FIG. 12 is a schematic flowchart of another example method for determining a summary of a search result according to an embodiment of this application;
FIG. 13 is a schematic diagram of how the summaries of search results, determined by using the method provided in the embodiments of this application, are displayed in a browser;
FIG. 14 is a schematic flowchart of another example method for determining a summary of a search result according to an embodiment of this application;
FIG. 15 is a schematic flowchart of an example of determining a target paragraph according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of an example apparatus for determining a summary of a search result according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "a plurality of" means two or more.
In the following, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. A feature qualified by "first", "second", or "third" may therefore explicitly or implicitly include one or more of that feature.
The application scenarios of the embodiments of this application are introduced below.
Users usually rely on a search engine to find the information they need. Specifically, a user may enter a search query on a client. The client may be an electronic device such as a mobile phone, a tablet computer, a smart watch, or a computer, and a browser or an application corresponding to the search engine may be installed on it. The search query may be one or more keywords, a piece of text, a formula, or another kind of query (for example, a picture). After obtaining the search query entered by the user, the search engine may look up the web pages that match the query as search results, obtain the title, hyperlink, and summary of each of those web pages, and send them to the client. The client displays the received titles, hyperlinks, and summaries on its display screen for the user to view. The summary of a search result may describe the general content of the web page and can reflect why the web page is relevant to the search query entered by the user.
FIG. 1a is a schematic diagram of a search engine page in a browser, FIG. 1b is a schematic diagram of a browser page, and FIG. 2 is a schematic diagram of the content corresponding to the search results displayed by the browser. For example, the user may enter a search query (for example, "wood flooring") in the search box 102 of the search engine page shown in FIG. 1a. Alternatively, as shown in FIG. 1b, the user may enter a search query (for example, "types of water heaters") in the address bar 101 of the browser; after receiving the search query, the browser may jump to the search engine page and display the search results. FIG. 2 shows the titles and summaries of the search results displayed by the browser after the user enters the search query "wood flooring" in the search box 102 in FIG. 1a. In FIG. 2, the browser also displays the uniform resource locator (URL) of each search result and, when the web page corresponding to a search result contains a picture, the corresponding picture. The last sentences of the second, third, and fourth summaries in FIG. 2 are "以强", "不然" ("otherwise"), and "这里" ("here") respectively. These last sentences give the user no valuable reference information for browsing the web page. When they are displayed in the browser, the summary is presented poorly, the user reads more text to no purpose, and the user experience suffers.
To solve the foregoing problem, the embodiments of this application provide a method and an apparatus for determining a summary of a search result, which can improve the presentation of web page summaries and the user's reading experience.
The method for determining a summary of a search result provided in the embodiments of this application may be performed by a server. The server may be the server of a search engine or the server of another search system. The server can crawl web pages and store information associated with the web pages in an information database, and can search the information database according to a search query received from a client.
The method for determining a summary of a search result provided in this application is described below with reference to the accompanying drawings and embodiments. In the following embodiments, the description assumes that the method is performed by a server.
FIG. 3 is a schematic structural diagram of a search system 300 according to an embodiment of this application. As shown in FIG. 3, the method for determining a summary of a search result provided in the embodiments of this application may be applied to the search system 300. The search system 300 may include a client 310 and a server 320, which are connected through a network 330. The server 320 is configured to receive a search query sent by the client 310 through the network 330, determine the summary of a search result according to the search query, and send the determined summary to the client 310 through the network 330. The server 320 may also be configured to crawl web pages or other information through the network 330. The client 310 is configured to obtain the search query entered by the user, send the search query to the server 320 through the network 330, receive the summary of the search result sent by the server 320, and display the received summary on a display. The client 310 may be an electronic device. The search system 300 may include one or more servers 320 and one or more clients 310, which is not limited in this application. In FIG. 3, the search system 300 includes three servers 320 and two clients 310.
In one implementation, a browser 311 may be installed on the client 310. The browser 311 may provide an interface to a search engine, so that the user can send a search query to the search engine through the browser 311. The server 320 may be the server corresponding to the search engine. The search engine may search an index database for search results that match the search query entered by the user, and may obtain, from the information database, the web page corresponding to the search result selected by the user.
下面,通过具体实施例对本申请提供的搜索结果的摘要确定方法进行介绍。Hereinafter, the method for determining the abstract of the search result provided by the present application will be introduced through specific embodiments.
实施例一Example 1
图4为本申请实施例提供的搜索结果的摘要确定方法的一种流程示意图,如图4所示,可以按以下步骤S401~步骤S416确定搜索结果的摘要,步骤S401~步骤S416的执行主体为服务器。FIG. 4 is a schematic flowchart of a method for determining an abstract of a search result provided by an embodiment of the present application. As shown in FIG. 4 , the abstract of a search result may be determined according to the following steps S401 to S416, and the execution subject of steps S401 to S416 is: server.
Step S401: Acquire text information of a search result, and determine a target paragraph from the text information.
The target paragraph is the paragraph in which the summary of the search result is located.
FIG. 5 and FIG. 6 show two examples of determining the summary of a search result from a target paragraph. For example, as shown in FIG. 5, the paragraph determined to contain the summary may be the following first paragraph 5a: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are always pop music, but there are also countryside music, hip-hop, and rock. She composed many of her own songs."
For another example, as shown in FIG. 6, the paragraph determined to contain the summary may also be the following second paragraph 6a, which is in Chinese: "注塑成型的工艺过程是:在一定温度下,通过搅拌完全熔融的塑料材料,用高压射入模腔,经冷却固化后得到成型品。注塑成型方法的优点是生产速度快、效率高、产品整体性好,生产过程可实现自动化,产出的制品形状多样,制品尺寸精确,产品易更新换代,能成形状复杂的制件,注塑成型的具体工艺以《塑料制品生产方法》第三章内容为参考。注塑过程中的参数主要有注塑压力、注塑时间、注塑温度、保压时长和温度、背压压力。" (In English: "The injection molding process is as follows: at a certain temperature, plastic material that has been completely melted by stirring is injected into a mold cavity under high pressure, and a molded product is obtained after cooling and solidification. The advantages of the injection molding method are fast production, high efficiency, and good product integrity; the production process can be automated; the products come in various shapes with accurate dimensions and are easy to update; parts with complex shapes can be formed; and for the specific injection molding process, refer to Chapter 3 of 'Production Methods of Plastic Products'. The main parameters of the injection molding process are injection pressure, injection time, injection temperature, holding pressure duration and temperature, and back pressure.")
The first paragraph 5a and the second paragraph 6a illustrate cases in which the paragraph containing the summary is in English and in Chinese, respectively. When the specific content or the language of the search results differs, the content and language of the paragraph containing the summary differ accordingly. The first paragraph 5a and the second paragraph 6a are the paragraphs containing the summaries of search results determined for two different search queries.
Step S402: Determine a preset length of characters starting from the start position of the target paragraph.
For example, when the preset length is 140 characters, for the first paragraph 5a, as shown in FIG. 5, the preset length of characters 5b determined by the server is: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are", which consists of non-Chinese characters. For the second paragraph 6a, as shown in FIG. 6, the preset length of characters 6b determined by the server is: "注塑成型的工艺过程是:在一定温度下,通过搅拌完全熔融的塑料材料,用高压射入模腔,经冷却固化后得到成型品。注塑成型方法的优点是生产速度快、效率高、产品整体性好,生产过程可实现自动化,产出的制品形状多样,制品尺寸精确,产品易更新换代,能成形状复杂的制件,注塑成型的具体工艺以《塑料制品", which consists of Chinese characters.
The determined preset length of characters includes spaces. Because characters such as spaces and punctuation marks occupy display space when the summary of the search result is displayed on the client, the preset length is the actual character length, including spaces.
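Step S402 can be sketched as a plain prefix slice. The following is a minimal illustration, not the implementation of the present application: the function name `take_prefix` is hypothetical, and the spacing of paragraph 5a after punctuation marks is assumed so that the lengths agree with the 140 and 127 characters stated in the examples.

```python
def take_prefix(target_paragraph, preset_length=140):
    """Return the first `preset_length` characters, spaces and punctuation included."""
    return target_paragraph[:preset_length]


# Paragraph 5a, with one space assumed after each punctuation mark.
paragraph_5a = (
    "Mary is one of the most popular singers and dancers. "
    "Mary is born in August 4th, 1969. "
    "She is very good at singing and acting. "
    "Her songs are always pop music, but there are also countryside music, "
    "hip-hop, and rock. She composed many of her own songs."
)

prefix_5b = take_prefix(paragraph_5a)
print(len(prefix_5b))   # 140
print(prefix_5b[-13:])  # Her songs are
```

Because Python strings are sequences of code points, the same slice also counts each Chinese character or full-width punctuation mark as one character.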
Step S403: Evaluate whether the number of words in the last sentence of the preset length of characters is not less than a second preset number.
The evaluation result obtained in step S403 is the first evaluation result.
The number of words in the last sentence may be the sum of the number of Chinese single-character words and the number of non-Chinese words that the sentence contains. Non-Chinese words may include English words, German words, French words, and so on. A punctuation mark counts as zero words; a run of one or more consecutive digits counts as one non-Chinese word; and a run of consecutive digits and English letters counts as one non-Chinese word. Here, "consecutive" means adjacent and not separated by a space. In this embodiment, a single-character word can be understood as a single Chinese character.
The second preset number may be any number from 2 to 5, or another small number.
For example, suppose the second preset number is 4. As shown in FIG. 5, the last sentence of the preset length of characters 5b determined by the server from the start position of the first paragraph 5a is "Her songs are". It contains 3 words, which is less than the second preset number 4, so the server determines that the number of words in the last sentence of 5b is less than the second preset number. That is, the evaluation result of step S403 is no.
For another example, as shown in FIG. 6, the last sentence of the preset length of characters 6b determined by the server from the start position of the second paragraph 6a is "注塑成型的具体工艺以《塑料制品". It contains 15 words, which is greater than the second preset number 4, so the server determines that the number of words in the last sentence of 6b is greater than the second preset number. That is, the evaluation result of step S403 is yes.
The following examples illustrate how to determine the number of words in the last sentence.
For example, the last sentence is "2016年8月以3D版在内地重映" ("re-released in 3D in mainland China in August 2016"). It contains 9 Chinese characters (9 Chinese single-character words) and 3 non-Chinese words ("2016", "8", and "3D"), so the sentence contains 9 + 3 = 12 words.
For another example, the last sentence is "Mary Ly是主人公的名字" ("Mary Ly is the name of the protagonist"). It contains 7 Chinese characters (7 Chinese single-character words) and 2 English words (2 non-Chinese words), so the sentence contains 7 + 2 = 9 words.
For another example, the last sentence is "《塑料制品生产方法》出版于1995年" ("'Production Methods of Plastic Products' was published in 1995"). It contains 12 Chinese characters (12 Chinese single-character words) and 1 non-Chinese word ("1995"), so the sentence contains 12 + 1 = 13 words.
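The counting rule above can be sketched with a single regular expression. This is an illustrative sketch only: `count_words` is a hypothetical helper, "Chinese character" is assumed to mean the CJK Unified Ideographs range U+4E00 to U+9FFF, and punctuation is assumed to end a non-Chinese word in the same way a space does.

```python
import re

# Assumption: "Chinese" means the CJK Unified Ideographs block U+4E00..U+9FFF.
_CJK = r"\u4e00-\u9fff"
# A token is either one Chinese character or a run of non-Chinese letters and
# digits. Punctuation and spaces match neither alternative, so they count as zero.
_TOKEN = re.compile(rf"[{_CJK}]|[^\W_{_CJK}]+")


def count_words(sentence):
    """Chinese single-character words plus non-Chinese words in `sentence`."""
    return len(_TOKEN.findall(sentence))


print(count_words("2016年8月以3D版在内地重映"))        # 12
print(count_words("Mary Ly是主人公的名字"))            # 9
print(count_words("《塑料制品生产方法》出版于1995年"))  # 13
print(count_words("Her songs are"))                   # 3
```

Step S403 then reduces to comparing `count_words(last_sentence)` with the second preset number.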
Step S403 may also be replaced by step S404.
Step S404: Evaluate whether the character length of the last sentence of the preset length of characters is not less than a first preset length.
The evaluation result obtained in step S404 is the first evaluation result. Step S404 is not shown in the figure.
The last sentence may contain Chinese characters, English characters, or other non-Chinese characters. In the embodiments of the present application, the character length of the last sentence can be understood as the total length of all the characters it contains. Specifically, the character length may be the sum of the number of Chinese characters and the number of non-Chinese characters in the last sentence, where one Chinese ideograph is one Chinese character, one English letter is one non-Chinese character, one full-width punctuation mark is one Chinese character, one half-width punctuation mark is one non-Chinese character, one digit is one non-Chinese character, and a space is zero characters.
The first preset length may be any length from 3 to 8 characters, or another specific length. Its value may differ with the language of the target paragraph. For example, when the target paragraph is in English, as in 5a in FIG. 5, the first preset length may be any value from 10 to 15; when the target paragraph is in Chinese, as in 6a in FIG. 6, it may be any value from 3 to 8. For the same character length, Chinese and English text differ in semantic completeness (Chinese text is more complete), so the first preset length for Chinese is smaller than that for English (or for other languages whose words are composed of letters). When the first preset length is set larger, step S404 is more likely to delete a short, semantically incomplete last sentence, so that the summary finally obtained ends with a semantically more complete sentence. Because some short sentences also express complete semantics, setting the first preset length smaller reduces the probability that step S404 mistakenly deletes a last sentence that is semantically complete or of high reference value. Those skilled in the art may set the specific value of the first preset length according to the actual scenario.
For example, suppose the first preset length is 13. As shown in FIG. 5, the last sentence of the preset length of characters 5b determined by the server from the start position of the first paragraph 5a is "Her songs are". Its character length, 11, is less than the first preset length 13, so the server determines that the character length of the last sentence of 5b is less than the first preset length. That is, the evaluation result of step S404 is no.
For another example, suppose the first preset length is 5. As shown in FIG. 6, the last sentence of the preset length of characters 6b determined by the server from the start position of the second paragraph 6a is "注塑成型的具体工艺以《塑料制品". Its character length, 15, is greater than the first preset length 5, so the server determines that the character length of the last sentence of 6b is greater than the first preset length. That is, the evaluation result of step S404 is yes.
For ease of understanding, the following examples illustrate how to determine the character length of the last sentence.
For example, the last sentence is "2016年8月以3D版在内地重映". It contains 9 Chinese ideographs (9 Chinese characters), 6 digits (6 non-Chinese characters), and 1 English letter (1 non-Chinese character), so its character length is 9 + 6 + 1 = 16 characters.
For another example, the last sentence is "Mary Ly是主人公的名字". It contains 7 Chinese ideographs (7 Chinese characters) and 6 English letters (6 non-Chinese characters), so its character length is 7 + 6 = 13 characters.
For another example, the last sentence is "《塑料制品生产方法》出版于1995年". It contains 12 Chinese ideographs (12 Chinese characters), 4 digits (4 non-Chinese characters), and 2 full-width punctuation marks (2 Chinese characters), so its character length is 12 + 4 + 2 = 18 characters.
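The character-length rule can be sketched as follows. This is a hedged illustration rather than the method of the present application: `char_length` is a hypothetical helper, and classifying a character as "Chinese" by its East Asian Width property (wide or full-width) is an assumption chosen because it puts ideographs and full-width punctuation in the same class, as the rule requires.

```python
import unicodedata


def char_length(sentence):
    """Return (chinese, non_chinese) character counts; spaces count as zero."""
    chinese = non_chinese = 0
    for ch in sentence:
        if ch.isspace():
            continue  # a space is zero characters
        # Assumption: wide ("W") and full-width ("F") code points, that is,
        # ideographs and full-width punctuation, are "Chinese characters".
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            chinese += 1
        else:
            non_chinese += 1
    return chinese, non_chinese


print(char_length("2016年8月以3D版在内地重映"))        # (9, 7)  -> 16
print(char_length("Mary Ly是主人公的名字"))            # (7, 6)  -> 13
print(char_length("《塑料制品生产方法》出版于1995年"))  # (14, 4) -> 18
print(char_length("Her songs are"))                   # (0, 11) -> 11
```

Step S404 then compares the sum of the two counts with the first preset length.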
Steps S403 and S404 can also be understood as evaluating the format of the last sentence. For the specific process of determining the last sentence of the preset length of characters in steps S403 and S404, refer to step S703 in Embodiment 2.
When the evaluation result of step S403 or step S404 is no, step S405 is performed; when the evaluation result of step S403 or step S404 is yes, step S407 is performed.
Step S405: Delete the last sentence.
For example, as shown in FIG. 5, for the first paragraph 5a, the last sentence "Her songs are" of the preset length of characters 5b determined from the start position of the first paragraph 5a may be deleted. The text information 5c obtained after deleting "Her songs are" is: "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting." The end position of this text is the first interception position, and the character length of the text information 5c is 127.
Step S406: Determine the end position of the text obtained after deleting the last sentence as the first interception position, and determine the first interception position as the latest interception position.
After step S406 is performed, step S408 may be performed.
Step S407: Determine the end position of the last sentence as the first interception position, and determine the first interception position as the latest interception position.
After step S407 is performed, step S410 may be performed.
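Steps S404 to S407 can be sketched together as follows. This is a minimal sketch under stated assumptions: both function names are hypothetical, the delimiter set `DELIMS` is assumed (the actual rule for locating the last sentence is given in step S703 of Embodiment 2), and the spaces that follow a delimiter are kept with the preceding text so that the result agrees with the 127 characters stated for 5c.

```python
# Assumed delimiter set; the source defers the actual rule to step S703.
DELIMS = ".!?;,。!?;,"


def split_last_sentence(prefix):
    """Split `prefix` into (text before the last sentence, last sentence)."""
    body = prefix.rstrip()
    # A delimiter at the very end closes the last sentence itself, so skip it.
    search_end = len(body) - 1 if body and body[-1] in DELIMS else len(body)
    cut = max(body.rfind(d, 0, search_end) for d in DELIMS) + 1
    # Keep the spaces that follow the delimiter with the preceding text.
    while cut < len(prefix) and prefix[cut].isspace():
        cut += 1
    return prefix[:cut], prefix[cut:]


def first_interception_position(prefix, first_preset_length):
    """Return (position, deleted) following steps S404 to S407."""
    head, last = split_last_sentence(prefix)
    length = sum(1 for ch in last if not ch.isspace())  # spaces count as zero
    if length < first_preset_length:
        return len(head), True   # S405/S406: drop the short last sentence
    return len(prefix), False    # S407: keep the last sentence


prefix_5b = (
    "Mary is one of the most popular singers and dancers. "
    "Mary is born in August 4th, 1969. "
    "She is very good at singing and acting. Her songs are"
)
print(first_interception_position(prefix_5b, 13))  # (127, True)
print(first_interception_position("能成形状复杂的制件,注塑成型的具体工艺以《塑料制品", 5))  # (25, False)
```

The returned flag tells the caller which branch follows: step S408 after a deletion, step S410 otherwise.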
Step S408: Determine whether the character length of the text located before the first interception position in the target paragraph is not less than the lower limit of a preset length range.
For example, when the preset length range is 120 to 200 characters, as shown in FIG. 5, the character length of the text information 5c obtained after deleting "Her songs are" in the example of step S405 is 127, which is not less than the lower limit 120.
For another example, when the preset length range is 130 to 200 characters, as shown in FIG. 5, the character length of the text information 5c obtained after deleting "Her songs are" in the example of step S405 is 127, which is less than the lower limit 130.
If the determination result of step S408 is no, step S409 is performed; if the determination result of step S408 is yes, step S410 is performed.
Step S409: Complete the last sentence into a full sentence, and update the latest interception position to the end position of the text obtained after the completion.
The latest interception position is used to determine the summary of the search result.
After step S409 is performed, step S415 may be performed, or step S413 may be performed directly.
For example, if the paragraph containing the summary is the first paragraph 5a, as shown in FIG. 5, the text information obtained after the server deletes the last sentence of the preset length of characters 5b is 5c, which has 127 characters. If the preset length range is 130 to 200 characters, 127 is less than 130, so the last sentence of 5b is completed into a full sentence. The resulting text 5d is "Mary is one of the most popular singers and dancers. Mary is born in August 4th, 1969. She is very good at singing and acting. Her songs are always pop music,", which has 158 characters, more than the lower limit 130.
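Steps S408 and S409 can be sketched as follows. This is an illustration under assumptions, not the implementation of the present application: the function names are hypothetical, and "completing the last sentence" is assumed to mean extending the text to the next delimiter in the target paragraph, which reproduces the 158-character text 5d of the example.

```python
DELIMS = ".!?;,。!?;,"  # assumed clause and sentence delimiters


def complete_last_sentence(paragraph, prefix_len):
    """Extend from the end of the preset-length prefix to the next delimiter."""
    for i in range(prefix_len, len(paragraph)):
        if paragraph[i] in DELIMS:
            return i + 1
    return len(paragraph)


def apply_lower_limit(paragraph, prefix_len, first_pos, lower_limit):
    """Steps S408/S409: return the latest interception position."""
    if first_pos >= lower_limit:
        return first_pos                                   # S408 "yes"
    return complete_last_sentence(paragraph, prefix_len)   # S409


# Paragraph 5a, with one space assumed after each punctuation mark.
paragraph_5a = (
    "Mary is one of the most popular singers and dancers. "
    "Mary is born in August 4th, 1969. "
    "She is very good at singing and acting. "
    "Her songs are always pop music, but there are also countryside music, "
    "hip-hop, and rock. She composed many of her own songs."
)
print(apply_lower_limit(paragraph_5a, 140, 127, 120))  # 127
print(apply_lower_limit(paragraph_5a, 140, 127, 130))  # 158
```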
Step S410: Evaluate whether the text before the latest interception position ends with incomplete topic information.
In this embodiment, steps S406 and S407 determine the latest interception position, and step S409 updates it. The server may update the latest interception position one or more times according to the text before the determined first interception position (step S409, for example, performs one update), so each evaluation is performed on the text before the latest interception position.
For example, as shown in FIG. 5, when the text before the latest interception position is 5d, its last sentence is "Her songs are always pop music,", and the server determines that this sentence does not end with incomplete topic information.
For another example, as shown in FIG. 6, when the text before the latest interception position is the preset length of characters 6b determined from the start position of the target paragraph 6a (the latest interception position has not yet been updated, so the text before it is the unadjusted 6b), the last sentence of 6b is "注塑成型的具体工艺以《塑料制品", and the server determines that this sentence ends with the incomplete topic information "《塑料制品".
If the evaluation result of step S410 is yes, step S411 is performed; if the evaluation result of step S410 is no, step S412 is performed.
Step S411: Complete the incomplete topic information according to the target paragraph, and update the latest interception position to the end position of the text obtained after the topic information is completed.
For example, as shown in FIG. 6, if the text before the latest interception position is 6b, whose last sentence is "注塑成型的具体工艺以《塑料制品", then "注塑成型的具体工艺以《塑料制品" is completed to "注塑成型的具体工艺以《塑料制品生产方法》". The text after completion is shown as 6c in FIG. 6.
After step S411 is performed, step S412 may be performed.
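Steps S410 and S411 can be sketched for the case shown in FIG. 6, where the incomplete topic information is an unclosed book title mark. This is only a sketch: the source does not state how topic information is recognized, so the assumption here is that it is delimited by paired marks, and both the `PAIRS` table and the `complete_topic` helper are hypothetical.

```python
# Assumed paired marks that may enclose topic information.
PAIRS = {"《": "》", "“": "”", "(": ")"}


def complete_topic(paragraph, pos):
    """Return the updated interception position (steps S410/S411).

    If the text before `pos` ends inside an unclosed pair, extend the position
    to just after the closing mark found in the paragraph; otherwise keep it.
    """
    text = paragraph[:pos]
    for open_mark, close_mark in PAIRS.items():
        start = text.rfind(open_mark)
        if start != -1 and close_mark not in text[start:]:
            end = paragraph.find(close_mark, pos)
            if end != -1:
                return end + 1
    return pos


sentence = "注塑成型的具体工艺以《塑料制品生产方法》第三章内容为参考。"
new_pos = complete_topic(sentence, 15)
print(new_pos)             # 20
print(sentence[:new_pos])  # 注塑成型的具体工艺以《塑料制品生产方法》
```

Text with no unclosed mark, such as the English text 5d, is returned unchanged.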
Step S412: Evaluate whether the semantics of the last sentence of the text before the latest interception position are complete.
For example, as shown in FIG. 6, if the text before the latest interception position is the text 6c in FIG. 6, its last sentence is "注塑成型的具体工艺以《塑料制品生产方法》". The subject of this sentence is "注塑成型的具体工艺" ("the specific process of injection molding"), and no predicate follows it. In the present application, a predicate may be deemed to exist when a verb or an adjective appears after the subject. The word "以" that follows the subject here is neither a verb nor an adjective, so the sentence is considered to lack a predicate, and its semantics are determined to be incomplete.
For another example, as shown in FIG. 5, if the text before the latest interception position is the text 5d in FIG. 5, its last sentence is "Her songs are always pop music,". The server determines that this sentence includes the subject "Her songs", the predicate "are", and the object "pop music", so its semantics are considered complete.
If the evaluation result of step S412 is yes, step S413 is performed; if the evaluation result of step S412 is no, step S414 is performed.
Step S413: Determine the text before the latest interception position as the summary of the search result.
While determining the summary of the search result, the server may update the latest interception position after evaluating the number of words in the last sentence in step S403, after evaluating the character length of the last sentence in step S404, after evaluating the topic information at the end of the text before the latest interception position in step S410, and after evaluating the semantic completeness of the last sentence of that text in step S412. The latest interception position may therefore be updated once or several times. If the server does not adjust the latest interception position after any of these evaluations, the latest interception position is the first interception position.
Step S414: Complete the semantics of the last sentence of the text before the latest interception position according to the target paragraph, and update the latest interception position to the end position of the text obtained after the semantics are completed.
For example, as shown in FIG. 6, the text before the latest interception position is the text 6c in FIG. 6, whose last sentence is "注塑成型的具体工艺以《塑料制品生产方法》". The server may complete this sentence to "注塑成型的具体工艺以《塑料制品生产方法》第三章内容为参考" ("for the specific injection molding process, refer to Chapter 3 of 'Production Methods of Plastic Products'"); the resulting text is shown as 6d in FIG. 6. After completion, the last sentence is semantically complete.
Step S415: Determine whether the character length of the text before the latest interception position is not greater than the upper limit of the preset length range.
If the determination result of step S415 is yes, step S413 is performed; if the determination result of step S415 is no, step S416 is performed.
Step S416: Update the latest interception position to the first interception position.
For example, as shown in FIG. 6, the text before the latest interception position is the text 6d in FIG. 6: "注塑成型的工艺过程是:在一定温度下,通过搅拌完全熔融的塑料材料,用高压射入模腔,经冷却固化后得到成型品。注塑成型方法的优点是生产速度快、效率高、产品整体性好,生产过程可实现自动化,产出的制品形状多样,制品尺寸精确,产品易更新换代,能成形状复杂的制件,注塑成型的具体工艺以《塑料制品生产方法》第三章内容为参考". The character length of the text 6d is 153. If the preset length range is 120 to 150 characters, the character length 153 exceeds the upper limit of the preset length range, so the latest interception position is updated to the end position of 6c (that is, the first interception position), and the resulting text is shown as 6e in FIG. 6.
After step S416 is performed, step S413 is performed.
If the character length of the text obtained after the deletion in step S416 is less than the lower limit of the preset length range, the latest interception position is not updated again. For example, as shown in FIG. 6, the text 6e obtained after the final deletion in FIG. 6 has 125 characters. If the preset length range is 130 to 160 characters, the lower limit is 130; even though 125 is less than 130, the latest interception position is not updated again. When the character length of a summary exceeds the preset length range, the summary takes up a large share of the client's display and degrades the overall appearance of the web page, so the summary should not be too long.
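Steps S415 and S416 amount to a single comparison with a fallback. The following is a minimal sketch with a hypothetical function name. The value 153 is the length given for 6d, and 125 is the length given for 6e, used here as the fallback position.

```python
def apply_upper_limit(latest_pos, first_pos, upper_limit):
    """Steps S415/S416: fall back to the first interception position if too long."""
    if latest_pos <= upper_limit:
        return latest_pos  # S415 "yes": go straight to S413
    # S416: revert. The result is final even if it is below the lower limit,
    # because the latest interception position is not updated again.
    return first_pos


print(apply_upper_limit(153, 125, 150))  # 125
print(apply_upper_limit(153, 125, 160))  # 153
```

The one-way fallback guarantees termination: a summary that is slightly too short is accepted in preference to one that exceeds the display budget.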
In this embodiment, steps S408 to S416 constitute determining the summary of the search result from the target paragraph according to the first interception position.
FIG. 13 is a schematic diagram of how the summaries of search results determined by the method of Embodiment 1 are displayed in a browser. After the user enters the search query "木地板" ("wood flooring"), the summary of each search result is determined by the method provided in this embodiment, and the browser displays the content of each search result as shown in FIG. 13. For comparison, FIG. 2 shows the content displayed by the browser for the same query when the preset length of characters is simply truncated from the start position of the target paragraph of the web page text of each search result and the truncated characters are used as the web page summary.
Comparing FIG. 13 with FIG. 2: the last sentence of the summary of the second search result in FIG. 2 is “以强”. Evaluating “以强” shows that its character length (2 characters) is less than the first preset length (for example, 5 characters). The last sentence is so short that it can hardly convey useful information, so “以强” is deleted, and the resulting summary is shown as the second search result in FIG. 13. The last sentences of the summaries of the third and fourth search results in FIG. 2 are “不然” and “这里”, respectively. Likewise, both are short and can hardly convey useful information, so they are deleted, and the resulting summaries are shown as the third and fourth search results in FIG. 13. The last sentences of the summaries of the first and fifth search results in FIG. 2 are “这个铺什么和好不好有什么关系吗” and “复合木地板再生产时有加入甲醛”, respectively. Evaluation shows that each is no shorter than the first preset length and is semantically complete, so neither is processed. The last sentence of the summary of the sixth search result in FIG. 2 is “但它也存在木材消”. Its character length is not less than the first preset length, and further evaluation shows that its semantics are incomplete, so this last sentence is completed.
As can be seen from FIG. 13, the last sentence of each summary determined using the solution of the embodiments of this application is more complete, and the summaries are presented better and displayed in a more user-friendly way.
In this embodiment, whether the last sentence of the preset length of characters in the paragraph containing the summary meets the format requirements is evaluated first, the first interception position is determined according to the format evaluation result, and the first interception position is taken as the latest interception position. The last sentence of the text before the first interception position is then evaluated further, and the latest interception position is updated according to the evaluation result. The text before the latest interception position is thus evaluated multiple times, and after each evaluation the latest interception position is adjusted according to the result. Updating the latest interception position multiple times makes the last sentence of the determined summary semantically more complete and more user-friendly to display.
Embodiment 2
FIG. 7 is another schematic flowchart of the method for determining a summary of a search result provided by an embodiment of this application. This embodiment may be executed by a server. As shown in FIG. 7, the summary of a search result may be determined according to the following steps S701 to S708.
Step S701: Obtain the text information of the search result, and determine the target paragraph from the text information.
Step S702: Determine a preset length of characters starting from the starting position of the target paragraph.
Steps S701 and S702 are implemented in a similar way to steps S401 and S402, and are not described again here.
Step S703: Determine the last sentence of the preset length of characters.
In this embodiment, the last sentence of the preset length of characters may be determined as follows: the text after the last first punctuation mark in the preset length of characters is determined as the last sentence. Specifically, the first punctuation mark may be a pause mark (点号), which is one category of punctuation marks and indicates pauses of different lengths in speech. In a specific embodiment, the first punctuation mark may be any one of the following pause marks: “,” (comma), “?” (question mark), “!” (exclamation mark), “;” (semicolon), and “:” (colon). When any of these pause marks appears in a text, the text preceding it is usually semantically fairly complete, so using any of these pause marks as the first punctuation mark makes the sentence division more reasonable.
FIG. 8 shows several examples of the last sentence of the preset length of characters.
As shown in FIG. 8, if the preset length of characters determined from the starting position of the target paragraph is as shown in a in FIG. 8, the last first punctuation mark is the “,” after “我们可以看新闻”. The text “了解国内外的事” after this “,” is therefore determined as the last sentence of the preset length of characters. The last sentence in a in FIG. 8 is marked with a box.
As another example, if the preset length of characters determined from the starting position of the target paragraph is as shown in b in FIG. 8, the last first punctuation mark is the “,” after “Therefore”. The text “briging the” after this “,” is therefore determined as the last sentence of the preset length of characters. The last sentence in b in FIG. 8 is marked with a box.
As a further example, if the preset length of characters determined from the starting position of the target paragraph is as shown in c in FIG. 8, the last first punctuation mark is the “,” after “可以再网络上读欣慰、看电影”. The text “医疗、购物” after this “,” is therefore determined as the last sentence of the preset length of characters. In c in FIG. 8 the last punctuation mark is “、”, which is not a first punctuation mark, so “、” is not used to determine the last sentence. The last sentence in c in FIG. 8 is marked with a box.
Optionally, the server may also determine the last sentence of the preset length of characters as follows: on determining that the preset length of characters ends with a first punctuation mark, the text after the penultimate first punctuation mark in the preset length of characters is determined as the last sentence.
When the preset length of characters ends with a first punctuation mark, no characters follow the last first punctuation mark. If the text after the last first punctuation mark were taken as the last sentence, the last sentence would be empty, that is, there would be no last sentence. The text after the penultimate first punctuation mark in the preset length of characters may therefore be determined as the last sentence instead.
As shown in FIG. 8, if the preset length of characters determined from the starting position of the target paragraph is as shown in d in FIG. 8, it ends with “,”, so the last first punctuation mark is the “,” at the end, and the penultimate first punctuation mark is the “.” before “Therefore”. In this case, the text “Therefore,” after the penultimate first punctuation mark “.” may be determined as the last sentence of the preset length of characters. The last sentence in d in FIG. 8 is marked with a box.
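The two rules above (text after the last first punctuation mark, with a fallback to the penultimate mark when the characters end with a mark) can be sketched as follows. This is only an illustrative sketch, not the implementation of this application: the function name last_sentence and the mark set FIRST_MARKS are hypothetical, and the set is assumed to contain the pause marks listed above together with their ASCII counterparts and the full stop that example d treats as a first punctuation mark.

```python
# Assumed mark set: the pause marks named in the text, plus ASCII
# counterparts and full stops (an assumption made for the English examples).
FIRST_MARKS = set(",?!;:" + ",?!;:" + ".。")


def last_sentence(text: str, marks=FIRST_MARKS) -> str:
    """Return the last sentence of `text` as delimited by first punctuation marks."""
    positions = [i for i, ch in enumerate(text) if ch in marks]
    if not positions:
        # No first punctuation mark at all: treat the whole text as one sentence.
        return text
    if positions[-1] == len(text) - 1:
        # Text ends with a mark: start after the penultimate mark, if any.
        start = positions[-2] + 1 if len(positions) >= 2 else 0
    else:
        # Usual case: start right after the last mark.
        start = positions[-1] + 1
    return text[start:].lstrip()
```

The enumeration comma “、” is deliberately left out of the assumed set, so “医疗、购物” stays together as one last sentence, as in example c.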
Step S704: Evaluate whether the end of the last sentence is incomplete topic information.
If the evaluation result of step S704 is yes, step S705 is performed; if the evaluation result of step S704 is no, step S706 is performed.
The topic information may include topic names and URLs, and may also include other types of topic information. A topic name may be the name of a film or television drama, a person, a book, a musical work, an opera, a folk art performance, or the like, and may also be another kind of topic name.
Specifically, whether the end of the last sentence is incomplete topic information may be determined according to the following steps S1 to S4.
Step S1: Perform word segmentation on the last sentence to obtain at least one segmented word.
In this embodiment, the last sentence may be segmented by a machine algorithm, for example a forward maximum matching algorithm or a bidirectional maximum matching algorithm; other algorithms may also be used. This application does not limit the specific segmentation method.
In step S1, one segmented word is obtained when the last sentence contains only one word or phrase, and multiple segmented words are obtained when the last sentence contains multiple words or phrases.
FIG. 9 and FIG. 10 are schematic diagrams of word segmentation in this embodiment. For example, if the last sentence is “注塑成型的具体工艺以《塑料制品”, the segmentation result may be as shown in FIG. 9. As another example, if the last sentence is “科技使我们的生活更加快捷方便”, the segmentation result may be as shown in FIG. 10.
Step S2: Determine that the topic information library contains topic information that includes the last of the segmented words.
The topic information library includes multiple pieces of topic information. Specifically, it may include topic names and uniform resource locators, and may also include other topic information. The topic information in the library may be updated periodically or aperiodically so that the library becomes more comprehensive.
Specifically, the server may search the topic information library for the last segmented word; when an entry containing the last segmented word is found, the server determines that the topic information library contains topic information that includes the last of the segmented words.
Step S3: Determine that the last segmented word is incomplete compared with the topic information that includes it.
For example, if the last segmented word is “塑料制品” and the topic information in the library that includes “塑料制品” is “塑料制品生产方法”, then, because “塑料制品” is incomplete relative to “塑料制品生产方法”, it can be determined that the last segmented word “塑料制品” is incomplete compared with the topic information “塑料制品生产方法”.
Step S4: Determine that the end of the last sentence is incomplete topic information.
Through steps S1 to S4, it can be determined that the end of the last sentence is incomplete topic information.
In this application, other ways may also be used to determine that the end of the last sentence is incomplete topic information. For example, when the last sentence includes a punctuation mark that marks the title of a work, such as the book title mark, and the last sentence contains the left half of that mark but not the right half, the end of the last sentence is determined to be incomplete topic information. For example, the last sentence “注塑成型的具体工艺以《塑料制品” contains the left half “《” of the book title mark but not the right half “》”, so the end of this sentence is determined to be incomplete topic information.
Still other ways may be used to determine that the end of the last sentence is incomplete topic information; this application does not specifically limit them.
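The two checks described above can be illustrated with a short sketch. It is a hedged example only: the function names are hypothetical, word segmentation is assumed to have been done already (the tokens are passed in), the topic information library is modeled as a plain set of strings, and "incomplete compared with" is assumed to mean that the last segmented word is contained in, but not equal to, a library entry.

```python
from typing import Iterable, Sequence


def ends_with_incomplete_topic(tokens: Sequence[str],
                               topic_library: Iterable[str]) -> bool:
    """Steps S2 to S4: the last token is a strict part of some library entry."""
    if not tokens:
        return False
    last = tokens[-1]
    # Assumed meaning of "incomplete": contained in the entry but not equal to it.
    return any(last in topic and last != topic for topic in topic_library)


def has_unclosed_title_mark(sentence: str) -> bool:
    """Alternative check: a left title mark with no right title mark after it."""
    # rfind returns -1 when the mark is absent, so "no marks" yields False.
    return sentence.rfind("《") > sentence.rfind("》")
```

With the example from the text, the tokens ending in “塑料制品” match the library entry “塑料制品生产方法” without being equal to it, and the sentence also contains an unclosed “《”, so both checks report incomplete topic information.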
FIG. 11 shows an example of determining the summary of a search result. For example, as shown in FIG. 11, the determined target paragraph (that is, the paragraph containing the summary) 11a is:
“电影丰富了人们日常生活,今年是个电影大年,6月的《成长故事》是一部经典,9月的《从现在开始》也好评不断,很多电影给大家留下了深刻印象,好评不断的《奔向太阳》是一部经典电影,该片在当年并没有获得很多的奖项,但却给人留下了非常深刻的印象,时间可以证明,该片确实值得一看。” (In English: “Movies have enriched people's daily lives. This year is a big year for movies: June's 《成长故事》 is a classic, and September's 《从现在开始》 has also been well received. Many movies have left a deep impression on everyone. The well-received 《奔向太阳》 is a classic movie; it did not win many awards that year, but it left a very deep impression, and time has shown that it is truly worth watching.”)
If the preset length is 75 characters, the preset length of characters 11b determined from the starting position of the target paragraph 11a is: “电影丰富了人们日常生活,今年是个电影大年,6月的《成长故事》是一部经典,9月的《从现在开始》也好评不断,很多电影给大家留下了深刻印象,好评不断的《奔向”. The last sentence of the preset length of characters 11b is “好评不断的《奔向”. Evaluating “好评不断的《奔向” gives the result that the end of this last sentence is incomplete topic information.
Step S705: Complete the incomplete topic information at the end of the last sentence, and determine the end position of the text obtained after the topic information is completed as the first interception position.
Step S706: Evaluate whether the semantics of the last sentence are complete.
If the result of step S706 is yes, step S708 is performed; if the result of step S706 is no, step S709 is performed.
Step S708: Determine the end position of the preset length of characters as the first interception position.
Step S709: Complete the last sentence, and determine the end position of the text obtained after the last sentence is completed as the first interception position.
After step S709 is performed, step S707 is performed.
For the specific implementation of steps S706 and S709, refer to steps S1203 and S1204 of Embodiment 3 and steps S412 and S414 of Embodiment 1; details are not repeated here.
Step S707: Determine the text in the target paragraph that precedes the first interception position as the summary of the search result.
For example, the last sentence of the text 11c obtained after the topic information is completed is “好评不断的《奔向太阳》”, and the summary of the search result obtained by completing the topic information is shown as 11c in FIG. 11. Alternatively, the last sentence may be completed as “好评不断的《奔向太阳”, without the closing title mark.
This embodiment evaluates whether the topic information at the end of the last sentence of the preset length of characters is complete, so that the topic information at the end can be completed. For topic information such as the name of a film or television drama or the title of a book, a user who encounters incomplete topic information usually wants to read the rest of it in order to understand the text better. This embodiment completes the topic information so that the user reads more complete information.
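As a worked illustration of step S705 on the paragraph 11a quoted above, the sketch below completes an unclosed book title mark by extending the interception position to the matching “》” in the target paragraph. It is a minimal sketch under stated assumptions: the function name complete_title is hypothetical, and only the title-mark variant of the incompleteness check is covered, not the topic information library lookup.

```python
# Target paragraph 11a, copied from the example for FIG. 11.
PARAGRAPH_11A = (
    "电影丰富了人们日常生活,今年是个电影大年,6月的《成长故事》是一部经典,"
    "9月的《从现在开始》也好评不断,很多电影给大家留下了深刻印象,"
    "好评不断的《奔向太阳》是一部经典电影,该片在当年并没有获得很多的奖项,"
    "但却给人留下了非常深刻的印象,时间可以证明,该片确实值得一看。"
)


def complete_title(paragraph: str, preset_len: int) -> str:
    """Return the text before the first interception position (step S705 sketch)."""
    head = paragraph[:preset_len]
    # An opening title mark without a later closing mark signals a cut-off title.
    if head.rfind("《") > head.rfind("》"):
        close = paragraph.find("》", preset_len)
        if close != -1:
            # Extend the interception position to just after the closing mark.
            return paragraph[:close + 1]
    return head


summary_11c = complete_title(PARAGRAPH_11A, 75)
```

With a preset length of 75 the intercepted characters end in “好评不断的《奔向”, and the completed text ends in “好评不断的《奔向太阳》”, which corresponds to 11c in FIG. 11.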
Embodiment 3
FIG. 12 is another schematic flowchart of the method for determining a summary of a search result provided by an embodiment of this application. As shown in FIG. 12, the summary of a search result may be determined according to the following steps S1201 to S1206. This embodiment may be executed by a server.
Step S1201: Obtain the text information of the search result, and determine the target paragraph from the text information.
Step S1202: Determine a preset length of characters starting from the starting position of the target paragraph.
Steps S1201 and S1202 are implemented in a similar way to steps S401 and S402, and are not described again here.
Step S1203: Evaluate whether the semantics of the last sentence of the preset length of characters are complete.
If the evaluation result of step S1203 is no, step S1204 is performed; if the evaluation result of step S1203 is yes, step S1205 is performed.
In a specific embodiment, the semantics of the last sentence may be determined to be incomplete when the last sentence is found to lack any one of a subject, a predicate, and an object. For a specific example of evaluating whether the semantics are complete, refer to step S412 of Embodiment 1.
In this embodiment, the semantics of the last sentence may also be determined to be incomplete in other ways. For example, when the number of words in the last sentence is less than a third preset number, the semantics of the last sentence are determined to be incomplete. The third preset number may be any number from 2 to 6, or another small number. Here, words may include single-character words and phrases in Chinese, words in English, and single-character words and phrases in other languages.
For example, as shown in FIG. 11, evaluating the last sentence “好评不断的《奔向” of the determined preset length of characters 11b gives the result that this last sentence lacks a predicate and an object, so its semantics are incomplete.
Step S1204: Complete the last sentence into a complete sentence, and determine the end position of the text obtained after the completion as the first interception position.
Step S1205: Determine the end position of the preset length of characters as the first interception position.
Step S1206: Determine whether the character length of the text before the first interception position is not greater than the upper limit of the preset length range.
If the result of step S1206 is yes, step S1207 is performed; if the result of step S1206 is no, step S1208 is performed.
Step S1207: Determine the text in the target paragraph that precedes the first interception position as the summary of the search result.
For example, as shown in FIG. 11, when the last sentence is completed into a complete sentence, it becomes “好评不断的《奔向太阳》是一部经典电影”, and the summary of the search result obtained in this way is shown as 11d in FIG. 11.
Step S1208: Delete the last sentence, and determine the end position of the text obtained after the last sentence is deleted as the third interception position.
Step S1209: Determine the text in the target paragraph that precedes the third interception position as the summary of the search result.
For the specific processes of steps S1208 and S1209, refer to steps S415, S416, and S413 of Embodiment 1; details are not repeated here.
A user who reads semantically incomplete text usually wants to obtain the complete content in order to understand the text better. This embodiment completes the sentence so that the user reads more complete information.
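The control flow of steps S1203 to S1209 can be sketched as follows, again using paragraph 11a. This is a hedged sketch rather than the method of this application: the function name determine_summary is hypothetical, the semantic check is supplied by the caller as is_complete because the text leaves its implementation open, "completing" a sentence is assumed to mean extending it up to the next first punctuation mark in the target paragraph, and deleting the last sentence is assumed to leave the preceding punctuation mark in place.

```python
from typing import Callable

FIRST_MARKS = ",?!;:"  # pause marks listed for the first punctuation mark

PARAGRAPH_11A = (
    "电影丰富了人们日常生活,今年是个电影大年,6月的《成长故事》是一部经典,"
    "9月的《从现在开始》也好评不断,很多电影给大家留下了深刻印象,"
    "好评不断的《奔向太阳》是一部经典电影,该片在当年并没有获得很多的奖项,"
    "但却给人留下了非常深刻的印象,时间可以证明,该片确实值得一看。"
)


def determine_summary(paragraph: str, preset_len: int, upper_limit: int,
                      is_complete: Callable[[str], bool]) -> str:
    head = paragraph[:preset_len]
    # Start of the last sentence: just after the last first punctuation mark
    # (index 0 if the intercepted characters contain no such mark).
    start = max(head.rfind(m) for m in FIRST_MARKS) + 1
    if is_complete(head[start:]):
        end = len(head)                       # S1205
    else:
        # S1204 (assumed): extend up to, not including, the next first mark.
        hits = [i for i in (paragraph.find(m, preset_len) for m in FIRST_MARKS)
                if i != -1]
        end = min(hits) if hits else len(paragraph)
    if end <= upper_limit:                    # S1206
        return paragraph[:end]                # S1207
    return paragraph[:start]                  # S1208 and S1209


summary_11d = determine_summary(PARAGRAPH_11A, 75, 160, lambda s: False)
```

With a preset length of 75 and an assumed upper limit of 160, the result ends in “好评不断的《奔向太阳》是一部经典电影”, matching 11d. With a tighter assumed upper limit of 80, the completed text of 85 characters is too long, so the last sentence is dropped instead.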
The method for determining a summary of a search result provided by the embodiments of this application is described in detail below.
FIG. 14 is a schematic flowchart of the method for determining a summary of a search result provided by an embodiment of this application. As shown in FIG. 14, the method includes the following steps S1410 to S1480. In the embodiment shown in FIG. 14, the server may be the server 320 described above, and the client may be the client 310 described above.
Step S1410: The server receives a search query sent by the client.
The user may enter a search query on the client 310. After receiving the search query entered by the user, the client 310 may send it to the server 320 over the network 330 (for example, a wired and/or wireless network), and the server 320 thereby receives the search query. For example, as shown in FIG. 2, the user may enter the keyword “木地板” in the search box 201 of the search page of a browser. This keyword is the search query, and the search box 201 of the search page is the interface through which the search engine accepts user input.
Step S1420: The server looks up at least one search result that matches the search query.
In the embodiments of this application, a search result may be any machine-readable and storable document, for example an email, a news item, a blog, a business directory, an electronic version of a printed text, or a web page. When a user looks up information on the Internet, the search results are usually web pages. A search result usually includes text information, and may also include embedded information such as images, hyperlinks, audio, and video, as well as embedded instructions such as scripts.
In the embodiments of this application, the server may look up at least one search result that matches the search query in the index database of the search engine. The index database stores index information for each web page, from which web pages meeting specific conditions can be found quickly. For example, the index information may include the URLs and keywords of the web pages corresponding to each segmented word, and may also include other information about those web pages. While crawling web pages, the server may store the URL, the HyperText Markup Language (HTML) code, the title, and other information of each crawled web page in an information database, and build the index information of each web page in the index database. The information database stores the complete information of each web page. When the user views a web page, the server may obtain the complete information of that web page from the information database and send it to the client, so that the client displays the complete content of the web page.
The server may look up search results after receiving the search query. A search result may be a web page that matches the search query entered by the user. A search result matches the search query when, for example, its title contains the keyword entered by the user or a synonym of that keyword, the search result contains all or part of the text entered by the user, or the field to which the search result belongs is the same as or close to the field of the search query; other forms of matching are also possible. For example, as shown in FIG. 2, if the keyword entered by the user is “木地板”, the server may determine web pages whose titles contain “木地板” as search results.
Step S1430: The server scores each search result found, and sorts the search results in descending order of score.
A higher score indicates a better match between the search result and the search query.
Specifically, when the search results are web pages, the server may score each web page found according to at least one of the following conditions:
the number of times the search query appears in the web page (for example, the more occurrences, the higher the score);
the position at which the search query appears in the web page (for example, an occurrence in the title scores higher);
the number of times the web page has been viewed (for example, the more views, the higher the score), and so on.
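The scoring and sorting of step S1430 can be illustrated with a small sketch that combines the three conditions above. The text does not give a formula, so everything specific here is an assumption made for illustration: the Page type, the linear combination, and the weights w_count, w_title, and w_views are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    title: str
    body: str
    views: int


def score(page: Page, query: str, w_count: float = 1.0,
          w_title: float = 5.0, w_views: float = 0.001) -> float:
    """Hypothetical linear score over the three conditions listed above."""
    occurrences = page.body.count(query)              # occurrences in the page
    title_bonus = w_title if query in page.title else 0.0  # position: title hit
    return w_count * occurrences + title_bonus + w_views * page.views


def rank(pages: List[Page], query: str) -> List[Page]:
    """Sort search results in descending order of score."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)
```

A real ranking function would likely weigh many more signals; the sketch only shows that each listed condition can contribute to a single score that is then used for sorting.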
Step S1440: The server obtains, in order from first to last, the text information of each of a first preset number of search results, and determines a target paragraph from the obtained text information.
For each search result, the target paragraph is the paragraph in which the summary of that search result is located.
That the server determines the target paragraph from the obtained text information means that, for each piece of obtained text information, the server determines the target paragraph corresponding to that text information.
The first preset number may be any number, for example 10, 50, 100, 200, or 500, or a larger or smaller number. The larger the first preset number, the more search results the server determines and the more search results are presented to the user; the smaller the first preset number, the faster the server determines the search results.
In this embodiment of the present application, taking web pages as the search results, when the server has stored the complete information of each crawled web page in the information database described above, the server may obtain the text information of a search result from that information database.
Optionally, for each of the first preset number of search results, the server may determine the target paragraph in any one of the following determination manners.
Determination manner 1: The server splits the text information of the search result into paragraphs at the carriage returns in the text information, scores each paragraph, and selects at least one paragraph, in descending order of score, as the target paragraph. Some paragraphs obtained by this splitting are very short, in which case a single paragraph may not satisfy the requirements for extracting a summary; the target paragraph may therefore consist of one paragraph or of two or more paragraphs.
A target paragraph determined in this manner has a high degree of matching with the user's search query.
Determination manner 2: The server splits the text information of the search result into paragraphs at the carriage returns in the text information, scores each paragraph, and determines as the target paragraph the highest-scoring paragraph together with at least one paragraph that follows and is adjacent to the highest-scoring paragraph.
A target paragraph determined in this manner has a high degree of matching with the user's search query and, when it includes two or more paragraphs, those paragraphs are more coherent with one another, so that the summary determined for the search result has better semantic coherence and readability.
Determination manner 3: The server splits the text information of the search result into paragraphs at the carriage returns in the text information and determines the first paragraph of the text information as the target paragraph. This manner determines the paragraph in which the summary is located quickly and conveniently.
The server may also determine the paragraph in which the summary is located in other manners, which is not specifically limited in the present application.
In determination manners 1 and 2, a higher score for a paragraph indicates a higher degree of matching between that paragraph and the search query. The server may therefore score each paragraph based on factors such as the number of times the search query appears in the paragraph (for example, the more occurrences, the higher the score) and the position of the paragraph within the search result (for example, a paragraph that is the title or the first paragraph scores higher). The scoring of paragraphs is not described further here.
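The three determination manners can be sketched as below. The function names, the `first_bonus` weight, and the parameters `k` and `extra` are hypothetical; the paragraph score simply combines the two factors named above in an assumed additive way.

```python
def split_paragraphs(text: str) -> list[str]:
    # Split at line breaks (covers "\r", "\n" and "\r\n") and drop empty paragraphs.
    return [p.strip() for p in text.splitlines() if p.strip()]


def score_paragraph(paragraph: str, index: int, query: str,
                    first_bonus: float = 1.0) -> float:
    # More occurrences of the query score higher; the first paragraph gets an assumed bonus.
    score = float(paragraph.count(query))
    if index == 0:
        score += first_bonus
    return score


def target_by_top_scores(text: str, query: str, k: int = 1) -> list[str]:
    """Manner 1: the k highest-scoring paragraphs, in descending order of score."""
    paras = split_paragraphs(text)
    ranked = sorted(range(len(paras)),
                    key=lambda i: score_paragraph(paras[i], i, query),
                    reverse=True)
    return [paras[i] for i in ranked[:k]]


def target_by_best_and_following(text: str, query: str, extra: int = 1) -> list[str]:
    """Manner 2: the highest-scoring paragraph plus the adjacent paragraph(s) after it."""
    paras = split_paragraphs(text)
    if not paras:
        return []
    best = max(range(len(paras)),
               key=lambda i: score_paragraph(paras[i], i, query))
    return paras[best:best + 1 + extra]


def target_by_first(text: str) -> list[str]:
    """Manner 3: the first paragraph of the text information."""
    return split_paragraphs(text)[:1]
```

Manner 2 returns a contiguous slice, which is what gives the resulting target paragraph its coherence; manner 1 may return paragraphs that are far apart in the page.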
Step S1450: The server determines a preset length of characters starting from the start position of the target paragraph.
The preset length of characters determined by the server may be a preset length of Chinese characters, English characters, Korean characters, or text in another language, or a preset length of characters in a mixture of languages.
In one implementation, the preset length may be the same for search-result text information in different languages, as in the example given in step S402 of Embodiment 1, which makes the solution provided in the present application more widely applicable. For different languages, however, characters of the same length and font size often occupy different numbers of lines and different amounts of space on a web page, so in another implementation the preset length may differ for search-result text information in different languages. The preset length may be, for example, any length from 100 to 200 characters, or some other number of characters. When the preset length is longer, the determined summary contains more characters, so the user can learn more about the search result; when the preset length is shorter, the determined summary contains fewer characters and each search result occupies less space on the page when shown on the client's display, so the same page can show more search results. Those skilled in the art can set the specific value of the preset length according to actual needs, and it is not specifically limited here.
Step S1460: The server evaluates the last sentence of text in the determined preset length of characters to obtain a first evaluation result.
In the present application, the last sentence of text in the preset length of characters may be determined as in step S703 of Embodiment 2, and the last sentence of text is then evaluated.
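Steps S1450 and S1460 start from a fixed-length prefix of the target paragraph and its last sentence. The sketch below shows one way to obtain both. Step S703 is not reproduced in this section, so the sentence-boundary rule used here (scan back to the previous sentence-ending mark) and the `SENTENCE_END` set are assumptions rather than the method of Embodiment 2.

```python
# Assumed set of sentence-ending marks (Chinese and English).
SENTENCE_END = "。！？.!?"


def take_prefix(paragraph: str, preset_length: int) -> str:
    # The preset length of characters from the start of the target paragraph.
    return paragraph[:preset_length]


def last_sentence(prefix: str) -> str:
    """Return the text after the previous sentence-ending mark in the prefix."""
    body = prefix.rstrip()
    # If the prefix itself ends with a mark, look for the mark before that one.
    search_to = len(body) - 1 if body and body[-1] in SENTENCE_END else len(body)
    # rfind returns -1 when a mark is absent, so start falls back to 0.
    start = max(body.rfind(mark, 0, search_to) for mark in SENTENCE_END) + 1
    return body[start:].lstrip()
```

For example, with the paragraph "Oak is durable. Pine is soft. Bamboo grows fast." and a preset length of 40, the prefix ends in the middle of a word and its last sentence is the fragment "Bamboo gro", which is the kind of text the evaluation in step S1460 is meant to catch.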
In this embodiment of the present application, evaluating the last sentence of text in the preset length of characters determined from the start position of the target paragraph may be evaluating the number of words in the last sentence of text, as in step S403 of Embodiment 1; evaluating the character length of the last sentence of text, as in step S404 of Embodiment 1; evaluating whether the end of the last sentence of text is incomplete topic information, as in step S704 of Embodiment 2; or evaluating the semantic completeness of the last sentence of text, as in step S1203 of Embodiment 3.
The first evaluation result obtained may be that the number of words in the last sentence of text is less than a second preset number (that is, the sentence has few words), as in step S403 of Embodiment 1; that the character length of the last sentence of text is less than a first preset length (that is, the sentence is short), as in step S404 of Embodiment 1; that the end of the last sentence of text is incomplete topic information, as in step S704 of Embodiment 2; or that the semantics of the last sentence of text are incomplete, as in step S1203 of Embodiment 3. The first evaluation result may also be any other result that reflects the completeness of the last sentence of text, which is not specifically limited in the present application.
Step S1470: The server determines a first interception position of the target paragraph according to the first evaluation result.
The first interception position may be the end position of the text obtained after the last sentence of text is supplemented: for example, the end position of the text obtained after the incomplete topic information at the end of the last sentence of text is completed in step S705 of Embodiment 2, or the end position of the text obtained after the last sentence of text is supplemented into a complete sentence in step S1204 of Embodiment 3. The first interception position may instead be the end position of the text obtained after the last sentence of text is deleted, for example as in step S405 of Embodiment 1. The first interception position may also be the end position of the last sentence of text itself, for example the position that yields the search-result summary determined in step S708 of Embodiment 2 or step S1205 of Embodiment 3.
In this embodiment of the present application, supplementing the last sentence of text may be supplementing it into a complete sentence, as in step S1204 of Embodiment 3, or completing the topic information at its end, as in step S705 of Embodiment 2. Specifically, as shown in 11d of FIG. 11 in the example of Embodiment 3, supplementing the last sentence of text into a complete sentence may be appending, to the end of the last sentence of text, the text in the target paragraph that lies between the last sentence of text and the next first punctuation mark after it.
In one implementation, after the first interception position of the target paragraph is determined, the text before the first interception position may be adjusted and a new interception position determined, so that the character length of the text before the new interception position falls within a preset length range; for example, the new interception position may be determined as in steps S408 and S409 of Embodiment 1. The preset length range may be 80 to 350 characters or another range; the present application does not specifically limit it, and those skilled in the art can set it according to actual needs. In this implementation, keeping the character length of the text before the interception position within the preset length range keeps the summary of the search result at a reasonable length: not so long that the page shows too few search results, and not so short that the summary contains too little useful information.
After the first interception position of the target paragraph is determined, if the text before the first interception position in the target paragraph is short, it is difficult for the user to learn the general content of the search result from the summary. When summaries of search results are displayed on a client or other display, the layout space occupied by each summary is usually fixed, as are display settings such as font size, line spacing, and character spacing; if the text before the interception position in the paragraph containing the summary is long, the determined summary cannot be displayed in full on the client or other display, so the displayed summary is less complete and looks worse. Therefore, in the present application the character length of the text before the interception position in the target paragraph is set to fall within the preset length range, so that the user can understand the general content of the search result from the summary and the determined summary can be displayed in full on the client or other display, which makes the displayed summary more complete and better presented.
In this embodiment of the present application, the last sentence of text may be deleted (as in step S405 of Embodiment 1) or supplemented (as in step S705 of Embodiment 2 and step S1204 of Embodiment 3) in any of the following cases: the first evaluation result is that the last sentence of text has few characters or few words, as when the evaluation result of step S403 or step S404 of Embodiment 1 is no; the semantics of the last sentence of text are incomplete, as when the evaluation result of step S412 of Embodiment 1 or of step S1203 of Embodiment 3 is no; or the end of the last sentence of text is incomplete topic information, as when the evaluation result of step S410 of Embodiment 1 or of step S704 of Embodiment 2 is yes. The end position of the text obtained after the deletion or the supplement is then determined as the first interception position of the target paragraph. When the first evaluation result is instead that the last sentence of text does not have few characters or words, that its semantics are complete, or that its end is not incomplete topic information, the end position of the last sentence of text may be determined as the first interception position, as in step S708 of Embodiment 2 and step S1205 of Embodiment 3. In this embodiment, the first interception position of the paragraph containing the summary can thus be determined flexibly according to the first evaluation result. Those skilled in the art can determine it on the principle of making the summary of the search result more complete and better displayed; the present application does not limit the specific manner of determining the first interception position.
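The three outcomes described above (delete, supplement, keep) can be sketched as a single function that maps an action to the first interception position. The `Action` enum, the parameter names, and the `marks` set are hypothetical; which punctuation counts as a "first punctuation mark" is defined elsewhere in the application, so the set used here is an assumption.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"          # drop the last sentence of text
    SUPPLEMENT = "supplement"  # extend the last sentence to the next mark
    KEEP = "keep"              # cut at the end of the last sentence as it is


def first_interception_position(paragraph: str, sentence_start: int,
                                prefix_end: int, action: Action,
                                marks: str = "。！？.!?") -> int:
    """Return an index p such that paragraph[:p] is the text before the cut.

    sentence_start: index where the last sentence of the prefix begins.
    prefix_end:     index just past the preset length of characters.
    """
    if action is Action.DELETE:
        return sentence_start
    if action is Action.SUPPLEMENT:
        # Append the text that lies before the next mark (the mark itself is excluded).
        for i in range(prefix_end, len(paragraph)):
            if paragraph[i] in marks:
                return i
        return len(paragraph)
    return prefix_end
```

With the paragraph "Oak is durable. Pine is soft. Bamboo grows fast.", a prefix of 40 characters, and a last sentence starting at index 29, deleting yields "Oak is durable. Pine is soft." and supplementing yields the text up to "Bamboo grows fast".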
Step S1480: The server determines the summary of the search result from the target paragraph according to the first interception position.
The character length of the summary determined in step S1480 is within the preset length range.
Specifically, the server may determine the text before the first interception position in the target paragraph as the summary of the search result. Step S707 of Embodiment 2 and step S1207 of Embodiment 3 are examples of this.
Alternatively, the server may determine another interception position according to the text before the first interception position and determine the text before that other interception position in the target paragraph as the summary of the search result. Steps S405 to S416 of Embodiment 1 are an example, where the latest interception position in Embodiment 1 is the other interception position so determined. The text before the first interception position sometimes still does not present the summary well, for example because it is too long or too short or its semantics are still incomplete. In that case, another interception position can be determined from the text before the first interception position such that the text before it presents the summary better, which further improves the presentation of the determined summary.
The solution provided in this embodiment of the present application evaluates the last sentence of text in the preset length of characters starting from the start position of the target paragraph and determines the first interception position according to the resulting first evaluation result. Because the first interception position is determined flexibly from the first evaluation result, the last sentence of the determined summary can be a semantically fairly complete sentence, which makes the summary a better reference for the user in deciding whether to open the web page. When displayed on the client, the summary presents better, reduces the amount of useless reading by the user, improves the reading experience, and raises the probability that the user clicks through to the web page corresponding to the summary. In addition, the first interception position determined in this embodiment is the end position of the text obtained after the last sentence of text is supplemented, the end position of the last sentence of text, or the end position of the text obtained after the last sentence of text is deleted. The first interception position thus differs with the first evaluation result, which further improves the presentation of the determined summary.
In one implementation, in step S1460 the last sentence of text may be evaluated in any one of the following evaluation manners to obtain the first evaluation result.
Evaluation manner 1: Evaluate whether the character length of the last sentence of text is less than the first preset length.
For the specific evaluation process of evaluation manner 1, refer to step S404 of Embodiment 1.
When the evaluation result of evaluation manner 1 is yes, the first interception position may be the end position of the text obtained after the last sentence of text is deleted, as in steps S405 to S406 of Embodiment 1.
Evaluation manner 1 quickly identifies a last sentence of text that has low semantic completeness and little reference value for the user's understanding of the web page, so that when the first interception position is determined, the decision to delete the last sentence of the preset length of characters can be made quickly, and a summary that displays better to the user is determined more quickly.
Evaluation manner 2: Evaluate whether the number of words in the last sentence of text is less than the second preset number.
For the specific evaluation process of evaluation manner 2, refer to step S403 of Embodiment 1.
When the evaluation result of evaluation manner 2 is yes, the first interception position may be the end position of the text obtained after the last sentence of text is deleted, as in steps S405 to S406 of Embodiment 1.
Evaluation manner 2 likewise allows a summary that displays better to the user to be determined more quickly. Moreover, because the number of words in a sentence usually reflects its semantic completeness better than its character length does, evaluating the number of words in the last sentence of text assesses its semantic completeness and its reference value for the user's understanding of the web page more accurately. As with evaluation manner 1, a last sentence of text with low semantic completeness and little reference value is identified quickly, so the decision to delete the last sentence of the preset length of characters can be made quickly when the first interception position is determined.
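Evaluation manners 1 and 2 reduce to two threshold checks, sketched below. The default thresholds are hypothetical values, since the section does not give the first preset length or the second preset number. Counting words by splitting on whitespace is also an assumption that only suits space-delimited languages; Chinese text would need a word segmenter instead.

```python
def is_short_by_chars(sentence: str, first_preset_length: int = 15) -> bool:
    # Evaluation manner 1: character length below the first preset length.
    return len(sentence) < first_preset_length


def is_short_by_words(sentence: str, second_preset_number: int = 3) -> bool:
    # Evaluation manner 2: word count below the second preset number.
    # Whitespace splitting is a stand-in for real word segmentation.
    return len(sentence.split()) < second_preset_number
```

A "yes" from either check corresponds to the case in which the last sentence of text is deleted and the first interception position is placed at the end of the preceding text.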
Evaluation manner 3: Evaluate whether the proportion of non-word characters in the last sentence of text, that is, characters other than written characters (字) and words (词), is greater than a preset proportion.
Here, words may include single-character words and phrases in Chinese, words in English, and single-character words and phrases in other languages, and written characters may include Chinese characters. In text information, non-word characters may include at least one of digits, punctuation marks, mathematical symbols, currency symbols, collation symbols, dictionary symbols, and phonetic notation symbols, and may also include other symbols, which is not specifically limited in the present application. The preset proportion may be any proportion from 40% to 60%, or another larger proportion.
When the evaluation result of evaluation manner 3 is yes, the first interception position may be the end position of the text obtained after the last sentence of text is deleted.
Users usually understand a summary mainly through the written characters and words that carry actual meaning, while the other characters in a summary usually help less. The written characters and words in a text are therefore the more useful information for understanding the summary. Evaluation manner 3 identifies a last sentence of text in which the proportion of useful information is low, so that when the first interception position is determined, the decision to delete the last sentence of the preset length of characters can be made quickly.
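Evaluation manner 3 can be approximated by treating every Unicode letter (including Chinese characters) as part of a word or written character and everything else as a non-word character. This classification, the decision to ignore whitespace, and the default threshold of 0.5 (inside the 40% to 60% range mentioned above) are assumptions made for this sketch.

```python
def non_word_ratio(sentence: str) -> float:
    """Share of non-whitespace characters that are not letters."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 0.0
    # str.isalpha() is True for letters in any script, including CJK ideographs,
    # and False for digits, punctuation, and currency or mathematical symbols.
    non_word = sum(1 for c in chars if not c.isalpha())
    return non_word / len(chars)


def exceeds_preset_proportion(sentence: str, preset_proportion: float = 0.5) -> bool:
    # Evaluation manner 3: "yes" when non-word characters dominate the sentence.
    return non_word_ratio(sentence) > preset_proportion
```

A sentence such as "Price: $12.50!!" is dominated by digits and symbols and would be flagged, whereas "Pine is soft." would not.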
Evaluation manner 4: Evaluate whether the end of the last sentence of text is incomplete topic information.
For the specific evaluation process of evaluation manner 4, refer to step S704 of Embodiment 2.
When the evaluation result of evaluation manner 4 is yes, the first interception position may be the end position of the text obtained after the topic information at the end of the last sentence of text is completed.
In evaluation manner 4, evaluating the last sentence of text means evaluating the topic information at its end, and the first evaluation result obtained is that the end of the last sentence of text is incomplete topic information. For topic information such as the titles of films, television series, and books, users reading a text usually want to see the topic information in full in order to understand the text better. Evaluation manner 4 identifies incomplete topic information at the end of the last sentence, so that the first interception position can be set, according to this first evaluation result, at a position that makes the topic information complete, which helps the user understand the text.
Evaluation manner 5: Evaluate whether the semantics of the last sentence of text are incomplete.
For the specific evaluation process of evaluation manner 5, refer to step S1203 of Embodiment 3.
When the evaluation result of evaluation manner 5 is yes, the first interception position may be the end position of the text obtained after the last sentence of text is supplemented into a complete sentence.
In evaluation manner 5, evaluating the last sentence of text means evaluating its semantics, and the first evaluation result obtained is that the semantics of the last sentence of text are incomplete. When the semantics of a sentence are incomplete, a user who reads it usually cannot tell what it specifically means. Evaluation manner 5 identifies a semantically incomplete last sentence, so that when the first interception position is determined, the last sentence of text can be completed according to this first evaluation result, which makes the text easier to read.
Evaluation manner 6: Evaluate whether the end of the last sentence of text is a punctuation mark.
The punctuation mark in evaluation manner 6 may be a pause mark (点号).
When the evaluation result of evaluation manner 6 is yes, the first interception position may be the end position of the text obtained after the punctuation mark at the end of the last sentence of text is deleted.
When a summary is displayed, a punctuation mark at its end offers the user very little for reading comprehension and may also affect how the summary is displayed on the client. Therefore, when the end of the last sentence of text is found to be a punctuation mark, that mark can be deleted so that the summary displays better.
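Evaluation manner 6 and the resulting deletion can be sketched as follows. The `marks` set is an assumed list of pause marks in Chinese and English, and the function name is hypothetical.

```python
def strip_trailing_mark(summary: str, marks: str = "。，、；：？！.,;:?!") -> str:
    """Delete a single punctuation mark at the end of the text, if there is one."""
    text = summary.rstrip()
    if text and text[-1] in marks:
        return text[:-1]
    return text
```

Only one trailing mark is removed here, matching the wording "the punctuation mark at the end"; stripping a run of marks would be a different design choice.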
As evaluation manners 1, 2, 3, and 6 show, these four manners do not analyze the semantics of the last sentence of text but evaluate it directly in terms of sentence format, such as the number of characters, the number of words, the proportion of words, or the punctuation it contains. Evaluation manners 1, 2, 3, and 6 can therefore be summarized as format evaluation.
In this embodiment of the present application, the last sentence of text in the preset length of characters may also be evaluated in other manners, provided that the evaluation result reflects the semantic completeness of the last sentence of text; the specific evaluation manner is not limited. The first interception position is determined according to the first evaluation result, and different first evaluation results may lead to the same or to different first interception positions.
上述实施方式中,当最后一句文本对应的字符长度小于第一预设长度时,或者最后一句文本对应的单词数小于第二预设数量时,或者最后一句文本的除词以外的字符占比大于预设比例时,说明最后一句文本很大概率上是一句语义不完整的话,或者是一句未表达任何实际语义的话,所以,最后一句文本对用户理解搜索结果是没有太多参考价值的,这种情况下,将最后一句删除,可以删除参考价值较小的语句,使确定出的摘要的完整性更好,摘要展示在客户端上时,用户的阅读体验更好。并且,由于最后一句文本的字符长度或单词数较少,当直接将最后一句文本删除时,所得到的摘要的总的字符长度与上述预设长度的差距也较小,对摘要在客户端或其他显示器显示的影响也较小。In the above embodiment, when the length of characters corresponding to the last sentence of text is less than the first preset length, or when the number of words corresponding to the last sentence of text is less than the second preset number, or the proportion of characters other than words in the last sentence of text is greater than When the preset ratio is used, it means that the last sentence of text is likely to be a sentence with incomplete semantics, or a sentence that does not express any actual semantics. Therefore, the last sentence of text has little reference value for users to understand the search results. In this case, the last sentence can be deleted, and the sentence with less reference value can be deleted, so that the completeness of the determined abstract is better, and the user's reading experience is better when the abstract is displayed on the client. In addition, since the character length or the number of words of the last sentence is small, when the last sentence is directly deleted, the difference between the total character length of the obtained abstract and the above preset length is also small. Other monitors show less impact as well.
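The format evaluation summarized above can be sketched as a single check. This is a sketch under stated assumptions: the function name and the default thresholds are hypothetical, words are counted by whitespace splitting (which suits space-delimited languages only), "characters other than words" is approximated as non-alphanumeric, non-space characters, and combining the three criteria with a logical OR is a choice made for illustration, whereas the application presents them as separate evaluation methods.

```python
def should_drop_last_sentence(sentence: str,
                              min_chars: int = 10,
                              min_words: int = 3,
                              max_nonword_ratio: float = 0.5) -> bool:
    """Return True if the last sentence looks too weak to keep (hypothetical helper)."""
    stripped = sentence.strip()
    # Criterion 1: character length below the first preset length.
    if len(stripped) < min_chars:
        return True
    # Criterion 2: word count below the second preset number.
    if len(stripped.split()) < min_words:
        return True
    # Criterion 3: proportion of non-word characters above the preset proportion.
    visible = [c for c in stripped if not c.isspace()]
    if not visible:
        return True
    nonword = sum(1 for c in visible if not c.isalnum())
    return nonword / len(visible) > max_nonword_ratio
```

When the function returns True, the first interception position would be the end position of the text obtained after deleting the last sentence.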
When it is determined that the character length of the last sentence of text is less than the first preset length, or that the number of words in the last sentence of text is less than the second preset number, the last sentence of text may alternatively be supplemented into a complete sentence. This approach can also make the determined summary more complete.
The complete sentence mentioned above can be understood as a semantically complete sentence, for example, a sentence that contains a subject, a predicate and an object.
In one implementation, when the first evaluation result is that the character length of the last sentence of text is not less than the first preset length, or that the number of words in the last sentence of text is not less than the second preset number, or that the proportion of characters other than words in the last sentence of text is not greater than the preset proportion, or that the end of the last sentence of text is complete topic information, or that the semantics of the last sentence of text are complete, the first interception position may be the end position of the last sentence of text; see step S708 of Embodiment 2 and step S1205 of Embodiment 3. When the last sentence of text of the preset-length characters has been evaluated and found to be semantically complete, there is no need to delete or supplement its content. In this implementation, the end position of the last sentence of text is directly determined as the interception position of the paragraph in which the summary is located, because the last sentence of text of the preset-length characters is already semantically complete and displays well.
In one implementation, the method for determining a summary of a search result may further include the following step: sending the determined summary of the search result to the client, so that the client displays the received summary on a display.
In one implementation, the first interception position may be the end position of the text obtained after deleting the last sentence of text. Step S1480 may be implemented as follows: determining that the character length of the text located before the first interception position in the target paragraph is less than a second preset length; determining a second interception position, where the second interception position is the end position of the text obtained after supplementing the last sentence of text; and determining the summary of the search result from the target paragraph according to the second interception position. For a specific example of this implementation, refer to steps S408, S409 and S413 of Embodiment 1. The latest interception position in step S409 of Embodiment 1 is the second interception position described above.
The second preset length may be the lower limit of the preset length range. In this implementation, when the first interception position is the end position of the text obtained after deleting the last sentence of text, if the character length of the text located before the first interception position in the target paragraph is less than the second preset length, the text before the first interception position is too short. The summary determined in this case may contain very little information, so that the user cannot obtain useful information about the search result from it. The summary of the search result is therefore determined from the target paragraph according to the second interception position. Because the second interception position is the end position of the text obtained after supplementing the last sentence of text, the character length of the resulting summary is increased, and the user can learn more about the search result from the summary.
Determining the summary of the search result from the target paragraph according to the second interception position may mean determining the text located before the second interception position in the target paragraph as the summary of the search result. Alternatively, further interception positions may be determined based on the text before the second interception position, and the summary of the search result may be determined according to those further interception positions, until the presentation of the determined summary meets the requirements.
In a specific implementation, determining the summary of the search result from the target paragraph according to the second interception position may be implemented as follows: determining that the character length of the text located before the second interception position in the target paragraph is greater than a third preset length, and determining the previously determined preset-length characters as the summary of the search result. The third preset length is greater than the second preset length and may be the upper limit of the preset length range. For example, the summary of the search result may be determined according to steps S409, S415, S416 and S413 of Embodiment 1.
In this implementation, if the character length of the text located before the second interception position in the target paragraph is greater than the third preset length, the text before the second interception position is too long. This may cause too few search results to be presented on the page, or may prevent the determined summary from being displayed in full, which affects the page layout. In this case, the previously determined preset-length characters may be determined as the summary of the search result, which benefits the page layout and keeps the character length of the determined summary within the preset length range. In other words, when the determined summary is short but further increasing its character length would make it too long, this implementation uses the previously determined preset-length characters as the final summary.
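The delete-first flow described in the preceding paragraphs (first interception position, second interception position, fallback to the preset-length characters) can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, sentence boundaries are detected with a naive terminator regex (which would mis-split text such as decimal numbers), and supplementing the last sentence is modeled as extending to the end of that sentence in the target paragraph.

```python
import re


def summary_delete_first(paragraph: str, preset_len: int,
                         lower: int, upper: int) -> str:
    """Delete the last sentence first; supplement it if the result is too short."""
    # End offsets (exclusive) of every sentence; the paragraph end always counts.
    ends = [m.end() for m in re.finditer(r"[.!?。!?]", paragraph)]
    ends.append(len(paragraph))
    n = min(preset_len, len(paragraph))
    # First interception position: end of the text after deleting the last sentence.
    first_cut = max((e for e in ends if e < n), default=0)
    if first_cut >= lower:
        return paragraph[:first_cut]
    # Second interception position: end of the text after completing the last sentence.
    second_cut = min(e for e in ends if e >= n)
    if second_cut > upper:
        # Completing the sentence overshoots the range: keep the preset-length characters.
        return paragraph[:n]
    return paragraph[:second_cut]
```

Here `lower` and `upper` stand for the second and third preset lengths, that is, the limits of the preset length range.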
In one implementation, the first interception position is the end position of the text obtained after supplementing the last sentence of text. Step S1480 may be implemented as follows: determining that the character length of the text located before the first interception position in the target paragraph is greater than the third preset length; determining a third interception position, where the third interception position is the end position of the text obtained after deleting the last sentence of text; and determining the text located before the third interception position in the target paragraph as the summary of the search result. For example, the summary of the search result may be determined according to steps S1206, S1208 and S1209 of Embodiment 3.
The third preset length may be the upper limit of the preset length range. In this implementation, when the first interception position is the end position of the text obtained after supplementing the last sentence of text, if the character length of the text located before the first interception position in the target paragraph is greater than the third preset length, the text before the first interception position is too long. Because the display space for each search result on the display page is usually fixed, the determined summary may not be displayed in full in this case. The summary of the search result is therefore determined from the target paragraph according to the third interception position. Because the third interception position is the end position of the text obtained after deleting the last sentence of text, the character length of the resulting summary is shortened, so that the determined summary can be displayed in full on the display page.
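The supplement-first flow can be sketched in the same spirit. Again, this is a sketch rather than the method of the application: the function name is hypothetical, sentence boundaries come from a naive terminator regex, and `upper` stands for the third preset length.

```python
import re


def summary_supplement_first(paragraph: str, preset_len: int, upper: int) -> str:
    """Complete the last sentence first; delete it if the result is too long."""
    # End offsets (exclusive) of every sentence; the paragraph end always counts.
    ends = [m.end() for m in re.finditer(r"[.!?。!?]", paragraph)]
    ends.append(len(paragraph))
    n = min(preset_len, len(paragraph))
    # First interception position: end of the text after completing the last sentence.
    first_cut = min(e for e in ends if e >= n)
    if first_cut <= upper:
        return paragraph[:first_cut]
    # Third interception position: end of the text after deleting the last sentence.
    third_cut = max((e for e in ends if e < n), default=0)
    return paragraph[:third_cut]
```

If no complete sentence precedes the preset length, this sketch returns an empty string; a real implementation would need a policy for that case, which the text above does not cover.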
In one implementation, step S1470 may be implemented as follows: determining that the first evaluation result is any one of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text are complete, the character length of the last sentence of text is not less than the first preset length, or the proportion of characters other than characters and words in the last sentence of text is not greater than the preset proportion; evaluating the last sentence of text to obtain a second evaluation result; and determining the first interception position of the target paragraph according to the second evaluation result. The evaluation content corresponding to the second evaluation result differs from that corresponding to the first evaluation result. Specifically, the summary of the search result may be determined according to steps S706 to S709 of Embodiment 2, where step S706 is the process of evaluating the last sentence of text to obtain the second evaluation result, and steps S708 and S709 are the process of determining the first interception position according to the second evaluation result.
When the first evaluation result is that the end of the last sentence of text is complete topic information, that the semantics of the last sentence of text are complete, that the character length of the last sentence of text is not less than the first preset length, or that the proportion of characters other than characters and words in the last sentence of text is not greater than the preset proportion, the last sentence met the evaluation requirement, that is, it satisfies a certain completeness requirement. In this case, to make the resulting summary more complete and better presented, the last sentence of text may be evaluated further to obtain the second evaluation result. The evaluation content corresponding to the second evaluation result may differ from that corresponding to the first evaluation result. For example, when the first evaluation concerns whether the end of the last sentence of text contains incomplete topic information, the second evaluation may concern whether the semantics of the last sentence of text are complete; when the first evaluation concerns the character length of the last sentence of text, the second evaluation may concern whether the end of the last sentence of text contains incomplete topic information.
In one implementation, step S1480 may be implemented as follows: evaluating the last sentence of the text located before the first interception position in the target paragraph to obtain a third evaluation result; determining a fourth interception position of the target paragraph according to the third evaluation result; and determining the summary of the search result from the target paragraph according to the fourth interception position. For the specific implementation process, refer to steps S408 to S416 of Embodiment 1 and steps S1206 to S1209 of Embodiment 3. In Embodiment 1, the evaluation result obtained in step S408 is the third evaluation result, steps S409 to S411 determine the fourth interception position, and steps S412 to S416 are the process of determining the summary of the search result according to the fourth interception position. In Embodiment 3, the evaluation result obtained in step S1206 is the third evaluation result, the third interception position determined in step S1208 is the fourth interception position, and step S1209 is the process of determining the summary of the search result according to the fourth interception position.
The fourth interception position may be any one of the following: the end position of the text obtained after supplementing the last sentence of the text located before the first interception position in the target paragraph; the end position of the text located before the first interception position in the target paragraph; or the end position of the text obtained after deleting the last sentence of the text located before the first interception position in the target paragraph.
In this implementation, after the first interception position is determined, the last sentence of the text located before the first interception position in the target paragraph may still have low completeness. In this case, that last sentence may be evaluated further, the fourth interception position is determined according to the resulting third evaluation result, and the summary of the search result is determined from the target paragraph according to the fourth interception position. In other words, each time the first interception position is determined, this implementation evaluates again the last sentence of the text located before the first interception position in the target paragraph, so that the last sentence of the text before the determined interception position meets the requirement for displaying a complete summary. This makes the determined summary of the search result more complete and better displayed.
In one implementation, the server may determine the target paragraph of a search result as follows. FIG. 15 is a schematic flowchart of determining a target paragraph of a search result according to an embodiment of the present application. As shown in FIG. 15, the target paragraph of the search result may be determined according to the following steps S1501 to S1506.
S1501: The server receives a search query sent by the client.
S1502: The server finds multiple web pages that match the search query and scores the web pages in descending order of matching degree.
The higher the matching degree, the higher the score.
S1503: The server determines X web pages from the multiple web pages in descending order of score.
X is a positive integer. For example, X may be any value from 50 to 300, or another specific value. The larger X is, the more web pages the server sends to the client, and the more web pages the client can display.
S1504: The server obtains the text information corresponding to the determined web pages and segments the text information to obtain at least one paragraph for each of the X web pages.
S1505: The server scores each paragraph.
The higher the score, the better the paragraph matches the search query.
S1506: The server selects Y paragraphs in descending order of score as the target paragraphs of the search result.
Normally, it is sufficient to select one segment of a web page as the paragraph in which the summary is located; in this case, Y is 1. In other cases, if one segment of the web page contains too little text to provide enough information for a summary, Y may be a positive integer greater than 1.
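Steps S1502 to S1506 can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the application does not define the scoring function, so a simple term-overlap score is used as a stand-in; pages are represented as plain strings; segmentation is done on line breaks; and the function names are hypothetical.

```python
def overlap_score(query: str, text: str) -> float:
    """Stand-in matching score: fraction of query terms that occur in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def target_paragraphs(query: str, pages: list[str],
                      x: int = 50, y: int = 1) -> list[list[str]]:
    """Return, for each of the top-X pages, its top-Y paragraphs."""
    # S1502/S1503: score the pages and keep the X best.
    ranked = sorted(pages, key=lambda p: overlap_score(query, p), reverse=True)[:x]
    result = []
    for page in ranked:
        # S1504: segment the page text into paragraphs.
        paragraphs = [s.strip() for s in page.split("\n") if s.strip()]
        # S1505/S1506: score the paragraphs and keep the Y best.
        paragraphs.sort(key=lambda s: overlap_score(query, s), reverse=True)
        result.append(paragraphs[:y])
    return result
```

For example, `target_paragraphs("search summary", pages, x=1, y=1)` keeps the best-matching page and, within it, the single paragraph that shares the most terms with the query.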
In one implementation, step S1506 may be implemented according to the following steps a to d:
Step a: Set i = 1.
Step b: The server determines whether the length of the text corresponding to the paragraphs ranked 1st to i-th is not less than the preset length.
If the result of step b is yes, step c is performed; if the result of step b is no, step d is performed.
For how the preset length in step b is set, refer to how the preset length is set in the foregoing embodiments; details are not repeated here.
Step c: Determine the paragraphs ranked 1st to i-th as the Y paragraphs.
Step d: Set i = i + 1 and return to step b.
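Steps a to d amount to taking the shortest prefix of the ranked paragraphs whose combined length reaches the preset length. A minimal sketch follows; the function name is hypothetical, and stopping when the paragraphs run out is an assumption, since the steps above do not say what happens in that case.

```python
def select_until_long_enough(ranked_paragraphs: list[str],
                             preset_len: int) -> list[str]:
    """Return the top-i paragraphs, with i as small as steps a to d allow."""
    i = 1  # step a
    while i < len(ranked_paragraphs):
        # Step b: combined length of the paragraphs ranked 1st to i-th.
        total = sum(len(p) for p in ranked_paragraphs[:i])
        if total >= preset_len:
            break  # step c: the first i paragraphs are the Y paragraphs
        i += 1  # step d
    return ranked_paragraphs[:i]
```

The input list is expected to be already sorted in descending order of score, as produced by step S1505.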
Examples of the method for determining a summary of a search result provided by the present application have been described in detail above. It can be understood that, to implement the above functions, the corresponding apparatus includes hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the modules and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
The present application may divide the apparatus for determining a summary of a search result into functional modules according to the above method examples. For example, each function may be assigned to a separate functional module, or two or more functions may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the present application is illustrative and is merely a division by logical function; other divisions are possible in actual implementation.
FIG. 16 is a schematic structural diagram of an apparatus for determining a summary of a search result provided by the present application. The apparatus includes an evaluation module 1610 and a determination module 1620.
The evaluation module is configured to determine preset-length characters starting from the start position of a target paragraph, and to evaluate the last sentence of text of the preset-length characters to obtain a first evaluation result, where the target paragraph is the paragraph in which the summary of the search result is located;
The determination module is configured to determine a first interception position of the target paragraph according to the first evaluation result, where the first interception position is any one of the following: the end position of the text obtained after supplementing the last sentence of text, the end position of the last sentence of text, or the end position of the text obtained after deleting the last sentence of text; and to determine the summary of the search result from the target paragraph according to the first interception position, where the character length of the summary of the search result is within a preset length range.
In one implementation, the first evaluation result is that the end of the last sentence of text is incomplete topic information;
the first interception position is the end position of the text obtained after completing the topic information at the end of the last sentence of text.
In one implementation, the first evaluation result is that the semantics of the last sentence of text are incomplete;
the first interception position is the end position of the text obtained after supplementing the last sentence of text into a complete sentence.
In one implementation, the first evaluation result is that the character length of the last sentence of text is less than the first preset length, or that the proportion of characters other than characters and words in the last sentence of text is greater than the preset proportion;
the first interception position is the end position of the text obtained after deleting the last sentence of text.
In one implementation, the first interception position is the end position of the text obtained after deleting the last sentence of text;
the determination module is specifically configured to:
determine that the character length of the text located before the first interception position in the target paragraph is less than the lower limit of the preset length range;
determine a second interception position, where the second interception position is the end position of the text obtained after supplementing the last sentence of text;
determine the summary of the search result from the target paragraph according to the second interception position.
In one implementation, the first interception position is the end position of the text obtained after supplementing the last sentence of text;
the determination module is specifically configured to:
determine that the character length of the text located before the first interception position in the target paragraph is greater than the upper limit of the preset length range;
determine a third interception position, where the third interception position is the end position of the text obtained after deleting the last sentence of text;
determine the summary of the search result from the target paragraph according to the third interception position.
In one implementation, the determination module is specifically configured to:
determine that the first evaluation result is any one of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text are complete, the character length of the last sentence of text is not less than the first preset length, or the proportion of characters other than characters and words in the last sentence of text is not greater than the preset proportion;
evaluate the last sentence of text to obtain a second evaluation result, where the evaluation content corresponding to the second evaluation result differs from that corresponding to the first evaluation result;
determine the first interception position of the target paragraph according to the second evaluation result.
In one implementation, the determination module is specifically configured to:
evaluate the last sentence of the text located before the first interception position in the target paragraph to obtain a third evaluation result;
determine a fourth interception position of the target paragraph according to the third evaluation result;
determine the summary of the search result from the target paragraph according to the fourth interception position.
An embodiment of the present application further provides an electronic device. The electronic device may be the server 320 shown in FIG. 3 and is configured to perform the above method for determining a summary of a search result. FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in FIG. 17, the electronic device includes a processor 1701, an interface 1702, a memory 1703 and a communication bus 1704, where the processor 1701, the interface 1702 and the memory 1703 communicate with one another through the communication bus 1704;
the memory 1703 is configured to store a computer program;
the processor 1701 is configured to implement the method for determining a summary of a search result according to any one of the above embodiments when executing the program stored in the memory 1703.
The communication bus of the electronic device may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The interface is used for communication between the electronic device and other devices.
The memory may include a random access memory (RAM), and may also include a non-volatile memory (NVM), for example, at least one disk memory.
Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the method for determining a summary of a search result according to any one of the foregoing embodiments.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the foregoing related steps, so as to implement the method for determining a summary of a search result according to any one of the foregoing embodiments.
The electronic device, computer-readable storage medium, computer program product, or chip provided in this embodiment is used to perform the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects of the corresponding method provided above; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
Units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  1. A method for determining a summary of a search result, comprising:
    determining a preset length of characters starting from a start position of a target paragraph, wherein the target paragraph is the paragraph in which the summary of the search result is located;
    evaluating the last sentence of text in the preset length of characters to obtain a first evaluation result;
    determining a first interception position of the target paragraph according to the first evaluation result, wherein the first interception position is any one of the following: an end position of text obtained by supplementing the last sentence of text, an end position of the last sentence of text, or an end position of text obtained by deleting the last sentence of text; and
    determining the summary of the search result from the target paragraph according to the first interception position, wherein a character length corresponding to the summary of the search result is within a preset length range.
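As a rough illustration of the steps of claim 1 (not part of the claims), the following Python sketch shows one way the first interception position might be computed. The names `SENTENCE_END`, `last_sentence_span`, and `first_interception_position`, the chosen set of sentence terminators, and the `evaluate` callback that returns "supplement", "keep", or "delete" are all assumptions made for this example; the application does not prescribe them. The check that the resulting summary falls within the preset length range is left out of this sketch.

```python
import re

# Assumed sentence terminators; the claims do not say how sentences are delimited.
SENTENCE_END = re.compile(r"[。！？.!?]")

def last_sentence_span(text):
    """Return (start, end) of the last sentence in text, where end == len(text)."""
    ends = [m.end() for m in SENTENCE_END.finditer(text)]
    # A terminator that closes the text itself belongs to the last sentence.
    ends = [e for e in ends if e < len(text)]
    start = ends[-1] if ends else 0
    return start, len(text)

def first_interception_position(paragraph, preset_length, evaluate):
    """Pick an interception position from the evaluation of the last sentence.

    evaluate is a hypothetical callback that returns "supplement", "keep",
    or "delete" for the last sentence of the first preset_length characters.
    """
    window = paragraph[:preset_length]
    start, end = last_sentence_span(window)
    result = evaluate(window[start:end])
    if result == "supplement":
        # Extend to the end of the sentence that was cut off by the window.
        m = SENTENCE_END.search(paragraph, end)
        return m.end() if m else len(paragraph)
    if result == "delete":
        # Drop the last sentence and stop at the end of the previous one.
        return start
    return end  # keep the window as it is

paragraph = "Alpha beta. Gamma delta epsilon. Zeta."
pos = first_interception_position(paragraph, 20, lambda s: "supplement")
print(paragraph[:pos])
```

Here the 20-character window ends in the middle of the second sentence, so a "supplement" result extends the summary to the end of that sentence, while a "delete" result would stop after the first sentence.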
  2. The method according to claim 1, wherein the first evaluation result is: the end of the last sentence of text is incomplete topic information; and
    the first interception position is: an end position of text obtained by completing the topic information at the end of the last sentence of text.
  3. The method according to claim 1, wherein the first evaluation result is: the semantics of the last sentence of text are incomplete; and
    the first interception position is: an end position of text obtained by supplementing the last sentence of text into a complete sentence.
  4. The method according to claim 1, wherein the first evaluation result is: a character length corresponding to the last sentence of text is less than a first preset length, or the proportion of characters other than characters and words in the last sentence of text is greater than a preset proportion; and
    the first interception position is: an end position of text obtained by deleting the last sentence of text.
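The two conditions of claim 4 can be sketched as a simple predicate. This is only an illustration under stated assumptions: the function name `should_delete_last_sentence`, the default thresholds, and the decision to treat alphanumeric characters (via `str.isalnum`, which also covers CJK characters) as "characters and words" and to ignore whitespace are choices made for this example, not details given in the application.

```python
def should_delete_last_sentence(sentence, first_preset_length=5, preset_ratio=0.5):
    """Return True if the last sentence is too short or mostly non-word characters.

    first_preset_length and preset_ratio are hypothetical thresholds.
    """
    stripped = sentence.strip()
    if not stripped:
        return True  # nothing meaningful to keep
    if len(stripped) < first_preset_length:
        return True  # shorter than the first preset length
    # Assumption: letters, digits, and CJK characters count as words;
    # whitespace is ignored; everything else counts as a non-word character.
    non_word = sum(1 for ch in stripped if not (ch.isalnum() or ch.isspace()))
    return non_word / len(stripped) > preset_ratio

print(should_delete_last_sentence("Ok."))
print(should_delete_last_sentence("This is a normal sentence."))
```

With these assumed thresholds, a three-character fragment is dropped, while an ordinary sentence with a single punctuation mark is kept.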
  5. The method according to claim 1 or 4, wherein the first interception position is: an end position of text obtained by deleting the last sentence of text; and
    the determining the summary of the search result from the target paragraph according to the first interception position comprises:
    determining that a character length corresponding to the text located before the first interception position in the target paragraph is less than a lower limit of the preset length range;
    determining a second interception position, wherein the second interception position is: an end position of text obtained by supplementing the last sentence of text; and
    determining the summary of the search result from the target paragraph according to the second interception position.
  6. The method according to any one of claims 1 to 3, wherein the first interception position is: an end position of text obtained by supplementing the last sentence of text; and
    the determining the summary of the search result from the target paragraph according to the first interception position comprises:
    determining that a character length corresponding to the text located before the first interception position in the target paragraph is greater than an upper limit of the preset length range;
    determining a third interception position, wherein the third interception position is: an end position of text obtained by deleting the last sentence of text; and
    determining the summary of the search result from the target paragraph according to the third interception position.
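The fallback logic of claims 5 and 6 can be sketched as follows. This is a minimal example that assumes both candidate positions have already been computed as character offsets from the start of the target paragraph, so that an offset equals the character length of the text before it. The function name `resolve_position` and its parameters are hypothetical.

```python
def resolve_position(delete_pos, supplement_pos, first_choice, min_len, max_len):
    """Switch to the other candidate when the first choice violates the length range.

    delete_pos:     end of the text after deleting the last sentence
    supplement_pos: end of the text after supplementing the last sentence
    first_choice:   "delete" or "supplement", i.e. how the first position was chosen
    min_len/max_len: lower and upper limits of the preset length range
    """
    if first_choice == "delete" and delete_pos < min_len:
        return supplement_pos  # second interception position (claim 5)
    if first_choice == "supplement" and supplement_pos > max_len:
        return delete_pos      # third interception position (claim 6)
    return delete_pos if first_choice == "delete" else supplement_pos

print(resolve_position(11, 32, "delete", 20, 40))
```

In the usage line, deleting the last sentence would leave only 11 characters, below the assumed lower limit of 20, so the sketch falls back to the supplemented position. The sketch does not verify that the fallback position itself lies within the range.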
  7. The method according to any one of claims 1 to 6, wherein the determining a first interception position of the target paragraph according to the first evaluation result comprises:
    determining that the first evaluation result is any one of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text are complete, the character length corresponding to the last sentence of text is not less than the first preset length, or the proportion of characters other than characters and words in the last sentence of text is not greater than the preset proportion;
    evaluating the last sentence of text to obtain a second evaluation result, wherein the evaluation content corresponding to the second evaluation result differs from that corresponding to the first evaluation result; and
    determining the first interception position of the target paragraph according to the second evaluation result.
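Claim 7 describes running a further, different evaluation when the first one finds nothing wrong with the last sentence. One possible reading, sketched below, is a chain of checks that stops at the first check calling for an adjustment. The function name `evaluate_in_sequence`, the verdict strings, and the two toy checks are assumptions made for this example only.

```python
def evaluate_in_sequence(sentence, checks):
    """Apply each check in turn; return the first verdict other than "keep".

    Each check is a hypothetical callable that returns "keep" when its
    criterion is satisfied, or "supplement" / "delete" otherwise.
    """
    for check in checks:
        verdict = check(sentence)
        if verdict != "keep":
            return verdict
    return "keep"

# Toy checks: a length check followed by a terminator check.
length_check = lambda s: "delete" if len(s.strip()) < 5 else "keep"
ending_check = lambda s: "keep" if s.rstrip().endswith((".", "!", "?")) else "supplement"

print(evaluate_in_sequence(" Gamma de", [length_check, ending_check]))
```

In the usage line the fragment passes the length check, so the second check runs and reports that the sentence should be supplemented.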
  8. The method according to any one of claims 1 to 6, wherein the determining the summary of the search result from the target paragraph according to the first interception position comprises:
    evaluating the last sentence of text in the text located before the first interception position in the target paragraph to obtain a third evaluation result;
    determining a fourth interception position of the target paragraph according to the third evaluation result; and
    determining the summary of the search result from the target paragraph according to the fourth interception position.
  9. An apparatus for determining a summary of a search result, comprising:
    an evaluation module, configured to determine a preset length of characters starting from a start position of a target paragraph, and to evaluate the last sentence of text in the preset length of characters to obtain a first evaluation result, wherein the target paragraph is the paragraph in which the summary of the search result is located; and
    a determining module, configured to determine a first interception position of the target paragraph according to the first evaluation result, wherein the first interception position is any one of the following: an end position of text obtained by supplementing the last sentence of text, an end position of the last sentence of text, or an end position of text obtained by deleting the last sentence of text; and to determine the summary of the search result from the target paragraph according to the first interception position, wherein a character length corresponding to the summary of the search result is within a preset length range.
  10. The apparatus according to claim 9, wherein the first evaluation result is: the end of the last sentence of text is incomplete topic information; and
    the first interception position is: an end position of text obtained by completing the topic information at the end of the last sentence of text.
  11. The apparatus according to claim 9, wherein the first evaluation result is: the semantics of the last sentence of text are incomplete; and
    the first interception position is: an end position of text obtained by supplementing the last sentence of text into a complete sentence.
  12. The apparatus according to claim 9, wherein the first evaluation result is: a character length corresponding to the last sentence of text is less than a first preset length, or the proportion of characters other than characters and words in the last sentence of text is greater than a preset proportion; and
    the first interception position is: an end position of text obtained by deleting the last sentence of text.
  13. The apparatus according to claim 9 or 12, wherein the first interception position is: an end position of text obtained by deleting the last sentence of text; and
    the determining module is specifically configured to:
    determine that a character length corresponding to the text located before the first interception position in the target paragraph is less than a lower limit of the preset length range;
    determine a second interception position, wherein the second interception position is: an end position of text obtained by supplementing the last sentence of text; and
    determine the summary of the search result from the target paragraph according to the second interception position.
  14. The apparatus according to any one of claims 9 to 11, wherein the first interception position is: an end position of text obtained by supplementing the last sentence of text; and
    the determining module is specifically configured to:
    determine that a character length corresponding to the text located before the first interception position in the target paragraph is greater than an upper limit of the preset length range;
    determine a third interception position, wherein the third interception position is: an end position of text obtained by deleting the last sentence of text; and
    determine the summary of the search result from the target paragraph according to the third interception position.
  15. The apparatus according to any one of claims 9 to 14, wherein the determining module is specifically configured to:
    determine that the first evaluation result is any one of the following: the end of the last sentence of text is complete topic information, the semantics of the last sentence of text are complete, the character length corresponding to the last sentence of text is not less than the first preset length, or the proportion of characters other than characters and words in the last sentence of text is not greater than the preset proportion;
    evaluate the last sentence of text to obtain a second evaluation result, wherein the evaluation content corresponding to the second evaluation result differs from that corresponding to the first evaluation result; and
    determine the first interception position of the target paragraph according to the second evaluation result.
  16. The apparatus according to any one of claims 9 to 14, wherein the determining module is specifically configured to:
    evaluate the last sentence of text in the text located before the first interception position in the target paragraph to obtain a third evaluation result;
    determine a fourth interception position of the target paragraph according to the third evaluation result; and
    determine the summary of the search result from the target paragraph according to the fourth interception position.
  17. An electronic device, comprising: a processor, a memory, and an interface;
    wherein the processor, the memory, and the interface cooperate with one another, and the processor is configured to perform the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the method according to any one of claims 1 to 8.
PCT/CN2021/138921 2021-01-19 2021-12-16 Method and apparatus for determining summary of search result, and electronic device WO2022156446A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110072051.5A CN114817520A (en) 2021-01-19 2021-01-19 Method and device for determining abstract of search result and electronic equipment
CN202110072051.5 2021-01-19

Publications (1)

Publication Number Publication Date
WO2022156446A1 true WO2022156446A1 (en) 2022-07-28

Family

ID=82523866

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138921 WO2022156446A1 (en) 2021-01-19 2021-12-16 Method and apparatus for determining summary of search result, and electronic device

Country Status (2)

Country Link
CN (1) CN114817520A (en)
WO (1) WO2022156446A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458718A (en) * 2009-01-05 2009-06-17 北京大学 Search engine dynamic summarization extracting method
CN104050158A (en) * 2014-06-27 2014-09-17 吴涛军 Automatic quotation extraction method and device with semantic integrity kept
CN105068992A (en) * 2015-07-29 2015-11-18 魅族科技(中国)有限公司 Searching result display method and searching result display device
CN109597982A (en) * 2017-09-30 2019-04-09 北京国双科技有限公司 Summary texts recognition methods and device
CN110489543A (en) * 2019-08-14 2019-11-22 北京金堤科技有限公司 A kind of extracting method and device of news in brief

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANONYMITY: "How to Customize Your Excerpt Length", BLOG SINA, XP009538467, Retrieved from the Internet <URL:http://blog.sina.com.cn/s/blog_7d0afc020100qc7t.html> *
JINGYUN LIU; JACKIE C.K.CHEUNG; ANNIE LOUIS: "What comes next? Extractive summarization by next-sentence prediction", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 12 January 2019 (2019-01-12), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081004750 *

Also Published As

Publication number Publication date
CN114817520A (en) 2022-07-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920820

Country of ref document: EP

Kind code of ref document: A1