WO2021037012A1

WO2021037012A1 - Text information navigation and browsing method, apparatus, server and storage medium

Info

Publication number: WO2021037012A1
Application number: PCT/CN2020/110994
Authority: WO
Inventors: 夏宇彬; 袁明; 孙敏; 蔡洁; 张�成
Original assignee: 智慧芽信息科技(苏州)有限公司
Priority date: 2019-08-30
Filing date: 2020-08-25
Publication date: 2021-03-04
Also published as: CN112445891A

Abstract

A text information navigation and browsing method, an apparatus, a server and a storage medium, the text information navigation and browsing method comprising: acquiring first text, the first text comprising first information (S110); acquiring second text, the second text comprising second information (S120); matching the first information and the second information, so as to determine a degree of similarity of the second information to the first information (S130); according to the degree of similarity, performing navigation and browsing of the second text (S140).

Description

Method, device, server and storage medium for navigation and browsing of text information

This application claims priority to Chinese patent application No. 201910816838.0, filed with the Chinese Patent Office on August 30, 2019, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of information processing technology, for example, to a method, device, server, and storage medium for navigating and browsing text information.

Background art

With the rapid development of information processing, how to compare information simply and efficiently becomes more and more important.

To evaluate how innovative a patent application document is, for example, an examiner has to search patent websites for the technical features recorded in the patent application document in order to find reference documents. After finding a reference document that is similar to the patent application document, the examiner has to locate the more detailed content in that reference document for comparison, for example, which content in the retrieved reference document is similar or identical to which claim features of the patent application document.

Comparing patent application documents with reference documents in this manual way is very inefficient.

Summary of the invention

The present disclosure provides a method, device, server and storage medium for navigating and browsing text information, so as to realize automatic searching for similar or identical content in at least two documents to improve comparison efficiency.

A method for navigating and browsing text information is provided, including:

Acquiring a first text, where the first text includes first information;

Acquiring a second text, where the second text includes second information;

Matching the first information and the second information to determine the similarity between the second information and the first information;

Navigating and browsing the second text according to the similarity.

A navigation and browsing device for text information is also provided, including:

A first obtaining module, configured to obtain a first text, wherein the first text includes first information;

A second obtaining module, configured to obtain a second text, wherein the second text includes second information;

A matching module, configured to match the first information and the second information to determine the similarity between the second information and the first information;

A navigation and browsing module, configured to navigate and browse the second text according to the similarity.

A server is also provided, including:

One or more processors;

A storage device, configured to store one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the aforementioned method for navigating and browsing text information.

A computer-readable storage medium is also provided, on which a computer program is stored, and when the program is executed by a processor, the above-mentioned method for navigating and browsing text information is realized.

Description of the drawings

FIG. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention;

FIG. 4 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention;

FIG. 5 is a schematic structural diagram of a text information navigation and browsing device provided in the third embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention.

Detailed description

The present disclosure will be described below with reference to the drawings and embodiments.

It should be mentioned before discussing the exemplary embodiments that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing can correspond to methods, functions, procedures, subroutines, subprograms, and so on.

In addition, the terms "first", "second", etc. may be used herein to describe various directions, actions, steps or elements, etc., but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish a first direction, action, step or element from another direction, action, step or element. For example, without departing from the scope of the present application, the first information may be referred to as second information, and similarly, the second information may be referred to as first information. Both the first information and the second information are information, but they are not the same information. The terms "first", "second", etc. cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with "first" and "second" may explicitly or implicitly include one or more of these features. In the description of the present disclosure, "multiple" and "batch" mean at least two, such as two, three, etc., unless specifically defined otherwise.

Embodiment 1

Fig. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention, which can be applied to a scenario where texts are compared. The method can be executed by a text information navigation and browsing device, which can be implemented by software and/or hardware and can be integrated on a server.

As shown in FIG. 1, the method for navigating and browsing text information provided in the first embodiment of the present invention includes:

S110. Acquire a first text, where the first text includes one or more pieces of first information.

The first text refers to the text that needs to be analyzed and compared. In this embodiment, the first text can be a technical document, such as a dissertation, a patent document, a technical submission, or a technical solution for risk analysis, or part of the content of a patent document or a technical submission, such as the text of the technical solution described in the claims or in a technical disclosure document, which is not limited here. In one embodiment, the first text is a claim. The first information refers to part or all of the information in the first text, which is not limited here. In an embodiment, the first information is information describing the technical solution in the first text. Taking a claim as the first text as an example, the first information can be one or more features in the claim, a sentence in the claim, or the entire claim, which is not limited here.

Optionally, the first information includes but is not limited to one or more of words, sentences or paragraphs. The first information can be selected in the first text by the user as needed, or selected by the system by default, which is not limited here. Optionally, there are one or more pieces of first information. Taking a claim as the first information as an example, when there are multiple pieces of first information, multiple claims in the first text can be matched at the same time to find similar second information in the second text, which greatly improves the efficiency of document comparison.

S120. Acquire a second text, where the second text includes one or more pieces of second information.

The second text is a text that needs to be compared with the first text to determine whether it is similar to the technical solution recorded in the first text. In this embodiment, the second text can be technical documents, books, patent documents, etc., or part of the content of technical documents, books, and patent documents, which is not limited here. In one embodiment, the second text is the target comparison document. The second information refers to part or all of the information in the second text. There are one or more second information. In one embodiment, the second information is related information describing the technical solution in the second text. Taking the second text as a similar patent document as an example, the second information can be the entire specification, a paragraph of the entire specification, or a sentence or word in the specification, which is not limited here.

Optionally, the second information includes one or more of words, sentences, or paragraphs.

The second text can be obtained by manually importing the existing text into the navigation and browsing device of the text information. For example, if you find a text that you think is similar to the first text, you can download the text and import it into the navigation and browsing device of the text information to compare with the first information in the first text to determine the similar part and the corresponding position.

S130. Match the first information and the second information to determine the similarity between the second information and the first information.

The similarity refers to the degree of similarity between the first information and the second information. Matching refers to comparing the first information with the second information to determine the similarity. The similarity degree can be expressed in the form of percentage or color. For example, green represents a low degree of similarity, and red represents a high degree of similarity. There is no restriction on the form of similarity here. By matching the similarity between the first information and the second information, a position similar to the first information in the second text is determined.
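By way of non-limiting illustration, the two presentation forms mentioned above (percentage and color) can be sketched as follows. The function names, the thresholds 0.3 and 0.7, and the intermediate "yellow" band are assumptions introduced for this example; the disclosure only states that green represents low similarity and red represents high similarity.

```python
def similarity_to_percent(score: float) -> str:
    """Format a similarity score in [0, 1] as a whole-number percentage."""
    clamped = min(max(score, 0.0), 1.0)  # guard against out-of-range scores
    return f"{round(clamped * 100)}%"


def similarity_to_color(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a similarity score to a color: green = low, red = high.

    The thresholds and the middle band are hypothetical choices.
    """
    if score >= high:
        return "red"
    if score >= low:
        return "yellow"  # assumed middle band, not part of the source
    return "green"
```

Either representation can be attached to each piece of second information before it is shown to the user.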

S140: Navigate and browse the second text according to the similarity.

Navigation browsing refers to locating, by similarity matching, the second information in the second text that is similar to the first information, so that it can be browsed quickly without manual searching. In one implementation, navigation marks of different colors can be set at the side of the second text, each corresponding to the row position of second information that is similar to the first information. Through the navigation marks, the user can quickly switch to the second information that has a higher similarity to the first information and browse it. In an alternative embodiment, a quick browsing window can also be set to summarize the second information similar to the first information and sort it according to the similarity, serving as a browsing index of the second text; the user can click the corresponding summary to quickly browse that second information in the second text. In this embodiment, by matching the similarity between the first information and the second information, the second information in the second text similar to the first information can be quickly obtained, which greatly improves the efficiency of comparison.
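As a minimal sketch of the two navigation forms described above, the following example builds side navigation marks and a browsing index from a list of matches. The `Match` structure, the 0.5 display threshold, the 0.8 color threshold and the 40-character summary length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    line: int          # row position of the second information in the second text
    text: str          # the second information itself
    similarity: float  # score in [0, 1] produced by the matching step


def navigation_marks(matches, threshold=0.5):
    """Return (line, color) side marks, in document order, for matches
    whose similarity reaches the (hypothetical) threshold."""
    marks = []
    for m in sorted(matches, key=lambda m: m.line):
        if m.similarity >= threshold:
            color = "red" if m.similarity >= 0.8 else "green"
            marks.append((m.line, color))
    return marks


def browsing_index(matches, summary_len=40):
    """Return (summary, line) entries sorted by descending similarity."""
    ranked = sorted(matches, key=lambda m: m.similarity, reverse=True)
    return [(m.text[:summary_len], m.line) for m in ranked]
```

Clicking an entry of the browsing index would then scroll the second text to the stored line.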

Optionally, step S140, navigating and browsing the second text according to the similarity may include:

The second information and the first information are displayed on a navigation browsing interface according to the similarity.

The navigation browsing interface refers to an interface that displays the similarity matching results, and is used to find similar locations and content. The similarity matching result is the similarity between the first information and each of one or more pieces of second information; it reflects the similarity between the second information in the second text and the first information in the first text. The similarity matching result can display, in full text, one or more second texts similar to the first information describing the technical solution; it can also display only the parts of the one or more second texts that are similar to the first information, which is not limited here.

Optionally, before step S130, it may include:

Receiving chapter selection information of the second text; extracting a corresponding chapter based on the chapter selection information as the second information.

Chapter refers to part of the content in the second text. Taking the second text as a patent document as an example, the chapters can be chapters such as claims, descriptions, etc., and can also be background technology, descriptions of drawings, and specific implementations. There is no restriction on the division of chapters here. By selecting the specified chapter before the comparison, the similarity of some content can be matched in a targeted manner, and the first information and the second information can be accurately matched.
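The chapter selection step can be sketched as follows, assuming that each chapter starts with a heading on its own line. The function names and the heading-per-line convention are assumptions for this example; the disclosure does not limit how chapters are divided.

```python
def split_chapters(text: str, headings: list[str]) -> dict[str, str]:
    """Split a document into chapters keyed by their heading line."""
    chapters: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            current = stripped          # a new chapter begins here
            chapters[current] = []
        elif current is not None:
            chapters[current].append(line)
    return {name: "\n".join(body).strip() for name, body in chapters.items()}


def select_second_information(text: str, headings: list[str], selected: list[str]) -> list[str]:
    """Return the bodies of the chapters named in the chapter selection information."""
    chapters = split_chapters(text, headings)
    return [chapters[name] for name in selected if name in chapters]
```

Only the selected chapter bodies are then passed on to the matching step as second information.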

Optionally, after step S140, it may include:

Sorting the second information according to the similarity. By sorting the similarity matching results in descending order, the content with high similarity can be viewed first, which saves the time of manually comparing similarity values.

Optionally, the navigation browsing interface also includes a similarity mark. According to the similarity matching result of the first information and the second information, the similar parts of the first information and the second information can be marked prominently, which helps the user locate the similar content as quickly as possible. The marking can be done by highlighting, which is not limited here.

Optionally, a switch control is included in the search result, and the switch control is used to switch the display among a plurality of pieces of second information. The switch control can switch to the previous or next item, and can also switch to the more similar or the next most similar part. How the display is switched is not limited here.
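One possible reading of the switch control is a cursor over the matches ranked by similarity, as in the following sketch. The class name and its methods are hypothetical and are introduced only for illustration.

```python
class SwitchControl:
    """Cursor over (second_information, similarity) pairs,
    ranked from most similar to least similar."""

    def __init__(self, matches):
        if not matches:
            raise ValueError("at least one match is required")
        self._ranked = sorted(matches, key=lambda pair: pair[1], reverse=True)
        self._pos = 0

    def current(self):
        return self._ranked[self._pos]

    def next(self):
        """Switch to the next most similar item, stopping at the last one."""
        self._pos = min(self._pos + 1, len(self._ranked) - 1)
        return self.current()

    def previous(self):
        """Switch back to the more similar item, stopping at the first one."""
        self._pos = max(self._pos - 1, 0)
        return self.current()
```

Because the items are ranked once at construction, "next" always moves toward lower similarity and "previous" toward higher similarity.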

Optionally, obtaining the second text includes: receiving search information based on the first text; and searching the second text similar to the first text in a database based on the search information. The search information can be the text or graphic part of the first text about the first technical feature, or it can be automatically generated based on the first information, which is not limited here.

According to the technical solution of the embodiment of the present invention, a first text including one or more pieces of first information is obtained; a second text including one or more pieces of second information is obtained; the first information and the second information are matched to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. Through automatic similarity matching, information that is similar or identical to the first information can be found automatically in the second text, so that it can be quickly confirmed which parts of the second text are similar to the first information in the first text, without manually looking for content related to the first information in the second text. The details of the matching results can then be confirmed in a targeted manner, which improves the efficiency of document retrieval. This solves the problem that manually searching the comparison documents for similar or identical content makes document comparison very inefficient, and achieves the effect of automatically finding similar or identical content to improve the efficiency of document comparison.

Embodiment 2

Fig. 2 is a schematic flowchart of a method for navigating and browsing text information according to the second embodiment of the present invention. This embodiment is described on the basis of the above technical solution, and is suitable for the scenario of comparing texts. The method can be executed by a text information navigation and browsing device, which can be implemented in software and/or hardware, and can be integrated on a server.

As shown in FIG. 2, the method for navigating and browsing text information provided by the second embodiment of the present invention includes:

S210. Acquire a first text, where the first text includes one or more pieces of first information.

The first text refers to the text that needs to be analyzed and compared. In this embodiment, the first text can be a technical document, such as a dissertation, a patent document, or a technical submission, or part of the content of a patent document or a technical submission, such as the text of the technical solution described in the claims or in a technical submission, which is not limited here. In one embodiment, the first text is a claim. The first information refers to part or all of the information in the first text, which is not limited here. In an embodiment, the first information is information describing the technical solution in the first text. Taking a claim as the first text as an example, the first information can be one or more features in the claim, a sentence in the claim, or the entire claim, which is not limited here.

Optionally, the first information includes but is not limited to one or more of words, sentences or paragraphs.

S220. Acquire a second text, where the second text includes one or more pieces of second information.

The second text is a text that needs to be compared with the first text to determine whether it is similar to the technical solution recorded in the first text. In this embodiment, the second text can be technical documents, books, patent documents, etc., or part of the content of technical documents, books, and patent documents, which is not limited here. In one embodiment, the second text is the target comparison document. The second information refers to part or all of the information in the second text. There are one or more second information. In an embodiment, the second information is related information describing the technical solution in the second text. Taking the second text as a similar patent document as an example, the second information can be the entire specification, a paragraph of the entire specification, or a sentence or word in the specification, which is not limited here.

S230. Extract a first key feature from the first information.

The first key feature refers to the feature related to the first technical feature in the first information. The first information may be one or more of words, sentences or paragraphs, and the first key feature may also be one or more of words, sentences or paragraphs. If the first information is a word, the first key feature is a word; if the first information is a sentence, the first key feature can be a sentence and/or a word; if the first information is a paragraph, the first key feature can be a paragraph, a sentence and/or a word. In one embodiment, the first key feature is a keyword.

The first key feature can be extracted through the key feature extraction model. In one embodiment, the key feature extraction model is a text-rank model. The text-rank model is a graph-based ranking model for text. By dividing the text into multiple constituent units (words, sentences) and building a graph model, the voting mechanism is used to rank important components in the text. The information of a single document itself can be used to extract keywords and abstracts.
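The graph-based ranking idea can be illustrated with the following simplified keyword sketch, written with the standard library only. It links words that co-occur within a small window and ranks them by an iterated voting score. The stop-word list, window size, damping factor and iteration count are assumptions, sentence boundaries are ignored for brevity, and the sketch is not the key feature extraction model of the disclosure.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list used only for this example.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "is", "in", "it"}


def textrank_keywords(text, window=3, damping=0.85, iterations=30, top_k=5):
    """Rank words by a TextRank-style score over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

    # Undirected graph: two different words are linked when they occur
    # within `window` consecutive positions of each other.
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != w:
                neighbours[w].add(other)
                neighbours[other].add(w)

    # Voting: each word shares its score equally among its neighbours.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For a text such as "The drone includes a main control module. The main control module controls the parachute module. The detection module monitors the drone.", the word "module" has the most co-occurrence links and is ranked first.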

Exemplarily, suppose the first information is "a UAV emergency parachute opening system, which is used to open the parachute when the UAV fails, characterized in that the UAV emergency parachute opening system includes a main control module, a detection module, a power management module, and a parachute opening module". The first key features can be the UAV, the parachute opening system, the main control module, the detection module, the power management module, the parachute opening module, etc., or can be the sentence "the UAV emergency parachute opening system includes a main control module, a detection module, a power management module, and a parachute opening module", which is not limited here.

S240. Extract a second key feature from the second information.

The second key feature refers to the feature related to the second technical feature in the second information. The second information may be one or more of words, sentences or paragraphs, and the second key feature may also be one or more of words, sentences or paragraphs. If the second information is a word, the second key feature is a word; if the second information is a sentence, the second key feature can be a sentence and/or a word; if the second information is a paragraph, the second key feature can be a paragraph, a sentence and/or a word. In one embodiment, the second key feature is a keyword.

When the first key feature is a word, the second key feature may be a word, sentence, or paragraph.

The second key feature can be extracted through the key feature extraction model. In one embodiment, the key feature extraction model is a text-rank model. The text-rank model is a graph-based ranking model for text. By dividing the text into multiple constituent units (words, sentences) and building a graph model, the voting mechanism is used to rank important components in the text. The information of a single document itself can be used to extract keywords and abstracts.

S250. Match the first key feature and the second key feature to determine the similarity between the second information and the first information.

When the first key feature is a word, the second key feature can be a word, sentence, and/or paragraph. That is, when the first key feature is a word, the word of the first key feature can be compared with the word, sentence and/or paragraph of the second key feature, which is not limited here. Exemplarily, if the first key feature is "unmanned aerial vehicle" and the second key feature is also "unmanned aerial vehicle", the first key feature and the second key feature can be matched to determine the similarity between the first information and the second information.

The similarity can be expressed in the form of percentage or color. For example, green represents low similarity, and red represents high similarity. There is no restriction on the form of similarity here. By matching the similarity between the first key feature and the second key feature to determine the similarity between the first information and the second information, the similarity between the first technical feature and the second technical feature can be determined.

Optionally, in this embodiment, the similarity may be determined by a cosine similarity model and/or a word vector similarity summation model. When the first key feature and the second key feature are both words, the similarity can be determined through the word vector similarity summation model. The word vector similarity summation model refers to a model obtained by training with word vector similarity summation. When the first key feature is a sentence or a paragraph and the second key feature is also a sentence or a paragraph, the similarity can be determined by the cosine similarity model. The cosine similarity model refers to a model trained using the cosine similarity algorithm. This embodiment does not limit the algorithm for calculating the similarity.
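The two similarity measures can be sketched as plain functions, as follows. Cosine similarity is the standard formula. The disclosure does not define how word vector similarities are summed, so `summed_word_similarity` is only one possible interpretation (the best cosine match for each first word, summed and then averaged), and the `vectors` lookup table is a hypothetical stand-in for trained word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def summed_word_similarity(first_words, second_words, vectors):
    """Assumed interpretation: for each first word, take its best cosine
    match among the second words, sum these values, and average them."""
    total, counted = 0.0, 0
    for w in first_words:
        if w not in vectors:
            continue  # words without a vector are skipped
        candidates = [
            cosine_similarity(vectors[w], vectors[s])
            for s in second_words
            if s in vectors
        ]
        if candidates:
            total += max(candidates)
            counted += 1
    return total / counted if counted else 0.0
```

Averaging keeps the result in the same range as a single cosine value, so it can be shown as a percentage in the same way.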

S260. Navigate and browse the second text according to the similarity.

Navigating browsing refers to locating second information similar to the first information in the second text by matching similarity, so as to facilitate quick browsing without manual searching.

Referring to FIG. 3, in an alternative embodiment, step S250, matching the first key feature and the second key feature to determine the similarity between the second information and the first information can be replaced by:

S251: Perform vectorization on the first key feature based on the trained first comparison model to obtain a first vector result.

The first comparison model refers to a model that vectorizes the first key feature. In this embodiment, vectorization refers to expressing text as a series of vectors that can express the semantics of the text. In an embodiment, the first comparison model includes a word-to-vector (Word2vec) model and/or a recursive autoencoder model based on a recursive neural network. When the first key feature is a word, the first comparison model includes a Word2vec model; when the first key feature is a sentence or a paragraph, the first comparison model includes the recursive autoencoder model. If the first key feature includes both words and sentences or paragraphs, the first comparison model includes both the Word2vec model and the recursive autoencoder model. This embodiment does not limit which model the first comparison model is.

S252: Perform vectorization on the second key feature based on the trained second comparison model to obtain a second vector result.

The second comparison model refers to a model that vectorizes the second key feature. In an embodiment, the second comparison model includes a Word2vec model and/or a recursive autoencoder model based on a recursive neural network. When the second key feature is a word, the second comparison model includes a Word2vec model; when the second key feature is a sentence or a paragraph, the second comparison model includes the recursive autoencoder model. If the second key feature includes both words and sentences or paragraphs, the second comparison model includes both the Word2vec model and the recursive autoencoder model. This embodiment does not limit which model the second comparison model is. In an embodiment, the first comparison model and the second comparison model may use the same model or the same type of model.

S253. Match the first vector result and the second vector result to determine the similarity between the second information and the first information.

In this embodiment, the similarity is determined after the first key feature and the second key feature have been vectorized. The words are therefore not merely compared mechanically; the similarity is determined based on the semantics of the key features, so the similarity matching result is more accurate.
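The difference between mechanical word comparison and vector-based matching can be illustrated with the following sketch of steps S251 to S253. The small embedding table is a hypothetical stand-in for a trained Word2vec model, and averaging word vectors is a deliberately simple substitute for the recursive autoencoder named above, not an implementation of it.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2vec model.
TOY_VECTORS = {
    "drone": [0.9, 0.1, 0.0],
    "uav": [0.85, 0.15, 0.0],
    "parachute": [0.0, 0.2, 0.9],
}


def vectorize(feature):
    """S251/S252: average the word vectors of a key feature.

    Averaging is an assumed simplification for sentences and paragraphs.
    """
    rows = [TOY_VECTORS[w] for w in feature.lower().split() if w in TOY_VECTORS]
    if not rows:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(rows) for col in zip(*rows)]


def match(first_vec, second_vec):
    """S253: cosine similarity between the two vector results."""
    dot = sum(a * b for a, b in zip(first_vec, second_vec))
    norm = math.sqrt(sum(a * a for a in first_vec)) * math.sqrt(
        sum(b * b for b in second_vec)
    )
    return dot / norm if norm else 0.0
```

With these toy vectors, "drone" and "UAV" differ as character strings but obtain a similarity close to 1, whereas "drone" and "parachute" obtain a similarity close to 0.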

Referring to FIG. 4, optionally, step S230, extracting a first key feature from the first information includes:

S231. Process the first information based on a preset rule to obtain a first processing result.

The preset rule refers to a rule for processing the first information, and the first processing result is obtained by processing the first information through the preset rule. Processing the first information based on the preset rule to obtain the first processing result may include: acquiring text information, symbol information, and/or text structure information of the first information; and processing the first information based on the text information, symbol information, and/or text structure information to obtain the first processing result.

The text information includes stop words. For example, stop words include "the", "and", "or", etc., which are not limited here. Processing the first information based on the text information to obtain the first processing result includes: analyzing the first information to identify the stop words in it; and extracting the relevant information before and/or after the stop words. Exemplarily, the first information is a sentence or paragraph, for example "the drone includes a main control module and a flight module"; relevant information such as "the drone, the main control module, and the flight module" is then extracted. The text information can thus be used to extract key features quickly, which facilitates similarity matching. Optionally, the text information may also include other related words, etc., which are not limited here.
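A minimal sketch of the stop-word rule is given below: the sentence is cut at each stop word and the remaining word groups are kept as candidate key features. The stop-word set is an assumption; in particular, "a", "an" and "includes" are added here for the example and are not listed in the disclosure.

```python
import re

# "the", "and", "or" come from the description; the others are assumed.
STOP_WORDS = {"the", "and", "or", "a", "an", "includes"}


def split_on_stop_words(sentence):
    """Return the word groups found before and after each stop word."""
    chunks, current = [], []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in STOP_WORDS:
            if current:                      # close the group before the stop word
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:                              # group after the last stop word
        chunks.append(" ".join(current))
    return chunks
```

Applied to the example sentence, the sketch yields the three groups "drone", "main control module" and "flight module".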

The symbol information includes semicolon and/or comma. Processing the first information based on the symbol information to obtain the first processing result includes: extracting related information before and/or after the semicolon and/or comma. Exemplarily, if the first information is "the drone includes a main control module and a flight module; the flight module includes a power supply unit", then relevant information such as "the main control module, flight module, and the flight module" is extracted. Optionally, the symbol information may also include other identifying symbols, which is not limited here.
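The symbol rule can be sketched as a split at semicolons and commas, as follows. The function name and the inclusion of the full-width forms "；" and "，" in the default symbol set are assumptions for this example.

```python
import re


def split_on_symbols(text, symbols=";,；，"):
    """Split text at the given punctuation marks and drop empty pieces."""
    pattern = "[" + re.escape(symbols) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

Each returned segment can then be processed further, for example by the stop-word rule, to obtain the related information before and after each symbol.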

The text structure information includes a preamble part and a characteristic part, and processing the first information based on the text structure information to obtain the first processing result includes: extracting relevant information of the preamble part and/or the characteristic part. Exemplarily, if the first information is "an unmanned aerial vehicle, which is characterized by including a flight module", then relevant information such as "unmanned aerial vehicle, flight module" is extracted. Optionally, the text structure information may also include other text structure information, which is not limited here.
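The text structure rule can be sketched by splitting a claim at its transitional phrase, as follows. The regular expression covers only a few English phrasings ("characterized by", "characterized in that") and is an assumption of this example, not a rule stated in the disclosure.

```python
import re

# Hypothetical marker between the preamble part and the characteristic part.
_MARKER = re.compile(
    r",?\s*(?:which is\s+)?characteri[sz]ed\s+(?:by|in that)\s*:?\s*",
    re.IGNORECASE,
)


def split_claim(claim):
    """Return (preamble, characteristic part); the second item is empty
    when no transitional phrase is found."""
    parts = _MARKER.split(claim, maxsplit=1)
    preamble = parts[0].strip()
    characteristic = parts[1].strip() if len(parts) > 1 else ""
    return preamble, characteristic
```

For the example claim, the preamble is "an unmanned aerial vehicle" and the characteristic part is "including a flight module", from which "unmanned aerial vehicle" and "flight module" can be taken as relevant information.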

S232. Use the first processing result as the first key feature.

In this embodiment, the first information is processed by preset rules to extract key features, the extraction method is simple and effective, and the efficiency of retrieving files is improved.

According to the technical solution of the embodiment of the present invention, a first text including one or more pieces of first information is obtained; a second text including one or more pieces of second information is obtained; the first information and the second information are matched to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. Through automatic similarity matching, information that is similar or identical to the first information can be found automatically in the second text, so that it can be quickly confirmed which parts of the second text are similar to the first information in the first text, without manually looking for content related to the first information in the second text. The details of the matching results can then be confirmed in a targeted manner, which improves the efficiency of document retrieval.

Example three

FIG. 5 is a schematic structural diagram of a text information navigation and browsing device provided in the third embodiment of the present invention. This embodiment can be applied to a scenario where text is compared. The device can be implemented by software and/or hardware, and can be integrated on the server.

As shown in FIG. 5, the apparatus for navigation and browsing of text information provided in this embodiment may include a first obtaining module 310, a second obtaining module 320, a matching module 330, and a navigation browsing module 340, wherein:

The first obtaining module 310 is configured to obtain a first text, the first text including one or more pieces of first information; the second obtaining module 320 is configured to obtain a second text, the second text including one or more pieces of second information; the matching module 330 is configured to match the first information and the second information to determine the similarity between the second information and the first information; and the navigation browsing module 340 is configured to navigate and browse the second text according to the similarity.

Optionally, the navigation browsing module 340 includes: a display unit configured to display the first information and the second information on a navigation browsing interface according to the similarity.

Optionally, the matching module 330 includes: a first extraction unit configured to extract a first key feature from the first information; a second extraction unit configured to extract a second key feature from the second information; and a similarity matching unit configured to match the first key feature and the second key feature to determine the similarity between the second information and the first information.

Optionally, the device for navigating and browsing text information further includes: a first vectorization module configured to vectorize the first key feature based on the trained first comparison model to obtain a first vector result; and a second vectorization module configured to vectorize the second key feature based on the trained second comparison model to obtain a second vector result. The matching module 330 is configured to match the first vector result and the second vector result to determine the similarity between the second information and the first information.

Optionally, the first extraction unit includes: a first processing subunit configured to process the first information based on a preset rule to obtain a first processing result, and to use the first processing result as the first key feature.

Optionally, the first processing subunit is configured to obtain text information, symbol information, and/or text structure information of the first information, and to process the first information based on the obtained information to obtain the first processing result.

Optionally, the text information includes stop words, and the first processing subunit is configured to analyze the stop words in the first information; and extract relevant information before and/or after the stop words.
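The stop-word processing may be sketched as follows, assuming a small illustrative stop-word list (the disclosure does not enumerate one) and assuming that the "relevant information" is the run of words lying between consecutive stop words. `STOP_WORDS` and `phrases_around_stop_words` are hypothetical names.

```python
import re
from typing import List

# Hypothetical stop-word list, chosen only for this example.
STOP_WORDS = {"the", "a", "an", "and", "includes", "is", "of"}


def phrases_around_stop_words(first_information: str) -> List[str]:
    """Collect the word runs that appear before and after each stop word."""
    phrases: List[str] = []
    current: List[str] = []
    for word in re.findall(r"[A-Za-z0-9]+", first_information.lower()):
        if word in STOP_WORDS:
            # A stop word closes the phrase collected so far, if any.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


result = phrases_around_stop_words(
    "the drone includes a main control module and a flight module"
)
print(result)
```

With this stop-word list, the sentence reduces to the component names "drone", "main control module" and "flight module".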

Optionally, the symbol information includes a semicolon and/or a comma, and the first processing subunit is configured to extract related information before and/or after the semicolon and/or the comma.

Optionally, the text structure information includes a preamble part and a characteristic part, and the first processing subunit is configured to extract relevant information of the preamble part and/or the characteristic part.

Optionally, the second obtaining module 320 includes: a receiving unit configured to receive retrieval information based on the first text; and a retrieval unit configured to retrieve, in a database, the second text similar to the first text based on the retrieval information.

Optionally, the apparatus for navigating and browsing text information further includes: a chapter selection module configured to receive chapter selection information of the second text; and extract a corresponding chapter based on the chapter selection information as the second information.

Optionally, the device for navigating and browsing text information further includes: a sorting module configured to sort the second information according to the similarity.

Optionally, the navigation browsing interface further includes: a switching control, the switching control is set to control the switching display of a plurality of second information.

Optionally, the navigation browsing interface further includes a similar identifier, and the display unit includes a highlight display unit configured to highlight similar parts of the first information and the second information.
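One possible way to locate the similar parts to be highlighted is sketched below, using the Python standard library `difflib.SequenceMatcher` on word lists. This is an assumption of the sketch rather than the disclosed implementation; the `[[...]]` markers stand in for whatever visual highlighting the interface applies, and `highlight_common` and `min_words` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List


def highlight_common(
    first_information: str, second_information: str, min_words: int = 2
) -> str:
    """Wrap word runs of the second information that also occur in the first in [[...]]."""
    a = first_information.split()
    b = second_information.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out: List[str] = []
    pos = 0
    for block in matcher.get_matching_blocks():
        # Skip short matches and the zero-length sentinel block at the end.
        if block.size < min_words:
            continue
        out.extend(b[pos:block.b])
        out.append("[[" + " ".join(b[block.b:block.b + block.size]) + "]]")
        pos = block.b + block.size
    out.extend(b[pos:])
    return " ".join(out)


marked = highlight_common(
    "the drone includes a flight module",
    "the aircraft carries a flight module and a camera",
)
print(marked)
```

The `min_words` threshold keeps isolated common words such as "the" from being highlighted on their own.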

Optionally, the key feature is extracted through a text-rank model.

Optionally, the similarity is determined by a cosine similarity model and/or a word vector similarity summation model.
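For illustration only, a cosine similarity between two pieces of information may be computed as sketched below. The sketch assumes plain term-count vectors instead of the vector results of a trained comparison model, and the function name `cosine_similarity` is hypothetical.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    va = Counter(re.findall(r"[a-z0-9]+", a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", b.lower()))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("flight module", "flight unit"))
```

The same formula applies unchanged when the term-count vectors are replaced by vectors produced by a comparison model.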

Optionally, the comparison model includes a Word2vec model and/or a recursive neural network recursive autoencoder model.

Optionally, the first information and the second information include one or more of words, sentences or paragraphs.

Optionally, the first text is a claim.

Optionally, the second text is a target comparison document.

The navigation and browsing device for text information provided by the embodiment of the present invention can execute the navigation and browsing method for text information provided by any embodiment of the present invention, and has the corresponding functional modules and effects for the execution method. For content that is not described in detail in the embodiment of the present invention, reference may be made to the description in any method embodiment of the present invention.

Example four

Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention. Figure 6 shows a block diagram of an exemplary server 612 suitable for implementing embodiments of the present invention. The server 612 shown in FIG. 6 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.

As shown in FIG. 6, the server 612 is represented in the form of a general server. The components of the server 612 may include, but are not limited to: one or more processors 616, a storage device 628, and a bus 618 connecting different system components (including the storage device 628 and the processor 616).

The bus 618 represents one or more of several types of bus structures, including a storage device bus or a storage device controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The server 612 includes a variety of computer system readable media. These media may be any available media that can be accessed by the server 612, including volatile and non-volatile media, removable and non-removable media.

The storage device 628 may include a computer system readable medium in the form of a volatile memory, such as a random access memory (RAM) 630 and/or a cache memory 632. The server 612 may include other removable/non-removable, volatile/non-volatile computer system storage media. For example only, the storage system 634 may be configured to read and write a non-removable, non-volatile magnetic medium (not shown in FIG. 6, usually referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive configured to read from and write to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive configured to read from and write to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc-Read Only Memory (DVD-ROM), or other optical media), can be provided. In these cases, each drive can be connected to the bus 618 through one or more data media interfaces. The storage device 628 may include at least one program product, and the program product has a set of (for example, at least one) program modules, and these program modules are configured to perform the functions of the embodiments of the present invention.

A program/utility tool 640 having a set of (at least one) program modules 642 may be stored in, for example, the storage device 628. Such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 642 generally execute the functions and/or methods in the embodiments described in the present disclosure.

The server 612 can also communicate with one or more external devices 614 (such as keyboards, pointing devices, displays 624, etc.), with one or more terminals that enable users to interact with the server 612, and/or with any terminal (such as a network card, a modem, etc.) that enables the server 612 to communicate with one or more other computing terminals. Such communication can be performed through an input/output (I/O) interface 622. In addition, the server 612 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 620. As shown in FIG. 6, the network adapter 620 communicates with other modules of the server 612 through the bus 618. It should be understood that although not shown in the figure, other hardware and/or software modules can be used in conjunction with the server 612, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.

The processor 616 executes a variety of functional applications and data processing by running programs stored in the storage device 628, for example, to implement a method for navigating and browsing text information provided by any embodiment of the present invention. The method may include: obtaining a first text, the first text including one or more pieces of first information; obtaining a second text, the second text including one or more pieces of second information; matching the first information and the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.

According to the technical solution of the embodiment of the present invention, a first text is obtained, the first text including one or more pieces of first information; a second text is obtained, the second text including one or more pieces of second information; the first information and the second information are matched to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. Through automatic similarity matching, information in the second text that is similar or identical to the first information can be found automatically, so that it can be quickly confirmed which parts of the second text are similar to the first information in the first text, without manually looking for content related to the first information in the second text. The details of the matching results can be confirmed purposefully, thereby improving the efficiency of document retrieval.

Example five

The fifth embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, a method for navigating and browsing text information as provided in any embodiment of the present invention is implemented. The method may include: obtaining a first text, the first text including one or more pieces of first information; obtaining a second text, the second text including one or more pieces of second information; matching the first information and the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.

The computer-readable storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. Examples of computer-readable storage media (non-exhaustive list) include: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above. In this document, the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.

The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium. The computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.

The program code contained on the storage medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the foregoing.

The computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Claims

A method for navigating and browsing text information, including:

Acquiring a first text, where the first text includes first information;

Acquiring a second text, where the second text includes second information;

Matching the first information and the second information to determine the similarity between the second information and the first information;

Navigating and browsing the second text according to the similarity.
The method according to claim 1, wherein the navigation and browsing of the second text according to the similarity comprises:

The second information and the first information are displayed on a navigation browsing interface according to the similarity.
The method of claim 1, wherein said matching said first information and said second information to determine the similarity between said second information and said first information comprises:

Extracting a first key feature from the first information;

Extracting a second key feature from the second information;

The first key feature and the second key feature are matched to determine the similarity between the second information and the first information.
The method according to claim 3, before said matching said first key feature and said second key feature to determine the similarity between said second information and said first information, further comprising:

Vectorizing the first key feature based on the trained first comparison model to obtain a first vector result;

Vectorizing the second key feature based on the trained second comparison model to obtain a second vector result;

The matching the first key feature and the second key feature to determine the similarity between the second information and the first information includes:

The first vector result and the second vector result are matched to determine the similarity between the second information and the first information.
The method of claim 3, wherein said extracting the first key feature from the first information comprises:

Processing the first information based on a preset rule to obtain a first processing result;

Use the first processing result as the first key feature.
The method according to claim 5, wherein said processing said first information based on a preset rule to obtain a first processing result comprises:

Acquiring at least one of text information, symbol information, and text structure information of the first information;

The first information is processed based on the acquired information to obtain the first processing result.
The method of claim 6, wherein the text information includes stop words, and processing the first information based on the text information to obtain the first processing result comprises:

Analyze and obtain the stop words in the first information;

Extract at least one of the related information before the stop word and the related information after the stop word;

Use the extracted relevant information as the first processing result.
The method of claim 6, wherein the symbol information includes at least one of a semicolon and a comma, and processing the first information based on the symbol information to obtain the first processing result comprises:

Extract at least one of the following: related information before the semicolon, related information before the comma, related information after the semicolon, and related information after the comma;

Use the extracted relevant information as the first processing result.
The method according to claim 6, wherein the text structure information includes a preamble part and a characteristic part, and processing the first information based on the text structure information to obtain the first processing result comprises:

Extracting at least one of the related information of the preamble part and the related information of the characteristic part;

Use the extracted relevant information as the first processing result.
The method according to claim 1, before said matching said first information and said second information to determine the similarity between said second information and said first information, further comprising:

Receiving chapter selection information of the second text;

Extracting a corresponding chapter based on the chapter selection information as the second information.
The method according to claim 2, after the displaying the second information and the first information on a navigation browsing interface according to the similarity, the method further comprises:

Sort the plurality of second information according to the similarity.
The method according to claim 2, wherein the navigation browsing interface further comprises a switch control, and the switch control is used to control the switch display of a plurality of second information.
The method of claim 2, wherein the navigation browsing interface further includes a similarity identifier, and displaying the first information and the second information on the navigation browsing interface according to the similarity includes:

The similar parts of the first information and the second information are highlighted on the navigation browsing interface.
The method of claim 3, wherein the first key feature and the second key feature are both extracted by a text-rank model.
The method according to claim 1, wherein the similarity is determined by at least one of a cosine similarity model and a word vector similarity summation model.
The method of claim 4, wherein the first comparison model and the second comparison model both comprise at least one of a word vector model and a recurrent neural network model.
The method of claim 1, wherein the first information and the second information each include at least one of words, sentences, and paragraphs.
The method of claim 1, wherein the first text is a claim.
The method according to claim 1, wherein the second text is a target comparison document.
A navigation and browsing device for text information includes:

A first obtaining module, configured to obtain a first text, wherein the first text includes first information;

A second obtaining module, configured to obtain a second text, wherein the second text includes second information;

A matching module, configured to match the first information and the second information to determine the similarity between the second information and the first information;

The navigation and browsing module is configured to navigate and browse the second text according to the similarity.
The device of claim 20, wherein the navigation and browsing module comprises:

The display unit is configured to display the second information and the first information on a navigation browsing interface according to the similarity.
The apparatus of claim 20, wherein the matching module comprises:

A first extraction unit, configured to extract a first key feature from the first information;

A second extraction unit, configured to extract a second key feature from the second information;

The similarity matching unit is configured to match the first key feature and the second key feature to determine the similarity between the second information and the first information.
The device of claim 22, further comprising:

The first vectorization module is set to vectorize the first key feature based on the trained first comparison model to obtain a first vector result;

The second vectorization module is set to vectorize the second key feature based on the trained second comparison model to obtain a second vector result;

The matching module is configured to match the first vector result and the second vector result to determine the similarity between the second information and the first information.
The apparatus of claim 22, wherein the first extraction unit comprises:

The first processing subunit is configured to process the first information based on a preset rule to obtain a first processing result; and use the first processing result as the first key feature.
The device of claim 24, wherein the first processing subunit is configured to process the first information based on a preset rule in the following manner to obtain the first processing result:

At least one of text information, symbol information, and text structure information of the first information is acquired; the first information is processed based on the acquired information to obtain the first processing result.
The apparatus of claim 20, wherein the second acquisition module comprises:

A receiving unit, configured to receive retrieval information based on the first text;

The retrieval unit is configured to retrieve the second text similar to the first text in the database based on the retrieval information.
The device of claim 20, further comprising:

The chapter selection module is configured to receive chapter selection information of the second text; and extract a corresponding chapter based on the chapter selection information as the second information.
The device of claim 21, further comprising:

The sorting module is configured to sort the plurality of second information according to the similarity.
The device of claim 21, wherein the navigation browsing interface further includes a similar identifier, and the display unit includes:

The highlight display unit is configured to highlight similar parts of the first information and the second information on the navigation browsing interface.
A server that includes:

At least one processor;

The storage device is set to store at least one program;

When the at least one program is executed by the at least one processor, the at least one processor implements the method for navigating and browsing text information according to any one of claims 1-19.
A computer-readable storage medium storing a computer program, wherein when the program is executed by a processor, the method for navigating and browsing text information according to any one of claims 1-19 is realized.