WO2021037012A1 - Text information navigation and browsing method, apparatus, server and storage medium - Google Patents


Info

Publication number
WO2021037012A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
text
similarity
key feature
browsing
Prior art date
Application number
PCT/CN2020/110994
Other languages
French (fr)
Chinese (zh)
Inventor
夏宇彬
袁明
孙敏
蔡洁
张�成
Original Assignee
智慧芽信息科技(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 智慧芽信息科技(苏州)有限公司
Publication of WO2021037012A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/34 Browsing; Visualisation therefor

Definitions

  • The present disclosure relates to the field of information processing technology, for example, to a method, device, server, and storage medium for navigating and browsing text information.
  • The present disclosure provides a method, device, server, and storage medium for navigating and browsing text information, so as to automatically find similar or identical content in at least two documents and thereby improve comparison efficiency.
  • A method for navigating and browsing text information is provided, including: obtaining a first text, where the first text includes first information; obtaining a second text, where the second text includes second information; matching the first information with the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.
  • A device for navigating and browsing text information is provided, including:
  • a first obtaining module, configured to obtain a first text, where the first text includes first information;
  • a second obtaining module, configured to obtain a second text, where the second text includes second information;
  • a matching module, configured to match the first information with the second information to determine the similarity between the second information and the first information; and
  • a navigation and browsing module, configured to navigate and browse the second text according to the similarity.
  • A server is provided, including:
  • one or more processors; and
  • a storage device, configured to store one or more programs,
  • where, when the one or more programs are executed by the one or more processors, the one or more processors implement the aforementioned method for navigating and browsing text information.
  • A computer-readable storage medium is also provided, on which a computer program is stored; when the program is executed by a processor, the above-mentioned method for navigating and browsing text information is implemented.
  • FIG. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of a text information navigation and browsing device according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of a server according to Embodiment 4 of the present invention.
  • The terms “first”, “second”, etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another.
  • For example, first information may be referred to as second information, and similarly, second information may be referred to as first information. Both the first information and the second information are information, but they are not the same information.
  • In addition, the terms “first”, “second”, etc. cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined with “first” or “second” may explicitly or implicitly include one or more such features.
  • “Multiple” and “batch” mean at least two, such as two, three, etc., unless specifically defined otherwise.
  • FIG. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention. This embodiment is applicable to scenarios in which texts are compared.
  • The method can be executed by a text information navigation and browsing device, which can be implemented in software and/or hardware and integrated on a server.
  • The method for navigating and browsing text information provided in Embodiment 1 of the present invention includes the following steps.
  • S110 Obtain a first text, where the first text includes one or more pieces of first information.
  • The first text refers to the text that needs to be analyzed and compared.
  • The first text can be a technical document, such as a dissertation, a patent document, a technical disclosure, or a technical solution for risk analysis, or part of the content of a patent document or technical disclosure, such as the text of the technical solution described in the claims or in a technical disclosure document; this is not limited here.
  • Optionally, the first text is a claim.
  • The first information refers to part or all of the information in the first text; this is not limited here.
  • The first information is information describing the technical solution in the first text. Taking a claim as the first text, for example, the first information can be one or more features in the claim, a sentence in the claim, or the entire claim; this is not limited here.
  • The first information includes, but is not limited to, one or more of words, sentences, or paragraphs.
  • The user can select the first information in the first text as needed, or the system can select it by default; this is not limited here.
  • There may be one or more pieces of first information. Taking claims as the first information, for example, when there are multiple pieces of first information, multiple claims in the first text can be matched at the same time to find similar second information in the second text, which greatly improves document comparison efficiency.
  • S120 Obtain a second text, where the second text includes one or more pieces of second information.
  • The second text is a text that needs to be compared with the first text to determine whether it is similar to the technical solution recorded in the first text.
  • The second text can be a technical document, book, patent document, etc., or part of the content of such a document; this is not limited here.
  • Optionally, the second text is a target comparison document.
  • The second information refers to part or all of the information in the second text. There may be one or more pieces of second information.
  • The second information is information describing the technical solution in the second text. Taking a similar patent document as the second text, for example, the second information can be the entire specification, a paragraph of the specification, or a sentence or word in the specification; this is not limited here.
  • The second information includes one or more of words, sentences, or paragraphs.
  • The second text can be obtained by manually importing an existing text into the text information navigation and browsing device. For example, if you find a text that you think is similar to the first text, you can download it and import it into the device, where it is compared with the first information in the first text to determine the similar parts and their positions.
  • S130 Match the first information with the second information to determine the similarity between the second information and the first information.
  • The similarity refers to the degree of similarity between the first information and the second information. Matching refers to comparing the first information with the second information to determine the similarity.
  • The similarity can be expressed as a percentage or a color. For example, green represents a low similarity and red represents a high similarity. The form of the similarity is not limited here.
  • S140 Navigate and browse the second text according to the similarity.
  • Navigation browsing refers to using the matched similarity to locate, in the second text, the second information that is similar to the first information, so that it can be browsed quickly without manual searching.
  • For example, navigation marks of different colors can be set at the side of the second text.
  • The navigation marks correspond to the line positions of the second information that is similar to the first information.
  • Through the navigation marks, the user can quickly jump to, and browse, second information with higher similarity. In an alternative embodiment, a quick browsing window can also be provided, which summarizes the second information similar to the first information, sorted by similarity, as a browsing index of the second text.
  • By clicking the corresponding summary entry, the user can quickly browse the second information in the second text.
  • In this way, the second information in the second text that is similar to the first information can be obtained quickly, which greatly improves comparison efficiency.
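The navigation marks and the quick browsing index described above can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the `Match` record, the two-color mapping, and the 0.5 threshold are hypothetical choices; the text only states that green indicates low similarity and red indicates high similarity.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One piece of second information located in the second text (hypothetical record)."""
    line: int          # line position of the second information in the second text
    text: str          # the matched second information
    similarity: float  # similarity to the first information, assumed to lie in [0, 1]


def mark_color(similarity: float, threshold: float = 0.5) -> str:
    """Map a similarity to a navigation-mark color: red for high, green for low.

    The threshold value is an assumption for illustration only.
    """
    return "red" if similarity >= threshold else "green"


def build_navigation(matches: list[Match]) -> tuple[list[tuple[int, str]], list[Match]]:
    """Return navigation marks in line order and a browsing index sorted by similarity."""
    # Marks sit beside the second text, so they are ordered by line position.
    marks = [(m.line, mark_color(m.similarity)) for m in sorted(matches, key=lambda m: m.line)]
    # The quick browsing window lists the most similar second information first.
    index = sorted(matches, key=lambda m: m.similarity, reverse=True)
    return marks, index


if __name__ == "__main__":
    hits = [Match(12, "parachute opening module", 0.82),
            Match(3, "battery housing", 0.31),
            Match(40, "main control module", 0.67)]
    marks, index = build_navigation(hits)
    print(marks)                    # [(3, 'green'), (12, 'red'), (40, 'red')]
    print([m.line for m in index])  # [12, 40, 3]
```

Keeping the marks in line order and the index in similarity order mirrors the two views in the text: a side bar that follows the document, and a summary window that puts the closest matches first.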
  • Optionally, step S140 of navigating and browsing the second text according to the similarity may include:
  • displaying the second information and the first information on a navigation browsing interface according to the similarity.
  • The navigation browsing interface is an interface that displays the similarity matching results and is used to find similar locations and content.
  • The similarity matching result reflects the similarity of the first information to one or more pieces of second information, that is, the similarity between the second information in the second text and the first information in the first text.
  • The similarity matching result can display, in full, one or more second texts similar to the first information describing the technical solution, or it can display only the parts of one or more second texts that are similar to the first information; this is not limited here.
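  • Optionally, step S130 may include: receiving chapter selection information of the second text; and extracting the corresponding chapter based on the chapter selection information as the second information.
  • A chapter refers to part of the content of the second text.
  • Chapters can be sections such as the claims or the description, or subsections such as the background, the description of the drawings, or the detailed embodiments. The division of chapters is not limited here.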
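A minimal sketch of chapter selection, assuming each chapter of the second text starts with a heading on its own line. The heading list and the function names `split_chapters` and `select_chapter` are hypothetical; real patent documents label their sections in various ways.

```python
import re

# Hypothetical chapter headings; adjust to the format of the documents actually imported.
CHAPTER_HEADINGS = ("Claims", "Description", "Background",
                    "Brief Description of the Drawings", "Detailed Description")


def split_chapters(text: str, headings: tuple[str, ...] = CHAPTER_HEADINGS) -> dict[str, str]:
    """Split a document into {heading: body}, using headings that occupy a whole line."""
    pattern = re.compile(r"^(%s)[ \t]*$" % "|".join(re.escape(h) for h in headings),
                         re.MULTILINE)
    found = list(pattern.finditer(text))
    chapters = {}
    for i, m in enumerate(found):
        # A chapter runs until the next heading, or to the end of the document.
        end = found[i + 1].start() if i + 1 < len(found) else len(text)
        chapters[m.group(1)] = text[m.end():end].strip()
    return chapters


def select_chapter(text: str, selection: str) -> str:
    """Return the chapter named by the chapter selection information, or '' if absent."""
    return split_chapters(text).get(selection, "")


if __name__ == "__main__":
    doc = ("Claims\n1. A system.\nBackground\nPrior UAVs.\n"
           "Detailed Description\nThe system includes a module.\n")
    print(select_chapter(doc, "Background"))  # Prior UAVs.
```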
  • Optionally, step S140 may include the following.
  • The navigation browsing interface further includes a similarity mark.
  • The similar parts of the first information and the second information can be highlighted, which helps the user locate the similar parts as quickly as possible.
  • The similar content may, for example, be shown in a highlight color; the way of highlighting is not limited here.
  • A switching control is included in the search result, and the switching control is used to switch the display among multiple pieces of second information.
  • The switching control can switch to the previous or next item, or to the next more similar or less similar part. The switching method is not limited here.
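The switching control can be modeled as a cursor over the second information sorted by similarity, so that "next" moves to the next less similar part and "previous" to the next more similar one. The class below is a hypothetical sketch; wrapping around at either end is an assumed behavior that the text does not specify.

```python
class SwitchControl:
    """Hypothetical switching control over (label, similarity) pairs of second information."""

    def __init__(self, hits: list[tuple[str, float]]) -> None:
        if not hits:
            raise ValueError("at least one piece of second information is required")
        # Most similar first, so stepping forward moves to less similar parts.
        self.hits = sorted(hits, key=lambda h: h[1], reverse=True)
        self.pos = 0

    def current(self) -> tuple[str, float]:
        return self.hits[self.pos]

    def next(self) -> tuple[str, float]:
        self.pos = (self.pos + 1) % len(self.hits)  # wrap around at the end (assumption)
        return self.current()

    def previous(self) -> tuple[str, float]:
        self.pos = (self.pos - 1) % len(self.hits)  # wrap around at the start (assumption)
        return self.current()


if __name__ == "__main__":
    ctrl = SwitchControl([("para 5", 0.4), ("para 2", 0.9), ("para 7", 0.6)])
    print(ctrl.current())  # ('para 2', 0.9)
    print(ctrl.next())     # ('para 7', 0.6)
```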
  • Optionally, obtaining the second text includes: receiving search information based on the first text; and searching a database, based on the search information, for a second text similar to the first text.
  • The search information can be the text or graphic part of the first text relating to the first technical feature, or it can be generated automatically based on the first information; this is not limited here.
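Searching a database for second texts similar to the first text can be illustrated with a deliberately simple term-overlap ranking. The in-memory dictionary standing in for the database, the tokenizer, and the `top_k` parameter are assumptions for illustration; the text does not specify the retrieval algorithm.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric terms (a simplistic tokenizer for English text)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(database: dict[str, str], search_info: str, top_k: int = 3) -> list[str]:
    """Return ids of up to top_k documents sharing the most terms with the search information."""
    query = tokenize(search_info)
    scored = [(len(query & tokenize(body)), doc_id) for doc_id, body in database.items()]
    # Keep only documents with at least one shared term; break ties by document id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


if __name__ == "__main__":
    db = {"D1": "Parachute for a drone",
          "D2": "Battery charger",
          "D3": "Drone parachute opening module"}
    print(search(db, "drone parachute opening"))  # ['D3', 'D1']
```

A production system would more likely use an inverted index with weighted scoring; plain overlap counting is used here only to keep the retrieval step visible.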
  • In the technical solution of this embodiment, a first text that includes one or more pieces of first information is obtained, and a second text that includes one or more pieces of second information is obtained;
  • the first information is matched with the second information to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity.
  • Information similar or identical to the first information can thus be found automatically in the second text, so the user can quickly confirm which parts of the second text are similar to the first information in the first text without manually looking for content related to the first information. The details of the matching results can then be checked in a targeted way, which improves document retrieval efficiency. This solves the problem that manually searching comparison documents for similar or identical content is very inefficient, and automatically finding such content improves document comparison efficiency.
  • FIG. 2 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 2 of the present invention. This embodiment builds on the above technical solution and is applicable to scenarios in which texts are compared.
  • The method can be executed by a text information navigation and browsing device, which can be implemented in software and/or hardware and integrated on a server.
  • The method for navigating and browsing text information provided in Embodiment 2 of the present invention includes the following steps.
  • S210 Obtain a first text, where the first text includes one or more pieces of first information.
  • The first text refers to the text that needs to be analyzed and compared.
  • The first text can be a technical document, such as a dissertation, a patent document, or a technical disclosure, or part of the content of a patent document or technical disclosure, such as the claims or a technical disclosure.
  • Optionally, the first text is a claim.
  • The first information refers to part or all of the information in the first text; this is not limited here.
  • The first information is information describing the technical solution in the first text. Taking a claim as the first text, for example, the first information can be one or more features in the claim, a sentence in the claim, or the entire claim; this is not limited here.
  • The first information includes, but is not limited to, one or more of words, sentences, or paragraphs.
  • S220 Obtain a second text, where the second text includes one or more pieces of second information.
  • The second text is a text that needs to be compared with the first text to determine whether it is similar to the technical solution recorded in the first text.
  • The second text can be a technical document, book, patent document, etc., or part of the content of such a document; this is not limited here.
  • Optionally, the second text is a target comparison document.
  • The second information refers to part or all of the information in the second text. There may be one or more pieces of second information.
  • The second information is information describing the technical solution in the second text. Taking a similar patent document as the second text, for example, the second information can be the entire specification, a paragraph of the specification, or a sentence or word in the specification; this is not limited here.
  • The second information includes one or more of words, sentences, or paragraphs.
  • S230 Extract a first key feature from the first information.
  • The first key feature refers to a feature in the first information that relates to the first technical feature.
  • The first information may be one or more of words, sentences, or paragraphs, and so may the first key feature. If the first information is a word, the first key feature is a word; if the first information is a sentence, the first key feature can be a sentence and/or a word; if the first information is a paragraph, the first key feature can be a paragraph, a sentence, and/or a word.
  • Optionally, the first key feature is a keyword.
  • The first key feature can be extracted through a key feature extraction model.
  • Optionally, the key feature extraction model is a TextRank model.
  • TextRank is a graph-based ranking model for text. It divides the text into constituent units (words or sentences), builds a graph over them, and ranks the important units with a voting mechanism. It can extract keywords and abstracts using only the information contained in a single document.
  • For example, if the first information is "a UAV emergency parachute opening system, which is used to open the parachute when the UAV fails, characterized in that: the UAV emergency parachute opening system includes a main control module, a detection module, a power management module, and a parachute opening module", the first key feature can be "UAV", "parachute opening system", "main control module", "detection module", "power management module", "parachute opening module", etc., or it can be "the UAV emergency parachute opening system includes a main control module, a detection module, a power management module, and a parachute opening module"; this is not limited here.
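A compact TextRank-style keyword extractor is sketched below: words that co-occur within a sliding window are linked in an undirected graph, and scores are iterated with the PageRank update WS(v) = (1 - d) + d * Σ WS(u) / deg(u), summed over the neighbors u of v. The stop-word list, window size, damping factor d = 0.85, and iteration count are illustrative assumptions, not parameters taken from the disclosure, and the tokenizer handles English text only.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "is", "it", "in",
              "for", "when", "which", "used", "that", "characterized"}


def textrank_keywords(text: str, window: int = 2, damping: float = 0.85,
                      iterations: int = 50, top_k: int = 5) -> list[str]:
    """Rank candidate keywords with a TextRank-style co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    graph: dict[str, set[str]] = defaultdict(set)
    # Link every pair of distinct words that co-occur within the window.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Each word "votes" for its neighbors, splitting its score across its edges.
        scores = {w: (1 - damping) + damping * sum(scores[u] / len(graph[u]) for u in graph[w])
                  for w in graph}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    claim = ("A UAV emergency parachute opening system, which is used to open the parachute "
             "when the UAV fails, characterized in that: the UAV emergency parachute opening "
             "system includes a main control module, a detection module, a power management "
             "module, and a parachute opening module")
    print(textrank_keywords(claim))
```

On the example claim above, "module" and "parachute" have the most distinct neighbors in the graph and therefore rank near the top.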
  • S240 Extract a second key feature from the second information.
  • The second key feature refers to a feature in the second information that relates to the second technical feature.
  • The second information may be one or more of words, sentences, or paragraphs, and so may the second key feature. If the second information is a word, the second key feature is a word; if the second information is a sentence, the second key feature can be a sentence and/or a word; if the second information is a paragraph, the second key feature can be a paragraph, a sentence, and/or a word.
  • Optionally, the second key feature is a keyword.
  • The second key feature can be extracted through a key feature extraction model.
  • Optionally, the key feature extraction model is the TextRank model described above.
  • S250 Match the first key feature with the second key feature to determine the similarity between the second information and the first information.
  • The first and second key features can each be a word, a sentence, and/or a paragraph. That is, when the first key feature is a word, it can be compared with a word, sentence, and/or paragraph of the second key feature; this is not limited here. For example, if both the first key feature and the second key feature are "unmanned aerial vehicle", the two can be matched to determine the similarity between the first information and the second information.
  • The similarity can be expressed as a percentage or a color. For example, green represents low similarity and red represents high similarity. The form of the similarity is not limited here.
  • The similarity may be determined by a cosine similarity model and/or a word vector similarity summation model.
  • Optionally, the similarity is determined through the word vector similarity summation model.
  • The word vector similarity summation model is a model trained using word vector similarity summation;
  • the cosine similarity model is a model trained using the cosine similarity algorithm. This embodiment does not limit the algorithm used to calculate the similarity.
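As one concrete instance of cosine similarity, the sketch below compares two lists of key features as term-frequency vectors, using cos θ = (a · b) / (‖a‖ ‖b‖). Treating key features as bags of words is an assumption made for illustration; the disclosure describes a trained cosine similarity model and does not fix the vector representation.

```python
import math
from collections import Counter


def cosine_similarity(first_terms: list[str], second_terms: list[str]) -> float:
    """Cosine of the angle between term-frequency vectors of two key-feature lists."""
    a, b = Counter(first_terms), Counter(second_terms)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0  # empty input yields 0.0 instead of dividing by zero


if __name__ == "__main__":
    print(cosine_similarity(["uav", "parachute"], ["uav", "module"]))  # approximately 0.5
```

The result could then be shown as a percentage or mapped to a color, as the text describes.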
  • S260 Navigate and browse the second text according to the similarity.
  • Navigation browsing refers to using the matched similarity to locate, in the second text, the second information that is similar to the first information, so that it can be browsed quickly without manual searching.
  • Optionally, step S250 of matching the first key feature with the second key feature to determine the similarity between the second information and the first information can be replaced by the following steps.
  • S251 Vectorize the first key feature based on a trained first comparison model to obtain a first vector result.
  • The first comparison model refers to a model that vectorizes the first key feature.
  • Vectorization refers to expressing text as a series of vectors that capture the semantics of the text.
  • The first comparison model includes a word-to-vector (Word2vec) model and/or a recursive autoencoder model based on a recursive neural network.
  • S252 Vectorize the second key feature based on a trained second comparison model to obtain a second vector result.
  • The second comparison model refers to a model that vectorizes the second key feature.
  • The second comparison model includes a Word2vec model and/or a recursive autoencoder model based on a recursive neural network.
  • When the second key feature is a word, the second comparison model includes a Word2vec model; when the second key feature is a sentence or a paragraph, the second comparison model includes a recursive autoencoder model based on a recursive neural network.
  • Optionally, the first comparison model includes both the Word2vec model and the recursive autoencoder model.
  • S253 Match the first vector result with the second vector result to determine the similarity between the second information and the first information.
  • The first comparison model and the second comparison model may be the same model or the same type of model.
  • Because the similarity is determined only after the first key feature and the second key feature have been vectorized, the comparison is not a mechanical word-by-word comparison but is based on the semantics of the key features, so the similarity matching result is more accurate.
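The vectorize-then-compare flow can be sketched with a toy embedding table standing in for a trained Word2vec model. The three-dimensional vectors below are made-up values, averaging word vectors is a simple substitute for (not an implementation of) the recursive autoencoder, and `word_vector_similarity_sum` shows one common reading of "word vector similarity summation" that is assumed here: each first-side word takes its best cosine match on the second side, and the matches are summed and averaged.

```python
import math

# Made-up 3-d vectors standing in for a trained Word2vec model (hypothetical values).
EMBEDDINGS: dict[str, tuple[float, ...]] = {
    "uav": (0.9, 0.1, 0.0),
    "drone": (0.85, 0.15, 0.05),
    "parachute": (0.1, 0.9, 0.1),
    "battery": (0.0, 0.1, 0.95),
}
DIM = 3


def cos(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vectorize(key_feature: list[str]) -> tuple[float, ...]:
    """Average the vectors of known words into one vector result; unknown words are skipped."""
    vecs = [EMBEDDINGS[w] for w in key_feature if w in EMBEDDINGS]
    if not vecs:
        return (0.0,) * DIM
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def word_vector_similarity_sum(first: list[str], second: list[str]) -> float:
    """Sum each first-side word's best cosine match on the second side, then average."""
    left = [EMBEDDINGS[w] for w in first if w in EMBEDDINGS]
    right = [EMBEDDINGS[w] for w in second if w in EMBEDDINGS]
    if not left or not right:
        return 0.0
    return sum(max(cos(a, b) for b in right) for a in left) / len(left)


if __name__ == "__main__":
    first_vec = vectorize(["uav", "parachute"])       # first vector result
    second_vec = vectorize(["drone", "parachute"])    # second vector result
    print(round(cos(first_vec, second_vec), 3))       # close to 1 despite "uav" != "drone"
    print(round(word_vector_similarity_sum(["uav", "parachute"], ["drone", "parachute"]), 3))
```

The point of the example is the semantic effect the text describes: "uav" and "drone" share no characters, yet their vectors are close, so the two key features score as highly similar.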
  • Optionally, step S230 of extracting a first key feature from the first information includes: processing the first information based on a preset rule to obtain a first processing result; and using the first processing result as the first key feature.
  • The preset rule is a rule for processing the first information; processing the first information with the preset rule yields the first processing result.
  • Processing the first information based on the preset rule to obtain the first processing result may include: acquiring text information, symbol information, and/or text structure information of the first information; and processing the first information based on the text information, symbol information, and/or text structure information to obtain the first processing result.
  • Optionally, the text information includes stop words.
  • Stop words include “the”, “and”, “or”, etc., which are not limited here.
  • Processing the first information based on the text information to obtain the first processing result includes: identifying the stop words in the first information; and extracting the relevant information before and/or after the stop words.
  • In this case, the first information is a sentence or a paragraph.
  • The text information may also include other related words, etc., which are not limited here.
  • Optionally, the symbol information includes semicolons and/or commas. Processing the first information based on the symbol information to obtain the first processing result includes: extracting the relevant information before and/or after the semicolons and/or commas. For example, if the first information is "the drone includes a main control module and a flight module; the flight module includes a power supply unit", relevant information such as “the main control module, flight module, and the flight module” is extracted. Optionally, the symbol information may also include other identifying symbols, which are not limited here.
  • Optionally, the text structure information includes a preamble part and a characterizing part.
  • Processing the first information based on the text structure information to obtain the first processing result includes: extracting the relevant information of the preamble part and/or the characterizing part.
  • For example, relevant information such as "unmanned aerial vehicle, flight module" is extracted.
  • The text structure information may also include other text structure information, which is not limited here.
  • Processing the first information with preset rules to extract key features is simple and effective, and improves the efficiency of retrieving documents.
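The three preset rules (stop words, semicolons and commas, and the preamble and characterizing parts) can be sketched as plain string splitting. This assumes English claim text in which the characterizing part begins with the phrase "characterized in that"; the function names are hypothetical and the stop-word list contains only the three examples given above.

```python
import re

STOP_WORDS = ("the", "and", "or")                # the examples named in the text
CHARACTERIZING_MARKER = "characterized in that"  # assumed claim phrasing


def split_by_symbols(info: str) -> list[str]:
    """Extract the pieces before and after semicolons and commas."""
    return [p.strip() for p in re.split(r"[;,]", info) if p.strip()]


def split_by_stop_words(info: str, stop_words: tuple[str, ...] = STOP_WORDS) -> list[str]:
    """Extract the pieces before and after whole-word stop words (case-insensitive)."""
    pattern = r"\b(?:%s)\b" % "|".join(map(re.escape, stop_words))
    return [p.strip() for p in re.split(pattern, info, flags=re.IGNORECASE) if p.strip()]


def split_by_structure(claim: str, marker: str = CHARACTERIZING_MARKER) -> tuple[str, str]:
    """Return (preamble part, characterizing part); the latter is '' if no marker is found."""
    idx = claim.lower().find(marker)
    if idx < 0:
        return claim.strip(), ""
    return claim[:idx].strip(" ,:"), claim[idx + len(marker):].strip(" ,:")


if __name__ == "__main__":
    info = ("the drone includes a main control module and a flight module; "
            "the flight module includes a power supply unit")
    print(split_by_symbols(info))
    print(split_by_stop_words("the drone includes a main control module and a flight module"))
```

Each rule returns candidate fragments that could then serve as key features; in practice the rules would likely be combined and followed by further filtering.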
  • In the technical solution of this embodiment, a first text that includes one or more pieces of first information is obtained;
  • a second text that includes one or more pieces of second information is obtained;
  • the first information is matched with the second information to determine the similarity between the second information and the first information;
  • and the second text is navigated and browsed according to the similarity.
  • Information similar or identical to the first information can thus be found automatically in the second text, so the user can quickly confirm which parts of the second text are similar to the first information in the first text without manually looking for content related to the first information. The details of the matching results can then be checked in a targeted way, which improves the efficiency of retrieving documents.
  • FIG. 5 is a schematic structural diagram of a text information navigation and browsing device according to Embodiment 3 of the present invention. This embodiment is applicable to scenarios in which texts are compared.
  • The device can be implemented in software and/or hardware and can be integrated on a server.
  • The apparatus for navigating and browsing text information may include a first obtaining module 310, a second obtaining module 320, a matching module 330, and a navigation browsing module 340, where:
  • the first obtaining module 310 is configured to obtain a first text, where the first text includes one or more pieces of first information; the second obtaining module 320 is configured to obtain a second text, where the second text includes one or more pieces of second information; the matching module 330 is configured to match the first information with the second information to determine the similarity between the second information and the first information; and the navigation browsing module 340 is configured to navigate and browse the second text according to the similarity.
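The four modules of the device can be wired together as in the hypothetical sketch below, where each module is a plain callable injected into the device. The class name, the tuple layout of the results, and the toy Jaccard matcher in the usage example are assumptions for illustration, not the disclosed design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NavigationBrowsingDevice:
    """Hypothetical wiring of modules 310-340; each module is an injected callable."""
    obtain_first: Callable[[], list[str]]   # first obtaining module 310: pieces of first information
    obtain_second: Callable[[], list[str]]  # second obtaining module 320: pieces of second information
    match: Callable[[str, str], float]      # matching module 330: similarity of a (first, second) pair

    def navigate(self) -> list[tuple[str, int, str, float]]:
        """Navigation browsing module 340: (first, position, second, similarity), most similar first."""
        firsts, seconds = self.obtain_first(), self.obtain_second()
        results = [(f, pos, s, self.match(f, s))
                   for f in firsts for pos, s in enumerate(seconds)]
        return sorted(results, key=lambda r: r[3], reverse=True)


def jaccard(a: str, b: str) -> float:
    """Toy matcher: shared words divided by all distinct words."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


if __name__ == "__main__":
    device = NavigationBrowsingDevice(
        obtain_first=lambda: ["drone parachute"],
        obtain_second=lambda: ["battery pack", "drone parachute module"],
        match=jaccard,
    )
    for row in device.navigate():
        print(row)
```

Injecting the matcher keeps the navigation step independent of how similarity is computed, which matches the text's statement that the similarity algorithm is not limited.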
  • Optionally, the navigation browsing module 340 includes a display unit configured to display the first information and the second information on a navigation browsing interface according to the similarity.
  • Optionally, the matching module 330 includes: a first extraction unit configured to extract a first key feature from the first information; a second extraction unit configured to extract a second key feature from the second information; and a similarity matching unit configured to match the first key feature with the second key feature to determine the similarity between the second information and the first information.
  • Optionally, the device for navigating and browsing text information further includes: a first vectorization module configured to vectorize the first key feature based on the trained first comparison model to obtain a first vector result; and a second vectorization module configured to vectorize the second key feature based on the trained second comparison model to obtain a second vector result.
  • The matching module 330 is configured to match the first vector result with the second vector result to determine the similarity between the second information and the first information.
  • Optionally, the first extraction unit includes a first processing subunit configured to process the first information based on a preset rule to obtain a first processing result, and to use the first processing result as the first key feature.
  • The first processing subunit is configured to obtain text information, symbol information, and/or text structure information of the first information, and to process the first information based on that information to obtain the first processing result.
  • Optionally, the text information includes stop words;
  • the first processing subunit is configured to identify the stop words in the first information and extract the relevant information before and/or after the stop words.
  • Optionally, the symbol information includes a semicolon and/or a comma;
  • the first processing subunit is configured to extract the relevant information before and/or after the semicolon and/or the comma.
  • Optionally, the text structure information includes a preamble part and a characterizing part;
  • the first processing subunit is configured to extract the relevant information of the preamble part and/or the characterizing part.
  • Optionally, the second obtaining module 320 includes: a receiving unit configured to receive retrieval information based on the first text; and a retrieval unit configured to retrieve, from a database based on the retrieval information, a second text similar to the first text.
  • Optionally, the apparatus for navigating and browsing text information further includes a chapter selection module configured to receive chapter selection information of the second text and extract the corresponding chapter based on the chapter selection information as the second information.
  • Optionally, the device for navigating and browsing text information further includes a sorting module configured to sort the second information according to the similarity.
  • Optionally, the navigation browsing interface further includes a switching control, and the switching control is configured to control the switching display of multiple pieces of second information.
  • Optionally, the navigation browsing interface further includes a similarity mark;
  • the display unit includes a highlighting unit configured to highlight the similar parts of the first information and the second information.
  • Optionally, the key features are extracted through a TextRank model.
  • Optionally, the similarity is determined by a cosine similarity model and/or a word vector similarity summation model.
  • Optionally, the comparison model includes a Word2vec model and/or a recursive autoencoder model based on a recursive neural network.
  • Optionally, the first information and the second information include one or more of words, sentences, or paragraphs.
  • Optionally, the first text is a claim.
  • Optionally, the second text is a target comparison document.
  • The text information navigation and browsing device provided by the embodiments of the present invention can execute the text information navigation and browsing method provided by any embodiment of the present invention, and has the functional modules and effects corresponding to the executed method.
  • Fig. 6 is a schematic structural diagram of a server provided in the fourth embodiment of the present invention.
  • Figure 6 shows a block diagram of an exemplary server 612 suitable for implementing embodiments of the present invention.
  • the server 612 shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of application of the embodiments of the present invention.
  • the server 612 takes the form of a general-purpose server.
  • the components of the server 612 may include, but are not limited to: one or more processors 616, a storage device 628, and a bus 618 connecting different system components (including the storage device 628 and the processor 616).
  • the bus 618 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • the server 612 includes a variety of computer system readable media. These media may be any available media that can be accessed by the server 612, including volatile and non-volatile media, removable and non-removable media.
  • the storage device 628 may include a computer system readable medium in the form of a volatile memory, such as a random access memory (RAM) 630 and/or a cache memory 632.
  • the server 612 may include other removable/non-removable, volatile/non-volatile computer system storage media.
  • the storage system 634 may be configured to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 6, usually referred to as a "hard drive").
  • a magnetic disk drive configured to read from and write to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive configured to read from and write to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM)) can be provided.
  • each drive can be connected to the bus 618 through one or more data media interfaces.
  • the storage device 628 may include at least one program product, and the program product has a set of (for example, at least one) program modules, and these program modules are configured to perform the functions of the embodiments of the present invention.
  • a program/utility tool 640 having a set of (at least one) program module 642 may be stored in, for example, the storage device 628.
  • such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a networking environment.
  • the program module 642 generally executes the functions and/or methods in the embodiments described in the present disclosure.
  • the server 612 may also communicate with one or more external devices 614 (such as a keyboard, a pointing device, a display 624, etc.), with one or more devices that enable a user to interact with the server 612, and/or with any device (such as a network card, a modem, etc.) that enables the server 612 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 622.
  • the server 612 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 620.
  • as shown in FIG. 6, the network adapter 620 communicates with the other modules of the server 612 through the bus 618. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the server 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Array of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
  • the processor 616 executes a variety of functional applications and data processing by running programs stored in the storage device 628, for example, to implement a method for navigating and browsing text information provided by any embodiment of the present invention.
  • the method may include: acquiring a first text, the first text including one or more pieces of first information; acquiring a second text, the second text including one or more pieces of second information; matching the first information and the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.
  • by automatically matching similarity, information similar or identical to the first information is found in the second text, so it can be quickly confirmed which parts of the second text are similar to the first information in the first text, without manually searching the second text for content related to the first information. The details of the matching results can then be checked purposefully, improving the efficiency of document retrieval.
  • the fifth embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored.
  • when the program is executed by a processor, the method for navigating and browsing text information provided in any embodiment of the present invention is implemented.
  • the method may include: acquiring a first text, the first text including one or more pieces of first information; acquiring a second text, the second text including one or more pieces of second information; matching the first information and the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.
  • the computer-readable storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • examples of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the storage medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
  • the computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof.
  • the programming languages include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
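The bullets above name a TextRank model for extracting key features. A simplified TextRank can be sketched in Python as follows; this is an illustration under stated assumptions, not the applicant's implementation. It assumes input that a regular expression can tokenize (Chinese text would first need a word segmenter), links words that co-occur within a small window, and runs the standard PageRank-style update with damping factor 0.85. The function name `textrank_keywords` and all parameter defaults are hypothetical.

```python
import re
from collections import defaultdict

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5, stopwords=frozenset()):
    """Rank words by a simplified TextRank score and return the top_k as key features."""
    # Tokenize into lower-case words; full TextRank usually also filters by part of speech
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    # Undirected co-occurrence graph: link words that appear within `window` positions
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank-style update: S(v) = (1 - d) + d * sum(S(u) / deg(u) for u adjacent to v)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
    # Highest score first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

A word linked to many distinct neighbours, such as "sensor" in `"sensor data sensor signal sensor value sensor unit"`, ends up ranked first; the top-ranked words could then serve as the key features compared between the first and second information.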

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text information navigation and browsing method, an apparatus, a server and a storage medium, the text information navigation and browsing method comprising: acquiring first text, the first text comprising first information (S110); acquiring second text, the second text comprising second information (S120); matching the first information and the second information, so as to determine a degree of similarity of the second information to the first information (S130); according to the degree of similarity, performing navigation and browsing of the second text (S140).

Description

Method, apparatus, server and storage medium for navigating and browsing text information
This application claims priority to Chinese Patent Application No. 201910816838.0, filed with the Chinese Patent Office on August 30, 2019, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of information processing technology, for example, to a method, apparatus, server, and storage medium for navigating and browsing text information.
Background
With the rapid development of information processing, comparing information simply and efficiently has become increasingly important.
For example, when assessing the novelty and inventiveness of a patent application, an examiner needs to search patent databases for the technical features recited in the application in order to find comparison documents. After finding a comparison document similar to the application, the examiner must then locate more detailed content in that document for comparison, for example, which content in the retrieved comparison document is similar or identical to which claim feature of the application.
Comparing a patent application with comparison documents in this largely manual way is very inefficient.
Summary of the invention
The present disclosure provides a method, apparatus, server and storage medium for navigating and browsing text information, which automatically find similar or identical content in at least two documents and thereby improve comparison efficiency.
A method for navigating and browsing text information is provided, including:
acquiring a first text, where the first text includes first information;
acquiring a second text, where the second text includes second information;
matching the first information and the second information to determine the similarity between the second information and the first information; and
navigating and browsing the second text according to the similarity.
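The four steps above can be sketched end to end in Python. This is a minimal sketch, assuming both texts have already been split into lists of information strings (S110 and S120) and using token-set Jaccard overlap purely as a placeholder for the similarity model; the source instead names cosine similarity and word-vector models. The names `navigate`, `jaccard` and `Match` are hypothetical.

```python
import re
from dataclasses import dataclass

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    """Placeholder similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

@dataclass
class Match:
    first_index: int    # which piece of first information
    second_index: int   # which piece of second information
    similarity: float

def navigate(first_info, second_info, similarity=jaccard):
    """S130 and S140: score every (first, second) pair and order the pairs for browsing."""
    matches = [Match(i, j, similarity(f, s))
               for i, f in enumerate(first_info)
               for j, s in enumerate(second_info)]
    # Most similar pairs first, so browsing starts at the best candidate passages
    return sorted(matches, key=lambda m: (-m.similarity, m.first_index, m.second_index))

# S110 and S120: the caller supplies both texts already split into information
first = ["a temperature sensor mounted on the housing"]
second = ["the housing is made of steel", "a temperature sensor is mounted on the housing"]
best = navigate(first, second)[0]  # best.second_index == 1, best.similarity == 0.875
```

Passing the similarity function as a parameter keeps the matching step (S130) swappable, so a cosine or word-vector model can replace the placeholder without changing the browsing step (S140).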
An apparatus for navigating and browsing text information is also provided, including:
a first acquisition module, configured to acquire a first text, where the first text includes first information;
a second acquisition module, configured to acquire a second text, where the second text includes second information;
a matching module, configured to match the first information and the second information to determine the similarity between the second information and the first information; and
a navigation and browsing module, configured to navigate and browse the second text according to the similarity.
A server is also provided, including:
one or more processors; and
a storage device, configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the above method for navigating and browsing text information.
A computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the above method for navigating and browsing text information.
Description of the drawings
FIG. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 2 of the present invention;
FIG. 3 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention;
FIG. 4 is a schematic flowchart of another method for navigating and browsing text information according to Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for navigating and browsing text information according to Embodiment 3 of the present invention;
FIG. 6 is a schematic structural diagram of a server according to Embodiment 4 of the present invention.
Detailed description
The present disclosure is described below with reference to the drawings and embodiments.
Before the exemplary embodiments are discussed, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes multiple steps as sequential processing, many of the steps can be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawings. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
In addition, the terms "first", "second", etc. may be used herein to describe various directions, actions, steps, elements, etc., but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another. For example, without departing from the scope of the present application, first information could be termed second information, and similarly, second information could be termed first information. First information and second information are both information, but they are not the same information. The terms "first", "second", etc. are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present disclosure, "multiple" and "batch" mean at least two, for example two, three, etc., unless otherwise expressly defined.
Embodiment 1
FIG. 1 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 1 of the present invention. The method is applicable to scenarios in which texts are compared and may be executed by an apparatus for navigating and browsing text information, which may be implemented in software and/or hardware and may be integrated in a server.
As shown in FIG. 1, the method for navigating and browsing text information provided in Embodiment 1 of the present invention includes:
S110. Acquire a first text, where the first text includes one or more pieces of first information.
The first text is the text to be analyzed and compared. In this embodiment, the first text may be a technical document, such as a paper or a patent document; a technical disclosure; a technical solution awaiting risk analysis; or part of a patent document or technical disclosure, such as the claims or the text of the technical solution described in a technical disclosure, which is not limited here. In one embodiment, the first text is a set of claims. The first information is part or all of the information in the first text, which is not limited here. In one embodiment, the first information is the information in the first text that describes the technical solution. Taking a set of claims as the first text, the first information may be one or more features of a claim, a sentence of a claim, or an entire claim, which is not limited here.
Optionally, the first information includes, but is not limited to, one or more of words, sentences or paragraphs. The user may select the first information in the first text as needed, or the system may select it by default, which is not limited here. Optionally, there are one or more pieces of first information. Taking claims as the first information, when there are multiple pieces of first information, multiple claims in the first text can be matched at the same time to find similar second information in the second text, which greatly improves the efficiency of document comparison.
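Splitting a text into pieces of information at word, sentence or paragraph granularity can be sketched in Python as below. This is an assumption-laden illustration: the boundary rules (blank lines separate paragraphs; `.`, `;`, `!`, `?` and their full-width Chinese forms end a sentence or claim feature) and the name `split_information` are hypothetical, and word splitting on `\w+` is only meaningful for space-delimited languages.

```python
import re

def split_information(text, level="sentence"):
    """Split a text into pieces of 'information' at the requested granularity."""
    if level == "paragraph":
        # Hypothetical rule: paragraphs are separated by blank lines
        parts = re.split(r"\n\s*\n", text)
    elif level == "sentence":
        # Hypothetical rule: split after . ; ! ? and the full-width 。；！？
        parts = re.split(r"(?<=[.;!?。；！？])\s*", text)
    elif level == "word":
        parts = re.findall(r"\w+", text)
    else:
        raise ValueError(f"unknown level: {level}")
    # Drop the empty fragments left at boundaries
    return [p.strip() for p in parts if p.strip()]
```

At sentence level, the claim `"A device comprising: a sensor; a processor; and a display."` yields the three features `"A device comprising: a sensor;"`, `"a processor;"` and `"and a display."`, each of which could be matched independently as one piece of first information.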
S120. Acquire a second text, where the second text includes one or more pieces of second information.
The second text is a text to be compared with the first text to determine whether it is similar to the technical solution described in the first text. In this embodiment, the second text may be a technical document, a book, a patent document, etc., or part of one, which is not limited here. In one embodiment, the second text is the target comparison document. The second information is part or all of the information in the second text, and there are one or more pieces of second information. In one embodiment, the second information is the information in the second text that describes the technical solution. Taking a similar patent document as the second text, the second information may be the entire description, a paragraph of the description, or a sentence or word in the description, which is not limited here.
Optionally, the second information includes one or more of words, sentences, or paragraphs.
The second text can be obtained by manually importing an existing text into the apparatus for navigating and browsing text information. For example, if a user finds a text that appears similar to the first text, the user can download it and import it into the apparatus, which then compares it with the first information in the first text to determine the similar parts and their positions.
S130. Match the first information and the second information to determine the similarity between the second information and the first information.
The similarity is the degree to which the first information and the second information resemble each other, and matching means comparing the first information with the second information to determine that similarity. The similarity may be expressed as a percentage or as a color, for example green for low similarity and red for high similarity; the form of the similarity is not limited here. Matching the first information against the second information locates the positions in the second text that are similar to the first information.
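The apparatus bullets name a cosine similarity model as one way to score a match. A minimal Python sketch follows, assuming plain term-frequency vectors rather than trained word vectors (the word-vector summation model the source also mentions is not shown). The two-color mapping and its 0.5 threshold are hypothetical choices that only follow the green/red example above.

```python
import math
import re
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def as_percentage(score):
    # Present the similarity as a whole-number percentage
    return f"{round(score * 100)}%"

def as_color(score, threshold=0.5):
    # Hypothetical two-color scheme following the example: red = high, green = low
    return "red" if score >= threshold else "green"
```

For instance, `cosine_similarity("temperature sensor", "temperature sensor housing")` is 2 / (√2 · √3) ≈ 0.82, which would be shown as 82% and colored red under this threshold.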
S140. Navigate and browse the second text according to the similarity.
Navigation and browsing means locating, in the second text, the second information similar to the first information by matching similarity, so that it can be browsed quickly without manual searching. In one implementation, navigation marks of different colors can be placed beside the second text, each corresponding to the line position of a piece of second information similar to the first information; the user can use these marks to switch quickly among the pieces of second information most similar to the first information. In an alternative embodiment, a quick-browse window can extract excerpts of the second information similar to the first information, sorted by similarity, as a browsing index for the second text; clicking an excerpt takes the user to the corresponding second information in the second text. In this embodiment, matching the similarity between the first information and the second information quickly yields the second information in the second text that is similar to the first information, which greatly improves comparison efficiency.
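Both navigation aids described above, colored marks at line positions and a similarity-sorted quick-browse index, can be sketched in Python. The sketch assumes the second text is a list of lines and scores each line by the fraction of the first information's words it contains, which is a placeholder rather than the source's similarity model; the thresholds, colors, snippet length and the name `build_navigation` are hypothetical.

```python
import re

def overlap(first, line):
    # Placeholder score: fraction of the first information's words found in the line
    f = set(re.findall(r"\w+", first.lower()))
    words = set(re.findall(r"\w+", line.lower()))
    return len(f & words) / len(f) if f else 0.0

def build_navigation(first, second_lines, threshold=0.5, snippet_len=40):
    """Return (navigation marks, quick-browse index) for one piece of first information."""
    marks, index = [], []
    for line_no, line in enumerate(second_lines, start=1):
        score = overlap(first, line)
        if score < threshold:
            continue
        # Mark beside the line; a stronger color for higher similarity (hypothetical scheme)
        marks.append((line_no, "red" if score >= 0.8 else "orange"))
        index.append((score, line_no, line[:snippet_len]))
    # Quick-browse index: most similar excerpt first; clicking it would jump to line_no
    index.sort(key=lambda e: (-e[0], e[1]))
    return marks, index
```

The marks stay in document order so they can be drawn along the scrollbar, while the index is re-sorted by similarity so the most relevant excerpt appears at the top of the browsing window.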
Optionally, step S140 of navigating and browsing the second text according to the similarity may include:
displaying the second information and the first information on a navigation browsing interface according to the similarity.
The navigation browsing interface is an interface that displays the similarity matching results and is used to find similar locations and content. A similarity matching result gives the similarity between a piece of first information and one or more pieces of second information, reflecting how similar the second information in the second text is to the first information in the first text. The result may display, in full, one or more second texts similar to the first information describing the technical solution, or may display only the parts of one or more second texts that are similar to the first information, which is not limited here.
Optionally, before step S130, the method may include:
receiving chapter selection information for the second text; and extracting the corresponding chapter based on the chapter selection information as the second information.
A chapter is part of the content of the second text. Taking a patent document as the second text, a chapter may be the claims or the description, or the background, the description of the drawings, or the detailed description; how chapters are divided is not limited here. Selecting specific chapters before comparison restricts similarity matching to the relevant content, so that the first information and the second information can be matched more precisely.
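Chapter selection can be sketched in Python by splitting a document at known section headings and keeping only the selected chapters. This assumes each heading sits alone on its own line; the heading list, the function names `split_chapters` and `select_chapters`, and the plain-text document layout are hypothetical, since real patent documents use varying section titles and formats.

```python
import re

# Hypothetical heading list; real documents may use other section titles
HEADINGS = ("Claims", "Technical field", "Background", "Summary of the invention",
            "Description of the drawings", "Detailed description")

def split_chapters(document):
    """Map each known heading to the text that follows it, up to the next heading."""
    pattern = re.compile(r"^(%s)\s*$" % "|".join(map(re.escape, HEADINGS)), re.MULTILINE)
    matches = list(pattern.finditer(document))
    chapters = {}
    for k, m in enumerate(matches):
        end = matches[k + 1].start() if k + 1 < len(matches) else len(document)
        chapters[m.group(1)] = document[m.end():end].strip()
    return chapters

def select_chapters(document, selection):
    """Keep only the chapters named in the chapter selection information, in that order."""
    chapters = split_chapters(document)
    return [chapters[name] for name in selection if name in chapters]
```

The selected chapters would then be passed on as the second information, so that matching ignores, for example, the background section when only the detailed description is of interest.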
Optionally, after step S140, the method may include:
sorting the second information according to the similarity. Sorting the similarity matching results in descending order lets the user view the most similar content first and saves the time spent comparing degrees of similarity.
Optionally, the navigation browsing interface further includes a similarity marker: based on the similarity matching result of the first information and the second information, the similar parts of the first information and the second information are emphasized, helping the user locate similar content as quickly as possible. The emphasis may take the form of highlighting, which is not limited here.
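Highlighting the similar parts of the first and second information can be sketched with Python's standard `difflib.SequenceMatcher`, which finds runs of words shared by both texts. The sketch assumes whitespace tokenization, HTML-style `<mark>` tags, and a minimum run of two words to skip isolated coincidences; the name `highlight_similar` and these choices are hypothetical rather than taken from the source.

```python
from difflib import SequenceMatcher

def highlight_similar(first, second, min_run=2, tag="mark"):
    """Wrap word runs shared by both texts in <tag>...</tag> in each text."""
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(None, [w.lower() for w in a], [w.lower() for w in b], autojunk=False)
    hit_a, hit_b = set(), set()
    for block in matcher.get_matching_blocks():
        if block.size >= min_run:  # ignore isolated single-word coincidences
            hit_a.update(range(block.a, block.a + block.size))
            hit_b.update(range(block.b, block.b + block.size))

    def render(words, hits):
        out, i = [], 0
        while i < len(words):
            if i in hits:
                # Collect a contiguous run of highlighted words into one tag
                j = i
                while j < len(words) and j in hits:
                    j += 1
                out.append(f"<{tag}>{' '.join(words[i:j])}</{tag}>")
                i = j
            else:
                out.append(words[i])
                i += 1
        return " ".join(out)

    return render(a, hit_a), render(b, hit_b)
```

Because `SequenceMatcher` only reports blocks that appear in the same relative order in both texts, a shared word such as "the" that occurs out of order is left unhighlighted.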
Optionally, the search results include a switching control, which is used to switch the display among multiple pieces of second information. The switching control may switch to the previous or next item, or to the more similar or next most similar part; how the display is switched is not limited here.
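The switching control can be modelled as a cursor over the matches ordered by similarity, so that "next" moves to the next most similar part and "previous" moves back. This Python sketch assumes each match is a `(similarity, location)` pair and wraps around at either end; the class name `MatchSwitcher`, its methods, and the wrap-around behavior are hypothetical.

```python
class MatchSwitcher:
    """Cycle through matched pieces of second information, most similar first."""

    def __init__(self, matches):
        # matches: (similarity, location) pairs; location is whatever the viewer scrolls to
        self.items = sorted(matches, key=lambda m: -m[0])
        self.pos = 0

    def current(self):
        return self.items[self.pos] if self.items else None

    def next_match(self):
        # Move to the next (less similar) match, wrapping around at the end
        if self.items:
            self.pos = (self.pos + 1) % len(self.items)
        return self.current()

    def prev_match(self):
        # Move to the previous (more similar) match, wrapping around at the start
        if self.items:
            self.pos = (self.pos - 1) % len(self.items)
        return self.current()
```

Ordering by similarity rather than by position in the document makes "next" mean "next most similar", which is one of the two switching behaviors the paragraph describes; sorting by location instead would give plain previous/next navigation.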
Optionally, acquiring the second text includes: receiving retrieval information based on the first text; and retrieving, from a database and based on the retrieval information, the second text similar to the first text. The retrieval information may be the text or graphic part of the first text relating to the first technical feature, or may be generated automatically from the first information, which is not limited here.
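Retrieving candidate second texts from a database can be sketched with a small in-memory inverted index. This is a stand-in under stated assumptions: the database is a dictionary from document ID to text, the retrieval information is a list of terms (for example, key features extracted from the first information), and documents are ranked simply by how many retrieval terms they contain. The names `build_index` and `retrieve` and the IDs in the example are hypothetical.

```python
import re
from collections import defaultdict

def build_index(database):
    """Inverted index: term -> set of document ids (database maps id -> text)."""
    index = defaultdict(set)
    for doc_id, text in database.items():
        for term in set(re.findall(r"\w+", text.lower())):
            index[term].add(doc_id)
    return index

def retrieve(index, retrieval_terms, top_k=3):
    """Rank documents by how many retrieval terms they contain (a simple stand-in ranking)."""
    hits = defaultdict(int)
    for term in {t.lower() for t in retrieval_terms}:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Most matching terms first; ties broken by document id
    return sorted(hits, key=lambda d: (-hits[d], d))[:top_k]
```

With a database `{"CN1": "temperature sensor housing", "CN2": "steel housing", "CN3": "pressure valve"}` and retrieval terms `["Temperature", "sensor", "housing"]`, the ranking is `["CN1", "CN2"]`; the top result would then be acquired as the second text.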
本发明实施例的技术方案,通过获取第一文本,所述第一文本包括一个或多个第一信息;获取第二文本,所述第二文本包括一个或多个第二信息;匹配所述第一信息和所述第二信息以确定所述第二信息和所述第一信息的相似度;根据所述相似度对所述第二文本进行导航浏览。通过自动匹配相似度,在第二文本中自动查找与第一信息相似或相同的信息,可以快速地确认在第二文本中,哪些部分与第一文本中的第一信息相似,不需要通过人工在第二文本中寻找与第一信息相关的内容。可以针对匹配结果有目的的进行细节确认,达到提高检索文件的效率的效果。解决了通过人工方式在对比文件中查找相似或相同的内容进行比对,比对文件的效率非常低的问题,实现了自动查找相似或相同的内容以提高比对文件的效率的效果。According to the technical solution of the embodiment of the present invention, by obtaining a first text, the first text includes one or more pieces of first information; obtaining a second text, where the second text includes one or more pieces of second information; The first information and the second information determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. Through automatic matching of similarity, the second text can automatically find information that is similar or identical to the first information, which can quickly confirm which parts of the second text are similar to the first information in the first text without manual Look for content related to the first message in the second text. It is possible to purposefully confirm the details of the matching results to achieve the effect of improving the efficiency of document retrieval. It solves the problem that the efficiency of comparing files is very low by manually searching for similar or identical content in the comparison files, and it realizes the effect of automatically searching for similar or identical content to improve the efficiency of comparing files.
Embodiment 2
FIG. 2 is a schematic flowchart of a method for navigating and browsing text information according to Embodiment 2 of the present invention. This embodiment builds on the technical solution above and is suitable for scenarios in which texts are compared. The method may be executed by a text information navigation and browsing apparatus. The apparatus may be implemented in software and/or hardware and may be integrated in a server.
As shown in FIG. 2, the method for navigating and browsing text information provided by Embodiment 2 of the present invention includes the following steps.
S210. Obtain a first text, the first text including one or more pieces of first information.
The first text is the text to be analyzed and compared. In this embodiment, the first text may be any of the following; this is not limited here:

- a technical document, such as a paper or a patent document;
- a technical disclosure;
- part of a patent document or technical disclosure, for example the claims, or the text describing the technical solution in a technical disclosure.

In one embodiment, the first text is a set of claims.

The first information is part or all of the information in the first text; this is not limited here. In one embodiment, the first information is the information in the first text that describes the technical solution. Taking a set of claims as the first text, the first information may be one or more features of a claim, a sentence of a claim, or an entire claim; this is not limited here.
Optionally, the first information includes, but is not limited to, one or more of words, sentences, or paragraphs.
S220. Obtain a second text, the second text including one or more pieces of second information.
The second text is compared with the first text to determine whether it is similar to the technical solution described in the first text. In this embodiment, the second text may be a technical document, a book, a patent document, or the like, or part of one; this is not limited here. In one embodiment, the second text is a target comparison document.

The second information is part or all of the information in the second text, and there may be one or more pieces of it. In one embodiment, the second information is the information in the second text that describes a technical solution. Taking a similar patent document as the second text, the second information may be the entire description, a paragraph of the description, or a sentence or word in the description; this is not limited here.
Optionally, the second information includes one or more of words, sentences, or paragraphs.
S230. Extract a first key feature from the first information.
The first key feature is a feature in the first information that relates to the first technical feature. The first information may be one or more of words, sentences, or paragraphs, so the first key feature may likewise be one or more of these:

- if the first information is a word, the first key feature is a word;
- if the first information is a sentence, the first key feature may be a sentence and/or words;
- if the first information is a paragraph, the first key feature may be a paragraph, sentences, and/or words.

In one embodiment, the first key feature is a keyword.
The first key feature may be extracted by a key feature extraction model. In one embodiment, the key feature extraction model is a TextRank model. TextRank is a graph-based ranking model for text. It works in three stages:

1. The text is split into constituent units (words or sentences).
2. A graph is built over these units.
3. A voting mechanism ranks the important components of the text.

Keyword extraction and summarization can therefore be performed using only the information in the single document itself.
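Below is a minimal sketch of TextRank keyword extraction. It assumes the first information has already been segmented into words with stop words removed; Chinese text needs a word segmenter before this step. The function name and the parameters are illustrative choices, not parameters taken from this application:

- a co-occurrence window of 2, so only adjacent words are linked;
- a damping factor of 0.85;
- a fixed iteration count.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by a PageRank-style vote over a co-occurrence graph."""
    # Link two distinct words when they fall inside the same sliding window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Each word distributes its score evenly to its neighbors on every pass.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = ["drone", "main", "control", "module", "drone",
          "detection", "module", "drone", "power", "module"]
print(textrank_keywords(tokens, top_k=2))
```

Words that co-occur with many others, here "drone" and "module", receive the highest scores and are returned as key features.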
For example, suppose the first information is: "An emergency parachute deployment system for an unmanned aerial vehicle (UAV), used to deploy a parachute when the UAV fails, characterized in that the UAV emergency parachute deployment system comprises a main control module, a detection module, a power management module, and a parachute deployment module".

The first key feature may then be individual words, such as UAV, parachute deployment system, main control module, detection module, power management module, and parachute deployment module. It may also be the whole clause "the UAV emergency parachute deployment system comprises a main control module, a detection module, a power management module, and a parachute deployment module". This is not limited here.
S240. Extract a second key feature from the second information.
The second key feature is a feature in the second information that relates to the second technical feature. The second information may be one or more of words, sentences, or paragraphs, so the second key feature may likewise be one or more of these:

- if the second information is a word, the second key feature is a word;
- if the second information is a sentence, the second key feature may be a sentence and/or words;
- if the second information is a paragraph, the second key feature may be a paragraph, sentences, and/or words.

In one embodiment, the second key feature is a keyword.
When the first key feature is a word, the second key feature may be a word, a sentence, or a paragraph.
The second key feature may be extracted by the key feature extraction model. In one embodiment, this model is the same TextRank model used to extract the first key feature. It ranks the words or sentences of a single document through a graph-based voting mechanism.
S250. Match the first key feature against the second key feature to determine the similarity between the second information and the first information.
When the first key feature is a word, the second key feature may be a word, a sentence, and/or a paragraph. In other words, a word serving as the first key feature may be compared with a word, sentence, and/or paragraph serving as the second key feature; this is not limited here. For example, suppose the first key feature is "drone" (无人机) and the second key feature is "unmanned aerial vehicle" (无人飞行器). The two key features can then be matched to determine the similarity between the first information and the second information.
The similarity may be expressed as a percentage or by color, for example green for low similarity and red for high similarity. The form in which similarity is expressed is not limited here. Matching the first key feature against the second key feature gives the similarity between the first information and the second information. That, in turn, gives the similarity between the first technical feature and the second technical feature.
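Below is one way to render a similarity score in both of the forms just mentioned. It returns a percentage label and a color interpolated linearly from green (low) to red (high). The function name and the linear RGB interpolation are assumptions for illustration.

```python
def similarity_display(score):
    """Render a similarity in [0, 1] as (percentage label, hex color)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # Red grows and green shrinks as similarity increases; blue stays at zero.
    red = round(255 * score)
    green = round(255 * (1 - score))
    return f"{score:.0%}", f"#{red:02x}{green:02x}00"


print(similarity_display(0.87))
```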
Optionally, in this embodiment, the similarity may be determined by a cosine similarity model and/or a word-vector similarity summation model:

- When the first key feature and the second key feature are both words, the similarity may be determined by the word-vector similarity summation model, i.e., a model trained using word-vector similarity summation.
- When the first key feature and the second key feature are both sentences or paragraphs, the similarity may be determined by the cosine similarity model, i.e., a model trained using the cosine similarity algorithm.

This embodiment does not limit the algorithm used to compute the similarity.
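The sketch below illustrates the two similarity measures under simple assumptions:

- `sentence_cosine` compares the term-frequency vectors of two token lists.
- `word_vector_sum_similarity` takes, for each word of the first key feature, its best cosine match among the words of the second key feature, sums these, and averages the result.

This best-match averaging is one plausible reading of "word-vector similarity summation". The embedding dictionary passed as `vectors` is hypothetical. The application specifies neither.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_cosine(tokens_a, tokens_b):
    """Cosine similarity of the term-frequency vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocabulary = sorted(set(counts_a) | set(counts_b))
    return cosine([counts_a[w] for w in vocabulary],
                  [counts_b[w] for w in vocabulary])


def word_vector_sum_similarity(words_a, words_b, vectors):
    """Average over words in A of each word's best cosine match among words in B."""
    if not words_a or not words_b:
        return 0.0
    total = sum(max(cosine(vectors[a], vectors[b]) for b in words_b)
                for a in words_a)
    return total / len(words_a)


# Hypothetical two-dimensional embeddings for illustration only.
vectors = {"drone": [1.0, 0.0], "uav": [1.0, 0.0], "bicycle": [0.0, 1.0]}
print(word_vector_sum_similarity(["drone"], ["uav", "bicycle"], vectors))
```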
S260. Navigate and browse the second text according to the similarity.
Navigation browsing means using the matched similarity to locate, in the second text, the second information that is similar to the first information. That information can then be browsed quickly without being searched for manually.
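Navigation can be sketched in three steps:

1. Score every paragraph of the second text against a piece of first information.
2. Keep the paragraphs whose score meets a threshold.
3. Order them so the interface can jump to the most similar paragraph first.

The `jaccard` word-overlap measure, the threshold value, and the function names are placeholders for whichever similarity model is actually used.

```python
def jaccard(a, b):
    """Placeholder similarity: word-set overlap of two whitespace-split strings."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def navigation_targets(paragraphs, first_info, similarity, threshold=0.5):
    """Return (paragraph index, score) pairs at or above threshold, best first."""
    hits = []
    for index, paragraph in enumerate(paragraphs):
        score = similarity(first_info, paragraph)
        if score >= threshold:
            hits.append((index, score))
    # Most similar first; earlier paragraphs win ties.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits


paragraphs = ["the drone has a parachute", "the frame is steel", "drone parachute"]
print(navigation_targets(paragraphs, "drone parachute", jaccard, threshold=0.3))
```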
Referring to FIG. 3, in an alternative embodiment, step S250 may be replaced by steps S251 to S253 below. Step S250 matches the first key feature against the second key feature to determine the similarity between the second information and the first information.
S251. Vectorize the first key feature based on a trained first comparison model to obtain a first vector result.
The first comparison model is a model that vectorizes the first key feature. In this embodiment, vectorization means representing text as a series of vectors that express its semantics. In one embodiment, the first comparison model includes a Word2vec (word-to-vector) model and/or a recursive autoencoder model, which is a recursive neural network. The choice depends on the first key feature:

- If it is a word, the first comparison model includes the Word2vec model.
- If it is a sentence or paragraph, the first comparison model includes the recursive autoencoder model.
- If it includes both words and sentences or paragraphs, the first comparison model includes both the Word2vec model and the recursive autoencoder model.

This embodiment does not limit which model is used as the first comparison model.
S252. Vectorize the second key feature based on a trained second comparison model to obtain a second vector result.
The second comparison model is a model that vectorizes the second key feature. In one embodiment, the second comparison model includes a Word2vec model and/or a recursive autoencoder model. The choice depends on the second key feature:

- If it is a word, the second comparison model includes the Word2vec model.
- If it is a sentence or paragraph, the second comparison model includes the recursive autoencoder model.
- If it includes both words and sentences or paragraphs, the second comparison model includes both the Word2vec model and the recursive autoencoder model.

This embodiment does not limit which model is used as the second comparison model. In one embodiment, the first comparison model and the second comparison model may be the same model or the same type of model.
S253. Match the first vector result against the second vector result to determine the similarity between the second information and the first information.
In this embodiment, the similarity is determined only after the first key feature and the second key feature have been vectorized. It is therefore based on the semantics of the key features rather than on a mechanical word-for-word comparison, which makes the matching result more accurate.
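The sketch below illustrates steps S251 to S253. A toy embedding table stands in for a trained Word2vec model. For sentences, the sketch averages word vectors; this is a simpler substitute for the recursive autoencoder named above, not that model itself. The resulting vectors are compared by cosine similarity. The table values and function names are hypothetical.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2vec model.
EMBEDDINGS = {
    "drone": [0.9, 0.1, 0.0],
    "uav": [0.85, 0.15, 0.0],
    "parachute": [0.1, 0.9, 0.0],
    "bicycle": [0.0, 0.1, 0.9],
}


def vectorize(key_feature):
    """S251/S252: map a word or whitespace-tokenized sentence to one vector.

    A single word is looked up directly; a sentence uses the mean of its word
    vectors (a stand-in for the recursive autoencoder). Returns None when no
    word is in the vocabulary.
    """
    vectors = [EMBEDDINGS[w] for w in key_feature.split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def match(first_feature, second_feature):
    """S253: cosine similarity between the two vector results."""
    u, v = vectorize(first_feature), vectorize(second_feature)
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


print(round(match("drone", "uav"), 3), round(match("drone", "bicycle"), 3))
```

Here "drone" and "uav" share no characters yet score close to 1. This is the semantic, rather than mechanical, matching described above.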
Referring to FIG. 4, step S230 (extracting a first key feature from the first information) optionally includes the following steps.
S231. Process the first information based on a preset rule to obtain a first processing result.
A preset rule is a rule for processing the first information; applying it to the first information yields the first processing result. This processing may include:

- obtaining text information, symbol information, and/or text structure information of the first information;
- processing the first information based on that text information, symbol information, and/or text structure information to obtain the first processing result.
The text information includes stop words, for example "said", "and", and "or"; this is not limited here. Processing the first information based on the text information includes:

- identifying the stop words in the first information;
- extracting the relevant information before and/or after those stop words.

For example, the first information may be a sentence or paragraph such as "said drone comprises a main control module and a flight module". Relevant information such as "drone", "main control module", and "flight module" is then extracted. Using the text information in this way lets key features be extracted quickly for similarity matching. Optionally, the text information may also include other relevant words; this is not limited here.
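The stop-word rule can be sketched as splitting the first information at stop words and keeping the word runs in between. The English stop-word list is a hypothetical stand-in for the Chinese stop words named in the text ("said", "and", "or"). It is extended with "comprises" and the articles so that the example sentence splits cleanly.

```python
import re

# Hypothetical English stand-ins for the Chinese stop words, plus "comprises"
# and articles so that the example below splits into its components.
STOP_WORDS = {"said", "and", "or", "comprises", "a", "an", "the"}


def split_on_stop_words(text):
    """Return the word runs lying between stop words, in order."""
    phrases, current = [], []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            # A stop word closes the current run, if any.
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(split_on_stop_words("said drone comprises a main control module and a flight module"))
```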
The symbol information includes semicolons and/or enumeration commas. The enumeration comma (顿号, "、") separates list items in Chinese. Processing the first information based on the symbol information includes extracting the relevant information before and/or after the semicolons and/or enumeration commas.

For example, the first information may be "said drone comprises a main control module, a flight module; said flight module comprises a power supply unit". In the Chinese original, the list items are separated by an enumeration comma. Relevant information such as "main control module", "flight module", and "said flight module" is then extracted. Optionally, the symbol information may also include other distinctive marker symbols; this is not limited here.
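The symbol rule can be sketched as follows. The first information is split at semicolons (ASCII or full-width) and at the Chinese enumeration comma "、", and the non-empty segments are kept. The separator set and the function name are assumptions. Further processing, such as the stop-word rule, would then trim each segment.

```python
import re

# Semicolons (ASCII and full-width) and the Chinese enumeration comma.
SEPARATORS = r"[;；、]"


def split_on_symbols(text):
    """Split at the separator symbols and return the trimmed, non-empty segments."""
    return [segment.strip() for segment in re.split(SEPARATORS, text) if segment.strip()]


claim = ("said drone comprises a main control module、a flight module; "
         "said flight module comprises a power supply unit")
print(split_on_symbols(claim))
```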
The text structure information includes a preamble and a characterizing part. Processing the first information based on the text structure information includes extracting the relevant information of the preamble and/or the characterizing part. For example, if the first information is "A drone, characterized by comprising a flight module", relevant information such as "drone" and "flight module" is extracted. Optionally, the text structure information may also include other kinds of text structure information; this is not limited here.
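The text-structure rule can be sketched as splitting a claim at its transitional phrase into a preamble and a characterizing part. The marker patterns are assumptions about typical claim wording, not an exhaustive list:

- "characterized in that";
- "characterized by";
- the Chinese "其特征在于".

```python
import re

# Hypothetical transitional phrases that separate preamble from characterizing part.
MARKER = re.compile(r"characterized (?:in that|by)|其特征在于", re.IGNORECASE)
# Spaces, commas, and colons (ASCII and full-width) left around the marker.
TRIM = " ,，:："


def split_claim(claim):
    """Return (preamble, characterizing part); the latter is '' if no marker is found."""
    found = MARKER.search(claim)
    if found is None:
        return claim.strip(), ""
    preamble = claim[:found.start()].strip(TRIM)
    characterizing = claim[found.end():].strip(TRIM)
    return preamble, characterizing


print(split_claim("A drone, characterized by comprising a flight module"))
```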
S232. Use the first processing result as the first key feature.
In this embodiment, the first information is processed according to preset rules to extract key features. This extraction method is simple and effective, and it improves the efficiency of document retrieval.
In the technical solution of this embodiment of the present invention:

- a first text comprising one or more pieces of first information is obtained;
- a second text comprising one or more pieces of second information is obtained;
- the first information is matched against the second information to determine the similarity between the second information and the first information;
- the second text is navigated and browsed according to the similarity.

Because similarity is matched automatically, information in the second text that is similar or identical to the first information is found automatically. The user can therefore quickly confirm which parts of the second text are similar to the first information in the first text, without searching the second text manually for related content. Details can then be checked in a targeted way against the matching results, which improves the efficiency of document retrieval.
Embodiment 3
FIG. 5 is a schematic structural diagram of a text information navigation and browsing apparatus according to Embodiment 3 of the present invention. This embodiment is applicable to scenarios in which texts are compared. The apparatus may be implemented in software and/or hardware and may be integrated in a server.
As shown in FIG. 5, the text information navigation and browsing apparatus provided in this embodiment may include a first obtaining module 310, a second obtaining module 320, a matching module 330, and a navigation browsing module 340, where:
- the first obtaining module 310 is configured to obtain a first text, the first text including one or more pieces of first information;
- the second obtaining module 320 is configured to obtain a second text, the second text including one or more pieces of second information;
- the matching module 330 is configured to match the first information against the second information to determine the similarity between the second information and the first information;
- the navigation browsing module 340 is configured to navigate and browse the second text according to the similarity.
Optionally, the navigation browsing module 340 includes a display unit configured to display the first information and the second information on a navigation browsing interface according to the similarity.
Optionally, the matching module 330 includes:

- a first extraction unit configured to extract a first key feature from the first information;
- a second extraction unit configured to extract a second key feature from the second information;
- a similarity matching unit configured to match the first key feature against the second key feature to determine the similarity between the second information and the first information.
Optionally, the text information navigation and browsing apparatus further includes:

- a first vectorization module configured to vectorize the first key feature based on the trained first comparison model to obtain a first vector result;
- a second vectorization module configured to vectorize the second key feature based on the trained second comparison model to obtain a second vector result.

In this case, the matching module 330 is configured to match the first vector result against the second vector result to determine the similarity between the second information and the first information.
Optionally, the first extraction unit includes a first processing subunit. The subunit is configured to process the first information based on a preset rule to obtain a first processing result, and to use the first processing result as the first key feature.
Optionally, the first processing subunit is configured to obtain text information, symbol information, and/or text structure information of the first information. It then processes the first information based on that information to obtain the first processing result.
Optionally, the text information includes stop words, and the first processing subunit is configured to identify the stop words in the first information and extract the relevant information before and/or after them.
Optionally, the symbol information includes semicolons and/or enumeration commas, and the first processing subunit is configured to extract the relevant information before and/or after the semicolons and/or enumeration commas.
Optionally, the text structure information includes a preamble and a characterizing part, and the first processing subunit is configured to extract the relevant information of the preamble and/or the characterizing part.
Optionally, the second obtaining module 320 includes:

- a receiving unit configured to receive search information based on the first text;
- a retrieval unit configured to retrieve from a database, based on the search information, the second text that is similar to the first text.
Optionally, the text information navigation and browsing apparatus further includes a section selection module. The module is configured to receive section selection information for the second text and, based on that information, to extract the corresponding section as the second information.
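Section selection can be sketched as grouping the lines of the second text under recognized headings and returning the body of the requested heading. The heading names, the one-heading-per-line layout, and the function name are hypothetical. Real documents would need a format-specific parser.

```python
# Hypothetical heading names; real patent documents may use other headings.
HEADINGS = ("Technical Field", "Background", "Summary", "Detailed Description")


def select_section(document, wanted):
    """Return the non-empty lines under heading `wanted`, or '' if absent."""
    sections, current = {}, None
    for line in document.splitlines():
        stripped = line.strip()
        if stripped in HEADINGS:
            # A heading line starts a new section.
            current = stripped
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return "\n".join(sections.get(wanted, []))


doc = "Technical Field\nDrones.\nSummary\nA parachute system.\nIt deploys on failure."
print(select_section(doc, "Summary"))
```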
Optionally, the text information navigation and browsing apparatus further includes a sorting module configured to sort the second information according to the similarity.
Optionally, the navigation browsing interface further includes a switching control configured to switch the display among multiple pieces of second information.
Optionally, the navigation browsing interface further includes a similarity marker. The display unit includes a highlighting unit configured to highlight the similar parts of the first information and the second information.
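Below is a minimal sketch of highlighting: every word of the second information that also appears in the first information is wrapped in marker tags. It assumes whitespace-delimited or pre-segmented text and exact, case-insensitive word matches. Semantic matches would need the similarity models described earlier. The function name and the `<mark>` default are illustrative.

```python
import re


def highlight_shared(first_info, second_info, open_tag="<mark>", close_tag="</mark>"):
    """Wrap words of second_info that also occur in first_info in marker tags."""
    shared = {word.lower() for word in re.findall(r"\w+", first_info)}

    def wrap(found):
        word = found.group(0)
        return f"{open_tag}{word}{close_tag}" if word.lower() in shared else word

    # Punctuation and spacing are untouched because only \w+ runs are replaced.
    return re.sub(r"\w+", wrap, second_info)


print(highlight_shared("drone parachute", "The drone carries a parachute."))
```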
Optionally, the key features are extracted by a TextRank model.
Optionally, the similarity is determined by a cosine similarity model and/or a word-vector similarity summation model.
Optionally, the comparison models include a Word2vec model and/or a recursive autoencoder model (a recursive neural network).
Optionally, the first information and the second information include one or more of words, sentences, or paragraphs.
Optionally, the first text is a set of claims.
Optionally, the second text is a target comparison document.
The text information navigation and browsing apparatus provided by this embodiment can execute the text information navigation and browsing method provided by any embodiment of the present invention. It has the functional modules and effects corresponding to that method. For details not described in this embodiment, refer to the description in any method embodiment of the present invention.
Embodiment 4
FIG. 6 is a schematic structural diagram of a server according to Embodiment 4 of the present invention. It shows a block diagram of an exemplary server 612 suitable for implementing embodiments of the present invention. The server 612 shown in FIG. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 6, the server 612 takes the form of a general-purpose server. Its components may include, but are not limited to:

- one or more processors 616;
- a storage device 628;
- a bus 618 connecting the various system components, including the storage device 628 and the processor 616.
The bus 618 represents one or more of several types of bus structures. These include a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to:

- the Industry Standard Architecture (ISA) bus;
- the Micro Channel Architecture (MCA) bus;
- the Enhanced ISA bus;
- the Video Electronics Standards Association (VESA) local bus;
- the Peripheral Component Interconnect (PCI) bus.
The server 612 includes a variety of computer-system-readable media. These may be any available media accessible by the server 612, including volatile and non-volatile media and removable and non-removable media.
The storage device 628 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 630 and/or cache memory 632. The server 612 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 634 may be configured to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6; commonly called a "hard drive").

Although not shown in FIG. 6, the following drives may also be provided:

- a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk, such as a "floppy disk";
- an optical disc drive for reading from and writing to a removable non-volatile optical disc, such as a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media.

In these cases, each drive may be connected to the bus 618 through one or more data media interfaces. The storage device 628 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 640 having a set of (at least one) program modules 642 may be stored, for example, in the storage device 628. Such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a networking environment. The program modules 642 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The server 612 may also communicate with:

- one or more external devices 614, such as a keyboard, a pointing device, or a display 624;
- one or more devices that enable a user to interact with the server 612;
- any device, such as a network card or modem, that enables the server 612 to communicate with one or more other computing devices.

Such communication may take place through an input/output (I/O) interface 622. In addition, the server 612 may communicate through a network adapter 620 with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet. As shown in FIG. 6, the network adapter 620 communicates with the other modules of the server 612 through the bus 618.

It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the server 612. These include, but are not limited to, microcode, device drivers, redundant processors, external disk drive arrays, redundant array of independent disks (RAID) systems, tape drives, and data backup storage systems.
By running the programs stored in the storage device 628, the processor 616 performs various functional applications and data processing, for example implementing the method for navigating and browsing text information provided by any embodiment of the present invention. The method may include: obtaining a first text, the first text including one or more pieces of first information; obtaining a second text, the second text including one or more pieces of second information; matching the first information with the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.
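The four method steps (obtain the first text, obtain the second text, match, navigate by similarity) can be illustrated with a minimal Python sketch. It assumes that each text is split into sentence-level units that serve as the pieces of first and second information, and it uses bag-of-words cosine similarity as a simple stand-in for the matching step. The function names `split_units`, `bow_cosine`, and `navigate` and the splitting rule are hypothetical, not the patent's implementation.

```python
import math
import re
from collections import Counter


def split_units(text: str) -> list[str]:
    """Split a text into sentence-level units (a hypothetical splitting rule)."""
    parts = re.split(r"[.;。；]\s*", text)
    return [p.strip() for p in parts if p.strip()]


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two strings."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def navigate(first_text: str, second_text: str, top_k: int = 3) -> dict[str, list[tuple[float, str]]]:
    """For each piece of first information, rank the pieces of second information by similarity."""
    first_units = split_units(first_text)
    second_units = split_units(second_text)
    result: dict[str, list[tuple[float, str]]] = {}
    for f in first_units:
        scored = [(bow_cosine(f, s), s) for s in second_units]
        # Highest similarity first, so a viewer can jump straight to the best match.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[f] = scored[:top_k]
    return result
```

For example, `navigate("A rotor driven by a motor.", "The motor drives a rotor. A cover is made of steel.")` ranks "The motor drives a rotor" first for the single claim sentence. A real system would likely replace the bag-of-words score with one of the trained models named in the claims.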
In the technical solution of this embodiment of the present invention, a first text including one or more pieces of first information is obtained; a second text including one or more pieces of second information is obtained; the first information is matched with the second information to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. By matching similarity automatically, information in the second text that is similar or identical to the first information is found automatically, so the user can quickly see which parts of the second text resemble the first information in the first text without manually searching the second text for content related to the first information. Details can then be verified in a targeted way against the matching results, which improves the efficiency of document retrieval.
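Automatically locating the parts of a second text that resemble a piece of first information, and marking them for the reader (as claim 13 describes with highlighting), can be sketched with Python's standard `difflib.SequenceMatcher`. This is an illustrative stand-in: the patent does not say how similar parts are located, and the function name `highlight_similar`, the `min_len` threshold, and the `[[...]]` markers are assumptions chosen for a plain-text display.

```python
from difflib import SequenceMatcher


def highlight_similar(first_info: str, second_info: str, min_len: int = 2) -> str:
    """Wrap word runs in second_info that also occur in first_info with [[...]] markers.

    Runs shorter than min_len words are skipped so isolated common words are not highlighted.
    """
    a = first_info.split()
    b = second_info.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out: list[str] = []
    pos = 0  # index of the next word in b that has not been emitted yet
    for block in matcher.get_matching_blocks():
        if block.size < min_len:
            continue  # also skips the final zero-length sentinel block
        out.extend(b[pos:block.b])
        out.append("[[" + " ".join(b[block.b:block.b + block.size]) + "]]")
        pos = block.b + block.size
    out.extend(b[pos:])
    return " ".join(out)
```

For example, `highlight_similar("a wheel driven by a motor", "the gear is driven by a motor")` returns `"the gear is [[driven by a motor]]"`. Working on word tokens rather than characters keeps highlights aligned to whole words.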
Embodiment 5
Embodiment 5 of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for navigating and browsing text information provided by any embodiment of the present invention. The method may include: obtaining a first text, the first text including one or more pieces of first information; obtaining a second text, the second text including one or more pieces of second information; matching the first information with the second information to determine the similarity between the second information and the first information; and navigating and browsing the second text according to the similarity.
The computer-readable storage medium of this embodiment of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a storage medium may be transmitted over any suitable medium, including but not limited to wireless, wireline, optical cable, radio frequency (RF), or any suitable combination of the foregoing.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the technical solution of this embodiment of the present invention, a first text including one or more pieces of first information is obtained; a second text including one or more pieces of second information is obtained; the first information is matched with the second information to determine the similarity between the second information and the first information; and the second text is navigated and browsed according to the similarity. By matching similarity automatically, information in the second text that is similar or identical to the first information is found automatically, so the user can quickly see which parts of the second text resemble the first information in the first text without manually searching the second text for content related to the first information. Details can then be verified in a targeted way against the matching results, which improves the efficiency of document retrieval.

Claims (31)

  1. A method for navigating and browsing text information, comprising:
    obtaining a first text, wherein the first text comprises first information;
    obtaining a second text, wherein the second text comprises second information;
    matching the first information with the second information to determine a similarity between the second information and the first information; and
    navigating and browsing the second text according to the similarity.
  2. The method according to claim 1, wherein navigating and browsing the second text according to the similarity comprises:
    displaying the second information and the first information on a navigation browsing interface according to the similarity.
  3. The method according to claim 1, wherein matching the first information with the second information to determine the similarity between the second information and the first information comprises:
    extracting a first key feature from the first information;
    extracting a second key feature from the second information; and
    matching the first key feature with the second key feature to determine the similarity between the second information and the first information.
  4. The method according to claim 3, further comprising, before matching the first key feature with the second key feature to determine the similarity between the second information and the first information:
    vectorizing the first key feature based on a trained first comparison model to obtain a first vector result; and
    vectorizing the second key feature based on a trained second comparison model to obtain a second vector result;
    wherein matching the first key feature with the second key feature to determine the similarity between the second information and the first information comprises:
    matching the first vector result with the second vector result to determine the similarity between the second information and the first information.
  5. The method according to claim 3, wherein extracting the first key feature from the first information comprises:
    processing the first information based on a preset rule to obtain a first processing result; and
    using the first processing result as the first key feature.
  6. The method according to claim 5, wherein processing the first information based on the preset rule to obtain the first processing result comprises:
    obtaining at least one of text information, symbol information, and text structure information of the first information; and
    processing the first information based on the obtained information to obtain the first processing result.
  7. The method according to claim 6, wherein the text information comprises stop words, and processing the first information based on the text information to obtain the first processing result comprises:
    analyzing the first information to identify the stop words in it;
    extracting at least one of the related information preceding a stop word and the related information following the stop word; and
    using the extracted related information as the first processing result.
  8. The method according to claim 6, wherein the symbol information comprises at least one of a semicolon and an enumeration comma (、), and processing the first information based on the symbol information to obtain the first processing result comprises:
    extracting at least one of the following: related information before the semicolon, related information before the enumeration comma, related information after the semicolon, and related information after the enumeration comma; and
    using the extracted related information as the first processing result.
  9. The method according to claim 6, wherein the text structure information comprises a preamble portion and a characterizing portion, and processing the first information based on the text structure information to obtain the first processing result comprises:
    extracting at least one of related information of the preamble portion and related information of the characterizing portion; and
    using the extracted related information as the first processing result.
  10. The method according to claim 1, further comprising, before matching the first information with the second information to determine the similarity between the second information and the first information:
    receiving chapter selection information for the second text; and
    extracting, based on the chapter selection information, the corresponding chapter as the second information.
  11. The method according to claim 2, further comprising, after displaying the second information and the first information on the navigation browsing interface according to the similarity:
    sorting a plurality of pieces of second information according to the similarity.
  12. The method according to claim 2, wherein the navigation browsing interface further comprises a switching control configured to switch the display among a plurality of pieces of second information.
  13. The method according to claim 2, wherein the navigation browsing interface further comprises a similarity indicator, and displaying the first information and the second information on the navigation browsing interface according to the similarity comprises:
    highlighting similar parts of the first information and the second information on the navigation browsing interface.
  14. The method according to claim 3, wherein the first key feature and the second key feature are both extracted using a TextRank model.
  15. The method according to claim 1, wherein the similarity is determined by at least one of a cosine similarity model and a word-vector similarity summation model.
  16. The method according to claim 4, wherein the first comparison model and the second comparison model each comprise at least one of a word vector model and a recurrent neural network model.
  17. The method according to claim 1, wherein the first information and the second information each comprise at least one of words, sentences, and paragraphs.
  18. The method according to claim 1, wherein the first text is a set of claims.
  19. The method according to claim 1, wherein the second text is a target comparison document.
  20. An apparatus for navigating and browsing text information, comprising:
    a first obtaining module, configured to obtain a first text, wherein the first text comprises first information;
    a second obtaining module, configured to obtain a second text, wherein the second text comprises second information;
    a matching module, configured to match the first information with the second information to determine a similarity between the second information and the first information; and
    a navigation browsing module, configured to navigate and browse the second text according to the similarity.
  21. The apparatus according to claim 20, wherein the navigation browsing module comprises:
    a display unit, configured to display the second information and the first information on a navigation browsing interface according to the similarity.
  22. The apparatus according to claim 20, wherein the matching module comprises:
    a first extraction unit, configured to extract a first key feature from the first information;
    a second extraction unit, configured to extract a second key feature from the second information; and
    a similarity matching unit, configured to match the first key feature with the second key feature to determine the similarity between the second information and the first information.
  23. The apparatus according to claim 22, further comprising:
    a first vectorization module, configured to vectorize the first key feature based on a trained first comparison model to obtain a first vector result; and
    a second vectorization module, configured to vectorize the second key feature based on a trained second comparison model to obtain a second vector result;
    wherein the matching module is configured to match the first vector result with the second vector result to determine the similarity between the second information and the first information.
  24. The apparatus according to claim 22, wherein the first extraction unit comprises:
    a first processing subunit, configured to process the first information based on a preset rule to obtain a first processing result, and to use the first processing result as the first key feature.
  25. The apparatus according to claim 24, wherein the first processing subunit is configured to process the first information based on the preset rule to obtain the first processing result by:
    obtaining at least one of text information, symbol information, and text structure information of the first information; and processing the first information based on the obtained information to obtain the first processing result.
  26. The apparatus according to claim 20, wherein the second obtaining module comprises:
    a receiving unit, configured to receive retrieval information based on the first text; and
    a retrieval unit, configured to retrieve, from a database and based on the retrieval information, the second text similar to the first text.
  27. The apparatus according to claim 20, further comprising:
    a chapter selection module, configured to receive chapter selection information for the second text and to extract, based on the chapter selection information, the corresponding chapter as the second information.
  28. The apparatus according to claim 21, further comprising:
    a sorting module, configured to sort a plurality of pieces of second information according to the similarity.
  29. The apparatus according to claim 21, wherein the navigation browsing interface further comprises a similarity indicator, and the display unit comprises:
    a highlighting unit, configured to highlight similar parts of the first information and the second information on the navigation browsing interface.
  30. A server, comprising:
    at least one processor; and
    a storage device, configured to store at least one program;
    wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method for navigating and browsing text information according to any one of claims 1 to 19.
  31. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method for navigating and browsing text information according to any one of claims 1 to 19.
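Claims 5 to 9 describe extracting key features with preset rules based on stop words, on semicolons and enumeration commas, and on the preamble/characterizing structure of a claim. A minimal Python sketch of such rules follows. The stop-word list, the split markers (`characterized in that`, `其特征在于`), and the function names are assumptions for illustration, since the claims do not specify them; the stop-word splitter also assumes whitespace-tokenized text, so Chinese input would first need a word segmenter.

```python
import re

# Hypothetical stop words; the claims do not list the actual ones.
STOP_WORDS = {"a", "an", "the", "and", "comprising", "wherein"}

# Markers assumed to separate the preamble from the characterizing portion.
CHARACTERIZING_MARKERS = ("characterized in that", "其特征在于")


def split_by_symbols(text: str) -> list[str]:
    """Claim 8 style: split at semicolons (ASCII or full-width) and enumeration commas (、)."""
    return [p.strip() for p in re.split(r"[;；、]", text) if p.strip()]


def split_by_stop_words(text: str) -> list[str]:
    """Claim 7 style: keep the fragments before and after each stop word, dropping the stop words."""
    fragments: list[str] = []
    current: list[str] = []
    for word in text.split():
        if word.lower().strip(",:") in STOP_WORDS:
            if current:
                fragments.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        fragments.append(" ".join(current))
    return fragments


def split_by_structure(text: str) -> tuple[str, str]:
    """Claim 9 style: return (preamble, characterizing portion); the latter is empty if no marker is found."""
    for marker in CHARACTERIZING_MARKERS:
        idx = text.find(marker)
        if idx != -1:
            return text[:idx].strip(" ,，"), text[idx + len(marker):].strip(" ,，:：")
    return text.strip(), ""
```

For example, `split_by_structure("A fan, characterized in that the blade is curved.")` returns `("A fan", "the blade is curved.")`, and either part could then be used as a key feature for matching.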
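Claim 14 names a TextRank model for extracting key features. A minimal, dependency-free sketch of TextRank keyword ranking is given below: words that co-occur within a sliding window are linked in an undirected graph, and a PageRank-style iteration scores them. The window size, damping factor, and iteration count are common defaults assumed here rather than values from the patent, and the input is assumed to be already tokenized.

```python
from collections import defaultdict


def textrank_keywords(tokens: list[str], window: int = 2, damping: float = 0.85,
                      iterations: int = 50, top_k: int = 5) -> list[str]:
    """Rank tokens by TextRank over a co-occurrence graph and return the top_k tokens."""
    graph: dict[str, set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each word to the words that follow it inside the window.
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        # Synchronous update: every new score is computed from the previous round.
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    return sorted(scores, key=lambda w: scores[w], reverse=True)[:top_k]
```

In a token sequence where one word co-occurs with every other word, that word ends up ranked first, which matches the intuition that TextRank favours well-connected terms.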
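Claims 4, 15 and 16 describe vectorizing key features with a trained comparison model (for example a word vector model) and scoring them with a cosine similarity model or a word-vector similarity summation model. The sketch below uses a toy in-memory embedding table (`EMBEDDINGS`, hypothetical values) in place of a trained model, averages word vectors to vectorize a key feature, and interprets "similarity summation" as summing, for each word of the first key feature, its best cosine match among the words of the second key feature, normalized by the number of words. That interpretation is an assumption, not a definition taken from the patent.

```python
import math

Vector = list[float]

# Toy embeddings standing in for a trained word vector model (hypothetical values).
EMBEDDINGS: dict[str, Vector] = {
    "motor": [0.9, 0.1, 0.0],
    "engine": [0.85, 0.15, 0.05],
    "rotor": [0.2, 0.9, 0.1],
    "blade": [0.1, 0.8, 0.3],
    "steel": [0.0, 0.1, 0.95],
}
DIM = len(next(iter(EMBEDDINGS.values())))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vectorize(words: list[str]) -> Vector:
    """Average the embeddings of known words (one simple vectorization choice)."""
    known = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


def summed_similarity(first: list[str], second: list[str]) -> float:
    """Sum each first word's best cosine match in second, normalized by the number of first words."""
    firsts = [EMBEDDINGS[w] for w in first if w in EMBEDDINGS]
    seconds = [EMBEDDINGS[w] for w in second if w in EMBEDDINGS]
    if not firsts or not seconds:
        return 0.0
    total = sum(max(cosine(f, s) for s in seconds) for f in firsts)
    return total / len(firsts)
```

Unlike exact word matching, both scores treat "motor" and "engine" as close, which is the main reason to prefer a word vector model when claims and reference documents use different terminology.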
PCT/CN2020/110994 2019-08-30 2020-08-25 Text information navigation and browsing method, apparatus, server and storage medium WO2021037012A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910816838.0 2019-08-30
CN201910816838.0A CN112445891A (en) 2019-08-30 2019-08-30 Text information navigation browsing method, device, server and storage medium

Publications (1)

Publication Number Publication Date
WO2021037012A1 true WO2021037012A1 (en) 2021-03-04

Family

ID=74684562

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110994 WO2021037012A1 (en) 2019-08-30 2020-08-25 Text information navigation and browsing method, apparatus, server and storage medium

Country Status (2)

Country Link
CN (1) CN112445891A (en)
WO (1) WO2021037012A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789452A (en) * 2011-05-16 2012-11-21 株式会社日立制作所 Similar content extraction method
US20130054612A1 (en) * 2006-10-10 2013-02-28 Abbyy Software Ltd. Universal Document Similarity
US9852337B1 (en) * 2015-09-30 2017-12-26 Open Text Corporation Method and system for assessing similarity of documents
CN108763486A (en) * 2018-05-30 2018-11-06 湖南写邦科技有限公司 Paper duplicate checking method, terminal and storage medium based on terminal
CN110162630A (en) * 2019-05-09 2019-08-23 深圳市腾讯信息技术有限公司 A kind of method, device and equipment of text duplicate removal

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN108920633B (en) * 2018-07-01 2021-12-03 湖北通远格知科技有限公司 Paper similarity detection method
CN109408826A (en) * 2018-11-07 2019-03-01 北京锐安科技有限公司 A kind of text information extracting method, device, server and storage medium


Also Published As

Publication number Publication date
CN112445891A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN108875067B (en) Text data classification method, device, equipment and storage medium
US10657325B2 (en) Method for parsing query based on artificial intelligence and computer device
WO2021017721A1 (en) Intelligent question answering method and apparatus, medium and electronic device
CN108052577B (en) Universal text content mining method, device, server and storage medium
CN107992596B (en) Text clustering method, text clustering device, server and storage medium
US9569506B2 (en) Uniform search, navigation and combination of heterogeneous data
CN108549656B (en) Statement analysis method and device, computer equipment and readable medium
CN110390054B (en) Interest point recall method, device, server and storage medium
US20180341866A1 (en) Method of building a sorting model, and application method and apparatus based on the model
CN106951503B (en) Information providing method, device, equipment and storage medium
US20210358570A1 (en) Method and system for claim scope labeling, retrieval and information labeling of gene sequence
CN110543592A (en) Information searching method and device and computer equipment
WO2020232898A1 (en) Text classification method and apparatus, electronic device and computer non-volatile readable storage medium
JP2020149686A (en) Image processing method, device, server, and storage medium
WO2023024975A1 (en) Text processing method and apparatus, and electronic device
US9436891B2 (en) Discriminating synonymous expressions using images
CN112989010A (en) Data query method, data query device and electronic equipment
CN107861948B (en) Label extraction method, device, equipment and medium
KR20120047622A (en) System and method for managing digital contents
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN111563172B (en) Academic hot spot trend prediction method and device based on dynamic knowledge graph construction
CN116383412B (en) Functional point amplification method and system based on knowledge graph
WO2019071907A1 (en) Method for identifying help information based on operation page, and application server
WO2021037012A1 (en) Text information navigation and browsing method, apparatus, server and storage medium
CN117011581A (en) Image recognition method, medium, device and computing equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20857281

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20857281

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.08.2022)
