CN111460272B - Text page ordering method and related equipment - Google Patents

Info

Publication number
CN111460272B
Authority
CN
China
Prior art keywords
page
paragraph
attribute
referee document
last
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910062067.0A
Other languages
Chinese (zh)
Other versions
CN111460272A (en)
Inventor
宁荣江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201910062067.0A
Publication of CN111460272A
Application granted
Publication of CN111460272B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a text page ordering method and related equipment for reducing labor cost. The method comprises: acquiring a target referee document; segmenting each page of the target referee document into paragraphs; determining the paragraph attribute of each paragraph on each page of the target referee document; and sorting the pages of the target referee document based on the paragraph attributes of the paragraphs on each page and a preset paragraph sorting rule.

Description

Text page ordering method and related equipment
Technical Field
The present invention relates to the field of text processing, and in particular, to a method and related device for ordering text pages.
Background
When referee documents are crawled with a web crawler, a series of separate referee document pages is often obtained, and these pages are frequently out of order. When the referee document is analyzed as a whole, the correct document cannot be recovered from out-of-order pages, which affects normal display and analysis of the referee document.
The existing scheme is to sort each document manually by reading its content. However, after a large number of documents have been crawled, manually sorting the documents of every case involves a huge workload.
Disclosure of Invention
The embodiment of the invention provides a text page ordering method and related equipment, which are used for reducing labor cost and improving text ordering efficiency.
The first aspect of the embodiment of the invention provides a text page ordering method, which specifically comprises the following steps:
acquiring a target referee document;
segmenting each page in the target referee document;
determining paragraph attributes of paragraphs of each page in the target referee document;
and sorting the pages of the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule.
Optionally, the determining the paragraph attribute of the paragraph of each page in the target referee document includes:
and identifying paragraph attributes of paragraphs of each page in the target referee document based on a preset training model.
Optionally, sorting the pages in the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule includes:
judging whether the last paragraph of the first page of the target referee document is a separated paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining a second page adjacent to the first page from the other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page and a preset paragraph sorting rule, wherein the paragraph attribute of the first paragraph of the second page is adjacent to the paragraph attribute of the last paragraph of the first page, and the first paragraph of the second page is an unseparated paragraph.
Optionally, when the last paragraph of the first page is a partitioned paragraph, the method further comprises:
judging whether the number of words contained in the last paragraph of the first page is larger than a preset threshold value or not;
if yes, acquiring paragraph attributes of the last paragraph of the first page;
and determining a third page from other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page, wherein the paragraph attribute of the first paragraph of the third page is the same as the paragraph attribute of the last paragraph of the first page, the number of words contained in the first paragraph of the third page is larger than the preset threshold, and the third page is the page adjacent to the first page in the target referee document.
Optionally, when the number of words contained in the last paragraph of the first page is smaller than the preset threshold, the method further includes:
acquiring a paragraph attribute of the penultimate paragraph of the first page;
determining a fourth page from other pages except the first page in the target referee document based on the paragraph attribute of the first page penultimate paragraph, wherein the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the first page penultimate paragraph, and the fourth page is adjacent to the first page.
Optionally, when the number of words contained in the first paragraph of the third page is smaller than the preset threshold, the method further includes:
and determining a fifth page from other pages except the first page and the third page in the target referee document, wherein the paragraph attribute of a second paragraph of the fifth page is adjacent to the paragraph attribute of a last paragraph of the first page, and the fifth page is adjacent to the first page.
A second aspect of an embodiment of the present invention provides a device for sorting text pages, including:
an acquisition unit for acquiring a target referee document;
the segmentation unit is used for segmenting each page in the target referee document;
the determining unit is used for determining paragraph attributes of paragraphs of each page in the target referee document;
and the sorting unit is used for sorting the pages of the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule.
Optionally, the determining, by the determining unit, of the paragraph attribute of the paragraphs of each page in the target referee document includes:
and identifying paragraph attributes of paragraphs of each page in the target referee document based on a preset training model.
Optionally, the sorting, by the sorting unit, of the pages in the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule includes:
judging whether the last paragraph of the first page of the target referee document is a separated paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining a second page adjacent to the first page from the other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page and a preset paragraph sorting rule, wherein the paragraph attribute of the first paragraph of the second page is adjacent to the paragraph attribute of the last paragraph of the first page, and the first paragraph of the second page is an unseparated paragraph.
Optionally, the sorting unit is further configured to:
when the last paragraph of the first page is a partitioned paragraph,
judging whether the number of words contained in the last paragraph of the first page is larger than a preset threshold value or not;
if yes, acquiring paragraph attributes of the last paragraph of the first page;
and determining a third page from other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page, wherein the paragraph attribute of the first paragraph of the third page is the same as the paragraph attribute of the last paragraph of the first page, the number of words contained in the first paragraph of the third page is larger than the preset threshold, and the third page is the page adjacent to the first page in the target referee document.
Optionally, the sorting unit is further configured to:
when the number of words contained in the last paragraph of the first page is less than the preset threshold,
acquiring a paragraph attribute of the penultimate paragraph of the first page;
determining a fourth page from other pages except the first page in the target referee document based on the paragraph attribute of the first page penultimate paragraph, wherein the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the first page penultimate paragraph, and the fourth page is adjacent to the first page.
Optionally, the sorting unit is further configured to:
when the number of words contained in the first paragraph of the third page is smaller than the preset threshold,
and determining a fifth page from other pages except the first page and the third page in the target referee document, wherein the paragraph attribute of a second paragraph of the fifth page is adjacent to the paragraph attribute of a last paragraph of the first page, and the fifth page is adjacent to the first page.
A third aspect of the embodiments of the present invention provides a processor for running a computer program which, when run, performs the steps of the method for ordering text pages as described in the above aspects.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for ordering text pages as described in the above aspects.
In summary, it can be seen that in the embodiment provided by the present invention, the paragraph attributes of the paragraphs of each page in the target referee document are first determined, and the pages of the target referee document are then sorted based on the paragraph attributes and the preset paragraph sorting rule, so that the pages of the referee document can be sorted rapidly, and compared with the existing manual sorting method, the labor cost can be saved.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a text page sorting method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an embodiment of a text page sorting apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the hardware structure of a server according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a text page ordering method and related equipment, which are used for saving labor cost and improving text ordering efficiency.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The text page sorting method of the present invention is described below from the perspective of a text page sorting device, which may be a server or a service unit in the server, and is not specifically limited.
Referring to fig. 1, fig. 1 is a schematic diagram of an embodiment of a method for sorting text pages according to an embodiment of the present invention, including:
101. Obtain the target referee document.
In this embodiment, the text page sorting device may obtain a target referee document, which is the referee document whose pages are to be sorted. How the target referee document is obtained is not limited here; for example, it may be crawled by a web crawler.
102. Segment each page in the target referee document.
In this embodiment, after obtaining the target referee document, the text page sorting device may segment each page of the target referee document into paragraphs, for example by identifying the paragraph marks on each page. Other segmentation methods may also be used; this is not specifically limited, so long as each page of the target referee document can be segmented.
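As a rough illustration of the segmentation step, the following Python sketch splits the text of one page into paragraphs at line breaks. Treating a newline as the paragraph mark is an assumption, and `split_paragraphs` is a hypothetical helper name; the description leaves the segmentation method open.

```python
import re
from typing import List


def split_paragraphs(page_text: str) -> List[str]:
    """Split one page into paragraphs, dropping empty lines.

    Assumption: a paragraph ends at one or more newline characters.
    """
    parts = re.split(r"(?:\r?\n)+", page_text)
    return [part.strip() for part in parts if part.strip()]


# Illustrative page text (made up for this example).
page = "Civil Judgment\n\nPlaintiff: Zhang San.\nDefendant: Li Si.\n"
print(split_paragraphs(page))
```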
103. Determine the paragraph attribute of the paragraphs of each page in the target referee document.
In this embodiment, the text page sorting device may use a preset training model to identify the paragraph attribute of each paragraph on each page of the target referee document. The preset training model is obtained in advance by training on a training corpus through machine learning; the training corpus includes paragraph sets of a plurality of referee documents from a plurality of judicial fields, together with the paragraph attribute of each paragraph in those sets. The model is used to identify the paragraph attributes of paragraphs in a referee document, where the paragraph attributes may include litigation participants, trial passes, complaints, evidence, referee results, and the like. After the paragraphs of each page of the target referee document have been determined, each paragraph of each page may be input into the training model, which outputs the paragraph attribute of that paragraph.
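The interface of this step (paragraph in, paragraph attribute out) can be sketched as follows. The description does not name a model type, so the sketch swaps in a plain keyword lookup as a stand-in for the trained model; `KEYWORD_CUES`, `keyword_classifier`, and `label_pages` are hypothetical names, and the cues are invented for illustration only.

```python
from typing import Callable, Dict, List

# Hypothetical keyword cues. This is NOT the trained model from the
# description; it only stands in for it so the example can run.
KEYWORD_CUES: Dict[str, str] = {
    "plaintiff": "litigation participants",
    "defendant": "litigation participants",
    "evidence": "evidence",
    "it is ordered": "referee results",
}


def keyword_classifier(paragraph: str) -> str:
    """Return the attribute of the first cue found, or 'unknown'."""
    text = paragraph.lower()
    for cue, attribute in KEYWORD_CUES.items():
        if cue in text:
            return attribute
    return "unknown"


def label_pages(pages: List[List[str]],
                classify: Callable[[str], str]) -> List[List[str]]:
    """Apply a classifier to every paragraph of every page."""
    return [[classify(paragraph) for paragraph in page] for page in pages]


pages = [
    ["Plaintiff: Zhang San.", "The court examined the evidence submitted."],
    ["It is ordered that the claim be dismissed."],
]
labels = label_pages(pages, keyword_classifier)
print(labels)
```

Any callable that maps a paragraph to an attribute, including a trained classifier, could be passed as `classify` in place of the keyword stand-in.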
It should be noted that the target referee document may contain paragraphs separated by page breaks, that is, paragraphs split across two pages. For a fragment of such a paragraph whose number of words is smaller than the preset threshold, the paragraph attribute cannot be determined; for a fragment whose number of words is larger than the preset threshold, the paragraph attribute can be determined. The text page sorting device does not process paragraphs whose paragraph attribute cannot be determined.
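A minimal sketch of the threshold check is given below. The value 50 is the example threshold mentioned elsewhere in the description. How a separated paragraph is detected is not stated in the description, so `is_separated_tail` uses a punctuation heuristic that is purely an assumption; counting words by whitespace is also an assumption (for Chinese text a character count would be the natural substitute).

```python
PRESET_THRESHOLD = 50  # example value; the description allows other values

# Assumption: a complete paragraph ends with one of these marks.
SENTENCE_END = ("。", ".", "！", "!", "？", "?")


def word_count(fragment: str) -> int:
    """Count whitespace-separated words (assumption, see lead-in)."""
    return len(fragment.split())


def is_separated_tail(last_paragraph: str) -> bool:
    """Heuristic: the last paragraph of a page continues on another
    page if it does not end with closing punctuation."""
    return not last_paragraph.rstrip().endswith(SENTENCE_END)


def attribute_determinable(fragment: str,
                           threshold: int = PRESET_THRESHOLD) -> bool:
    """A fragment can be classified only if it exceeds the threshold."""
    return word_count(fragment) > threshold


print(is_separated_tail("The court finds that the"))   # True
print(attribute_determinable("too short to classify"))  # False
```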
104. Sort the pages of the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule.
In this embodiment, the text page sorting device sorts the pages of the target referee document based on the paragraph attribute of the paragraphs of each page and a preset paragraph sorting rule.
It should be noted that the text page sorting device may set the preset paragraph sorting rule in advance. It can be understood that each type of referee document may have its own fixed paragraph order, so the paragraph attributes of the document type to which the target referee document belongs may be numbered in that order. An example of a preset paragraph sorting rule is:
0 title
1 court
2 document types
3 case number
4 litigation participants
5 trial passes
6 primary screening cases (a review please)
7 primary screening cases (a review claiming)
8 primary screening cases (a review third person stating)
9 primary screening cases (a review evidential segment)
10 primary screening cases (a review facts)
11 primary screening cases (a review result)
12 original cases (second-order complaints)
13 original cases (second-order complaints)
14 original cases (second-order third-order statement)
15 original cases (second-order evidence section)
16 original cases (second-order fact-approval)
17 original cases (second-order judgment result)
18 original cases (re-order cases)
19 original cases (anti-complaints)
20 original cases (anti-complaints)
21 original cases (anti-complaints)
22 anti-complaints
23 anti-complaints
24 anti-complaints
25 original cases (re-order cases)
26 complaints
27 complaints evidence
28 complaints act cases
29 claims
30 claiming evidence
31 third-order statement
32 third-order submittal evidence
33 evidence section
34 fact-approval
35 judge results
36 legal effects
37 trial persons
The preset paragraph sorting rule is described here by way of example and not limitation; the numbers in the paragraph ordering may be modified, added, or removed according to the actual situation, and the invention is not limited thereto.
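One way to represent such a rule in code is a mapping from number to paragraph attribute, with "adjacent" read as "the next number that actually occurs". The sketch below uses only an excerpt of the example rule; `RULE_EXCERPT`, `MAX_NUMBER`, and `is_adjacent` are hypothetical names, and this reading of adjacency is an assumption drawn from the worked example in the description.

```python
from typing import Dict, Set

# Excerpt of the example rule: number -> paragraph attribute.
RULE_EXCERPT: Dict[int, str] = {
    0: "title", 1: "court", 3: "case number", 4: "litigation participants",
    34: "fact-approval", 35: "judge results", 36: "legal effects",
    37: "trial persons",
}
MAX_NUMBER = 37  # highest number in the example rule


def is_adjacent(prev_number: int, next_number: int,
                available: Set[int]) -> bool:
    """True if next_number is the first number after prev_number that
    occurs in `available` (for instance, the numbers found at the top
    of the candidate pages)."""
    for candidate in range(prev_number + 1, MAX_NUMBER + 1):
        if candidate in available:
            return candidate == next_number
    return False


heads = {0, 36}  # numbers of the first paragraphs of the other pages
print(is_adjacent(34, 36, heads))  # True: 35 is absent, so 36 follows 34
```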
In one embodiment, the ranking means for text pages ranks the pages in the target referee document based on the paragraph attribute of the paragraphs of each page of the target referee document and a preset paragraph ranking rule, including:
judging whether the last paragraph of the first page of the target referee document is a separated paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining a second page adjacent to the first page from the other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page and a preset paragraph sorting rule.
In this embodiment, the text page sorting device may first select any page other than the last page of the target referee document and define it as the first page, and then judge whether the last paragraph of the first page is separated (that is, whether the last paragraph of the first page is a complete paragraph). When the last paragraph of the first page is not separated, the device obtains its paragraph attribute and, based on that attribute and the preset paragraph sorting rule, determines from the other pages in the target referee document the second page adjacent to the first page (for example, if the page number of the first page is 3, the page number of the second page is 4; this is merely illustrative and not limiting). That is, once the last paragraph of the first page is known to be a complete paragraph, the page whose first paragraph is unseparated and has a paragraph attribute adjacent to that of the last paragraph of the first page can be found among the other pages.
For example, referring to the numbering of the paragraph order above, if the paragraph attribute of the last paragraph of the first page is number 34, "fact-approval", the other pages except the first page in the target referee document are searched for a page whose first paragraph has paragraph attribute number 35, "judge results"; that page is the second page adjacent to the first page. If paragraph attribute number 35 is not included in the target referee document, the other pages are searched for a page whose first paragraph has paragraph attribute number 36, and so on until the second page adjacent to the first page is found.
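The search in this example can be sketched as follows, assuming every paragraph has already been labelled with its rule number and that the first paragraphs of the candidate pages are unseparated. `find_second_page` and the list-of-numbers page representation are hypothetical choices for illustration.

```python
from typing import Optional, Sequence


def find_second_page(first_idx: int,
                     page_numbers: Sequence[Sequence[int]],
                     max_number: int = 37) -> Optional[int]:
    """Return the index of the page that follows page `first_idx`.

    page_numbers[i] holds the rule numbers of the paragraphs on page i,
    top to bottom. The search tries the next rule number first, then
    the one after it, and so on.
    """
    last_number = page_numbers[first_idx][-1]
    for wanted in range(last_number + 1, max_number + 1):
        for idx, numbers in enumerate(page_numbers):
            if idx != first_idx and numbers and numbers[0] == wanted:
                return idx
    return None


# Page 0 ends with 34; no page starts with 35, so the page starting
# with 36 is taken as its successor.
crawled = [[33, 34], [36, 37], [0, 1, 3, 4]]
print(find_second_page(0, crawled))  # 1
```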
It should be noted that, in one embodiment, the text page sorting device may first determine the front page of the target referee document, that is, its page 1. Because the front page has the most distinctive features and contains the title, it can be identified in the target referee document by regular-expression matching. The page numbers of the remaining pages can then be determined in the manner described above, using the front page as the reference, so that the target referee document is sorted.
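Front-page detection by regular-expression matching might look like the sketch below. The pattern is an assumption (titles ending in "判决书", "裁定书", "Judgment", or "Ruling"), as is the choice to inspect only the first two paragraphs of each page; `TITLE_PATTERN` and `find_front_page` are hypothetical names.

```python
import re
from typing import List, Optional

# Assumed title pattern; a real rule set would be tuned per document type.
TITLE_PATTERN = re.compile(r"(判决书|裁定书|Judgment|Ruling)\s*$")


def find_front_page(pages: List[List[str]]) -> Optional[int]:
    """Return the index of the first page whose opening paragraphs
    contain a line that looks like a document title."""
    for idx, paragraphs in enumerate(pages):
        if any(TITLE_PATTERN.search(p) for p in paragraphs[:2]):
            return idx
    return None


# Illustrative pages in crawled (scrambled) order.
pages = [
    ["the defendant failed to appear in court.", "The court holds that"],
    ["Some People's Court", "Civil Judgment", "Plaintiff: Zhang San."],
]
print(find_front_page(pages))  # 1
```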
In one embodiment, when the last paragraph of the first page is a partitioned paragraph, it is also possible to:
judging whether the number of words contained in the last paragraph of the first page is larger than a preset threshold value or not;
if yes, acquiring the paragraph attribute of the last paragraph of the first page;
determining, based on the paragraph attribute of the last paragraph of the first page, a third page from the other pages except the first page in the target referee document, where the paragraph attribute of the first paragraph of the third page is the same as the paragraph attribute of the last paragraph of the first page, and the number of words in the first paragraph of the third page is greater than the preset threshold. The third page is the page adjacent to the first page; for example, if the page number of the first page is 3, the page number of the third page is 4. Other page numbers are of course possible; this is illustrative only and not limiting.
In this embodiment, the last paragraph of the first page is a separated paragraph, and if a separated fragment contains too few words, its paragraph attribute cannot be determined. It is therefore necessary to judge whether the number of words in the last paragraph of the first page is greater than a preset threshold. For example, if the whole paragraph has 150 words, the preset threshold may be 50 words, meaning that more than 50 words are needed to determine the paragraph attribute; the threshold may also be set to another value according to the actual situation, which is not specifically limited. When the number of words in the last paragraph of the first page is greater than the preset threshold, the paragraph attribute of that paragraph can be obtained, and a third page can then be determined from the other pages except the first page in the target referee document, where the paragraph attribute of the first paragraph of the third page is the same as that of the last paragraph of the first page, and the number of words in the first paragraph of the third page is greater than the preset threshold. That is, based on the paragraph attribute of the last paragraph of the first page, the page whose first paragraph has the same attribute can be found among the other pages, and that page is adjacent to the first page. For example, if the first page is page 5 of the target referee document, the third page is page 6.
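The same-attribute match for a paragraph split across two pages can be sketched as below. `Para` and `find_third_page` are hypothetical names; a `number` of `None` marks a fragment whose attribute could not be determined, and the threshold of 50 is the example value from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Para:
    number: Optional[int]  # rule number of the attribute, None if unknown
    words: int             # number of words of the fragment on this page


def find_third_page(first_idx: int, pages: List[List[Para]],
                    threshold: int = 50) -> Optional[int]:
    """Find the page whose first paragraph continues the separated
    last paragraph of page `first_idx` (same attribute, both fragments
    longer than the threshold)."""
    tail = pages[first_idx][-1]
    if tail.words <= threshold or tail.number is None:
        return None  # attribute of the tail cannot be determined
    for idx, page in enumerate(pages):
        if idx == first_idx or not page:
            continue
        head = page[0]
        if head.words > threshold and head.number == tail.number:
            return idx
    return None


pages = [
    [Para(33, 120), Para(34, 80)],   # ends inside a number-34 paragraph
    [Para(36, 40)],
    [Para(34, 70), Para(35, 200)],   # starts with the rest of number 34
]
print(find_third_page(0, pages))  # 2
```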
In one embodiment, when the last paragraph of the first page contains fewer words than the preset threshold,
acquiring paragraph attributes of the penultimate paragraph of the first page;
determining a fourth page from other pages except the first page in the target referee document based on the paragraph attribute of the penultimate paragraph of the first page, wherein the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the penultimate paragraph of the first page, and the fourth page is adjacent to the first page.
In this embodiment, when the number of words contained in the last paragraph of the first page is smaller than the preset threshold, the paragraph attribute of that paragraph cannot be determined. In this case, a fourth page may be determined from the other pages except the first page in the target referee document using the paragraph attribute of the penultimate paragraph of the first page, where the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the penultimate paragraph of the first page. The fourth page is the page adjacent to the first page; for example, if the page number of the first page is 3, the page number of the fourth page is 4. Other page numbers are of course possible; this is illustrative only and not limiting.
For example, referring to the numbering of the paragraph order above, if the paragraph attribute of the penultimate paragraph of the first page is number 34, "fact-approval", the other pages except the first page in the target referee document are searched for a page whose first paragraph has paragraph attribute number 35, "judge results"; that page is the fourth page adjacent to the first page. If paragraph attribute number 35 is not included in the target referee document, the other pages are searched for a page whose first paragraph has paragraph attribute number 36, and so on until the fourth page is found.
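A sketch of this fallback, using the penultimate paragraph as the reference when the last fragment is too short to classify, is shown below. `Para` and `find_fourth_page` are hypothetical names, `None` marks an undetermined attribute, and the threshold of 50 is the example value from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Para:
    number: Optional[int]  # rule number of the attribute, None if unknown
    words: int             # number of words of the fragment on this page


def find_fourth_page(first_idx: int, pages: List[List[Para]],
                     threshold: int = 50,
                     max_number: int = 37) -> Optional[int]:
    """Use the penultimate paragraph of page `first_idx` as reference
    when its last paragraph is too short to be classified."""
    page = pages[first_idx]
    if len(page) < 2 or page[-1].words > threshold:
        return None  # no penultimate paragraph, or this branch does not apply
    reference = page[-2].number
    if reference is None:
        return None
    for wanted in range(reference + 1, max_number + 1):
        for idx, other in enumerate(pages):
            if idx != first_idx and other and other[0].number == wanted:
                return idx
    return None


pages = [
    [Para(33, 120), Para(None, 12)],  # last fragment too short to classify
    [Para(36, 150)],
    [Para(34, 140), Para(35, 90)],
]
print(find_fourth_page(0, pages))  # 2
```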
In one embodiment, when the number of words in the first paragraph of the third page is less than the preset threshold,
determining a fifth page from other pages except the first page and the third page in the target referee document, wherein the paragraph attribute of the second paragraph of the fifth page is adjacent to the paragraph attribute of the last paragraph of the first page, and the fifth page is the page adjacent to the first page.
In this embodiment, when the paragraph attribute of the first paragraph of the other pages except the first page in the target referee document cannot be determined, the paragraph attribute of the second paragraph may instead be compared with that of the last paragraph of the first page to determine the fifth page, where the paragraph attribute of the second paragraph of the fifth page is adjacent to the paragraph attribute of the last paragraph of the first page. For example, the paragraph attribute of the second paragraph of the fifth page may be number 16 and that of the last paragraph of the first page may be number 15. This is illustrative only and not limiting.
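The second-paragraph comparison can be sketched as follows. `Para`, `find_fifth_page`, and the `exclude` parameter (pages already ruled out, such as the third page) are hypothetical; `None` marks an undetermined attribute and 50 is the example threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Para:
    number: Optional[int]  # rule number of the attribute, None if unknown
    words: int             # number of words of the fragment on this page


def find_fifth_page(first_idx: int, pages: List[List[Para]],
                    exclude: Optional[Set[int]] = None,
                    threshold: int = 50,
                    max_number: int = 37) -> Optional[int]:
    """Match on the second paragraph of candidate pages whose first
    paragraph is too short to be classified."""
    skipped = set(exclude or ()) | {first_idx}
    reference = pages[first_idx][-1].number
    if reference is None:
        return None
    for wanted in range(reference + 1, max_number + 1):
        for idx, other in enumerate(pages):
            if idx in skipped or len(other) < 2:
                continue
            if other[0].words < threshold and other[1].number == wanted:
                return idx
    return None


pages = [
    [Para(14, 90), Para(15, 80)],      # ends with number 15
    [Para(None, 20), Para(16, 100)],   # short head, second paragraph is 16
    [Para(None, 10), Para(20, 60)],
]
print(find_fifth_page(0, pages))  # 1
```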
It should be noted that, in the process of sorting the pages in the target referee document, the paragraph attribute of the last paragraph of the first page is first used as the reference to find the page containing a paragraph whose attribute is the same as or adjacent to it. When the paragraph attribute of the last paragraph of the first page cannot be determined, the paragraph attribute of the penultimate paragraph of the first page is used as the reference to find the page containing a paragraph whose attribute is adjacent to that of the penultimate paragraph. Proceeding in this way, the order of the text pages in the target referee document can be determined.
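Putting the pieces together, the page order can be built as a chain that starts at the front page and repeatedly picks the next page. The sketch below is deliberately simplified and is an assumption, not the full procedure: it ignores word-count thresholds and simply tries the same rule number first (a paragraph continued across a page break) and then each later number. `next_page` and `order_pages` are hypothetical names.

```python
from typing import List, Optional, Sequence, Set


def next_page(current: int, page_numbers: Sequence[Sequence[int]],
              used: Set[int], max_number: int = 37) -> Optional[int]:
    """Pick the unused page whose first paragraph has the same rule
    number as the last paragraph of `current`, or else the nearest
    later number."""
    last_number = page_numbers[current][-1]
    for wanted in range(last_number, max_number + 1):
        for idx, numbers in enumerate(page_numbers):
            if idx not in used and numbers and numbers[0] == wanted:
                return idx
    return None


def order_pages(front_idx: int,
                page_numbers: Sequence[Sequence[int]]) -> List[int]:
    """Chain pages starting from the front page."""
    order = [front_idx]
    used = {front_idx}
    while len(order) < len(page_numbers):
        nxt = next_page(order[-1], page_numbers, used)
        if nxt is None:
            break  # remaining pages cannot be placed by this simple rule
        order.append(nxt)
        used.add(nxt)
    return order


# Rule numbers per page, in crawled (scrambled) order; page 1 is the front page.
crawled = [[34, 35], [0, 1, 3, 4, 5], [35, 36, 37], [5, 26, 29, 33, 34]]
print(order_pages(1, crawled))  # [1, 3, 0, 2]
```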
In summary, it can be seen that in the embodiment provided by the present invention, the paragraph attribute of the paragraph of each page in the target referee document is first determined, and then the target referee document is ordered based on the paragraph attribute and the preset paragraph ordering rule, so that the pages of the referee document can be rapidly ordered, and compared with the existing manual ordering manner, the labor cost can be saved.
The text page sorting method provided by the embodiment of the invention is described above, and the text page sorting device provided by the embodiment of the invention is described below with reference to fig. 2.
Referring to fig. 2, fig. 2 is a schematic diagram of an embodiment of a text page sorting apparatus according to an embodiment of the present invention, including:
an acquisition unit 201 for acquiring a target referee document;
a segmentation unit 202, configured to segment each page in the target referee document;
a determining unit 203, configured to determine a paragraph attribute of a paragraph of each page in the target referee document;
the sorting unit 204 is configured to sort the pages of the target referee document based on the paragraph attribute of the paragraph of each page in the target referee document and a preset paragraph sorting rule.
Optionally, the determining, by the determining unit 203, of the paragraph attribute of the paragraphs of each page in the target referee document includes:
and identifying paragraph attributes of paragraphs of each page in the target referee document based on a preset training model.
Optionally, the sorting, by the sorting unit 204, of the pages in the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule includes:
judging whether the last paragraph of the first page of the target referee document is a separated paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining a second page adjacent to the first page from the other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page and a preset paragraph sorting rule, wherein the paragraph attribute of the first paragraph of the second page is adjacent to the paragraph attribute of the last paragraph of the first page, and the first paragraph of the second page is an unseparated paragraph.
Optionally, the sorting unit 204 is further configured to:
when the last paragraph of the first page is a partitioned paragraph,
judging whether the number of words contained in the last paragraph of the first page is larger than a preset threshold value or not;
if yes, acquiring paragraph attributes of the last paragraph of the first page;
determining a third page from other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page, wherein the paragraph attribute of the first paragraph of the third page is the same as the paragraph attribute of the last paragraph of the first page, the content of the first paragraph of the third page is larger than the preset threshold, and the third page is any one page of the other pages except the first page in the target referee document, and the third page is adjacent to the first page.
Optionally, the sorting unit 204 is further configured to:
when the number of words contained in the last paragraph of the first page is less than the preset threshold,
acquiring the paragraph attribute of the penultimate paragraph of the first page;
determining a fourth page from other pages except the first page in the target referee document based on the paragraph attribute of the penultimate paragraph of the first page, wherein the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the penultimate paragraph of the first page, and the fourth page is adjacent to the first page.
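The fallback to the penultimate paragraph could be sketched as below. This is only an illustration under stated assumptions: the `Paragraph` type, `ATTRIBUTE_ORDER`, `THRESHOLD`, the whitespace word count and the reading of "adjacent" as "immediately following in the preset paragraph sorting rule" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical preset paragraph sorting rule and threshold.
ATTRIBUTE_ORDER = ["litigation participants", "trial passes", "prosecution",
                   "evidence", "referee results"]
THRESHOLD = 5


@dataclass
class Paragraph:
    attribute: str
    text: str
    split: bool = False  # True if the paragraph is divided across a page break


def word_count(paragraph: Paragraph) -> int:
    return len(paragraph.text.split())


def is_adjacent(prev_attr: str, next_attr: str) -> bool:
    return ATTRIBUTE_ORDER.index(next_attr) - ATTRIBUTE_ORDER.index(prev_attr) == 1


def find_fourth_page(first_page: List[Paragraph],
                     others: List[List[Paragraph]],
                     threshold: int = THRESHOLD) -> Optional[List[Paragraph]]:
    """Use the penultimate paragraph when the split last paragraph is too short."""
    last = first_page[-1]
    if not last.split or word_count(last) >= threshold or len(first_page) < 2:
        return None
    anchor = first_page[-2]  # penultimate paragraph of the first page
    for page in others:
        if is_adjacent(anchor.attribute, page[0].attribute):
            return page
    return None
```

The short fragment is ignored here because too little text remains on the first page for its attribute to be trusted, so the preceding paragraph serves as the reference point.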
Optionally, the sorting unit 204 is further configured to:
when the number of words contained in the first paragraph of the third page is smaller than the preset threshold,
determining a fifth page from other pages except the first page and the third page in the target referee document, wherein the paragraph attribute of the second paragraph of the fifth page is adjacent to the paragraph attribute of the last paragraph of the first page, and the fifth page is adjacent to the first page.
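One possible reading of this branch is sketched below: when the first paragraph of the candidate third page is too short, the remaining pages are searched by the attribute of their second paragraph instead. All names (`Paragraph`, `ATTRIBUTE_ORDER`, `THRESHOLD`, `find_fifth_page`) and the interpretation of "adjacent" as "immediately following in the preset paragraph sorting rule" are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical preset paragraph sorting rule and threshold.
ATTRIBUTE_ORDER = ["litigation participants", "trial passes", "prosecution",
                   "evidence", "referee results"]
THRESHOLD = 5


@dataclass
class Paragraph:
    attribute: str
    text: str
    split: bool = False  # True if the paragraph is divided across a page break


def word_count(paragraph: Paragraph) -> int:
    return len(paragraph.text.split())


def is_adjacent(prev_attr: str, next_attr: str) -> bool:
    return ATTRIBUTE_ORDER.index(next_attr) - ATTRIBUTE_ORDER.index(prev_attr) == 1


def find_fifth_page(first_page: List[Paragraph],
                    third_page: List[Paragraph],
                    pages: List[List[Paragraph]],
                    threshold: int = THRESHOLD) -> Optional[List[Paragraph]]:
    """Search by each page's second paragraph when third_page's head is short."""
    if word_count(third_page[0]) >= threshold:
        return None  # the third page's first paragraph is long enough
    last = first_page[-1]
    for page in pages:
        # Skip the first and third pages, and pages without a second paragraph.
        if page is first_page or page is third_page or len(page) < 2:
            continue
        if is_adjacent(last.attribute, page[1].attribute):
            return page
    return None
```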
The interaction manner between the units of the sorting device for text pages in this embodiment is described in the embodiment shown in fig. 1, and is not described herein in detail.
In summary, in the embodiment provided by the present invention, the paragraph attributes of the paragraphs of each page in the target referee document are first determined, and the pages of the target referee document are then sorted based on the paragraph attributes and the preset paragraph sorting rule. The pages of the referee document can therefore be sorted rapidly, and labor cost is saved compared with the existing manual sorting approach.
Referring to FIG. 3, FIG. 3 is a schematic diagram of a server according to an embodiment of the present invention. The server 300 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transitory or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations in the storage medium 330.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the text page sorting device in the above-described embodiments may be based on the server structure shown in FIG. 3.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The embodiment of the invention also provides a storage medium, and a program is stored on the storage medium, and the program realizes the ordering method of the text pages when being executed by a processor.
The embodiment of the invention also provides a processor for running a program, wherein the program runs to execute the text page ordering method.
The embodiment of the invention also provides equipment, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the processor realizes the following steps when executing the program:
acquiring a target referee document;
segmenting each page in the target referee document;
determining paragraph attributes of paragraphs of each page in the target referee document;
and sorting the pages of the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule.
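The four steps above can be pictured end to end with a deliberately simplified Python sketch. It is not the sorting procedure of the embodiment: instead of the page-by-page adjacency rules, it orders pages by the rank of their first and last paragraph attributes, which is a stand-in chosen for brevity. The line-break segmentation, the `classify` callable (standing in for the preset training model) and `ATTRIBUTE_ORDER` are likewise assumptions.

```python
from typing import Callable, List

# Hypothetical preset paragraph sorting rule.
ATTRIBUTE_ORDER = ["litigation participants", "trial passes", "prosecution",
                   "evidence", "referee results"]


def segment(page_text: str) -> List[str]:
    """Split a page into paragraphs on line breaks (assumed delimiter)."""
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def order_pages(pages: List[str], classify: Callable[[str], str]) -> List[str]:
    """Sort pages by the attribute ranks of their first and last paragraphs.

    Assumes every page contains at least one paragraph and that classify
    returns a value present in ATTRIBUTE_ORDER.
    """
    def key(page_text: str):
        attrs = [classify(p) for p in segment(page_text)]
        return (ATTRIBUTE_ORDER.index(attrs[0]), ATTRIBUTE_ORDER.index(attrs[-1]))

    return sorted(pages, key=key)
```

A caller would pass the shuffled page texts together with whatever classifier is available; the returned list is the proposed page order.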
In a specific implementation process, any implementation manner of the embodiment corresponding to fig. 1 may be implemented when a processor executes a program.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The invention also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of:
acquiring a target referee document;
segmenting each page in the target referee document;
determining paragraph attributes of paragraphs of each page in the target referee document;
and sorting the pages of the target referee document based on the paragraph attribute of the paragraphs of each page in the target referee document and a preset paragraph sorting rule.
In a specific implementation, any of the embodiments corresponding to fig. 1 may be implemented when a computer program product is executed.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It is further noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements, but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the statement "comprises a …" does not preclude the presence of additional identical elements in a process, method, article, or apparatus that includes the element.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims (8)

1. A method for ordering text pages, comprising:
acquiring a target referee document;
segmenting each page in the target referee document;
determining paragraph attributes of paragraphs of each page in the target referee document, wherein the paragraph attributes comprise at least one of litigation participants, trial passes, prosecution, evidence and referee results;
sorting the pages of the target referee document based on paragraph attributes of paragraphs of each page in the target referee document and a preset paragraph sorting rule;
the sorting the pages in the target referee document based on the paragraph attribute of the paragraph of each page in the target referee document and the preset paragraph sorting rule comprises:
judging whether the last paragraph of a first page of the target referee document is a split paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining, from the pages in the target referee document, a second page adjacent to the first page based on the paragraph attribute of the last paragraph of the first page and the preset paragraph sorting rule, wherein the paragraph attribute of the first paragraph of the second page is adjacent to the paragraph attribute of the last paragraph of the first page, and the first paragraph of the second page is not a split paragraph.
2. The method of claim 1, wherein the determining paragraph attributes of the paragraphs of each page in the target referee document comprises:
and identifying paragraph attributes of paragraphs of each page in the target referee document based on a preset training model.
3. The method of claim 1, wherein when the last paragraph of the first page is a split paragraph, the method further comprises:
judging whether the number of words contained in the last paragraph of the first page is larger than a preset threshold value or not;
if yes, acquiring the paragraph attribute of the last paragraph of the first page;
and determining a third page from other pages except the first page in the target referee document based on the paragraph attribute of the last paragraph of the first page, wherein the paragraph attribute of the first paragraph of the third page is the same as the paragraph attribute of the last paragraph of the first page, the number of words contained in the first paragraph of the third page is larger than the preset threshold, and the third page is the page adjacent to the first page in the target referee document.
4. A method according to claim 3, wherein when the last paragraph of the first page contains a number of words less than the preset threshold, the method further comprises:
acquiring a paragraph attribute of the penultimate paragraph of the first page;
determining a fourth page from other pages except the first page in the target referee document based on the paragraph attribute of the penultimate paragraph of the first page, wherein the paragraph attribute of the first paragraph of the fourth page is adjacent to the paragraph attribute of the penultimate paragraph of the first page, and the fourth page is adjacent to the first page.
5. A method according to claim 3, wherein when the number of words contained in the first paragraph of the third page is less than the preset threshold, the method further comprises:
determining a fifth page from other pages except the first page and the third page in the target referee document, wherein the paragraph attribute of the second paragraph of the fifth page is adjacent to the paragraph attribute of the last paragraph of the first page, and the fifth page is adjacent to the first page.
6. A text page ordering apparatus, comprising:
an acquisition unit for acquiring a target referee document;
the segmentation unit is used for segmenting each page in the target referee document;
the determining unit is used for determining paragraph attributes of the paragraphs of each page in the target referee document, wherein the paragraph attributes comprise at least one of litigation participants, trial passes, prosecution, evidence and referee results;
the sorting unit is used for sorting the pages of the target referee document based on the paragraph attribute of the paragraph of each page in the target referee document and a preset paragraph sorting rule;
the sorting unit is specifically configured to:
judging whether the last paragraph of a first page of the target referee document is a split paragraph, wherein the first page is any page which is not the last page in the target referee document;
if not, acquiring the paragraph attribute of the last paragraph of the first page;
and determining, from the pages in the target referee document, a second page adjacent to the first page based on the paragraph attribute of the last paragraph of the first page and the preset paragraph sorting rule, wherein the paragraph attribute of the first paragraph of the second page is adjacent to the paragraph attribute of the last paragraph of the first page, and the first paragraph of the second page is not a split paragraph.
7. A processor for running a computer program, which when run performs the steps of the method according to any one of claims 1 to 5.
8. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program implementing the steps of the method according to any one of claims 1 to 5 when executed by a processor.
CN201910062067.0A 2019-01-22 2019-01-22 Text page ordering method and related equipment Active CN111460272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910062067.0A CN111460272B (en) 2019-01-22 2019-01-22 Text page ordering method and related equipment

Publications (2)

Publication Number Publication Date
CN111460272A CN111460272A (en) 2020-07-28
CN111460272B true CN111460272B (en) 2024-02-13

Family

ID=71679888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910062067.0A Active CN111460272B (en) 2019-01-22 2019-01-22 Text page ordering method and related equipment

Country Status (1)

Country Link
CN (1) CN111460272B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632948B (en) * 2020-12-29 2023-01-10 天津汇智星源信息技术有限公司 Case document ordering method and related equipment
CN117275649B (en) * 2023-11-22 2024-01-30 浙江太美医疗科技股份有限公司 Method and device for ordering document medical record pictures, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631784A (en) * 2012-08-21 2014-03-12 腾讯科技(深圳)有限公司 Page content retrieval method and system
CN103631794A (en) * 2012-08-22 2014-03-12 百度在线网络技术(北京)有限公司 Method, device and equipment for sorting search results
CN107977346A (en) * 2017-11-23 2018-05-01 万兴科技股份有限公司 A kind of PDF document edit methods and terminal device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152730B2 (en) * 2011-11-10 2015-10-06 Evernote Corporation Extracting principal content from web pages

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant