CN113204610A - Automatic cataloguing method based on criminal case electronic file and computer readable storage device - Google Patents
Automatic cataloguing method based on criminal case electronic file and computer readable storage device
- Publication number
- CN113204610A (application CN202110490523.9A)
- Authority
- CN
- China
- Prior art keywords
- electronic
- information
- evidence
- electronic file
- criminal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/328—Management therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3322—Query formulation using system suggestions
- G06F16/3323—Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
The invention discloses an automatic cataloguing method based on criminal case electronic files, and a computer-readable storage device, the method comprising the following steps: A) identifying the input electronic file information; B) extracting basic information from the electronic files, judging whether the basic information indicates a new electronic file, and then ordering the files; C) executing a precise extraction operation to generate information-type electronic index information and an information catalogue; D) executing an evidence classification operation to form evidence-type electronic index information and an evidence catalogue; E) repeating the above operations, then storing and feeding back the cataloguing information. The invention provides a more organized, clearer and more efficient workflow for entering the electronic files of criminal cases: it converts electronic files into database data quickly and makes the required information in a given electronic file quick to retrieve, so that criminal cases can be examined more efficiently and judicial work is effectively supported.
Description
Technical Field
The invention relates to the technical field of data organization, and in particular to an automatic cataloguing method based on criminal case electronic files and a computer-readable storage device.
Background
In the judicial field, an electronic file is the collection of all materials involved in a court's handling of a case, including front covers, back covers, file catalogues, judgments, summonses, inquiry records and the like. A file is typically formed per case, and each case contains a large amount of material.
For ease of management, the various materials usually need to be classified. In criminal cases, the electronic file is the most important body of electronic evidence underlying a criminal judgment. Existing systems for automatically identifying the material types in electronic files pre-establish a database of information tag types, compare the materials to be identified against the rules in a material-type rule base, and classify the materials of the electronic file according to the comparison result.
However, the classification standards for information tags in the prior art are fragmented. As a result, when the electronic files of different criminal cases are entered, the entered information can be inconsistent, the entered content can deviate from the intended entries, and the files cannot be classified under clearer entries. Entry becomes inefficient, retrieving the files after entry is also inefficient, and both criminal proceedings and the final judgment are hindered.
Patent No. CN109472424B discloses a method, an apparatus, a storage medium and a server for predicting the actual prison term for a crime, described in terms of human-computer interaction. However, the disclosed technical solutions all concern the case itself: they estimate the prison term before judgment and produce an expected judgment document from existing basic information, but offer no corresponding solution for entering and organizing the actual judgment documents after judgment, and the extraction methods of the prior art do not adapt well to the above technical solutions.
Disclosure of Invention
In view of the technical defects described in the background, the invention provides an automatic cataloguing method based on criminal case electronic files and a computer-readable storage device, which solve the above technical problems and meet practical needs. The specific technical scheme is as follows:
the automatic cataloguing method based on criminal case electronic files comprises the following steps:
A) through the user's input interface, acquiring the electronic file of a criminal case after the user has entered it manually, and first storing it in a temporary storage space, wherein an electronic file comprises several pieces of basic information; if any piece of basic information is missing, the input is not recognized as an electronic file, the missing content is fed back to the user's input interface as a prompt, and the input remains in the temporary storage space for later use; otherwise, executing the next operation, basic information extraction;
B) extracting the titles, page numbers and production dates in the electronic files, and judging whether any title, page number or production date does not belong to the same electronic file; if any of these does not match, identifying the material as a new electronic file, placing the electronic files in a temporary queue in the temporary storage space in the order in which they were identified, and executing the next operation, precise extraction; otherwise, executing the precise extraction operation directly;
C) executing the precise extraction operation on the foremost electronic file in the temporary queue: extracting the evidence-collection subject, number of inquiries, evidence-collection time, inquiry setting, participants and record content from the electronic file, simultaneously identifying records that bear the same name as the evidence-collection subject, generating information-type electronic index information together with an information catalogue, and executing the next operation, evidence classification;
D) executing the evidence classification operation on the electronic file processed in the previous step: automatically dividing and ordering the evidence by type, forming evidence-type electronic index information within the single electronic file together with an evidence catalogue, entering the information-type electronic index information and information catalogue and the evidence-type electronic index information and evidence catalogue into a permanent storage database, and at the same time moving the electronic file into the permanent storage database to form structured electronic file data;
E) if the temporary queue still contains electronic files on which step C has not been executed, executing steps C and D on them in sequence and then returning to this step; otherwise, ending the cataloguing operation and feeding the cataloguing result back to the user operation interface.
As a further technical solution of the present invention, the evidence is divided into 8 types, distinguished according to the following specific categories:
(I) physical evidence; (II) documentary evidence; (III) witness testimony; (IV) victim statements; (V) statements and defenses of criminal suspects and defendants; (VI) expert opinions; (VII) records of inquests, examinations, identifications and investigative experiments; (VIII) audio-visual materials and electronic data.
As a further technical solution of the present invention, the input interface of the user includes: an entry port that reads electronic files from a mobile storage medium or a smart device through a direct or wireless connection, and an entry port that reads electronic files indirectly from a scanning entry device.
As a further technical solution of the present invention, the catalogue information established in step C and step D is labeled, text analysis is performed on the labeled catalogue information, and only the noun components are retained after the non-noun components are removed.
As a further technical solution of the present invention, after the catalogue information has been converted into labels containing only noun components, the labels are grouped into individual entry sets according to the definitions of synonyms and near-synonyms in the Modern Chinese Dictionary, and the most frequently used word in an entry set, together with the nouns of median and lowest frequency of use, serves as the name of the entry set.
As a further technical solution of the present invention, the permanent storage database rearranges the catalogue information in chronological order over a user-defined time interval ranging from 5 minutes to 3 months, determines which catalogue information was called most frequently within that interval, and feeds the frequency back to the user input interface.
The invention also discloses a storage device, which comprises a memory, one or more processors and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the automatic cataloguing method based on criminal case electronic files described above.
The invention has the following beneficial effects. It provides a more organized, clearer and more efficient workflow for entering criminal case electronic files. It precisely extracts the basic information and case information in the electronic files, in particular the evidence information that plays a major role, and builds and catalogues precise electronic indexes in a targeted way. This helps the database system in the memory to classify and store electronic files under more entries, allows electronic files to be retrieved along more dimensions, and lets more keyword information serve as directional vector values. Case handlers can convert electronic files into database information in a short time and, when retrieval is needed, quickly obtain the required information from the required electronic file. The whole case-handling flow of an intelligent sentencing system can run more smoothly and completely, and a case can be handled on the intelligent system from filing to closing, so that criminal cases are examined more efficiently and judicial work is effectively supported.
Detailed Description
Embodiments of the present invention are described below with reference to examples. These are only preferred examples intended to better illustrate the invention, and the embodiments of the invention are not limited to them. Parts of the invention that belong to the relevant technical field should be regarded as known technology that those skilled in the field can be expected to know and master.
The automatic cataloguing method based on criminal case electronic files comprises the following steps:
A) through the user's input interface, acquiring the electronic file of a criminal case after the user has entered it manually, and first storing it in a temporary storage space, wherein an electronic file comprises several pieces of basic information; if any piece of basic information is missing, the input is not recognized as an electronic file, the missing content is fed back to the user's input interface as a prompt, and the input remains in the temporary storage space for later use; otherwise, executing the next operation, basic information extraction;
B) extracting the titles, page numbers and production dates in the electronic files, and judging whether any title, page number or production date does not belong to the same electronic file; if any of these does not match, identifying the material as a new electronic file, placing the electronic files in a temporary queue in the temporary storage space in the order in which they were identified, and executing the next operation, precise extraction; otherwise, executing the precise extraction operation directly;
C) executing the precise extraction operation on the foremost electronic file in the temporary queue: extracting the evidence-collection subject, number of inquiries, evidence-collection time, inquiry setting, participants and record content from the electronic file, simultaneously identifying records that bear the same name as the evidence-collection subject, generating information-type electronic index information together with an information catalogue, and executing the next operation, evidence classification;
D) executing the evidence classification operation on the electronic file processed in the previous step: automatically dividing and ordering the evidence by type, forming evidence-type electronic index information within the single electronic file together with an evidence catalogue, entering the information-type electronic index information and information catalogue and the evidence-type electronic index information and evidence catalogue into a permanent storage database, and at the same time moving the electronic file into the permanent storage database to form structured electronic file data;
E) if the temporary queue still contains electronic files on which step C has not been executed, executing steps C and D on them in sequence and then returning to this step; otherwise, ending the cataloguing operation and feeding the cataloguing result back to the user operation interface.
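The control flow of steps A to E can be sketched as follows. This is only a minimal illustration under stated assumptions: the field names in `REQUIRED_FIELDS`, the `Catalogue` structure and the two callbacks `extract_info` and `classify_evidence` are hypothetical, since the source does not enumerate the basic information or specify the extraction algorithms. The new-file comparison of step B is omitted, so every complete input is treated as its own electronic file.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical "basic information" fields; the source does not list them.
REQUIRED_FIELDS = ("title", "page_number", "production_date")


@dataclass
class Catalogue:
    info_index: list = field(default_factory=list)      # produced in step C
    evidence_index: list = field(default_factory=list)  # produced in step D


def missing_fields(doc):
    """Step A: names of basic-information fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not doc.get(name)]


def run_cataloguing(docs, extract_info, classify_evidence):
    temp_queue = deque()  # temporary queue in the temporary storage space
    rejected = []         # incomplete inputs, kept with the fields to prompt for
    permanent_db = []     # stand-in for the permanent storage database

    for doc in docs:      # step A check, then queueing in identified order
        gaps = missing_fields(doc)
        if gaps:
            rejected.append((doc, gaps))
        else:
            temp_queue.append(doc)

    while temp_queue:     # step E: repeat C and D until the queue is empty
        doc = temp_queue.popleft()  # foremost file first
        catalogue = Catalogue()
        catalogue.info_index = extract_info(doc)           # step C
        catalogue.evidence_index = classify_evidence(doc)  # step D
        permanent_db.append((doc, catalogue))

    return permanent_db, rejected
```

A queue keeps the files in the order in which they were identified, which matches the requirement that step C always works on the foremost file.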
In this application, the electronic file of a criminal case may be a natively electronic text, or an electronic text obtained by converting a file on a non-electronic carrier through manual entry, scanning or similar means. It should be noted that the electronic files referred to in the present invention are those of criminal cases. As is well known, cases fall into 3 legal types: criminal cases, civil cases and administrative cases. Criminal cases differ from the other 2 types in that great weight is placed on the questioning of suspects and witnesses and on the collection of various kinds of evidence. If the electronic files of criminal cases were catalogued in the same way as those of the other 2 types, much key information would probably be lost and cataloguing efficiency reduced, which would hinder later retrieval of the files and inconvenience the investigation of criminal cases.
In this application, the electronic files of criminal cases are identified differently from the files of other case types. First, the information that criminal cases share with other cases is extracted and classified with priority; this information is concentrated mainly in the file title, the compiler, the compiling authority and the like. For the title, case-related information can be classified with BERT, an NLP technique, to obtain a better description and representation of the case elements and to indicate to the model the priority of the element information. Content related to criminal matters, such as injury to persons and theft, can thus be picked out from among numerous keywords, so that it can be determined directly whether the electronic file belongs to a criminal case.
Based on the interrelations among different case entities in a knowledge graph, massive volumes of judicial text can be analysed, reasoned over, extracted and fused automatically, unambiguously and in real time through NLP (natural language processing) technology. Regular expressions are combined with pattern matching to extract specific expressions and elements from judgment documents, and named entity recognition is used to label the types of the extracted entities. Information extraction is a key technique in data mining for preprocessing semi-structured and unstructured text data; it extracts specified events, facts and other information from text and stores them in structured form.
In the legal context, the essence of natural language processing is to enable a machine model to accurately extract the relevant circumstances from the legal language of legal documents, especially the electronic files of criminal cases. Specifically, based on a designed criminal case ontology framework, semantic annotation and feature extraction are performed on the semi-structured and unstructured data in a batch of judgment documents, forming well-structured, semantically rich tags that are stored in a case library. Dormant mass data is thereby converted into valuable information that supports the model.
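As an illustration of regular-expression extraction into structured storage, the sketch below pulls a production date and a page number out of a line of text and returns them under typed keys. The patterns are assumptions chosen for an English-language example (ISO-style dates, the word "page"); real files would need patterns matched to their actual wording, and the source does not specify any.

```python
import re

# Assumed formats for illustration only.
DATE_RE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
PAGE_RE = re.compile(r"[Pp]age\s+(\d+)")


def extract_elements(text):
    """Return the extracted elements keyed by element type (None if absent)."""
    date = DATE_RE.search(text)
    page = PAGE_RE.search(text)
    return {
        "production_date": date.group(0) if date else None,
        "page_number": int(page.group(1)) if page else None,
    }
```

Returning `None` for a missing element, rather than raising, lets a caller report which basic information is absent.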
Suppose M is a fact element in the fact element set corresponding to the electronic file of a criminal case, and that synonymous rewriting of M yields n expressions M1, ..., Mn, comprising M itself and its synonyms. For each expression, a search is performed, using a retrieval algorithm such as BM25F or TF-IDF, in the permanent storage database in which case-related entries have been collected in advance, to obtain the candidate criminal case keyword set for that expression. The union of these sets is the candidate criminal case keyword set W_i of the i-th fact element:
$$W_i = W_{i1} \cup W_{i2} \cup \dots \cup W_{in}, \quad i \in [1, T],\ n \in [1, N]$$
The criminal case keyword set W' related to the electronic file is then:
$$W' = W_1 \cup W_2 \cup \dots \cup W_T$$
Here W' is the criminal case keyword set related to the case, T is the number of fact elements in the fact element set corresponding to the case, W_i is the criminal case keyword set of the i-th fact element, W_{ij} is the candidate set retrieved for its j-th expression, and n (at most N) is the number of expressions obtained by synonymous rewriting of the i-th fact element. From W' it is finally determined whether the electronic file belongs to a criminal case and, if so, to which type of criminal case.
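The two unions can be sketched directly with Python sets. The retrieval step here is a deliberately naive stand-in (shared-word matching) for BM25F or TF-IDF, and the function names and sample entries are hypothetical.

```python
def candidate_keywords(expression, entry_db):
    """Toy retrieval: entries sharing at least one word with the expression.

    Stand-in for a ranked search such as BM25F or TF-IDF.
    """
    words = set(expression.lower().split())
    return {entry for entry in entry_db if words & set(entry.lower().split())}


def fact_keyword_set(expressions, entry_db):
    """W_i: union of the candidate sets of one fact element's expressions."""
    result = set()
    for expression in expressions:
        result |= candidate_keywords(expression, entry_db)
    return result


def case_keyword_set(fact_elements, entry_db):
    """W': union of W_i over all fact elements of the case."""
    result = set()
    for expressions in fact_elements:
        result |= fact_keyword_set(expressions, entry_db)
    return result
```

Each fact element is passed as the list of its expressions, so a case is a list of lists.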
Known, commonly used label names should be entered into the automatic identification system in advance. When an electronic file is entered, its title is identified, and the label names already in the system are compared with the title using a short-text similarity algorithm. When the comparison result exceeds a set threshold, the candidate label name is judged highly similar to the title of the electronic file, and a valid label matching rule is formed; when the result does not exceed the threshold, the similarity is judged low and no label matching rule is formed.
After the electronic file has been identified as belonging to a criminal case, the titles, page numbers and production dates of the documents in the file are precisely extracted; the various evidence documents are automatically identified and parsed; the evidence-collection subject, the number of inquiries and similar items in record documents are precisely extracted; and records bearing the same name are automatically identified. An evidence classification model is then constructed to divide and order the eight types of evidence automatically and to generate an evidence catalogue. All of these processes rely on the laws and legal relationships associated with the knowledge in a case knowledge base held in the memory, which helps identify the case-related information in the electronic file.
Under the technical scheme disclosed by the invention, the case-related information in judgment documents can be precisely extracted, summarized and organized, so that a case can be linked to its case file and handled end to end by an intelligent system. The whole flow of case information benefits from being electronic, making it more convenient, faster and less prone to human error.
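The threshold comparison might look like the sketch below. The source does not name a particular short-text similarity algorithm, so `difflib.SequenceMatcher` from the Python standard library is used purely as a stand-in, and the threshold of 0.6 is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher


def match_label(title, known_labels, threshold=0.6):
    """Return the best-matching label, or None if no score exceeds the threshold."""
    best_label, best_score = None, 0.0
    for label in known_labels:
        score = SequenceMatcher(None, title.lower(), label.lower()).ratio()
        if score > best_score:
            best_label, best_score = label, score
    # A matching rule is formed only when the score is strictly above the threshold.
    return best_label if best_score > threshold else None
```

Returning `None` corresponds to the case where no label matching rule is formed.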
As a further technical solution of the present invention, the evidence is divided into 8 types, distinguished according to the following specific categories:
(I) physical evidence; (II) documentary evidence; (III) witness testimony; (IV) victim statements; (V) statements and defenses of criminal suspects and defendants; (VI) expert opinions; (VII) records of inquests, examinations, identifications and investigative experiments; (VIII) audio-visual materials and electronic data.
Evidence types are divided on the same principle as label identification. Known, commonly used evidence type names should be entered into the automatic identification system in advance. The evidence nouns in the electronic file are identified and compared with the evidence type names already in the system using a short-text similarity algorithm. When the similarity is judged high, a valid matching rule is formed; otherwise no rule is formed. In this way the classification of the different evidence types is completed, and the database can later locate and retrieve the corresponding evidence, and the electronic file containing it, directly from the related labels.
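Dividing and ordering evidence by the eight types can be sketched with an ordered enumeration. The cue words in `CUES` are hypothetical and stand in for the pre-entered evidence type names; simple substring matching replaces the similarity comparison to keep the example short.

```python
from enum import IntEnum


class EvidenceType(IntEnum):
    PHYSICAL = 1                # (I)
    DOCUMENTARY = 2             # (II)
    WITNESS_TESTIMONY = 3       # (III)
    VICTIM_STATEMENT = 4        # (IV)
    SUSPECT_STATEMENT = 5       # (V)
    EXPERT_OPINION = 6          # (VI)
    INVESTIGATION_RECORD = 7    # (VII)
    AUDIOVISUAL_ELECTRONIC = 8  # (VIII)


# Hypothetical cue words, for illustration only.
CUES = {
    "witness": EvidenceType.WITNESS_TESTIMONY,
    "victim": EvidenceType.VICTIM_STATEMENT,
    "suspect": EvidenceType.SUSPECT_STATEMENT,
    "appraisal": EvidenceType.EXPERT_OPINION,
    "inspection": EvidenceType.INVESTIGATION_RECORD,
    "video": EvidenceType.AUDIOVISUAL_ELECTRONIC,
    "contract": EvidenceType.DOCUMENTARY,
    "knife": EvidenceType.PHYSICAL,
}


def classify(title):
    """Return the evidence type of the first cue word found, else None."""
    lowered = title.lower()
    for cue, evidence_type in CUES.items():
        if cue in lowered:
            return evidence_type
    return None


def build_evidence_catalogue(titles):
    """Classify each title and order the recognized ones by evidence type."""
    typed = [(classify(title), title) for title in titles]
    known = [(etype, title) for etype, title in typed if etype is not None]
    return sorted(known, key=lambda pair: pair[0])
```

Because `sorted` is stable, items of the same type keep their original relative order in the catalogue.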
In a preferred embodiment of the present invention, the input interface of the user includes: an entry port that reads electronic files from a mobile storage medium or a smart device through a direct or wireless connection, and an entry port that reads electronic files indirectly from a scanning entry device. Before a file becomes an electronic file it is very likely still a paper file, and entering such material requires converting the paper version into an electronic one. This calls for a scanning entry device that recognizes the printed characters one by one, identifies rare characters through big-data comparison, and handles the layout and typesetting between characters, so that the recognized text is accurate and the result is indistinguishable from a native electronic file.
In a preferred embodiment of the invention, the catalogue information established in step C and step D is labeled, text analysis is performed on the labeled catalogue information, and only the noun components are retained after the non-noun components are removed. Noun components are easier for an NLP analysis algorithm to analyse and process, which yields more efficient and precise electronic tags and helps the database classify and summarize the different information in the electronic files.
In a preferred embodiment of the present invention, after the catalogue information has been converted into labels containing only noun components, the labels are grouped into individual entry sets according to the definitions of synonyms and near-synonyms in the Modern Chinese Dictionary, and the most frequently used word in an entry set, together with the nouns of median and lowest frequency of use, serves as the name of the entry set. The most frequent word is the most common one in everyday use and the one case handlers most readily think of, so the relevant electronic file can be found in the database fastest; it is also the easiest word to classify, which reduces the load on the system's executing program. The words of median and lowest frequency may still be mentioned occasionally; if they were not labeled, filing and retrieval of the electronic file would slow down considerably whenever such a word was chosen, so they need to be listed separately.
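The noun filtering and the naming of an entry set can be sketched as below. Both functions are illustrative assumptions: part-of-speech tags are taken as given input (the source does not name a tagger), the tag value "NOUN" is hypothetical, and the nouns passed to `name_entry_set` are assumed to have already been grouped into one synonym set.

```python
from collections import Counter


def keep_nouns(tagged_tokens):
    """Keep only tokens tagged as nouns; input is (token, tag) pairs."""
    return [token for token, tag in tagged_tokens if tag == "NOUN"]


def name_entry_set(nouns):
    """Pick the highest-, median- and lowest-frequency nouns as the set name."""
    ranked = [word for word, _ in Counter(nouns).most_common()]
    picks = [ranked[0], ranked[len(ranked) // 2], ranked[-1]]
    # Remove repeats (small sets) while keeping the frequency order.
    return list(dict.fromkeys(picks))
```

For a set with fewer than three distinct nouns the three picks overlap, so the name simply contains fewer words.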
In a preferred embodiment of the present invention, the permanent storage database rearranges the catalogue information in chronological order over a user-defined time interval ranging from 5 minutes to 3 months, determines which catalogue information was called most frequently within that interval, and feeds the frequency back to the user input interface.
Specifically, an NLP analysis algorithm analyses and processes the information types in the matched electronic files to locate the various kinds of information in each file, such as the case filing time, the reason for filing and the evidence list. A numeric extraction algorithm then extracts the corresponding data from the different fields. All the data in the electronic files are statistically sorted, in descending or ascending order; for this, data from different electronic files must first be converted into a uniform format and a uniform unit, for example years or months. All data of the same kind, such as the amount involved in the case or the number of persons involved, are then summed and averaged to obtain an overall average. Finally, a comprehensive summary of criminal case information is obtained from the maximum and minimum values in the sorted result and the calculated average.
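A sketch of the windowed call-frequency count follows. The shape of the call log (timestamp, catalogue entry) is an assumption, as is the approximation of 3 months as 90 days.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_WINDOW = timedelta(minutes=5)
MAX_WINDOW = timedelta(days=90)  # "3 months" approximated as 90 days (assumption)


def most_called(call_log, now, window):
    """Return the calls inside the window in time order, and the top entry.

    call_log is an iterable of (timestamp, catalogue_entry) pairs.
    """
    if not MIN_WINDOW <= window <= MAX_WINDOW:
        raise ValueError("window must be between 5 minutes and 3 months")
    recent = sorted(
        (ts, entry) for ts, entry in call_log if now - window <= ts <= now
    )
    top = Counter(entry for _, entry in recent).most_common(1)
    return recent, (top[0] if top else None)
```

The second return value is an (entry, count) pair, which is the frequency fed back to the user input interface in the text above.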
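The unit conversion, sorting and summary statistics can be sketched as follows, here for durations expressed in years or months. The unit table and the input shape (amount, unit) are assumptions made for the example.

```python
from statistics import mean

# Assumed unit table: everything is converted to months before sorting.
UNIT_TO_MONTHS = {"month": 1, "year": 12}


def summarize(values):
    """Normalize (amount, unit) pairs to months, sort, and summarize."""
    normalized = sorted(amount * UNIT_TO_MONTHS[unit] for amount, unit in values)
    return {
        "sorted": normalized,  # ascending; reverse the list for descending order
        "min": normalized[0],
        "max": normalized[-1],
        "mean": mean(normalized),
    }
```

Converting to a single unit before sorting is what makes values from different electronic files comparable.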
The invention also discloses a storage device, which comprises a memory, one or more processors and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the automatic cataloguing method based on criminal case electronic files described above.
The memory includes, but is not limited to, hard disks and optical discs. The memory may be used to store application programs and function modules, and the processor executes the application programs stored in the memory, thereby performing the various functional applications and data processing of the device. The memory may be internal memory, external memory, or both. The internal memory may include, but is not limited to, a hard disk. The external memory may include, but is not limited to, an optical disc. The disclosed memory is given by way of example only and not by way of limitation.
The processor is the control center of the terminal device. It connects the parts of the entire device through various interfaces and lines, and performs the device's functions and processes data by running or executing software programs and/or modules stored in the memory and by calling data stored in the memory.
The beneficial effects of the invention are as follows. It provides a more organized, clear, and efficient entry workflow for criminal case electronic files. It accurately extracts the basic information and case information in each electronic file, in particular the evidence information, which plays a major role, and builds and catalogues accurate electronic indexes in a targeted manner. This helps the database system in the memory classify and store electronic files under more entries, allows electronic files to be retrieved along more dimensions, and lets more keyword information serve as directional vector values. Case handlers can convert electronic files into database information in the shortest time and, when retrieval is needed, obtain the required information from the required electronic file in the shortest time. The overall case-handling flow of an intelligent criminal investigation system can therefore proceed more smoothly and completely: a case can be handled on the intelligent system from filing to closing, criminal cases can be examined in the most efficient manner, and effective assistance is provided for raising the judicial level.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.
Claims (7)
1. An automatic cataloguing method based on criminal case electronic files, characterized by comprising the following steps:
A) through a user input interface, acquiring an electronic file of a criminal case manually entered by the user and first storing it in a temporary storage space, wherein the electronic file comprises a plurality of items of basic information; if any item of basic information is missing, the file is not judged to be an electronic file, the missing content is fed back to the user input interface as a prompt, and the file remains in the temporary storage space for later use; otherwise, executing the next basic information extraction operation;
B) extracting the titles, page numbers, and production dates in the electronic file and judging whether any titles, page numbers, or production dates do not belong to the same electronic file; if any item does not match, identifying the corresponding content as a new electronic file, placing the electronic files in a temporary queue in the temporary storage space in the order of identification, and executing the next accurate extraction operation; otherwise, directly executing the next accurate extraction operation;
C) performing the accurate extraction operation on the foremost electronic file in the temporary queue, extracting the evidence-collection subject, inquiry times, evidence-collection time, inquiry location, participants, and record content information in the electronic file, synchronously identifying record content bearing the same name as the evidence-collection subject, generating electronic index information of the information type, synchronously generating an information catalogue, and executing the next evidence classification operation;
D) performing the evidence classification operation on the electronic file that underwent the accurate extraction operation in the previous step, automatically dividing and sorting the evidence by type, forming electronic index information of the evidence type within the single electronic file, synchronously generating an evidence catalogue, entering the electronic index information and information catalogue of the information type and the electronic index information and evidence catalogue of the evidence type into a permanent storage database, and at the same time moving the electronic file into the permanent storage database to form structured electronic file data;
E) if the temporary queue still contains electronic files on which step C has not been executed, sequentially executing steps C to D and then returning to this step; otherwise, ending the cataloguing operation on the electronic files and feeding the cataloguing result information back to the user operation interface.
2. The automatic cataloguing method based on criminal case electronic files according to claim 1, wherein the evidence is classified into 8 types according to the following specific categories:
(I) physical evidence; (II) documentary evidence; (III) witness testimony; (IV) victim statements; (V) statements and defenses of criminal suspects and defendants; (VI) expert opinions; (VII) records of inquests, examinations, identifications, investigative experiments, and the like; (VIII) audio-visual materials and electronic data.
3. The automatic cataloguing method based on criminal case electronic files according to claim 1, wherein the user input interface comprises: an entry port for reading the electronic file directly from a removable storage medium or smart device through a direct or wireless connection, and an entry port for reading the electronic file indirectly from a scanning entry device.
4. The automatic cataloguing method based on criminal case electronic files according to claim 1, wherein the catalogue information created in step C and step D is labeled, the labeled catalogue information undergoes text analysis, and non-noun components are removed so that only noun components are retained.
5. The automatic cataloguing method based on criminal case electronic files according to claim 4, wherein after the catalogue information is converted into labels containing only noun components, the labels are grouped into individual entry sets according to the definitions of synonyms or near-synonyms in the Modern Chinese Dictionary, and the word with the highest usage frequency in each entry set and the nouns ranked in the middle and at the end by usage frequency are used as the entry set names.
6. The automatic cataloguing method based on criminal case electronic files according to claim 1 or 5, wherein the permanent storage database rearranges the catalogue information in chronological order at a customized time interval ranging from 5 minutes to 3 months, counts the catalogue information called most frequently within the customized time interval, and feeds the result back to the user input interface.
7. A memory, comprising: one or more processors and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the automatic cataloguing method based on criminal case electronic files according to any one of claims 1 to 6.
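For illustration only, and forming no part of the claims, the control flow of steps A to E of claim 1 might be sketched in Python as below. Every identifier here is hypothetical: the claims do not name the basic-information items, so `REQUIRED_BASIC_INFO` is an assumption, the keys in `INFO_FIELDS` are invented stand-ins for the information types of step C, and the evidence labels are shortened forms of the eight categories of claim 2. The sketch also assumes the NLP extraction has already turned each file into a plain dictionary, and it simplifies step B to queuing files in the order received.

```python
from collections import deque

# Assumed field names; the claims do not list the basic-information items.
REQUIRED_BASIC_INFO = ("title", "page_count", "production_date")
# Hypothetical keys for the information types extracted in step C.
INFO_FIELDS = ("subject", "inquiry_times", "evidence_time", "inquiry_place", "participants")
# Shortened labels for the eight evidence categories, in claim order.
EVIDENCE_TYPES = (
    "physical evidence",
    "documentary evidence",
    "witness testimony",
    "victim statement",
    "suspect or defendant statement",
    "expert opinion",
    "investigation record",
    "audio-visual or electronic data",
)


def missing_basic_info(file):
    """Step A: list the required basic-information items a file lacks."""
    return [key for key in REQUIRED_BASIC_INFO if file.get(key) in (None, "")]


def catalogue(files):
    """Run a simplified version of steps A to E over pre-extracted files."""
    queue = deque()   # temporary queue in the temporary storage space
    feedback = {}     # missing items to report back to the input interface
    database = []     # stand-in for the permanent storage database

    for index, file in enumerate(files):
        missing = missing_basic_info(file)
        if missing:
            feedback[index] = missing   # incomplete file is not queued
        else:
            queue.append(file)          # step B: queued in order received

    while queue:                        # step E: repeat until queue is empty
        file = queue.popleft()          # step C: take the foremost file
        info_index = {k: file[k] for k in INFO_FIELDS if k in file}
        # Step D: group evidence by type, keeping the fixed category order.
        evidence_index = {}
        for kind in EVIDENCE_TYPES:
            names = sorted(e["name"] for e in file.get("evidence", []) if e["type"] == kind)
            if names:
                evidence_index[kind] = names
        database.append({
            "title": file["title"],
            "info_catalogue": list(info_index),
            "info_index": info_index,
            "evidence_catalogue": list(evidence_index),
            "evidence_index": evidence_index,
        })
    return database, feedback
```

A double-ended queue is used so that taking the foremost file is a constant-time operation, and the loop condition plays the role of the check in step E.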
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110490523.9A CN113204610A (en) | 2021-05-06 | 2021-05-06 | Automatic cataloguing method based on criminal case electronic file and computer readable storage device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110490523.9A CN113204610A (en) | 2021-05-06 | 2021-05-06 | Automatic cataloguing method based on criminal case electronic file and computer readable storage device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113204610A true CN113204610A (en) | 2021-08-03 |
Family
ID=77029015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110490523.9A Pending CN113204610A (en) | 2021-05-06 | 2021-05-06 | Automatic cataloguing method based on criminal case electronic file and computer readable storage device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113204610A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116402477A (en) * | 2023-06-07 | 2023-07-07 | 山东韵升科技股份有限公司 | File digital information management system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361111A (en) * | 2014-11-28 | 2015-02-18 | 青岛大学 | Automatic archive editing method |
CN108171639A (en) * | 2018-01-10 | 2018-06-15 | 南京市公安局 | Electronics files application process based on police service comprehensive platform |
CN109902288A (en) * | 2019-01-17 | 2019-06-18 | 深圳壹账通智能科技有限公司 | Intelligent clause analysis method, device, computer equipment and storage medium |
CN110135264A (en) * | 2019-04-16 | 2019-08-16 | 深圳壹账通智能科技有限公司 | Data entry method, device, computer equipment and storage medium |
CN110209632A (en) * | 2019-05-27 | 2019-09-06 | 武汉市润普网络科技有限公司 | A kind of electronics folder with case production, turn shelves system |
CN110502929A (en) * | 2019-09-02 | 2019-11-26 | 腾讯科技(深圳)有限公司 | A kind of method, apparatus of information processing, equipment and storage medium |
CN111858489A (en) * | 2020-07-17 | 2020-10-30 | 中国电子科技集团公司第五十四研究所 | Multi-source heterogeneous spatial data archiving method based on self-adaptive metadata template |
CN112463726A (en) * | 2020-11-19 | 2021-03-09 | 深圳供电局有限公司 | Automatic mobile financial bill filing method |
CN112733658A (en) * | 2020-12-31 | 2021-04-30 | 北京华宇信息技术有限公司 | Electronic document filing method and device |
2021
- 2021-05-06 CN CN202110490523.9A patent/CN113204610A/en active Pending
Non-Patent Citations (1)
Title |
---|
黄天元: "《文本数据挖掘 基于R语言》", 北京:机械工业出版社, pages: 182 - 183 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992645B (en) | Data management system and method based on text data | |
US8315997B1 (en) | Automatic identification of document versions | |
JP3855551B2 (en) | Search method and search system | |
CN110851598B (en) | Text classification method and device, terminal equipment and storage medium | |
Choudhury et al. | Figure metadata extraction from digital documents | |
CN109670014B (en) | Paper author name disambiguation method based on rule matching and machine learning | |
US8510312B1 (en) | Automatic metadata identification | |
CN111723564B (en) | Event extraction and processing method for case-following electronic file | |
CN114911917B (en) | Asset meta-information searching method and device, computer equipment and readable storage medium | |
CN111382184A (en) | Method for verifying drug document and drug document verification system | |
CN113342984A (en) | Garden enterprise classification method and system, intelligent terminal and storage medium | |
CN113515622A (en) | Classified storage system for archive data | |
CN113204610A (en) | Automatic cataloguing method based on criminal case electronic file and computer readable storage device | |
CN113591476A (en) | Data label recommendation method based on machine learning | |
CN112632958A (en) | Contract document examination and analysis method based on contract knowledge base | |
CN112699949B (en) | Potential user identification method and device based on social platform data | |
CN114996400A (en) | Referee document processing method and device, electronic equipment and storage medium | |
CN114495138A (en) | Intelligent document identification and feature extraction method, device platform and storage medium | |
CN113342949A (en) | Matching method and system of intellectual library experts and topic to be researched | |
Cao et al. | Vector model based indexing and retrieval of handwritten medical forms | |
CN112818215A (en) | Product data processing method, device, equipment and storage medium | |
CN110765263B (en) | Display method and device for search cases | |
Asfoor | Applying Data Science Techniques to Improve Information Discovery in Oil And Gas Unstructured Data | |
CN117851602B (en) | Automatic legal document classification method and system based on deep learning | |
CN115858738B (en) | Enterprise public opinion information similarity identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210803 |