CN110543457A - Track type document processing method and device, storage medium and electronic device - Google Patents


Info

Publication number
CN110543457A
Authority
CN
China
Prior art keywords
keyword
event
keywords
track
target
Legal status
Pending
Application number
CN201910860042.5A
Other languages
Chinese (zh)
Inventor
朱传山
Current Assignee
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910860042.5A
Publication of CN110543457A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a track-class document processing method and device, a storage medium, and an electronic device. The method comprises: acquiring a track-class document to be processed; extracting time keywords from the track-class document and determining the document data in it that match the time keywords; extracting event keywords from the document data; and sorting the event keywords according to the time keywords to generate an event track matching the target event. The invention solves the technical problem that track-class documents are processed inefficiently because processing them manually is cumbersome.

Description

Track type document processing method and device, storage medium and electronic device
Technical Field
The invention relates to the field of computers, and in particular to a track-class document processing method and device, a storage medium, and an electronic device.
Background
The content of track-class documents in public security data consists largely of standardized statements, for example: "on day x of month x, at a certain place"; "on day x of month x, took a car to a certain place"; "from x:x on day x of month x to y:y on day y of month y, located at a certain place". Processing such documents is cumbersome: they must be processed manually before being entered into a system, which consumes time and labor and leads to the technical problem of low track-class document processing efficiency.
In view of the above problems, no effective solution has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide a track-class document processing method and device, a storage medium, and an electronic device, which at least solve the technical problem of low track-class document processing efficiency caused by the cumbersome manual processing of track-class documents.
According to one aspect of the embodiments of the invention, a track-class document processing method is provided, including: acquiring a track-class document to be processed, the track-class document recording a target event; extracting time keywords from the track-class document, and determining the document data in the track-class document that match the time keywords; extracting event keywords from the document data, the event keywords including at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event; and sorting the event keywords according to the time keywords to generate an event track matching the target event.
As an optional implementation, sorting the event keywords according to the time keywords includes: establishing keyword pairs between the time keywords and the event keywords; sorting the keyword pairs in the time order indicated by the time keywords to obtain sorted keyword pairs, in which the event keywords are ordered chronologically; and taking the sorted keyword pairs as the event track of the target event.
As an optional implementation, determining the document data in the track-class document that match the time keyword includes: determining the target text line in which the time keyword is located in the track-class document; acquiring associated text lines adjacent to the target text line, the associated text lines including at least one of: a first number of text lines preceding and adjacent to the target text line, and a second number of text lines following and adjacent to the target text line; and determining the data in the associated text lines as the document data.
As an optional implementation, extracting the event keywords from the document data includes: traversing and searching the document data with the object keywords in a keyword database; and determining the words found that match the object keywords as the event keywords.
As an optional implementation, acquiring the track-class document to be processed includes: acquiring a target document set; determining the documents in the target document set that contain the object keywords as a candidate document set; and acquiring the track-class document from the candidate document set.
As an optional implementation, after the event keywords are sorted according to the time keywords to generate the event track matching the target event, the method further includes: obtaining a search request carrying a target keyword, the target keyword being at least one of the time keyword, the object keyword, and the place keyword; and, in response to the search request, acquiring the target event track matching the target keyword.
According to another aspect of the embodiments of the invention, a track-class document processing apparatus is further provided, including: a first acquisition unit, configured to acquire a track-class document to be processed, the track-class document recording a target event; a determining unit, configured to extract time keywords from the track-class document and determine the document data in the track-class document that match the time keywords; an extracting unit, configured to extract event keywords from the document data, the event keywords including at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event; and a generating unit, configured to sort the event keywords according to the time keywords to generate an event track matching the target event.
As an optional implementation, the generating unit includes: an establishing module, configured to establish keyword pairs between the time keywords and the event keywords; a sorting module, configured to sort the keyword pairs in the time order indicated by the time keywords to obtain sorted keyword pairs, in which the event keywords are ordered chronologically; and a generation module, configured to take the sorted keyword pairs as the event track of the target event.
As an optional implementation, the determining unit includes: a first determining module, configured to determine the target text line in which the time keyword is located in the track-class document; a first obtaining module, configured to obtain associated text lines adjacent to the target text line, the associated text lines including at least one of: a first number of text lines preceding and adjacent to the target text line, and a second number of text lines following and adjacent to the target text line; and a second determining module, configured to determine the data in the associated text lines as the document data.
As an optional implementation, the extracting unit includes: a searching module, configured to traverse and search the document data with the object keywords in the keyword database; and a third determining module, configured to determine the words found that match the object keywords as the event keywords.
As an optional implementation, the first acquisition unit includes: a second acquisition module, configured to acquire a target document set; a third determining module, configured to determine the documents in the target document set that contain the object keywords as a candidate document set; and a third acquisition module, configured to acquire the track-class document from the candidate document set.
As an optional implementation, the apparatus further includes: a second acquisition unit, configured to obtain a search request after the event keywords are sorted according to the time keywords to generate the event track matching the target event, the search request carrying a target keyword that is at least one of the time keyword, the object keyword, and the place keyword; and a third acquisition unit, configured to acquire, in response to the search request, the target event track matching the target keyword.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the above-mentioned track-class document processing method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above-mentioned track-class document processing method through the computer program.
In the embodiments of the invention, the keywords common to track-class documents are processed by a program algorithm: the keywords are parsed programmatically to identify people, times, and places, so that track information is formed and entered into the system automatically for subsequent use and analysis. This improves the efficiency of processing track-class documents and solves the technical problem of low processing efficiency caused by cumbersome manual processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings:
FIG. 1 is a flowchart of an optional track-class document processing method according to an embodiment of the invention;
FIG. 2 is a diagram of an optional track-class document processing method according to an embodiment of the invention;
FIG. 3 is a diagram of an optional track-class document processing apparatus according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an optional track-class document processing apparatus according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an optional track-class document processing apparatus according to an embodiment of the invention;
FIG. 6 is a diagram of an optional track-class document processing apparatus according to an embodiment of the invention.
Detailed Description
To enable those skilled in the art to better understand the solutions of the invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of the embodiments of the invention, a track-class document processing method is provided. Optionally, as shown in FIG. 1, the method includes:
S102, acquiring a track-class document to be processed, the track-class document recording a target event;
S104, extracting time keywords from the track-class document, and determining the document data in the track-class document that match the time keywords;
S106, extracting event keywords from the document data, the event keywords including at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
S108, sorting the event keywords according to the time keywords to generate an event track matching the target event.
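Steps S102 to S108 can be illustrated end to end with a minimal Java sketch. It is not the patented implementation: it assumes the document has already been read into plain text lines (for example with a library such as Apache POI), that time keywords are written in a fixed-width `yyyy-MM-dd HH:mm` form, and that the keyword database is a simple list of strings. The class `TrackPipeline` and its members are hypothetical, and the `record` requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical end-to-end sketch of steps S102-S108 (not the patent's own code). */
final class TrackPipeline {
    // Assumption: time keywords look like "2019-03-10 05:30"; real documents vary.
    private static final Pattern TIME = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}");

    /** One point on the event track: a time keyword plus the event keywords found with it. */
    record TrackPoint(String time, List<String> eventKeywords) {}

    static List<TrackPoint> process(List<String> documentLines, List<String> keywordDatabase) {
        List<TrackPoint> track = new ArrayList<>();
        for (String line : documentLines) {            // S102: the document as text lines
            Matcher m = TIME.matcher(line);
            if (!m.find()) {
                continue;                              // S104: keep only lines holding a time keyword
            }
            String time = m.group();                   // first time keyword on the line
            List<String> events = new ArrayList<>();
            for (String keyword : keywordDatabase) {   // S106: event keywords in that line's data
                if (line.contains(keyword)) {
                    events.add(keyword);
                }
            }
            track.add(new TrackPoint(time, events));
        }
        // S108: the zero-padded fixed-width format sorts chronologically as plain strings.
        track.sort(Comparator.comparing(TrackPoint::time));
        return track;
    }
}
```

String order equals chronological order here only because the assumed format is fixed-width and zero-padded; a real implementation would more likely parse time keywords into `java.time.LocalDateTime` values before sorting.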
Optionally, in this embodiment, the track-class document processing method may be applied to, but is not limited to, processing case record documents in the public security field. The track-class document may be, but is not limited to, a Microsoft Office Word (Word) file. The time keyword may be, but is not limited to, a time point or a time period. The object keywords may include, but are not limited to, the object's name, number, gender, race, age, ethnicity, title, date of birth, address, vehicle, behavior, and the like. The place keywords may include, but are not limited to, address names, latitude and longitude data, and the like.
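Since time keywords may be time points or time periods, a small extractor can recognize both. As an illustration only, the sketch below assumes that Chinese-language documents write a time point as year, month, day, hour, minute (e.g. `2019年3月10日5点30分`) and a time period as two such points joined by `至` ("to"). The class `TimeKeywordExtractor` and both patterns are hypothetical and would need extending for other formats, such as points that omit the year.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical time-keyword extractor for one assumed date-time style. */
final class TimeKeywordExtractor {
    // Assumed time point: year 年 month 月 day 日 hour 点/时 minute 分, e.g. "2019年3月10日5点30分".
    private static final String POINT = "(\\d{4})年(\\d{1,2})月(\\d{1,2})日(\\d{1,2})[点时](\\d{1,2})分";
    private static final Pattern POINT_RE = Pattern.compile(POINT);
    // Assumed time period: two time points joined by 至 ("to").
    private static final Pattern PERIOD_RE = Pattern.compile(POINT + "至" + POINT);

    /** Returns every time point in the text, in order of appearance. */
    static List<LocalDateTime> extractPoints(String text) {
        List<LocalDateTime> points = new ArrayList<>();
        Matcher m = POINT_RE.matcher(text);
        while (m.find()) {
            points.add(toDateTime(m, 1));
        }
        return points;
    }

    /** Returns [start, end] of the first time period in the text, or an empty list. */
    static List<LocalDateTime> extractPeriod(String text) {
        Matcher m = PERIOD_RE.matcher(text);
        if (!m.find()) {
            return List.of();
        }
        return List.of(toDateTime(m, 1), toDateTime(m, 6)); // groups 1-5 and 6-10
    }

    /** Builds a LocalDateTime from five consecutive capture groups starting at `first`. */
    private static LocalDateTime toDateTime(Matcher m, int first) {
        return LocalDateTime.of(
                Integer.parseInt(m.group(first)), Integer.parseInt(m.group(first + 1)),
                Integer.parseInt(m.group(first + 2)), Integer.parseInt(m.group(first + 3)),
                Integer.parseInt(m.group(first + 4)));
    }
}
```

`LocalDateTime.of` throws a `DateTimeException` for impossible values such as month 13, which gives a natural hook for flagging lines the program cannot recognize.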
It should be noted that, in this embodiment, a track-class document recording a target event is acquired; time keywords are extracted from it and the document data matching the time keywords are determined; event keywords, including at least one of an object keyword corresponding to an object appearing in the target event and a place keyword corresponding to a place appearing in the target event, are extracted from the document data; and the event keywords are sorted according to the time keywords to generate an event track matching the target event.
As a further example, as shown in FIG. 2, a track-class document 202 is first acquired; track data 206 are then generated by the automatic processing function of a track-class document processing program 204; and finally the generated track data 206 are stored in a database 208. The automatic processing function of program 204 can be implemented with the open-source Apache POI library and Java code, so program 204 is portable across operating system platforms. Program 204 covers the common keywords used to parse people, times, and places, and the resulting track data 206 are stored in database 208 for further use and analysis by operators.
As a further example, natural language processing (NLP) techniques may be used to learn keyword rules from the processing of track-class documents, so that processing efficiency improves through continuous learning.
According to the embodiments provided in this application, the track-class document processing program processes and analyzes track-class documents automatically, replacing manual processing and thereby improving processing efficiency.
As an optional solution, sorting the event keywords according to the time keywords includes:
S1, establishing keyword pairs between the time keywords and the event keywords;
S2, sorting the keyword pairs in the time order indicated by the time keywords to obtain sorted keyword pairs, in which the event keywords are ordered chronologically;
S3, taking the sorted keyword pairs as the event track of the target event.
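Steps S1 to S3 can be sketched in Java as follows, assuming each time keyword has already been parsed into a `java.time.LocalDateTime` (parsing is not shown). `KeywordPairSorter`, `KeywordPair`, and the `->` rendering are illustrative names and formats, not taken from the patent; the `record` requires Java 16 or later.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch of building and sorting keyword pairs into an event track. */
final class KeywordPairSorter {
    /** S1: a keyword pair binds one time keyword to the event keywords found with it. */
    record KeywordPair(LocalDateTime time, List<String> eventKeywords) {}

    /** S2 and S3: sort pairs by the time they indicate; the sorted list is the event track. */
    static List<KeywordPair> toEventTrack(List<KeywordPair> pairs) {
        List<KeywordPair> sorted = new ArrayList<>(pairs);      // leave the input untouched
        sorted.sort(Comparator.comparing(KeywordPair::time));   // stable: ties keep document order
        return sorted;
    }

    /** Renders a track as "time, keyword, keyword -> time, keyword, ..." for display. */
    static String render(List<KeywordPair> track) {
        StringJoiner joiner = new StringJoiner(" -> ");
        for (KeywordPair pair : track) {
            joiner.add(pair.time() + ", " + String.join(", ", pair.eventKeywords()));
        }
        return joiner.toString();
    }
}
```

Because `List.sort` is stable, pairs sharing the same time keep their document order; the text does not say how the patented method breaks such ties.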
It should be noted that, in this embodiment, keyword pairs are established between the time keywords and the event keywords and sorted in the time order indicated by the time keywords; in the sorted keyword pairs the event keywords are ordered chronologically, and the sorted keyword pairs serve as the event track of the target event.
As a further illustration, suppose a public security track-class document states: "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace, and at 5:45 on March 10, 2019, suspect Zhang San arrived at the department store building." The time keywords in this content are "5:30 on March 10, 2019" and "5:45 on March 10, 2019", and the event keywords are "Zhang San", "Children's Palace", and "department store building". From the content, the keyword pairs are determined as follows: "5:30 on March 10, 2019" corresponds to "Zhang San" and "Children's Palace", and "5:45 on March 10, 2019" corresponds to "Zhang San" and "department store building". The keyword pairs are then sorted in the time order indicated by the time keywords, giving "5:30 on March 10, 2019, Zhang San, Children's Palace" followed by "5:45 on March 10, 2019, Zhang San, department store building". These sorted keyword pairs constitute the event track in the public security track-class document.
According to the embodiments provided in this application, establishing keyword pairs between time keywords and event keywords makes it possible to order the keywords of key events and thus obtain the event track, which improves the completeness of the event track.
As an optional solution, determining the document data in the track-class document that match the time keyword includes:
S1, determining the target text line in which the time keyword is located in the track-class document;
S2, acquiring associated text lines adjacent to the target text line, the associated text lines including at least one of: a first number of text lines preceding and adjacent to the target text line, and a second number of text lines following and adjacent to the target text line;
S3, determining the data in the associated text lines as the document data.
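A minimal sketch of S1 to S3, assuming the document is already a list of text lines and that the document data include the target line itself together with its neighbours (the text leaves this open). `AssociatedLines.documentData`, and the `before`/`after` parameters standing in for the first and second numbers, are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical window-based lookup of the text lines associated with a time keyword. */
final class AssociatedLines {
    /**
     * S1: find the first line containing the time keyword (the target text line).
     * S2: take up to `before` preceding and `after` following lines, clamped to the document.
     * S3: return those lines, target line included, as the document data.
     * Returns an empty list when the keyword does not occur.
     */
    static List<String> documentData(List<String> lines, String timeKeyword, int before, int after) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(timeKeyword)) {
                int from = Math.max(0, i - before);
                int to = Math.min(lines.size(), i + after + 1);   // subList end is exclusive
                return new ArrayList<>(lines.subList(from, to));
            }
        }
        return List.of();
    }
}
```

Copying the `subList` view into a new list keeps the result independent of later changes to the source lines.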
It should be noted that, in this embodiment, the target text line in which the time keyword is located is determined; associated text lines adjacent to it are acquired, including at least one of a first number of preceding adjacent lines and a second number of following adjacent lines; and the data in the associated text lines are determined as the document data. Specifically, the target text line in which the time keyword is located may be determined according to, but not limited to, symbols, special words, and the like.
As a further illustration, suppose a public security track-class document states: "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace; at 5:45 on March 10, 2019, suspect Zhang San arrived at the department store building." The time keywords are "5:30 on March 10, 2019" and "5:45 on March 10, 2019". Using the delimiters ";" and ".", the associated text line of "5:30 on March 10, 2019" is determined to be "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace", and the associated text line of "5:45 on March 10, 2019" is determined to be "At 5:45 on March 10, 2019, suspect Zhang San arrived at the department store building".
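The delimiter-based variant in this example can be sketched as follows. It assumes the original Chinese text uses the full-width semicolon `；` and full stop `。` (an ASCII `;` is also accepted), and it deliberately does not split on the ASCII `.`, which can occur inside numbers; `ClauseSplitter` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical splitter that finds the clause associated with each time keyword. */
final class ClauseSplitter {
    // Assumed clause delimiters: full-width semicolon, ASCII semicolon, full-width full stop.
    private static final String DELIMITERS = "[；;。]";

    /** Maps each time keyword to the first clause containing it; keywords not found are omitted. */
    static Map<String, String> associatedClauses(String text, List<String> timeKeywords) {
        String[] clauses = text.split(DELIMITERS);
        Map<String, String> result = new LinkedHashMap<>();   // keeps keyword order
        for (String keyword : timeKeywords) {
            for (String clause : clauses) {
                if (clause.contains(keyword)) {
                    result.put(keyword, clause.trim());
                    break;
                }
            }
        }
        return result;
    }
}
```

Applied to the Chinese original of the example, each time keyword maps to exactly the clause quoted above.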
According to the embodiments provided in this application, the target text line in which the time keyword is located is determined and the adjacent associated text is acquired to determine the document data corresponding to the time keyword, which makes it simpler to locate content by time keyword.
As an optional solution, extracting the event keywords from the document data includes:
S1, traversing and searching the document data with the object keywords in the keyword database;
S2, determining the words found that match the object keywords as the event keywords.
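A sketch of the traversal search, assuming the keyword database is an in-memory collection and that matching is exact substring search. Ordering the results by first occurrence in the document data is an illustrative choice, not something the text specifies; `EventKeywordExtractor` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical traversal search of document data with a keyword database. */
final class EventKeywordExtractor {
    /**
     * S1: look up every object keyword of the database in the document data.
     * S2: each keyword that occurs becomes an event keyword. Results are ordered by
     * first occurrence; ties keep database order because List.sort is stable.
     */
    static List<String> extract(String documentData, Collection<String> keywordDatabase) {
        List<Map.Entry<Integer, String>> hits = new ArrayList<>();
        for (String keyword : keywordDatabase) {
            int position = keyword.isEmpty() ? -1 : documentData.indexOf(keyword);
            if (position >= 0) {
                hits.add(Map.entry(position, keyword));
            }
        }
        hits.sort(Map.Entry.comparingByKey());
        List<String> eventKeywords = new ArrayList<>();
        for (Map.Entry<Integer, String> hit : hits) {
            eventKeywords.add(hit.getValue());
        }
        return eventKeywords;
    }
}
```

The cost grows with the number of keywords times the length of the text; for a large keyword database, a multi-pattern matcher such as Aho-Corasick would be the usual replacement.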
It should be noted that, in this embodiment, the document data are searched by traversal with the object keywords in the keyword database, and the words found that match the object keywords are determined as event keywords.
As a further illustration, suppose the object keyword in the keyword database is "Zhang San". All track-class documents to be processed are searched for "Zhang San", and if a track-class document mentions "Zhang San", then "Zhang San" is determined as an event keyword of that document.
As another illustration, for a bank track-class document, suppose the object keyword in the keyword database is "123123123" (for example, a bank card number). All bank track-class documents to be processed are searched for "123123123", and if a document contains it, "123123123" is determined as an event keyword of that document.
According to the embodiments provided in this application, searching the document data by traversal with the object keywords in the keyword database and taking the matching words as event keywords improves the accuracy of track-class document processing.
As an optional solution, acquiring the track-class document to be processed includes:
S1, acquiring a target document set;
S2, determining the documents in the target document set that contain the object keywords as a candidate document set;
S3, acquiring the track-class document from the candidate document set.
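A sketch of S1 to S3, assuming documents are held in memory as an id-to-text map. The text does not say how track-class documents are picked out of the candidate set, so the sketch takes that test as a caller-supplied predicate; `CandidateFilter` and the date-pattern predicate used in the check below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical two-stage filter: candidate documents, then track-class documents. */
final class CandidateFilter {
    /** S1 and S2: keep the documents (id -> text) of the target set that mention any object keyword. */
    static Map<String, String> candidates(Map<String, String> targetSet, Collection<String> objectKeywords) {
        Map<String, String> result = new LinkedHashMap<>();
        targetSet.forEach((id, text) -> {
            if (objectKeywords.stream().anyMatch(text::contains)) {
                result.put(id, text);
            }
        });
        return result;
    }

    /** S3: from the candidates, keep the ids of documents the caller's test accepts as track-class. */
    static List<String> trackDocuments(Map<String, String> candidates, Predicate<String> looksLikeTrack) {
        List<String> ids = new ArrayList<>();
        candidates.forEach((id, text) -> {
            if (looksLikeTrack.test(text)) {
                ids.add(id);
            }
        });
        return ids;
    }
}
```

For instance, `Pattern.compile("\\d{4}-\\d{2}-\\d{2}").asPredicate()` keeps only candidates that mention a date, under the assumption that track-class documents always carry time keywords.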
It should be noted that, in this embodiment, a target document set is acquired, the documents containing object keywords are selected from it as a candidate document set, and the track-class document is acquired from the candidate document set. Further, for document data that the program cannot recognize and process, the offending line is marked through the POI component, for example by highlighting it in red or adding a comment, and returned to the user, who is prompted to adjust the data until the program can recognize it; the data are then processed and stored in the database automatically.
As a further illustration, for a bank track-class document, suppose the object keyword in the keyword database is "123123123" (for example, a bank card number). All bank track-class documents to be processed are searched for "123123123"; if ten of them contain it, those ten documents form the candidate document set, and the track-class documents are acquired from that set.
According to the embodiments provided in this application, obtaining a candidate document set through the event keywords preprocesses the track-class documents and improves the efficiency of processing them.
As an optional solution, after the event keywords are sorted according to the time keywords to generate the event track matching the target event, the method further includes:
S1, obtaining a search request carrying a target keyword, the target keyword being at least one of: a time keyword, an object keyword, and a place keyword;
S2, acquiring, in response to the search request, the target event track matching the target keyword.
It should be noted that, in this embodiment, a search request carrying a target keyword (at least one of a time keyword, an object keyword, and a place keyword) is obtained, and in response the target event track matching the target keyword is acquired.
As a further example, the sorted event tracks are stored in the system and associated with their time keywords and event keywords; when a search request containing a time keyword, an object keyword, or a place keyword is entered, the target event track matching that keyword is retrieved and displayed.
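The association between stored event tracks and their keywords can be sketched as an in-memory inverted index. This is an assumption about the storage layer, which the text leaves open (it mentions only a database), and the sketch also assumes that a request carrying several target keywords should return only tracks matching all of them. `TrackIndex` is hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical inverted index from time, object and place keywords to stored event tracks. */
final class TrackIndex {
    private final Map<String, Set<String>> index = new HashMap<>();   // keyword -> track ids

    /** Registers a stored event track under each of its keywords. */
    void add(String trackId, Collection<String> keywords) {
        for (String keyword : keywords) {
            index.computeIfAbsent(keyword, k -> new LinkedHashSet<>()).add(trackId);
        }
    }

    /** Answers a search request: ids of tracks matching every target keyword given. */
    Set<String> search(Collection<String> targetKeywords) {
        Set<String> result = null;
        for (String keyword : targetKeywords) {
            Set<String> hits = index.getOrDefault(keyword, Set.of());
            if (result == null) {
                result = new LinkedHashSet<>(hits);
            } else {
                result.retainAll(hits);   // intersect: all target keywords must match
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Replacing the intersection with a union would implement the looser reading in which a track matching any one target keyword is returned.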
According to the embodiments provided in this application, associating event tracks with keywords allows the target event track matching a searched target keyword to be retrieved, which improves the efficiency of reading track-class documents.
According to another aspect of the embodiments of the invention, a track-class document processing apparatus for implementing the above track-class document processing method is further provided. As shown in FIG. 3, the apparatus includes:
A first obtaining unit 302, configured to acquire a track-class document to be processed, the track-class document recording a target event;
A determining unit 304, configured to extract time keywords from the track-class document and determine the document data in the track-class document that match the time keywords;
An extracting unit 306, configured to extract event keywords from the document data, the event keywords including at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
A generating unit 308, configured to sort the event keywords according to the time keywords to generate an event track matching the target event.
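One way to mirror the four units in code is to model each as a small interface and let the apparatus wire them together, so that each unit can be replaced independently. This is an illustrative design, not the patent's: the interface names, the choice of a time-keyword-to-document-data map as the determining unit's output, and `TrackApparatus` itself are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical wiring of units 302-308, each modeled as a functional interface. */
final class TrackApparatus {
    /** Unit 302: supplies the track-class document as text lines. */
    interface ObtainingUnit { List<String> obtain(); }
    /** Unit 304: maps each time keyword to its matching document data. */
    interface DeterminingUnit { Map<String, String> determine(List<String> document); }
    /** Unit 306: extracts event keywords from one piece of document data. */
    interface ExtractingUnit { List<String> extract(String documentData); }
    /** Unit 308: orders the keyword pairs (time keyword -> event keywords) into an event track. */
    interface GeneratingUnit { List<String> generate(Map<String, List<String>> keywordPairs); }

    private final ObtainingUnit obtaining;
    private final DeterminingUnit determining;
    private final ExtractingUnit extracting;
    private final GeneratingUnit generating;

    TrackApparatus(ObtainingUnit o, DeterminingUnit d, ExtractingUnit e, GeneratingUnit g) {
        this.obtaining = o;
        this.determining = d;
        this.extracting = e;
        this.generating = g;
    }

    /** Runs the units in order: obtain, determine, extract, generate. */
    List<String> run() {
        List<String> document = obtaining.obtain();
        Map<String, String> dataByTime = determining.determine(document);
        Map<String, List<String>> keywordPairs = new LinkedHashMap<>();
        dataByTime.forEach((time, data) -> keywordPairs.put(time, extracting.extract(data)));
        return generating.generate(keywordPairs);
    }
}
```

Because each unit is a functional interface, a unit can be supplied as a lambda, and swapping one strategy (for example, a different way of determining document data) leaves the other units unchanged.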
Optionally, in this embodiment, the track-class document processing apparatus is not limited to be applied to a process of processing case record documents in the public security field. The track-class document may be, but is not limited to, a Word processor application (Microsoft Office Word, Word) file from Microsoft corporation. The time keyword may be, but not limited to, a time point and a time period. The object keywords may include, but are not limited to, the name, number, gender, race, age, ethnicity, title, date of birth, address, etc. of the object. The location keywords may include, but are not limited to, address names, longitude and latitude data, and the like.
It should be noted that, in this embodiment, a track-class document to be processed is obtained, where the track-class document is used to record a target event, extract a time keyword in the track-class document, determine document data in the track-class document matching the time keyword, and extract an event keyword from the document data, where the event keyword includes at least one of the following: and sorting the event keywords according to the time keywords to generate an event track matched with the target event.
For further example, as shown in fig. 2, the track class document 202 is first obtained, and then the track data 206 is generated through an automatic processing function of the track class document processing program 204, and finally the generated track data 206 is stored in the database 208. The automatic processing function of the track-type document processing program 204 can be completed through open source software poi and java program codes, the track-type document processing program 204 is compatible with various operating system platforms and has portability, common keywords are covered in the track-type document processing program 204 and are used for analyzing and processing people, time and places, and finally formed program track data 206 is stored in a database 208 and can be further used and analyzed by an operator.
for further example, Natural Language Processing (nlp) technology is used to obtain keyword rules from the track-class document Processing process, and the Processing efficiency of the track-class document is improved by continuous learning.
According to the embodiment provided in the application, track-class documents are processed and analyzed automatically by the track-class document processing program, which replaces manual processing of track-class documents and improves processing efficiency.
As an alternative, as shown in fig. 4, the generating unit 308 includes:
An establishing module 402, configured to establish keyword pairs between the time keywords and the event keywords;
A sorting module 404, configured to sort the keyword pairs according to the time order indicated by the time keywords to obtain sorted keyword pairs, in which the event keywords are ordered chronologically;
And a generating module 406, configured to use the sorted keyword pairs as the event track of the target event.
It should be noted that, in this embodiment, keyword pairs between the time keywords and the event keywords are established and then sorted according to the time order indicated by the time keywords; in the sorted keyword pairs the event keywords are ordered chronologically, and the sorted keyword pairs are used as the event track of the target event.
For further illustration, suppose a public security track-class document states: "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace; at 5:45 on March 10, 2019, suspect Zhang San arrived at the department store." The time keywords in this content are "5:30 on March 10, 2019" and "5:45 on March 10, 2019", and the event keywords are "Zhang San", "Children's Palace", and "department store". From the content, the keyword pairs are determined as follows: "5:30 on March 10, 2019" corresponds to "Zhang San" and "Children's Palace", and "5:45 on March 10, 2019" corresponds to "Zhang San" and "department store". The keyword pairs are then sorted by the time order of the time keywords, giving: "5:30 on March 10, 2019, Zhang San, Children's Palace → 5:45 on March 10, 2019, Zhang San, department store". These sorted keyword pairs form the event track of the public security track-class document.
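The pairing and sorting steps can be sketched as follows. This is a minimal illustration under assumptions: the record `KeywordPair`, the class `EventTrackBuilder`, and the `->` rendering are hypothetical names and choices, not the patent's implementation, and records require Java 16 or later.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of one (time keyword, event keywords) pair. */
record KeywordPair(LocalDateTime time, List<String> eventKeywords) {}

class EventTrackBuilder {
    /** Sorts keyword pairs chronologically; the sorted list is the event track. */
    static List<KeywordPair> buildTrack(List<KeywordPair> pairs) {
        List<KeywordPair> track = new ArrayList<>(pairs);    // leave the input list untouched
        track.sort(Comparator.comparing(KeywordPair::time)); // stable sort by time keyword
        return track;
    }

    /** Renders a track as "time keywords -> time keywords" for display. */
    static String render(List<KeywordPair> track) {
        StringBuilder sb = new StringBuilder();
        for (KeywordPair p : track) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(p.time()).append(' ').append(String.join(", ", p.eventKeywords()));
        }
        return sb.toString();
    }
}
```

Because `List.sort` is stable, pairs that share the same time keyword keep their original document order.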
According to the embodiment provided in the application, establishing keyword pairs between the time keywords and the event keywords makes it possible to order the keywords of the key event and thereby obtain the event track, which improves the comprehensiveness of the event track.
As an alternative, as shown in fig. 5, the determining unit 304 includes:
A first determining module 502, configured to determine a target text line where the time keyword is located in the track-type document;
A first obtaining module 504, configured to obtain an associated text line adjacent to the target text line, where the associated text line includes at least one of: a first number of text lines before and adjacent to the target text line, a second number of text lines after and adjacent to the target text line;
And a second determining module 506, configured to determine data in the associated text line as document data.
It should be noted that, in this embodiment, the target text line in which the time keyword is located in the track-class document is determined, and the associated text lines adjacent to the target text line are obtained, where the associated text lines include at least one of: a first number of text lines before and adjacent to the target text line, and a second number of text lines after and adjacent to the target text line. The data in the associated text lines is then determined as the document data. Specifically, the target text line containing the time keyword may be determined according to, but not limited to, symbols, special words, and the like.
For further illustration, suppose a public security track-class document states: "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace; at 5:45 on March 10, 2019, suspect Zhang San arrived at the department store." The time keywords are "5:30 on March 10, 2019" and "5:45 on March 10, 2019". Using the delimiters ";" and ".", the associated text line of "5:30 on March 10, 2019" is determined to be "At 5:30 on March 10, 2019, suspect Zhang San left the Children's Palace", and the associated text line of "5:45 on March 10, 2019" is determined to be "at 5:45 on March 10, 2019, suspect Zhang San arrived at the department store".
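A minimal sketch of the delimiter-based line splitting and the "first number / second number" window follows. It assumes the delimiters are ASCII `;` and `.` plus the full-width `；` and `。` used in Chinese text; splitting on `.` would also break decimal numbers, so a real system would need a more careful tokenizer. The class `AssociatedLines` and its parameters `before`/`after` are hypothetical, and only the first line containing the time keyword is used.

```java
import java.util.ArrayList;
import java.util.List;

class AssociatedLines {
    // Assumed delimiters: ';' and '.', plus full-width semicolon (U+FF1B) and ideographic full stop (U+3002).
    private static final String DELIMITERS = "[;.\uFF1B\u3002]";

    /** Splits a document into trimmed, non-empty text lines. */
    static List<String> splitLines(String document) {
        List<String> lines = new ArrayList<>();
        for (String part : document.split(DELIMITERS)) {
            String line = part.trim();
            if (!line.isEmpty()) lines.add(line);
        }
        return lines;
    }

    /**
     * Returns the first line containing timeKeyword together with up to `before`
     * preceding and `after` following lines; empty if the keyword is not found.
     */
    static List<String> documentData(List<String> lines, String timeKeyword, int before, int after) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(timeKeyword)) {
                int from = Math.max(0, i - before);              // clamp at document start
                int to = Math.min(lines.size(), i + after + 1);  // clamp at document end
                return new ArrayList<>(lines.subList(from, to));
            }
        }
        return List.of();
    }
}
```

With `before = after = 0` the result is just the target line, which matches the example above; larger values pull in neighbouring lines when a record spans several sentences.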
According to the embodiment provided in the application, the target text line containing the time keyword is determined and the adjacent associated text lines are obtained to determine the document data corresponding to the time keyword, which makes it simpler to locate content through time keywords.
As an alternative, as shown in fig. 6, the extracting unit 306 includes:
A search module 602, configured to perform a traversal search in the document data using the object keywords in the keyword database;
And a third determining module 604, configured to determine the words found that match the object keywords as event keywords.
It should be noted that, in this embodiment, object keywords in the keyword database are used to perform traversal search in document data; and determining the searched words matched with the object keywords as event keywords.
To further illustrate, suppose the object keyword in the keyword database is "Zhang San". All track-class documents to be processed are searched for "Zhang San", and if "Zhang San" appears in a track-class document, it is determined as an event keyword of that document.
To further illustrate, for bank track-class documents, suppose the object keyword in the keyword database is "123123123" (e.g., a bank card number). All bank track-class documents to be processed are searched for "123123123", and if a document contains "123123123", that number is determined as an event keyword of the document.
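The traversal search can be sketched as a loop over the keyword database that keeps every keyword occurring in the document data. This is a minimal sketch assuming plain substring matching and a keyword database held as a `List<String>`; the class `EventKeywordExtractor` is hypothetical. Substring matching would also hit "123123123" inside a longer number, so a production system might need tokenization or boundary checks.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EventKeywordExtractor {
    /**
     * Traverses the keyword database and returns, in database order, every
     * object or place keyword that occurs verbatim in the document data.
     */
    static Set<String> extract(String documentData, List<String> keywordDatabase) {
        Set<String> found = new LinkedHashSet<>(); // de-duplicates while keeping order
        for (String keyword : keywordDatabase) {
            if (!keyword.isEmpty() && documentData.contains(keyword)) {
                found.add(keyword);
            }
        }
        return found;
    }
}
```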
According to the embodiment provided in the application, a traversal search is performed in the document data using the object keywords in the keyword database, and the words found that match the object keywords are determined as event keywords, which improves the accuracy of track-class document processing.
As an optional solution, the obtaining unit includes:
The second acquisition module is used for acquiring a target document set;
The third determining module is used for determining the documents containing the object keywords from the target document set as a candidate document set;
And the third acquisition module is used for acquiring the track class document from the candidate document set.
It should be noted that, in this embodiment, a target document set is obtained, the documents containing the object keyword are determined from the target document set as a candidate document set, and the track-class document is obtained from the candidate document set. Further, for document data that the program cannot recognize and process, the corresponding line of data is marked through the POI component, for example by highlighting it in red or adding a comment, and returned to the user, who is prompted to adjust the data until the program can recognize it; the data is then processed and stored in the database automatically.
To further illustrate, for bank track-class documents, suppose the object keyword in the keyword database is "123123123" (e.g., a bank card number). All bank track-class documents to be processed are searched for "123123123"; if ten of them contain "123123123", those ten documents are used as the candidate document set, and the track-class documents are obtained from that set.
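Building the candidate document set amounts to filtering the target document set by the object keyword. The sketch below assumes, hypothetically, that each document has already been converted to plain text (for example with POI) and is held in a map from document name to text; the class `CandidateDocuments` is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CandidateDocuments {
    /**
     * Keeps only the documents whose text contains the object keyword.
     * Documents are modelled (hypothetically) as name -> extracted plain text.
     */
    static Map<String, String> filter(Map<String, String> targetSet, String objectKeyword) {
        Map<String, String> candidates = new LinkedHashMap<>(); // preserves input order
        targetSet.forEach((name, text) -> {
            if (text.contains(objectKeyword)) candidates.put(name, text);
        });
        return candidates;
    }
}
```

Filtering first means the costlier steps (line splitting, keyword pairing) only run on documents that can contribute to the track.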
According to the embodiment provided in the application, obtaining a candidate document set through the event keywords serves to preprocess the track-class documents, which improves the efficiency of processing them.
As an optional solution, the apparatus further includes:
The second obtaining unit is used for obtaining a search request after the event keywords are sequenced according to the time keywords to generate an event track matched with the target event, wherein the search request carries the target keywords, and the target keywords are at least one of the following keywords: time keywords, object keywords, place keywords;
And the third acquisition unit is used for responding to the search request and acquiring the target event track matched with the target keyword.
It should be noted that, in this embodiment, a search request is obtained, where the search request carries a target keyword that is at least one of a time keyword, an object keyword, and a place keyword; in response to the search request, the target event track matching the target keyword is obtained.
For further example, the sorted event tracks are stored in the system and associated with their time keywords and event keywords, so that when a search request carrying a time keyword, an object keyword, or a place keyword is input, the target event tracks matching that keyword are retrieved and displayed.
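The description leaves the storage and matching scheme open. As a minimal sketch, assume each stored track carries its time, object, and place keywords as strings, and that a track matches when it contains every keyword in the request (an assumed policy; the text only requires the request to carry at least one kind of keyword). `StoredTrack` and `TrackSearch` are hypothetical names, and records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stored track: its keywords (time, object, place) and rendered text. */
record StoredTrack(List<String> keywords, String track) {}

class TrackSearch {
    /**
     * Returns every stored event track whose keywords include all target
     * keywords carried by the search request.
     */
    static List<StoredTrack> search(List<StoredTrack> store, List<String> targetKeywords) {
        List<StoredTrack> hits = new ArrayList<>();
        for (StoredTrack t : store) {
            if (t.keywords().containsAll(targetKeywords)) hits.add(t);
        }
        return hits;
    }
}
```

A linear scan keeps the sketch short; with many stored tracks, an inverted index from keyword to tracks (or the database's own indexing) would be the usual choice.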
According to the embodiment provided in the application, matching target event tracks against the target keyword makes it possible to retrieve the event tracks relevant to a searched keyword, which improves the efficiency of reading track-class documents.
According to another aspect of the embodiments of the present invention, an electronic device for implementing the track-class document processing method is further provided. The electronic device includes a memory and a processor, the memory stores a computer program, and the processor is configured to execute the steps in any one of the above method embodiments through the computer program.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a track-class document to be processed, wherein the track-class document is used for recording a target event;
S2, extracting the time keywords in the track-class document, and determining document data in the track-class document matching the time keywords;
S3, extracting event keywords from the document data, wherein the event keywords include at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
And S4, sorting the event keywords according to the time keywords to generate an event track matched with the target event.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the apparatus are described as a series of acts or combinations, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Optionally, as one of ordinary skill in the art will understand, the structure in the embodiment of the present application is only an illustration, and the electronic device may be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, and a Mobile Internet Device (MID), a PAD, and the like. The embodiments of the present application do not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.), or more different configurations.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for processing track-class documents in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, the method for processing track-class documents is implemented. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory may be specifically, but not limited to, used for storing information such as target events, time keywords, event keywords, and the like. As an example, the memory may include, but is not limited to, the first obtaining unit 302, the determining unit 304, the extracting unit 306, and the generating unit 308 in the track-class document processing apparatus, and may also include, but is not limited to, other module units in the track-class document processing apparatus, which is not described in detail in this example.
Optionally, the transmission device is used to receive or send data via a network. Examples of the network include wired and wireless networks. In one example, the transmission device includes a network interface controller (NIC) that can be connected to a router and other network devices through a network cable to communicate with the Internet or a local area network. In another example, the transmission device is a radio frequency (RF) module, which communicates with the Internet wirelessly.
In addition, the electronic device further includes: a display for displaying the target event, the time keyword, the event keyword, etc.; and a connection bus for connecting the respective module parts in the electronic apparatus.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the steps in any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a track type document to be processed, wherein the track type document is used for recording a target event;
S2, extracting the time keywords in the track-class document, and determining document data in the track-class document matching the time keywords;
S3, extracting event keywords from the document data, wherein the event keywords include at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
And S4, sorting the event keywords according to the time keywords to generate an event track matched with the target event.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
If the integrated unit in the above embodiments is implemented as a software functional unit and sold or used as a separate product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods according to the embodiments of the present invention.
In the above embodiments of the present invention, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (14)

1. A track class document processing method, characterized by comprising the following steps:
acquiring a track type document to be processed, wherein the track type document is used for recording a target event;
extracting time keywords in the track documents, and determining document data matched with the time keywords in the track documents;
extracting event keywords from the document data, wherein the event keywords comprise at least one of the following: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
And sequencing the event keywords according to the time keywords to generate an event track matched with the target event.
2. The method of claim 1, wherein said ranking said event keywords by said time keywords comprises:
establishing a keyword pair between the time keyword and the event keyword;
sorting the keyword pairs according to a time sequence indicated by the time keywords to obtain sorted keyword pairs, wherein in the sorted keyword pairs, the event keywords are sorted according to the time sequence;
And taking the ordered keyword pair as the event track of the target event.
3. The method according to claim 1, wherein the determining the document data in the track class document matching the time keyword comprises:
determining a target text line where the time keyword is located in the track type document;
obtaining an associated text line adjacent to the target text line, wherein the associated text line includes at least one of: a first number of text lines before and adjacent to the target text line, a second number of text lines after and adjacent to the target text line;
and determining the data in the associated text line as the document data.
4. The method according to claim 3, wherein the extracting event keywords from the document data comprises:
Traversing and searching in the document data by using the object keywords in the keyword database;
and determining the searched words matched with the object keywords as the event keywords.
5. The method according to claim 1, wherein the obtaining the track class document to be processed comprises:
acquiring a target document set;
Determining documents containing the object keywords from the target document set as a candidate document set;
and acquiring the track class document from the candidate document set.
6. The method of claim 1, further comprising, after said sorting said event keywords by said time keywords to generate an event track matching said target event:
obtaining a search request, wherein the search request carries a target keyword, and the target keyword is at least one of the following keywords: the time keyword, the object keyword, the place keyword;
And responding to the search request, and acquiring a target event track matched with the target keyword.
7. A track class document processing apparatus, comprising:
A first acquisition unit, configured to acquire a track class document to be processed, wherein the track class document is used for recording a target event;
A determining unit, configured to extract the time keywords in the track class document and determine document data in the track class document matching the time keywords;
An extracting unit configured to extract an event keyword from the document data, wherein the event keyword includes at least one of: an object keyword corresponding to an object appearing in the target event, and a place keyword corresponding to a place appearing in the target event;
And the generating unit is used for sequencing the event keywords according to the time keywords so as to generate an event track matched with the target event.
8. The apparatus of claim 7, wherein the generating unit comprises:
the establishing module is used for establishing a keyword pair between the time keyword and the event keyword;
The sorting module is used for sorting the keyword pairs according to the time sequence indicated by the time keywords to obtain the sorted keyword pairs, wherein in the sorted keyword pairs, the event keywords are sorted according to the time sequence;
and the generation module is used for taking the sequenced keyword pairs as the event tracks of the target events.
9. The apparatus of claim 7, wherein the determining unit comprises:
The first determining module is used for determining a target text line where the time keyword is located in the track type document;
A first obtaining module, configured to obtain an associated text line adjacent to the target text line, where the associated text line includes at least one of: a first number of text lines before and adjacent to the target text line, a second number of text lines after and adjacent to the target text line;
And the second determining module is used for determining the data in the associated text line as the document data.
10. The apparatus of claim 9, wherein the extraction unit comprises:
the searching module is used for performing traversal searching in the document data by using the object keywords in the keyword database;
and the third determining module is used for determining the searched words matched with the object keywords as the event keywords.
11. The apparatus of claim 7, wherein the obtaining unit comprises:
The second acquisition module is used for acquiring a target document set;
A third determining module, configured to determine, from the target document set, documents including the object keyword as a candidate document set;
And the third acquisition module is used for acquiring the track class document from the candidate document set.
12. The apparatus of claim 7, further comprising:
A second obtaining unit, configured to obtain a search request after the event keywords are ranked according to the time keywords to generate an event track matching the target event, where the search request carries target keywords, and the target keywords are at least one of the following keywords: the time keyword, the object keyword, the place keyword;
and the third acquisition unit is used for responding to the search request and acquiring the target event track matched with the target keyword.
13. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 6.
14. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
CN201910860042.5A 2019-09-11 2019-09-11 Track type document processing method and device, storage medium and electronic device Pending CN110543457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910860042.5A CN110543457A (en) 2019-09-11 2019-09-11 Track type document processing method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN110543457A true CN110543457A (en) 2019-12-06

Family

ID=68713397

Country Status (1)

Country Link
CN (1) CN110543457A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231272A (en) * 2020-09-30 2021-01-15 陈梅玉 Information processing method and information service platform based on remote online office
CN113553407A (en) * 2021-06-18 2021-10-26 北京百度网讯科技有限公司 Event tracing method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609445A (en) * 2009-07-16 2009-12-23 复旦大学 Crucial sub-method for extracting topic based on temporal information
CN102567421A (en) * 2010-12-27 2012-07-11 北大方正集团有限公司 Document retrieval method and device
CN104102719A (en) * 2014-07-18 2014-10-15 百度在线网络技术(北京)有限公司 Track information pushing method and device
CN104572615A (en) * 2014-12-19 2015-04-29 深圳中创华安科技有限公司 Method and system for on-line case investigation processing
CN104581000A (en) * 2013-10-12 2015-04-29 北京航天长峰科技工业集团有限公司 Method for rapidly retrieving motional trajectory of interested video target
CN106211071A (en) * 2016-07-04 2016-12-07 深圳大学 Group activity method of data capture based on multi-source space-time trajectory data and system
CN107135252A (en) * 2017-04-18 2017-09-05 北京思特奇信息技术股份有限公司 The user intent event recommendation method and apparatus of track are used based on mobile terminal
CN108536813A (en) * 2018-04-04 2018-09-14 平安科技(深圳)有限公司 Track querying method, electronic equipment and storage medium
CN108629000A (en) * 2018-05-02 2018-10-09 深圳市数字城市工程研究中心 A kind of the group behavior feature extracting method and system of mobile phone track data cluster
CN110069585A (en) * 2017-12-05 2019-07-30 腾讯科技(深圳)有限公司 Treating method and apparatus, storage medium and the electronic device of track point data

Similar Documents

Publication Publication Date Title
US8600980B2 (en) Consolidated information retrieval results
US8407781B2 (en) Information providing support device and information providing support method
CN103714119B (en) A kind for the treatment of method and apparatus of browser data
CN112613917A (en) Information pushing method, device and equipment based on user portrait and storage medium
CN105453578A (en) Apparatus, server, and method for providing conversation topic
CN103605715A (en) Method and device used for data integration processing of multiple data sources
CN112732893B (en) Text information extraction method and device, storage medium and electronic equipment
CN110209921B (en) Method and device for pushing media resource, storage medium and electronic device
CN104050243A (en) Network searching method and system combined with searching and social contact
CN110032616A (en) A kind of acquisition method and device of document reading conditions
CN110543457A (en) Track type document processing method and device, storage medium and electronic device
CN110895587B (en) Method and device for determining target user
CN111949849B (en) Fish information acquisition method and device, electronic equipment and readable storage medium
CN111310224B (en) Log desensitization method, device, computer equipment and computer readable storage medium
CN112749258A (en) Data searching method and device, electronic equipment and storage medium
JP7503493B2 (en) Posted information extraction control device, posted information extraction control program
CN106611022B (en) Method and device for improving search efficiency in website
CN104240107A (en) Community data screening system and method thereof
CN108228802B (en) Recommendation method and device for input association
JP6088781B2 (en) Server apparatus, program, and control method
CN115659375A (en) Data processing method, data processing device, storage medium and electronic equipment
CN110941711A (en) Electronic search report acquisition method and apparatus, storage medium, and electronic apparatus
CN109299439B (en) Digital extraction method and apparatus, storage medium, and electronic apparatus
CN112861532B (en) Address standardization processing method, device, equipment and online searching system
CN108287834A (en) Method, apparatus and computing device for pushed information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191206