CN113761227A - Text data searching method and device - Google Patents

Text data searching method and device

Info

Publication number
CN113761227A
Authority
CN
China
Prior art keywords
text
corpus
space
time
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010806630.3A
Other languages
Chinese (zh)
Inventor
兰亚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010806630.3A
Publication of CN113761227A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a text data searching method and device in the technical field of computers. The method comprises: extracting at least one of temporal features or spatial features of search text data as spatio-temporal features by using a machine learning model; and determining the corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts, wherein each spatio-temporal label annotates at least one of time information or space information of its corpus text.

Description

Text data searching method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a text data search method, a text data search apparatus, and a non-volatile computer-readable storage medium.
Background
With the development of computer and network technologies, a huge amount of text is stored on today's networks, and the amount keeps growing. Accurately finding the desired content in this huge amount of text is therefore important.
In the related art, a search engine serving as an entry for a user to acquire information is mostly implemented based on keyword content matching.
Disclosure of Invention
The inventors of the present disclosure found that the following problem exists in the above-described related art: keyword content matching cannot deeply mine the internal relationships within the information, so the accuracy of the search results is low.
In view of this, the present disclosure provides a technical solution for searching text data, which can improve the accuracy of the search result.
According to some embodiments of the present disclosure, there is provided a text data search method including: extracting at least one of temporal features or spatial features of the search text data as spatio-temporal features by using a machine learning model; and determining the corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts, wherein each spatio-temporal label annotates at least one of time information or space information of its corpus text.
In some embodiments, the spatiotemporal tag is generated by: extracting at least one of time characteristics or space characteristics of each sentence in the text to be processed as space-time characteristics by using a machine learning model; and dividing the text to be processed into each corpus text according to the space-time characteristics, and generating a space-time label of each corpus text.
In some embodiments, determining the corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts comprises: determining a first corpus text according to the degree of matching between the search features and the spatio-temporal labels of the corpus texts; determining a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determining the first corpus text and the second corpus text as the corpus texts matching the search text data.
In some embodiments, the event tag is generated by: extracting event characteristics of each corpus text by using a machine learning model according to context information of each corpus text in the text to be processed; and marking the same event label for the corpus texts with the same event characteristics.
In some embodiments, the corpus text matching the search text data is plural; the method further comprises the following steps: determining relevant events of the searched text data according to the event tags of the multiple matched corpus texts; and generating at least one item of spatial track information or time axis information of the related events according to the space-time labels of the plurality of matched corpus texts.
In some embodiments, the method further comprises at least one of the following steps: marking and displaying the relevant events at corresponding positions on a map according to the spatial trajectory information of the relevant events; or determining a relevant area on the map according to the spatial trajectory information of the relevant events, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.
According to still other embodiments of the present disclosure, there is provided a text data search apparatus including: an extraction unit configured to extract at least one of a temporal feature or a spatial feature of the search text data as a spatio-temporal feature using a machine learning model; and a determining unit configured to determine the corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts, wherein each spatio-temporal label annotates at least one of time information or space information of its corpus text.
In some embodiments, the spatiotemporal tag is generated by: extracting at least one of time characteristics or space characteristics of each sentence in the text to be processed as space-time characteristics by using a machine learning model; and dividing the text to be processed into each corpus text according to the space-time characteristics, and generating a space-time label of each corpus text.
In some embodiments, the determining unit determines a first corpus text according to the degree of matching between the search features and the spatio-temporal labels of the corpus texts; determines a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determines the first corpus text and the second corpus text as the corpus texts matching the search text data.
In some embodiments, the event tag is generated by: extracting event characteristics of each corpus text by using a machine learning model according to context information of each corpus text in the text to be processed; and marking the same event label for the corpus texts with the same event characteristics.
In some embodiments, there are multiple corpus texts matching the search text data, and the determining unit determines the relevant events of the search text data according to the event labels of the multiple matched corpus texts; the apparatus further includes a generating unit configured to generate at least one of spatial trajectory information or time axis information of the relevant events according to the spatio-temporal labels of the multiple matched corpus texts.
In some embodiments, the apparatus further comprises a display unit for performing at least one of the following steps: marking and displaying the relevant events at corresponding positions on a map according to the spatial trajectory information of the relevant events; or determining a relevant area on the map according to the spatial trajectory information of the relevant events, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.
According to still further embodiments of the present disclosure, there is provided a text data search apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the search method of text data in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a search method of text data in any of the above embodiments.
In the above embodiments, taking the spatio-temporal features of the text data as the basis for searching allows the associations within the text data to be mined deeply, which improves the accuracy of the search results.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of a method of searching for textual data of the present disclosure;
FIG. 2 illustrates a flow diagram of some embodiments of step 120 in FIG. 1;
FIG. 3 illustrates a schematic diagram of some embodiments of a method of searching for textual data of the present disclosure;
FIG. 4 shows a schematic diagram of further embodiments of a method of searching for textual data of the present disclosure;
FIG. 5 illustrates a block diagram of some embodiments of a search apparatus of text data of the present disclosure;
FIG. 6 shows a block diagram of further embodiments of a search apparatus for text data of the present disclosure;
FIG. 7 shows a block diagram of further embodiments of a device for searching text data according to the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As mentioned above, the huge amount of text stored on the network contains a great deal of temporal and spatial information, so spatiotemporal correlations often exist between text contents. With a search method that does not extract, organize, correlate, retrieve, and analyze this spatio-temporal information, a user of a search engine often faces inaccurate search results or has to screen the results manually.
To solve this technical problem, the present disclosure, based on natural language processing technology, intelligently extracts, calculates, and infers the time and space information in text content, and cuts the text content into a plurality of spatiotemporal events based on the spatiotemporal scenes determined by that information. A spatiotemporal event may have attributes such as time, place, people, and event type.
Using spatiotemporal events as the smallest processing unit for retrieval and analysis can improve the accuracy of the search results. Combined with different application analysis models, the spatiotemporal knowledge and value in the text content can be mined further. For example, the technical solution of the present disclosure can be realized by the following embodiments.
Fig. 1 illustrates a flow diagram of some embodiments of a method of searching for textual data of the present disclosure.
As shown in fig. 1, the method includes: step 110, extracting space-time characteristics; and step 120, determining the matched corpus text.
In step 110, at least one of temporal features or spatial features of the search text data is extracted as spatio-temporal features using a machine learning model.
In step 120, the corpus text matching the search text data is determined according to the matching degree of the spatio-temporal features and the spatio-temporal labels of the corpus texts. The space-time label is used for labeling at least one item of time information or space information of the corpus text.
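As a minimal sketch of steps 110 and 120, the following Python code scores each corpus text by how well its spatio-temporal label agrees with the features of the search text. The disclosure does not define the matching degree, so the `SpaceTime` class, the `matching_degree` function, and the field-agreement scoring are assumptions made for illustration only. The machine learning model that would extract the features is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpaceTime:
    """Hypothetical container for a spatio-temporal feature or label."""
    time: Optional[str] = None   # e.g. "2020-08"
    place: Optional[str] = None  # e.g. "Beijing"


def matching_degree(query: SpaceTime, label: SpaceTime) -> float:
    """Fraction of the query's non-empty fields that the label agrees with (assumed metric)."""
    pairs = [(query.time, label.time), (query.place, label.place)]
    present = [(q, l) for q, l in pairs if q is not None]
    if not present:
        return 0.0
    return sum(q == l for q, l in present) / len(present)


def search(query: SpaceTime, corpus: dict, threshold: float = 1.0) -> list:
    """Return the ids of corpus texts whose label matches the query well enough."""
    return [cid for cid, label in corpus.items()
            if matching_degree(query, label) >= threshold]


corpus = {
    "c1": SpaceTime("2020-08", "Beijing"),
    "c2": SpaceTime("2020-08", "Shanghai"),
    "c3": SpaceTime("2019-01", "Beijing"),
}
print(search(SpaceTime(time="2020-08"), corpus))
```

A query that carries only a time feature is matched on time alone, which reflects the "at least one of" wording of the method.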
In some embodiments, a corpus may be established for storing a collection of labeled corpus texts. For example, each corpus text is treated as a spatiotemporal event with a spatio-temporal label.
In some embodiments, a machine learning model is used for extracting at least one of a temporal feature or a spatial feature of each sentence in the text to be processed as a spatio-temporal feature; and dividing the text to be processed into each corpus text according to the space-time characteristics, and generating a space-time label of each corpus text.
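One way to picture the division of a text into corpus texts is sketched below. It assumes that the per-sentence time and place features have already been extracted, and it starts a new corpus text whenever the time or place changes. The function name `split_into_corpus_texts` and the rule that a sentence without a cue inherits the previous scene are illustrative assumptions, not details given in the disclosure.

```python
def split_into_corpus_texts(sentences):
    """Group consecutive sentences that share one (time, place) scene.

    sentences: list of (text, time, place); time and place may be None
    when the (not shown) feature extractor found no cue in the sentence.
    """
    corpus_texts = []
    for text, time, place in sentences:
        prev = corpus_texts[-1]["label"] if corpus_texts else (None, None)
        # A missing feature is inherited from the preceding scene (assumption).
        label = (time or prev[0], place or prev[1])
        if corpus_texts and label == prev:
            corpus_texts[-1]["sentences"].append(text)
        else:
            corpus_texts.append({"label": label, "sentences": [text]})
    return corpus_texts


doc = [
    ("A met B in Beijing on Monday.", "Mon", "Beijing"),
    ("They talked for an hour.", None, None),
    ("On Tuesday A left.", "Tue", None),
    ("A arrived in Shanghai.", None, "Shanghai"),
]
for item in split_into_corpus_texts(doc):
    print(item["label"], item["sentences"])
```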
In some embodiments, word segmentation and part-of-speech determination may be performed on each sentence, and the spatio-temporal features of each sentence are then extracted from the processing result using a machine learning model. For example, a Labeled-LDA (Latent Dirichlet Allocation) model may be used to extract the spatio-temporal features and assign the spatio-temporal labels.
In some embodiments, each sentence may be segmented into words using an n-gram model. For example, the probability $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ of a single character $\omega_i$ occurring given the $n-1$ characters preceding it can be calculated by the following formula:
$$P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1}) = \frac{\mathrm{count}(\omega_{i-(n-1)}, \ldots, \omega_{i-1}, \omega_i)}{\mathrm{count}(\omega_{i-(n-1)}, \ldots, \omega_{i-1})}$$
Here $\mathrm{count}(\cdot)$ is the number of times a combination of characters occurs. That is, $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ is the ratio of the frequency of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ in the document to the frequency of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_{i-1})$ in the document.
From the $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ of each $\omega_i$, the probability $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ is calculated. For example, $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ may be calculated as the product of the conditional probabilities $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$. If $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ is larger than a threshold, the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ is segmented as one word.
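The count-ratio estimate and the threshold rule can be illustrated with a small bigram (n = 2) sketch. For simplicity it joins a character to the preceding word when the single conditional probability P(cur | prev) = count(prev, cur) / count(prev) exceeds the threshold. This is a simplification of the product-based score described above. The toy training string, the threshold value, and the function names are assumptions.

```python
from collections import Counter


def train_bigram(text):
    """Count single characters and adjacent character pairs in the training text."""
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    return unigrams, bigrams


def cond_prob(prev, cur, unigrams, bigrams):
    """P(cur | prev) = count(prev, cur) / count(prev)."""
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0


def segment(sentence, unigrams, bigrams, threshold):
    """Greedy segmentation: extend the current word while the score exceeds the threshold."""
    words = []
    for ch in sentence:
        if words and cond_prob(words[-1][-1], ch, unigrams, bigrams) > threshold:
            words[-1] += ch
        else:
            words.append(ch)
    return words


# Toy training text in which "A" is always followed by "B".
unigrams, bigrams = train_bigram("ABXABYABXCABYC")
print(cond_prob("A", "B", unigrams, bigrams))
print(segment("ABXAB", unigrams, bigrams, threshold=0.6))
```

Because the score depends only on counts, a character pair that never appears in a dictionary can still be merged into a word when it co-occurs often enough in the training text.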
In some embodiments, after the word segmentation processing is performed on each sentence, the part-of-speech tagging may be modeled as a sequence tagging problem, and the part-of-speech tagging may be performed using a machine learning model. For example, the machine learning model may be a hidden Markov model, a conditional random field model, or the like.
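Part-of-speech tagging as sequence labeling can be sketched with a hidden Markov model decoded by the Viterbi algorithm. The tag set and all probabilities below are hand-picked toy values chosen for illustration. In practice they would be estimated from labeled data, and nothing here is taken from the disclosure beyond the choice of model family.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for the words under an HMM."""
    # best[i][tag] = (probability of the best path ending in tag at position i, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], floor), None) for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            prob, prev = max((best[-1][p][0] * trans_p[p][t], p) for p in tags)
            row[t] = (prob * emit_p[t].get(w, floor), prev)
        best.append(row)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model (all values are assumptions).
tags = ["TIME", "VERB", "PLACE"]
start_p = {"TIME": 0.5, "VERB": 0.2, "PLACE": 0.3}
trans_p = {
    "TIME": {"TIME": 0.1, "VERB": 0.7, "PLACE": 0.2},
    "VERB": {"TIME": 0.2, "VERB": 0.1, "PLACE": 0.7},
    "PLACE": {"TIME": 0.3, "VERB": 0.5, "PLACE": 0.2},
}
emit_p = {
    "TIME": {"yesterday": 0.9},
    "VERB": {"arrived": 0.8},
    "PLACE": {"Beijing": 0.9},
}
print(viterbi(["yesterday", "arrived", "Beijing"], tags, start_p, trans_p, emit_p))
```

Words tagged as time or place expressions are natural candidates for the spatio-temporal features extracted in the following step.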
In this way, words that do not appear in the dictionary can still be segmented, and word segmentation accuracy can be improved by taking the context into account.
After word segmentation and part-of-speech tagging are carried out, the space-time characteristics can be further extracted. Therefore, the space-time correlation in the text data can be deeply mined to serve as the basis of the following search, and the search accuracy is improved.
In some embodiments, step 120 may be implemented by the embodiment in fig. 2.
Fig. 2 illustrates a flow diagram of some embodiments of step 120 in fig. 1.
As shown in fig. 2, step 120 includes: step 1210, determining a first corpus text; step 1220, determining a second corpus text; and step 1230, determining the matched corpus text.
In step 1210, a first corpus text is determined according to the matching degree of the search features and the spatio-temporal tags of the corpus texts.
In step 1220, a second corpus text belonging to the same type of event as the first corpus text is determined according to the event tag of the first corpus text.
In some embodiments, according to the context information of each corpus text in the text to be processed, extracting the event features of each corpus text by using a machine learning model; and marking the same event label for the corpus texts with the same event characteristics.
In some embodiments, spatiotemporal events belonging to the same event may be classified into the same class of spatiotemporal events and constructed as one set of events. Each spatiotemporal event in an event set has the same event label.
In this way, multi-spatiotemporal correlation analysis of the corpus texts can be realized, and different spatiotemporal events under the same event are associated together. For example, the spatiotemporal events belonging to the same event set can be sequenced and geographically classified according to their spatio-temporal labels, so that the course of one event can be deduced. This spatiotemporal association broadens the coverage of the search results and thereby further improves their accuracy.
In step 1230, the first corpus text and the second corpus text are determined as corpus texts matching the search text data.
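Steps 1210 to 1230 can be sketched as a two-stage lookup: match on the spatio-temporal label first, then widen the result to every corpus text that carries the same event label. Exact label equality stands in for the matching degree here, and the dictionary layout and function name are assumptions.

```python
def search_with_events(query_label, corpus):
    """corpus maps an id to {"label": (time, place), "event": event_label}."""
    # Step 1210: first corpus texts, matched on the spatio-temporal label.
    first = [cid for cid, c in corpus.items() if c["label"] == query_label]
    # Step 1220: second corpus texts, sharing an event label with a first corpus text.
    events = {corpus[cid]["event"] for cid in first}
    second = [cid for cid, c in corpus.items()
              if c["event"] in events and cid not in first]
    # Step 1230: both groups together form the matched corpus texts.
    return first + second


corpus = {
    "a": {"label": ("2020-08", "Beijing"), "event": "expo"},
    "b": {"label": ("2020-09", "Shanghai"), "event": "expo"},
    "c": {"label": ("2020-08", "Tianjin"), "event": "flood"},
}
print(search_with_events(("2020-08", "Beijing"), corpus))
```

Text "b" is returned even though its time and place differ from the query, because it belongs to the same event as the directly matched text "a".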
In some embodiments, the corpus text matching the search text data is plural. In this case, the relevant event of searching the text data may be determined according to the event tags of the plurality of matched corpus texts; and generating at least one item of spatial track information or time axis information of the related events according to the space-time labels of the plurality of matched corpus texts.
In some embodiments, the related events are displayed in a labeling mode at corresponding positions on a map according to the spatial track information of the related events.
In some embodiments, a relevant area is determined on a map according to spatial trajectory information of a relevant event, and time text information determined according to time axis information, or time axis graphic information is displayed on the relevant area.
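A minimal sketch of deriving time axis information and spatial trajectory information from the matched corpus texts is given below. It assumes each spatio-temporal label has already been normalized to a calendar date and a place name. The field names and the rule of collapsing consecutive repeated places are illustrative choices. Map rendering is not shown.

```python
from datetime import date


def build_timeline_and_track(matched):
    """Return (timeline, track) for corpus texts of one related event.

    matched: list of {"date": date, "place": str, "text": str}.
    """
    ordered = sorted(matched, key=lambda m: m["date"])
    # Time axis: dated entries in chronological order.
    timeline = [(m["date"].isoformat(), m["text"]) for m in ordered]
    # Spatial trajectory: places in visiting order, consecutive repeats collapsed.
    track = []
    for m in ordered:
        if not track or track[-1] != m["place"]:
            track.append(m["place"])
    return timeline, track


matched = [
    {"date": date(2020, 8, 3), "place": "Shanghai", "text": "Arrival"},
    {"date": date(2020, 8, 1), "place": "Beijing", "text": "Departure"},
    {"date": date(2020, 8, 2), "place": "Beijing", "text": "Delay"},
]
timeline, track = build_timeline_and_track(matched)
print(timeline)
print(track)
```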
In some embodiments, the server of the technical solution of the present disclosure may be configured by the embodiment in fig. 3.
Fig. 3 shows a schematic diagram of some embodiments of a method of searching for text data of the present disclosure.
As shown in fig. 3, the service end (platform) of the method may include an application presentation layer, a first service layer, a second service layer, and a base component layer.
In some embodiments, the application presentation layer may include a React + Redux framework, a Terria map framework, ECharts (a visualization tool), and the like.
In some embodiments, the first service layer may include a Shiro + jwt rights framework, a base services module, a data collection module, and the like. For example, an algorithm analysis pool, a spatiotemporal information extraction module, a news situation analysis model, a multi-spatiotemporal association analysis model and the like can be further included.
In some embodiments, the second service layer may include wndshift, car.
In some embodiments, the base components may include Citus, PostgreSQL, ZomboDB, ES (Elasticsearch), Redis (cache), Mapnik, and the like.
In some embodiments, because the data volume on the server is large and difficult for a single database to support, PostgreSQL can be used to build a cluster, and the data can be split across multiple databases and tables to relieve the read-write pressure on any single database or table.
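The idea of splitting data across databases and tables can be illustrated by a hash-based routing function. This is only a conceptual sketch with assumed shard counts and an assumed key format. A clustering layer such as Citus performs this kind of routing itself, and the code below does not reproduce how it does so.

```python
import hashlib


def route(key: str, n_databases: int = 4, n_tables: int = 8) -> tuple:
    """Map a record key to a (database index, table index) pair.

    A stable digest is used because Python's built-in hash() of a string
    changes between interpreter runs.
    """
    digest = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    return digest % n_databases, (digest // n_databases) % n_tables


# Hypothetical event keys; the same key always lands on the same shard.
for key in ("event:1001", "event:1002", "event:1003"):
    print(key, route(key))
```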
In some embodiments, the search of the method may include full-text retrieval. Full-text retrieval can be implemented using Citus, a database middleware dedicated to PostgreSQL, together with the ES service. For example, the ZomboDB plug-in may be used to access the ES service. In this way, ZomboDB lets the PostgreSQL database support ES full-text indexes internally, without having to synchronize data to the ES service separately.
In some embodiments, the data caching service is implemented based on Redis, Mapnik is used to render the spatial information into a map, and the message queue is implemented using Kafka.
In some embodiments, the first service layer responds to requests from the upper-layer application and the second service layer, obtains data from the platform database for business logic processing, and feeds data back to the application layer, thereby providing service support for all functions.
Fig. 4 shows a schematic diagram of further embodiments of the text data search method of the present disclosure.
As shown in fig. 4, the document library is used to store document contents as a search range. The document library may include document content and folders (e.g., sub-folder nesting may be supported).
For example, deleting a folder also deletes the document content and subfolders it contains. Renaming, copying, and moving may be supported. The document library may be set as public or private.
For example, for creating and managing document libraries, a user may by default create at most 10 document libraries, and this limit may be configured as needed. By default, the maximum document storage space a user may use is 200MB, which may also be configured as needed.
In some embodiments, document content can be created, browsed, and managed. As long as the user's storage space threshold has not been reached, document content may be added to a designated folder by uploading a file or providing a link.
In some embodiments, the document content in the document library may be a personal document uploaded by the user or a link text provided by the user; or the web document crawler can be used for crawling from an internet website and uploading. For example, the links may support web page text extraction.
In some embodiments, metadata of a document needs to be provided when the document is uploaded so that full-text parsing can be more accurate. Whether a document is public or private depends on whether the document library containing it is public.
In some embodiments, the corpus is used to store the annotated corpus text set for training. The labeled corpus text comprises a label and document content after word segmentation.
For example, a user may by default create at most 10 corpora, and this limit may be configured as needed. The corpus may not store files and may only extract and store the text content. By default, a user's corpus text supports at most 20,000 words, which may also be configured as needed.
For example, corpus text may support browsing and editing. After the corpus is updated, the machine learning model may be retrained.
In some embodiments, the corpus text in the corpus can be forwarded from the document library and then edited, or it may be uploaded directly to the corpus by the user. For example, spatio-temporal feature extraction and multi-spatiotemporal analysis can be performed on the documents in the document library, and the resulting corpus text is forwarded to the corpus for storage.
In some embodiments, each corpus may correspond to a Labeled-LDA model for labeling spatiotemporal labels. For example, after the Labeled-LDA model is updated, the task of updating the label can be performed. At this point, the spatiotemporal labels may be regenerated using the Labeled-LDA model. The corpus can be set as public or private.
In some embodiments, an event set may be a user-created logical grouping that collects spatiotemporal events of the same type together. The event set may be used for subsequent analysis and map visualization. For example, a user may create an event set for an activity in order to gather all spatiotemporal events of that activity together.
In some embodiments, the set of events may be set public or private. The set of public events does not distinguish whether the spatiotemporal events originate from private documents or public documents. Once the set of events is published, spatiotemporal events derived from within the private document will also be published, but the source document will not.
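The publication rule for event sets can be sketched as a small projection: every spatiotemporal event in a published set becomes visible, but a reference to the source document is exposed only when that document is itself public. The record layout and the function name `publish_event_set` are assumptions made for illustration.

```python
def publish_event_set(events, documents):
    """Build the public view of an event set.

    events: list of {"time", "place", "summary", "doc_id"}.
    documents: dict mapping doc_id to {"public": bool}.
    """
    published = []
    for e in events:
        view = {"time": e["time"], "place": e["place"], "summary": e["summary"]}
        # The source document stays hidden if it is private.
        view["source"] = e["doc_id"] if documents[e["doc_id"]]["public"] else None
        published.append(view)
    return published


documents = {"d1": {"public": True}, "d2": {"public": False}}
events = [
    {"time": "2020-08-01", "place": "Beijing", "summary": "Opening", "doc_id": "d1"},
    {"time": "2020-08-02", "place": "Tianjin", "summary": "Visit", "doc_id": "d2"},
]
for view in publish_event_set(events, documents):
    print(view)
```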
In some embodiments, all spatiotemporal events corresponding to documents in a document repository may be added in bulk to the set of events. The set of events may provide a data basis for map visualization and analysis. Each document library can establish an event set, and a user can modify the default event set.
In some embodiments, a user may query search text through a search engine, and the processing granularity of the search is the spatiotemporal event. For example, basic queries by keyword and advanced queries by time and by space may be supported at the same time.
In some embodiments, tag-based retrieval and specification of a retrieval scope (e.g., all public events and private data of itself, a specified set of events) may be supported.
In some embodiments, after word segmentation, training, and labeling, the spatiotemporal events can be stored in the corpus and in an event set. When a user searches by keyword, the event labels or the spatio-temporal labels of the event set can be used as indexes for display.
In some embodiments, time sorting may be performed according to time information in search text data corresponding to a keyword query or an event query; or carrying out geographical classification according to the spatial information in the search text data corresponding to the spatial query. And inquiring in the event set according to the sorting result and the classification result.
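The combination of geographical classification and time sorting can be sketched as grouping the events of an event set by place and ordering each group chronologically. The field names are assumptions, and ISO-formatted date strings are used so that plain string comparison gives chronological order.

```python
from collections import defaultdict


def classify_and_sort(events):
    """Group events by place, then sort each group by time."""
    by_place = defaultdict(list)
    for e in events:
        by_place[e["place"]].append(e)
    return {place: sorted(items, key=lambda e: e["time"])
            for place, items in by_place.items()}


event_set = [
    {"time": "2020-08-03", "place": "Beijing", "summary": "Closing"},
    {"time": "2020-08-02", "place": "Shanghai", "summary": "Forum"},
    {"time": "2020-08-01", "place": "Beijing", "summary": "Opening"},
]
for place, items in classify_and_sort(event_set).items():
    print(place, [e["summary"] for e in items])
```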
In some embodiments, the retrieval results may be returned to the user and displayed using a map service for map visualization analysis.
In some embodiments, the data source of the map visualization analysis is a specified set of events. The event set and charting scheme may be in a one-to-many relationship. For example, the same set of events may create different map visualization schemes (location tracks, timelines, time information, etc.). The charting scheme may be maintained, public or private depending on the set of events used for charting. The charting scheme can be retrieved according to the picture name, the author user name and the data set name.
Fig. 5 illustrates a block diagram of some embodiments of a search apparatus of text data of the present disclosure.
As shown in fig. 5, the search device 5 for text data includes an extraction unit 51 and a determination unit 52.
The extraction unit 51 extracts at least one of a temporal feature or a spatial feature of the search text data as a spatio-temporal feature using a machine learning model.
The determining unit 52 determines a corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts. The space-time label is used for labeling at least one item of time information or space information of the corpus text.
In some embodiments, the spatiotemporal tag is generated by: extracting at least one of time characteristics or space characteristics of each sentence in the text to be processed as space-time characteristics by using a machine learning model; and dividing the text to be processed into each corpus text according to the space-time characteristics, and generating a space-time label of each corpus text.
In some embodiments, the determining unit 52 determines a first corpus text according to the degree of matching between the search features and the spatio-temporal labels of the corpus texts; determines a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determines the first corpus text and the second corpus text as the corpus texts matching the search text data.
In some embodiments, the event tag is generated by: extracting event characteristics of each corpus text by using a machine learning model according to context information of each corpus text in the text to be processed; and marking the same event label for the corpus texts with the same event characteristics.
In some embodiments, the corpus text matching the search text data is plural. The determining unit 52 determines a related event of the search text data according to the event tags of the plurality of matched corpus texts.
In some embodiments, the search device 5 further includes a generating unit 51 for generating at least one of spatial trajectory information or time axis information of the related events according to the spatio-temporal labels of the plurality of matched corpus texts.
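One possible way to derive time axis information and spatial trajectory information from the spatio-temporal labels is sketched below. The representation of a label as a `(date, place)` pair, the collapsing of consecutive repeated places, the function names, and the sample entries are assumptions made for illustration.

```python
from datetime import date


def build_timeline(labelled):
    """labelled: list of (date, place, text) entries. Return them in chronological order."""
    return sorted(labelled, key=lambda entry: entry[0])


def build_trajectory(timeline):
    """Ordered list of places along the timeline, collapsing consecutive repeats."""
    path = []
    for _, place, _ in timeline:
        if place is not None and (not path or path[-1] != place):
            path.append(place)
    return path


# Invented sample data: matched corpus texts with their spatio-temporal labels.
matched_texts = [
    (date(2020, 7, 3), "Wuhan", "Water levels peak."),
    (date(2020, 6, 28), "Chongqing", "Heavy rain begins upstream."),
    (date(2020, 7, 5), "Wuhan", "Levels start to fall."),
]
timeline = build_timeline(matched_texts)
trajectory = build_trajectory(timeline)
```

Sorting by date yields the time axis, and reading the places off that ordering yields the trajectory, here Chongqing followed by Wuhan.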
In some embodiments, the search device 5 further comprises a display unit 52 for performing at least one of the following steps: marking and displaying the related events at corresponding positions on a map according to the spatial trajectory information of the related events; or determining a related area on the map according to the spatial trajectory information of the related events, and displaying, on the related area, time text information or time axis graphic information determined according to the time axis information.
Fig. 6 shows a block diagram of further embodiments of a device for searching text data according to the disclosure.
As shown in fig. 6, the text data search device 6 of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute a search method of text data in any one embodiment of the present disclosure based on instructions stored in the memory 61.
The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, a database, and other programs.
Fig. 7 shows a block diagram of further embodiments of a device for searching text data according to the present disclosure.
As shown in fig. 7, the text data search device 7 of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the text data searching method in any of the foregoing embodiments based on instructions stored in the memory 710.
The memory 710 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, and other programs.
The text data search device 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
So far, the search method of text data, the search apparatus of text data, and the nonvolatile computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (11)

1. A method of searching text data, comprising:
extracting, by using a machine learning model, at least one of a temporal feature or a spatial feature of search text data as a spatio-temporal feature; and
determining a corpus text matching the search text data according to a degree of matching between the spatio-temporal feature and a spatio-temporal label of each corpus text, wherein the spatio-temporal label labels at least one of time information or space information of the corpus text.
2. The search method of claim 1, wherein the spatio-temporal label is generated by:
extracting, by using a machine learning model, at least one of a temporal feature or a spatial feature of each sentence in a text to be processed as a spatio-temporal feature; and
dividing the text to be processed into the corpus texts according to the spatio-temporal features, and generating the spatio-temporal label of each corpus text.
3. The search method according to claim 1, wherein the determining a corpus text matching the search text data according to the degree of matching between the spatio-temporal feature and the spatio-temporal label of each corpus text comprises:
determining a first corpus text according to the degree of matching between the spatio-temporal feature and the spatio-temporal label of each corpus text;
determining a second corpus text belonging to the same type of event as the first corpus text according to an event label of the first corpus text; and
determining the first corpus text and the second corpus text as the corpus texts matching the search text data.
4. The search method of claim 3, wherein the event label is generated by:
extracting, by using a machine learning model, event features of each corpus text according to context information of that corpus text in the text to be processed; and
assigning the same event label to corpus texts having the same event features.
5. The search method according to any one of claims 1 to 4,
wherein a plurality of corpus texts match the search text data;
the method further comprising:
determining related events of the search text data according to event labels of the plurality of matched corpus texts; and
generating at least one of spatial trajectory information or time axis information of the related events according to the spatio-temporal labels of the plurality of matched corpus texts.
6. The search method of claim 5, further comprising at least one of the following steps:
marking and displaying the related events at corresponding positions on a map according to the spatial trajectory information of the related events; or
determining a related area on a map according to the spatial trajectory information of the related events, and displaying, on the related area, time text information or time axis graphic information determined according to the time axis information.
7. An apparatus for searching text data, comprising:
an extraction unit configured to extract, by using a machine learning model, at least one of a temporal feature or a spatial feature of search text data as a spatio-temporal feature; and
a determining unit configured to determine a corpus text matching the search text data according to a degree of matching between the spatio-temporal feature and a spatio-temporal label of each corpus text, wherein the spatio-temporal label labels at least one of time information or space information of the corpus text.
8. The search apparatus according to claim 7,
wherein the determining unit determines related events of the search text data according to event labels of a plurality of matched corpus texts;
the search apparatus further comprising:
a generating unit configured to generate at least one of spatial trajectory information or time axis information of the related events according to the spatio-temporal labels of the plurality of matched corpus texts.
9. The search apparatus of claim 8, further comprising a display unit for performing at least one of the following steps:
marking and displaying the related events at corresponding positions on a map according to the spatial trajectory information of the related events; or
determining a related area on a map according to the spatial trajectory information of the related events, and displaying, on the related area, time text information or time axis graphic information determined according to the time axis information.
10. An apparatus for searching text data, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of searching for text data of any of claims 1-6 based on instructions stored in the memory.
11. A non-transitory computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the text data search method according to any one of claims 1 to 6.
CN202010806630.3A 2020-08-12 2020-08-12 Text data searching method and device Pending CN113761227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010806630.3A CN113761227A (en) 2020-08-12 2020-08-12 Text data searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010806630.3A CN113761227A (en) 2020-08-12 2020-08-12 Text data searching method and device

Publications (1)

Publication Number Publication Date
CN113761227A true CN113761227A (en) 2021-12-07

Family

ID=78785654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010806630.3A Pending CN113761227A (en) 2020-08-12 2020-08-12 Text data searching method and device

Country Status (1)

Country Link
CN (1) CN113761227A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060256210A1 (en) * 2005-04-28 2006-11-16 Kathleen Ryall Spatio-temporal graphical user interface for querying videos
JP2010250496A (en) * 2009-04-14 2010-11-04 Nippon Telegr & Teleph Corp <Ntt> Space-time retrieval device, method, and program
CN102393900A (en) * 2011-07-02 2012-03-28 山东大学 Video copying detection method based on robust hash
CN103336957A (en) * 2013-07-18 2013-10-02 中国科学院自动化研究所 Network coderivative video detection method based on spatial-temporal characteristics
US20140188847A1 (en) * 2012-12-27 2014-07-03 Industrial Technology Research Institute Interactive object retrieval method and system
CN103927310A (en) * 2013-01-14 2014-07-16 百度在线网络技术(北京)有限公司 Map data searching suggestion generation method and device
CN104584010A (en) * 2012-09-19 2015-04-29 苹果公司 Voice-based media searching
KR20150111336A (en) * 2015-09-09 2015-10-05 삼성전자주식회사 Method and Apparatus for searching contents
KR20160112746A (en) * 2015-03-20 2016-09-28 오병석 A system and a method for searching prior art information and measuring similarity thereof
TW201804342A (en) * 2016-07-21 2018-02-01 國立成功大學 Search method of spatial-temporal based on multi-rule
CN110472158A (en) * 2018-05-11 2019-11-19 北京搜狗科技发展有限公司 A kind of sort method and device of search entry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination