CN113761227A

CN113761227A - Text data searching method and device

Info

Publication number: CN113761227A
Application number: CN202010806630.3A
Authority: CN
Inventors: 兰亚伟
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2021-12-07

Abstract

The disclosure relates to a text data search method and apparatus in the field of computer technology. The method comprises: extracting, using a machine learning model, at least one of a temporal feature or a spatial feature of search text data as a spatio-temporal feature; and determining the corpus text that matches the search text data according to the degree of matching between the spatio-temporal feature and the spatio-temporal label of each corpus text, wherein the spatio-temporal label annotates at least one of time information or space information of the corpus text.

Description

Text data searching method and device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a text data search method, a text data search apparatus, and a non-volatile computer-readable storage medium.

Background

With the development of computer and network technologies, a huge and constantly growing amount of text is stored on today's networks. How to accurately find the desired content in this huge amount of text is therefore an important problem.

In the related art, a search engine serving as an entry for a user to acquire information is mostly implemented based on keyword content matching.

Disclosure of Invention

The inventors of the present disclosure found the following problem in the above-described related art: keyword-based content matching cannot deeply mine the internal relationships within the information, so the accuracy of the search results is low.

In view of this, the present disclosure provides a technical solution for searching text data, which can improve the accuracy of the search result.

According to some embodiments of the present disclosure, there is provided a text data search method including: extracting, using a machine learning model, at least one of a temporal feature or a spatial feature of search text data as a spatio-temporal feature; and determining the corpus text that matches the search text data according to the degree of matching between the spatio-temporal feature and the spatio-temporal label of each corpus text, wherein the spatio-temporal label annotates at least one of time information or space information of the corpus text.

In some embodiments, the spatio-temporal label is generated by: extracting, using a machine learning model, at least one of a temporal feature or a spatial feature of each sentence in the text to be processed as a spatio-temporal feature; and dividing the text to be processed into the corpus texts according to the spatio-temporal features, and generating a spatio-temporal label for each corpus text.

In some embodiments, determining the corpus text that matches the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the respective corpus texts comprises: determining a first corpus text according to the degree of matching between the search features and the spatio-temporal labels of the corpus texts; determining a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determining the first corpus text and the second corpus text as the corpus texts that match the search text data.

In some embodiments, the event tag is generated by: extracting event characteristics of each corpus text by using a machine learning model according to context information of each corpus text in the text to be processed; and marking the same event label for the corpus texts with the same event characteristics.

In some embodiments, a plurality of corpus texts match the search text data, and the method further comprises: determining the related event of the search text data according to the event labels of the plurality of matched corpus texts; and generating at least one of spatial trajectory information or time axis information of the related event according to the spatio-temporal labels of the plurality of matched corpus texts.

In some embodiments, the method further comprises at least one of the following steps: marking and displaying the related event at the corresponding positions on a map according to the spatial trajectory information of the related event; or determining a relevant area on the map according to the spatial trajectory information of the related event, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.

According to still other embodiments of the present disclosure, there is provided a text data search apparatus including: an extraction unit configured to extract, using a machine learning model, at least one of a temporal feature or a spatial feature of search text data as a spatio-temporal feature; and a determining unit configured to determine the corpus text that matches the search text data according to the degree of matching between the spatio-temporal feature and the spatio-temporal label of each corpus text, wherein the spatio-temporal label annotates at least one of time information or space information of the corpus text.

In some embodiments, the determining unit determines a first corpus text according to the degree of matching between the search features and the spatio-temporal label of each corpus text; determines a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determines the first corpus text and the second corpus text as the corpus texts that match the search text data.

In some embodiments, a plurality of corpus texts match the search text data, and the determining unit determines the related event of the search text data according to the event labels of the plurality of matched corpus texts; the apparatus further includes a generating unit configured to generate at least one of spatial trajectory information or time axis information of the related event according to the spatio-temporal labels of the plurality of matched corpus texts.

In some embodiments, the apparatus further comprises a display unit for performing at least one of the following steps: marking and displaying the related event at the corresponding positions on a map according to the spatial trajectory information of the related event; or determining a relevant area on the map according to the spatial trajectory information of the related event, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.

According to still further embodiments of the present disclosure, there is provided a text data search apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the text data search method in any of the above embodiments based on instructions stored in the memory.

According to still further embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a search method of text data in any of the above embodiments.

In the above embodiments, taking the spatio-temporal features of the text data as the basis for searching makes it possible to deeply mine the associations within the text data, thereby improving the accuracy of the search results.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:

FIG. 1 illustrates a flow diagram of some embodiments of a method of searching for textual data of the present disclosure;

FIG. 2 illustrates a flow diagram of some embodiments of step 120 in FIG. 1;

FIG. 3 illustrates a schematic diagram of some embodiments of a method of searching for textual data of the present disclosure;

FIG. 4 shows a schematic diagram of further embodiments of a method of searching for textual data of the present disclosure;

FIG. 5 illustrates a block diagram of some embodiments of a search apparatus of text data of the present disclosure;

FIG. 6 shows a block diagram of further embodiments of a search apparatus for text data of the present disclosure;

FIG. 7 shows a block diagram of further embodiments of a search apparatus for text data of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

As mentioned above, the huge amount of text stored on the network contains a great deal of temporal and spatial information, so spatio-temporal correlations often exist between text contents. With a search method that does not extract, organize, correlate, retrieve, and analyze this spatio-temporal information, users of a search engine often encounter inaccurate search results or have to screen the results manually.

To solve this technical problem, the present disclosure uses natural language processing technology to intelligently extract, calculate, and infer the time and space information in text content, and cuts the text content into a plurality of spatiotemporal events based on the spatiotemporal scenes determined by that information. A spatiotemporal event may have attributes such as time, place, people, and event type.

Using the spatiotemporal event as the smallest processing unit for retrieval and analysis can improve the accuracy of the search results. The spatiotemporal knowledge and value in the text content can be further mined by combining different application analysis models. For example, the technical solution of the present disclosure can be realized by the following embodiments.

Fig. 1 illustrates a flow diagram of some embodiments of a method of searching for textual data of the present disclosure.

As shown in fig. 1, the method includes: step 110, extracting space-time characteristics; and step 120, determining the matched corpus text.

In step 110, at least one of temporal features or spatial features of the search text data is extracted as spatio-temporal features using a machine learning model.

In step 120, the corpus text matching the search text data is determined according to the matching degree of the spatio-temporal features and the spatio-temporal labels of the corpus texts. The space-time label is used for labeling at least one item of time information or space information of the corpus text.

In some embodiments, a corpus may be established for storing a collection of labeled corpus texts. For example, each corpus text is treated as a spatiotemporal event with a spatio-temporal label.
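
As a minimal sketch of steps 110 and 120, the following Python example uses a regular expression and a small gazetteer as a stand-in for the machine learning model, whose interface the disclosure does not specify. Each corpus text is scored by the fraction of the query's time and place features that appear in its spatio-temporal label. The names `extract_features`, `match_degree`, `search`, `PLACES`, and the scoring rule itself are illustrative assumptions, not part of the disclosure.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer standing in for a learned spatial extractor.
PLACES = {"Beijing", "Shanghai", "Wuhan"}
# Four-digit years standing in for a learned temporal extractor.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


@dataclass
class SpatioTemporal:
    times: set = field(default_factory=set)
    places: set = field(default_factory=set)


def extract_features(text: str) -> SpatioTemporal:
    """Stand-in for step 110: pull years and known place names from text."""
    times = set(YEAR_RE.findall(text))
    places = {p for p in PLACES if p in text}
    return SpatioTemporal(times, places)


def match_degree(query: SpatioTemporal, label: SpatioTemporal) -> float:
    """Fraction of the query's features that also appear in the label."""
    total = len(query.times) + len(query.places)
    if total == 0:
        return 0.0
    hits = len(query.times & label.times) + len(query.places & label.places)
    return hits / total


def search(query_text: str, corpus: dict, threshold: float = 0.5) -> list:
    """Stand-in for step 120: ids of corpus texts whose label matches."""
    q = extract_features(query_text)
    scored = [(cid, match_degree(q, label)) for cid, label in corpus.items()]
    scored.sort(key=lambda pair: -pair[1])
    return [cid for cid, score in scored if score >= threshold]


# Each corpus text id maps to its (assumed, pre-generated) spatio-temporal label.
corpus = {
    "c1": SpatioTemporal({"2020"}, {"Beijing"}),
    "c2": SpatioTemporal({"2019"}, {"Shanghai"}),
}
results = search("floods in Beijing in 2020", corpus)
```

Here only `c1` shares both the year and the place with the query, so it is the only id returned at the assumed threshold of 0.5.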

In some embodiments, a machine learning model is used for extracting at least one of a temporal feature or a spatial feature of each sentence in the text to be processed as a spatio-temporal feature; and dividing the text to be processed into each corpus text according to the space-time characteristics, and generating a space-time label of each corpus text.

In some embodiments, word segmentation and part-of-speech determination may be performed on each sentence, and the spatio-temporal features of each sentence are then extracted from the processing result using a machine learning model. For example, a Labeled-LDA (Latent Dirichlet Allocation) model may be used to extract the spatio-temporal features and assign the spatio-temporal labels.

In some embodiments, each sentence may be segmented into words using an n-gram model. For example, the probability $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ that a single character $\omega_i$ in the sentence occurs given its preceding $n-1$ characters can be calculated by the following formula:

$$P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1}) = \frac{\mathrm{count}(\omega_{i-(n-1)}, \ldots, \omega_i)}{\mathrm{count}(\omega_{i-(n-1)}, \ldots, \omega_{i-1})}$$

Here count() counts the number of times a combination of characters occurs. That is, $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ is the ratio of the frequency of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ in the document to the frequency of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_{i-1})$ in the document.

From the $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$ of each $\omega_i$, the probability distribution $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ of the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ is calculated. For example, $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ may be calculated as the product of the $P(\omega_i \mid \omega_{i-(n-1)}, \ldots, \omega_{i-1})$. If $P(\omega_{i-(n-1)}, \ldots, \omega_i)$ is greater than a threshold, the character combination $(\omega_{i-(n-1)}, \ldots, \omega_i)$ is segmented as one word.
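
The count-ratio estimate described above can be illustrated with a short Python sketch for the bigram case. It is a simplification: it thresholds the conditional probability of the second character directly rather than a product over a longer combination, and the sample document, the threshold value, and the function names `ngram_counts`, `conditional_prob`, and `is_word` are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def conditional_prob(text: str, gram: str) -> float:
    """P(last char | preceding chars) = count(gram) / count(gram[:-1])."""
    n = len(gram)
    numerator = ngram_counts(text, n)[gram]
    denominator = ngram_counts(text, n - 1)[gram[:-1]]
    return numerator / denominator if denominator else 0.0


def is_word(text: str, gram: str, threshold: float) -> bool:
    """Treat gram as one word when its conditional probability exceeds threshold."""
    return conditional_prob(text, gram) > threshold


# Toy document: the character 北 occurs three times, twice followed by 京.
document = "北京天气北京时间北方"
p_jing = conditional_prob(document, "北京")   # count("北京") / count("北") = 2/3
p_fang = conditional_prob(document, "北方")   # count("北方") / count("北") = 1/3
```

With an assumed threshold of 0.5, `is_word(document, "北京", 0.5)` is true while `is_word(document, "北方", 0.5)` is false, so only the first pair would be merged into a word in this toy document.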

In some embodiments, after the word segmentation processing is performed on each sentence, the part-of-speech tagging may be modeled as a sequence tagging problem, and the part-of-speech tagging may be performed using a machine learning model. For example, the machine learning model may be a hidden Markov model, a conditional random field model, or the like.

In this way, words that do not appear in the dictionary can still be segmented, and the context can be used to improve word segmentation accuracy.
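
Treating part-of-speech tagging as sequence labeling with a hidden Markov model can be sketched with the Viterbi algorithm. The example below is a toy: the two-tag tag set, the three-word vocabulary, and all probabilities are hand-picked assumptions, whereas a real model would estimate them from labeled data and would need smoothing for unseen words.

```python
import math

STATES = ["NOUN", "VERB"]
# Toy, hand-picked probabilities (assumptions for illustration only).
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT = {
    "NOUN": {"storm": 0.5, "city": 0.4, "hits": 0.1},
    "VERB": {"storm": 0.1, "city": 0.1, "hits": 0.8},
}


def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # Each column maps a state to (log-prob of best path ending there, previous state).
    best = [{
        s: (math.log(START[s]) + math.log(EMIT[s][words[0]]), None)
        for s in STATES
    }]
    for word in words[1:]:
        column = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(EMIT[s][word]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    tags = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        tags.append(state)
    return tags[::-1]


tags = viterbi(["storm", "hits", "city"])
```

Under these assumed probabilities the decoded sequence is NOUN, VERB, NOUN. A conditional random field, the other model the text mentions, would replace the generative probabilities with learned feature weights but can be decoded in the same way.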

After word segmentation and part-of-speech tagging are carried out, the space-time characteristics can be further extracted. Therefore, the space-time correlation in the text data can be deeply mined to serve as the basis of the following search, and the search accuracy is improved.

In some embodiments, step 120 may be implemented by the embodiment in fig. 2.

Fig. 2 illustrates a flow diagram of some embodiments of step 120 in fig. 1.

As shown in fig. 2, step 120 includes: step 1210, determining a first corpus text; step 1220, determining a second corpus text; and step 1230, determining the matched corpus text.

In step 1210, a first corpus text is determined according to the matching degree of the search features and the spatio-temporal tags of the corpus texts.

In step 1220, a second corpus text belonging to the same type of event as the first corpus text is determined according to the event tag of the first corpus text.

In some embodiments, the event features of each corpus text are extracted using a machine learning model according to the context information of each corpus text in the text to be processed, and corpus texts having the same event features are marked with the same event label.

In some embodiments, spatiotemporal events belonging to the same event may be classified into the same class of spatiotemporal events and constructed as one set of events. Each spatiotemporal event in an event set has the same event label.

In this way, multi-spatiotemporal correlation analysis of the corpus texts can be realized, and different spatiotemporal events under the same event are associated with one another. For example, the spatiotemporal events belonging to the same event set can be ordered in sequence and classified geographically according to their spatio-temporal labels, so that the course of an event can be reconstructed. Through this spatio-temporal correlation, the coverage of the search results can be widened, further improving the accuracy of the search results.

In step 1230, the first corpus text and the second corpus text are determined as corpus texts matching the search text data.
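
Steps 1210 to 1230 can be sketched as follows. The sketch assumes that a matching degree against the query has already been computed for each corpus text and that "belonging to the same type of event" means sharing an identical event label. The class `CorpusText`, the function `expand_by_event`, and the threshold are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusText:
    text_id: str
    event_label: str     # shared by corpus texts that belong to the same event
    match_degree: float  # assumed to be precomputed against the query


def expand_by_event(corpus, threshold=0.5):
    """Return first corpus texts followed by second corpus texts."""
    # Step 1210: first corpus texts are those whose label matches the query.
    first = [c for c in corpus if c.match_degree >= threshold]
    events = {c.event_label for c in first}
    # Step 1220: second corpus texts share an event label with a first text.
    second = [c for c in corpus if c.event_label in events and c not in first]
    # Step 1230: both groups together form the search result.
    return first + second


corpus = [
    CorpusText("a", "flood-2020", 0.9),
    CorpusText("b", "flood-2020", 0.1),
    CorpusText("c", "expo-2019", 0.2),
]
matched_ids = [c.text_id for c in expand_by_event(corpus)]
```

Text `b` matches the query poorly on its own but is returned because it carries the same event label as `a`, which is how the event label widens the coverage of the result.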

In some embodiments, a plurality of corpus texts match the search text data. In this case, the related event of the search text data may be determined according to the event labels of the plurality of matched corpus texts, and at least one of spatial trajectory information or time axis information of the related event may be generated according to the spatio-temporal labels of the plurality of matched corpus texts.

In some embodiments, the related events are displayed in a labeling mode at corresponding positions on a map according to the spatial track information of the related events.

In some embodiments, a relevant area is determined on a map according to spatial trajectory information of a relevant event, and time text information determined according to time axis information, or time axis graphic information is displayed on the relevant area.
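
One way to derive time axis information, spatial trajectory information, and a relevant map area from the spatio-temporal labels of matched corpus texts is sketched below. The `Event` record, the approximate coordinates, and the choice of a bounding box as the relevant area are assumptions; the disclosure does not prescribe a data model or a map API.

```python
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    when: date
    place: str
    lon: float
    lat: float


def build_timeline(events):
    """Time axis information: events in chronological order."""
    return sorted(events, key=lambda e: e.when)


def build_trajectory(events):
    """Spatial track: (lon, lat) points in time order, without consecutive repeats."""
    track = []
    for e in build_timeline(events):
        point = (e.lon, e.lat)
        if not track or track[-1] != point:
            track.append(point)
    return track


def bounding_box(track):
    """Relevant map area as (min_lon, min_lat, max_lon, max_lat)."""
    lons, lats = zip(*track)
    return (min(lons), min(lats), max(lons), max(lats))


# Illustrative events with approximate coordinates.
events = [
    Event(date(2020, 7, 12), "Wuhan", 114.3, 30.6),
    Event(date(2020, 7, 1), "Shanghai", 121.5, 31.2),
    Event(date(2020, 7, 20), "Wuhan", 114.3, 30.6),
]
timeline = build_timeline(events)
track = build_trajectory(events)
area = bounding_box(track)
```

A map front end could then draw markers along `track` and overlay the dates from `timeline` on the region given by `area`.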

In some embodiments, the server of the technical solution of the present disclosure may be configured by the embodiment in fig. 3.

Fig. 3 shows a schematic diagram of some embodiments of a method of searching for text data of the present disclosure.

As shown in fig. 3, the service end (platform) of the method may include an application presentation layer, a first service layer, a second service layer, and a base component layer.

In some embodiments, the application presentation layer may include a React + Redux framework, the Terria map framework, ECharts (a visualization tool), and the like.

In some embodiments, the first service layer may include a Shiro + jwt rights framework, a base services module, a data collection module, and the like. For example, an algorithm analysis pool, a spatiotemporal information extraction module, a news situation analysis model, a multi-spatiotemporal association analysis model and the like can be further included.

In some embodiments, the second service layer may include wndshift, car.

In some embodiments, the base component layer may include Citus, PostgreSQL, ZomboDB, ES (Elasticsearch), Redis (cache), Mapnik, and the like.

In some embodiments, because the data volume of the server is large and difficult for a single database to support, a cluster can be built with PostgreSQL, and splitting the data across multiple databases and multiple tables can relieve the read-write pressure on any single database and single table.
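
The idea of spreading rows across several databases and tables can be illustrated with a simple hash-routing sketch. This is only a conceptual illustration: the shard counts and the `route` function are hypothetical, and in a Citus-based cluster the middleware performs the distribution itself rather than application code.

```python
import hashlib

NUM_DATABASES = 4   # hypothetical number of PostgreSQL databases
TABLES_PER_DB = 8   # hypothetical number of tables per database


def route(document_id: str):
    """Map a document id to a (database index, table index) pair."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % (NUM_DATABASES * TABLES_PER_DB)
    return divmod(slot, TABLES_PER_DB)


db_index, table_index = route("doc-42")
```

Because the hash is deterministic, reads and writes for the same document id always reach the same database and table, while different ids spread across all of them.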

In some embodiments, the search of the method may include full-text retrieval. Full-text retrieval can be implemented using Citus, a database middleware dedicated to PostgreSQL, together with the ES service. For example, the ZomboDB plug-in may be used to access the ES service. In this way, ZomboDB enables the PostgreSQL database to support ES full-text indexes internally, without having to synchronize data to the ES service separately.

In some embodiments, the data caching service is implemented based on Redis, spatial information is rendered into a map using Mapnik, and the message queue is implemented using Kafka.

In some embodiments, the first service layer responds to requests from the upper-layer application and the second service layer, obtains data from the platform database, performs business logic processing, and feeds data back to the application layer, thereby providing service support for all functions.

Fig. 4 shows a schematic diagram of further embodiments of the text data search method of the present disclosure.

As shown in fig. 4, the document library is used to store document contents as a search range. The document library may include document content and folders (e.g., sub-folder nesting may be supported).

For example, deleting a folder deletes both the document content and the subfolders it contains. Renaming, copying, and moving may be supported. The document library may be set to public or private.

For example, regarding the creation and management of document libraries, a user may by default create at most 10 document libraries, and this limit can be configured as needed. By default, the maximum document storage space available to a user may be 200 MB, which can also be configured as needed.

In some embodiments, regarding the creation, browsing, and management of document content, as long as the user's storage space threshold has not been reached, document content may be added to a designated folder by uploading a file or providing a link.
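
The default limits described above (10 document libraries and 200 MB per user, both configurable) can be sketched as a small quota check. The class `UserQuota` and its method names are hypothetical, and treating an upload that exactly fills the quota as allowed is an assumption.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class UserQuota:
    max_libraries: int = 10             # default from the text, configurable
    max_storage_bytes: int = 200 * MB   # default from the text, configurable
    libraries: int = 0                  # document libraries created so far
    used_bytes: int = 0                 # storage consumed so far

    def can_create_library(self) -> bool:
        return self.libraries < self.max_libraries

    def can_upload(self, size_bytes: int) -> bool:
        # Assumption: an upload that exactly fills the quota is still allowed.
        return self.used_bytes + size_bytes <= self.max_storage_bytes


quota = UserQuota(libraries=10, used_bytes=150 * MB)
```

With this state, `quota.can_create_library()` is false because the default library limit is reached, and `quota.can_upload(51 * MB)` is false because the upload would exceed the 200 MB default.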

In some embodiments, the document content in the document library may be a personal document uploaded by the user or the text behind a link provided by the user; alternatively, it may be crawled from Internet websites by a web document crawler and then uploaded. For example, web page text extraction may be supported for links.

In some embodiments, the metadata of a document should be provided when the document is uploaded, so that full-text parsing can be more accurate. Whether a document is public or private depends on the public or private setting of its document library.

In some embodiments, the corpus is used to store the annotated corpus text set for training. The labeled corpus text comprises a label and document content after word segmentation.

For example, by default a user may create at most 10 corpora, and this limit can be configured as needed. The corpus may store no files and keep only the extracted text content. By default, a user's corpus text may contain at most 20,000 words, which can also be configured as needed.

For example, corpus text may support browsing and editing. After the corpus is updated, the machine learning model may be retrained.

In some embodiments, the corpus text in the corpus can be forwarded from the document library and then edited, or it can be uploaded directly to the corpus by the user. For example, spatio-temporal feature extraction and multi-spatiotemporal analysis can be performed on the documents in the document library, and the resulting corpus text is forwarded to the corpus for storage.

In some embodiments, each corpus may correspond to a Labeled-LDA model for labeling spatiotemporal labels. For example, after the Labeled-LDA model is updated, the task of updating the label can be performed. At this point, the spatiotemporal labels may be regenerated using the Labeled-LDA model. The corpus can be set as public or private.

In some embodiments, the event set may be a user-created logical grouping that gathers spatiotemporal events of the same type together. The event set may be used for subsequent analysis and map visualization. For example, a user may create an event set for a particular activity to gather all spatiotemporal events of that activity together.

In some embodiments, the set of events may be set public or private. The set of public events does not distinguish whether the spatiotemporal events originate from private documents or public documents. Once the set of events is published, spatiotemporal events derived from within the private document will also be published, but the source document will not.
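
The visibility rule for public event sets can be sketched as follows: once the set is public, every event in it is visible, but a reference to a private source document is withheld. The record layout and the function `public_view` are hypothetical illustrations of that rule.

```python
from dataclasses import dataclass


@dataclass
class SpatioTemporalEvent:
    event_id: str
    summary: str
    source_doc: str
    source_is_private: bool


def public_view(event_set_is_public: bool, events: list) -> list:
    """What other users can see of an event set."""
    if not event_set_is_public:
        return []
    view = []
    for e in events:
        item = {"event_id": e.event_id, "summary": e.summary}
        # The event itself is published, but a private source document is not.
        if not e.source_is_private:
            item["source_doc"] = e.source_doc
        view.append(item)
    return view


events = [
    SpatioTemporalEvent("e1", "rally in Beijing", "doc-private", True),
    SpatioTemporalEvent("e2", "expo opens", "doc-public", False),
]
view = public_view(True, events)
```

In `view`, event `e1` appears without a `source_doc` entry, while `e2` keeps its reference to the public document.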

In some embodiments, all spatiotemporal events corresponding to documents in a document repository may be added in bulk to the set of events. The set of events may provide a data basis for map visualization and analysis. Each document library can establish an event set, and a user can modify the default event set.

In some embodiments, a user may perform a search query on search text through a search engine, the processing granularity of the search being spatiotemporal events. For example, basic queries for keyword queries and advanced queries based on temporal queries and spatial queries may be supported simultaneously.

In some embodiments, tag-based retrieval and specification of a retrieval scope (e.g., all public events and private data of itself, a specified set of events) may be supported.

In some embodiments, after the spatiotemporal events have been obtained through word segmentation, training, and labeling, they can be stored in the corpus and in the event set. When a user searches by keyword, the event labels or spatio-temporal labels of the event set can be used as indexes for display.

In some embodiments, time sorting may be performed according to the time information in the search text data corresponding to a keyword query or an event query, or geographical classification may be performed according to the spatial information in the search text data corresponding to a spatial query. The event set is then queried according to the sorting result and the classification result.
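
A combined keyword, time, and space query over an event set, followed by time sorting and geographical classification, might look like the sketch below. The dictionary-based event records, the exact-match treatment of the region, and the function names `query_events` and `group_by_place` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def query_events(events, keyword=None, start=None, end=None, region=None):
    """Filter events by optional keyword, date range, and place, sorted by time."""
    hits = []
    for e in events:
        if keyword and keyword not in e["text"]:
            continue
        if start and e["when"] < start:
            continue
        if end and e["when"] > end:
            continue
        if region and e["place"] != region:
            continue
        hits.append(e)
    return sorted(hits, key=lambda e: e["when"])


def group_by_place(events):
    """Geographical classification: event ids grouped by place name."""
    groups = defaultdict(list)
    for e in events:
        groups[e["place"]].append(e["id"])
    return dict(groups)


events = [
    {"id": "e1", "text": "flood warning issued", "when": date(2020, 7, 12), "place": "Wuhan"},
    {"id": "e2", "text": "flood peak passes", "when": date(2020, 7, 2), "place": "Wuhan"},
    {"id": "e3", "text": "expo opens", "when": date(2020, 7, 5), "place": "Shanghai"},
]
hits = query_events(events, keyword="flood", start=date(2020, 7, 1), end=date(2020, 7, 31))
by_place = group_by_place(query_events(events))
```

The keyword and time filters correspond to the basic and temporal queries, the `region` argument to the spatial query, and `by_place` to the geographical classification of the result.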

In some embodiments, the search results may be returned to the user. And displaying by using a map service according to the retrieval result so as to perform map visualization analysis.

In some embodiments, the data source of the map visualization analysis is a specified event set. An event set and its charting schemes may be in a one-to-many relationship. For example, different map visualization schemes (location tracks, timelines, time information, etc.) may be created from the same event set. A charting scheme may be maintained, and whether it is public or private depends on the event set used for charting. Charting schemes can be retrieved by picture name, author user name, and data set name.

Fig. 5 illustrates a block diagram of some embodiments of a search apparatus of text data of the present disclosure.

As shown in fig. 5, the search device 5 for text data includes an extraction unit 51 and a determination unit 52.

The extraction unit 51 extracts at least one of a temporal feature or a spatial feature of the search text data as a spatio-temporal feature using a machine learning model.

The determining unit 52 determines a corpus text matching the search text data according to the degree of matching between the spatio-temporal features and the spatio-temporal labels of the corpus texts. The space-time label is used for labeling at least one item of time information or space information of the corpus text.

In some embodiments, the determining unit 52 determines a first corpus text according to the degree of matching between the search features and the spatio-temporal label of each corpus text; determines a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text; and determines the first corpus text and the second corpus text as the corpus texts that match the search text data.

In some embodiments, the corpus text matching the search text data is plural. The determining unit 52 determines a related event of the search text data according to the event tags of the plurality of matched corpus texts.

In some embodiments, the search apparatus 5 further includes a generating unit for generating at least one of spatial trajectory information or time axis information of the related event according to the spatio-temporal labels of the plurality of matched corpus texts.

In some embodiments, the search apparatus 5 further comprises a display unit for performing at least one of the following steps: marking and displaying the related event at the corresponding positions on a map according to the spatial trajectory information of the related event; or determining a relevant area on the map according to the spatial trajectory information of the related event, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.

Fig. 6 shows a block diagram of further embodiments of a device for searching text data according to the disclosure.

As shown in fig. 6, the text data search device 6 of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61, the processor 62 being configured to execute a search method of text data in any one embodiment of the present disclosure based on instructions stored in the memory 61.

The memory 61 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, a database, and other programs.

As shown in fig. 7, the text data search device 7 of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710, the processor 720 being configured to execute the text data searching method in any of the foregoing embodiments based on instructions stored in the memory 710.

The memory 710 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader, and other programs.

The text data search apparatus 7 may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, may be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker. The network interface 740 provides a connection interface for various networking devices. The storage interface 750 provides a connection interface for external storage devices such as an SD card and a USB flash drive.

As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.

So far, the search method of text data, the search apparatus of text data, and the nonvolatile computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A method of searching text data, comprising:

extracting at least one of time characteristics or space characteristics of search text data as space-time characteristics by using a machine learning model;

and determining the corpus text matched with the search text data according to the matching degree of the space-time characteristics and the space-time label of each corpus text, wherein the space-time label is used for labeling at least one item of time information or space information of the corpus text.

2. The search method of claim 1, wherein the space-time label is generated by:

extracting at least one of time characteristics or space characteristics of each sentence in the text to be processed as space-time characteristics by using a machine learning model;

and dividing the text to be processed into the corpus texts according to the space-time characteristics, and generating space-time labels of the corpus texts.

3. The search method according to claim 1, wherein the determining the corpus text matched with the search text data according to the matching degree of the space-time characteristics and the space-time label of each corpus text comprises:

determining a first corpus text according to the matching degree of the space-time characteristics and the space-time label of each corpus text;

determining a second corpus text belonging to the same type of event as the first corpus text according to the event label of the first corpus text;

and determining the first corpus text and the second corpus text as corpus texts matched with the search text data.

4. The search method of claim 3, wherein the event label is generated by:

extracting the event characteristics of each corpus text by using a machine learning model according to the context information of each corpus text in the text to be processed;

and marking the same event label for the corpus texts with the same event characteristics.

5. The search method according to any one of claims 1 to 4, wherein

a plurality of corpus texts are matched with the search text data;

the method further comprising:

determining relevant events of the search text data according to event labels of the plurality of matched corpus texts;

and generating at least one item of spatial track information or time axis information of the relevant events according to the space-time labels of the plurality of matched corpus texts.

6. The search method of claim 5, further comprising at least one of the following steps:

marking and displaying the relevant events at corresponding positions on a map according to the spatial track information of the relevant events; or

determining a relevant area on a map according to the spatial track information of the relevant events, and displaying, on the relevant area, time text information or time axis graphic information determined according to the time axis information.

7. An apparatus for searching text data, comprising:

an extraction unit configured to extract at least one of time characteristics or space characteristics of search text data as space-time characteristics by using a machine learning model;

and a determining unit configured to determine the corpus text matched with the search text data according to the matching degree of the space-time characteristics and the space-time label of each corpus text, wherein the space-time label is used for labeling at least one item of time information or space information of the corpus text.

8. The search apparatus according to claim 7, wherein

the determining unit is further configured to determine relevant events of the search text data according to event labels of a plurality of matched corpus texts;

the apparatus further comprising:

a generating unit configured to generate at least one item of spatial track information or time axis information of the relevant events according to the space-time labels of the plurality of matched corpus texts.

9. The search apparatus of claim 8, further comprising a display unit for performing at least one of the following steps:

10. An apparatus for searching text data, comprising:

a memory; and

a processor coupled to the memory, the processor configured to perform the method of searching for text data of any of claims 1-6 based on instructions stored in the memory.

11. A non-transitory computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the text data search method according to any one of claims 1 to 6.