JPWO2020263675A5


Info

Publication number: JPWO2020263675A5
Application number: JP2021539860A
Authority: JP
Publication date: 2023-06-02

Claims

1. A method of generating a regular expression, comprising:
receiving, by a regular expression generator comprising one or more processors, a first selection including one or more positive character sequences, each of the one or more positive character sequences corresponding to a positive example to be matched by a regular expression generated by the regular expression generator;
generating, by the regular expression generator, a first regular expression, the first regular expression matching the positive examples;
receiving, by the regular expression generator, a second selection including one or more negative character sequences, each of the one or more negative character sequences corresponding to a negative example that should not be matched by the regular expression generated by the regular expression generator;
determining, in response to receiving the second selection, a context of the one or more negative character sequences corresponding to the negative examples; and
updating the first regular expression based on the determined context of the one or more negative character sequences.
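
The flow recited above (generate a first expression from positive examples, then update it from the context of a negative example) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the publication: the function names `generalize`, `generate_regex`, and `update_regex` are hypothetical, the generalization by character class is one simple choice among many, examples are assumed to be whole alphanumeric tokens (hence the `\b` anchors), and the negative context is encoded as fixed-width lookarounds.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a coarse regex fragment (assumed generalization)."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def generalize(example: str) -> str:
    """Collapse runs of the same character class into `class+`."""
    parts = []
    for cls, run in groupby(example, key=char_class):
        count = sum(1 for _ in run)
        parts.append((cls + "+") if count > 1 else cls)
    return "".join(parts)


def generate_regex(positives: list[str]) -> str:
    """Build a first expression that matches every positive example."""
    alternatives = list(dict.fromkeys(generalize(p) for p in positives))
    if len(alternatives) == 1:
        body = alternatives[0]
    else:
        body = "(?:" + "|".join(alternatives) + ")"
    # Assumption: examples are whole tokens, so anchor on word boundaries.
    return r"\b" + body + r"\b"


def update_regex(pattern: str, left_context: str, right_context: str) -> str:
    """Exclude matches that sit in the negative example's context."""
    prefix = f"(?<!{re.escape(left_context)})" if left_context else ""
    suffix = f"(?!{re.escape(right_context)})" if right_context else ""
    return prefix + pattern + suffix


cell = "Order 123 shipped, phone 555"
first = generate_regex(["123"])            # positive example: "123"
updated = update_regex(first, "phone ", "")  # negative example: "555" after "phone "
print(re.findall(first, cell))
print(re.findall(updated, cell))
```

In this sketch the first expression matches both numbers in the cell, and the updated expression no longer matches the number that follows `phone `.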

2. The method of claim 1, wherein receiving the first selection comprises receiving a selection of the one or more positive character sequences in a first data cell of a data set via a user interface.

3. The method of claim 2, further comprising the regular expression generator automatically selecting character sequences within a plurality of data cells within the data set that correspond to the first selection that includes the one or more positive character sequences.

4. The method of claim 3, wherein receiving the second selection comprises receiving a selection of the one or more negative character sequences in a second data cell of the data set via the user interface.

5. The method of claim 4, further comprising automatically selecting, by the regular expression generator, character sequences within the plurality of data cells in the data set that correspond to the second selection that includes the one or more negative character sequences.

6. The method of any one of claims 1 to 5, wherein the first selection is highlighted with a first highlighting format and the second selection is highlighted with a second highlighting format different from the first highlighting format.

7. The method wherein determining the context of the one or more negative character sequences corresponding to the negative examples includes:
identifying embedded highlight locations of the second selection;
determining context from data to the left of the embedded highlight location of the second selection;
and determining context from data to the right of the embedded highlight location of the highlighted second selection.
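
As an illustration of the context determination described above, the following Python sketch splits the data to the left and right of a highlighted range into spans. It is a sketch under stated assumptions, not the publication's implementation: the name `context_spans` is hypothetical, a "span" is assumed to be a run of letters, a run of digits, a run of whitespace, or a single other character, and the highlight location is assumed to be given as `start` and `end` offsets into the cell text.

```python
import re

# Assumed span definition: letters, digits, whitespace, or one other character.
SPAN = re.compile(r"[A-Za-z]+|\d+|\s+|[^A-Za-z\d\s]")


def context_spans(cell: str, start: int, end: int) -> tuple[list[str], list[str]]:
    """Return (left_spans, right_spans) around the highlight cell[start:end].

    Both lists are ordered outward from the highlight, so index 0 is the
    first span to the left (or right) and index 1 is the second span.
    """
    left = SPAN.findall(cell[:start])[::-1]
    right = SPAN.findall(cell[end:])
    return left, right


cell = "Order 123 shipped, phone 555"
start = cell.index("555")
left, right = context_spans(cell, start, start + len("555"))
print(left[:2], right)
```

Ordering both lists outward from the highlight keeps the nearest context at index 0, which makes it straightforward to compare a first span, then a second span, as later claims do.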

8. The method of claim 7, wherein determining the context of the one or more negative character sequences corresponding to the negative examples further comprises:
filtering, based on the determined context from the data to the left of the embedded highlight location and the determined context from the data to the right of the embedded highlight location, the automatically selected character sequences within the plurality of data cells in the data set that correspond to the second selection including the one or more negative character sequences; and
removing the filtered character sequences corresponding to the selected one or more negative character sequences from the selected character sequences in the plurality of data cells in the data set.

9. The method of claim 8, wherein determining the context from data to the left of the embedded highlight location includes identifying a first span to the left of the embedded highlight location, and
wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the selected one or more negative character sequences comprises identifying spans, in the corresponding character sequences in the plurality of data cells, that do not match the first span to the left of the embedded highlight location.

10. The method of claim 9, wherein determining the context from data to the left of the embedded highlight location further includes identifying a second span to the left of the embedded highlight location, and
wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the selected one or more negative character sequences comprises identifying spans, in the corresponding character sequences in the plurality of data cells, that do not match the second span to the left of the embedded highlight location.

11. The method of claim 7, wherein determining the context from data to the right of the embedded highlight location includes identifying a first span to the right of the embedded highlight location, and
wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the second selection including the one or more negative character sequences comprises identifying spans, in the corresponding character sequences in the plurality of data cells, that do not match the first span to the right of the embedded highlight location.
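
One possible reading of the filtering and removing steps above is sketched below in Python. The claim language leaves room for other interpretations, so this is only an assumed one: every candidate match across the data cells is compared, by its first and second spans to the left and its first span to the right, against the context of the highlighted negative example; candidates whose spans all match are removed, and candidates with a non-matching span are kept. The names `context_key` and `remove_negative_matches`, the span definition, and the `(cell_index, start, end)` form of the negative example are all hypothetical.

```python
import re

# Assumed span definition: letters, digits, whitespace, or one other character.
SPAN = re.compile(r"[A-Za-z]+|\d+|\s+|[^A-Za-z\d\s]")


def context_key(cell: str, start: int, end: int, n_left: int = 2, n_right: int = 1):
    """Nearest n_left spans to the left and n_right spans to the right."""
    left = SPAN.findall(cell[:start])[::-1][:n_left]
    right = SPAN.findall(cell[end:])[:n_right]
    return tuple(left), tuple(right)


def remove_negative_matches(cells, pattern, negative):
    """Drop candidate matches that share the negative example's context.

    `negative` is (cell_index, start, end) for the highlighted negative example.
    Returns the remaining matches as (cell_index, matched_text) pairs.
    """
    neg_cell, neg_start, neg_end = negative
    neg_key = context_key(cells[neg_cell], neg_start, neg_end)
    kept = []
    for i, cell in enumerate(cells):
        for m in re.finditer(pattern, cell):
            if context_key(cell, m.start(), m.end()) == neg_key:
                continue  # same context as the negative example: remove
            kept.append((i, m.group()))
    return kept


cells = [
    "Order 123 shipped, phone 555",
    "Order 77 pending, phone 901",
    "Order 8",
]
negative = (0, cells[0].index("555"), cells[0].index("555") + 3)
print(remove_negative_matches(cells, r"\b\d+\b", negative))
```

Here a single highlighted negative example in the first cell also removes the number after `phone ` in the second cell, because the two share the same left and right spans.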

12. A regular expression generator server computer, comprising:
a processor;
memory; and
a storage medium coupled to the processor, the storage medium storing instructions executable by the processor to implement the method of any one of claims 1 to 11.

13. A program comprising instructions that, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 11.