WO2024040607A1 - Data access method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Publication number
WO2024040607A1
Authority
WO
WIPO (PCT)
Prior art keywords
list
dictionary
matching
keys
key
Prior art date
Application number
PCT/CN2022/115273
Other languages
French (fr)
Chinese (zh)
Inventor
王丹
王德慧
江宁
张拓
王刚
Original Assignee
西门子股份公司
西门子(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西门子股份公司, 西门子(中国)有限公司
Priority to PCT/CN2022/115273
Publication of WO2024040607A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a data access method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: generating a list comprising keys of a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching; receiving a regular expression comprising a matching rule, wherein the matching rule is described by a character string; matching the list by using the matching rule of the regular expression; and using a hit result of matching the list as a retrieval item, and querying the dictionary to access the value corresponding to the retrieval item. The complexity of retrieving directly from a dictionary is thereby reduced, case-insensitive matching is supported, and fault tolerance is improved.

Description

Data access method and apparatus, electronic device, and computer-readable storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a data access method and apparatus, an electronic device, and a computer-readable storage medium.
Background
A dictionary stores key-value pairs (key => value) and is a data structure in programming languages such as Python. Specifically, a dictionary can be regarded as a mutable container that can store objects of any type. Within each key-value pair, the key and the value are separated by a colon (:); the pairs are separated from one another by commas (,); and the whole dictionary is enclosed in curly braces ({}). Keys must be unique, whereas values need not be. A value may be of any data type, but a key must be of an immutable data type, such as a string, an integer, or a tuple. Dictionary lookups are efficient because the key is used internally to compute a memory address (a hash).
For example, in D1 = {'a': 1, 'b': 2}, D1 is a dictionary. It contains two keys, 'a' and 'b', which store the values 1 and 2 respectively. To obtain the value stored at the address corresponding to 'a', one evaluates D1['a'], whose value is 1.
When a dictionary grows complex in size and depth, it usually contains long keys. Typing such a key correctly is difficult, which in turn makes key-based data access difficult. Access is even more troublesome when a long key mixes uppercase and lowercase letters.
Summary of the invention
Embodiments of the present invention provide a data access method and apparatus, an electronic device, and a computer-readable storage medium.
A data access method, comprising:
generating a list containing keys of a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching;
receiving a regular expression containing a matching rule, wherein the matching rule is described by a character string;
matching the list using the matching rule in the regular expression; and
using a hit result of matching the list as a retrieval item, and querying the dictionary to access the value corresponding to the retrieval item.
It can be seen that embodiments of the present invention generate a list that supports regular-expression matching and contains the keys of the dictionary, and then match the received regular expression against that list. The hit result can therefore be used directly to access the corresponding value in the dictionary, without the user having to supply a lengthy key, which improves convenience and makes data access easier.
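The four steps above can be sketched in Python roughly as follows. This is only an illustrative sketch for a flat (non-nested) dictionary with string keys; the function name `access_by_regex` and the sample data are assumptions made for illustration and do not come from the disclosure.

```python
import re


def access_by_regex(dictionary, pattern):
    """Return {key: value} for every key hit by the regular expression.

    Hypothetical helper; assumes a flat dictionary whose keys are strings.
    """
    # Step 1: generate a list of the dictionary's keys.
    keys = list(dictionary)
    # Step 2: the matching rule arrives as a character string.
    rule = re.compile(pattern)
    # Step 3: match the rule against the list.
    hits = [k for k in keys if rule.search(k)]
    # Step 4: use each hit as the retrieval item and query the dictionary.
    return {k: dictionary[k] for k in hits}


# Hypothetical sample data.
sample = {"train_cfg_obj_PT": 1, "selected_methods": ["XGB"]}
result = access_by_regex(sample, r"sel.*methods")
```

Here the user supplies only a short pattern, and the full key `selected_methods` is recovered from the list before the dictionary is queried.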
In an exemplary embodiment, generating the list containing keys of the dictionary comprises:
parsing the dictionary to determine the lowest-level keys of the dictionary, wherein the value stored in the dictionary for a lowest-level key has no other dictionary nested in it; and
generating a list containing the lowest-level keys of the dictionary.
Data can therefore be accessed quickly by generating a single list containing the lowest-level keys of the dictionary.
In an exemplary embodiment, when a lowest-level key is subordinate to a key of another level, a forward slash is used in the list as the connector that represents the subordination.
It can be seen that describing the relationships of subordinate lowest-level keys in the list with a forward slash, which has no special meaning in programming languages, makes it easier for users to write accurate regular expressions.
In an exemplary embodiment, the dictionary comprises N levels of keys, where N is a positive integer of at least 2;
generating the list containing keys of the dictionary comprises: parsing the dictionary to generate N lists, wherein each list contains the keys of one corresponding level of the dictionary;
receiving the regular expression containing a matching rule comprises: receiving N regular expressions, each containing its own matching rule, wherein the N regular expressions correspond one-to-one to the N lists;
matching the list using the matching rule in the regular expression comprises: matching each of the N regular expressions, using its own matching rule, against the corresponding list; and
using the hit result of matching the list as a retrieval item and querying the dictionary to access the corresponding value comprises: using the hit result of each regular expression against its corresponding list as a retrieval item, and determining the value corresponding to the retrieval item from the corresponding level in a level-by-level matching manner.
Embodiments of the present invention thus also use several regular expressions, one per level, to determine the value level by level, which reduces the complexity of each list and makes the lists easier to generate.
In an exemplary embodiment, matching the list using the matching rule in the regular expression comprises:
exactly matching the list using the matching rule; or
fuzzily matching the list using the matching rule, wherein the fuzzy matching includes at least one of the following:
case-insensitive matching; typo-tolerant matching.
It can be seen that, through fuzzy matching, embodiments of the present invention can ignore case and tolerate typos, which makes the method easier to use.
A data access apparatus, comprising:
a list generation module, configured to generate a list containing keys of a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching;
a receiving module, configured to receive a regular expression containing a matching rule, wherein the matching rule is described by a character string;
a matching module, configured to match the list using the matching rule in the regular expression; and
a query module, configured to use a hit result of matching the list as a retrieval item and to query the dictionary to access the value corresponding to the retrieval item.
It can be seen that embodiments of the present invention generate a list that supports regular-expression matching and contains the keys of the dictionary, and then match the received regular expression against that list. The hit result can therefore be used directly to access the corresponding value in the dictionary, without the user having to supply a lengthy key, which improves convenience and makes data access easier.
In an exemplary embodiment, the list generation module is configured to parse the dictionary to determine the lowest-level keys of the dictionary, wherein the value stored in the dictionary for a lowest-level key has no other dictionary nested in it, and to generate a list containing the lowest-level keys of the dictionary.
Data can therefore be accessed quickly by generating a single list containing the lowest-level keys of the dictionary.
In an exemplary embodiment, the list generation module is configured to use, when a lowest-level key is subordinate to a key of another level, a forward slash in the list as the connector that represents the subordination.
It can be seen that describing the relationships of subordinate lowest-level keys in the list with a forward slash, which has no special meaning in programming languages, makes it easier for users to write accurate regular expressions.
In an exemplary embodiment, the dictionary comprises N levels of keys, where N is a positive integer of at least 2;
the list generation module is configured to parse the dictionary to generate N lists, wherein each list contains the keys of one corresponding level of the dictionary;
the receiving module is configured to receive N regular expressions, each containing its own matching rule, wherein the N regular expressions correspond one-to-one to the N lists;
the matching module is configured to match each of the N regular expressions, using its own matching rule, against the corresponding list; and
the query module is configured to use the hit result of each regular expression against its corresponding list as a retrieval item, and to determine the value corresponding to the retrieval item from the corresponding level in a level-by-level matching manner.
Embodiments of the present invention thus also use several regular expressions, one per level, to determine the value level by level, which reduces the complexity of each list and makes the lists easier to generate.
In an exemplary embodiment, the matching module is configured to exactly match the list using the matching rule, or to fuzzily match the list using the matching rule, wherein the fuzzy matching includes at least one of the following:
case-insensitive matching; typo-tolerant matching.
It can be seen that, through fuzzy matching, embodiments of the present invention can ignore case and tolerate typos, which makes the method easier to use.
An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
the processor being configured to read the executable instructions from the memory and to execute them to implement the data access method according to any one of the above.
A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the data access method according to any one of the above.
A computer program product comprising a computer program which, when executed by a processor, implements the data access method according to any one of the above.
Description of drawings
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the above and other features and advantages of the invention become clearer to those of ordinary skill in the art. In the drawings:
Figure 1 is a flowchart of a data access method according to an embodiment of the present invention.
Figure 2 is an exemplary schematic diagram of a dictionary according to an embodiment of the present invention.
Figure 3 is a first exemplary flowchart of a data access method according to an embodiment of the present invention.
Figure 4 is a second exemplary flowchart of a data access method according to an embodiment of the present invention.
Figure 5 is an exemplary structural diagram of a data access apparatus according to an embodiment of the present invention.
Figure 6 is an exemplary structural diagram of an electronic device according to an embodiment of the present invention.
The reference signs are as follows:
Label       Meaning
100         Data access method
101~104     Steps
10          First dictionary
11          Second dictionary
12          Third dictionary
201~205     Steps
301~304     Steps
500         Data access apparatus
501         List generation module
502         Receiving module
503         Matching module
504         Query module
600         Electronic device
601         Processor
602         Memory
Detailed description
To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in further detail below by way of embodiments.
For brevity and clarity, the solution of the present invention is explained below by describing several representative embodiments. The many details in the embodiments serve only to aid understanding of the solution; the technical solution may clearly be implemented without being limited to these details. To avoid unnecessarily obscuring the solution, some embodiments are not described in detail and only an outline is given. Hereinafter, "including" means "including but not limited to", and "according to..." means "at least according to..., but not limited to only according to...". Owing to Chinese language conventions, where the number of a component is not specified below, the component may be one or more, or may be understood as at least one.
In the prior art, the value is obtained from the dictionary directly by exact matching of the key. However, when the dictionary structure is complex (for example, large in size and depth), the dictionary usually contains long keys, and it is difficult to write a long key accurately enough to retrieve the value. In particular, a long key may contain both uppercase and lowercase letters, which makes writing it even more error-prone. For example, assume the following dictionary D2:
Figure PCTCN2022115273-appb-000001
To access the value ['XGB'], the user must write the following expression exactly:
D2['predict_t_config']['LIST_000000']['train_cfg_obj_PT']['LIST_000000']['selected_methods'];
This is quite difficult and error-prone, and programming with such a dictionary is very inefficient. Spelling the words correctly is also difficult. For example, suppose a dictionary contains a key named 'cost'. Because keys are case-sensitive, the key may be misspelled as 'cos', 'Cost' or 'coSt' when programming. X['cost'] retrieves the corresponding value correctly, whereas X['cos'], X['Cost'] and X['coSt'] do not.
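The failure mode described above can be reproduced in a few lines of Python. The dictionary `X`, its value and the helper `lookup` are hypothetical and serve only to show that a plain key lookup tolerates neither a typo nor a change of case.

```python
X = {"cost": 42}  # hypothetical dictionary with a single key


def lookup(key):
    """Plain exact-key access; returns None when the key is not found."""
    try:
        return X[key]
    except KeyError:
        return None


# Only the exactly spelled key retrieves the value.
results = {k: lookup(k) for k in ("cost", "cos", "Cost", "coSt")}
```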
Embodiments of the present invention use regular expressions to solve the technical problem of obtaining values from complex dictionaries. Because it is difficult to obtain values from a dictionary directly through a regular expression, embodiments of the present invention first generate a list that supports regular-expression matching and contains the keys of the dictionary, so that a key can be determined by matching the regular expression against the list.
Figure 1 is a flowchart of a data access method according to an embodiment of the present invention. As shown in Figure 1, the method comprises:
Step 101: generate a list containing keys of a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching.
Here, a dictionary is also known as a hash table (hashmap) or a map. It stores data as key-value pairs, and a value can be looked up by its key. When a dictionary is searched, the hash value of the key is usually computed first, and a modulo operation then quickly locates an array index. If only one element is stored at that index, its value is returned directly; if several elements are stored at the same index, their keys are compared and the value of the element with the equal key is returned. For example, a dictionary can be created with Python's built-in dict function.
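The lookup procedure just described (hash the key, take the result modulo the array size, then compare keys within the slot) can be illustrated with a toy hash map. This is a simplified sketch of the general idea only; the class `ToyHashMap` is hypothetical and is not how Python's built-in `dict` is implemented internally.

```python
class ToyHashMap:
    """Toy hash map with chaining, for illustration only."""

    def __init__(self, size=8):
        # One bucket (a list of (key, value) pairs) per array index.
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Hash the key, then use a modulo operation to locate the array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # keys are unique: overwrite
                return
        bucket.append((key, value))

    def get(self, key):
        # Several elements may share an index; compare keys to find the right one.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


toy = ToyHashMap(size=2)  # a tiny array forces several keys into one bucket
for name, number in (("a", 1), ("b", 2), ("c", 3)):
    toy.put(name, number)
```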
Here, the dictionary is parsed to generate a list that supports regular-expression matching and contains the keys of the dictionary. For example, such a list can be generated in programming languages including Scala, PHP, C#, Java, C++, Objective-C, Perl, Swift, VBScript, JavaScript, Ruby and Python.
In one embodiment, step 101 comprises: parsing the dictionary to determine the lowest-level keys of the dictionary, wherein the value stored in the dictionary for a lowest-level key has no other dictionary nested in it; and generating a list containing the lowest-level keys of the dictionary.
Data can therefore be accessed quickly by generating a single list containing the lowest-level keys of the dictionary.
For example, Figure 2 is an exemplary schematic diagram of a dictionary according to an embodiment of the present invention.
As can be seen from Figure 2, dictionary X (the first dictionary) 10 contains the keys 'a', 'b' and 'c' at its first level. The values of 'a' and 'b' contain no other dictionary, so 'a' and 'b' are lowest-level keys of dictionary X. The value of 'c' contains the second dictionary 11, so 'c' is not a lowest-level key of dictionary X. The second dictionary 11 contains the keys 'c1', 'c2' and 'd', which lie at the second level of dictionary X. The values of 'c1' and 'c2' contain no other dictionary, so 'c1' and 'c2' are lowest-level keys of dictionary X. The value of 'd' contains the third dictionary 12, so 'd' is not a lowest-level key of dictionary X. The third dictionary 12 contains the keys 'd1' and 'd2', which lie at the third level of dictionary X. The values of 'd1' and 'd2' contain no other dictionary, so 'd1' and 'd2' are lowest-level keys of dictionary X.
The applicant has found that in current programming languages such as Scala, PHP, C#, Java, C++, Objective-C, Perl, Swift, VBScript, JavaScript, Ruby and Python, the forward slash usually has no special meaning. In one embodiment, when a lowest-level key is subordinate to a key of another level, a forward slash ("/") is used in the list as the connector that represents the subordination. Embodiments of the present invention thus use the forward slash, which usually has no special meaning in programming languages, to describe the relationships of subordinate lowest-level keys in the list, which facilitates key-based hash operations and helps users write accurate regular expressions.
Example: assume that dictionary D has the following structure:
Figure PCTCN2022115273-appb-000002
It can be seen that the value of 'c' contains the dictionary structure {'c1': 5, 'c2': 10}, which is referred to as dictionary C. Hence 'c' is not a lowest-level key of dictionary D (the value of a lowest-level key has no other dictionary nested in it). The lowest-level keys of dictionary D are 'a', 'b', 'd', 'c1' and 'c2', where 'c1' and 'c2' are both subordinate to 'c'. The generated list is therefore ['a', 'b', 'd', 'c/c1', 'c/c2']. 'c/c1' is the key of 'c1' in dictionary D, and 'c/c2' is the key of 'c2' in dictionary D. The "/" acts as a connector that identifies lowest-level keys having a subordinate relationship. For example, 'c/c1' means that 'c1' belongs to the dictionary structure contained in the value of 'c'. The key 'c/c1' combines the key of 'c' in dictionary D with the key of 'c1' in dictionary C, so it can be used to determine the value of 'c1' directly in dictionary D.
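A minimal Python sketch of this list generation is given below. The helper names `lowest_level_keys` and `get_by_path` are hypothetical, and because the figure showing dictionary D is not reproduced here, the values of 'a', 'b' and 'd' and the ordering of the keys are assumptions; only 'c1': 5 and 'c2': 10 come from the text. The sketch also assumes string keys that do not themselves contain "/".

```python
def lowest_level_keys(d, prefix=""):
    """List the lowest-level keys of a nested dict, joining levels with '/'."""
    paths = []
    for key, value in d.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # The value nests another dictionary, so this is not a lowest-level key.
            paths.extend(lowest_level_keys(value, path))
        else:
            paths.append(path)
    return paths


def get_by_path(d, path):
    """Follow a '/'-joined key such as 'c/c1' down into the nested dict."""
    for part in path.split("/"):
        d = d[part]
    return d


# Values of 'a', 'b' and 'd' are assumed for illustration.
D = {"a": 1, "b": 2, "d": 3, "c": {"c1": 5, "c2": 10}}
key_list = lowest_level_keys(D)
```

With this list in hand, a combined key such as 'c/c1' is enough to reach the nested value through `get_by_path(D, "c/c1")`.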
Step 102: receive a regular expression containing a matching rule, wherein the matching rule is described by a character string.
A regular expression, often abbreviated in code as regex, regexp or RE, is a text pattern made up of ordinary characters (for example, the letters a to z) and special characters (called "metacharacters"); it is a concept from computer science. A regular expression uses a single string to describe and match a set of strings that conform to a syntactic rule, and is typically used to search for and replace text that fits a given pattern (rule). A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses a filtering logic for strings. Because regular expressions operate mainly on text, they are used in all kinds of text editors.
Metacharacters in regular expressions usually have special meanings. Take the metacharacter "?": when it immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching is non-greedy. Non-greedy mode matches as little of the searched string as possible, whereas the default greedy mode matches as much as possible. For example, for the string "oooo", "o+" matches as many "o" characters as possible and yields ['oooo'], whereas "o+?" matches as few "o" characters as possible and yields ['o', 'o', 'o', 'o']. As another example, the metacharacter "." matches any single character except "\n" and "\r".
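The greedy and non-greedy results quoted above can be checked with Python's `re` module, as in the short sketch below. Python is used here only as one possible engine; note that in Python the dot excludes only "\n" by default, so the treatment of "\r" depends on the regular-expression engine in use.

```python
import re

text = "oooo"

# Greedy: "o+" takes as many "o" characters as possible in one match.
greedy = re.findall(r"o+", text)   # ['oooo']

# Non-greedy: "o+?" takes as few as possible, so each "o" is its own match.
lazy = re.findall(r"o+?", text)    # ['o', 'o', 'o', 'o']

# "." matches any single character except a newline (Python default).
dot = re.findall(r"a.c", "abc a\nc a-c")  # ['abc', 'a-c']
```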
Here, a regular expression containing a matching rule is received from the user, wherein the matching rule is described by a character string (usually containing ordinary characters and metacharacters).
Step 103: match the list using the matching rule in the regular expression.
In one embodiment, matching the list with the regular expression in step 103 comprises exactly matching the list using the matching rule. In an exact match, the key is matched exactly and a single value is returned. If several keys match, an error is raised with a message recommending a more precise regular expression, until only a single key matches. If no key matches at all, nothing is returned. Example 1: for the exact match "abc", only "abc" matches; "ab", "Abc" and "abcd" do not. Example 2: for the exact match "a\&c" (special characters must be escaped), only "a&c" matches; "ab", "abc" and "a&cd" do not.
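One way to sketch this exact-match behaviour in Python is shown below. The function `exact_get`, the choice of `ValueError` as the error type and the sample data are assumptions made for illustration; the text specifies only that an error is raised on multiple matches and that nothing is returned when no key matches.

```python
import re


def exact_get(dictionary, pattern):
    """Return the value of the single key that fully matches the pattern."""
    rule = re.compile(pattern)
    hits = [k for k in dictionary if rule.fullmatch(k)]
    if len(hits) > 1:
        # Several keys match: ask for a more precise regular expression.
        raise ValueError(
            f"{len(hits)} keys match {pattern!r}; "
            "use a more precise regular expression"
        )
    if not hits:
        return None  # no key matches: nothing is returned
    return dictionary[hits[0]]


# Hypothetical sample data built from the keys named in the examples.
data = {"abc": 1, "ab": 2, "Abc": 3, "abcd": 4, "a&c": 5, "a&cd": 6}
value_abc = exact_get(data, r"abc")
value_amp = exact_get(data, r"a\&c")
```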
In one embodiment, matching the list with the regular expression in step 103 comprises fuzzily matching the list using the matching rule, wherein the fuzzy matching includes at least one of the following: case-insensitive matching; typo-tolerant matching. Here, one regular expression may match several keys and return several values; the return value is the list of values of the matched keys. Example 1: [...] matches any character in the set: for "[abc]1", "a1", "b1" and "c1" match, whereas "ab1" and "x1" do not. Example 2: [...] matches any character in a range: for "[a-f]1", "a1", "b1" and "f1" match, whereas "q1" and "ab1" do not. Example 3: [...] combined with a repetition count: for "[a-f0-9]{6}", "1a2b3c", "ffffff" and "ff0102" match, whereas "abc12" and "A0000F" do not.
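The three character-class examples, together with case-insensitive matching, can be sketched in Python as follows. The helper `matching_keys` and its `ignore_case` parameter are hypothetical; `re.IGNORECASE` is the standard flag of Python's `re` module. The last line uses an optional final character as one simple, assumed way of tolerating a dropped letter; the text does not specify how typo matching is realised.

```python
import re


def matching_keys(keys, pattern, ignore_case=False):
    """Return the keys that fully match the pattern, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    rule = re.compile(pattern, flags)
    return [k for k in keys if rule.fullmatch(k)]


set_hits = matching_keys(["a1", "b1", "c1", "ab1", "x1"], r"[abc]1")
range_hits = matching_keys(["a1", "b1", "f1", "q1", "ab1"], r"[a-f]1")
hex_keys = ["1a2b3c", "ffffff", "ff0102", "abc12", "A0000F"]
hex_hits = matching_keys(hex_keys, r"[a-f0-9]{6}")
hex_hits_any_case = matching_keys(hex_keys, r"[a-f0-9]{6}", ignore_case=True)

# Case-insensitive match that also tolerates a missing final "t".
cost_hits = matching_keys(["cost", "cos", "Cost", "coSt"], r"cost?", ignore_case=True)
```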
Example 1: the user enters the regular expression D4['cfg.*siid'];
The output is: ('query result:', 'cfg.*siid', '', 1)
[[('predict_t_config',
'LIST_000000',
'train_cfg_obj_PT',
'LIST_000000',
'siid'), '35788']]
The regular expression 'cfg.*siid' matches one key and returns a list containing one result. The result consists of two parts. The first part, ('predict_t_config', 'LIST_000000', …, 'siid'), is the tuple of matched keys. The second part is the value of the matched key. With the traditional approach, a long line of code is needed to obtain the value '35788', for example D2['predict_t_config']['LIST_000000']['train_cfg_obj_PT']['LIST_000000']['siid'].
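A query of this shape could be sketched in Python as below. The functions `walk` and `query` are hypothetical, and the nested dictionary is a reduced stand-in assembled from the key names quoted in the text; the real dictionary appears only in a figure that is not reproduced here.

```python
import re


def walk(d, path=()):
    """Yield (key_tuple, value) for every lowest-level key of a nested dict."""
    for key, value in d.items():
        if isinstance(value, dict):
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


def query(d, pattern):
    """Return [key_tuple, value] for each key path hit by the pattern."""
    rule = re.compile(pattern)
    # Levels are joined with '/' so one pattern can span several levels.
    return [[keys, value] for keys, value in walk(d) if rule.search("/".join(keys))]


# Reduced, assumed stand-in for the dictionary in the example.
nested = {
    "predict_t_config": {
        "LIST_000000": {
            "train_cfg_obj_PT": {
                "LIST_000000": {"siid": "35788", "selected_methods": ["XGB"]}
            }
        }
    }
}
hits = query(nested, r"cfg.*siid")
```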
Suppose the values of the keys 'obligate00', 'obligate01', and so on up to 'obligate05' are wanted. Retrieving them with traditional exact keys would be cumbersome, because more code would have to be written.
Example 2: enter the regular expression: D4['obligate[0-5]']
The result below returns 5 matching values for the regular expression 'obligate[0-5]', which matches keys containing 'obligate0' through 'obligate5'.
The output is:
Figure PCTCN2022115273-appb-000003
These regular-expression-based methods provide an efficient and robust way to get, set, and delete keys that may be misspelled. A match can be an exact match, which returns a single value, or a fuzzy match, which returns multiple values. Through fuzzy matching, embodiments of the present invention can therefore ignore case and tolerate typos, which makes the dictionary easier to use.
Step 104: Use the hit result of matching the list as a retrieval item, and query the dictionary to access the value corresponding to the retrieval item.
Here, once the value corresponding to the retrieval item has been accessed, that value can be read, deleted, or changed.
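Reading, changing, and deleting values through regular-expression hits might look like the sketch below. It assumes a flat dictionary and substring search (`re.search`); the function names, keys, and values are illustrative and are not taken from the application.

```python
import re

def hit_keys(d, pattern):
    """Return the keys in which the pattern is found (a list, so d can be modified afterwards)."""
    compiled = re.compile(pattern)
    return [k for k in d if compiled.search(k)]

def read_values(d, pattern):
    return [d[k] for k in hit_keys(d, pattern)]

def change_values(d, pattern, new_value):
    for k in hit_keys(d, pattern):
        d[k] = new_value

def delete_values(d, pattern):
    for k in hit_keys(d, pattern):
        del d[k]

# Hypothetical data for illustration only.
data = {"obligate01": "x", "obligate02": "y", "reserved": "z"}
print(read_values(data, r"obligate[0-5]"))        # ['x', 'y']
change_values(data, r"obligate0[12]", "updated")  # both 'obligate' entries are changed
delete_values(data, r"obligate02")                # one entry is removed
print(data)                                       # {'obligate01': 'updated', 'reserved': 'z'}
```

Collecting the hits into a list before modifying the dictionary avoids changing it while it is being iterated.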
In one embodiment, the dictionary includes N layers of keys, where N is a positive integer of at least 2. Generating the list containing the keys in the dictionary in step 101 includes: parsing the dictionary to generate N lists, where each list contains the keys of one corresponding layer of the dictionary. Step 102 includes: receiving N regular expressions, each containing its own matching rule, where the N regular expressions correspond one-to-one to the N lists. Step 103 includes: matching the matching rule of each of the N regular expressions against the corresponding list. Step 104 includes: using the hit result of each regular expression on its corresponding list as a retrieval item, and determining the value corresponding to the retrieval item from the corresponding layer by layer-by-layer matching.
Therefore, embodiments of the present invention can also use multiple regular expressions, one per layer, to determine the value by layer-by-layer matching, which reduces the complexity of the lists and makes them easier to generate.
Embodiments of the present invention can be applied in a variety of scenarios, for example to power data storage in an industrial park.
The data access method of the embodiments of the present invention can be implemented in an energy management system, where the energy management system includes regional energy management systems, such as that of an industrial park, as well as site-level energy management systems for corporate groups, factories, buildings, microgrids, and the like. Embodiments of the present invention are described below using power data management in an industrial park as an example. Figure 3 is a first exemplary flow chart of a data access method according to an embodiment of the present invention.
Step 201: Parse the dictionary that stores the power data of the industrial park to determine the bottom-level keys in the dictionary, where the value stored in the dictionary for a bottom-level key does not nest another dictionary. For example, a key in the power data is usually a piece of geographic location information, and the value can be the electricity consumption at that location. Because location information is usually long, directly supplying the exact key to query electricity consumption is quite difficult.
For example, for the dictionary shown in Figure 2, the bottom-level keys are determined to be: 'a', 'b', 'c/c1', 'c/c2', 'c/d/d1' and 'c/d/d2'.
Step 202: Generate a list containing the bottom-level keys in the dictionary.
Continuing the example above, the list ['a', 'b', 'c/c1', 'c/c2', 'c/d/d1', 'c/d/d2'] is generated. From the key of each element in the list, the address of that element in dictionary 10 can be determined.
For example, for the element 'a': the key of 'a' in dictionary 10 determines the address of 'a' in dictionary 10. For the element 'c/c1': the address of 'c' in dictionary 10 (determined by the key of 'c' in dictionary 10) together with the address of 'c1' in dictionary 11 (determined by the key of 'c1' in dictionary 11) determines the address of 'c1' in dictionary 10. For the element 'c/d/d1': the address of 'c' in dictionary 10 (determined by the key of 'c' in dictionary 10), the address of 'd' in dictionary 11 (determined by the key of 'd' in dictionary 11), and the address of 'd1' in dictionary 12 (determined by the key of 'd1' in dictionary 12) together determine the address of 'd1' in dictionary 10.
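Steps 201 and 202 and the address resolution just described can be sketched in Python as follows. The sketch assumes that keys are strings that do not themselves contain "/". The dictionary `X` is modelled on Figure 2: the values 1, 5, and 8 for 'a', 'c1', and 'd2' are the ones used in this description, while the values for 'b', 'c2', and 'd1' are assumed placeholders. The function names are illustrative.

```python
def leaf_paths(d, prefix=""):
    """List the bottom-level keys, joining nested keys with '/' to record their parent layers."""
    paths = []
    for key, value in d.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path))  # descend into the nested dictionary
        else:
            paths.append(path)                     # the value nests no dictionary: bottom-level key
    return paths

def resolve(d, path):
    """Follow a '/'-joined path one dictionary at a time and return the stored value."""
    node = d
    for key in path.split("/"):
        node = node[key]
    return node

# Dictionary 10 nests dictionary 11 under 'c', which nests dictionary 12 under 'd'.
X = {"a": 1, "b": 2, "c": {"c1": 5, "c2": 6, "d": {"d1": 7, "d2": 8}}}

print(leaf_paths(X))          # ['a', 'b', 'c/c1', 'c/c2', 'c/d/d1', 'c/d/d2']
print(resolve(X, "c/d/d2"))   # 8
```

Each list element therefore carries enough information to locate its value without searching the nested dictionaries again.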
Step 203: Receive a regular expression containing a matching rule.
Step 204: Match the list using the matching rule in the regular expression.
Step 205: Use the hit result of matching the list as a retrieval item, and query the dictionary to access the value corresponding to the retrieval item.
For an element of the list that contains "/", the element is considered to match the regular expression when the key to the right of the rightmost "/" matches, regardless of whether the keys to its left match. For example, for 'c/d/d2', the match succeeds when 'd2' hits the regular expression, without considering whether 'c' or 'd' hits it.
For example, when X['c'] is received, the key of 'c' is retrieved from the list, the address of 'c' in dictionary 10 is determined from that key, and the address is then used to obtain the corresponding value. As another example, when X['c.*'] is received, the key of 'c' is retrieved from the list, the address of 'c' in the first dictionary 10 is determined from that key, and the address is then used to obtain the corresponding value; even if 'c' in the first dictionary has been mistakenly written as 'c4' or a similar erroneous form, the address of 'c' in the first dictionary 10 can still be retrieved correctly. As a further example, when X['d1'] is received, the key 'c/d/d1' is matched from the list, the address of 'd1' in the first dictionary 10 is determined from that key, and the address is used to obtain the corresponding value of 'd1' in the first dictionary 10.
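The rightmost-segment rule of steps 203 to 205 can be sketched as below. The sketch assumes that the segment after the last "/" must match the pattern in full; the function name `query_by_last_segment` is hypothetical, and the values for 'c2' and 'd1' are assumed placeholders.

```python
import re

def query_by_last_segment(d, leaf_list, pattern):
    """Match only the key after the rightmost '/', then resolve each hit to its value."""
    compiled = re.compile(pattern)
    results = {}
    for path in leaf_list:
        last = path.rsplit("/", 1)[-1]        # 'c/d/d2' -> 'd2'; 'a' -> 'a'
        if compiled.fullmatch(last):
            node = d
            for key in path.split("/"):       # walk dictionary 10 -> 11 -> 12
                node = node[key]
            results[path] = node
    return results

X = {"a": 1, "b": 2, "c": {"c1": 5, "c2": 6, "d": {"d1": 7, "d2": 8}}}
LEAVES = ["a", "b", "c/c1", "c/c2", "c/d/d1", "c/d/d2"]

print(query_by_last_segment(X, LEAVES, r"d1"))     # {'c/d/d1': 7}
print(query_by_last_segment(X, LEAVES, r"d[12]"))  # {'c/d/d1': 7, 'c/d/d2': 8}
```

The parent keys 'c' and 'd' play no part in the match; they are used only afterwards, to locate the value.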
Figure 4 is a second exemplary flow chart of a data access method according to an embodiment of the present invention.
Step 301: Parse the dictionary that stores the power data of the industrial park (the dictionary includes N layers of keys) to generate N lists, where each list contains the keys of one corresponding layer of the dictionary. For example, a key in the power data is usually a piece of geographic location information, and the value can be the electricity consumption at that location. Because location information is usually long, directly supplying the exact key is quite difficult.
For example, the dictionary shown in Figure 2 has three layers, so N equals 3. The keys of the first layer are 'a', 'b', 'c'; the keys of the second layer are 'c1', 'c2', 'd'; and the keys of the third layer are 'd1' and 'd2'. Therefore, a first list ['a', 'b', 'c'], a second list ['c1', 'c2', 'd'], and a third list ['d1', 'd2'] are generated.
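Step 301 can be sketched as a breadth-first walk that collects the keys of each layer in turn. The function name `level_lists` is illustrative, and the values for 'b', 'c2', and 'd1' in `X` are assumed placeholders.

```python
def level_lists(d):
    """Return one list of keys per nesting layer, outermost layer first."""
    levels = []
    current = [d]                      # dictionaries that make up the current layer
    while current:
        keys, nested = [], []
        for node in current:
            for key, value in node.items():
                keys.append(key)
                if isinstance(value, dict):
                    nested.append(value)   # belongs to the next layer
        if keys:
            levels.append(keys)
        current = nested
    return levels

X = {"a": 1, "b": 2, "c": {"c1": 5, "c2": 6, "d": {"d1": 7, "d2": 8}}}
print(level_lists(X))   # [['a', 'b', 'c'], ['c1', 'c2', 'd'], ['d1', 'd2']]
```

The number of lists returned equals N, the number of layers in the dictionary.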
Step 302: Receive up to N regular expressions, where the regular expressions correspond one-to-one to the lists and each regular expression contains its own matching rule.
The number of regular expressions to be received is determined by the layer in which the wanted value is located. For example, one regular expression, X['a'], is received to obtain the value of 'a' in the first layer. Alternatively, two regular expressions, X['c'] and X['c1'], are received to obtain the value of 'c1' in the second layer. Alternatively, three regular expressions, X['c'], X['d'] and X['d2'], are received to obtain the value of 'd2' in the third layer.
Step 303: Match each of the N regular expressions against its corresponding list.
Step 304: Use the hit results of matching the lists as retrieval items and query the dictionary to access the values corresponding to the retrieval items, which includes: using the hit result of each regular expression on its corresponding list as a retrieval item, and determining the value corresponding to the retrieval item from the corresponding layer by layer-by-layer matching.
For example, when one regular expression, X['a'], is received, it can be matched against the first list to obtain the key of 'a' in the first layer, and that key is used to look up dictionary 10, yielding the value 1.
For example, when two regular expressions corresponding to the two layers from outside to inside, X['c'] and X['c1'], are received, X['c'], which corresponds to the first (outermost) layer, can be matched against the first list to obtain the key of 'c' in the first layer, and that key is used to look up dictionary 10 to obtain the corresponding value, namely dictionary 11. Then X['c1'], which corresponds to the second layer, is matched against the second list to obtain the key of 'c1' in the second layer, and the key of 'c1' is used to look up dictionary 11 to obtain the value, namely 5.
For example, when three regular expressions corresponding to the three layers from outside to inside, X['c'], X['d'] and X['d2'], are received, X['c'], which corresponds to the first (outermost) layer, can be matched against the first list to obtain the key of 'c' in the first layer, and that key is used to look up the first dictionary 10 to obtain the value, namely the second dictionary 11. Then X['d'], which corresponds to the second layer, is matched against the second list to obtain the key of 'd' in the second layer, and that key is used to look up the second dictionary 11 to obtain the value, namely the third dictionary 12. Finally, X['d2'], which corresponds to the third layer, is matched against the third list to obtain the key of 'd2' in the third layer, and the key of 'd2' is used to look up the third dictionary 12 to obtain the value, namely 8.
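The layer-by-layer lookup of steps 302 to 304 can be sketched as follows. As a simplification, and unlike the per-layer lists generated in step 301, this sketch matches each pattern against the keys of the dictionaries reached so far rather than against a precomputed list. Full-match semantics, the function name `layered_lookup`, and the values for 'b', 'c2', and 'd1' are assumptions.

```python
import re

def layered_lookup(d, patterns):
    """Apply one pattern per layer, outermost first; return [(key_tuple, value), ...]."""
    frontier = [((), d)]                       # (path so far, node reached)
    for pattern in patterns:
        compiled = re.compile(pattern)
        next_frontier = []
        for path, node in frontier:
            if not isinstance(node, dict):     # a plain value has no deeper layer
                continue
            for key, value in node.items():
                if compiled.fullmatch(key):
                    next_frontier.append((path + (key,), value))
        frontier = next_frontier
    return frontier

X = {"a": 1, "b": 2, "c": {"c1": 5, "c2": 6, "d": {"d1": 7, "d2": 8}}}

print(layered_lookup(X, ["a"]))              # [(('a',), 1)]
print(layered_lookup(X, ["c", "c1"]))        # [(('c', 'c1'), 5)]
print(layered_lookup(X, ["c", "d", "d2"]))   # [(('c', 'd', 'd2'), 8)]
```

Because every layer is matched separately, a pattern such as "d[12]" in the third position returns both third-layer values in a single call.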
Embodiments of the present invention also provide a data access apparatus. Figure 5 is an exemplary structural diagram of a data access apparatus according to an embodiment of the present invention. As shown in Figure 5, the data access apparatus 500 includes:
a list generation module 501, configured to generate a list containing keys in a dictionary, where the dictionary stores data as key-value pairs and the list supports regular-expression matching;
a receiving module 502, configured to receive a regular expression containing a matching rule, where the matching rule is described by a character string;
a matching module 503, configured to match the list using the matching rule in the regular expression;
a query module 504, configured to use the hit result of matching the list as a retrieval item, and to query the dictionary to access the value corresponding to the retrieval item.
In an exemplary embodiment, the list generation module 501 is configured to parse the dictionary to determine the bottom-level keys in the dictionary, where the value stored in the dictionary for a bottom-level key does not nest another dictionary, and to generate a list containing the bottom-level keys in the dictionary.
In an exemplary embodiment, the list generation module 501 is configured to use a forward slash in the list as the connector representing the subordinate relationship when a bottom-level key has a subordinate relationship between layers.
In an exemplary embodiment, the dictionary includes N layers of keys, where N is a positive integer of at least 2. The list generation module 501 is configured to parse the dictionary to generate N lists, where each list contains the keys of one corresponding layer of the dictionary. The receiving module 502 is configured to receive N regular expressions, each containing its own matching rule, where the N regular expressions correspond one-to-one to the N lists. The matching module 503 is configured to match the matching rule of each of the N regular expressions against the corresponding list. The query module 504 is configured to use the hit result of each regular expression on its corresponding list as a retrieval item, and to determine the value corresponding to the retrieval item from the corresponding layer by layer-by-layer matching.
In an exemplary embodiment, the matching module 503 is configured to match the list exactly using the matching rule, or to fuzzy-match the list using the matching rule, where fuzzy matching includes at least one of the following: case-insensitive matching; typo matching.
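One possible software arrangement of modules 501 to 504 is sketched below for a single-layer dictionary. The class name, method names, and sample data are hypothetical; the application describes the modules functionally and does not prescribe this structure.

```python
import re

class DataAccessDevice:
    """Illustrative grouping of the four modules; assumes a flat dictionary with string keys."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.key_list = self.generate_list()
        self.pattern = None

    def generate_list(self):
        # Role of list generation module 501: list the keys of the dictionary.
        return list(self.dictionary.keys())

    def receive(self, expression, ignore_case=False):
        # Role of receiving module 502: accept the matching rule as a string.
        flags = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile(expression, flags)

    def match(self):
        # Role of matching module 503: match the rule against the key list.
        return [k for k in self.key_list if self.pattern.fullmatch(k)]

    def query(self):
        # Role of query module 504: use the hits as retrieval items.
        return {k: self.dictionary[k] for k in self.match()}

# Hypothetical electricity-consumption readings keyed by location.
device = DataAccessDevice({"Building_A": 120, "building_b": 95, "Plant_1": 300})
device.receive(r"building_.*", ignore_case=True)
print(device.query())   # {'Building_A': 120, 'building_b': 95}
```

The case-insensitive flag lets one rule reach both spellings of the "building" keys, which corresponds to the case-insensitive form of fuzzy matching handled by the matching module.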
Embodiments of the present invention also provide an electronic device with a processor-memory architecture. Figure 6 is an exemplary structural diagram of an electronic device according to an embodiment of the present invention.
As shown in Figure 6, the electronic device 600 includes a processor 601, a memory 602, and a computer program stored in the memory 602 and executable on the processor 601; when executed by the processor 601, the computer program implements any of the data access methods described above. The memory 602 may be implemented as any of various storage media, such as electrically erasable programmable read-only memory (EEPROM), flash memory, or programmable read-only memory (PROM). The processor 601 may be implemented to include one or more central processing units, or one or more field-programmable gate arrays that integrate one or more central processing unit cores. Specifically, the central processing unit or central processing unit core may be implemented as a CPU, an MCU, a DSP, or the like.
It should be noted that not all steps and modules in the above flows and structural diagrams are necessary; some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and may be adjusted as needed. The division into modules is only a functional division adopted for ease of description. In an actual implementation, one module may be implemented by multiple modules, and the functions of multiple modules may be implemented by a single module; these modules may be located in the same device or in different devices.
The hardware modules in the embodiments may be implemented mechanically or electronically. For example, a hardware module may include specially designed permanent circuits or logic devices (such as a dedicated processor, for example an FPGA or ASIC) for performing specific operations. A hardware module may also include programmable logic devices or circuits temporarily configured by software (for example, including a general-purpose processor or another programmable processor) for performing specific operations. Whether a hardware module is implemented mechanically, with dedicated permanent circuits, or with temporarily configured circuits (for example, configured by software) can be decided on the basis of cost and time considerations.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (13)

  1. A data access method (100), characterized in that the method (100) comprises:
    generating a list containing keys in a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching (101);
    receiving a regular expression containing a matching rule, wherein the matching rule is described by a character string (102);
    matching the list using the matching rule in the regular expression (103);
    using a hit result of matching the list as a retrieval item, and querying the dictionary to access the value corresponding to the retrieval item (104).
  2. The method (100) according to claim 1, characterized in that
    the generating a list containing keys in a dictionary (101) comprises:
    parsing the dictionary to determine the bottom-level keys in the dictionary, wherein the value stored in the dictionary for a bottom-level key does not nest another dictionary;
    generating a list containing the bottom-level keys in the dictionary.
  3. The method (100) according to claim 2, characterized in that, when a bottom-level key has a subordinate relationship between layers, a forward slash is used in the list as the connector representing the subordinate relationship.
  4. The method (100) according to claim 1, characterized in that the dictionary comprises N layers of keys, wherein N is a positive integer of at least 2;
    the generating a list containing keys in a dictionary (101) comprises: parsing the dictionary to generate N lists, wherein each list contains the keys of one corresponding layer of the dictionary;
    the receiving a regular expression containing a matching rule (102) comprises: receiving N regular expressions each containing its own matching rule, wherein the N regular expressions correspond one-to-one to the N lists;
    the matching the list using the matching rule in the regular expression (103) comprises: matching the matching rule of each of the N regular expressions against the corresponding list;
    the using a hit result of matching the list as a retrieval item, and querying the dictionary to access the value corresponding to the retrieval item (104) comprises: using the hit result of each regular expression on its corresponding list as a retrieval item, and determining the value corresponding to the retrieval item from the corresponding layer by layer-by-layer matching.
  5. The method (100) according to any one of claims 1-4, characterized in that the matching the list using the matching rule in the regular expression (103) comprises:
    matching the list exactly using the matching rule; or
    fuzzy-matching the list using the matching rule, wherein the fuzzy matching comprises at least one of the following:
    case-insensitive matching; typo matching.
  6. A data access apparatus (500), characterized in that the apparatus (500) comprises:
    a list generation module (501), configured to generate a list containing keys in a dictionary, wherein the dictionary stores data as key-value pairs and the list supports regular-expression matching;
    a receiving module (502), configured to receive a regular expression containing a matching rule, wherein the matching rule is described by a character string;
    a matching module (503), configured to match the list using the matching rule in the regular expression;
    a query module (504), configured to use a hit result of matching the list as a retrieval item, and to query the dictionary to access the value corresponding to the retrieval item.
  7. The apparatus (500) according to claim 6, characterized in that
    the list generation module (501) is configured to parse the dictionary to determine the bottom-level keys in the dictionary, wherein the value stored in the dictionary for a bottom-level key does not nest another dictionary, and to generate a list containing the bottom-level keys in the dictionary.
  8. The apparatus (500) according to claim 7, characterized in that
    the list generation module (501) is configured to use a forward slash in the list as the connector representing the subordinate relationship when a bottom-level key has a subordinate relationship between layers.
  9. The apparatus (500) according to claim 6, characterized in that the dictionary comprises N layers of keys, wherein N is a positive integer of at least 2;
    the list generation module (501) is configured to parse the dictionary to generate N lists, wherein each list contains the keys of one corresponding layer of the dictionary;
    the receiving module (502) is configured to receive N regular expressions each containing its own matching rule, wherein the N regular expressions correspond one-to-one to the N lists;
    the matching module (503) is configured to match the matching rule of each of the N regular expressions against the corresponding list;
    the query module (504) is configured to use the hit result of each regular expression on its corresponding list as a retrieval item, and to determine the value corresponding to the retrieval item from the corresponding layer by layer-by-layer matching.
  10. The apparatus (500) according to any one of claims 6-9, characterized in that
    the matching module (503) is configured to match the list exactly using the matching rule, or to fuzzy-match the list using the matching rule, wherein the fuzzy matching comprises at least one of the following:
    case-insensitive matching; typo matching.
  11. An electronic device (600), characterized by comprising:
    a processor (601);
    a memory (602) for storing instructions executable by the processor;
    wherein the processor (601) is configured to read the executable instructions from the memory (602) and execute the executable instructions to implement the data access method (100) according to any one of claims 1-5.
  12. A computer-readable storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, implement the data access method (100) according to any one of claims 1-5.
  13. A computer program product, characterized by comprising a computer program that, when executed by a processor, implements the data access method (100) according to any one of claims 1-5.
PCT/CN2022/115273 2022-08-26 2022-08-26 Data access method and apparatus, electronic device, and computer-readable storage medium WO2024040607A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/115273 WO2024040607A1 (en) 2022-08-26 2022-08-26 Data access method and apparatus, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/115273 WO2024040607A1 (en) 2022-08-26 2022-08-26 Data access method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2024040607A1 true WO2024040607A1 (en) 2024-02-29

Family

ID=90012229

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115273 WO2024040607A1 (en) 2022-08-26 2022-08-26 Data access method and apparatus, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2024040607A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070265833A1 (en) * 2006-05-13 2007-11-15 Akihiro Nakayama Constructing regular-expression dictionary for textual analysis
CN102750300A (en) * 2011-12-27 2012-10-24 浙江大学 High-performance unstructured data access protocol supporting multi-granularity searching.
CN103123638A (en) * 2011-11-21 2013-05-29 北京神州泰岳软件股份有限公司 Data searching method and data searching device
CN109299376A (en) * 2018-10-26 2019-02-01 深圳点猫科技有限公司 It is a kind of that method and device is searched for generally based on education cloud operating system
CN109558722A (en) * 2018-12-06 2019-04-02 南方电网科学研究院有限责任公司 A kind of move media inspection method, device and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22956152

Country of ref document: EP

Kind code of ref document: A1