CN113378017B

CN113378017B - Naming convention checking method and device

Info

Publication number: CN113378017B
Application number: CN202110743864.2A
Authority: CN
Inventors: 陈鑫; 董德才; 王瑞琦; 闫凌珍; 李彤敏
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2024-02-02
Anticipated expiration: 2041-06-30
Also published as: CN113378017A

Abstract

Embodiments of the present application provide a naming convention checking method and device. When a terminal displays a first interface and receives, on the first interface, a first operation for a first keyword, the terminal acquires one or more data names corresponding to the first keyword from a first database of a server, and groups any one data name to obtain one or more first names corresponding to that data name. When the similarity between the text corresponding to any one first name and the text corresponding to a second name in a preset data dictionary is smaller than a first threshold, the terminal modifies the text corresponding to the first name into the text corresponding to the second name; or, when the similarity between the number of digits corresponding to any one first name and the number of digits corresponding to the second name is smaller than a third threshold, the terminal marks the digits corresponding to the first name and displays the marked first name. The operation process is simple and the execution efficiency is high.

Description

Naming convention checking method and device

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a naming convention checking method and device.

Background

As users' service demands increase, service data is growing rapidly. Because data names are entered manually, an incorrect name may prevent the terminal from accurately analyzing users' service demands when it analyzes the service data. Checking the normalization of data naming is therefore very important.

In one possible approach, the terminal can use a test verification method or a model verification method to check the normalization of data naming. The test verification method writes test cases in a simulation environment according to the data names, takes standard data names as the input of the test cases, and checks the normalization of the data names through processes such as symbolic execution, simulation, or rapid prototyping, thereby finding the data names that need to be changed. The basic idea of the model verification method is to represent the terminal as an automaton model and to express the attributes of the data names as logic formulas; the model then verifies the logic formulas by exhaustively searching the state space, thereby finding the data names that need to be changed.

However, when the test verification method is used to check the normalization of data naming, different test cases must be written for different data names; writing the test cases takes a long time, and so does executing the check based on them, so the process is complex and time-consuming. When the model verification method is used, different logic formulas must be written for different data naming attributes, which is likewise cumbersome and time-consuming.

Disclosure of Invention

In a first aspect, an embodiment of the present application provides a naming convention checking method, including: displaying a first interface, the first interface including an input area; receiving, in the input area, a first operation for a first keyword, where the first keyword is used to describe a data identifier; in response to the first operation, acquiring one or more data names corresponding to the first keyword from a first database of a server, where the first database includes one or more types of service data and any one data name is a textual and/or numeric description; grouping any one data name to obtain one or more first names corresponding to that data name, where any one first name is a name described in text or in digits; when the similarity between the text corresponding to any one first name and the text corresponding to a second name in a preset data dictionary is smaller than a first threshold, modifying the text corresponding to the first name into the text corresponding to the second name, where the data dictionary includes the second name corresponding to the first name, and the text corresponding to the second name is the text indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the text at that position is greater than a second threshold; or, when the similarity between the number of digits corresponding to any one first name and the number of digits corresponding to the second name is smaller than a third threshold, marking the digits corresponding to the first name and displaying the marked first name, where the number of digits corresponding to the second name is the number of digits indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the number of digits at that position is greater than a fourth threshold.

In one possible implementation, when the text corresponding to any one first name contains a plurality of words, the similarity between the text corresponding to the first name and the text corresponding to the second name satisfies the following cosine similarity formula:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where A_i is the number of times the i-th word in a first union appears in the text corresponding to the first name, B_i is the number of times the i-th word in the first union appears in the text corresponding to the second name, and n is the number of words in the first union; the first union is the union of the words in the text corresponding to the first name and the words in the text corresponding to the second name.

In one possible implementation, when the text corresponding to any one first name contains a single word, the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset data dictionary being smaller than the first threshold includes:

when the word in the text corresponding to the first name is not equal to the word in the text corresponding to the second name, the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset data dictionary is smaller than the first threshold.
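The two similarity cases above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the function names, the whitespace tokenization, the 1.0/0.0 values used for the single-word case, and the default threshold of 0.8 are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def cosine_similarity(first_words, second_words):
    """Cosine similarity over word counts A_i and B_i taken on the union of words."""
    a_counts = Counter(first_words)
    b_counts = Counter(second_words)
    union = sorted(set(a_counts) | set(b_counts))  # the union; n = len(union)
    a = [a_counts[w] for w in union]  # A_i: occurrences in the first name
    b = [b_counts[w] for w in union]  # B_i: occurrences in the second name
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_similarity(first_text, second_text):
    """Dispatch between the single-word and multi-word cases."""
    first_words = first_text.split()    # assumption: words are space-separated
    second_words = second_text.split()
    if len(first_words) == 1:
        # Single-word case: unequal words are treated as below the threshold.
        # The concrete values 1.0 and 0.0 are an assumption for illustration.
        return 1.0 if first_words == second_words else 0.0
    return cosine_similarity(first_words, second_words)


def correct_text(first_text, second_text, first_threshold=0.8):
    """Replace the first name's text with the second name's text when too dissimilar."""
    if text_similarity(first_text, second_text) < first_threshold:
        return second_text
    return first_text
```

With these assumptions, `correct_text("project manger", "project manager")` returns the dictionary text because only one of the three words in the union is shared, which gives a cosine similarity of about 0.5.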

In one possible implementation, grouping any one data name to obtain one or more first names corresponding to that data name includes:

determining a first phrase consisting of the N-th word and the (N+1)-th word in the data name, where N is a positive integer greater than or equal to 1; when the first phrase is successfully matched in a preset phrase library, determining that the first phrase is a first name;

or, when the first phrase is not successfully matched in the preset phrase library and a first number exists between the N-th word and the (N+1)-th word, determining the first number as a first name;

or, when the first phrase is not successfully matched in the preset phrase library and a first number exists between the N-th word and the (N+1)-th word, determining the (N+1)-th word as a first name.

In one possible implementation, the method further includes: displaying a second interface, the second interface including an editing area; receiving, in the editing area, a second operation for the second name; in response to the second operation, modifying the text corresponding to the second name into the text corresponding to a third name; or, in response to the second operation, modifying the number of digits corresponding to the second name so that it matches the number of digits corresponding to a fourth name.

In a second aspect, an embodiment of the present application provides a naming convention checking apparatus, including a display unit and a processing unit;

the display unit is configured to display a first interface, the first interface including an input area;

the processing unit is configured to receive, in the input area, a first operation for a first keyword, where the first keyword is used to describe a data identifier;

the processing unit is further configured to acquire, in response to the first operation, one or more data names corresponding to the first keyword from a first database of a server, where the first database includes one or more types of service data and any one data name is a textual and/or numeric description;

the processing unit is further configured to group any one data name to obtain one or more first names corresponding to that data name, where any one first name is a name described in text or in digits;

the processing unit is further configured to modify the text corresponding to the first name into the text corresponding to a second name when the similarity between the text corresponding to any one first name and the text corresponding to the second name in a preset data dictionary is smaller than a first threshold, where the data dictionary includes the second name corresponding to the first name, and the text corresponding to the second name is the text indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the text at that position is greater than a second threshold;

or, the processing unit is further configured to mark the digits corresponding to the first name and display the marked first name when the similarity between the number of digits corresponding to any one first name and the number of digits corresponding to the second name is smaller than a third threshold, where the number of digits corresponding to the second name is the number of digits indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the number of digits at that position is greater than a fourth threshold.

In one possible implementation manner, when the number of words in the text corresponding to any one of the first names is one, the processing unit is specifically configured to:

In a possible implementation, the processing unit is specifically further configured to:

In a possible implementation, the display unit is further configured to display a second interface; the second interface includes an editing region; the processing unit is further used for receiving a second operation aiming at a second naming in the editing area; the processing unit is further used for responding to the second operation and modifying the characters corresponding to the second naming into the characters corresponding to the third naming; or, the processing unit is further configured to modify, in response to the second operation, that the number of digits corresponding to the second naming matches the number of digits corresponding to the fourth naming.

In a third aspect, embodiments of the present application provide a naming convention checking apparatus, the apparatus comprising a processor and a memory, the memory for storing code instructions, the processor for executing the code instructions to perform the method described in the first aspect or any one of the possible implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein a computer program or instructions which, when run on a computer, cause the computer to perform the method described in the first aspect or any one of the possible implementations of the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method described in the first aspect or any one of the possible implementations of the first aspect.

In a sixth aspect, embodiments of the present application provide a naming convention checking system, the system comprising the apparatus described in the second aspect or any one of the possible implementations of the second aspect.

In a seventh aspect, the present application provides a chip or chip system comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by wires, the at least one processor being adapted to execute a computer program or instructions to perform the method described in the first aspect or any one of the possible implementations of the first aspect; the communication interface in the chip can be an input/output interface, a pin, a circuit or the like.

In one possible implementation, the chip or chip system described above in the present application further includes at least one memory, where the at least one memory has instructions stored therein. The memory may be a memory unit within the chip, such as a register, a cache, etc., or may be a memory unit of the chip (e.g., a read-only memory, a random access memory, etc.).

It should be understood that the second to seventh aspects of the embodiments of the present application correspond to the technical solution of the first aspect; the beneficial effects obtained by each aspect and its possible implementations are similar and are not repeated here.

Drawings

FIG. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of service data provided in an embodiment of the present application;

FIG. 3 is a flowchart illustrating a naming convention checking method according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating a naming convention checking method according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating a naming convention checking method according to an embodiment of the present application;

FIG. 6 is a flowchart illustrating a naming convention checking method according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a naming convention checking device according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a chip according to an embodiment of the present application.

Detailed Description

In order to clearly describe the technical solutions of the embodiments of the present application, the words "first", "second", and the like are used to distinguish identical or similar items having substantially the same function and effect. For example, the first information and the second information are merely different pieces of information, and their order is not limited. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit quantity or order of execution, and that items qualified by "first" and "second" are not necessarily different.

It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.

FIG. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in FIG. 1, the application scenario includes an operator 101, a terminal 102, and a server 103. In response to an operation of the operator 101, the terminal 102 requests service data from the server 103; the terminal 102 may then display the service data returned by the server 103 and analyze it, so that the service requirements of users can be obtained from the analysis. The terminal 102 and the server 103 may communicate via a network.

Based on the embodiment shown in FIG. 1, and to better describe the service data displayed on the terminal 102, FIG. 2 is an exemplary schematic diagram of service data provided in an embodiment of the present application. FIG. 2 includes service data 1 and service data 2. When the operator 101 operates the terminal 102, relevant information of service data 1 may be displayed on the terminal 102; for example, the relevant information of service data 1 includes a service number or a user name, where the user name is A when the service number is 001 and the user name is B when the service number is 002. Similarly, the terminal 102 may also display relevant information of service data 2, the specific content of which is not limited in this embodiment.

As shown in fig. 2, since the service number or the user name is manually input and stored in the server 103, it is understood that normalization check is required for the input data name before the manually input data is stored in the server 103.

In a possible manner, the terminal can adopt a test verification method or a model verification method to realize normalization check on data naming.

The test verification method writes test cases in a simulation environment according to the data names, takes standard data names as the input of the test cases, and checks the normalization of the data names through processes such as symbolic execution, simulation, or rapid prototyping, thereby finding the data names that need to be changed.

The model verification method is a formal verification method based on automata theory. Its basic idea is to represent the terminal as an automaton model and to express the attributes of the data names as logic formulas; the model then verifies the logic formulas by exhaustively searching the state space, thereby finding the data names that need to be changed.

It can be understood that expert review can also be used to check the normalization of the input data names; in the expert review method, relevant personnel are organized to check the input data names manually, so as to ensure that the data names achieve the expected effect.

For example, when the data is consumption data of a user and the data name includes a time and a name, the data name reflects the user's consumption record at a certain time. In response to an operation of the operator, the terminal analyzes the consumption data corresponding to the data name; based on the result of the analysis, the operator can learn the user's consumption trend and can then provide services that meet the user's needs.

However, when the terminal performs normalization check on the data naming, the test verification method and the model verification method described above have some problems.

For example, when the test verification method is used to check the normalization of data naming, different test cases must be written for different data names; writing the test cases takes a long time, and so does executing the check based on them, so the process is complex and time-consuming. When the model verification method is used, although it can ensure the quality of the terminal's data naming check and reduce the likelihood of errors, it is not flexible enough: different logic formulas must be written for different data naming attributes, which is cumbersome and time-consuming.

Moreover, when the expert review method is used to check the normalization of data naming, professionals must be organized to check the data names manually. The check is prone to omissions, so its completeness cannot be guaranteed; the workload is heavy, the process is time-consuming, and the accuracy is low.

It will be appreciated that the purpose of naming data is to facilitate identification of the data, and typically there is a unified specification or standard for naming data, which aims to uniquely identify the data, thereby facilitating easy identification, statistics, and convenient management of the data.

For example, take the identity card number that each person holds. An identity card number generally has 18 digits: the 1st and 2nd digits are the province code, the 3rd and 4th digits are the city code, the 5th and 6th digits are the county code, the 7th to 14th digits are the date of birth, the 15th and 16th digits are the code of the police station at the place of birth, the 17th digit is a gender code, and the 18th digit is a check code. Thus, once the meaning of each digit position is understood, identity can be verified conveniently.
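The positional layout described above can be illustrated with a short sketch. Only the digit positions come from the description; the function name, the dictionary keys, and the sample value are hypothetical and chosen for the example.

```python
def split_id_number(id_number):
    """Split an 18-character identity card number into positional fields."""
    if len(id_number) != 18:
        raise ValueError("expected 18 characters")
    return {
        "province": id_number[0:2],     # 1st-2nd digits
        "city": id_number[2:4],         # 3rd-4th digits
        "county": id_number[4:6],       # 5th-6th digits
        "birth_date": id_number[6:14],  # 7th-14th digits
        "local_code": id_number[14:16], # 15th-16th digits
        "gender": id_number[16],        # 17th digit
        "check": id_number[17],         # 18th digit
    }


# Made-up value for illustration only; it is not a real identity card number.
fields = split_id_number("123456200001017890")
```

A name that follows a fixed positional convention like this one can be checked field by field, which is the property the grouping step of the method relies on.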

Based on this, the embodiments of the present application provide an intelligent, fast, and convenient method for checking the normalization of data naming. When a terminal displays a first interface and receives, on the first interface, a first operation for a first keyword, the terminal, in response to the first operation, acquires one or more data names corresponding to the first keyword from a first database of a server, and groups any one data name to obtain one or more first names corresponding to that data name. When the similarity between the text corresponding to any one first name and the text corresponding to a second name in a preset data dictionary is smaller than a first threshold, the terminal modifies the text corresponding to the first name into the text corresponding to the second name; or, when the similarity between the number of digits corresponding to any one first name and the number of digits corresponding to the second name is smaller than a third threshold, the terminal marks the digits corresponding to the first name and displays the marked first name. Because the terminal groups the data names and compares the grouped first names with the second names, no test cases or logic formulas need to be written, so the operation is simple and the execution efficiency is high.

It should be noted that a data dictionary defines and describes the data items, data structures, data flows, data stores, processing logic, and the like of data. In other words, a data dictionary is a collection of information describing data: a set of definitions for all data elements used in a computer system, and thus a dictionary that stores the required data information.

For example, a project has a project number, a project name, or a project manager, all of which can be stored in a project data dictionary, so that the data information required for the project can be managed easily through the project data dictionary.

It should be noted that the process in which the terminal calculates the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset data dictionary, or the similarity between the number of digits corresponding to the first name and the number of digits corresponding to the second name, may be understood as a process in which the terminal performs natural language processing (NLP).

NLP is an important direction in the fields of computer science and artificial intelligence; it studies theories and methods that enable effective communication between people and terminals in natural language.

The method of the embodiments of the present application can be applied to a terminal, which may be a personal digital assistant (PDA), a handheld device with a wireless communication function (such as a tablet computer), a computing device (such as a personal computer (PC)), and the like.

It can be understood that the specific content of the terminal may be set according to an actual application scenario, and the embodiment of the present application is not limited.

In connection with the above description, FIG. 3 is a schematic flowchart of a naming convention checking method provided in an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:

S301: the terminal displays a first interface.

In this embodiment of the present application, the first interface may be understood as an interface of the terminal used for analyzing the data names of service data. The service data may include project data or consumption data; its specific content may be set according to the actual application scenario and is not limited in the embodiments of the present application.

Wherein the first interface includes an input area such that the terminal can perform S302 based on an operation of an operator.

S302: the terminal receives a first operation for a first keyword in the input area.

In the embodiment of the application, the first keyword is used for describing the data identifier, for example, the first keyword is a project number, and the project number is the data identifier; it can be understood that the specific content of the first keyword may be set according to an actual application scenario, which is not limited in the embodiments of the present application.

In this embodiment of the present application, the first operation may be understood as a search operation of the first keyword by the operator, and the operator may obtain the data name for which the naming standardization is to be verified through the search, and thus the terminal may execute S303.

S303: in response to the first operation, the terminal acquires one or more data names corresponding to the first keyword from a first database of the server.

In this embodiment of the present application, based on the operator's search operation for the first keyword, the terminal may acquire one or more data names corresponding to the first keyword from the first database of the server. Because the first database includes one or more types of service data, the operator can obtain the data names whose naming normalization is to be verified.

Any one data name is a textual and/or numeric description; for example, when the data name is a textual description, it may be understood as a name described in natural language.

S304: the terminal groups any one data name to obtain one or more first names corresponding to the any one data name.

In this embodiment of the present application, a possible implementation in which the terminal groups any one data name to obtain one or more first names corresponding to that data name is as follows: the terminal determines a first phrase consisting of the N-th word and the (N+1)-th word in the data name, where N is a positive integer greater than or equal to 1; when the first phrase is successfully matched in a preset phrase library, the first phrase is determined to be a first name; or, when the first phrase is not successfully matched in the preset phrase library and a first number exists between the N-th word and the (N+1)-th word, the first number is determined to be a first name; or, when the first phrase is not successfully matched in the preset phrase library and a first number exists between the N-th word and the (N+1)-th word, the (N+1)-th word is determined to be a first name. Successful matching of the first phrase in the preset phrase library means that the terminal can find the first phrase in the preset phrase library; unsuccessful matching means that the terminal cannot find it there.

For example, suppose the data name is {name gender 2020 th 4 bit}. When N=1, the first phrase is {name}; the terminal can find it in the preset phrase library, so {name} is a first name. When N=2, the first phrase is formed from the 2nd and 3rd words, which straddle {name} and {gender}; the terminal cannot find it in the preset phrase library, so it is not a first name. When N=3, the first phrase is {gender}; the terminal can find it in the preset phrase library, so {gender} is a first name. When N=4, the first phrase is formed from the 4th word and {th}; the terminal cannot find it in the preset phrase library, and {2020} lies between the two words, so {2020} is the first number and therefore a first name, and {th} is also a first name. When N=5, the first phrase is {th bit}, and {4} lies between {th} and {bit}; {4} is the first number, so {4} is a first name and {bit} is a first name.

Combining the above analysis, when the data name is {name gender 2020 th 4 bit}, the first names corresponding to the data name are {name}, {gender}, {2020}, {th}, {4}, and {bit}, respectively.
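The grouping walk described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the data name is assumed to be already split into word and number tokens, a phrase is assumed to be the plain concatenation of two adjacent words, and the token list and phrase library are hypothetical English stand-ins for the example in the description.

```python
def group_data_name(tokens, phrase_library):
    """Group a tokenized data name into first names.

    tokens: list of strings, each either a word or a run of digits.
    phrase_library: set of known two-word phrases (concatenated).
    """
    # Positions of the word tokens; numbers are only looked at "between" words.
    word_pos = [i for i, t in enumerate(tokens) if not t.isdigit()]
    first_names = []
    # Walk over each N-th / (N+1)-th word pair.
    for a, b in zip(word_pos, word_pos[1:]):
        phrase = tokens[a] + tokens[b]
        between = tokens[a + 1:b]  # number tokens lying between the two words
        if phrase in phrase_library:
            # Case 1: the phrase is found in the preset phrase library.
            first_names.append(phrase)
        elif between:
            # Cases 2 and 3: the number and the (N+1)-th word are first names.
            first_names.extend(between)
            first_names.append(tokens[b])
    return first_names


# Hypothetical stand-in for the example: two library phrases, then numbers.
tokens = ["user", "name", "birth", "year", "2020", "rank", "4", "place"]
library = {"username", "birthyear"}
result = group_data_name(tokens, library)
```

Under these assumptions `result` is `["username", "birthyear", "2020", "rank", "4", "place"]`, which mirrors the six first names obtained in the worked example: two matched phrases, then each number followed by the word after it.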

The preset phrase library may be preset in the terminal at the factory or added to the terminal later, and the specific phrases it contains may be set according to the actual application scenario.

It can be understood that the terminal groups the data names at the phrase level. If the terminal grouped the data names at the sentence level, the grouping process would be too complex; likewise, if the terminal grouped the data names at the symbol level, the processing would take a long time. Grouping at the phrase level is therefore efficient to execute and improves the efficiency with which the terminal checks the normalization of data naming.

S305: when the similarity between the text corresponding to any one of the first names and the text corresponding to the second name in the preset digital dictionary is smaller than a first threshold, the terminal modifies the text corresponding to the first name into the text corresponding to the second name.

In this embodiment of the present application, the digital dictionary includes a second name corresponding to the first name. The second name may be preset by an operator, or may be set by the terminal after analyzing the first names, so that the terminal can check the naming normalization of the first name against the second name in the digital dictionary.

The text corresponding to the second name is the text indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the text at that position is greater than a second threshold.

For example, when the second threshold is 50% and, after 4 data names are grouped, the text at the corresponding position is { construction } in 3 of the 4 data names, the repetition rate of { construction } is 100% × (3/4) = 75%. Since 75% > 50%, the terminal can set the text corresponding to the second name to { construction }.

It may be appreciated that the specific value of the second threshold may be set according to an actual application scenario, which is not limited in the embodiments of the present application.
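The repetition-rate rule above can be sketched as follows. This is a minimal illustration, assuming the terminal takes the most frequent value at a position and accepts it only if its share exceeds the threshold; the function name `derive_second_name` and the sample values are hypothetical.

```python
from collections import Counter


def derive_second_name(values, threshold=0.5):
    """Return the value at one position to use as the second name, or None.

    `values` holds the group found at the same position in each data name.
    The most frequent value is accepted only if its repetition rate is
    strictly greater than `threshold`.
    """
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    rate = count / len(values)
    return value if rate > threshold else None


# Three of four data names agree at this position: rate 75% > 50%.
print(derive_second_name(["construction", "construction", "other", "construction"]))
# construction
```

The same helper applies unchanged to digit counts, for example `derive_second_name([3, 3, 4, 3])`, which yields 3.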

In this embodiment of the present application, the method for determining the similarity between the text corresponding to the first name and the text corresponding to the second name depends on the number of words in the text corresponding to the first name.

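When the text corresponding to any one of the first names contains a plurality of words, the terminal can calculate the similarity between the text corresponding to the first name and the text corresponding to the second name based on the cosine similarity calculation method, and the similarity satisfies the following formula:

\[ \text{similarity} = 100\% \times \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}} \]

where n is the number of words in the first set (the union of the words in the two texts), A_i is the number of times the i-th word of the first set appears in the text corresponding to the first name, and B_i is the number of times it appears in the text corresponding to the second name.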

For example, suppose the text corresponding to the first name and the text corresponding to the second name each contain two words, share the same first word, and differ in the second word. The first set then contains three words. The first word { building } appears 1 time in the text corresponding to the first name, the second word appears 1 time in the text corresponding to the first name, and the third word { line } appears 0 times in the text corresponding to the first name; therefore, A₁ = 1, A₂ = 1, A₃ = 0. Similarly, the first word appears 1 time in the text corresponding to the second name, the second word appears 0 times, and the third word { line } appears 1 time; therefore, B₁ = 1, B₂ = 0, B₃ = 1.

Further, substituting A₁ = 1, A₂ = 1, A₃ = 0 and B₁ = 1, B₂ = 0, B₃ = 1 into the above formula gives similarity = 100% × (1/2) = 50%.

Thus, when the first threshold is 100%, since 50% < 100%, the terminal may modify the text corresponding to the first name to the text corresponding to the second name.

As another example, the text corresponding to the first name is { province share certificate } (a miswritten "identity card"), and the text corresponding to the second name is { body share certificate } (identity card). The first set is { body, share, certificate, province }. The first word { body } appears 0 times in the text corresponding to the first name, while the second word { share }, the third word { certificate }, and the fourth word { province } each appear 1 time; therefore, A₁ = 0, A₂ = 1, A₃ = 1, A₄ = 1. Similarly, in the text corresponding to the second name, the first word { body }, the second word { share }, and the third word { certificate } each appear 1 time, and the fourth word { province } appears 0 times; therefore, B₁ = 1, B₂ = 1, B₃ = 1, B₄ = 0.

Further, substituting A₁ = 0, A₂ = 1, A₃ = 1, A₄ = 1 and B₁ = 1, B₂ = 1, B₃ = 1, B₄ = 0 into the above formula gives similarity = 100% × (2/3) ≈ 67%.

Thus, when the first threshold is 100%, since 67% < 100%, the terminal may modify the text corresponding to the first name to the text corresponding to the second name, that is, correct the miswritten text to { identity card }.

As a further example, the text corresponding to the first name is { identity card } and the text corresponding to the second name is also { identity card }. The first set is { body, share, certificate }. The first word { body }, the second word { share }, and the third word { certificate } each appear 1 time in the text corresponding to the first name; therefore, A₁ = 1, A₂ = 1, A₃ = 1. Similarly, each of the three words appears 1 time in the text corresponding to the second name; therefore, B₁ = 1, B₂ = 1, B₃ = 1.

Further, substituting A₁ = 1, A₂ = 1, A₃ = 1 and B₁ = 1, B₂ = 1, B₃ = 1 into the above formula gives similarity = 100% × (3/3) = 100%.

Thus, when the first threshold is 100%, the similarity of 100% is not smaller than the first threshold, which means that the text corresponding to the first name is already a canonical data name, and the terminal does not need to modify the text corresponding to the first name into the text corresponding to the second name.
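The three worked examples can be reproduced with a short sketch of word-count cosine similarity. It assumes each character of a string stands for one word; the single letters `b` (body), `s` (share), `c` (certificate), and `p` (province), as well as `x`, `y`, `z` for the first example, are hypothetical stand-ins, and the function name `cosine_similarity` is illustrative.

```python
import math
from collections import Counter


def cosine_similarity(first_text, second_text):
    """Cosine similarity of the word-count vectors of two texts.

    The vectors are indexed by the union of the words in both texts
    (the "first set"); each character is treated as one word.
    """
    a_counts, b_counts = Counter(first_text), Counter(second_text)
    first_set = sorted(set(first_text) | set(second_text))
    a = [a_counts[w] for w in first_set]
    b = [b_counts[w] for w in first_set]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(cosine_similarity("xy", "xz"))    # shared first word only: 1/2
print(cosine_similarity("psc", "bsc"))  # two of three words shared: 2/3
print(cosine_similarity("bsc", "bsc"))  # identical texts: 1
```

With a first threshold of 100%, only the third pair would be left unmodified.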

When the text corresponding to any one of the first names contains only one word, and that word is not equal to the word in the text corresponding to the second name, the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset digital dictionary is deemed smaller than the first threshold, and the terminal modifies the text corresponding to the first name into the text corresponding to the second name.

For example, the text corresponding to the first name is { build } and the text corresponding to the second name is { row }. Since { build } is not equal to { row }, the similarity between the text corresponding to the first name and the text corresponding to the second name is smaller than the first threshold, so the terminal may modify the text corresponding to the first name to the text corresponding to the second name, that is, modify { build } to { row }.

It may be understood that the specific value of the first threshold may also be set according to an actual scenario, which is not limited in the embodiments of the present application.
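The single-word rule and the threshold decision can be sketched as follows. This is an illustrative reading, assuming that unequal single words are assigned a similarity of 0 (any value below the first threshold would behave the same); the function names are hypothetical.

```python
def single_word_similarity(first_word, second_word):
    """Similarity for one-word texts: equal words score 1.0, otherwise 0.0.

    The value 0.0 is an assumption; the embodiment only requires that the
    similarity of unequal words be smaller than the first threshold.
    """
    return 1.0 if first_word == second_word else 0.0


def correct_text(first_text, second_text, similarity, first_threshold=1.0):
    """Return the canonical text when the similarity is below the threshold."""
    return second_text if similarity < first_threshold else first_text


sim = single_word_similarity("build", "row")
print(correct_text("build", "row", sim))
# row
```

`correct_text` takes the similarity as an argument, so it can be used with either the single-word rule or a multi-word similarity measure.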

S306: when the similarity of the number of the digits corresponding to any one of the first names and the number of the digits corresponding to the second names is smaller than a third threshold value, the terminal marks the digits corresponding to the first names and displays the marked first names.

In this embodiment of the present application, the number of digits corresponding to the second name is the number of digits indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the number of digits at that position is greater than a fourth threshold.

For example, when the fourth threshold is 50% and the numbers of digits at the corresponding position after 4 data names are grouped are 3, 3, 4, and 3, the repetition rate of the digit count 3 is 100% × (3/4) = 75%. Since 75% > 50%, the terminal may set the number of digits corresponding to the second name to 3.

It may be appreciated that the specific value of the fourth threshold may also be set according to an actual scenario, which is not limited in the embodiments of the present application.

Further, the terminal may check the number of digits corresponding to the first name against the number of digits corresponding to the second name: when the similarity between the number of digits corresponding to any one of the first names and the number of digits corresponding to the second name is smaller than a third threshold, the terminal marks the digits corresponding to the first name and displays the marked first name.

In this embodiment of the present application, the similarity between the number of digits corresponding to the first name and the number of digits corresponding to the second name satisfies the following formula: similarity = 100% × (number of digits corresponding to the first naming/number of digits corresponding to the second naming).

It may be understood that a specific formula of similarity between the number of digits corresponding to the first name and the number of digits corresponding to the second name may also be set according to an actual application scenario, which is not limited in this embodiment.

In this embodiment of the present application, the terminal may use a color to highlight a number corresponding to the first name, and the specific implementation manner of marking the number corresponding to the first name by the terminal may be set according to an actual application scenario, which is not limited in this embodiment of the present application.

For example, when the third threshold is 100%, the digits corresponding to the first name are {002}, and the digits corresponding to the second name are {0003}, the number of digits corresponding to the first name is 3 and the number of digits corresponding to the second name is 4, so the similarity is 100% × (3/4) = 75%. Since 75% < 100%, the terminal marks {002} and displays the marked {002}.

It may be appreciated that the specific value of the third threshold may also be set according to an actual scenario, which is not limited in the embodiments of the present application.
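The digit-count check can be sketched as follows. It uses the ratio formula given above; the square-bracket marking is a hypothetical stand-in for the color highlighting, and the function names are illustrative.

```python
def digit_count_similarity(first_digits, second_digits):
    """Ratio of the digit counts of the first name and the second name."""
    return len(first_digits) / len(second_digits)


def mark_if_irregular(first_digits, second_digits, third_threshold=1.0):
    """Wrap the digits in brackets when the similarity is below the threshold.

    The brackets are an assumed, text-only way of "marking"; a real
    interface might highlight the digits in a color instead.
    """
    similarity = digit_count_similarity(first_digits, second_digits)
    if similarity < third_threshold:
        return f"[{first_digits}]"
    return first_digits


print(digit_count_similarity("002", "0003"))  # 0.75
print(mark_if_irregular("002", "0003"))       # [002]
```

Note that with this ratio a first name that has more digits than the second name scores above 100% and is not marked; a scenario that needs to catch that case would require a different formula, which the embodiment allows.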

When comparing the digits corresponding to the first name with the digits corresponding to the second name, the terminal only needs to determine whether the two are the same. When they differ, the first name cannot be considered a canonical name; because the digits corresponding to the first name carry different meanings in different scenarios, the comparison should be made in light of the actual application scenario.

In summary, in the embodiment shown in fig. 3, after the terminal displays the first interface and receives the first operation for the first keyword on the first interface, the terminal may, in response to the first operation, obtain one or more data names corresponding to the first keyword from the first database of the server, and group any one of the data names to obtain one or more first names corresponding to that data name. Further, when the similarity between the text corresponding to any one of the first names and the text corresponding to the second name in the preset digital dictionary is smaller than the first threshold, the terminal may modify the text corresponding to the first name to the text corresponding to the second name; or, when the similarity between the number of digits corresponding to any one of the first names and the number of digits corresponding to the second name is smaller than the third threshold, the terminal may mark the digits corresponding to the first name and display the marked first name.

Compared with an expert review method, the method reduces the labor intensity, and is simple in operation process and high in accuracy; compared with a test verification method, the method of the embodiment of the application performs normalization check after the data are named and grouped, has high execution efficiency, does not need to write different test cases according to different data names, and has simple operation process; compared with a model verification method, the method of the embodiment of the application has high flexibility, different logic formulas do not need to be written according to different data names, the operation process is simple, and the execution efficiency is high.

On the basis of the embodiment shown in fig. 3, fig. 4 is an exemplary flow chart of a naming convention checking method according to an embodiment of the present application. This embodiment illustrates the case in which an operator changes a second name in the digital dictionary and, as shown in fig. 4, includes the following steps:

S401: the terminal displays a second interface.

In this embodiment of the present application, the second interface may be understood as an interface where the operator opens the second name in the digital dictionary on the terminal, and since the second interface includes an editing area, the operator may change the second name in the digital dictionary through the editing area.

S402: the terminal receives a second operation for the second naming in the edit area.

In this embodiment of the present application, the second operation may be understood as an editing operation of the second name by the operator, and the operator may change the second name through editing, so that the terminal may execute S403 based on the editing operation of the second name by the operator.

S403: in response to the second operation, the terminal modifies the text corresponding to the second name into the text corresponding to the third name.

In this embodiment of the present application, because data names differ and are applied in different scenarios, the terminal can modify the text corresponding to the second name into the text corresponding to the third name, so that the text corresponding to the third name can be used for the normalization check of the data names. The specific content of the text corresponding to the third name may be set according to the actual scenario, which is not limited in the embodiments of the present application.

S404: in response to the second operation, the terminal modifies the number of digits corresponding to the second name so that it matches the number of digits corresponding to the fourth name.

In this embodiment of the present application, because data names differ and are applied in different scenarios, the terminal can modify the number of digits corresponding to the second name so that it matches the number of digits corresponding to the fourth name, so that the number of digits corresponding to the fourth name can be used for the normalization check. The specific value of the number of digits corresponding to the fourth name may be set according to the actual scenario, which is not limited in the embodiments of the present application.

In summary, in the embodiment shown in fig. 4, the terminal may display the second interface and receive the second operation for the second name in the editing area. In response to the second operation, the terminal may modify the text corresponding to the second name to the text corresponding to the third name, or modify the number of digits corresponding to the second name so that it matches the number of digits corresponding to the fourth name. The operation process is simple, and the terminal can therefore perform the normalization check on data names in different application scenarios.

With reference to the foregoing description, fig. 5 is, by way of example, a schematic flow chart of a naming convention checking method provided in an embodiment of the present application. As shown in fig. 5, the terminal may obtain a data name from the first database of the server and group the data name to obtain the first names corresponding to the data name. The terminal may then perform the normalization check of a first name based on its similarity to the second name in the preset digital dictionary: when the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset digital dictionary is smaller than the first threshold, the terminal modifies the text corresponding to the first name to the text corresponding to the second name; or, when the similarity between the number of digits corresponding to the first name and the number of digits corresponding to the second name is smaller than the third threshold, the terminal marks the digits corresponding to the first name and displays the marked first name.

To better describe the method of the embodiments of the present application, and in conjunction with the foregoing, fig. 6 is, by way of example, a schematic flow chart of a naming convention checking method provided in an embodiment of the present application, in which the first keyword is the project number.

In this embodiment of the present application, the terminal may obtain the data names corresponding to the project number from the first database of the server. As shown in fig. 6, the data names may include { construction technology 2019 No. 100 }, { construction technology 2020 No. 220 }, { construction technology 2019 No. 150 }, and { construction bank 2019 No. 1234 }. After the data names are grouped, the first names are obtained, and may include { construction/technology/2019/No. 100/No. }, { construction/technology/2020/No. 220/No. }, { construction/technology/2019/No. 150/No. }, and { construction/bank/2019/No. 1234/No. }.

As shown in fig. 6, the second name corresponding to the first name in the preset digital dictionary is { construction/science/technology/xxxx/xxx/number }, where xxxx represents digits and the number of digits is 4, and similarly, xxx represents digits and the number of digits is 3.

Further, the terminal compares the first name corresponding to any one of the data names with the second name, and thereby the terminal can perform normalization check on the first name corresponding to any one of the data names.

For example, when the first name is { construction/bank/2019/1234/number }, the number of digits at the corresponding position indicated by the second name should be 3, so the terminal judges {1234} to be a non-canonical name and marks {1234} in black. In addition, the terminal judges that the text of this first name, { construction } and { bank }, is also non-canonical, so the terminal can modify it to the text at the corresponding positions of the second name, for example modifying { bank } to { technology }.

It should be noted that, in the embodiments shown in fig. 3 to fig. 6, when the number of first names obtained by grouping a data name differs from the number of second names, the terminal may mark the first names corresponding to that data name and display the marked data name.

For example, referring to fig. 6, when the data name is { construction technology project 2019 No. 100 }, the first names obtained by grouping the data name are { construction/technology/project/2019/No. 100/No. }. The number of first names corresponding to the data name is 7, but the number of second names in the preset digital dictionary is 6, so the terminal may mark and display the data name { construction technology project 2019 No. 100 }.
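The checks walked through for fig. 6 can be combined in a small sketch that compares a grouped data name with a dictionary template. Several simplifications are assumed: text groups are compared by exact equality instead of a similarity score, a digit group is flagged whenever its digit count differs from the template, and the six-group template, the group labels, and the function name `check_against_template` are hypothetical.

```python
def check_against_template(groups, template):
    """Return a list of (index, group, problem) tuples for one data name.

    A template entry made only of "x" characters stands for a digit group
    with that many digits; any other entry is expected text.
    """
    if len(groups) != len(template):
        # Different number of groups: flag the whole data name.
        return [(None, "/".join(groups), "group count mismatch")]
    problems = []
    for index, (group, expected) in enumerate(zip(groups, template)):
        if set(expected) == {"x"}:
            if not group.isdigit() or len(group) != len(expected):
                problems.append((index, group, "digit count mismatch"))
        elif group != expected:
            problems.append((index, group, f"expected {expected}"))
    return problems


# Hypothetical six-group template and two grouped data names.
template = ["construction", "technology", "xxxx", "No.", "xxx", "number"]
print(check_against_template(
    ["construction", "bank", "2019", "No.", "1234", "number"], template))
# [(1, 'bank', 'expected technology'), (4, '1234', 'digit count mismatch')]
print(check_against_template(
    ["construction", "technology", "project", "2019", "No.", "100", "number"],
    template))
# [(None, 'construction/technology/project/2019/No./100/number', 'group count mismatch')]
```

Checking the group count first mirrors the note above: when the counts differ, positions no longer line up, so a per-position comparison would not be meaningful.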

The method according to the embodiment of the present application is described above with reference to fig. 3 to 6, and the naming convention checking device for executing the method according to the embodiment of the present application is described below. It will be appreciated by those skilled in the art that the methods and apparatus may be combined and referred to, and that the naming convention checking apparatus provided in the embodiments of the present application may perform the steps in the naming convention checking method described above.

Fig. 7 is a schematic diagram of a naming convention checking device according to an embodiment of the present application, and as shown in fig. 7, the naming convention checking device 700 may be a terminal, or a chip system applied in the terminal.

The naming convention checking apparatus 700 includes a display unit 701 and a processing unit 702. The display unit 701 supports the naming convention checking apparatus in performing the display steps, and the processing unit 702 supports the naming convention checking apparatus in performing the information processing steps.

Illustratively, the display unit 701 is configured to display a first interface, where the first interface includes an input area;

a processing unit 702 for receiving a first operation for a first keyword at an input area; the first keyword is used for describing the data identification;

the processing unit 702 is further configured to obtain, in response to the first operation, one or more data names corresponding to the first keyword from a first database of the server; wherein the first database comprises one or more types of business data, and any one of the data names is described by text and/or digits;

the processing unit 702 is further configured to group any one of the data names, so as to obtain one or more first names corresponding to that data name; wherein any one of the first names is described by text or digits;

the processing unit 702 is further configured to modify the text corresponding to the first name to the text corresponding to the second name when the similarity between the text corresponding to any one of the first names and the text corresponding to the second name in the preset digital dictionary is smaller than a first threshold; wherein the digital dictionary comprises the second name corresponding to the first name, and the text corresponding to the second name is the text indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the text at that position is greater than a second threshold;

or, the processing unit 702 is further configured to mark the digits corresponding to the first name and display the marked first name when the similarity between the number of digits corresponding to any one of the first names and the number of digits corresponding to the second name is smaller than a third threshold; wherein the number of digits corresponding to the second name is the number of digits indicated at the corresponding position of any one data name when, after a plurality of data names are grouped, the repetition rate of the number of digits at that position is greater than a fourth threshold.

In one possible implementation manner, when the number of words in the text corresponding to any one of the first names is one, the processing unit 702 is specifically configured to:

In a possible implementation manner, the processing unit 702 is further specifically configured to:

In a possible implementation manner, the display unit 701 is further configured to display a second interface, where the second interface includes an editing area; the processing unit 702 is further configured to receive a second operation for a second name in the editing area; the processing unit 702 is further configured to modify, in response to the second operation, the text corresponding to the second name to the text corresponding to the third name; or, the processing unit 702 is further configured to modify, in response to the second operation, the number of digits corresponding to the second name so that it matches the number of digits corresponding to the fourth name.

In one possible implementation manner, the naming convention checking apparatus may further include a storage unit 703. The storage unit 703 may include one or more memories, each of which may be a device or circuit for storing programs or data.

The storage unit 703 may exist independently and be connected to the display unit 701 and the processing unit 702 through a communication bus; the storage unit 703 may also be integrated with the processing unit 702.

The apparatus of this embodiment may be correspondingly configured to perform the steps performed in the foregoing method embodiments, and the implementation principle and technical effects are similar, which are not described herein again.

Fig. 8 is a schematic structural diagram of a chip according to an embodiment of the present application. Chip 800 includes one or more (including two) processors 810 and a communication interface 830.

In some implementations, the memory 840 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.

In an embodiment of the present application, memory 840 may include read only memory and random access memory, and provides instructions and data to processor 810. A portion of memory 840 may also include non-volatile random access memory (non-volatile random access memory, NVRAM).

In the present embodiment, the processor 810, the communication interface 830, and the memory 840 are coupled together by a bus system 820. The bus system 820 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For ease of description, the various buses are labeled as bus system 820 in FIG. 8.

The methods described in the embodiments of the present application may be applied to the processor 810 or implemented by the processor 810. The processor 810 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 810. The processor 810 may be a general purpose processor (e.g., a microprocessor or a conventional processor), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gates, transistor logic, or discrete hardware components, and the processor 810 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the invention.

The steps of a method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is well established in the art, such as random access memory, read-only memory, programmable read-only memory, or electrically erasable programmable read-only memory (EEPROM). The storage medium is located in the memory 840, and the processor 810 reads information in the memory 840 and performs the steps of the method described above in combination with its hardware.

In the above embodiments, the instructions stored by the memory for execution by the processor may be implemented in the form of a computer program product. The computer program product may be written in the memory in advance, or may be downloaded in the form of software and installed in the memory.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be, for example, a semiconductor medium (e.g., a solid state disk (SSD)) or the like.

Embodiments of the present application also provide a computer-readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.

As one possible design, the computer-readable medium may include compact disk read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disk memory; the computer readable medium may include disk storage or other disk storage devices. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital versatile disc (digital versatile disc, DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.

Combinations of the above should also be included within the scope of computer-readable media. The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A naming convention checking method, said method comprising:

displaying a first interface, the first interface including an input area;

receiving a first operation for a first keyword at the input area; the first keyword is used for describing a data identifier;

in response to the first operation, acquiring one or more data names corresponding to the first keyword from a first database of a server; wherein the first database comprises one or more types of business data, any one of which is named with a textual and/or numerical description;

grouping any one of the data names to obtain one or more first names corresponding to that data name; wherein any one of the first names is a name given as a textual or numerical description;

when the similarity between the text corresponding to any one of the first names and the text corresponding to a second name in a preset digital dictionary is smaller than a first threshold, modifying the text corresponding to the first name into the text corresponding to the second name; wherein the digital dictionary comprises the second name corresponding to each first name, and the text corresponding to the second name is the text found at the corresponding position of the grouped data names when the repetition rate of that text at the corresponding position is greater than a second threshold;

or, when the similarity between the number of digits corresponding to any one of the first names and the number of digits corresponding to the second name is smaller than a third threshold, marking the digits corresponding to the first name and displaying the marked first name; wherein the number of digits corresponding to the second name is the number of digits found at the corresponding position of the grouped data names when the repetition rate of that number of digits at the corresponding position is greater than a fourth threshold.
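The dictionary construction and the two checks of claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (`most_repeated`, `build_dictionary`, `check_name`), the bracket marking, and the sample data are hypothetical. As a simplifying assumption, the thresholded similarity tests are replaced by plain inequality (a text segment that differs from the dictionary entry is corrected, a digit segment whose length differs is marked).

```python
from collections import Counter


def most_repeated(values, min_rate):
    """Return the most common value if its share of `values` exceeds min_rate, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if count / len(values) > min_rate else None


def build_dictionary(grouped_names, text_rate, digit_rate):
    """Map each position to ("text", word) or ("digits", digit_count).

    text_rate and digit_rate play the role of the second and fourth thresholds.
    """
    dictionary = {}
    width = max(len(group) for group in grouped_names)
    for pos in range(width):
        column = [group[pos] for group in grouped_names if pos < len(group)]
        texts = [seg for seg in column if not seg.isdigit()]
        digit_lengths = [len(seg) for seg in column if seg.isdigit()]
        if len(texts) >= len(digit_lengths):
            kind, entry = "text", most_repeated(texts, text_rate)
        else:
            kind, entry = "digits", most_repeated(digit_lengths, digit_rate)
        if entry is not None:
            dictionary[pos] = (kind, entry)
    return dictionary


def check_name(group, dictionary):
    """Correct deviating text segments and mark deviating digit segments."""
    checked = []
    for pos, seg in enumerate(group):
        kind, expected = dictionary.get(pos, (None, None))
        if kind == "text" and not seg.isdigit() and seg != expected:
            checked.append(expected)      # replace with the dictionary text
        elif kind == "digits" and seg.isdigit() and len(seg) != expected:
            checked.append(f"[{seg}]")    # mark for display, leave the digits unchanged
        else:
            checked.append(seg)
    return checked


# Hypothetical grouped data names: one misspelled word, one short number.
grouped = [
    ["customer", "account", "001"],
    ["customer", "account", "002"],
    ["custmer", "account", "03"],
    ["customer", "account", "004"],
]
dictionary = build_dictionary(grouped, text_rate=0.5, digit_rate=0.5)
result = check_name(grouped[2], dictionary)
```

In this sketch the text is corrected automatically while the digits are only marked, mirroring the asymmetry between the two branches of the claim.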

2. The method of claim 1, wherein when the number of words in the text corresponding to any one of the first names is plural, the similarity between the text corresponding to the first name and the text corresponding to the second name satisfies the following formula:

wherein A_i is the number of times the i-th word in a first union appears in the text corresponding to the first name, B_i is the number of times the i-th word in the first union appears in the text corresponding to the second name, the first union is the union of the words in the text corresponding to the first name and the words in the text corresponding to the second name, and n is the number of words in the first union.
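The formula itself is not reproduced in this text. The definitions of A_i, B_i and n (word counts taken over the union of the two word sets) match the inputs of a cosine similarity between word-count vectors, so the following Python sketch assumes that form. This is an assumption for illustration, not a statement of the claimed formula, and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(first_words, second_words):
    """Cosine similarity of word-count vectors built over the union of both word lists.

    ASSUMPTION: the claimed formula is taken to be
    sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2))), i = 1..n.
    """
    a_counts = Counter(first_words)
    b_counts = Counter(second_words)
    union = sorted(set(a_counts) | set(b_counts))   # the "first union", n = len(union)
    a = [a_counts[word] for word in union]          # A_i
    b = [b_counts[word] for word in union]          # B_i
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


same = cosine_similarity(["customer", "account"], ["customer", "account"])
half = cosine_similarity(["customer", "account"], ["customer", "balance"])
none = cosine_similarity(["customer"], ["balance"])
```

Under this assumption, identical texts score close to 1, texts sharing no word score 0, and the first threshold decides how low a score triggers the replacement.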

3. The method according to claim 1, wherein when the text corresponding to any one of the first names contains a single word, the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset digital dictionary being smaller than the first threshold comprises:

when the word in the text corresponding to the first name differs from the word in the text corresponding to the second name, the similarity between the text corresponding to the first name and the text corresponding to the second name in the preset digital dictionary is smaller than the first threshold.

4. A method according to any one of claims 1-3, wherein said grouping any one of said data names to obtain one or more first names corresponding to any one of said data names comprises:

determining a first phrase consisting of the N-th word and the (N+1)-th word in any one of the data names; wherein N is an integer greater than or equal to 1;

when the first phrase is successfully matched in a preset phrase library, determining that the first phrase is the first name;

or, when the first phrase is not successfully matched in the preset phrase library and a first number exists between the N-th word and the (N+1)-th word, determining that the first number is the first name;

or, when the first phrase is not successfully matched in the preset phrase library and the first number does not exist between the N-th word and the (N+1)-th word, determining that the (N+1)-th word is the first name.
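The grouping of claim 4 can be sketched in Python as a left-to-right pass over an already tokenized data name. The sketch makes several assumptions that are not in the claim: tokens are plain strings, a number is any all-digit token, phrases are stored in the library as two words joined by a space, and the last branch is read as covering the case where no number lies between the two words. The name `group_name` and the sample data are hypothetical.

```python
def group_name(tokens, phrase_library):
    """Split a tokenized data name into first names (phrases, numbers, single words)."""
    groups = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token.isdigit():
            # A number found between two words becomes a first name of its own.
            groups.append(token)
            i += 1
        elif (
            following is not None
            and not following.isdigit()
            and f"{token} {following}" in phrase_library
        ):
            # The N-th and (N+1)-th words match the phrase library: keep them together.
            groups.append(f"{token} {following}")
            i += 2
        else:
            # No phrase match: the word stands alone as a first name.
            groups.append(token)
            i += 1
    return groups


# Hypothetical phrase library and data name.
library = {"credit card"}
names = group_name(["credit", "card", "2021", "branch", "report"], library)
```

Here "credit card" is kept as one first name because it is in the library, "2021" is split off as a number, and "branch" and "report" remain separate because "branch report" is not a known phrase.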

5. A method according to any one of claims 1-3, further comprising:

displaying a second interface; the second interface includes an editing region;

receiving a second operation for the second name in the editing region;

in response to the second operation, modifying the text corresponding to the second name into the text corresponding to a third name;

or, in response to the second operation, modifying the number of digits corresponding to the second name into the number of digits corresponding to a fourth name.

6. A naming convention checking apparatus, characterized in that the apparatus comprises a display unit and a processing unit;

the processing unit is configured to receive a first operation for a first keyword in the input area; the first keyword is used for describing a data identifier;

the processing unit is further configured to acquire, in response to the first operation, one or more data names corresponding to the first keyword from a first database of a server; wherein the first database comprises one or more types of business data, any one of which is named with a textual and/or numerical description;

the processing unit is further configured to group any one of the data names to obtain one or more first names corresponding to that data name; wherein any one of the first names is a name given as a textual or numerical description;

the processing unit is further configured to modify, when the similarity between the text corresponding to any one of the first names and the text corresponding to a second name in a preset digital dictionary is smaller than a first threshold, the text corresponding to the first name into the text corresponding to the second name; wherein the digital dictionary comprises the second name corresponding to each first name, and the text corresponding to the second name is the text found at the corresponding position of the grouped data names when the repetition rate of that text at the corresponding position is greater than a second threshold;

or, the processing unit is further configured to mark the digits corresponding to the first name and display the marked first name when the similarity between the number of digits corresponding to any one of the first names and the number of digits corresponding to the second name is smaller than a third threshold; wherein the number of digits corresponding to the second name is the number of digits found at the corresponding position of the grouped data names when the repetition rate of that number of digits at the corresponding position is greater than a fourth threshold.

7. The apparatus of claim 6, wherein when the number of words in the text corresponding to any one of the first names is plural, the similarity between the text corresponding to the first name and the text corresponding to the second name satisfies the following formula:

8. A naming convention checking apparatus, comprising a processor and a memory, said memory for storing code instructions; the processor is configured to execute the code instructions to perform the method of any of claims 1-5.

9. A computer readable storage medium storing instructions that, when executed, cause a computer to perform the method of any one of claims 1-5.