CN111949756A - Hazardous chemical substance retrieval method, hazardous chemical substance retrieval device, electronic equipment and medium - Google Patents

Publication number
CN111949756A
Authority
CN
China
Prior art keywords
information
chemical substance
dangerous
hazardous chemical
hazardous
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010686460.XA
Other languages
Chinese (zh)
Inventor
薛媛媛
马江峰
张伦玮
吴云汉
王磊彬
赵其龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Shunxin And Supply Chain Management Of Ltd By Share Ltd
Original Assignee
Xinjiang Shunxin And Supply Chain Management Of Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Xinjiang Shunxin And Supply Chain Management Of Ltd By Share Ltd
Priority to CN202010686460.XA
Publication of CN111949756A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a hazardous chemical retrieval method, a hazardous chemical retrieval device, electronic equipment and a medium. The method comprises the following steps: acquiring hazardous chemical search content to be processed, and detecting whether the search content includes a chemical abstract registration number (CAS number) that has a record in a preset database; if not, matching the indexes stored in a preset hazardous chemical database against the search content according to a preset regular expression to obtain hazardous chemical information carrying a corresponding hazardous chemical identifier, wherein the matching comprises matching with a Chinese index and matching with an English index; when one piece of hazardous chemical information is obtained, outputting it; and when at least two pieces are obtained, traversing their hazardous chemical identifiers to remove the pieces with repeated identifiers, obtaining non-repeated hazardous chemical information, and outputting the non-repeated hazardous chemical information. Hazardous chemical information can thereby be retrieved accurately and quickly.

Description

Hazardous chemical substance retrieval method, hazardous chemical substance retrieval device, electronic equipment and medium
Technical Field
The invention relates to the technical field of data processing, in particular to a dangerous chemical substance searching method, a dangerous chemical substance searching device, electronic equipment and a medium.
Background
Hazardous chemical substances, called hazardous chemicals for short, are highly toxic chemicals and other chemicals that are toxic, corrosive, explosive, combustion-supporting or otherwise harmful to people, facilities and the environment. Because hazardous chemicals can cause serious harm, the relevant national departments have issued catalogues and management regulations for two categories, hazardous chemicals that are easily used to make toxic substances and hazardous chemicals that are easily used to make explosives, and both categories are subject to focused supervision.
Members of the general public lack the necessary professional knowledge, so handling and using hazardous chemicals is risky for them and accidents are likely. Hazardous chemicals are usually looked up by CAS number, that is, the registry number assigned to a chemical substance by the American Chemical Abstracts Service (CAS). A user who does not know the exact CAS number of a hazardous chemical therefore cannot run the query and has difficulty obtaining professional information about the substance's properties.
Disclosure of Invention
The application provides a dangerous chemical substance retrieval method, a dangerous chemical substance retrieval device, electronic equipment and a medium.
In a first aspect, a hazardous chemical substance retrieval method is provided, which includes:
acquiring search contents of hazardous chemicals to be processed, and detecting whether the search contents of the hazardous chemicals comprise a chemical abstract registration number or not, wherein the chemical abstract registration number has a record in a preset database;
if not, matching indexes stored in a preset hazardous chemical substance database with the search contents according to a preset regular expression to obtain hazardous chemical substance information, wherein the hazardous chemical substance information carries corresponding hazardous chemical substance identification, and the matching comprises matching by using a Chinese index and matching by using an English index;
under the condition that the dangerous chemical substance information is one, outputting the dangerous chemical substance information;
and traversing the dangerous chemical identification of the at least two dangerous chemical information under the condition that the number of the dangerous chemical information is at least two, so as to remove the dangerous chemical information with repeated dangerous chemical identification, obtain non-repeated dangerous chemical information, and output the non-repeated dangerous chemical information.
In an optional implementation manner, before the matching, according to a preset regular expression, an index stored in the preset hazardous chemical substance database with the search content, the method further includes:
acquiring pre-stored hazardous chemical substance information in a preset hazardous chemical substance database;
and decomposing the Chinese names and English names of the hazardous chemicals in the pre-stored hazardous chemical information into words respectively, generating an inverted list for the words, and storing the inverted list as an index in the preset hazardous chemical database.
In an optional embodiment, the method further comprises:
acquiring property information and/or use information corresponding to the hazardous chemical identifier, and determining key fields in the pre-stored hazardous chemical information according to the property information and/or use information;
and determining a regular expression for each key field according to the type and attribute of the key field, and constructing a regular expression library.
In an optional implementation manner, the determining a regular expression for each key field according to the type and attribute of the key field, and constructing a regular expression library includes:
according to the type and attribute of each key field, combining at least one specific symbol that matches any character to form the matching items of a regular expression, ordering the matching items to form at least one regular expression, and collecting the regular expressions to establish the regular expression library.
In an optional embodiment, the method further comprises:
checking whether abnormal data exist in the pre-stored hazardous chemical substance information or not according to the regular expression library, and if so, correcting the abnormal data; the abnormal data comprises one or more of a missing item, a data item which does not accord with the defined field type and a data item which does not accord with the defined field attribute.
In an optional embodiment, the method further comprises:
when hazardous chemical information is newly added, if an input field conforms to a preset regular expression, associating the input field with the preset regular expression;
and if the input field does not conform to the regular expression, outputting an input-abnormality prompt.
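The checking of stored data and of newly added fields against a regular expression library can be illustrated with a minimal Python sketch. The field names (`cas_no`, `name_en`), the patterns and the helper `find_abnormal_fields` are hypothetical and chosen only for illustration; the application does not specify them.

```python
import re

# Hypothetical regular expression library: one pattern per key field.
REGEX_LIBRARY = {
    # CAS-style number: 2-7 digits, 2 digits, 1 digit, joined by hyphens.
    "cas_no": re.compile(r"\d{2,7}-\d{2}-\d"),
    # English name: letters, digits and a few punctuation characters.
    "name_en": re.compile(r"[A-Za-z0-9][A-Za-z0-9 ,()'\-]*"),
}


def find_abnormal_fields(record, library):
    """Return the names of fields that are missing or violate their pattern."""
    abnormal = []
    for field, pattern in library.items():
        value = record.get(field)
        # A missing item and an item that fails its pattern are both abnormal.
        if value is None or value == "" or not pattern.fullmatch(str(value)):
            abnormal.append(field)
    return abnormal


new_entry = {"cas_no": "67561", "name_en": "Methanol"}
problems = find_abnormal_fields(new_entry, REGEX_LIBRARY)
if problems:
    print("Input abnormal, please check:", ", ".join(problems))
```

The same function could serve both uses described above: run over every stored record to find abnormal data, or run on a single new entry before it is accepted.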
In an optional embodiment, after the obtaining of the non-duplicated hazardous chemical information and before the outputting of the non-duplicated hazardous chemical information, the method further includes:
under the condition that the number of the non-repeated dangerous chemical information is at least two, acquiring a preset query weight parameter, and acquiring a query score of the non-repeated dangerous chemical information according to the preset query weight parameter and the non-repeated dangerous chemical information;
the outputting the non-repeated hazardous chemical information comprises:
and displaying the non-repeated dangerous chemicals information according to the sequence of the query scores from high to low.
In a second aspect, a hazardous chemical substance retrieval device is provided, which includes:
a detection module, configured to acquire hazardous chemical search content to be processed and to detect whether the search content includes a chemical abstract registration number, wherein the chemical abstract registration number has a record in a preset database;
the matching module is used for matching indexes stored in a preset hazardous chemical substance database with the search contents according to a preset regular expression to obtain hazardous chemical substance information, wherein the index stored in the preset hazardous chemical substance database does not contain a chemical abstract registration number, the hazardous chemical substance information carries a corresponding hazardous chemical substance identifier, and the matching comprises matching by using a Chinese index and matching by using an English index;
the output module is used for outputting the hazardous chemical substance information under the condition that the hazardous chemical substance information is one;
the duplication elimination module is used for traversing the dangerous chemical identification of the at least two pieces of dangerous chemical information under the condition that the number of the dangerous chemical information is at least two, so as to eliminate the dangerous chemical information with repeated dangerous chemical identification and obtain non-repeated dangerous chemical information; the output module is further used for outputting the information of the non-repeated dangerous chemicals.
In a third aspect, an electronic device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps as in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided a computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the steps of the first aspect and any possible implementation thereof.
In the embodiments of the application, hazardous chemical search content to be processed is acquired, and it is detected whether the search content includes a chemical abstract registration number that has a record in a preset database. If not, the indexes stored in the preset hazardous chemical database are matched against the search content according to a preset regular expression to obtain hazardous chemical information, where the hazardous chemical information carries a corresponding hazardous chemical identifier and the matching includes matching with a Chinese index and matching with an English index. When one piece of hazardous chemical information is obtained, it is output. When at least two pieces are obtained, their hazardous chemical identifiers are traversed to remove the pieces with repeated identifiers, and the resulting non-repeated hazardous chemical information is output. Hazardous chemical information can thus be retrieved quickly through index matching and deduplication, and the query result is more accurate and more concise.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
Fig. 1 is a schematic flow chart of a hazardous chemical substance retrieval method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of another hazardous chemical substance retrieval method according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a hazardous chemical substance retrieval device according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a hazardous chemical substance searching method according to an embodiment of the present disclosure. The method can comprise the following steps:
101. acquiring search contents of dangerous chemicals to be processed, and detecting whether the search contents of the dangerous chemicals comprise a chemical abstract registration number or not, wherein the chemical abstract registration number has a record in a preset database.
The Chemical Abstracts registration number (CAS NO) mentioned in the embodiments of the present application is the registry number assigned to a chemical substance by the Chemical Abstracts Service (CAS). It is a unique numerical identifier for a substance (compound, polymeric material, biological sequence), mixture or alloy, and an important tool for retrieving information about chemical substances that have several names.
With an online query tool such as a chemical dictionary, entering a CAS registry number in the search box retrieves the corresponding chemical substance and detailed chemical knowledge about it.
The execution subject of the embodiments of the present application may be a hazardous chemical retrieval device, which may be an electronic device, including but not limited to portable devices such as a mobile phone, laptop computer, or tablet computer having a touch-sensitive surface (e.g., a touch screen display and/or a touchpad), which may implement a face recognition function via an application. It should also be understood that in some embodiments, the device is not a portable communication device but a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or a touchpad). In an optional implementation manner, the hazardous chemical retrieval method may be executed on a server by a corresponding software program: a user inputs hazardous chemical search content through a client, and the server performs the retrieval and returns the search result.
Hazardous chemical substances, called hazardous chemicals for short, are highly toxic chemicals and other chemicals that are toxic, corrosive, explosive, combustion-supporting or otherwise harmful to people, facilities and the environment. The hazardous chemical search content to be processed may be query content that a user inputs on a page in order to retrieve the corresponding hazardous chemical information. The hazardous chemical retrieval device acquires and identifies the search content and detects whether it includes a CAS NO recorded in a preset database. Specifically, the digit strings in the search content may be extracted and compared with the CAS NOs in the preset database to determine whether the search content includes a recorded CAS NO. If not, step 102 may be performed; if yes, the hazardous chemical information corresponding to the CAS NO may be acquired directly and displayed at the client, that is, step 103 may be executed.
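The detection in step 101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the in-memory dictionary `PRESET_DB`, its two entries and the function name `find_recorded_cas` are hypothetical stand-ins for the preset database and are not taken from the application.

```python
import re

# Hypothetical preset database keyed by CAS number (illustrative entries only).
PRESET_DB = {
    "7664-93-9": {"id": "H001", "name_en": "sulfuric acid", "name_zh": "硫酸"},
    "67-56-1": {"id": "H002", "name_en": "methanol", "name_zh": "甲醇"},
}

# A CAS number has the form: 2-7 digits, 2 digits, 1 check digit, hyphen-joined.
# The lookarounds stop the pattern from matching inside a longer digit run.
CAS_PATTERN = re.compile(r"(?<!\d)\d{2,7}-\d{2}-\d(?!\d)")


def find_recorded_cas(search_content, preset_db):
    """Return the first CAS number in the query that has a record, else None."""
    for candidate in CAS_PATTERN.findall(search_content):
        if candidate in preset_db:
            return candidate
    return None


cas_no = find_recorded_cas("safety data for 67-56-1", PRESET_DB)
if cas_no is not None:
    print(PRESET_DB[cas_no]["name_en"])  # record found directly by CAS number
else:
    print("no recorded CAS number, fall back to index matching (step 102)")
```

A query that contains a well-formed but unrecorded number is treated the same as a query with no number at all, which mirrors the "has a record in a preset database" condition of step 101.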
102. And matching indexes stored in a preset hazardous chemical substance database with the search contents according to a preset regular expression to obtain hazardous chemical substance information, wherein the hazardous chemical substance information carries corresponding hazardous chemical substance identification, and the matching comprises matching by using Chinese indexes and matching by using English indexes.
A regular expression, often abbreviated as regex, regexp, or RE in code, is a concept in computer science.
A regular expression is a logical formula that operates on character strings. It combines ordinary characters (e.g., the letters a to z) with special characters defined in advance (called "metacharacters") to form a "rule string" that expresses a filtering logic for strings. In other words, a regular expression is a text pattern that describes one or more strings to be matched when searching text.
The indexes stored in the preset hazardous chemical database can be matched against the search content according to a preset regular expression, that is, the two are compared to determine whether they share characters or words. The Chinese index may be matched first and the English index second; the corresponding hazardous chemical information is obtained as a preliminary search result. The hazardous chemical information carries a corresponding hazardous chemical identifier, which can be understood as a serial number or proper name of the hazardous chemical and may include its CAS NO. The hazardous chemical information may include the name, abbreviation, classification, attributes, usage and other related knowledge about the hazardous chemical, which is not limited herein.
The establishment of the preset regular expression may refer to the specific description in the embodiment shown in fig. 2. Step 103 may be executed when only one piece of hazardous chemical information is obtained by the search, and step 104 may be executed when two or more pieces of hazardous chemical information are obtained by the search.
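The two-pass matching of step 102 might look like the following Python sketch. The index contents, the identifiers `H001` to `H003`, and the choice to build the pattern from the escaped query text are all assumptions made for illustration; the application leaves the form of the preset regular expression open.

```python
import re

# Hypothetical indexes: index term -> set of hazardous chemical identifiers.
ZH_INDEX = {"硫酸": {"H001"}, "甲醇": {"H002"}, "硫酸铜": {"H003"}}
EN_INDEX = {
    "sulfuric": {"H001"}, "acid": {"H001"}, "methanol": {"H002"},
    "copper": {"H003"}, "sulfate": {"H003"},
}


def match_index(search_content, index):
    """Return identifiers, in first-seen order, whose index term matches."""
    query = search_content.strip()
    if not query:
        return []
    # Assumption: the query is matched literally and case-insensitively.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for term, ids in index.items():
        if pattern.search(term):
            hits.extend(sorted(ids))
    return hits


def search(search_content):
    """Match the Chinese index first, then the English index."""
    return match_index(search_content, ZH_INDEX) + match_index(search_content, EN_INDEX)


print(search("硫酸"))
print(search("Methanol"))
```

Because the result is a plain concatenation of both passes, the same identifier can appear more than once, which is why the flow continues with deduplication when two or more results come back.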
103. And outputting the dangerous chemical information when the dangerous chemical information is one.
When the query returns a single, uniquely corresponding piece of hazardous chemical information, that information is output. Optionally, the hazardous chemical information retrieved on the server side may be received by the client on the user terminal and displayed in a display interface.
104. And traversing the dangerous chemical identification of the at least two dangerous chemical information under the condition that the number of the dangerous chemical information is at least two, so as to remove the dangerous chemical information with repeated dangerous chemical identification, obtain non-repeated dangerous chemical information, and output the non-repeated dangerous chemical information.
Because the program performs two matching operations, one on the Chinese index and one on the English index, to implement fuzzy search, the result may contain repeated data, so deduplication can be performed first.
For hazardous chemical information retrieved by index matching, if more than one piece is found, the hazardous chemical identifiers of the at least two pieces can be traversed to remove the repeated pieces, keeping only one piece per identifier; the non-repeated hazardous chemical information is then output. When deciding which duplicate to keep, the piece with the latest update time may be kept, or the piece with the richest content, that is, with the most recorded information, may be kept and provided to the user.
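The deduplication described above can be sketched in Python. The record layout (`id`, `updated`, `source`) is hypothetical, and the sketch uses a dictionary keyed by identifier as one possible way to do the traversal; it keeps the most recently updated record, which is one of the two criteria mentioned.

```python
from datetime import date


def deduplicate(results):
    """Keep one record per hazardous chemical identifier, preferring the newest."""
    kept = {}  # identifier -> record; dicts preserve first-seen order
    for record in results:
        current = kept.get(record["id"])
        if current is None or record["updated"] > current["updated"]:
            kept[record["id"]] = record
    return list(kept.values())


results = [
    {"id": "H001", "updated": date(2020, 1, 1), "source": "zh"},
    {"id": "H003", "updated": date(2020, 3, 1), "source": "zh"},
    {"id": "H001", "updated": date(2020, 6, 1), "source": "en"},
]
unique = deduplicate(results)
print([r["id"] for r in unique])
```

To keep the record with the richest content instead, the comparison on `updated` could be replaced by a comparison on, for example, the number of populated fields.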
In one embodiment, the traversal operation described above may be implemented using a union. A union is a special kind of variable type, declared with the keyword union. A union resembles a struct, but all of its members refer to the same location in memory, and the size of the largest member is taken as the size of the union. A union is mainly used to save space, and its default access is public.
A union is a special class and also a kind of structure. Several different data types can be defined in a union, and a variable declared as that union type can hold a value of any of those types; the values share the same block of memory, which saves space. In the search result output by the current database, each piece of data has a unique ID, namely the hazardous chemical identifier. The current search result is traversed, and if an ID repeats, the repeated data is deleted, finally yielding a non-repeated search result that can be displayed directly to the user.
In an optional implementation manner, after the obtaining of the non-duplicated hazardous chemical information and before the outputting of the non-duplicated hazardous chemical information, the method further includes:
under the condition that the number of the non-repeated dangerous chemical information is at least two, acquiring a preset query weight parameter, and acquiring a query score of the non-repeated dangerous chemical information according to the preset query weight parameter and the non-repeated dangerous chemical information;
the outputting the non-repetitive hazardous chemical substance information includes:
and displaying the information of the non-repeated dangerous chemicals according to the sequence of the query scores from high to low.
Specifically, a query weight parameter may be preset, and optionally, the preset query weight parameter may include one or more reference items, each reference item corresponds to one weight parameter, for example, the reference item may include, but is not limited to, a recent browsing amount, a name matching degree, and the like. And performing weighting operation on the non-repeated dangerous chemical information according to the reference item and the corresponding weight parameter thereof to obtain the query score of each non-repeated dangerous chemical information. And then displaying the non-repeated dangerous chemical information according to the sequence of the query scores from high to low so as to preferentially display more comprehensive and accurate dangerous chemical information for the user.
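The weighted scoring and ordering can be sketched as follows. The reference item names `recent_views` and `name_match`, their weights, and the assumption that each item is already normalised to the range 0 to 1 are all hypothetical; the description names browsing amount and name matching degree only as examples.

```python
# Hypothetical preset query weight parameters, one weight per reference item.
QUERY_WEIGHTS = {"recent_views": 0.3, "name_match": 0.7}


def query_score(record, weights):
    """Weighted sum of the reference items; a missing item counts as 0."""
    return sum(weight * record.get(item, 0.0) for item, weight in weights.items())


def rank(records, weights=QUERY_WEIGHTS):
    """Order records by query score, highest first."""
    return sorted(records, key=lambda r: query_score(r, weights), reverse=True)


candidates = [
    {"id": "H001", "recent_views": 0.9, "name_match": 0.2},
    {"id": "H003", "recent_views": 0.1, "name_match": 1.0},
]
for record in rank(candidates):
    print(record["id"], round(query_score(record, QUERY_WEIGHTS), 2))
```

With these weights a close name match outranks a frequently viewed but weaker match; changing the weights changes that trade-off without touching the ranking code.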
In this embodiment, hazardous chemical search content to be processed is acquired, and it is detected whether the search content includes a chemical abstract registration number that has a record in a preset database. If not, the indexes stored in the preset hazardous chemical database are matched against the search content according to a preset regular expression to obtain hazardous chemical information, where the hazardous chemical information carries a corresponding hazardous chemical identifier and the matching includes matching with a Chinese index and matching with an English index. When one piece of hazardous chemical information is obtained, it is output. When at least two pieces are obtained, their hazardous chemical identifiers are traversed to remove the pieces with repeated identifiers, and the resulting non-repeated hazardous chemical information is output. Hazardous chemical information can thus be retrieved quickly through index matching and deduplication, and the query result is more accurate and more concise.
Referring to fig. 2, fig. 2 is a schematic flow chart of another hazardous chemical substance searching method according to an embodiment of the present disclosure. The method shown in FIG. 2 may be performed prior to the steps of the embodiment shown in FIG. 1 to build a database and index prior to retrieval. As shown in fig. 2, the method may include:
201. acquiring pre-stored hazardous chemical substance information in a preset hazardous chemical substance database, decomposing Chinese names and English names of hazardous chemical substances in the pre-stored hazardous chemical substance information into words respectively, generating a reverse arrangement table for the words, and storing the reverse arrangement table as an index in the preset hazardous chemical substance database.
The subject of embodiments of the present application may be a hazardous chemical detection device, and may be an electronic device, including but not limited to other portable devices such as a mobile phone, laptop computer, or tablet computer having a touch sensitive surface (e.g., a touch screen display and/or a touchpad), which may implement a face recognition function via an application. It should also be understood that in some embodiments, the devices described above are not portable communication devices, but rather are desktop computers having touch-sensitive surfaces (e.g., touch screen displays and/or touch pads). In an optional implementation manner, the hazardous chemical substance retrieval method may be executed in a server through a corresponding software program, and a user may input hazardous chemical substance search content through a client, and the server performs the retrieval process and returns the search content.
The preset hazardous chemical substance database may contain a large amount of pre-stored hazardous chemical substance information. This information is indexed first: the Chinese names and English names of the hazardous chemical substances are each decomposed into words, and an inverted table is generated for the words and stored in the preset hazardous chemical substance database as an index.
Inverted indexing may be used in the embodiments of the present application. The inverted index arises from the practical need to look up records by the values of their attributes. Each entry in such an index table contains an attribute value and the addresses of the records having that attribute value. Because the position of a record is determined from the attribute value, rather than the attribute value being determined from the record, the structure is called an inverted index. A file with an inverted index is called an inverted index file, or inverted file for short. In the embodiments of the present application, the inverted file can be generated from the data after word segmentation and then stored.
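The indexing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the record layout (`id`, `name_zh`, `name_en`), the `tokenize` helper, and the choice to treat each Chinese character as its own term are all hypothetical, and a real system would likely use a proper Chinese word segmenter.

```python
import re
from collections import defaultdict

def tokenize(name):
    """Split a name into index terms (hypothetical, simplified segmentation).

    English words and digits are lower-cased and kept whole;
    each CJK character becomes its own term.
    """
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", name.lower())

def build_inverted_index(records):
    """Map each term to the set of record identifiers whose names contain it."""
    index = defaultdict(set)
    for rec in records:
        for field in ("name_zh", "name_en"):
            for term in tokenize(rec[field]):
                index[term].add(rec["id"])
    return index

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "name_zh": "硫酸", "name_en": "Sulfuric acid"},
    {"id": 2, "name_zh": "盐酸", "name_en": "Hydrochloric acid"},
]
index = build_inverted_index(records)
print(sorted(index["acid"]))  # [1, 2]
print(sorted(index["硫"]))    # [1]
```

Each entry pairs an attribute value (a term) with the records that contain it, which is the lookup direction the paragraph above describes.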
202. Acquiring property information and/or use information corresponding to the hazardous chemical substance identifier, and determining key focus fields in the pre-stored hazardous chemical substance information according to the property information and/or use information.
Specifically, the property information describes the physical and/or chemical properties of the hazardous chemical substance, and the use information describes how the substance is used and/or the precautions for its use. Using the hazardous chemical substance identifier, the related information can be found in the database; this may include the property information and/or use information as well as other information, which is not limited here. Words and phrases can be extracted from this information and designated as key focus fields in the pre-stored hazardous chemical substance information, for example 'light blue', 'strong toxicity', or 'easy explosion'.
Steps 201 and 202 may be executed in any order.
203. Determining a regular expression for each key focus field according to the type and attribute of the key focus field, and constructing a regular expression library.
After the key focus fields in the pre-stored hazardous chemical substance information are determined, their field types and attributes can be identified, and a regular expression can be determined for each key focus field according to its type and attribute.
In an optional implementation, step 203 specifically includes:
according to the type and attribute of the key focus field, combining the field with at least one specific symbol that matches any character to form the matching items of a regular expression, ordering the matching items to form at least one regular expression, and integrating the regular expressions to establish the regular expression library.
Regular expression matching may be implemented with a regular expression matching tool. The specific symbols may be chosen from the existing regular expression symbols and their usage rules, and they may be combined according to preset regular expression rules. The regular expressions are then integrated into the regular expression library, which is used in the hazardous chemical substance retrieval process.
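One way to read the combination step is sketched below. The field names, the ordered matching items, and the use of `.*` as the "symbol that matches any character" are assumptions made for illustration; the source does not specify the actual symbols or rules, and handling of field type and attribute is omitted here.

```python
import re

def build_pattern(terms):
    """Join escaped terms, in the given order, with '.*' (any run of characters)."""
    items = [re.escape(t) for t in terms]
    return re.compile(".*".join(items), re.IGNORECASE)

# Hypothetical key focus fields: field name -> ordered matching items.
key_focus_fields = {
    "color": ["light", "blue"],
    "toxicity": ["strong", "toxicity"],
}

# The "library" here is simply a dict of compiled patterns keyed by field name.
regex_library = {
    name: build_pattern(terms) for name, terms in key_focus_fields.items()
}

print(bool(regex_library["color"].search("a light, slightly blue liquid")))  # True
print(bool(regex_library["color"].search("blue light")))                     # False
```

Because the matching items are ordered, "blue light" does not match the `color` pattern even though it contains both words.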
In an optional embodiment, the method further includes:
checking, according to the regular expression library, whether abnormal data exists in the pre-stored hazardous chemical substance information, and if so, correcting the abnormal data; the abnormal data includes one or more of a missing item, a data item that does not conform to the defined field type, and a data item that does not conform to the defined field attribute.
Specifically, the regular expressions in the constructed regular expression library can be used to check whether abnormal data exists in the pre-stored hazardous chemical substance information. That is, pre-stored data must satisfy its specific regular expression; data that does not can be identified as abnormal and then corrected. In an optional embodiment, abnormal data may also be checked against additional criteria, such as whether an item is missing and whether the field type and field attribute match the preset parameters.
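The check described above might look like the following sketch. The field names and patterns in `FIELD_RULES` are hypothetical; the `cas_no` pattern follows the usual CAS Registry Number layout (two to seven digits, two digits, one check digit) but does not verify the check digit. Correction of abnormal data is left out because the source does not say how it is done.

```python
import re

# Hypothetical field rules: field name -> pattern the stored value must fully match.
FIELD_RULES = {
    "cas_no": re.compile(r"\d{2,7}-\d{2}-\d"),
    "name_en": re.compile(r"[A-Za-z0-9 ,\-()]+"),
}

def find_abnormal(record):
    """Return (field, reason) pairs for missing or non-conforming data items."""
    problems = []
    for field, pattern in FIELD_RULES.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append((field, "missing"))
        elif not isinstance(value, str):
            problems.append((field, "wrong type"))
        elif not pattern.fullmatch(value):
            problems.append((field, "pattern mismatch"))
    return problems

print(find_abnormal({"cas_no": "7664-93-9", "name_en": "Sulfuric acid"}))
# []
print(find_abnormal({"cas_no": "7664939"}))
# [('cas_no', 'pattern mismatch'), ('name_en', 'missing')]
```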
In an optional embodiment, the method further includes:
when an add operation is performed on the hazardous chemical substance information, if an entry field conforms to a preset regular expression, associating the entry field with the preset regular expression;
and if the entry field does not conform to the regular expression, outputting entry-exception prompt information.
Hazardous chemical substance information in the database may be added, deleted, and/or modified. The regular expressions described above may also be used when an add operation is performed. An entry field is first acquired and checked against the preset regular expressions. If it conforms, the entry field is associated with the preset regular expression, and the record is stored in the database once all content has been entered. If it does not conform, entry-exception prompt information can be output, prompting the user to enter the content again.
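The add operation can be sketched as below, assuming a small list of preset patterns and an in-memory list standing in for the database. The patterns, the `add_record` helper, and the prompt wording are all hypothetical.

```python
import re

# Hypothetical preset regular expressions.
PRESET_PATTERNS = [
    re.compile(r"\d{2,7}-\d{2}-\d"),        # CAS-style number
    re.compile(r"[A-Za-z][A-Za-z ,\-]*"),   # English name
]

def check_entry(value):
    """Return the first preset pattern the value fully matches, else None."""
    for pattern in PRESET_PATTERNS:
        if pattern.fullmatch(value):
            return pattern
    return None

def add_record(database, entries):
    """Validate every entry field; store the record only if all of them conform."""
    associations = {}
    for field, value in entries.items():
        pattern = check_entry(value)
        if pattern is None:
            # Entry-exception prompt: nothing is stored.
            return f"Entry error in '{field}': please re-enter"
        associations[field] = pattern.pattern  # associate field with its pattern
    database.append({"entries": entries, "patterns": associations})
    return "ok"

db = []
print(add_record(db, {"name_en": "Sulfuric acid", "cas_no": "7664-93-9"}))  # ok
print(add_record(db, {"cas_no": "12345"}))
# Entry error in 'cas_no': please re-enter
```

Storing only after every field has passed mirrors the text above, where the record is saved once all content has been entered.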
The method in the embodiment shown in FIG. 2 establishes a hazardous chemical substance retrieval framework, which is applied to the retrieval process shown in FIG. 1.
In this embodiment of the application, pre-stored hazardous chemical substance information in a preset hazardous chemical substance database is acquired; the Chinese names and English names of the hazardous chemical substances in that information are each decomposed into words; and an inverted table is generated for the words and stored in the preset hazardous chemical substance database as an index. Property information and/or use information corresponding to the hazardous chemical substance identifier is then acquired, key focus fields in the pre-stored hazardous chemical substance information are determined according to the property information and/or use information, a regular expression is determined for each key focus field according to its type and attribute, and a regular expression library is constructed. This simplifies the hazardous chemical substance database and retrieval framework: with a complete index and regular expressions, hazardous chemical substance information can be retrieved quickly, query results are more accurate, and the process is simpler.
Based on the description of the hazardous chemical substance retrieval method embodiments, an embodiment of the application also discloses a hazardous chemical substance retrieval device. Referring to FIG. 3, the hazardous chemical substance retrieval device 300 includes:
a detection module 310, configured to acquire hazardous chemical substance search content to be processed, and detect whether the search content includes a chemical abstract registration number, where the chemical abstract registration number has a record in a preset database;
a matching module 320, configured to, when the search content does not include a chemical abstract registration number, match an index stored in a preset hazardous chemical substance database against the search content according to a preset regular expression to obtain hazardous chemical substance information, where the hazardous chemical substance information carries a corresponding hazardous chemical substance identifier, and the matching includes matching using a Chinese index and matching using an English index;
an output module 330, configured to output the hazardous chemical substance information when there is one piece of hazardous chemical substance information;
a de-duplication module 340, configured to, when there are at least two pieces of hazardous chemical substance information, traverse the hazardous chemical substance identifiers of the at least two pieces to remove hazardous chemical substance information with duplicate identifiers and obtain non-duplicate hazardous chemical substance information; the output module 330 is further configured to output the non-duplicate hazardous chemical substance information.
Optionally, the device 300 further includes an establishing module 350, configured to:
before the matching module 320 matches the index stored in the preset hazardous chemical substance database against the search content according to a preset regular expression, acquire pre-stored hazardous chemical substance information in the preset hazardous chemical substance database;
and decompose the Chinese names and English names of the hazardous chemical substances in the pre-stored hazardous chemical substance information into words, generate an inverted table for the words, and store the inverted table as an index in the preset hazardous chemical substance database.
Optionally, the establishing module 350 is further configured to acquire property information and/or use information corresponding to the hazardous chemical substance identifier, and determine key focus fields in the pre-stored hazardous chemical substance information according to the property information and/or use information;
and determine a regular expression for each key focus field according to the type and attribute of the key focus field, and construct a regular expression library.
Optionally, the establishing module 350 is specifically configured to:
according to the type and attribute of the key focus field, combine the field with at least one specific symbol that matches any character to form the matching items of a regular expression, order the matching items to form at least one regular expression, and integrate the regular expressions to establish the regular expression library.
Optionally, the establishing module 350 is further configured to check, according to the regular expression library, whether abnormal data exists in the pre-stored hazardous chemical substance information, and if so, correct the abnormal data; the abnormal data includes one or more of a missing item, a data item that does not conform to the defined field type, and a data item that does not conform to the defined field attribute.
Optionally, the establishing module 350 is further configured to:
when an add operation is performed on the hazardous chemical substance information, if an entry field conforms to a preset regular expression, associate the entry field with the preset regular expression;
the output module 330 is further configured to output entry-exception prompt information if the entry field does not conform to the regular expression.
Optionally, the device 300 further includes a scoring module, configured to, after the non-duplicate hazardous chemical substance information is obtained and when there are at least two pieces of non-duplicate hazardous chemical substance information, acquire a preset query weight parameter and obtain a query score for each piece of non-duplicate hazardous chemical substance information according to the preset query weight parameter and that information;
the output module 330 is specifically configured to:
display the non-duplicate hazardous chemical substance information in descending order of query score.
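The source does not give the scoring formula, so the following is only one plausible sketch: each field has a hypothetical preset weight, and a record's query score is the sum of the weights of the fields in which the query text appears. The field names and weight values in `QUERY_WEIGHTS` are assumptions.

```python
# Hypothetical preset query weights: contribution of a hit in each field.
QUERY_WEIGHTS = {"name_zh": 3.0, "name_en": 2.0, "alias": 1.0}

def query_score(record, query):
    """Sum the weights of the fields whose text contains the query string."""
    q = query.lower()
    return sum(
        weight
        for field, weight in QUERY_WEIGHTS.items()
        if q in record.get(field, "").lower()
    )

def rank(records, query):
    """Return records ordered by query score, highest first."""
    return sorted(records, key=lambda r: query_score(r, query), reverse=True)

# Hypothetical non-duplicate results, for illustration only.
results = [
    {"id": 1, "name_en": "Sulfuric acid"},
    {"id": 2, "name_en": "Hydrochloric acid", "alias": "muriatic acid"},
]
print([r["id"] for r in rank(results, "acid")])  # [2, 1]
```

Record 2 matches in two weighted fields (score 3.0) and is therefore displayed ahead of record 1 (score 2.0).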
According to an embodiment of the present application, each step of the methods shown in FIG. 1 and FIG. 2 may be performed by the corresponding module of the hazardous chemical substance retrieval device 300 shown in FIG. 3, and is not described again here.
The hazardous chemical substance retrieval device 300 of this embodiment of the application acquires hazardous chemical substance search content to be processed and detects whether the search content includes a chemical abstract registration number, where the chemical abstract registration number has a record in a preset database. If not, it matches the index stored in the preset hazardous chemical substance database against the search content according to a preset regular expression to obtain hazardous chemical substance information, where the information carries a corresponding hazardous chemical substance identifier and the matching includes matching using the Chinese index and matching using the English index. When there is one piece of hazardous chemical substance information, it is output. When there are at least two pieces, the device traverses their hazardous chemical substance identifiers, removes the information with duplicate identifiers to obtain non-duplicate hazardous chemical substance information, and outputs the non-duplicate information. Through index matching and de-duplication, hazardous chemical substance information can be retrieved quickly, and the query results are more accurate and concise.
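The overall flow summarized above can be sketched as follows. Several points are assumptions: the "chemical abstract registration number" is taken to be a CAS Registry Number and detected with a simple pattern; the branch taken when such a number is present (a direct lookup) is a guess, since that case is described with FIG. 1 rather than here; and the name fields are scanned directly instead of going through a stored index.

```python
import re

# Assumed layout of a CAS-style number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"\b\d{2,7}-\d{2}-\d\b")

def search(query, records):
    """Look up by registration number if present; otherwise match names and de-duplicate."""
    cas = CAS_PATTERN.search(query)
    if cas:
        return [r for r in records if r["cas_no"] == cas.group()]

    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = [r for r in records if pattern.search(r["name_zh"])]   # Chinese matching
    hits += [r for r in records if pattern.search(r["name_en"])]  # English matching

    # Traverse identifiers and drop entries whose identifier was already seen.
    seen, unique = set(), []
    for r in hits:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique

# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "cas_no": "7664-93-9", "name_zh": "硫酸 (H2SO4)", "name_en": "Sulfuric acid (H2SO4)"},
    {"id": 2, "cas_no": "7647-01-0", "name_zh": "盐酸", "name_en": "Hydrochloric acid"},
]
print([r["id"] for r in search("H2SO4", records)])      # [1]
print([r["id"] for r in search("acid", records)])       # [1, 2]
print([r["id"] for r in search("7664-93-9", records)])  # [1]
```

The query "H2SO4" hits record 1 through both the Chinese and the English name, so without the de-duplication pass it would be returned twice.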
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides an electronic device. Referring to fig. 4, the electronic device 400 includes at least a processor 401, an input device 402, an output device 403, and a computer storage medium 404. The processor 401, input device 402, output device 403, and computer storage medium 404 within the terminal may be connected by a bus or other means.
A computer storage medium 404 may be stored in the memory of the terminal, said computer storage medium 404 being adapted to store a computer program comprising program instructions, said processor 401 being adapted to execute said program instructions stored by said computer storage medium 404. The processor 401 (or CPU) is a computing core and a control core of the terminal, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute the one or more instructions so as to implement a corresponding method flow or a corresponding function; in one embodiment, the processor 401 described above in the embodiments of the present application may be configured to perform a series of processes, including the method in the embodiments shown in fig. 1 and fig. 2, and so on.
An embodiment of the present application further provides a computer storage medium (Memory), where the computer storage medium is a Memory device in a terminal and is used to store programs and data. It is understood that the computer storage medium herein may include a built-in storage medium in the terminal, and may also include an extended storage medium supported by the terminal. The computer storage medium provides a storage space that stores an operating system of the terminal. Also stored in this memory space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by processor 401. The computer storage medium may be a high-speed RAM memory, or may be a non-volatile memory (non-volatile memory), such as at least one disk memory; and optionally at least one computer storage medium located remotely from the processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by processor 401 to perform the corresponding steps in the above embodiments; in particular implementations, one or more instructions in the computer storage medium may be loaded by processor 401 and executed to perform any step of the method in fig. 1 and/or fig. 2, which is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the division of the module is only one logical division, and other divisions may be possible in actual implementation, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, a hard disk, a magnetic tape, or a magnetic disk, an optical medium such as a Digital Versatile Disk (DVD), or a semiconductor medium such as a Solid State Disk (SSD).

Claims (10)

1. A hazardous chemical substance retrieval method, characterized by comprising the following steps:
acquiring hazardous chemical substance search content to be processed, and detecting whether the search content includes a chemical abstract registration number, wherein the chemical abstract registration number has a record in a preset database;
if not, matching an index stored in a preset hazardous chemical substance database against the search content according to a preset regular expression to obtain hazardous chemical substance information, wherein the hazardous chemical substance information carries a corresponding hazardous chemical substance identifier, and the matching comprises matching using a Chinese index and matching using an English index;
when there is one piece of hazardous chemical substance information, outputting the hazardous chemical substance information;
and when there are at least two pieces of hazardous chemical substance information, traversing the hazardous chemical substance identifiers of the at least two pieces to remove the hazardous chemical substance information with duplicate identifiers, obtaining non-duplicate hazardous chemical substance information, and outputting the non-duplicate hazardous chemical substance information.
2. The hazardous chemical substance retrieval method according to claim 1, wherein before the matching of the index stored in the preset hazardous chemical substance database against the search content according to a preset regular expression, the method further comprises:
acquiring pre-stored hazardous chemical substance information in the preset hazardous chemical substance database;
and decomposing the Chinese names and English names of the hazardous chemical substances in the pre-stored hazardous chemical substance information into words, generating an inverted table for the words, and storing the inverted table as an index in the preset hazardous chemical substance database.
3. The hazardous chemical substance retrieval method according to claim 2, wherein the method further comprises:
acquiring property information and/or use information corresponding to the hazardous chemical substance identifier, and determining key focus fields in the pre-stored hazardous chemical substance information according to the property information and/or use information;
and determining a regular expression for each key focus field according to the type and attribute of the key focus field, and constructing a regular expression library.
4. The hazardous chemical substance retrieval method according to claim 3, wherein the determining of a regular expression for each key focus field according to the type and attribute of the key focus field and the constructing of a regular expression library comprise:
according to the type and attribute of the key focus field, combining the field with at least one specific symbol that matches any character to form the matching items of a regular expression, ordering the matching items to form at least one regular expression, and integrating the regular expressions to establish the regular expression library.
5. The hazardous chemical substance retrieval method according to claim 3 or 4, wherein the method further comprises:
checking, according to the regular expression library, whether abnormal data exists in the pre-stored hazardous chemical substance information, and if so, correcting the abnormal data; the abnormal data comprises one or more of a missing item, a data item that does not conform to the defined field type, and a data item that does not conform to the defined field attribute.
6. The hazardous chemical substance retrieval method according to any one of claims 1 to 4, wherein the method further comprises:
when an add operation is performed on the hazardous chemical substance information, if an entry field conforms to a preset regular expression, associating the entry field with the preset regular expression;
and if the entry field does not conform to the regular expression, outputting entry-exception prompt information.
7. The hazardous chemical substance retrieval method according to any one of claims 1 to 4, wherein after the obtaining of the non-duplicate hazardous chemical substance information and before the outputting of the non-duplicate hazardous chemical substance information, the method further comprises:
when there are at least two pieces of non-duplicate hazardous chemical substance information, acquiring a preset query weight parameter, and obtaining a query score for each piece of non-duplicate hazardous chemical substance information according to the preset query weight parameter and that information;
the outputting of the non-duplicate hazardous chemical substance information comprises:
displaying the non-duplicate hazardous chemical substance information in descending order of query score.
8. A hazardous chemical substance retrieval device, characterized by comprising:
a detection module, configured to acquire hazardous chemical substance search content to be processed and detect whether the search content includes a chemical abstract registration number, wherein the chemical abstract registration number has a record in a preset database;
a matching module, configured to, when the search content does not include a chemical abstract registration number, match an index stored in a preset hazardous chemical substance database against the search content according to a preset regular expression to obtain hazardous chemical substance information, wherein the hazardous chemical substance information carries a corresponding hazardous chemical substance identifier, and the matching comprises matching using a Chinese index and matching using an English index;
an output module, configured to output the hazardous chemical substance information when there is one piece of hazardous chemical substance information;
a de-duplication module, configured to, when there are at least two pieces of hazardous chemical substance information, traverse the hazardous chemical substance identifiers of the at least two pieces to remove the hazardous chemical substance information with duplicate identifiers and obtain non-duplicate hazardous chemical substance information; the output module is further configured to output the non-duplicate hazardous chemical substance information.
9. An electronic device, characterized by comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the hazardous chemical substance retrieval method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a computer program is stored therein which, when executed by a processor, causes the processor to carry out the steps of the hazardous chemical substance retrieval method according to any one of claims 1 to 7.
CN202010686460.XA 2020-07-16 2020-07-16 Hazardous chemical substance retrieval method, hazardous chemical substance retrieval device, electronic equipment and medium Pending CN111949756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010686460.XA CN111949756A (en) 2020-07-16 2020-07-16 Hazardous chemical substance retrieval method, hazardous chemical substance retrieval device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN111949756A true CN111949756A (en) 2020-11-17

Family

ID=73340938

Country Status (1)

Country Link
CN (1) CN111949756A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination