WO2021076862A1 - Technologies for dynamically creating representations for regulations - Google Patents
Technologies for dynamically creating representations for regulations
- Publication number
- WO2021076862A1 (PCT/US2020/055936)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- regulation
- object model
- sentences
- processor
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/197—Version control
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present disclosure is directed to creating standardized object models for regulations. More particularly, the present disclosure is directed to platforms and technologies for analyzing ingested regulations, creating object models, and allowing versioning of regulations or object models for the regulations that indicate applicable topics and categories.
- a product is specified according to a protocol and a specification, where the protocol is a collection of compliance and/or voluntary performance testing requirements that a product must meet for a given customer to enter a given market, where a product specification or datasheet may describe the product, features thereof, brand claims, and/or other aspect. Both protocols and specifications may serve to describe differentiators of the product.
- a computer-implemented method of creating object models for regulations for a given market(s) may include: accessing, by a computer processor, a set of regulatory information corresponding to a regulation; segmenting, by the computer processor, the set of regulatory information into a set of structured texts and a set of metadata; generating, by the computer processor, an object model for the regulation, the object model comprising the set of structured texts and the set of metadata; performing, by the computer processor, a linguistic analysis on the object model to detect a set of sentences within the set of structured texts; generating, by the computer processor based on the set of sentences, a summary of the regulation; and enriching, by the computer processor, the object model for the regulation with the summary of the regulation.
- a system for dynamically creating object models for regulations may include: a memory storing instructions, and a processor interfaced with the memory.
- the processor may be configured to execute the instructions to cause the processor to: access a set of regulatory information corresponding to a regulation, segment the set of regulatory information into a set of structured texts and a set of metadata, generate an object model for the regulation, the object model comprising the set of structured texts and the set of metadata, perform a linguistic analysis on the object model to detect a set of sentences within the set of structured texts, generate, based on the set of sentences, a summary of the regulation, and enrich the object model for the regulation with the summary of the regulation.
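The claimed sequence of steps (access, segment, generate, detect sentences, summarize, enrich) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the dictionary layout of the object model, the "Key: value" metadata convention, and the leading-sentence summary heuristic are all hypothetical.

```python
import re

# Hypothetical input: leading "Key: value" lines are treated as metadata.
EXAMPLE = """Title: Lead in toys
Jurisdiction: US

Toys must not contain lead. Paint is limited to 90 ppm.

Importers shall keep test records.
"""


def segment(raw):
    """Split raw text into structured text blocks and a metadata dict."""
    metadata = {}
    lines = raw.strip().splitlines()
    # Assumption: metadata lines come first and look like "Key: value".
    while lines and re.match(r"\w[\w ]*:\s", lines[0]):
        key, value = lines.pop(0).split(":", 1)
        metadata[key.strip().lower()] = value.strip()
    body = "\n".join(lines)
    # Assumption: blank lines separate the remaining text blocks.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", body) if b.strip()]
    return blocks, metadata


def detect_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def summarize(sentences, max_sentences=1):
    """Placeholder summary: the leading sentence(s) of the regulation."""
    return " ".join(sentences[:max_sentences])


def build_object_model(raw):
    """Generate the object model, then enrich it with a summary."""
    texts, metadata = segment(raw)
    model = {"structured_texts": texts, "metadata": metadata}
    sentences = [s for t in model["structured_texts"] for s in detect_sentences(t)]
    model["summary"] = summarize(sentences)
    return model


model = build_object_model(EXAMPLE)
```

Keeping the summary as a field added after the model is created mirrors the "generate, then enrich" ordering of the method.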
- FIG. 1A depicts an overview of components and entities associated with the systems and methods, in accordance with some embodiments.
- FIG. 1B depicts an overview of certain components configured to facilitate the systems and methods, in accordance with some embodiments.
- FIG. 2 is an example flowchart depicting various functionalities associated with the systems and methods, in accordance with some embodiments.
- FIG. 3 is another example flowchart associated with creating object models for regulations, in accordance with some embodiments.
DETAILED DESCRIPTION
- the present embodiments may relate to, inter alia, platforms and technologies for dynamically analyzing regulations applicable to a plurality of products, or components, materials, chemicals, attributes or features that might be associated with a product, across a plurality of jurisdictions.
- systems and methods may receive or otherwise access regulations and segment them into a set of structured texts, where the structured texts may include a header, footer, title, body, sections, sub-sections, paragraphs, lists, sub-lists, citations, references, or any other type of information block indicated in the format of the original document.
- the systems and methods may further segment the regulations into a set of metadata containing additional information related to the content and subject matter of the underlying regulation.
- the systems and methods may further analyze the regulations to determine a set of sentences and, from the set of sentences, automatically generate a summary for each regulation.
- the systems and methods may additionally classify the set of sentences to determine a set of topics for each of the sentences, where each of the set of topics has an associated probability of being applicable to the underlying regulation.
- the systems and methods may make available those sentences having topics that meet a threshold probability, which enables users to gain additional insight into the regulation.
- the systems and methods may generate object models for the regulations and enhance the object models with additional data determined as a result of the various analyses. It should be appreciated that the systems and methods may operate using the English language or any other language, and/or may translate information from one language(s) into another language(s). The systems and methods therefore offer numerous benefits.
- the object models generated by systems and methods may be in a standard format which mitigates the various inconsistencies and complexities present in issued regulations governing products and their associated components, materials, chemicals, attributes, features, labelling and packaging.
- various entities or individuals may access the object models to effectively and efficiently ascertain relevant information about the regulations and/or products, components, materials, chemicals, attributes or features that may be affected by the regulations.
- the entities or individuals may determine, from the object models, a set of topics for and/or a set of terms that may be particularly relevant to the underlying regulations.
- the systems and methods may also store the object models in a central database for effective retrieval by the entities and individuals. It should be appreciated that additional benefits are envisioned.
- a change in a particular regulation of a product may drive a change in the design of the product. For example, if a new regulation restricts the content of a hazardous chemical, the producer may have to reformulate the product to ensure continued compliance with the new regulation.
- the systems and methods may utilize automated processes to access and manage certain product-related details and features, components, or materials (e.g., to link with product design and development stages of the product lifecycle).
- the systems and methods may be used by product designers to receive automatic alerts whenever a regulatory change may require a specific change to the design, testing, inspection, certification, labelling or packaging of the product, and may directly integrate with product development applications to automatically reflect applicable regulatory changes and propose or suggest protocol or specification/datasheet augmentations.
- the systems and methods discussed herein address a challenge particular to supply chain management.
- the challenge relates to a difficulty in accurately and effectively assessing how regulations should be interpreted or applied, and determining which regulations may be applicable to products or product categories before their introduction to market, especially because of inconsistencies and complexities between and among product protocols and regulations.
- individuals must manually review regulations to determine their scope and applicability to certain products.
- these conventional methods are often time consuming, ineffective, and/or expensive.
- regulations are not consistent in scope, terminology, and formatting across different jurisdictions. Further, regulations typically do not mention end-products, but rather the components, materials, chemicals, attributes or features affiliated with many types of products or product categories.
- the systems and methods offer capabilities to solve these problems by dynamically analyzing regulations to determine relevant attributes, generating object models for the regulations that are consistent in format and indicate the relevant product attributes, and enabling effective access to the object models. Further, because the systems and methods employ communication between and among multiple devices and components, the systems and methods are necessarily rooted in computer technology in order to overcome the noted shortcomings that specifically arise in the realm of supply chain management.
- FIG. 1A illustrates an overview of a system 100 of components configured to facilitate the systems and methods. It should be appreciated that the system 100 is merely an example and that alternative or additional components are envisioned.
- the system 100 may include a set of electronic devices 101, 102.
- Each of the electronic devices 101, 102 may be any type of electronic device such as a mobile device (e.g., a smartphone), desktop computer, notebook computer, tablet, phablet, GPS (Global Positioning System) or GPS-enabled device, smart watch, smart glasses, smart bracelet, wearable electronic, PDA (personal digital assistant), pager, computing device configured for wireless communication, and/or the like.
- any of the electronic devices 101, 102 may be an electronic device associated with an entity such as a company, business, corporation, or the like (e.g., a
- Each of the electronic devices 101, 102 may be used by any individual or person (generally, a user). According to embodiments, the user may use the respective electronic device 101, 102 to input various information associated with a product(s) and/or a regulation(s).
- the product(s) may be offered for sale or otherwise made available for purchase or use by a business, company, service provider, or the like, and may be regulated in an applicable jurisdiction by an applicable regulation(s). Alternatively or additionally, the business, company, service provider, or the like may be contemplating offering the product for sale or purchase in a certain jurisdiction(s).
- the information may represent an iteration, update, or new version of the product(s).
- the user may also use the electronic devices 101, 102 to input a query associated with a product and/or a regulation.
- the electronic devices 101, 102 may communicate with a server computer 115 via one or more networks 110.
- the server computer 115 may be associated with an entity such as a company, business, corporation, or the like, which accesses, aggregates, and analyzes existing and/or new regulations. Additionally or alternatively, the server computer 115 may be associated with an entity such as a company, business, corporation, or the like, which markets, manufactures, or sells products, or otherwise interfaces or communicates with entities that market, manufacture, or sell the products.
- the electronic devices 101, 102 may transmit or communicate, via the network(s) 110, information associated with products and/or regulations, or queries related thereto, to the server computer 115.
- the network(s) 110 may support any type of data communication via any standard or technology including various wide area network or local area network protocols (e.g., GSM, CDMA, VoIP, TDMA, WCDMA, LTE, EDGE, OFDM, GPRS, EV-DO, UWB, Internet, IEEE 802 including Ethernet, WiMAX, Wi-Fi, Bluetooth, and others). Further, in embodiments, the network(s) 110 may be any telecommunications network that may support a telephone call between the electronic devices 101, 102 and the server computer 115.
- the server computer 115 may communicate with one or more product-related data sources 117.
- the product-related data source(s) 117 may alternatively or additionally receive, access, and/or store various product information, including product components, materials, chemicals, attributes, features, intended use, labelling, packaging, or any data that might pertain to regulatory requirements.
- the product-related data source(s) 117 may be associated with businesses, companies, service providers, or the like, that may have an agreement, partnership, or contract with an entity associated with the server computer 115, and that offer or contemplate offering various products.
- the corresponding product-related data source 117 may push or otherwise send the new or updated product protocol or specification/datasheet to the server computer 115, or the server computer 115 may pull or retrieve the new or updated product information from the corresponding product-related data source 117. Accordingly, the server computer 115 may store the most up-to-date product information issued by the participating businesses, companies, service providers, or the like.
- the server computer 115 may additionally communicate with a regulation-related data source(s) 116.
- the regulation-related data source(s) 116 may be associated with various regulatory bodies or agencies that may set or institute regulations.
- the regulation-related data source(s) 116 may be associated with the U.S. Consumer Product Safety Commission (CPSC), the U.S. Environmental Protection Agency (EPA), the U.S. Federal Aviation Administration (FAA), the U.S. Federal Communications Commission (FCC), the U.S. Food and Drug Administration (FDA), the U.S. Federal Trade Commission (FTC), the U.S. National Highway Traffic Safety Administration (NHTSA), or the U.S. Nuclear Regulatory Commission (NRC).
- the regulatory bodies or agencies may be any combination of federal-level, state-level, municipal-level, local-level, foreign, digital, or other level of regulatory bodies or agencies.
- the corresponding regulation-related data source 116 may push or otherwise send the new or updated regulation to the server computer 115, or the server computer 115 may pull or retrieve the new or updated regulation from the corresponding regulation-related data source 116.
- the server computer 115 may store the most up-to-date regulations issued by the participating regulatory bodies or agencies.
- the server computer 115 may also store historic versions of the regulations, and may link or otherwise associate the historic versions with the respective current, up-to-date and/or consolidated versions of the regulations.
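One way to link historic versions of a regulation to its current version is a small record type that archives the outgoing text whenever an update arrives. The sketch below is illustrative only; the class names, fields, and integer version numbering are assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class RegulationVersion:
    version: int
    text: str


@dataclass
class RegulationRecord:
    """Current version of a regulation plus its linked historic versions."""
    regulation_id: str
    current: RegulationVersion
    history: list = field(default_factory=list)

    def update(self, new_text):
        """Store new_text as the current version; return True if it changed."""
        if new_text == self.current.text:
            return False  # nothing new was issued, keep the history as-is
        self.history.append(self.current)  # archive the outgoing version
        self.current = RegulationVersion(self.current.version + 1, new_text)
        return True


record = RegulationRecord("reg-1", RegulationVersion(1, "Lead limit: 100 ppm"))
changed = record.update("Lead limit: 90 ppm")
```

The boolean returned by `update` is a natural hook for downstream behavior such as alerting users to a regulatory change.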
- the server computer 115 may employ various machine learning techniques, calculations, algorithms, and the like to generate and maintain a machine learning model associated with regulations and/or products.
- the server computer 115 may initially train the machine learning model(s) using a set of training data, or may not initially train the machine learning model(s).
- the server computer 115 may analyze any information received from the regulation-related data source(s) 116, for example using the machine learning model, to analyze regulation information, generate information resulting therefrom, and generate corresponding object models.
- the server computer 115 may make the result(s) of the analysis available (e.g., by presenting the result(s) in a user interface) for review and further selection by a user of the server computer 115.
- the server computer 115 may be configured to interface with or support a memory or storage 113 capable of storing various data, such as in one or more databases or other forms of storage.
- the storage 113 may store data or information associated with any machine learning models and/or object models that are generated by the server computer 115, any regulation information received from the regulation-related data sources 116, or any product information received from the electronic devices 101, 102 and/or from the product-related data source(s) 117.
- the server computer 115 may be in the form of a distributed cluster of computers, servers, machines, or the like.
- the entity may utilize the distributed server computer(s) 115 as part of an on-demand cloud computing platform. Accordingly, when the electronic devices 101, 102 and the data sources 116, 117 interface with the server computer 115, the electronic devices 101, 102 and the data sources 116, 117 may actually interface with one or more of a number of distributed computers, servers, machines, or the like, to facilitate the described functionalities.
- Although two (2) electronic devices 101, 102 and one (1) server computer 115 are depicted in FIG. 1A, it should be appreciated that greater or fewer amounts are envisioned. For example, there may be multiple server computers, each one associated with a different entity.
- FIG. 1B depicts more specific components associated with the systems and methods.
- FIG. 1B depicts an example environment 150 in which regulation data 151 is processed into a set of regulation object models 152 via a regulation analyzer platform 155, according to embodiments.
- the regulation analyzer platform 155 may be implemented on any computing device, including the server computer 115 (or in some implementations, one or more of the electronic devices 101, 102) as discussed with respect to FIG. 1A.
- Components of the computing device may include, but are not limited to, a processing unit (e.g., processor(s) 156), a system memory (e.g., memory 157), and a system bus 158 that couples various system components including the memory 157 to the processor(s) 156.
- the processor(s) 156 may include one or more parallel processing units capable of processing data in parallel with one another.
- the system bus 158 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, or a local bus, and may use any suitable bus architecture.
- bus architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus (also known as Mezzanine bus).
- the regulation analyzer platform 155 may further include a user interface 153 configured to present content (e.g., information associated with regulations and/or object models generated therefrom). Additionally, a user may make selections to the content via the user interface 153, such as to navigate through different information, select and review certain object models and information related thereto, and/or other actions.
- the user interface 153 may be embodied as part of a touchscreen configured to sense touch interactions and gestures by the user.
- other system components communicatively coupled to the system bus 158 may include input devices such as a cursor control device (e.g., a mouse, trackball, touch pad, etc.) and keyboard (not shown).
- a monitor or other type of display device may also be connected to the system bus 158 via an interface, such as a video interface.
- computers may also include other peripheral output devices such as a printer, which may be connected through an output peripheral interface (not shown).
- the memory 157 may include a variety of computer-readable media.
- Computer-readable media may be any available media that can be accessed by the computing device and may include both volatile and nonvolatile media, and both removable and non-removable media.
- Computer-readable media may comprise computer storage media, which may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, routines, applications (e.g., a regulation analyzer application 160), data structures, program modules or other data.
- Computer storage media may include, but is not limited to, RAM, ROM, EEPROM, FLASH memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the processor 156 of the computing device.
- the regulation analyzer platform 155 may operate in a networked environment and communicate with one or more remote platforms, such as a remote platform 165, via a network(s) 162, such as a local area network (LAN), a wide area network (WAN), telecommunications network, or other suitable network.
- the platform 165 may be implemented on any computing device, including one or more of the electronic devices 101, 102, or the server computer 115 as discussed with respect to FIG. 1A, and may include many or all of the elements described above with respect to the platform 155.
- the regulation analyzer application 160 as will be further described herein may be stored and executed by the remote platform 165 instead of by or in addition to the platform 155.
- the regulation analyzer platform 155 may store, as regulation and product data 164, any information associated with product protocols and regulations, such as the received regulation data 151. Additionally, the regulation analyzer application 160 may employ machine learning techniques such as, for example, a regression analysis (e.g., a logistic regression, linear regression, or polynomial regression), k-nearest neighbors, decision trees, random forests, boosting, neural networks, support vector machines, deep learning, reinforcement learning, latent semantic analysis, Bayesian networks, graph analysis, word embeddings, or the like. Generally, the regulation analyzer platform 155 may support various supervised and/or unsupervised machine learning techniques.
- the regulation analyzer application 160 may analyze the regulation data 151 and may generate resulting object model data 163 which may be stored in the memory 157.
- the regulation analyzer application 160 may enrich the object model data 163 with information generated from analyzing the regulation data 151, using various of the techniques as discussed herein.
- the regulation analyzer application 160 may output the set of regulation object models 152 which may contain the ingested regulation(s) and metadata, along with extracted information, topical classifications, and summaries produced by the regulation analyzer application 160.
- the regulation analyzer application 160 may cause the regulation object model(s) 152 (and, in some cases, the originally-received data 151) to be displayed on the user interface 153 for review by the user of the regulation analyzer platform 155. The user may select to review and/or modify the displayed data.
- FIG. 2 details functionalities associated with the analysis of the regulation data 151 and the generation of the set of regulation object models 152.
- a computer program product in accordance with an embodiment may include a computer usable storage medium (e.g., standard random access memory (RAM), an optical disc, a universal serial bus (USB) drive, a big data processing engine, a NoSQL repository, or the like) having computer-readable program code embodied therein, wherein the computer-readable program code may be adapted to be executed by the processor 156 (e.g., working in connection with an operating system) to facilitate the functions as described herein.
- the program code may be implemented in any desired language, and may be implemented as machine code, assembly code, byte code, interpretable source code or the like (e.g., via Golang, Python, Scala, C, C++, Java, Actionscript, Objective-C, Javascript, CSS, XML, JSON).
- the computer program product may be part of a cloud network of resources.
- each of the data 151 and the data 152 may be embodied as any type of electronic document, file, template, object model, etc., that may include various textual content, images, figures, tables, footnotes, citations, appendices, or other referenced materials, and may be stored in memory as program data in a hard disk drive, magnetic disk and/or optical disk drive in the regulation analyzer platform 155 and/or the remote platform 165.
- the set of regulation object models 152 may be stored in JavaScript Object Notation (JSON) format.
- FIG. 2 is an example flowchart depicting various functionalities associated with the systems and methods.
- the functionalities may be facilitated or performed by a server computer (e.g., the server computer 115 as discussed with respect to FIG. 1A).
- Reference 200 of FIG. 2 represents a corpus or set of data that may be extracted from a web site or another source (such as the regulation-related data source(s) 116 as discussed with respect to FIG. 1A).
- the set of data 200 may include a set of electronic documents respectively corresponding to a set of regulations, where each electronic document may have a defined structure, format (e.g., HTML, PDF, XML, JSON, etc.), and/or the like.
- the set of data 200 may be retrieved from a source via an application programming interface (API), or via one or more other data sources (e.g., web crawls, RSS, etc.).
- the set of data 200 may be annotated or labeled (e.g., may include metadata indicating topics corresponding to the regulation), or may be unlabeled.
- the server computer may analyze or examine (201) the data 200 to segment or parse the data 200 into various components or sections.
- the server computer may identify, extract, organize, or segment the various sections of the electronic document into a set of structured texts including, for example, a title, body or paragraph(s), section or sub-sections, itemized lists or sub-lists, citations, references, header, footer, and/or the like.
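For an HTML source document, the segmentation into structured texts might look like the following sketch, which uses Python's standard `html.parser` module. The mapping from tags to block types and the name `segment_html` are assumptions made for illustration; nested block elements and other formats (PDF, XML) are not handled.

```python
from html.parser import HTMLParser


class StructuredTextParser(HTMLParser):
    """Collect (block_type, text) pairs for a few common HTML tags."""

    # Hypothetical mapping from HTML tags to structured-text block types.
    BLOCK_TYPES = {
        "title": "title",
        "h1": "section",
        "h2": "sub-section",
        "p": "paragraph",
        "li": "list-item",
    }

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None  # block type currently being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TYPES:
            self._current = self.BLOCK_TYPES[tag]
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and self.BLOCK_TYPES.get(tag) == self._current:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append((self._current, text))
            self._current = None


def segment_html(document):
    """Return the structured text blocks found in an HTML document."""
    parser = StructuredTextParser()
    parser.feed(document)
    parser.close()
    return parser.blocks
```

For example, `segment_html("<h1>Scope</h1><p>Toys must not contain lead.</p>")` yields one `section` block followed by one `paragraph` block.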
- the server computer may identify, extract, or segment various metadata associated with the electronic document.
- the metadata may include a set of tags or topic labels that are included with the electronic document and that may be descriptive of the regulation described in the electronic document.
- the server computer may automatically generate (e.g., using a data model) a set of topics or tags for the electronic document, based on the content of the electronic document, such as if the electronic document does not have such topics or tags included.
- the server computer may alternatively or additionally generate the set of topics or tags for the electronic document based on additional information from internal data sources that may be descriptive of product testing, inspection, and certification requirements, or of the engineering of products and their affiliated components, materials, chemicals, attributes or features.
- the server computer may, for each electronic document, generate a regulation object model (202) which, according to embodiments, may be a structured data object having a consistent format and structure across object models. Subsequent processing of the regulation object models and respective electronic documents may enrich the regulation object models with additional data, as discussed with respect to 203, 204, 205, and 206.
- the regulation object model may be in various formats such as, for example, JSON, XML or RDF.
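As an illustration of the JSON option, the snippet below serializes a small object model and reads it back with Python's standard `json` module. The field names (`metadata`, `structured_texts`, `type`, `text`) are hypothetical; the disclosure does not specify a schema.

```python
import json

# Hypothetical object model layout, for illustration only.
object_model = {
    "metadata": {"title": "Example regulation", "jurisdiction": "US"},
    "structured_texts": [
        {"type": "title", "text": "Example regulation"},
        {"type": "paragraph", "text": "Toys must not contain lead."},
    ],
}

# sort_keys gives a stable key order, which keeps stored models easy to diff.
serialized = json.dumps(object_model, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Using the same keys for every regulation is what would give the object models a consistent format and structure.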
- the server computer may tokenize and detect sentences included in the electronic document.
- the server computer may detect words and phrases in the electronic documents, represented as n-grams, and may identify sentences or phrases as a consecutive string of n-grams.
- the n-grams, phrases, and sentences as identified or determined by the server computer are represented as 204.
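Sentence detection, tokenization, and n-gram generation can be sketched with regular expressions as shown here. The splitting rules are deliberately naive assumptions (for instance, an abbreviation such as "e.g." would be split incorrectly), and the function names are illustrative.

```python
import re


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase word tokens; internal hyphens are kept (e.g. 'sub-list')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "Toys must not contain lead. Paint is limited to 90 ppm."
sentences = split_sentences(text)
bigrams = ngrams(tokenize(sentences[0]), 2)
```

A phrase can then be treated as a consecutive run of n-grams, for example by joining overlapping bigrams.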
- the server computer may generate a summary for the electronic document.
- the server computer may generate the summary using a relevant or representative sentence(s) from the electronic document, or generate a summary based upon a learned representation of regulatory text, where the summary may be an abstractive summarization.
- the server computer may generate the summary of the regulation, or regulation section, using a machine learning algorithm that yields a summarization model from a corpus of regulatory training data.
- the server computer may use various techniques to rank the set of sentences according to a relevancy of the set of sentences to the regulation, extract or identify a portion of the set of sentences that are deemed most relevant to the requirements of a regulation, and generate the summary using the portion of the set of sentences.
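The rank-then-extract approach can be illustrated with a simple frequency-based scorer. This is one possible ranking technique chosen for the sketch, not the method of the disclosure; the stopword list and the scoring formula (average document frequency of a sentence's words) are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "are",
             "be", "must", "shall", "not", "for"}


def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(sentences, k=1):
    """Rank sentences by average word frequency and keep the top k."""
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order for readability
    return " ".join(sentences[i] for i in chosen)


sentences = [
    "Lead content in toys is restricted.",
    "Toys containing lead paint are prohibited.",
    "Records are kept for five years.",
]
summary = extractive_summary(sentences, k=1)
```

Sentences that repeat the document's most frequent terms score highest, which is a rough proxy for relevance to the regulation's requirements.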
- the server computer may use trained language models, cosine-similarity, natural language processing, or any other sentence or word similarity measure to generate summary content from any portion of the regulation object model.
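Of the similarity measures mentioned, cosine similarity is the simplest to show concretely. The sketch below compares two texts using bag-of-words count vectors; whitespace tokenization is an assumption, and a real system might compare learned embeddings instead.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


similar = cosine_similarity("lead in toys", "lead in paint")
unrelated = cosine_similarity("lead in toys", "food additives")
```

A score near 1 means the two texts share most of their vocabulary, and a score of 0 means they share none.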
- the server computer may add the summary content (represented as 206) to the regulation object model, thus enriching the regulation object model.
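One possible reading of the extractive variant is sketched below: each sentence is scored by its cosine similarity to a bag-of-words vector of the whole document, and the top-ranked sentences form the summary. This is only one of the similarity measures the text allows, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def extractive_summary(sentences: list[str], k: int = 1) -> list[str]:
    # Rank sentences by similarity to the document as a whole.
    doc_vector = bag_of_words(" ".join(sentences))
    ranked = sorted(sentences,
                    key=lambda s: cosine(bag_of_words(s), doc_vector),
                    reverse=True)
    top = set(ranked[:k])
    return [s for s in sentences if s in top]  # keep original document order


example = [
    "Surface coatings on toys shall not contain lead above the stated limit.",
    "This section may be cited by its short title.",
    "Lead limits also apply to surface coatings on furniture.",
]
print(extractive_summary(example, k=2))
```

An abstractive summary, as also mentioned above, would require a trained generation model and is outside the scope of this sketch.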
- the server computer may train, update, modify, or enhance the summarization model(s) (205) using various portions of the corresponding regulation object model, including added topical information (212) and/or extracted entities (215), each of which may have an assigned probability of being applicable to the underlying regulation.
- a classifier artifact may update one or more of a topic prediction model and an entity extraction model, the output of which may feed back (with or without SME/analyst review) into the object model for further training.
- the server computer may apply the classifier model to a portion of text in the electronic document and calculate a probability of applicability of regulatory topics for that sentence.
- Block 210 represents sentences in the electronic document that are classified using the classifier model, and block 211 represents the sentences being filtered based on the probability of applicability of the regulatory topics.
- the server computer may output a list of sentences in the electronic document where each sentence in the list of sentences lists at least one particular regulatory topic having a probability that at least meets a specified threshold.
- a user may query for and/or the server may output sentences in a particular regulation having the predicted regulatory topics “plastics” and “food additive” with at least an 85% probability of applicability, thereby meeting a set threshold for adding a topical assignment to that sentence in the object model.
- the server computer may enable an understanding of what regulatory topics are being discussed in various sections of the electronic document (block 212), which the enriched regulation object model may additionally reflect.
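The threshold-based filtering of blocks 210 and 211 can be sketched as follows. The per-sentence probabilities are invented for illustration, and `filter_by_topics` is a hypothetical helper; a real system would obtain the probabilities from the trained classifier model.

```python
def filter_by_topics(predictions: dict, required_topics: list, threshold: float = 0.85) -> list:
    """Return sentences whose probability for every required topic meets the threshold."""
    matches = []
    for sentence, topic_probs in predictions.items():
        if all(topic_probs.get(t, 0.0) >= threshold for t in required_topics):
            matches.append(sentence)
    return matches


# sentence -> {topic label: probability}; all values are made up for this example
predictions = {
    "Plastic articles in contact with food may contain only listed additives.":
        {"plastics": 0.93, "food additive": 0.88},
    "This regulation enters into force on the stated date.":
        {"plastics": 0.10, "food additive": 0.05},
    "Plastic packaging must be labelled.":
        {"plastics": 0.91, "food additive": 0.40},
}

hits = filter_by_topics(predictions, ["plastics", "food additive"])
print(hits)
```

Only the first sentence satisfies both topics at the 85% threshold used in the example above; lowering `threshold` or dropping a topic widens the result.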
- the server computer may also extract a set of recognized entities as key terms or phrases from the regulatory text(s).
- a named-entity recognition (NER) artifact may locate and classify named entities mentioned in unstructured text into pre-defined categories.
- the server computer can annotate the text (block 213).
- the server computer may identify key terms and phrases based on blocks of texts having a topic label that exceeds a prediction threshold according to an entity recognition model trained to identify key terms and phrases associated with the predicted regulatory topics (block 214). This results in a set of keywords and/or phrases (block 215) of the text that are meaningful or significant to the electronic document, and/or to the regulation itself.
- the server computer may employ additional or alternative topic-salience techniques or algorithms for extracting a set of keywords or phrases.
- the server computer may add the keywords and/or phrases, along with predicted entity names, to the regulation object model represented by block 202.
- the server computer may output or make available a profile of the underlying regulation for storage in a regulations database 216.
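To make the annotation step (block 213) concrete, the sketch below uses a small dictionary lookup as a stand-in for the trained entity recognition model. This is a swapped-in, much simpler technique; the `GAZETTEER` contents and category names are assumptions.

```python
import re

# Hypothetical term -> category mapping standing in for a trained NER model.
GAZETTEER = {
    "lead": "CHEMICAL",
    "surface coating": "PRODUCT_FEATURE",
    "toys": "PRODUCT_TYPE",
}


def annotate(text: str) -> list[dict]:
    """Locate known terms in the text and label each occurrence with its category."""
    annotations = []
    for term, category in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append({"text": m.group(0), "label": category,
                                "start": m.start(), "end": m.end()})
    return sorted(annotations, key=lambda a: a["start"])


text = "Lead in the surface coating of toys is restricted."
found = annotate(text)
for item in found:
    print(item)
```

The character offsets allow the annotations to be stored alongside the original text in the object model without altering it.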
- the server computer may identify regulatory structure, segment and parse the document, assign topic labels, keywords and/or phrases, and add any or all of this information to the regulation object model for that particular regulation.
- the database 216 may store multiple regulation object models representing multiple regulations, where the regulation object models may be consistent in format and structure, and may cover multiple jurisdictions.
- the regulations database 216 may store previous (i.e., historical) and/or consolidated versions of various regulations, where any access of the previous versions may indicate any differences between the previous versions and the current versions of the various regulations.
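A minimal sketch of how differences between a previous and a current version could be surfaced, using Python's standard `difflib`. The two regulation texts and their numeric limits are invented placeholders, and `diff_versions` is a hypothetical helper.

```python
import difflib


def diff_versions(previous: str, current: str) -> list[str]:
    """Return unified-diff lines describing how the current version differs."""
    return list(difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="previous", tofile="current", lineterm=""))


# Invented example texts; the limits are placeholders, not real regulatory values.
previous = "Lead content shall not exceed 600 ppm.\nApplies to toys."
current = "Lead content shall not exceed 90 ppm.\nApplies to toys."

changes = diff_versions(previous, current)
print("\n".join(changes))
```

Lines prefixed with `-` appear only in the previous version and lines prefixed with `+` only in the current one.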
- a regulation in a particular jurisdiction may be issued that regulates the amount of lead in surface coating of select product categories.
- the server computer may determine an influential or prominent sentence(s) based on frequency statistics over the entire corpus (e.g., sentence(s) that identify relevant chemical restrictions and covered product types), and may generate a summary using the determined sentence(s).
- the server computer may apply a classifier model to determine, from the summary, the following set of topic labels: lead, coating, surface coating, hazardous, toys, children, lead-containing, paint, furniture, consumer product, along with respective probabilities of applicability of the topic labels.
- FIG. 3 depicts a block diagram of an example method 300 for creating object models for regulations.
- the method 300 may be facilitated by an electronic device (such as the server computer 115 or components associated with the regulation analyzer platform 115 as discussed with respect to FIGs. 1A and 1B) that may be in communication with additional devices and/or data sources.
- the method 300 may begin when the electronic device accesses (block 305) a set of regulatory information corresponding to a regulation.
- the electronic device may access the set of regulatory information from one or more data sources.
- the electronic device may segment (block 310) the set of regulatory information into a set of structured texts that may include, for example, a header, footer, title, body, sections, sub-sections, paragraphs, lists, sub-lists, citations, references, or any other type of information block indicated in the format of the original document, and a set of metadata that may contain additional information related to the content and subject matter of the regulation.
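The segmentation of block 310 might look like the following sketch for a plain-text source. The heading pattern ("Section N" or "Article N") and the output keys are assumptions, since actual regulations vary widely in layout.

```python
import re

# Assumed heading convention; real documents would need format-specific rules.
SECTION_HEADING = re.compile(r"^(Section|Article)\s+(\d+)\b[.:]?\s*(.*)$", re.IGNORECASE)


def segment(raw: str) -> dict:
    """Split raw text into a title, a preamble, and numbered sections."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    result = {"title": lines[0] if lines else "", "preamble": [], "sections": []}
    for line in lines[1:]:
        m = SECTION_HEADING.match(line)
        if m:
            result["sections"].append(
                {"number": m.group(2), "heading": m.group(3), "paragraphs": []})
        elif result["sections"]:
            result["sections"][-1]["paragraphs"].append(line)
        else:
            result["preamble"].append(line)  # text before the first heading
    return result


raw = """Example Coatings Regulation
Section 1. Scope
This regulation applies to toys.
Section 2. Limits
Surface coatings shall not contain lead above the stated limit.
"""
doc = segment(raw)
print(doc)
```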
- the set of metadata may include a set of topic labels that is originally included in the set of regulatory information.
- the electronic device may examine the set of regulatory information to determine that a set of topic labels is not present, generate a set of applicable topic labels for the set of regulatory information, and store the set of applicable topic labels with the set of metadata.
- the electronic device may generate (block 315) an object model for the regulation, where the object model may include the set of structured texts and the set of metadata.
- the object model may be in various formats such as, for example, JSON, XML, RDF, or other formats.
- the electronic device may perform (block 320) a linguistic analysis on the object model to detect a set of sentences within the set of structured texts.
- the electronic device may use one or more various linguistic and/or statistical analyses to generate a set of token n-grams from the set of sentences, or conversely may detect the set of sentences from the set of n-grams.
- the electronic device may generate (block 325), based on the set of sentences, a summary of the regulation and/or its segments.
- the electronic device may rank the set of sentences within the text and, based on the ranking of the set of sentences, extract a portion of the set of sentences. Further, the electronic device may generate an abstract summary using a portion of the parsed text.
- the electronic device may train (block 330), using the object model and generated summary, a classification and/or an entity recognition model.
- the electronic device may further determine (block 335), using the classification model, a set of topics for any portion of text within the object model. Additionally, the electronic device may determine (block 340), for each topic in the set of topics, a probability that the topic is applicable to the regulation.
- the electronic device may output at least a portion of the regulation object model, where the data in the portion has a probability that at least meets a specified threshold. It should be appreciated that the specified threshold may be a default value and/or configurable.
- the electronic device may enrich (block 345) the object model for the regulation with the summary of the regulation, the set of topics, and the set of extracted attributes, for each selected portion of the object model. Accordingly, with the enrichment and storage of multiple object models corresponding to multiple regulations, the method may ensure a consistent format and structure for the regulation object models across multiple jurisdictions.
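The enrichment of block 345 can be sketched as a function that attaches the summary, the topics that meet a threshold, and the extracted attributes to the object model. The `enrich` helper, the dictionary keys, and the default threshold of 0.5 are assumptions; the text only states that the threshold may be a default value and/or configurable.

```python
def enrich(object_model: dict, summary: str, topic_probs: dict,
           entities: list, threshold: float = 0.5) -> dict:
    """Return a copy of the object model with summary, topics, and entities added."""
    enriched = dict(object_model)  # shallow copy so the input is left untouched
    enriched["summary"] = summary
    # Keep only topics whose probability of applicability meets the threshold.
    enriched["topics"] = {t: p for t, p in topic_probs.items() if p >= threshold}
    enriched["entities"] = sorted(set(entities))  # de-duplicate extracted attributes
    return enriched


base = {"regulation_id": "example-001", "structured_texts": {}}
enriched = enrich(
    base,
    summary="Limits lead in surface coatings.",
    topic_probs={"lead": 0.9, "furniture": 0.2},  # invented probabilities
    entities=["lead", "toys", "lead"],
)
print(enriched)
```

Applying the same function to every regulation keeps the enriched object models uniform in structure regardless of jurisdiction.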
- routines, subroutines, applications, or instructions may constitute either software (e.g., code embodied on a non-transitory, machine-readable medium) or hardware.
- routines, etc. are tangible units capable of performing certain operations and may be configured or arranged in a certain manner.
- one or more computer systems (e.g., a standalone, client or server computer system)
- one or more hardware modules of a computer system (e.g., a processor or a group of processors)
- software (e.g., an application or application portion)
- a hardware module may be implemented mechanically or electronically.
- a hardware module may comprise dedicated circuitry or logic that may be permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations.
- a hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that may be temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- hardware modules are temporarily configured (e.g., programmed)
- each of the hardware modules need not be configured or instantiated at any one instance in time.
- the hardware modules comprise a general-purpose processor configured using software
- the general-purpose processor may be configured as respective different hardware modules at different times.
- Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules may provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it may be communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and may operate on a resource (e.g., a collection of information).
- processors may be temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
- the modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
- the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment.
- the appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- the terms “comprises,” “comprising,” “may include,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion.
- a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
- “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Document Processing Apparatus (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20877451.3A EP4046119A4 (en) | 2019-10-18 | 2020-10-16 | Technologies for dynamically creating representations for regulations |
| AU2020366040A AU2020366040A1 (en) | 2019-10-18 | 2020-10-16 | Technologies for dynamically creating representations for regulations |
| JP2022523230A JP7609859B2 (ja) | 2019-10-18 | 2020-10-16 | Technologies for dynamically creating representations for regulations |
| IL291724A IL291724A (en) | 2019-10-18 | 2022-03-27 | Technologies for the dynamic creation of presentations for regulations |
| JP2024220658A JP7834837B2 (ja) | 2019-10-18 | 2024-12-17 | Technologies for dynamically creating representations for regulations |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962923306P | 2019-10-18 | 2019-10-18 | |
| US62/923,306 | 2019-10-18 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021076862A1 (en) | 2021-04-22 |
Family
ID=75491187
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/055936 Ceased WO2021076862A1 (en) | 2019-10-18 | 2020-10-16 | Technologies for dynamically creating representations for regulations |
Country Status (6)
| Country | Link |
|---|---|
| US (3) | US11783132B2 |
| EP (1) | EP4046119A4 |
| JP (2) | JP7609859B2 |
| AU (1) | AU2020366040A1 |
| IL (1) | IL291724A |
| WO (1) | WO2021076862A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113947062A (zh) * | 2021-10-28 | 2022-01-18 | 上海纵波科技有限公司 | Method, apparatus, device, and medium for online conversion of PPT into HTML documents |
Families Citing this family (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11763321B2 (en) | 2018-09-07 | 2023-09-19 | Moore And Gasperecz Global, Inc. | Systems and methods for extracting requirements from regulatory content |
| AU2020366040A1 (en) * | 2019-10-18 | 2022-04-21 | Ul Llc | Technologies for dynamically creating representations for regulations |
| US12293375B2 (en) * | 2019-11-08 | 2025-05-06 | Ul Llc | Technologies for using machine learning to determine product certification eligibility |
| US11625535B1 (en) * | 2019-12-05 | 2023-04-11 | American Express Travel Related Services Company, Inc. | Computer-based systems having data structures configured to execute SIC4/SIC8 machine learning embedded classification of entities and methods of use thereof |
| US10956673B1 (en) | 2020-09-10 | 2021-03-23 | Moore & Gasperecz Global Inc. | Method and system for identifying citations within regulatory content |
| US20220147814A1 (en) * | 2020-11-09 | 2022-05-12 | Moore & Gasperecz Global Inc. | Task specific processing of regulatory content |
| US11314922B1 (en) * | 2020-11-27 | 2022-04-26 | Moore & Gasperecz Global Inc. | System and method for generating regulatory content requirement descriptions |
| US20220366432A1 (en) * | 2021-05-12 | 2022-11-17 | FoodChain ID Group, Inc. | Automated product compliance analysis |
| CN113221538B (zh) * | 2021-05-19 | 2023-09-19 | 北京百度网讯科技有限公司 | Event library construction method and apparatus, electronic device, and computer-readable medium |
| US12561629B2 (en) * | 2022-01-27 | 2026-02-24 | International Business Machines Corporation | Identifying regulatory data corresponding to executable rules |
| US11823477B1 (en) | 2022-08-30 | 2023-11-21 | Moore And Gasperecz Global, Inc. | Method and system for extracting data from tables within regulatory content |
| US11783112B1 (en) * | 2022-09-30 | 2023-10-10 | Intuit, Inc. | Framework agnostic summarization of multi-channel communication |
| US20250077775A1 (en) * | 2023-08-29 | 2025-03-06 | Adobe Inc. | Utilizing machine learning models to generate aspect-based transcript summaries |
| US12067343B1 (en) * | 2023-11-30 | 2024-08-20 | Munich Reinsurance America, Inc. | Computing technologies for web forms |
| WO2026028435A1 (ja) * | 2024-08-02 | 2026-02-05 | 株式会社Nttドコモ | Text analysis device |
| US12579594B1 (en) * | 2024-11-22 | 2026-03-17 | BH Operations, LLC | Systems and methods for dynamically updating data for course generation |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040139053A1 (en) * | 2002-01-04 | 2004-07-15 | Haunschild Gregory D. | Online regulatory compliance system and method for facilitating compliance |
| US20130024388A1 (en) * | 2011-07-22 | 2013-01-24 | Kolb Kurt G | Computer-Implemented System And Method For Modeling Contractual Terms As Structured Data For License Compliance Analysis |
| US20130262484A1 (en) * | 2012-04-03 | 2013-10-03 | Bureau Veritas | Method and system for managing product regulations and standards |
| US20150347390A1 (en) * | 2014-05-30 | 2015-12-03 | Vavni, Inc. | Compliance Standards Metadata Generation |
| US9292623B2 (en) * | 2004-09-15 | 2016-03-22 | Graematter, Inc. | System and method for regulatory intelligence |
| US20180330455A1 (en) | 2017-05-13 | 2018-11-15 | Regology, Inc. | Method and system for facilitating implementation of regulations by organizations |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090125283A1 (en) | 2007-09-26 | 2009-05-14 | David Conover | Method and apparatus for automatically determining compliance with building regulations |
| US8204869B2 (en) * | 2008-09-30 | 2012-06-19 | International Business Machines Corporation | Method and apparatus to define and justify policy requirements using a legal reference library |
| WO2011072172A1 (en) * | 2009-12-09 | 2011-06-16 | Renew Data Corp. | System and method for quickly determining a subset of irrelevant data from large data content |
| US10467717B2 (en) * | 2015-10-07 | 2019-11-05 | International Business Machines Corporation | Automatic update detection for regulation compliance |
| US20200151392A1 (en) * | 2015-10-28 | 2020-05-14 | Qomplx, Inc. | System and method automated analysis of legal documents within and across specific fields |
| US10726343B2 (en) * | 2016-11-09 | 2020-07-28 | Cognitive Scale, Inc. | Performing compliance operations using cognitive blockchains |
| US11687827B2 (en) * | 2018-10-04 | 2023-06-27 | Accenture Global Solutions Limited | Artificial intelligence (AI)-based regulatory data processing system |
| US11061959B2 (en) * | 2019-08-02 | 2021-07-13 | Raj Kumar Gulati | Methods and systems for regulatory intelligence |
| AU2020366040A1 (en) * | 2019-10-18 | 2022-04-21 | Ul Llc | Technologies for dynamically creating representations for regulations |
2020
- 2020-10-16 AU AU2020366040A patent/AU2020366040A1/en active Pending
- 2020-10-16 WO PCT/US2020/055936 patent/WO2021076862A1/en not_active Ceased
- 2020-10-16 EP EP20877451.3A patent/EP4046119A4/en active Pending
- 2020-10-16 JP JP2022523230A patent/JP7609859B2/ja active Active
- 2020-10-16 US US17/072,319 patent/US11783132B2/en active Active
2022
- 2022-03-27 IL IL291724A patent/IL291724A/en unknown
2023
- 2023-09-27 US US18/373,671 patent/US12314669B2/en active Active
2024
- 2024-12-17 JP JP2024220658A patent/JP7834837B2/ja active Active
2025
- 2025-05-12 US US19/205,573 patent/US20250272493A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4046119A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| US12314669B2 (en) | 2025-05-27 |
| EP4046119A1 (en) | 2022-08-24 |
| JP2025060711A (ja) | 2025-04-10 |
| JP7834837B2 (ja) | 2026-03-24 |
| US20250272493A1 (en) | 2025-08-28 |
| IL291724A (en) | 2022-05-01 |
| EP4046119A4 (en) | 2023-11-01 |
| US20210117621A1 (en) | 2021-04-22 |
| JP2022552421A (ja) | 2022-12-15 |
| JP7609859B2 (ja) | 2025-01-07 |
| US20240020480A1 (en) | 2024-01-18 |
| US11783132B2 (en) | 2023-10-10 |
| AU2020366040A1 (en) | 2022-04-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12314669B2 (en) | Technologies for dynamically creating representations for regulations | |
| US12455928B1 (en) | Models for classifying documents | |
| US12572735B2 (en) | Domain-specific document validation | |
| Gu et al. | "What parts of your apps are loved by users?" (T) | |
| US8423568B2 (en) | Query classification using implicit labels | |
| JP5160601B2 (ja) | System, method, and apparatus for phrase mining based on relative frequency | |
| US9519686B2 (en) | Confidence ranking of answers based on temporal semantics | |
| US10002371B1 (en) | System, method, and computer program product for searching summaries of online reviews of products | |
| US20150193482A1 (en) | Topic sentiment identification and analysis | |
| US20180239815A1 (en) | Method and system for sentiment analysis of information | |
| US9760828B2 (en) | Utilizing temporal indicators to weight semantic values | |
| US11226946B2 (en) | Systems and methods for automatically determining a performance index | |
| Sharma et al. | NIRMAL: Automatic identification of software relevant tweets leveraging language model | |
| EP4463778A1 (en) | System, method, and computer program product for tokenizing document citations | |
| CN110134844A (zh) | Method, apparatus, computer device, and storage medium for monitoring public opinion in subdivided fields | |
| US20250061308A1 (en) | Using Machine Learning to Extract Information from Electronic Communications | |
| US11475469B2 (en) | Business lines | |
| Ruhwinaningsih et al. | A sentiment knowledge discovery model in Twitter’s TV content using stochastic gradient descent algorithm | |
| CN116629241A (zh) | 一种文档质量评价方法及计算设备 | |
| US12056715B2 (en) | Technologies for dynamically assessing applicability of product regulations to product protocols | |
| Zaman et al. | Using text mining to sub-classify safety concern mentions in online reviews | |
| Lin Htet et al. | Influence of material attribute context and document layout variability on large language model performance in construction material specification | |
| Martinez et al. | Using machine learning to improve regulatory review of flight waivers and exemptions | |
| Mbuthia | A prototype for information capture using natural language processing: a case of Nation online news |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20877451; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022523230; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020366040; Country of ref document: AU; Date of ref document: 20201016; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 2020877451; Country of ref document: EP; Effective date: 20220518 |