CN110647584A - Internet of things platform document data management method and device - Google Patents


Info

Publication number
CN110647584A
Authority
CN
China
Prior art keywords
document
information
paragraph
data
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910900938.1A
Other languages
Chinese (zh)
Inventor
张晓霞
曲文武
胡伟凤
纪旭升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Juhaolian Technology Co Ltd
Original Assignee
Qingdao Juhaolian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Juhaolian Technology Co Ltd
Priority to CN201910900938.1A
Publication of CN110647584A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation

Abstract

The invention discloses a method and a device for managing document data of an Internet of things platform. The device information and the manufacturer information provided by the user are used for analyzing the related document data of the device, so that customized document management service is provided for the client, basic information support can be further provided for the knowledge question and answer of the intelligent customer service system, and the experience of the knowledge question and answer of the user is improved.

Description

Internet of things platform document data management method and device
Technical Field
The embodiment of the invention relates to the technical field of Internet of things, in particular to a method and a device for managing document data of an Internet of things platform.
Background
With economic development and social progress, smart home devices of many kinds have greatly enriched and improved people's quality of life. However, as the number of smart device categories grows, installing, using, maintaining and repairing the devices has become a problem that troubles customers and users. The Internet of things platform, as the access and management platform for devices, integrates intelligent customer service technology based on artificial intelligence, allows a client to provide the corresponding documents, and offers users document retrieval, document query and knowledge question-answering services.
However, because the Internet of things platform allows smart devices of the same type but of different brands to be connected, the document contents provided by these clients may be similar to some extent, which can cause problems when a user asks a question. For example, a user asks "how do I fill the water heater with water". The user is asking only about the water heater at home, and the meaning of the question is clear; but on the server side of the intelligent customer service, the stored knowledge about filling a water heater may come from different brands, such as brand A, brand B, and so on. If answers are provided by similarity analysis alone, the user may receive an answer matched to the wrong brand.
Disclosure of Invention
The embodiment of the invention provides a method and a device for managing document data of an Internet of things platform, which are used for providing customized document management service for a client.
In a first aspect, an embodiment of the present invention provides a method for managing internet of things platform document data, including:
acquiring document information uploaded by a user, wherein the document information comprises equipment information, manufacturer information and equipment-related documents;
performing data preprocessing on the device-related document;
and performing data analysis of a plurality of entity classes on the preprocessed equipment-related document according to the equipment information and the manufacturer information, and storing a data analysis result in a database.
According to the technical scheme, the device related document data are analyzed through the device information and the manufacturer information provided by the user, so that customized document management service is provided for the customer, basic information support can be further provided for the knowledge question and answer of the intelligent customer service system, and the experience of the knowledge question and answer of the user is improved.
Optionally, the performing data preprocessing on the device-related document includes:
and performing data cleaning on the device-related document, wherein the data cleaning comprises removing blank lines, removing unrecognized texts, retaining header information, retaining paragraph information and regularizing data storage.
Optionally, the performing, according to the device information and the vendor information, data analysis of multiple entity classes on the preprocessed device-related document includes:
for document data, generating a document ID according to the equipment information, the manufacturer information and the time for acquiring the document information, and generating a document abstract corresponding to the document ID according to the preprocessed equipment-related document;
for paragraph class data, determining a paragraph ID according to the document ID and the position of the paragraph in the device-related document, and generating a paragraph abstract and paragraph keywords corresponding to the paragraph ID;
for sentence class data, determining a sentence ID according to the paragraph ID and the position of the sentence in the paragraph, and generating sentence keywords corresponding to the sentence ID;
for word class data, generating a word ID according to the order of the sentence in which the word is located, and generating the part of speech corresponding to the word ID.
Optionally, the data parsing result is used for document retrieval, document query or knowledge question and answer.
In a second aspect, an embodiment of the present invention provides a device for managing internet of things platform document data, including:
an acquisition unit, configured to acquire document information uploaded by a user, wherein the document information comprises equipment information, manufacturer information and equipment-related documents;
the processing unit is used for carrying out data preprocessing on the device-related document; and performing data analysis of a plurality of entity classes on the preprocessed equipment-related document according to the equipment information and the manufacturer information, and storing a data analysis result in a database.
Optionally, the processing unit is specifically configured to:
and performing data cleaning on the device-related document, wherein the data cleaning comprises removing blank lines, removing unrecognized texts, retaining header information, retaining paragraph information and regularizing data storage.
Optionally, the processing unit is specifically configured to:
for document data, generating a document ID according to the equipment information, the manufacturer information and the time for acquiring the document information, and generating a document abstract corresponding to the document ID according to the preprocessed equipment-related document;
for paragraph class data, determining a paragraph ID according to the document ID and the position of the paragraph in the device-related document, and generating a paragraph abstract and paragraph keywords corresponding to the paragraph ID;
for sentence class data, determining a sentence ID according to the paragraph ID and the position of the sentence in the paragraph, and generating sentence keywords corresponding to the sentence ID;
for word class data, generating a word ID according to the order of the sentence in which the word is located, and generating the part of speech corresponding to the word ID.
Optionally, the data parsing result is used for document retrieval, document query or knowledge question and answer.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and the processor is used for calling the program instruction stored in the memory and executing the management method of the Internet of things platform document data according to the obtained program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable non-volatile storage medium, which includes computer-readable instructions, and when the computer reads and executes the computer-readable instructions, the computer is caused to execute the method for managing the internet of things platform document data.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for managing Internet of things platform document data according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method for managing Internet of things platform document data according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating a data E-R relationship according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device for managing Internet of things platform document data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates an exemplary system architecture, which may be a server 100, including a processor 110, a communication interface 120, and a memory 130, to which embodiments of the present invention are applicable. The server 100 may be a data storage server.
The communication interface 120 is used for communicating with the terminal devices of the users, respectively, and transceiving information transmitted by the terminal devices of the users to implement communication.
The processor 110 is the control center of the server 100. It connects the various parts of the entire server 100 through various interfaces and lines, and performs the functions of the server 100 and processes data by running or executing software programs and/or modules stored in the memory 130 and calling data stored in the memory 130. Optionally, the processor 110 may include one or more processing units.
The memory 130 may be used to store software programs and modules, and the processor 110 executes various functional applications and data processing by running the software programs and modules stored in the memory 130. The memory 130 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to a business process, and the like. Further, the memory 130 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
It should be noted that the structure shown in fig. 1 is only an example, and the embodiment of the present invention is not limited thereto.
Based on the server 100, the method for managing the document data of the platform of the internet of things provided by the invention provides customized document management service for the client through the information and the document provided by the client, and provides information support for the accurate knowledge question answering of the intelligent customer service of the platform of the internet of things. The process is shown in fig. 2, and specifically comprises the following steps:
Step 1: the document uploading module provides a document upload interface for the client. Following the prompts on the interface, the client uploads the document and provides the corresponding information.
Step 2: the document processing and storing module processes the document uploaded by the client, integrates the information provided by the client into the document ID as a tag in the process, and then stores the tagged processing data into the database.
Based on the above description, fig. 3 shows in detail a flow of the method for managing platform document data of an internet of things according to the embodiment of the present invention, where the flow may be executed by a device for managing platform document data of an internet of things, and the device may be located in the server 100 shown in fig. 1, or may be the server 100.
As shown in fig. 3, the process specifically includes:
step 301, obtaining document information uploaded by a user.
The document information may include device information, manufacturer information, and device-related documents. That is, after the user logs in to the client on the terminal device, a document upload interface appears and prompts the user to provide information including, but not limited to, device information (such as device type, device brand, device model and device production date), manufacturer information (such as company name and company address), and device-related documents (such as a device specification or a device maintenance manual, in Word, txt or a similar format).
Step 302, performing data preprocessing on the device-related document.
Specifically, after the device-related document is received, it is first converted into text data that the computer device can recognize, and data cleaning is then performed on that text. The cleaning operations include, but are not limited to: removing blank lines, removing unrecognizable text, retaining title information, retaining paragraph information, and regularizing data storage.
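The cleaning operations listed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `clean_document` is hypothetical, "unrecognizable text" is approximated by control, unassigned and private-use characters plus the Unicode replacement character, the first non-blank line is assumed to be the title, and each remaining non-blank line is assumed to be one paragraph.

```python
import re
import unicodedata


def clean_document(raw_text: str) -> dict:
    """Clean converted document text (hypothetical helper, not from the patent).

    Assumptions: the first non-blank line is the title, and every other
    non-blank line is one paragraph.
    """
    lines = []
    for line in raw_text.splitlines():
        # Regularize storage: collapse any run of whitespace into one space.
        line = re.sub(r"\s+", " ", line)
        # Remove "unrecognizable" text: Unicode categories starting with "C"
        # (control, format, unassigned, private use) and U+FFFD.
        line = "".join(
            ch for ch in line
            if unicodedata.category(ch)[0] != "C" and ch != "\ufffd"
        ).strip()
        if line:  # remove blank lines
            lines.append(line)
    if not lines:
        return {"title": "", "paragraphs": []}
    # Retain title information and paragraph information separately.
    return {"title": lines[0], "paragraphs": lines[1:]}
```

Keeping the title and the paragraph list as separate fields means the later document-class and paragraph-class steps can read them directly.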
Step 303, performing data analysis of multiple entity classes on the preprocessed device-related document according to the device information and the manufacturer information, and storing a data analysis result in a database.
After the device-related document has been preprocessed, data analysis can be performed on the preprocessed document according to the device information and the manufacturer information, where the plurality of entity classes may include a document class, a paragraph class, a sentence class and a word class.
Specifically, for document class data, a document ID is generated according to the device information, the manufacturer information and the time at which the document information was acquired, and a document abstract corresponding to the document ID is generated from the preprocessed device-related document. For paragraph class data, a paragraph ID is determined according to the document ID and the position of the paragraph in the device-related document, and a paragraph abstract and paragraph keywords corresponding to the paragraph ID are generated. For sentence class data, a sentence ID is determined according to the paragraph ID and the position of the sentence in its paragraph, and sentence keywords corresponding to the sentence ID are generated. For word class data, a word ID is generated according to the order of the sentence in which the word is located, and the part of speech corresponding to the word ID is generated.
In practice, the entity classes are associated with one another by a composition relationship that follows the granularity of the text from coarse to fine. Each entity class contains a set of attributes. The entities and their relationships can be described by the E-R diagram shown in Fig. 4.
The data analysis process may specifically be:
a) Document class data obtains the document title and document content attributes from the preprocessing step; the document id is generated by a document id generation algorithm, the document abstract by a text summarization method, and the document keywords by a keyword extraction algorithm. The document id generation algorithm integrates the information provided by the client into the document id as a tag.
b) Paragraph class data obtains the paragraph content and document id attributes from the document class data; the paragraph id is generated by a paragraph id generation algorithm, the paragraph abstract by a text summarization method, and the paragraph keywords by a keyword extraction algorithm.
c) Sentence class data obtains the sentence content and document id attributes from the paragraph class data; the sentence id is generated by a sentence id generation algorithm, and the sentence keywords by a keyword extraction algorithm.
d) Word class data obtains the word attributes from the data preprocessing step; the word id is generated by a word id generation algorithm, and the part of speech is obtained by a part-of-speech tagging algorithm.
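A minimal Python sketch of how the paragraph and sentence ids in steps b) and c) could be derived from a document id is given below. The names `Paragraph`, `Sentence`, `split_sentences` and `parse_document` are hypothetical, the two-digit position suffix is an assumption, and the regex-based sentence splitter is a simple stand-in for whatever segmentation the platform actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    sid: str       # paragraph id + two-digit position in the paragraph
    content: str


@dataclass
class Paragraph:
    pid: str       # document id + two-digit position in the document
    content: str
    sentences: list = field(default_factory=list)


def split_sentences(text: str) -> list:
    """Split on Western and Chinese sentence terminators (a simplification)."""
    chunks = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [c.strip() for c in chunks if c.strip()]


def parse_document(did: str, paragraphs: list) -> list:
    """Build paragraph and sentence entities whose ids extend the parent id."""
    parsed = []
    for p_pos, text in enumerate(paragraphs, start=1):
        para = Paragraph(pid=f"{did}{p_pos:02d}", content=text)
        for s_pos, chunk in enumerate(split_sentences(text), start=1):
            para.sentences.append(
                Sentence(sid=f"{para.pid}{s_pos:02d}", content=chunk)
            )
        parsed.append(para)
    return parsed
```

Because every child id starts with its parent's id, the composition relationship between documents, paragraphs and sentences is visible in the ids themselves.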
After the data analysis results are obtained, the data can be stored in JSON format. The fields of the document class data include, but are not limited to, those shown in Table 1. The fields of the paragraph class data include, but are not limited to, those shown in Table 2. The fields of the sentence class data include, but are not limited to, those shown in Table 3. The fields of the word class data include, but are not limited to, those shown in Table 4.
TABLE 1
Field name | Field type | Description
Title | Text | Document title
Did | String | Document id
Content | Text | Document content
Abstract | Text | Document abstract
Keyword | Text | Document keywords
TABLE 2
Field name | Field type | Description
Did | String | Document id
Pid | String | Paragraph id
Pcontent | Text | Paragraph content
Pabstract | Text | Paragraph abstract
Pkeyword | Text | Paragraph keywords
TABLE 3
(Table 3 is provided as an image in the original publication.)
TABLE 4
Field name | Field type | Description
Wid | String | Word id
Word | Text | Word
Wpos | String | Part of speech
In order to better explain the embodiment of the present invention, the following describes the process of managing the document data of the internet of things platform in a specific embodiment.
The examples are described below:
1. The user provides information and documents.
Device information: device type = 'air conditioner'; device brand = 'brand A'; device model = 'model B'; device production date = '2019-01-01'.
Manufacturer information: company name = 'company C'; company address = 'address D'.
Related document: a Word document named 'Model B air conditioner specification'.
2. Document data processing.
1) Data reading.
The document contents are as follows:
"This specification applies to the installation, operation and maintenance of the model B air conditioner.
To ensure the proper operation of the unit and to prevent failure, the installation must be undertaken by a technician with some knowledge and experience of the refrigeration and air conditioning system. The person installing, operating and maintaining the equipment should have a basic understanding of the principles of the refrigeration and air conditioning system and of electrical control, and should read this specification carefully."
2) Data preprocessing.
Extract the document title: "Model B air conditioner specification".
Extract the document content. After blank lines and unrecognizable text are removed and paragraph information is retained, the result is as follows:
"This specification applies to the installation, operation and maintenance of the model B air conditioner.
To ensure the proper operation of the unit and to prevent failure, the installation must be undertaken by a technician with some knowledge and experience of the refrigeration and air conditioning system. The person installing, operating and maintaining the equipment should have a basic understanding of the principles of the refrigeration and air conditioning system and of electrical control, and should read this specification carefully."
3) Data analysis.
a) Document class.
The information provided by the client (device type = 'air conditioner', device brand = 'brand A', company name = 'company C') is used as tag information, and the document id is generated by combining the tag information with the document upload order. For example, the document id is generated as follows:
did = '01051001', where the first two digits ('01') represent the device type (air conditioner), the third and fourth digits ('05') represent brand A, the fifth and sixth digits ('10') represent company C, and the last two digits ('01') represent the document upload order.
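The digit layout of this example id can be reproduced with a short Python sketch. The code tables `DEVICE_TYPE_CODES`, `BRAND_CODES` and `COMPANY_CODES` and the function `make_document_id` are hypothetical; the patent does not say how the two-digit codes are assigned, so a fixed lookup is assumed here, containing only the values that appear in the example.

```python
# Hypothetical code tables; only the values from the worked example are known.
DEVICE_TYPE_CODES = {"air conditioner": "01"}
BRAND_CODES = {"brand A": "05"}
COMPANY_CODES = {"company C": "10"}


def make_document_id(device_type: str, brand: str, company: str,
                     upload_seq: int) -> str:
    """Concatenate the tag codes and a two-digit upload sequence number."""
    if not 1 <= upload_seq <= 99:
        raise ValueError("upload_seq must fit in two digits")
    return (
        DEVICE_TYPE_CODES[device_type]   # digits 1-2: device type
        + BRAND_CODES[brand]             # digits 3-4: device brand
        + COMPANY_CODES[company]         # digits 5-6: company
        + f"{upload_seq:02d}"            # digits 7-8: upload order
    )


did = make_document_id("air conditioner", "brand A", "company C", 1)
```

With the example inputs, `did` is `'01051001'`. An unregistered device type, brand or company raises `KeyError` in this sketch; a real system would need a way to register new codes.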
The document abstract is generated using a TextRank-based text summarization algorithm, with the following result:
"This specification applies to the installation, operation and maintenance of the model B air conditioner. The person installing, operating and maintaining the equipment should have a basic understanding of the principles of the refrigeration and air conditioning system and of electrical control, and should read this specification carefully."
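The patent names TextRank but gives no details of the variant used. As a rough illustration only, the following Python sketch ranks sentences by a PageRank-style iteration over a word-overlap similarity graph and returns the best ones in document order. The function name `textrank_summary`, the regex tokenizer, the `+ 1` smoothing inside the logarithms, the damping factor and the iteration count are all assumptions; Chinese text would also need a word segmenter in place of the regex.

```python
import math
import re


def textrank_summary(sentences, top_k=1, damping=0.85, iterations=50):
    """Return the top_k highest-ranked sentences, kept in document order."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]

    # Symmetric similarity: shared words normalized by log sentence lengths.
    # The "+ 1" is a smoothing assumption that avoids dividing by log(1) = 0.
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            denom = math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1)
            if denom > 0:
                sim[i][j] = sim[j][i] = len(tokens[i] & tokens[j]) / denom

    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # weighted by similarity (weighted PageRank update).
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]
```

Sentences that share no words with any other sentence keep only the base score `1 - damping`, so they are the last to be selected.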
The document keywords are generated using a TF-IDF-based keyword extraction algorithm, with the following result:
air conditioner; refrigeration; installation; specification; maintenance.
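For readers unfamiliar with TF-IDF keyword extraction, a minimal Python sketch follows. It is not the patent's algorithm: the function name `tfidf_keywords`, the smoothed IDF formula, the regex tokenizer and the choice of background corpus are all assumptions, and the patent does not state which corpus its IDF values are computed over.

```python
import math
import re
from collections import Counter


def tfidf_keywords(target, corpus, top_k=3, stopwords=frozenset()):
    """Rank the words of `target` by TF-IDF against a background corpus."""

    def tokenize(text):
        return [w for w in re.findall(r"\w+", text.lower()) if w not in stopwords]

    docs = [set(tokenize(d)) for d in corpus]
    counts = Counter(tokenize(target))
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        df = sum(1 for d in docs if word in d)            # document frequency
        idf = math.log((1 + len(docs)) / (1 + df)) + 1    # smoothed IDF (assumed)
        scores[word] = (count / total) * idf              # term frequency * IDF
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Words that occur in every corpus document receive the minimum IDF of 1, so frequent but uninformative words fall behind words specific to the target text.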
b) Paragraph class.
In this example, there are two paragraphs, which are analyzed separately.
Paragraph 1 content: "This specification applies to the installation, operation and maintenance of the model B air conditioner."
The pid of paragraph 1 is generated from the document id and the position of the paragraph in the document, with the following result:
pid1 = '0105100101', where the first eight digits ('01051001') represent the document id and the last two digits ('01') represent the position of the paragraph in the document.
The abstract of paragraph 1 is generated using a TextRank-based text summarization algorithm, with the following result:
"This specification applies to the installation, operation and maintenance of the model B air conditioner."
The keywords of paragraph 1 are generated using a TF-IDF-based keyword extraction algorithm, with the following result:
air conditioner; specification; installation.
Paragraph 2 content: "To ensure the proper operation of the unit and to prevent failure, the installation must be undertaken by a technician with some knowledge and experience of the refrigeration and air conditioning system. The person installing, operating and maintaining the equipment should have a basic understanding of the principles of the refrigeration and air conditioning system and of electrical control, and should read this specification carefully."
The pid of paragraph 2 is generated from the document id and the position of the paragraph in the document, with the following result: pid2 = '0105100102'.
The abstract of paragraph 2 is generated using a TextRank-based text summarization algorithm, with the following result:
"To ensure the proper operation of the unit and to prevent failure, the installation must be undertaken by a technician with some knowledge and experience of the refrigeration and air conditioning system."
The keywords of paragraph 2 are generated using a TF-IDF-based keyword extraction algorithm, with the following result: refrigeration; installation; air conditioner.
c) Sentence class.
In this example, there are three sentences, which are analyzed separately.
Sentence 1 content: "This specification applies to the installation, operation and maintenance of the model B air conditioner."
The sid of sentence 1 is generated from the paragraph id and the position of the sentence in its paragraph, with the following result:
sid1 = '010510010101', where the first ten digits ('0105100101') represent the id of the paragraph in which the sentence is located, and the last two digits ('01') represent the position of the sentence in that paragraph.
The keywords of sentence 1 are generated using a TF-IDF-based keyword extraction algorithm, with the following result:
specification; model B; air conditioner.
Sentence 2 content: "To ensure the proper operation of the unit and to prevent failure, the installation must be undertaken by a technician with some knowledge and experience of the refrigeration and air conditioning system."
The sid of sentence 2 is generated from the paragraph id and the position of the sentence in its paragraph, with the following result: sid2 = '010510010201'.
The keywords of sentence 2 are generated using a TF-IDF-based keyword extraction algorithm, with the following result: installation; refrigeration; technician.
Sentence 3 content: "The person installing, operating and maintaining the equipment should have a basic understanding of the principles of the refrigeration and air conditioning system and of electrical control, and should read this specification carefully."
The sid of sentence 3 is generated from the paragraph id and the position of the sentence in its paragraph, with the following result: sid3 = '010510010202'.
The keywords of sentence 3 are generated using a TF-IDF-based keyword extraction algorithm, with the following result: refrigeration; electrical; specification.
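Because each sentence id embeds its paragraph id, which in turn embeds the document id and its tag codes, a stored record can be traced back to its device type, brand and company. The Python sketch below decodes an id using the digit layout of this example and uses the result to keep only candidates of one brand. The functions `decode_id` and `same_brand` are hypothetical, and brand filtering at query time is an illustration of how the tags might support brand-specific answers rather than a procedure stated in the patent.

```python
def decode_id(entity_id: str) -> dict:
    """Split a document (8-digit), paragraph (10) or sentence (12) id."""
    if len(entity_id) not in (8, 10, 12) or not entity_id.isdigit():
        raise ValueError(f"unexpected id format: {entity_id!r}")
    fields = {
        "device_type": entity_id[0:2],
        "brand": entity_id[2:4],
        "company": entity_id[4:6],
        "upload_seq": entity_id[6:8],
    }
    if len(entity_id) >= 10:
        fields["paragraph_pos"] = entity_id[8:10]
    if len(entity_id) == 12:
        fields["sentence_pos"] = entity_id[10:12]
    return fields


def same_brand(entity_id: str, brand_code: str) -> bool:
    """True if the record behind entity_id was tagged with brand_code."""
    return decode_id(entity_id)["brand"] == brand_code


# Keep only candidate sentences whose id carries the brand code '05'.
candidates = ["010510010202", "010710030101"]
brand_a_only = [sid for sid in candidates if same_brand(sid, "05")]
```

In this example only `'010510010202'` survives the filter, since the second candidate carries a different brand code in its third and fourth digits.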
d) Word class.
The word id is generated using a sequential assignment method.
The parts of speech are generated using a part-of-speech tagging method. After punctuation marks are removed, there are 43 words; the analysis results are shown in Table 5.
TABLE 5
Word id | Word | Part of speech
1 | Experience | Noun
2 | Is provided with | Verb
3 | Undertake | Verb
4 | Knowledge | Noun
5 | Operation | Verb
…… | …… | ……
41 | Principle | Noun
42 | Is suitable for | Verb
43 | Is fixed to | Adverb
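Sequential word id assignment combined with a pluggable part-of-speech tagger might look like the Python sketch below. Everything named here is hypothetical: `TOY_LEXICON` and `toy_tagger` are a three-entry stand-in for a real tagger, the regex tokenizer replaces proper word segmentation, and numbering words in order of appearance is an assumption, since the patent does not say how the ids in Table 5 were ordered. The record keys follow the field names of Table 4.

```python
import re

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
TOY_LEXICON = {"experience": "noun", "knowledge": "noun", "undertake": "verb"}


def toy_tagger(word: str) -> str:
    """Look the word up in the toy lexicon; unknown words are marked as such."""
    return TOY_LEXICON.get(word.lower(), "unknown")


def build_word_records(text: str, tagger=toy_tagger) -> list:
    """Drop punctuation, then assign sequential ids and part-of-speech tags."""
    words = re.findall(r"\w+", text)  # punctuation marks are not matched
    return [
        {"Wid": str(position), "Word": word, "Wpos": tagger(word)}
        for position, word in enumerate(words, start=1)
    ]
```

Passing a different `tagger` callable swaps in a real tagging library without changing how the ids are assigned.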
4) Data storage.
The data parsing results are stored in JSON format.
The storage format of the document class is as follows:
(JSON example provided as an image in the original publication.)
The storage format of the paragraph class is as follows (taking paragraph 1 as an example):
(JSON example provided as an image in the original publication.)
The storage format of the sentence class is as follows (taking sentence 1 as an example):
(JSON example provided as an image in the original publication.)
The storage format of the word class is as follows (taking word 1 as an example):
(JSON example provided as an image in the original publication.)
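Since the stored JSON itself is not reproduced in this text, the following Python sketch shows one plausible way to serialize a document class record using the field names from Table 1. The function `to_json_record`, the `"; "` separator used to pack the keywords into a single text field, and the indentation are assumptions; the exact layout in the original figures may differ.

```python
import json


def to_json_record(title, did, content, abstract, keywords):
    """Serialize a document class record with the Table 1 field names."""
    record = {
        "Title": title,
        "Did": did,
        "Content": content,
        "Abstract": abstract,
        # Keywords joined into one text field; the separator is an assumption.
        "Keyword": "; ".join(keywords),
    }
    # ensure_ascii=False keeps Chinese text readable in the stored JSON.
    return json.dumps(record, ensure_ascii=False, indent=2)


doc_json = to_json_record(
    title="Model B air conditioner specification",
    did="01051001",
    content="This specification applies to the model B air conditioner.",
    abstract="This specification applies to the model B air conditioner.",
    keywords=["air conditioner", "refrigeration"],
)
```

Paragraph, sentence and word records could be serialized the same way with the field names of Tables 2 to 4.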
In the embodiment of the present invention, document information uploaded by a user is acquired, where the document information includes device information, manufacturer information and device-related documents; data preprocessing is performed on the device-related documents; data analysis of a plurality of entity classes is performed on the preprocessed device-related documents according to the device information and the manufacturer information; and the data analysis results are stored in a database. Because the device-related document data is analyzed using the device information and manufacturer information provided by the user, a customized document management service is provided for the client, basic information support is provided for the knowledge question answering of the intelligent customer service system, and the user's question-answering experience is improved.
Based on the same technical concept, fig. 5 exemplarily shows a structure of a management apparatus for platform document data of internet of things according to an embodiment of the present invention, where the apparatus may execute a management process of platform document data of internet of things, and the apparatus may be located in the server 100 shown in fig. 1, or may be the server 100.
As shown in fig. 5, the apparatus specifically includes:
an obtaining unit 501, configured to obtain document information uploaded by a user, where the document information includes device information, manufacturer information, and device-related documents;
a processing unit 502, configured to perform data preprocessing on the device-related document; and performing data analysis of a plurality of entity classes on the preprocessed equipment-related document according to the equipment information and the manufacturer information, and storing a data analysis result in a database.
Optionally, the processing unit 502 is specifically configured to:
perform data cleaning on the device-related document, wherein the data cleaning comprises removing blank lines, removing unrecognized text, retaining header information, retaining paragraph information, and regularizing data storage.
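The cleaning steps listed above can be illustrated with a minimal Python sketch. It is not the implementation disclosed here; the heading rule (`HEADING_RE`), the definition of unrecognized text (`UNRECOGNIZED_RE`), and the record layout in `CleanedDocument` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CleanedDocument:
    # Regularized storage (assumed layout): one record per paragraph,
    # each stored together with the heading it appears under.
    paragraphs: list = field(default_factory=list)


# Assumed heading rule: a line that starts with a section number such as "1" or "2.3".
# A body line that happens to start with a number would be misread as a heading.
HEADING_RE = re.compile(r"^\d+(\.\d+)*\s+\S")
# Assumed meaning of "unrecognized text": control characters and U+FFFD.
UNRECOGNIZED_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f\ufffd]")


def clean_document(raw: str) -> CleanedDocument:
    doc = CleanedDocument()
    heading = ""
    buffer = []

    def flush():
        # Close the current paragraph, keeping its heading (paragraph information).
        if buffer:
            doc.paragraphs.append({"heading": heading, "text": " ".join(buffer)})
            buffer.clear()

    for raw_line in raw.splitlines():
        if not raw_line.strip():
            flush()  # a blank line ends a paragraph and is itself dropped
            continue
        line = UNRECOGNIZED_RE.sub("", raw_line).strip()
        if not line:
            continue  # the line held only unrecognized characters
        if HEADING_RE.match(line):
            flush()
            heading = line  # retain header information for later paragraphs
            continue
        buffer.append(line)
    flush()
    return doc
```

Calling `clean_document` on the text of an uploaded manual yields a list of paragraph records that the later parsing steps can consume in order.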
Optionally, the processing unit 502 is specifically configured to:
for document class data, generating a document ID according to the device information, the manufacturer information and the time at which the document information was acquired, and generating a document abstract corresponding to the document ID according to the preprocessed device-related document;
for paragraph class data, determining a paragraph ID according to the document ID and the position of the paragraph in the device-related document, and generating a paragraph abstract and paragraph keywords corresponding to the paragraph ID;
for sentence class data, determining a sentence ID according to the paragraph ID and the position of the sentence in the paragraph, and generating sentence keywords corresponding to the sentence ID;
for part-of-speech class data, generating a word ID according to the order of the word in the sentence in which it is located, and generating the part of speech corresponding to the word ID.
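The hierarchical ID scheme above can be sketched in Python as follows. The concrete formats are assumptions: the text does not say how the document ID is computed (a truncated SHA-1 hash is used here), nor that child IDs are built by appending a position suffix to the parent ID. Generating abstracts, keywords and part-of-speech tags is outside this sketch.

```python
import hashlib
from datetime import datetime


def make_document_id(device: str, manufacturer: str, acquired_at: datetime) -> str:
    # Assumed format: short hash over device info, manufacturer info and acquisition time.
    key = f"{manufacturer}|{device}|{acquired_at.isoformat()}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def make_paragraph_id(document_id: str, paragraph_pos: int) -> str:
    # Paragraph ID from the document ID and the paragraph's position in the document.
    return f"{document_id}-p{paragraph_pos}"


def make_sentence_id(paragraph_id: str, sentence_pos: int) -> str:
    # Sentence ID from the paragraph ID and the sentence's position in the paragraph.
    return f"{paragraph_id}-s{sentence_pos}"


def make_word_id(sentence_id: str, word_pos: int) -> str:
    # Word ID from the word's order in its sentence; prefixing the sentence ID is an assumption.
    return f"{sentence_id}-w{word_pos}"


def assign_ids(document_id: str, paragraphs: list) -> list:
    # paragraphs: list of paragraphs, each a list of sentences, each a list of words.
    records = []
    for p_pos, sentences in enumerate(paragraphs, start=1):
        p_id = make_paragraph_id(document_id, p_pos)
        for s_pos, words in enumerate(sentences, start=1):
            s_id = make_sentence_id(p_id, s_pos)
            for w_pos, word in enumerate(words, start=1):
                records.append((make_word_id(s_id, w_pos), word))
    return records
```

Because each ID embeds its parent's ID, a word record can be traced back to its sentence, paragraph and document without a separate lookup.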
Optionally, the data parsing result is used for document retrieval, document query, or knowledge question answering.
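As a rough illustration of how a stored parsing result could serve document query, the following Python sketch keeps document and paragraph records in SQLite and looks up paragraphs by device and keyword. The database engine, the table and column names, and the `find_paragraphs` helper are all hypothetical; the text only states that the result is stored in a database.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # In-memory SQLite database with an assumed schema for two entity classes.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (
            doc_id TEXT PRIMARY KEY, device TEXT, manufacturer TEXT, abstract TEXT);
        CREATE TABLE paragraph (
            para_id TEXT PRIMARY KEY, doc_id TEXT REFERENCES document(doc_id), abstract TEXT);
        CREATE TABLE paragraph_keyword (
            para_id TEXT REFERENCES paragraph(para_id), keyword TEXT);
        """
    )
    return conn


def find_paragraphs(conn: sqlite3.Connection, device: str, keyword: str) -> list:
    # Return (paragraph ID, paragraph abstract) pairs for one device and one keyword.
    rows = conn.execute(
        """
        SELECT p.para_id, p.abstract
        FROM paragraph AS p
        JOIN document AS d ON d.doc_id = p.doc_id
        JOIN paragraph_keyword AS k ON k.para_id = p.para_id
        WHERE d.device = ? AND k.keyword = ?
        ORDER BY p.para_id
        """,
        (device, keyword),
    )
    return rows.fetchall()
```

Restricting the query by device reflects the idea that the user-supplied device and manufacturer information scopes the search to the relevant documents.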
Based on the same technical concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
a processor, configured to call the program instructions stored in the memory and to execute the method for managing Internet of Things platform document data according to the obtained program.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable non-volatile storage medium comprising computer-readable instructions which, when read and executed by a computer, cause the computer to execute the method for managing Internet of Things platform document data.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for managing Internet of Things platform document data, characterized by comprising:
acquiring document information uploaded by a user, wherein the document information comprises device information, manufacturer information and a device-related document;
performing data preprocessing on the device-related document;
performing data parsing of a plurality of entity classes on the preprocessed device-related document according to the device information and the manufacturer information, and storing a data parsing result in a database.
2. The method of claim 1, wherein performing data preprocessing on the device-related document comprises:
performing data cleaning on the device-related document, wherein the data cleaning comprises removing blank lines, removing unrecognized text, retaining header information, retaining paragraph information, and regularizing data storage.
3. The method of claim 1, wherein performing data parsing of a plurality of entity classes on the preprocessed device-related document according to the device information and the manufacturer information comprises:
for document class data, generating a document ID according to the device information, the manufacturer information and the time at which the document information was acquired, and generating a document abstract corresponding to the document ID according to the preprocessed device-related document;
for paragraph class data, determining a paragraph ID according to the document ID and the position of the paragraph in the device-related document, and generating a paragraph abstract and paragraph keywords corresponding to the paragraph ID;
for sentence class data, determining a sentence ID according to the paragraph ID and the position of the sentence in the paragraph, and generating sentence keywords corresponding to the sentence ID;
for part-of-speech class data, generating a word ID according to the order of the word in the sentence in which it is located, and generating the part of speech corresponding to the word ID.
4. The method of any of claims 1 to 3, wherein the data parsing results are used for document retrieval, document querying, or knowledge question answering.
5. An apparatus for managing Internet of Things platform document data, characterized by comprising:
an acquisition unit, configured to acquire document information uploaded by a user, wherein the document information comprises device information, manufacturer information and a device-related document;
a processing unit, configured to perform data preprocessing on the device-related document, perform data parsing of a plurality of entity classes on the preprocessed device-related document according to the device information and the manufacturer information, and store a data parsing result in a database.
6. The apparatus of claim 5, wherein the processing unit is specifically configured to:
perform data cleaning on the device-related document, wherein the data cleaning comprises removing blank lines, removing unrecognized text, retaining header information, retaining paragraph information, and regularizing data storage.
7. The apparatus of claim 5, wherein the processing unit is specifically configured to:
for document class data, generate a document ID according to the device information, the manufacturer information and the time at which the document information was acquired, and generate a document abstract corresponding to the document ID according to the preprocessed device-related document;
for paragraph class data, determine a paragraph ID according to the document ID and the position of the paragraph in the device-related document, and generate a paragraph abstract and paragraph keywords corresponding to the paragraph ID;
for sentence class data, determine a sentence ID according to the paragraph ID and the position of the sentence in the paragraph, and generate sentence keywords corresponding to the sentence ID;
for part-of-speech class data, generate a word ID according to the order of the word in the sentence in which it is located, and generate the part of speech corresponding to the word ID.
8. The apparatus of any of claims 5 to 7, wherein the data parsing result is for document retrieval, document query, or knowledge question answering.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to execute the method of any one of claims 1 to 4 in accordance with the obtained program.
10. A computer-readable non-transitory storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 4.
CN201910900938.1A 2019-09-23 2019-09-23 Internet of things platform document data management method and device Pending CN110647584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910900938.1A CN110647584A (en) 2019-09-23 2019-09-23 Internet of things platform document data management method and device

Publications (1)

Publication Number Publication Date
CN110647584A (en) 2020-01-03

Family

ID=68992544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910900938.1A Pending CN110647584A (en) 2019-09-23 2019-09-23 Internet of things platform document data management method and device

Country Status (1)

Country Link
CN (1) CN110647584A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092936A (en) * 2013-01-08 2013-05-08 华北电力大学(保定) Real-time information acquisition method of dynamic page of Internet of Things
CN104615748A (en) * 2015-02-12 2015-05-13 华北电力大学(保定) Watir-based (web application testing in ruby based) internet-of-things web event processing method
WO2016056864A1 (en) * 2014-10-08 2016-04-14 (주)섬엔지니어링 Iot analysis system using iot virtual file system
CN109800284A (en) * 2018-12-19 2019-05-24 中国电子科技集团公司第二十八研究所 A kind of unstructured information intelligent Answer System construction method of oriented mission
CN109885672A (en) * 2019-03-04 2019-06-14 中国科学院软件研究所 A kind of question and answer mode intelligent retrieval system and method towards online education
CN109992645A (en) * 2019-03-29 2019-07-09 国家计算机网络与信息安全管理中心 A kind of data supervision system and method based on text data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103