CN111353031B

CN111353031B - Thesis management method, server and system based on big data

Info

Publication number: CN111353031B
Application number: CN202010122369.5A
Authority: CN
Inventors: 林瀚; 谷俊; 薛忍霞
Original assignee: Hainan Yizhimai Technology Co ltd
Current assignee: Hainan Yizhimai Technology Co ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2023-04-14
Anticipated expiration: 2040-02-27
Also published as: CN111353031A

Abstract

The invention provides a thesis management method, server and system based on big data. The method comprises: receiving authentication information sent by a client, the authentication information comprising an authentication code and instruction information; sending a micro-service module to the client, which deploys the module after receiving it; receiving feature information uploaded by the micro-service module, the feature information being obtained by the micro-service module preprocessing the thesis data locally on the client, where the preprocessing extracts feature words from the thesis text; retrieving related existing documents from a database according to the feature information and sending them to the micro-service module; and the micro-service module receiving the related existing documents, calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sending the duplicate checking result to the client.

Description

Thesis management method, server and system based on big data

Technical Field

The invention relates to the technical field of thesis management, in particular to a thesis management method, a server and a system based on big data.

Background

A thesis (paper) is an article that studies a problem in an academic field and describes the results of that research; it is both a means of conducting academic research and a vehicle for reporting research results and for academic communication.

Disclosure of Invention

The invention aims to provide a thesis management method, server and system based on big data, in order to solve the problems in the prior art that thesis duplicate checking offers poor confidentiality and that a user's unpublished thesis is easily leaked.

The invention provides a thesis management method based on big data in a first aspect, which comprises the following steps:

receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information;

sending a micro-service module to the client, the client deploying the micro-service module after receiving it;

receiving feature information uploaded by a micro-service module, wherein the feature information is obtained by the micro-service module through local preprocessing of thesis data on a client side, and the preprocessing of the thesis data is to extract feature words from a thesis text;

retrieving related existing documents from a database according to the feature information, and sending the related existing documents to the micro-service module;

and the micro-service module receives the related existing documents, calculates the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sends the duplicate checking result to the client.

Further, sending the micro-service module to the client specifically includes:

acquiring client attribute information, and extracting a corresponding micro-service module from a micro-service warehouse according to the attribute information;

registering the micro service module according to the authentication information to generate a configuration file;

and sending the registered micro service module to a client for deployment.

Further, when no related existing document can be retrieved from the database according to the feature information, the related existing documents are retrieved from a third-party server, which specifically includes:

generating connection information, sending the connection information to the client, and sending the connection information and the feature information to a third-party server;

the third-party server retrieving related existing documents from a third-party database according to the feature information and, if related existing documents are retrieved, sending the connection information to the client;

and the client performs matching verification on the received two pieces of connection information, if the two pieces of connection information are matched, connection is established with a third-party server, and the third-party server sends related existing documents to the client.

Further, the micro-service module receiving the related existing documents and calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result specifically includes:

sequentially receiving the related existing documents sent by the server, the server having entered the related existing documents into a transmission queue in sequence and sending them from that queue;

creating a cache region, and storing the received related existing documents in the cache region;

and extracting related existing documents from the cache region to perform similarity calculation with the thesis text while receiving new related existing documents and storing them in the cache region, until all documents in the transmission queue have been sent.

A second aspect of the present invention provides a server, comprising:

the first receiving module is used for receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information;

the sending module is used for sending the micro-service module to the client, which deploys it after receiving it;

the second receiving module is used for receiving the feature information uploaded by the micro-service module, wherein the feature information is obtained by preprocessing the thesis data locally at the client by the micro-service module, and the preprocessing of the thesis data is to extract feature words from the thesis text;

the retrieval module is used for retrieving related existing documents from the database according to the feature information and sending the related existing documents to the micro-service module;

the microservice module is used for receiving related existing documents, calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sending the duplicate checking result to the client.

Further, the sending module further includes:

the acquisition sub-module is used for acquiring client attribute information and extracting a corresponding micro-service module from the micro-service warehouse according to the attribute information;

the registration submodule is used for registering the micro-service module according to the authentication information to generate a configuration file;

and the sending submodule is used for sending the registered micro-service module to the client for deployment.

Further, the retrieval module is further configured to retrieve the related existing documents from a third-party server when no related existing document is retrieved from the database according to the feature information. The retrieval module specifically includes a generation sub-module configured to generate connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server;

the third-party server is used for retrieving related existing documents from a third-party database according to the feature information, and sending the connection information to the client if related existing documents are retrieved;

the client is also provided with a matching verification module, which is used for matching and verifying the two pieces of connection information received; if the two pieces of connection information match, the client establishes a connection with the third-party server, and the third-party server sends the related existing documents to the client.

Further, the micro-service module specifically further includes:

the receiving submodule is used for sequentially receiving the related existing documents sent by the server, the server having entered the related existing documents into a transmission queue in sequence for sending;

the creating submodule is used for creating a cache region and storing the received related existing documents in the cache region;

and the calculation sub-module is used for extracting related existing documents from the cache region to perform similarity calculation with the thesis text while receiving new related existing documents and storing them in the cache region, until all documents in the transmission queue have been sent.

The third aspect of the present invention provides a big data based thesis management system, which includes the server and the client described in the second aspect.

Compared with the prior art, the invention has the beneficial effects that:

the micro-service module is sent to the client for deployment; the micro-service module preprocesses the thesis text locally on the client to obtain feature information and uploads only that feature information; the server retrieves related existing documents from the database according to the feature information and sends them to the micro-service module; and the micro-service module calculates the similarity between the related existing documents and the thesis text to obtain the duplicate checking result. A user can therefore check a thesis for duplication without uploading the thesis text over the network, which improves the security of the user's unpublished data and effectively reduces the load on the server.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only preferred embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a flowchart illustrating a big data based thesis management method according to an embodiment of the present invention.

Fig. 2 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 3 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 4 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 5 is a schematic diagram of the overall structure of a server according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth to illustrate the invention and are not intended to limit the scope of the invention.

Fig. 1 is a schematic flow chart of a thesis management method based on big data according to an embodiment of the present invention.

In this embodiment, the server may be a computer, a server or another device, and the client may be a computer, a tablet computer, a smart handheld terminal or another device. The server and the client exchange data over a network.

As shown in fig. 1, the thesis management method based on big data, applied to a server, includes the following steps:

S11, receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information.

In this embodiment, the instruction carried in the instruction information is to check a thesis text for duplication; in other embodiments, the instruction may instead be to search for and download existing documents, to view personal account information, and the like.

A database is pre-established in the server. On the one hand, the database stores the account information of all registered users, which includes at least unique identification information; on the other hand, it stores existing literature data, including but not limited to articles, books, periodicals, scientific reports and patents.

In addition, the client may process the authentication information before sending it to the server, for example by encrypting the authentication code and the instruction information with an encryption algorithm, by adding characters to them, or by other processing.
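As a minimal sketch of one such processing step, the client below serializes the authentication code and instruction, adds random characters (a nonce), and attaches an HMAC-SHA256 tag that the server checks before trusting the payload. HMAC is a swapped-in integrity mechanism, not the encryption the text mentions, and the names `pack_auth_info`, `unpack_auth_info` and `SHARED_KEY` are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical shared secret, assumed to be provisioned when the user registers.
SHARED_KEY = b"example-shared-key"


def pack_auth_info(auth_code: str, instruction: str, key: bytes = SHARED_KEY) -> dict:
    """Client side: serialize the authentication info, add random characters
    (a nonce) and attach an HMAC-SHA256 tag so tampering can be detected."""
    payload = {
        "auth_code": auth_code,
        "instruction": instruction,
        "nonce": secrets.token_hex(8),  # the "added characters": 16 hex chars
    }
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "tag": tag}


def unpack_auth_info(message: dict, key: bytes = SHARED_KEY) -> dict:
    """Server side: recompute the tag and compare in constant time."""
    body = message["body"].encode("utf-8")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("authentication info failed integrity check")
    return json.loads(body)
```

For example, `unpack_auth_info(pack_auth_info("user-0001", "duplicate_check"))` returns the original fields plus the nonce, while any edit to the body makes verification fail.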

S12, sending the micro-service module to the client, the client deploying the micro-service module after receiving it.

The server sends the packaged micro-service module to the client. Deployment means that, after receiving the package, the client unpacks it and installs the micro-service module locally so that the module can run.

S13, receiving feature information uploaded by the micro-service module, wherein the feature information is obtained by the micro-service module preprocessing the thesis data locally on the client, and the preprocessing of the thesis data is to extract feature words from the thesis text.

When the user of a client wants to check a thesis for duplication, the user opens the local thesis text through the client. The micro-service module performs format recognition and word segmentation on the thesis text and extracts the feature words with the highest occurrence frequency, so that related existing documents can be searched for in subsequent steps according to these feature words. Multiple feature words may be extracted so as to cover the technical scope of the thesis as fully as possible; this avoids missed checks, which would make the duplicate checking result inaccurate.
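The frequency-based feature word extraction described above can be sketched as follows. This is a minimal illustration assuming space-delimited text and a tiny hypothetical stop-word list; a Chinese thesis would first need a proper word segmenter (for example the `jieba` library) in place of the regular-expression tokenizer, and `extract_feature_words` is a hypothetical name.

```python
import re
from collections import Counter
from typing import List

# Illustrative stop-word list; a real deployment would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with", "by", "are"}


def extract_feature_words(text: str, top_k: int = 10, min_len: int = 3) -> List[str]:
    """Return the top_k most frequent non-stop-words of at least min_len chars."""
    # Lower-case and split into alphanumeric tokens (stand-in for word segmentation).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if len(t) >= min_len and t not in STOP_WORDS)
    # most_common orders by count; ties keep first-seen order.
    return [word for word, _ in counts.most_common(top_k)]
```

Only the returned word list would leave the client as feature information; the thesis text itself stays local.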

In addition, before sending the feature information to the server, the client may process the feature information, where the processing may be to encrypt the feature information by an encryption algorithm, to add characters to the feature information, or to perform other processing manners.

S14, retrieving related existing documents from the database according to the feature information, and sending the related existing documents to the micro-service module.

After the first retrieval of related existing documents from the database according to the feature information, the server may perform a secondary screening on the retrieved documents. The secondary screening comprises: sorting the documents from the first retrieval by their similarity to the feature words, and keeping only the documents ranked within a preset value as comparison objects for the duplicate check of the thesis text, that is, screening out the documents whose similarity ranking falls outside the preset value. This avoids comparing the thesis against documents of low similarity and improves duplicate checking efficiency.
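A minimal sketch of the secondary screening, assuming each candidate document is represented by its own keyword list and that Jaccard overlap stands in for the unspecified "similarity to the feature words"; `secondary_screening` and the `keep` parameter (the preset value) are hypothetical names.

```python
from typing import Dict, List, Tuple


def secondary_screening(candidates: Dict[str, List[str]],
                        feature_words: List[str],
                        keep: int) -> List[Tuple[str, float]]:
    """Rank first-round candidates by keyword overlap and keep the top `keep`."""
    features = set(feature_words)
    scored = []
    for doc_id, keywords in candidates.items():
        kw = set(keywords)
        union = features | kw
        # Jaccard similarity between the thesis feature words and the document keywords.
        score = len(features & kw) / len(union) if union else 0.0
        scored.append((doc_id, score))
    # Highest similarity first; anything ranked beyond `keep` is screened out.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:keep]
```

Only the documents that survive this cut are sent on to the micro-service module for full-text comparison.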

S15, the micro-service module receiving the related existing documents, calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sending the duplicate checking result to the client.

With the big data based thesis management method of this embodiment, the server sends the micro-service module to the client for local deployment. The micro-service module preprocesses the thesis text locally on the client, obtains its feature information and uploads that information to the server; the server retrieves related existing documents according to the feature information and sends them to the client; and the micro-service module locally calculates the similarity between the related existing documents and the thesis text to obtain the duplicate checking result. The duplicate checking comparison is thus completed locally without uploading the thesis text.
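The patent does not name a similarity measure, so the local comparison below is only a sketch using a swapped-in technique: cosine similarity over term-frequency vectors. The flagging threshold of 0.8 and the names `cosine_similarity` and `duplicate_check` are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict


def _term_freq(text: str) -> Counter:
    # Simple tokenizer; stands in for the module's word segmentation.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = _term_freq(a), _term_freq(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def duplicate_check(thesis: str, documents: Dict[str, str], threshold: float = 0.8) -> dict:
    """Compare the local thesis with each received document; nothing is uploaded."""
    scores = {doc_id: cosine_similarity(thesis, text) for doc_id, text in documents.items()}
    flagged = sorted((d for d, s in scores.items() if s >= threshold),
                     key=lambda d: scores[d], reverse=True)
    return {"max_similarity": max(scores.values(), default=0.0),
            "flagged": flagged,
            "scores": scores}
```

Only the returned result dictionary, not the thesis text, would be passed to the client for display.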

Fig. 2 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 2, the sending the micro service module to the client specifically includes:

S21, acquiring client attribute information, and extracting a corresponding micro-service module from a micro-service warehouse according to the attribute information.

The micro-service warehouse is used for storing pre-created and packaged micro-service modules. The client attribute information includes, but is not limited to, device information, operating system information, and network information of the client, where the device information may be a device model, and the operating system information may be an operating system version.

In this embodiment, the micro-service warehouse stores in advance a plurality of micro-service module packages developed for different devices, operating systems or networks. After acquiring the attribute information of the client, the server extracts the corresponding micro-service module from the warehouse according to that information, so that the deployed module matches the characteristics of each type of client and makes better use of the client's device performance.
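A minimal sketch of selecting a package from the micro-service warehouse by client attributes. The warehouse is modeled as an in-memory index keyed by operating system, major OS version and network class; the index contents, the package file names and the names `ClientAttributes` and `select_package` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClientAttributes:
    device_model: str   # could also be part of the key in a finer-grained warehouse
    os_name: str
    os_version: str
    network: str        # e.g. "broadband" or "mobile"


# Hypothetical warehouse index: (os name, major version, network) -> package file.
WAREHOUSE: Dict[Tuple[str, str, str], str] = {
    ("windows", "10", "broadband"): "dupcheck-win10-full.zip",
    ("windows", "10", "mobile"): "dupcheck-win10-lite.zip",
    ("android", "12", "mobile"): "dupcheck-android12-lite.zip",
}


def select_package(attrs: ClientAttributes) -> str:
    """Pick the pre-built package matching the client's attributes."""
    major = attrs.os_version.split(".")[0]
    key = (attrs.os_name.lower(), major, attrs.network)
    try:
        return WAREHOUSE[key]
    except KeyError:
        raise LookupError(f"no micro-service package for {key}") from None
```

A lookup table keeps the choice deterministic and easy to audit; a richer warehouse might fall back to a generic package instead of raising.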

S22, registering the micro-service module according to the authentication information to generate a configuration file.

The server registers the micro-service module according to the authentication information sent by the user of the client: it generates unique identification information for the micro-service module, establishes an association between that unique identification information and the authentication information, and then generates a configuration file for the micro-service module. The configuration file records the version, running state and other information of the micro-service module.
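The registration step can be sketched as below, assuming a UUID as the module's unique identification information, an in-memory dictionary as the association store, and JSON as the configuration file format. `REGISTRY`, `register_module` and `to_config_file` are hypothetical names, not from the source.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict

# Hypothetical association store: module ID -> authentication code.
REGISTRY: Dict[str, str] = {}


def register_module(auth_code: str, package: str, version: str) -> dict:
    """Generate a unique module ID, bind it to the user's authentication
    code, and build the configuration record for the module."""
    module_id = str(uuid.uuid4())
    REGISTRY[module_id] = auth_code
    return {
        "module_id": module_id,
        "package": package,
        "version": version,
        "running_state": "registered",  # updated later as the module runs
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def to_config_file(config: dict) -> str:
    """Serialize the configuration record as JSON file content."""
    return json.dumps(config, indent=2)
```

Because every deployed module carries an ID that maps back to a user, the server can later track, upgrade or revoke it.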

S23, sending the registered micro-service module to the client for deployment.

In this embodiment, only the registered micro service module is sent to the client by the server for deployment, so that the server can track the micro service module subsequently according to the registration information and the configuration file of the micro service module, and upgrade and maintenance of the micro service module are facilitated.

Fig. 3 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 3, when the server does not retrieve the related existing document from the database according to the feature information uploaded by the micro service module, retrieving the related existing document from the third-party server specifically includes:

S31, generating connection information, sending the connection information to the client, and sending the connection information and the feature information to a third-party server.

The connection information is a unique secret key generated when the server needs to retrieve related existing documents from the third-party server, and the server simultaneously sends the connection information to the client and the third-party server, so that in the subsequent steps, the client and the third-party server can perform mutual authentication through the connection information.
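A minimal sketch of step S31 on the server side, assuming the connection information is a random URL-safe token from Python's `secrets` module. The `Outbox` class is a hypothetical stand-in for the network channels to the client and the third-party server; note that the client receives only the key, never the feature information.

```python
import secrets
from dataclasses import dataclass, field
from typing import List


@dataclass
class Outbox:
    """Hypothetical stand-in for a network endpoint: records messages sent to it."""
    name: str
    received: List[dict] = field(default_factory=list)


def start_third_party_retrieval(feature_words: List[str],
                                client: Outbox,
                                third_party: Outbox) -> str:
    """Generate a one-off connection key and distribute it to both parties."""
    connection_info = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
    client.received.append({"connection_info": connection_info})
    third_party.received.append({"connection_info": connection_info,
                                 "features": list(feature_words)})
    return connection_info
```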

S32, the third-party server retrieving related existing documents from a third-party database according to the feature information and, if related existing documents are retrieved, sending the connection information to the client.

In this step, when the third-party server retrieves related existing documents, it sends the connection information it received to the client. If no related existing document is retrieved, the third-party server feeds this retrieval result back to the server, and the server, after receiving the feedback, forwards the retrieval result to the client.

S33, the client performing matching verification on the two pieces of connection information received; if the two match, the client establishes a connection with the third-party server, and the third-party server sends the related existing documents to the client.

In this embodiment, when the third-party server is needed to retrieve related existing documents, the client does not send any information to the third-party server, which keeps the client's local data secure. After the third-party server finishes retrieving the related existing documents, it sends the connection information generated by the server to the client. The client then matches the connection information received from the server against the connection information received from the third-party server. If the two match, the data sent by the third-party server is authentic; the client then establishes a connection with the third-party server and receives the related existing documents from it, so that the thesis text can subsequently be checked for duplication.
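The client-side matching verification of step S33 can be sketched as a small session object that remembers the key from the server and accepts the third-party server only if the key it presents is identical. The constant-time comparison via `hmac.compare_digest` is a design choice added here, and `ClientSession` and its methods are hypothetical names.

```python
import hmac
from typing import Optional


class ClientSession:
    """Hypothetical client-side state for one third-party retrieval."""

    def __init__(self) -> None:
        self.from_server: Optional[str] = None
        self.connected_to: Optional[str] = None

    def on_server_connection_info(self, token: str) -> None:
        # Key delivered directly by the server in step S31.
        self.from_server = token

    def on_third_party_connection_info(self, token: str, third_party: str) -> bool:
        """Connect only if the third party presents the same key as the server."""
        if self.from_server is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(self.from_server.encode(), token.encode()):
            self.connected_to = third_party
            return True
        return False
```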

Fig. 4 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 4, the microservice module receives the related existing documents, calculates the similarity between the paper text and the related existing documents to obtain the duplication checking result, and specifically includes:

S41, sequentially receiving the related existing documents sent by the server, the server having entered the related existing documents into a transmission queue in sequence for sending.

In this embodiment, after the server or the third-party server retrieves the related existing documents according to the feature information uploaded by the micro service module, a transmission queue is created, the retrieved related existing documents are sequentially entered into the transmission queue, and the related existing documents are sent to the client through the transmission queue.

S42, creating a cache region, and storing the received related existing documents in the cache region.

The cache region is created by the micro-service module according to the memory usage of the client, so its size may differ between clients with different memory usage.

In addition, when the total size of the received related existing documents reaches a preset threshold, the micro-service module sends first feedback information to the server or third-party server that is sending the documents; on receiving the first feedback information, that server suspends sending related existing documents to the client. The preset threshold may be set slightly smaller than the maximum capacity of the cache region.
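A minimal sketch of sizing the cache region from the client's memory and deriving the preset threshold slightly below capacity. The fraction of free memory (25%), the headroom factor (90%) and the 1 MiB to 256 MiB clamp are assumed values, and `plan_buffer` is a hypothetical name; how free memory is measured is left to the caller.

```python
from typing import Tuple


def plan_buffer(free_memory_bytes: int,
                fraction: float = 0.25,
                headroom: float = 0.9,
                floor: int = 1 << 20,
                ceiling: int = 256 << 20) -> Tuple[int, int]:
    """Return (cache capacity, pause threshold) in bytes.

    The capacity is a share of currently free memory, clamped to
    [floor, ceiling]; the threshold sits slightly below capacity so the
    sender can be paused before the cache overflows.
    """
    capacity = int(free_memory_bytes * fraction)
    capacity = max(floor, min(capacity, ceiling))
    threshold = int(capacity * headroom)
    return capacity, threshold
```

A client with 400 MiB free would get a 100 MiB cache region under these assumptions, while a large workstation is capped at 256 MiB.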

S43, extracting related existing documents from the cache region to perform similarity calculation with the thesis text while receiving new related existing documents and storing them in the cache region, until all documents in the transmission queue have been sent.

The micro-service module extracts related existing documents from the cache region in sequence and calculates their similarity with the thesis text. Each time a related existing document is extracted, the cache space it occupied is released and the micro-service module sends second feedback information to the server or third-party server, which then resumes sending the remaining related existing documents in its transmission queue to the client. When the size of the received related existing documents reaches the preset threshold again, the micro-service module again sends first feedback information to the server or third-party server. This process repeats until all related existing documents in the transmission queue of the server or third-party server have been sent.
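The pause/resume loop of steps S41 to S43 can be simulated in a single process as below. Documents are modeled only by an ID and a size in bytes, the `compare` callback stands in for the similarity calculation, and `run_flow_control` is a hypothetical name; a real system would exchange the first and second feedback over the network.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple


def run_flow_control(documents: Iterable[Tuple[str, int]],
                     threshold: int,
                     compare: Callable[[str], float]) -> Tuple[List[Tuple[str, float]], int]:
    """Simulate the transmission queue, cache region and feedback messages.

    Returns the per-document results and how many times sending was paused.
    """
    queue: Deque[Tuple[str, int]] = deque(documents)  # sender's transmission queue
    cache: Deque[Tuple[str, int]] = deque()           # client's cache region
    cached_bytes = 0
    paused = False
    pauses = 0
    results: List[Tuple[str, float]] = []
    while queue or cache:
        if queue and not paused:
            # Sender pushes the next queued document into the cache.
            doc_id, size = queue.popleft()
            cache.append((doc_id, size))
            cached_bytes += size
            if cached_bytes >= threshold:
                paused = True  # first feedback: suspend sending
                pauses += 1
            continue
        # Client extracts one document, compares it and frees its space.
        doc_id, size = cache.popleft()
        results.append((doc_id, compare(doc_id)))
        cached_bytes -= size
        paused = False  # second feedback: resume sending
    return results, pauses
```

With four 40-byte documents and a threshold of 100 bytes, sending pauses twice and every document is still compared in queue order.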

Based on the same inventive concept as the foregoing embodiment, fig. 5 is a schematic diagram of an overall structure of a server according to an embodiment of the present invention.

As shown in fig. 5, the server includes a first receiving module 1, a sending module 2, a second receiving module 3, and a retrieving module 4.

The first receiving module 1 is configured to receive authentication information sent by a client, where the authentication information includes an authentication code and instruction information.

The sending module 2 is used for sending the micro-service module to the client, and the client deploys after receiving the micro-service module.

The second receiving module 3 is configured to receive feature information uploaded by the micro service module, where the feature information is obtained by the micro service module by locally preprocessing the paper data at the client, and the preprocessing the paper data is to extract feature words from a paper text.

And the retrieval module 4 is used for retrieving the related existing documents from the database according to the characteristic information and sending the related existing documents to the microservice module.

Optionally, the sending module 2 further includes an obtaining submodule, a registering submodule, and a sending submodule.

The obtaining submodule is used for obtaining client attribute information and extracting a corresponding micro-service module from a micro-service warehouse according to the attribute information.

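The registration submodule is used for registering the micro-service module according to the authentication information to generate a configuration file.

The sending submodule is used for sending the registered micro-service module to the client for deployment.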

Optionally, the retrieving module 4 is further configured to retrieve the relevant existing document from the third-party server when the relevant existing document is not retrieved from the database according to the feature information. The retrieval module 4 further includes a generation sub-module, configured to generate connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server.

And the third-party server is used for retrieving the related existing documents from the third-party database according to the characteristic information, and sending the connection information to the client if the related existing documents are retrieved.

The client is further provided with a matching verification module 5, the matching verification module 5 is used for performing matching verification on the received two pieces of connection information, if the two pieces of connection information are matched, connection is established with a third-party server, and the third-party server sends related existing documents to the client.

Optionally, the microservice module further includes a receiving submodule, a creating submodule, and a calculating submodule.

The receiving submodule is used for sequentially receiving related existing documents sent by the server, and the related existing documents are sequentially recorded into a transmission queue by the server for sending.

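The creating submodule is used for creating a cache region and storing the received related existing documents in the cache region.

The calculating submodule is used for extracting related existing documents from the cache region to perform similarity calculation with the thesis text while receiving new related existing documents and storing them in the cache region, until all documents in the transmission queue have been sent.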

The server is configured to execute the foregoing embodiments, and reference may be made to the foregoing method embodiments for implementing the principles and technical effects, which are not described herein again.

An embodiment of the present invention further provides a thesis management system based on big data, where the system includes the server and the client described in any of the above embodiments.

The above modules may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits (ASICs), one or more microprocessors, or one or more field programmable gate arrays (FPGAs). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit or another processor capable of invoking the program code. As a further example, the modules may be integrated together and implemented as a system on a chip.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A thesis management method based on big data, applied to a server, characterized by comprising the following steps:

2. The big-data-based thesis management method according to claim 1, wherein said sending a micro-service module to a client specifically includes:

and sending the registered micro service module to a client for deployment.

3. A thesis management method based on big data as claimed in claim 1, wherein when no relevant existing documents are retrieved from the database according to the feature information, retrieving relevant existing documents from a third party server specifically comprises:

4. A thesis management method based on big data as claimed in claim 1, wherein said micro service module receives related existing documents, calculates similarity between the thesis text and the related existing documents to obtain duplicate checking result, specifically comprising:

and extracting related existing documents from the buffer area to perform similarity calculation with the thesis text, receiving new related existing documents and storing the new related existing documents into the buffer area until the transmission queue is sent.

5. A server, characterized in that the server comprises:

the second receiving module is used for receiving the feature information uploaded by the micro-service module, wherein the feature information is obtained by the micro-service module through local preprocessing of thesis data on the client side, and the preprocessing of the thesis data is to extract feature words from a thesis text;

6. The server according to claim 5, wherein the sending module further comprises:

the acquisition submodule is used for acquiring the attribute information of the client and extracting a corresponding micro-service module from the micro-service warehouse according to the attribute information;

7. The server according to claim 5, wherein the retrieving module is further configured to retrieve the relevant existing document from the third-party server when the relevant existing document is not retrieved from the database according to the feature information, and the retrieving module further comprises a generating sub-module configured to generate a connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server,

8. The server according to claim 5, wherein the microservice module further comprises:

9. A big data based thesis management system, characterized in that said system comprises a server and a client according to any one of claims 5-8.