
CN111353031A - Thesis management method, server and system based on big data

Info

Publication number: CN111353031A
Application number: CN202010122369.5A
Authority: CN
Inventors: 林瀚; 谷俊; 薛忍霞
Original assignee: Hainan Yizhimai Technology Co ltd
Current assignee: Hainan Yizhimai Technology Co ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2020-06-30
Anticipated expiration: 2040-02-27
Also published as: CN111353031B

Abstract

The invention provides a thesis management method, a server and a system based on big data. The method comprises: receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information; sending a micro-service module to the client, which deploys the module after receiving it; receiving feature information uploaded by the micro-service module, wherein the feature information is obtained by the micro-service module preprocessing thesis data locally at the client, the preprocessing being the extraction of feature words from the thesis text; retrieving related existing documents from a database according to the feature information and sending them to the micro-service module; and the micro-service module receiving the related existing documents, calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sending the duplicate checking result to the client.

Description

Thesis management method, server and system based on big data

Technical Field

The invention relates to the technical field of thesis management, in particular to a thesis management method, a server and a system based on big data.

Background

A thesis is an article that investigates a problem in an academic field and describes the research results. It is both a means of conducting academic research and a tool for reporting results and carrying out academic communication. At present, colleges and universities in China examine students' graduation theses before graduation, and the repetition rate found by duplicate checking is one of the important factors in assessing the quality and originality of a thesis. Many journals also run a duplicate check before publishing a student's paper. The common practice is for the user to send the thesis over the network to websites such as CNKI and Wanfang for duplicate checking, and these websites essentially check plain text. Before the check, the user must upload the thesis over an external network, so confidentiality is poor and the user's unpublished thesis is at considerable risk of being leaked.

Disclosure of Invention

The invention aims to provide a thesis management method, a server and a system based on big data, so as to solve the problems in the prior art that thesis duplicate checking offers poor confidentiality and that a user's unpublished thesis is easily leaked.

The invention provides a thesis management method based on big data in a first aspect, which comprises the following steps:

receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information;

sending the micro-service module to a client, and deploying after receiving the micro-service module by the client;

receiving feature information uploaded by a micro-service module, wherein the feature information is obtained by the micro-service module through local preprocessing of thesis data on a client side, and the preprocessing of the thesis data is to extract feature words from a thesis text;

retrieving related existing documents from a database according to the characteristic information, and sending the related existing documents to the micro-service module;

and the micro-service module receives the related existing documents, calculates the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sends the duplicate checking result to the client.

Further, the sending the micro service module to the client specifically includes:

acquiring client attribute information, and extracting a corresponding micro-service module from a micro-service warehouse according to the attribute information;

registering the micro service module according to the authentication information to generate a configuration file;

and sending the registered micro service module to a client for deployment.

Further, when the related existing documents are not retrieved from the database according to the feature information, retrieving the related existing documents from the third-party server specifically includes:

generating connection information, sending the connection information to the client, and sending the connection information and the characteristic information to a third-party server;

the third-party server retrieves the related existing documents from the third-party database according to the characteristic information, and if the related existing documents are retrieved, the third-party server sends connection information to the client;

and the client performs matching verification on the received two pieces of connection information, if the two pieces of connection information are matched, connection is established with a third-party server, and the third-party server sends related existing documents to the client.

Further, the micro-service module receives the related existing documents, calculates the similarity between the paper text and the related existing documents to obtain the duplication checking result, and specifically includes:

sequentially receiving the related existing documents sent by the server, wherein the server enters the related existing documents into a transmission queue in order for sending;

creating a cache region, and storing the received related existing documents into the cache region;

and extracting related existing documents from the cache region to perform similarity calculation with the thesis text, while receiving new related existing documents and storing them in the cache region, until the transmission queue has been fully sent.

A second aspect of the present invention provides a server, comprising:

the first receiving module is used for receiving authentication information sent by a client, wherein the authentication information comprises an authentication code and instruction information;

the sending module is used for sending the micro-service module to a client, and the client deploys after receiving the micro-service module;

the second receiving module is used for receiving the feature information uploaded by the micro-service module, wherein the feature information is obtained by preprocessing the thesis data locally at the client by the micro-service module, and the preprocessing of the thesis data is to extract feature words from the thesis text;

the retrieval module is used for retrieving related existing documents from the database according to the characteristic information and sending the related existing documents to the microservice module;

the microservice module is used for receiving related existing documents, calculating the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sending the duplicate checking result to the client.

Further, the sending module further includes:

the acquisition submodule is used for acquiring the attribute information of the client and extracting a corresponding micro-service module from the micro-service warehouse according to the attribute information;

the registration submodule is used for registering the micro-service module according to the authentication information to generate a configuration file;

and the sending submodule is used for sending the registered micro-service module to the client for deployment.

Further, the retrieval module is further configured to retrieve the relevant existing document from the third-party server when the relevant existing document is not retrieved from the database according to the feature information, and the retrieval module specifically includes a generation sub-module configured to generate connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server,

the third-party server is used for retrieving related existing documents from a third-party database according to the characteristic information, and sending connection information to the client if the related existing documents are retrieved;

the client is also provided with a matching verification module which is used for matching and verifying the received two pieces of connection information, if the two pieces of connection information are matched, the connection is established with a third-party server, and the third-party server sends related existing documents to the client.

Further, the micro service module specifically further includes:

the receiving submodule is used for sequentially receiving related existing documents sent by the server, and the related existing documents are sequentially recorded into a transmission queue by the server for sending;

the creating submodule is used for creating a cache region and storing the received related existing documents into the cache region;

and the calculation sub-module is used for extracting the related existing documents from the buffer area to perform similarity calculation with the thesis text, receiving the new related existing documents and storing the new related existing documents into the buffer area until the transmission queue is sent.

The third aspect of the present invention provides a big data based thesis management system, which includes the server and the client described in the second aspect.

Compared with the prior art, the invention has the beneficial effects that:

A micro-service module is sent to the client for deployment. The micro-service module preprocesses the thesis text locally at the client to obtain feature information and uploads it; the server retrieves related existing documents from the database according to the feature information and sends them to the micro-service module; and the micro-service module calculates the similarity between the related existing documents and the thesis text to obtain the duplicate checking result. The user can therefore run a duplicate check without uploading the thesis text over the network, which improves the security of the user's unpublished data and can also effectively reduce the load on the server.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description are only preferred embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without inventive efforts.

Fig. 1 is a flowchart illustrating a thesis management method based on big data according to an embodiment of the present invention.

Fig. 2 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 3 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 4 is a flowchart illustrating a big data based thesis management method according to another embodiment of the present invention.

Fig. 5 is a schematic diagram of an overall structure of a server according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the drawings. The illustrated embodiments are provided to explain the invention and not to limit its scope.

Fig. 1 is a schematic flow chart of a thesis management method based on big data according to an embodiment of the present invention.

In this embodiment, the server may be a computer, a server, or other devices; the client can be a computer, a tablet computer, an intelligent handheld terminal and other devices. And the server and the client carry out data interaction through a network.

As shown in fig. 1, the thesis management method based on big data, applied to a server, includes the following steps:

S11, receiving authentication information sent by the client, wherein the authentication information comprises an authentication code and instruction information.

In this embodiment, the instruction information is an instruction to check a thesis text for duplication; in other embodiments, it may instead be an instruction to search for and download existing documents, view personal account information, and the like.

A database is pre-established in the server, on one hand, the database stores account information of all registered users, and the account information at least comprises unique identification information; on the other hand, existing literature data is stored, including but not limited to articles, books, periodicals, scientific reports, patents.

In addition, the client may process the authentication information before sending it to the server. The processing may consist of encrypting the authentication code and the instruction information with an encryption algorithm, adding characters to them, or another form of processing.

S12, sending the micro-service module to the client, which deploys the module after receiving it.

The server sends the packaged micro-service module to the client. Deployment means that the client, after receiving the package, unpacks it and installs it locally so that the micro-service module can run.

S13, receiving feature information uploaded by the micro service module, wherein the feature information is obtained by preprocessing the paper data locally by the micro service module at the client, and the preprocessing of the paper data is to extract feature words from the paper text.

When the user of a client checks a thesis for duplication, the user opens the locally stored thesis text through the client. The micro-service module performs format recognition and word segmentation on the thesis text and extracts the feature words that occur most frequently, so that related existing documents can be retrieved in the subsequent steps according to those feature words. Multiple feature words may be extracted so as to cover the technical scope of the thesis as fully as possible and to avoid missed matches that would make the duplicate checking result inaccurate.
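The frequency-based extraction described above can be illustrated with a minimal Python sketch. The patent does not specify a tokenizer, a stop-word list or the number of feature words, so the function name `extract_feature_words`, the `STOP_WORDS` set and the `top_k` parameter are all assumptions made for illustration; real thesis text in Chinese would additionally need a word segmentation step.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def extract_feature_words(text: str, top_k: int = 5) -> list:
    """Return the top_k most frequent tokens that are not stop words."""
    # Very simple word segmentation: lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    sample = "Graph neural networks learn graph structure. Neural networks on graph data."
    print(extract_feature_words(sample, top_k=3))
```

Only the returned word list would leave the client in this sketch; the thesis text itself stays local, which is the point of the preprocessing step.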

In addition, the client may process the feature information before sending it to the server. The processing may consist of encrypting the feature information with an encryption algorithm, adding characters to it, or another form of processing.

S14, retrieving the related existing documents from the database according to the feature information, and sending the related existing documents to the micro-service module.

After related existing documents have been retrieved from the database for the first time according to the feature information, the server may perform a secondary screening on them. The secondary screening comprises: sorting the initially retrieved existing documents by their similarity to the feature words and, according to a preset value, keeping the existing documents with higher similarity as the objects of the duplicate checking comparison; that is, existing documents whose similarity rank falls outside the preset value are screened out. This avoids comparing the thesis against existing documents of low similarity and improves duplicate checking efficiency.
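As a rough illustration of the secondary screening, the following Python sketch ranks candidate documents and keeps only the top entries. The scoring rule (the fraction of feature words that appear in a document), the function name `secondary_screen` and the dictionary layout of a candidate are assumptions; the patent states only that documents are sorted by similarity to the feature words and cut off at a preset value.

```python
def secondary_screen(candidates, feature_words, preset_value=2):
    """Keep the preset_value candidates that share the most feature words."""
    features = set(feature_words)

    def score(doc):
        # Assumed similarity: share of feature words present in the document text.
        words = set(doc["text"].lower().split())
        return len(features & words) / len(features) if features else 0.0

    # sorted() is stable, so equally scored documents keep their retrieval order.
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:preset_value]


if __name__ == "__main__":
    docs = [
        {"id": "A", "text": "graph neural networks survey"},
        {"id": "B", "text": "cooking recipes"},
        {"id": "C", "text": "neural networks basics"},
    ]
    kept = secondary_screen(docs, ["graph", "neural", "networks"], preset_value=2)
    print([d["id"] for d in kept])
```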

S15, the micro-service module receives the related existing documents, calculates the similarity between the thesis text and the related existing documents to obtain a duplicate checking result, and sends the duplicate checking result to the client.

With the thesis management method based on big data provided by this embodiment, the server sends a micro-service module to the client, where it is deployed locally. The micro-service module preprocesses the thesis text locally at the client, obtains the feature information of the thesis text and uploads it to the server. The server retrieves the related existing documents according to the feature information and sends them to the client, and the micro-service module locally compares the related existing documents with the thesis text and calculates their similarity to obtain the duplicate checking result. The duplicate checking comparison is thus completed locally without uploading the thesis text, so the content of the user's unpublished thesis is better kept confidential. Because the server's computing resources are used mainly for retrieving related existing documents according to the feature information, the load on the server is reduced and its response speed improved, so the user obtains the duplicate checking result more quickly and waits less.
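The patent does not name a similarity measure for the local comparison. As one possible illustration, the Python sketch below uses Jaccard similarity over word 3-shingles, a common choice for near-duplicate text detection; the measure itself and the names `shingles`, `jaccard_similarity` and `duplicate_check` are assumptions, not the method claimed.

```python
def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two shingle sets; 0.0 if either set is empty."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def duplicate_check(thesis_text: str, documents: dict) -> dict:
    """Compare the thesis with each related document, entirely on the client."""
    scores = {doc_id: jaccard_similarity(thesis_text, text)
              for doc_id, text in documents.items()}
    return {"per_document": scores,
            "max_similarity": max(scores.values(), default=0.0)}


if __name__ == "__main__":
    result = duplicate_check("a b c d", {"doc1": "a b c e", "doc2": "x y z w"})
    print(result)
```

In this sketch only the result dictionary would be reported to the user; neither the thesis text nor its shingles are sent to the server.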

Fig. 2 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 2, the sending the micro service module to the client specifically includes:

S21, acquiring the client attribute information, and extracting the corresponding micro-service module from the micro-service warehouse according to the attribute information.

The micro-service warehouse is used for storing pre-created and packaged micro-service modules. The client attribute information includes, but is not limited to, device information, operating system information, and network information of the client, the device information may be a device model, and the operating system information may be an operating system version.

In this embodiment, the micro service warehouse stores a plurality of micro service module packages developed for different devices, operating systems, or networks in advance, and the server extracts corresponding micro service modules from the micro service warehouse according to attribute information after acquiring attribute information of the client, so that the micro service modules can be deployed according to characteristics of different types of clients, and device performance of the clients can be better utilized.
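A minimal Python sketch of selecting a package from the micro-service warehouse by client attributes follows. The patent says only that packages are developed for different devices, operating systems or networks; the `ModulePackage` fields, the package names, the version-tuple matching rule and the `select_module` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModulePackage:
    name: str
    os_name: str
    min_os_version: tuple  # lowest operating system version the package supports


# Hypothetical warehouse contents, for illustration only.
WAREHOUSE = [
    ModulePackage("dupcheck-win", "windows", (10, 0)),
    ModulePackage("dupcheck-linux", "linux", (5, 0)),
    ModulePackage("dupcheck-linux-legacy", "linux", (3, 0)),
]


def select_module(attributes: dict, warehouse=WAREHOUSE):
    """Pick the package for the client's OS with the highest supported baseline."""
    os_name = attributes["os_name"]
    os_version = tuple(attributes["os_version"])
    matches = [p for p in warehouse
               if p.os_name == os_name and p.min_os_version <= os_version]
    if not matches:
        return None  # no package was built for this kind of client
    return max(matches, key=lambda p: p.min_os_version)


if __name__ == "__main__":
    print(select_module({"os_name": "linux", "os_version": (5, 15)}))
```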

S22, registering the micro-service module according to the authentication information to generate a configuration file.

The server registers the micro-service module according to the authentication information sent by the user of the client: it generates unique identification information for the micro-service module, establishes an association between that unique identification information and the authentication information, and then generates a configuration file for the micro-service module. The configuration file records the version, running state and other information of the micro-service module.
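The registration step can be sketched in Python as follows. The in-memory `REGISTRY` dictionary, the use of a random UUID as the unique identification information, and the configuration fields are assumptions chosen for illustration; the patent specifies only that an identifier is generated, associated with the authentication information, and that a configuration file records version and running state.

```python
import uuid

# Hypothetical in-memory store mapping module id -> authentication code.
REGISTRY = {}


def register_module(module_name: str, version: str, auth_code: str,
                    registry=REGISTRY) -> dict:
    """Register one module instance and return its configuration record."""
    module_id = uuid.uuid4().hex      # unique identification information
    registry[module_id] = auth_code   # association with the authentication info
    return {
        "module_id": module_id,
        "name": module_name,
        "version": version,
        "status": "registered",       # running state, updated later by the server
    }


if __name__ == "__main__":
    config = register_module("dupcheck-linux", "1.0.0", "AUTH-123")
    print(config)
```

The returned dictionary stands in for the configuration file; a server could serialize it (for example as JSON) and ship it with the module package.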

S23, sending the registered micro-service module to the client for deployment.

In this embodiment, only the registered micro service module is sent to the client by the server for deployment, so that the server can track the micro service module subsequently according to the registration information and the configuration file of the micro service module, and upgrade and maintenance of the micro service module are facilitated.

Fig. 3 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 3, when the server cannot retrieve the related existing document from the database according to the feature information uploaded by the micro service module, retrieving the related existing document from the third-party server specifically includes:

S31, generating connection information, sending the connection information to the client, and sending the connection information and the feature information to the third-party server.

The connection information is a unique secret key generated when the server needs to retrieve related existing documents from the third-party server, and the server simultaneously sends the connection information to the client and the third-party server, so that the client and the third-party server can perform mutual authentication through the connection information in subsequent steps.

S32, the third-party server retrieves the related existing documents from the third-party database according to the feature information, and if related existing documents are retrieved, the third-party server sends the connection information to the client.

In this step, if the third-party server retrieves related existing documents, it sends the connection information it received to the client. If it retrieves none, it reports that result to the server, and the server, on receiving it, forwards the result to the client.

S33, the client performs matching verification on the two pieces of connection information it has received; if they match, the client establishes a connection with the third-party server, and the third-party server sends the related existing documents to the client.

In this embodiment, when related existing documents must be retrieved by the third-party server, the client does not need to send any information to the third-party server, which protects the client's local data. After the third-party server completes the retrieval, it sends the connection information generated by the server to the client. The client performs matching verification between the connection information received from the server and the connection information received from the third-party server. If the two match, the data sent by the third-party server is treated as trustworthy; the client then establishes a connection with the third-party server and receives the related existing documents, so that the thesis text can subsequently be checked for duplication.
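The matching verification of S31 to S33 can be sketched with the Python standard library. The patent describes the connection information only as a unique secret key; the key length, the use of `secrets.token_hex` to generate it, and the constant-time comparison with `hmac.compare_digest` are assumptions, as are the function names.

```python
import hmac
import secrets


def generate_connection_info() -> str:
    """Server side: create a one-off random key (32 hex characters)."""
    return secrets.token_hex(16)


def connection_info_matches(from_server: str, from_third_party: str) -> bool:
    """Client side: compare the two received keys in constant time."""
    return hmac.compare_digest(from_server, from_third_party)


if __name__ == "__main__":
    key = generate_connection_info()
    # The server sends `key` to the client and to the third-party server;
    # the third-party server later presents its copy to the client.
    if connection_info_matches(key, key):
        print("keys match: accept the connection from the third-party server")
```

A plain `==` comparison would also work functionally; the constant-time comparison is a conventional precaution when comparing secrets.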

Fig. 4 is a flowchart illustrating a thesis management method based on big data according to another embodiment of the present invention.

As shown in fig. 4, the microservice module receives the related existing documents, calculates the similarity between the paper text and the related existing documents to obtain the duplication checking result, and specifically includes:

S41, sequentially receiving the related existing documents sent by the server, wherein the server enters the related existing documents into a transmission queue in order for sending.

In this embodiment, after the server or the third-party server retrieves the related existing documents according to the feature information uploaded by the micro service module, a transmission queue is created, the retrieved related existing documents are sequentially entered into the transmission queue, and the related existing documents are sent to the client through the transmission queue.

S42, creating a cache region, and storing the received related existing documents in the cache region.

The cache region is created by the micro-service module according to the client's local memory usage, so its size may differ between clients with different memory usage.

In addition, when the size of the received related existing document reaches a preset threshold, the micro service module sends first feedback information to the server or the third-party server sending the related existing document, after the server or the third-party server receives the first feedback information, the server or the third-party server suspends sending the related existing document to the client, and the preset threshold may be slightly smaller than the maximum capacity of the cache area.

S43, extracting the related existing documents from the cache region to calculate their similarity with the thesis text, while receiving new related existing documents and storing them in the cache region, until the transmission queue has been fully sent.

The micro-service module extracts the related existing documents from the cache region one by one and calculates their similarity with the thesis text. Each time a related existing document is extracted, the space it occupied in the cache region is released and the micro-service module sends second feedback information to the server or the third-party server, which then continues sending the remaining related existing documents in the transmission queue to the client. When the size of the received related existing documents again reaches the preset threshold, the micro-service module sends the first feedback information to the server or the third-party server once more. This process repeats until all related existing documents in the transmission queue of the server or the third-party server have been sent.
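The pause-and-resume flow of S41 to S43 can be simulated in a single Python process. This is a sketch under simplifying assumptions: the server's transmission queue and the client's cache region are both modeled as `deque` objects, document size is measured by string length, and the feedback messages are recorded in a list instead of being sent over a network. The name `stream_and_compare` and its parameters are hypothetical.

```python
from collections import deque


def stream_and_compare(documents, threshold, compare):
    """Simulate sending, buffering and comparing documents with flow control."""
    transmission_queue = deque(documents)  # server side, in retrieval order
    cache = deque()                        # client-side cache region
    cached_size = 0
    paused = False
    results, events = [], []

    while transmission_queue or cache:
        # Server keeps sending until it is paused or the queue is empty.
        while transmission_queue and not paused:
            doc = transmission_queue.popleft()
            cache.append(doc)
            cached_size += len(doc)
            if cached_size >= threshold:
                paused = True
                events.append("first_feedback")   # ask the sender to pause
        # Client extracts one document, compares it, and frees its space.
        doc = cache.popleft()
        cached_size -= len(doc)
        results.append(compare(doc))
        if paused:
            paused = False
            events.append("second_feedback")      # ask the sender to resume
    return results, events


if __name__ == "__main__":
    docs = ["aaaa", "bbbb", "cccc"]
    print(stream_and_compare(docs, threshold=8, compare=len))
```

Here `compare` stands in for the similarity calculation against the thesis text; passing `len` simply makes the control flow easy to observe.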

Based on the same inventive concept as the foregoing embodiment, fig. 5 is a schematic diagram of an overall structure of a server according to an embodiment of the present invention.

As shown in fig. 5, the server includes a first receiving module 1, a sending module 2, a second receiving module 3, and a retrieving module 4.

The first receiving module 1 is configured to receive authentication information sent by a client, where the authentication information includes an authentication code and instruction information.

The sending module 2 is used for sending the micro-service module to the client, and the client deploys after receiving the micro-service module.

The second receiving module 3 is configured to receive feature information uploaded by the micro service module, where the feature information is obtained by the micro service module by locally preprocessing the paper data at the client, and the preprocessing the paper data is to extract feature words from a paper text.

And the retrieval module 4 is used for retrieving the related existing documents from the database according to the characteristic information and sending the related existing documents to the microservice module.

Optionally, the sending module 2 further includes an obtaining sub-module, a registering sub-module, and a sending sub-module.

The obtaining submodule is used for obtaining client attribute information and extracting a corresponding micro-service module from a micro-service warehouse according to the attribute information.

And the registration submodule is used for registering the micro-service module according to the authentication information to generate a configuration file.

Optionally, the retrieving module 4 is further configured to retrieve the relevant existing document from the third-party server when the relevant existing document is not retrieved from the database according to the feature information. The retrieval module 4 further includes a generation sub-module, configured to generate connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server.

And the third-party server is used for retrieving the related existing documents from the third-party database according to the characteristic information, and sending the connection information to the client if the related existing documents are retrieved.

The client is further provided with a matching verification module 5, the matching verification module 5 is used for performing matching verification on the received two pieces of connection information, if the two pieces of connection information are matched, connection is established with a third-party server, and the third-party server sends related existing documents to the client.

Optionally, the microservice module further includes a receiving submodule, a creating submodule, and a calculating submodule.

The receiving submodule is used for sequentially receiving related existing documents sent by the server, and the related existing documents are sequentially recorded into a transmission queue by the server for sending.

The creating submodule is used for creating a cache region and storing the received related existing documents into the cache region.

The server is configured to execute the foregoing embodiments, and reference may be made to the foregoing method embodiments for implementing the principles and technical effects, which are not described herein again.

An embodiment of the present invention further provides a thesis management system based on big data, where the system includes the server and the client described in any of the above embodiments.

The above modules may be one or more integrated circuits configured to implement the above methods, for example one or more application-specific integrated circuits, one or more microprocessors, or one or more field programmable gate arrays. As another example, when some of the above modules are implemented in the form of program code invoked by a processing element, the processing element may be a general-purpose processor, such as a central processing unit or another processor that can invoke the program code. As a further example, the modules may be integrated together and implemented as a system on a chip.

In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A thesis management method based on big data is applied to a server and is characterized by comprising the following steps:

2. The big-data-based thesis management method according to claim 1, wherein said sending a micro-service module to a client specifically includes:

and sending the registered micro service module to a client for deployment.

3. A thesis management method based on big data as claimed in claim 1, wherein when no relevant existing documents are retrieved from the database according to the feature information, retrieving relevant existing documents from a third party server specifically comprises:

4. A thesis management method based on big data as claimed in claim 1, wherein said micro service module receives related existing documents, calculates similarity between the thesis text and the related existing documents to obtain duplicate checking result, specifically comprising:

5. A server, characterized in that the server comprises:

6. The server according to claim 5, wherein the sending module further comprises:

7. The server according to claim 5, wherein the retrieving module is further configured to retrieve the relevant existing document from the third-party server when the relevant existing document is not retrieved from the database according to the feature information, and the retrieving module further comprises a generating sub-module configured to generate a connection information, send the connection information to the client, and send the connection information and the feature information to the third-party server,

8. The server according to claim 5, wherein the microservice module further comprises:

9. A big data based thesis management system, characterized in that said system comprises a server and a client according to any one of claims 5-8.