WO2022041714A1

WO2022041714A1 - Document processing method and apparatus, electronic device, storage medium, and program

Info

Publication number: WO2022041714A1
Application number: PCT/CN2021/083679
Authority: WO
Inventors: 陈嘉航
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2020-08-28
Filing date: 2021-03-29
Publication date: 2022-03-03
Also published as: CN112099870A; CN112099870B

Abstract

A document processing method and apparatus, a device, and a computer-readable storage medium. The method comprises: obtaining a document to be processed (201); receiving a configuration file sent by a third-party platform (202), the configuration file comprising an identifier of a target feature of said document and path information of a file package provided by the third-party platform, and the file package comprising first information representing a feature extraction method of the target feature; under the condition that the identifier of the target feature is different from an identifier of a default feature, obtaining the file package on the basis of the path information of the file package (203); and extracting the target feature from said document on the basis of the first information in the file package (204).

Description

Document processing method, apparatus, electronic device, storage medium and program

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on the Chinese patent application with the application number of 202010884957.2 and the filing date of August 28, 2020, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is incorporated herein by reference.

TECHNICAL FIELD

The present application relates to the field of document management of financial technology (Fintech), and relates to, but is not limited to, a document processing method, apparatus, electronic device, computer-readable storage medium and computer program.

BACKGROUND

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech).

At present, in the field of fintech, in order to facilitate document management, it is necessary to extract the features of documents and perform document management based on those features. However, when the features to be extracted are not default features but new features, new program code must be written and run to extract the new features, which increases time and labor costs.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a document processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program.

The technical solutions of the embodiments of the present application are implemented as follows:

The embodiment of the present application provides a document processing method, the method includes:

Obtaining a document to be processed;

Receiving a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform, and the file package includes first information characterizing a feature extraction method of the target feature;

When the identifier of the target feature is different from the identifier of the default feature, acquiring the file package based on the path information of the file package;

Based on the first information in the file package, the target feature is extracted from the document to be processed.

In some embodiments of the present application, the file package includes a custom class, and the first information is located in the custom class;

The method further includes: loading the custom class in the file package through a reflection mechanism of a programming language, and acquiring the first information from the loaded custom class.

It can be seen that, in the embodiment of the present application, the custom class in the file package can be loaded through the reflection mechanism of the programming language. That is, regardless of whether the custom class in the file package is known in advance, the reflection mechanism allows the custom class to be loaded without importing it beforehand; when file packages are received in real time, the custom classes in them can therefore be loaded dynamically.
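The dynamic loading described above can be illustrated with a minimal Java sketch. It assumes that the file package is a jar and that the custom class has a no-argument constructor; the class name `ReflectiveLoader` and its method names are hypothetical and are not taken from the application.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class ReflectiveLoader {

    // Builds a class loader over a file package (assumed here to be a jar)
    // found at the path taken from the configuration file.
    static URLClassLoader loaderFor(Path jarPath) {
        try {
            URL url = jarPath.toUri().toURL();
            return new URLClassLoader(new URL[] {url}, ReflectiveLoader.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad file package path: " + jarPath, e);
        }
    }

    // Loads a class by name at run time and creates an instance through its
    // no-argument constructor; no compile-time import of the class is needed.
    static Object instantiate(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

In such a sketch, a caller would pass the loader returned by `loaderFor` to `instantiate` together with the class name, and would keep the loader open for as long as the loaded class is in use.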

In some embodiments of the present application, the configuration file further includes second information, where the second information includes: an identifier of the file package and/or an identifier of the custom class;

The loading of the custom class in the file package through the reflection mechanism of the programming language includes:

In the case where it is determined that the second information in the configuration file is information pre-agreed with the third-party platform, the custom class in the file package is loaded through the reflection mechanism of the programming language.
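As a rough illustration of this check, the following Java sketch compares the second information against identifiers agreed in advance before any class is loaded. The identifier values and the name `AgreedInfoCheck` are hypothetical; the application does not specify how the pre-agreed information is stored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical check for illustration only.
final class AgreedInfoCheck {
    // Assumed identifiers agreed with the third-party platform in advance.
    private static final Set<String> AGREED_PACKAGES =
            new HashSet<>(Arrays.asList("feature-ext.jar"));
    private static final Set<String> AGREED_CLASSES =
            new HashSet<>(Arrays.asList("com.example.CustomFeature"));

    // The second information may carry a package identifier, a class
    // identifier, or both. At least one must be present, and every
    // identifier that is present must be pre-agreed.
    static boolean isPreAgreed(String packageId, String className) {
        if (packageId == null && className == null) {
            return false;
        }
        boolean packageOk = packageId == null || AGREED_PACKAGES.contains(packageId);
        boolean classOk = className == null || AGREED_CLASSES.contains(className);
        return packageOk && classOk;
    }
}
```

Only when `isPreAgreed` returns true would the custom class be loaded through reflection.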

It can be seen that if the second information in the configuration file is the information agreed with the third-party platform in advance, the second information in the configuration file is correct. Loading the custom class on this basis helps to accurately obtain the first information from the custom class and, in turn, to accurately extract the target feature.

In some embodiments of the present application, the method further includes:

obtaining a preset encryption method of the second information;

Decrypting the encrypted information in the configuration file based on the decryption method corresponding to the encryption method of the second information to obtain the second information, wherein the encrypted information is obtained by encrypting the second information based on the encryption method.

It can be seen that in the embodiment of the present application, after the configuration file sent by the third-party platform is received, decryption can be performed based on the decryption method corresponding to the preset encryption method. Encrypted transmission of the second information can therefore be realized, which helps to improve the security of the second information and reduces the risk of the second information being attacked.
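The application does not name a specific encryption method. Purely as an assumption for illustration, the sketch below uses AES in GCM mode with a key shared in advance, and transmits the random IV in front of the ciphertext in Base64 form; the class name `SecondInfoCipher` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; AES-GCM is an assumed choice, not the application's.
final class SecondInfoCipher {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Third-party side: encrypts the second information and prepends the IV.
    static String encrypt(String plain, byte[] key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Electronic-device side: splits off the IV and decrypts the remainder.
    static String decrypt(String encoded, byte[] key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            byte[] iv = Arrays.copyOfRange(in, 0, IV_BYTES);
            byte[] ct = Arrays.copyOfRange(in, IV_BYTES, in.length);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(ct), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }
}
```

The key must be 16, 24 or 32 bytes long for AES. With GCM, a tampered ciphertext makes `decrypt` fail rather than return altered second information.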

In some embodiments of the present application, the document processing method further includes:

Predetermining an abstract class, and setting the custom class to inherit the predetermined abstract class;

The obtaining the first information from the loaded custom class includes:

The custom class is instantiated as an object, and when the object belongs to the abstract class, the first information is obtained from the loaded custom class.

It can be seen that in the embodiment of the present application, when the object instantiated from the custom class belongs to the abstract class, the custom class can be considered correct. On this basis, the first information can be accurately obtained from the custom class, which in turn facilitates accurate extraction of the target feature.
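A minimal Java sketch of this check follows. The abstract class `AbstractFeature`, the example custom class `WordCountFeature`, and the method signatures are assumptions made for illustration; only the method names `doCalculator` and `featureCalculate` come from the description.

```java
// Hypothetical abstract class that custom classes are required to inherit.
abstract class AbstractFeature {
    // First method: computes a raw value from the document text.
    abstract double doCalculator(String document);

    // Second method: post-processes the raw value (identity by default).
    double featureCalculate(double raw) {
        return raw;
    }
}

// Example custom class, as a third-party platform might ship it.
class WordCountFeature extends AbstractFeature {
    @Override
    double doCalculator(String document) {
        String trimmed = document.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

final class CustomClassCheck {
    // Instantiates the named class by reflection and accepts it only if the
    // resulting object belongs to the predetermined abstract class.
    static AbstractFeature loadChecked(String className) {
        try {
            Object obj = Class.forName(className).getDeclaredConstructor().newInstance();
            if (!(obj instanceof AbstractFeature)) {
                throw new IllegalArgumentException(
                        className + " does not inherit AbstractFeature");
            }
            return (AbstractFeature) obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }
}
```

A class that loads successfully but does not inherit the abstract class is rejected before any of its methods are called.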

In some embodiments of the present application, the method further includes:

In the case that the identifier of the target feature is the same as the identifier of the default feature, the target feature is extracted from the document to be processed based on a predetermined extraction method of the default feature.

It can be seen that, when the target feature is the default feature, the embodiment of the present application does not need to obtain the extraction method of the target feature from the third-party platform; the target feature can be extracted based on the predetermined extraction method of the default feature, which is easy to implement.

In some embodiments of the present application, the method further includes:

Performing quality scoring on the document to be processed based on the target feature to obtain a quality score value of the document to be processed.

It can be seen that the embodiments of the present application can implement the quality assessment of the document to be processed on the basis of the target feature, which is beneficial to realize the management of the document to be processed on the basis of the quality assessment of the document to be processed.

In some embodiments of the present application, the target feature includes at least two features; the configuration file includes weight information of each of the at least two features;

Performing a quality score on the document to be processed based on the target feature to obtain a quality score value of the document to be processed, including:

Based on the weight information of each of the at least two features, a weighted sum operation is performed on each of the at least two features to obtain a quality score value of the document to be processed.

It can be seen that the embodiment of the present application can implement the quality assessment of the document to be processed by performing weighted summation of each feature of the target feature, which is beneficial to realize the management of the document to be processed based on the quality assessment of the document to be processed.
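The weighted sum can be sketched in Java as follows, assuming that feature values and weights are both keyed by the feature identifier. The class name `QualityScore` and the map-based representation are illustrative assumptions.

```java
import java.util.Map;

// Hypothetical scorer for illustration only.
final class QualityScore {

    // Computes sum(weight_i * value_i) over all extracted features.
    // Every feature must have a weight in the configuration.
    static double weightedSum(Map<String, Double> featureValues,
                              Map<String, Double> weights) {
        double score = 0.0;
        for (Map.Entry<String, Double> entry : featureValues.entrySet()) {
            Double weight = weights.get(entry.getKey());
            if (weight == null) {
                throw new IllegalArgumentException(
                        "No weight configured for feature " + entry.getKey());
            }
            score += weight * entry.getValue();
        }
        return score;
    }
}
```

For example, feature values of 1.0 and 0.5 with weights of 0.25 and 0.5 give a score of 0.25 * 1.0 + 0.5 * 0.5 = 0.5.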

In some embodiments of the present application, the extracting the target feature from the document to be processed includes:

Discretizing the word count of the document to be processed according to a plurality of predetermined word count intervals to obtain a length-related feature, where each word count interval corresponds to a value; extracting a document feature vector of the document to be processed, and using the cosine similarity between the document feature vector of the document to be processed and a document feature vector of a preset template as a template-related feature; determining a part-of-speech-related feature according to the ratio of the number of words of a preset part of speech in the document to be processed to the number of all words in the document to be processed;

At least two of length-related features, template-related features, and part-of-speech-related features are used as the target features.

It can be seen that the embodiments of the present application can implement the quality evaluation of the document to be processed based on the length-related features, template-related features, and part-of-speech features, that is, the quality of the to-be-processed document can be accurately evaluated from multiple aspects.
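The three features above reduce to simple computations, sketched here in Java. The interval boundaries, the values assigned to each interval, and the way document feature vectors are produced are not specified in the application; the class name `TextFeatures` and all parameters are assumptions.

```java
// Hypothetical feature computations for illustration only.
final class TextFeatures {

    // Length-related feature: maps a count to the value of the interval it
    // falls in. upperBounds must be ascending (exclusive bounds), and values
    // needs one more entry than upperBounds for the last, open interval.
    static double discretize(int count, int[] upperBounds, double[] values) {
        if (values.length != upperBounds.length + 1) {
            throw new IllegalArgumentException("values must have upperBounds.length + 1 entries");
        }
        for (int i = 0; i < upperBounds.length; i++) {
            if (count < upperBounds[i]) {
                return values[i];
            }
        }
        return values[upperBounds.length];
    }

    // Template-related feature: cosine similarity of two feature vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have equal length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity with a zero vector is treated as 0
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Part-of-speech-related feature: share of words with the preset tag.
    static double posRatio(int wordsWithPresetPos, int totalWords) {
        return totalWords == 0 ? 0.0 : (double) wordsWithPresetPos / totalWords;
    }
}
```

With assumed intervals [0, 100), [100, 500) and [500, ∞) mapped to 0.2, 0.6 and 1.0, a 150-word document receives a length-related feature of 0.6.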

In some embodiments of the present application, the extracting the target feature from the document to be processed includes:

Discretizing the word count of the document to be processed according to a plurality of predetermined word count intervals to obtain a first feature, where each word count interval corresponds to a value; discretizing the average sentence length of the document to be processed according to a plurality of predetermined sentence length intervals to obtain a second feature, where each sentence length interval corresponds to a value; using the number of errors in the document to be processed as the independent variable of an exponential function, and using the resulting value of the exponential function as a third feature; discretizing the number of advanced words in the document to be processed according to a plurality of predetermined advanced word count intervals to obtain a fourth feature, where each advanced word count interval corresponds to a value, and an advanced word is a word contained in a predetermined advanced vocabulary list;

At least two of the first feature, the second feature, the third feature, and the fourth feature are used as the target feature.

It can be seen that the embodiment of the present application can evaluate the quality of the document to be processed based on the first feature, the second feature, the third feature and the fourth feature. Because these are four different features, the embodiments of the present application can accurately evaluate the quality of the document to be processed from multiple aspects.
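The second and third features can be sketched in Java as follows. The application only states that the error count is the independent variable of an exponential function; the decaying form exp(-decay × errors), the `decay` parameter, and the class name `ErrorFeature` are assumptions chosen so that more errors yield a lower feature value.

```java
// Hypothetical feature computations for illustration only.
final class ErrorFeature {

    // Third feature (assumed form): exp(-decay * errorCount), which is 1.0
    // for an error-free document and falls towards 0 as errors increase.
    static double fromErrorCount(int errorCount, double decay) {
        if (errorCount < 0 || decay < 0) {
            throw new IllegalArgumentException("errorCount and decay must be non-negative");
        }
        return Math.exp(-decay * errorCount);
    }

    // Input to the second feature: average sentence length in words, which
    // would then be discretized by predetermined sentence length intervals.
    static double averageSentenceLength(int wordCount, int sentenceCount) {
        return sentenceCount == 0 ? 0.0 : (double) wordCount / sentenceCount;
    }
}
```

A different base or sign could equally satisfy the wording of the application; the decaying form is shown only because it keeps the feature in the range (0, 1].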

An embodiment of the present application provides a document processing device, and the device includes:

The first obtaining module is configured to obtain the document to be processed;

a receiving module, configured to receive a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform, and the file package includes first information characterizing a feature extraction method of the target feature;

a second acquiring module, configured to acquire the file package based on the path information of the file package when the identifier of the target feature is different from the identifier of the default feature;

A processing module, configured to extract the target feature from the document to be processed based on the first information in the file package.

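In some embodiments of the present application, the file package includes a custom class, and the first information is located in the custom class; the second obtaining module is further configured to load the custom class in the file package through the reflection mechanism of the programming language, and obtain the first information from the loaded custom class.

In some embodiments of the present application, the configuration file further includes second information, where the second information includes an identifier of the file package and/or an identifier of the custom class; the second obtaining module is configured to load the custom class in the file package through the reflection mechanism of the programming language in the case where it is determined that the second information in the configuration file is information pre-agreed with the third-party platform.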

In some embodiments of the present application, the second obtaining module is further configured to obtain a preset encryption method of the second information, and to decrypt the encrypted information in the configuration file based on the decryption method corresponding to the encryption method of the second information to obtain the second information, wherein the encrypted information is obtained by encrypting the second information based on the encryption method.

In some embodiments of the present application, the second obtaining module is further configured to predetermine an abstract class, and set the custom class to inherit the predetermined abstract class;

The second obtaining module is configured to instantiate the custom class as an object and, when the object belongs to the abstract class, obtain the first information from the loaded custom class.

In some embodiments of the present application, the processing module is further configured to, in the case that the identifier of the target feature is the same as the identifier of the default feature, extract the target feature from the document to be processed based on the predetermined extraction method of the default feature.

In some embodiments of the present application, the processing module is further configured to perform a quality score on the document to be processed based on the target feature, and obtain a quality score value of the document to be processed.

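In some embodiments of the present application, the target feature includes at least two features, and the configuration file includes weight information of each of the at least two features; the processing module is configured to perform, based on the weight information of each of the at least two features, a weighted sum operation on the at least two features to obtain the quality score value of the document to be processed.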

In some embodiments of the present application, the processing module is configured to extract the target feature from the document to be processed in the same manner as described above for the document processing method.

An embodiment of the present application provides an electronic device, and the electronic device includes:

a memory configured to store executable instructions;

The processor is configured to implement any one of the above document processing methods when executing the executable instructions stored in the memory.

Embodiments of the present application provide a computer-readable storage medium storing executable instructions for implementing any one of the foregoing document processing methods when executed by a processor.

An embodiment of the present application provides a computer program, including computer-readable code, when the computer-readable code is executed in an electronic device, the processor in the electronic device executes any one of the above document processing methods.

In this embodiment of the present application, a document to be processed is obtained; a configuration file sent by a third-party platform is received, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform, and the file package includes first information that characterizes the feature extraction method of the target feature; when the identifier of the target feature is different from the identifier of the default feature, the file package is acquired based on the path information of the file package; and the target feature is extracted from the document to be processed based on the first information in the file package. It can be seen that, in the embodiment of the present application, when the target feature of the document to be processed needs to be extracted and the target feature is not the default feature, no new program code needs to be written and run locally in order to extract the target feature; instead, the extraction method of the target feature can be obtained directly from the third-party platform, which reduces time and labor costs to a certain extent.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description, serve to explain the technical solutions of the present disclosure.

FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application;

FIG. 2 is an optional flowchart of the document processing method provided by the embodiment of the present application;

FIG. 3 is a flowchart of realizing the encrypted transmission of information in the configuration file in the embodiment of the present application;

FIG. 4 is another optional flowchart of the document processing method provided by the embodiment of the present application;

FIG. 5 is a schematic diagram of an optional composition structure of a document processing apparatus according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an optional composition structure of an electronic device provided by an embodiment of the present application.

DETAILED DESCRIPTION

In the related art, plan documents can only be managed with a scheme similar to a library document management system, which supports uploading and downloading documents but cannot evaluate document quality. Under this management mode, documents can be uploaded to the document library at will, which may cause the quality of the documents in the document library to be uneven. With the development of individuals, enterprises, and society, there will be more and more documents in the document library.

In the related art, document management can be achieved by manually evaluating document quality. However, this greatly increases labor costs, each person's document evaluation criteria cannot be retained as experience, and manual evaluation is highly subjective and not objective enough. In the related art, the features of a certain type of document can also be extracted based on feature engineering, and the document quality can then be evaluated based on the extracted features; for example, the type of document may be English compositions, Chinese compositions, and so on. For different types of documents, different types of features may need to be extracted. Therefore, in order to extract different types of features, different feature extraction models or different feature libraries need to be developed and deployed, and for each feature extraction model, new program code needs to be written and deployed locally, which increases time and labor costs.

In view of the above technical problems, the technical solutions of the embodiments of the present application are proposed.

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail below with reference to the accompanying drawings. All other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present application.

The embodiments of the present application provide a document processing method, apparatus, device, and computer-readable storage medium. The document processing method provided by the embodiments of the present application can be applied to an electronic device. Exemplary applications of the electronic device are described below: the electronic device provided by the embodiments of the present application can be implemented as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and so on.

FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application. As shown in FIG. 1, the electronic device 100 may connect to the third-party platform 102 through the network 101; the network 101 may be a wide area network, a local area network, or a combination of the two. The third-party platform 102 can be implemented based on a terminal and/or a server. The terminal can be a tablet computer, a notebook computer, a desktop computer, etc., but is not limited to these. The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.

In some embodiments of the present application, the third-party platform 102 may acquire the document to be processed and send the document to be processed to the electronic device 100. The document to be processed may be of any type: in some embodiments, the document to be processed may be a Chinese document, an English document or a document in another language; in some embodiments, the document to be processed may be a plan document, log data of an electronic device, or another document. It should be noted that the above merely gives examples of the types of the document to be processed, and the embodiments of the present application are not limited thereto.

In some embodiments of the present application, the electronic device 100 may obtain the document to be processed locally, or download the document to be processed from the network 101; the electronic device 100 may send the document to be processed to the third-party platform 102.

After acquiring the document to be processed, the third-party platform 102 can determine the target feature of the document to be processed and the feature extraction method of the target feature, and generate a configuration file. The configuration file includes at least the identifier of the target feature of the document to be processed and the path information of the file package provided by the third-party platform 102; the file package includes the first information representing the feature extraction method of the target feature. Here, the first information may be program code implementing the feature extraction method of the target feature.

In this embodiment of the present application, the third-party platform 102 determines the target feature according to the actual feature extraction requirement. Here, the target feature may be one feature or may include multiple features. In this embodiment of the present application, the identifier of the target feature may be a name, a serial number, or other identifiers.

In this embodiment of the present application, the file package may include a code collection that provides at least one function in an object-oriented programming language; exemplarily, the object-oriented programming language may be the JAVA language, the C++ language, etc. When the object-oriented programming language is the JAVA language, the above file package may be a jar package.

The third-party platform 102 may send the configuration file and file package to the electronic device 100.

In this embodiment of the present application, the path information of the file package may indicate the storage location of the file package in the electronic device 100; the electronic device 100 may determine the storage location of the file package according to the configuration file, extract the first information in the file package, and extract the target feature from the document to be processed according to the first information.

The document processing method according to the embodiment of the present application is exemplarily described below with reference to the application scenario shown in FIG. 1 .

FIG. 2 is an optional flowchart of a document processing method provided by an embodiment of the present application. As shown in FIG. 2 , the flowchart may include:

Step 201: Obtain the document to be processed.

Step 202: Receive the configuration file sent by the third-party platform.

Here, the implementation manners of steps 201 to 202 have been described in the above-mentioned contents, and are not repeated here.

Step 203: In the case that the identifier of the target feature is different from the identifier of the default feature, acquire the file package based on the path information of the file package.

In this embodiment of the present application, the default feature is a feature predetermined by the electronic device, and for the default feature, the extraction method of the default feature is also predetermined.

If the identifier of the target feature is different from that of the default feature, the target feature is not the default feature, and a feature extraction method needs to be determined for the target feature. In this case, the file package can be read based on the path information of the file package in the configuration file.
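The decision made in step 203 can be sketched in Java as follows. The configuration fields, the default feature identifiers, and the class names `FeatureConfig` and `FeatureDispatch` are hypothetical; the application does not prescribe a configuration file format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical view of the two configuration entries used in step 203.
final class FeatureConfig {
    final String targetFeatureId;
    final String packagePath;

    FeatureConfig(String targetFeatureId, String packagePath) {
        this.targetFeatureId = targetFeatureId;
        this.packagePath = packagePath;
    }
}

final class FeatureDispatch {
    // Assumed identifiers of the features the device can extract by default.
    private static final Set<String> DEFAULT_FEATURE_IDS =
            new HashSet<>(Arrays.asList("wordCount", "sentenceLength"));

    // The file package is needed only when the target feature is not a
    // default feature.
    static boolean needsFilePackage(FeatureConfig config) {
        return !DEFAULT_FEATURE_IDS.contains(config.targetFeatureId);
    }

    // Describes where the extraction method comes from for this config.
    static String extractionSource(FeatureConfig config) {
        return needsFilePackage(config)
                ? "file package at " + config.packagePath
                : "built-in default extractor";
    }
}
```

In a fuller implementation, the first branch would go on to read the file package at `packagePath`, while the second would call the predetermined extraction method directly.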

Step 204: Extract target features from the document to be processed based on the first information in the file package.

In the embodiment of the present application, the first information represents the feature extraction method of the target feature. Therefore, based on the first information, the feature extraction method of the target feature can be determined, and further, the target feature can be extracted from the document to be processed.

In some embodiments of the present application, the feature extraction method of the target feature is implemented based on a natural language processing (Natural Language Processing, NLP) method or other document processing methods. In some embodiments, the feature extraction method of the target feature may include a first method and a second method, wherein the first method may be denoted as a doCalculator method, and the second method may be denoted as a featureCalculate method.

In the embodiment of the present application, processing the document to be processed based on the first method may include: 1) using an NLP method to segment the document to be processed into words and then collecting word-granularity statistics; 2) using an NLP method to split the document to be processed into sentences and then collecting sentence-granularity statistics; 3) removing high-frequency words and modal particles as denoising processing; 4) extracting data such as the main title, subtitles, and font size from the document to be processed, for example by using Apache POI, the JAVA application programming interface for Microsoft documents.

In some embodiments, different language packages may be used to process the document to be processed according to its language. For example, when the document to be processed is a Chinese document, a Chinese language processing package (Han Language Processing, HanLP) may be used to perform word segmentation or sentence segmentation on the document to be processed; when the document to be processed is an English document, an English language processing package may be used to perform word segmentation or sentence segmentation on the document to be processed.
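Word-granularity and sentence-granularity statistics of the kind listed above can be sketched with the JDK's `java.text.BreakIterator`, used here only as a stand-in for the NLP packages mentioned in the description. It is adequate for English-like text but is not a substitute for a dedicated Chinese segmenter such as HanLP; the class name `Segmentation` is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical statistics helper built on the JDK's locale-aware iterators.
final class Segmentation {

    // Counts word tokens, skipping segments made only of spaces or punctuation.
    static int countWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (containsLetterOrDigit(text, start, end)) {
                count++;
            }
        }
        return count;
    }

    // Counts sentences, skipping segments that are blank.
    static int countSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    private static boolean containsLetterOrDigit(String text, int start, int end) {
        for (int i = start; i < end; i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Passing the locale as a parameter mirrors the idea of choosing a language package according to the language of the document.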

In the embodiment of the present application, after the document to be processed is processed based on the first method, a preliminary processing result of the document to be processed can be obtained, and the preliminary processing result includes feature values. The preliminary processing result can then be further processed based on the second method; for example, based on the second method, discrete feature values may be normalized, and continuous feature values may be averaged.
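The post-processing attributed to the second method can be sketched as follows. The application does not state which normalization is used; min-max normalization is assumed here purely for illustration, and the class name `FeaturePostProcessing` is hypothetical.

```java
// Hypothetical post-processing helpers for illustration only.
final class FeaturePostProcessing {

    // Assumed min-max normalization of a discrete feature value into [0, 1],
    // given the smallest and largest values the feature can take.
    static double normalize(double value, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min);
    }

    // Arithmetic mean of continuous feature values; 0.0 for an empty input.
    static double average(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

For instance, a discrete value of 3 on a scale from 1 to 5 normalizes to (3 - 1) / (5 - 1) = 0.5.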

It should be noted that the above-mentioned contents are merely illustrative for the implementation of the first method and the second method, and the embodiments of the present application are not limited thereto.

In some embodiments of the present application, when the first information is a program code for implementing a feature extraction method for a target feature, the program code may be executed to obtain the target feature.

In practical applications, steps 201 to 204 may be implemented based on a processor of an electronic device, and the above-mentioned processor may be at least one of an Application-Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It can be understood that other electronic components may also implement the function of the above processor, which is not limited in the embodiment of the present application.

It can be seen that, in the embodiment of the present application, when the target feature of the document to be processed needs to be extracted and the target feature is not the default feature, no new program code needs to be written and run locally in order to extract the target feature; instead, the extraction method of the target feature can be obtained directly from the third-party platform, which reduces time and labor costs to a certain extent.

Further, if the target feature needs to be modified, added or deleted, the third-party platform can modify, add or delete the identifier of the target feature in the configuration file and modify the content of the file package, so that the electronic device can extract the target feature directly based on the received configuration file and file package without new program code being written and run locally.

In some embodiments of the present application, the above-mentioned file package includes a custom class, and the above-mentioned first information is located in the custom class.

Here, a class in the file package is the collective name for a set of objects with the same attributes and behaviors in an object-oriented programming language. An object is an abstraction of an objective thing, and a class is an abstraction of objects, that is, an abstract data type. After customizing the class, the third-party platform can set the first information in the custom class.

In the embodiment of the present application, the custom class in the file package can be loaded through the reflection mechanism of the programming language, and the first information can be obtained from the loaded custom class.

Here, the reflection mechanism of a programming language refers to the ability of a program to access, inspect and modify its own state or behavior. In an example, the reflection mechanism of the JAVA language means that, while the program is running, an object of any class can be constructed, the class to which any object belongs can be determined, the member variables and methods of any class can be obtained, and the properties and methods of any object can be called. This function of dynamically obtaining program information and dynamically calling objects is called the reflection mechanism of the JAVA language.

In the current JAVA related technologies, if a third-party method is to be used, the classes in the file package are usually loaded by means of import. However, before import is used, the classes to be imported from the file package must be specified, so the classes of the file package need to be known in advance; if the classes of the file package are unknown, they cannot be loaded by means of import, and the classes in a file package received in real time cannot be dynamically loaded.

However, in the embodiment of the present application, the custom class in the file package is loaded through the reflection mechanism of the programming language. That is, regardless of whether the custom class in the file package is known in advance, it can be loaded based on the reflection mechanism without being imported in advance; when the file package is received in real time, the custom class in the file package can be dynamically loaded.
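As a minimal sketch of this kind of dynamic loading (not the implementation of the present application), the following Java code resolves a class from its name at run time with `java.net.URLClassLoader` and instantiates it through reflection. The names `ReflectiveLoader` and `loadAndInstantiate`, and the assumption that the custom class has a no-argument constructor, are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical helper: loads a class that was not known at compile time.
final class ReflectiveLoader {

    // jarPath: location of the file package; className: name of the custom class.
    static Object loadAndInstantiate(String jarPath, String className) {
        try {
            URL jarUrl = Paths.get(jarPath).toUri().toURL();
            // The loader is left open so that classes the custom class depends on
            // can still be resolved lazily from the same jar.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] {jarUrl}, ReflectiveLoader.class.getClassLoader());
            // Reflection: the class is resolved from its name only, no import needed.
            Class<?> cls = Class.forName(className, true, loader);
            // Assumption: the custom class exposes a no-argument constructor.
            return cls.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException(
                    "Cannot load " + className + " from " + jarPath, e);
        }
    }
}
```

Because the class name is an ordinary string, it can be read from a configuration file received at run time rather than fixed in the source code.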

In some embodiments of the present application, the electronic device may agree with the third-party platform in advance on the identifier of the file package and/or the identifier of the custom class in the file package. For example, the identifier of the file package may be the name of the file package or another identifier, and the identifier of the custom class in the file package may be the name of the custom class, the number of the custom class, or another identifier.

It is understandable that, although the electronic device and the third-party platform agree on the identifier of the file package and/or the identifier of the custom class in the file package, when the third-party platform is maliciously attacked or does not generate the file package according to the agreed requirements, the identifier of the file package sent by the third-party platform may differ from the agreed identifier of the file package, and/or the identifier of the custom class in the file package sent by the third-party platform may differ from the agreed identifier of the custom class, in which case the file package provided by the third-party platform does not meet the actual requirements.

In some embodiments of the present application, the above configuration file may further include second information, where the second information includes: an identifier of a file package provided by a third-party platform and/or an identifier of the above-mentioned custom class.

Correspondingly, an implementation manner of loading the custom class in the file package through the reflection mechanism of the programming language may be: in the case where it is determined that the second information in the configuration file is the information pre-agreed with the third-party platform, loading the custom class in the file package through the reflection mechanism of the programming language.

It should be noted that, in the case where it is determined that the second information in the configuration file is not the information pre-agreed with the third-party platform, the received file package may be ignored.

In the current JAVA related technologies, the file package provided by the third-party platform is not authenticated. Therefore, if a malicious attacker such as a hacker learns information such as the name of the custom class in the file package, the attacker can attack the electronic device by forging the file package.

In response to this technical problem, in some embodiments of the present application, the electronic device may also obtain a preset encryption method of the second information; correspondingly, after receiving the configuration file sent by the third-party platform, the electronic device decrypts the encrypted information in the configuration file according to the decryption method corresponding to the encryption method, and obtains the second information; wherein the encrypted information is obtained by encrypting the second information based on the above-mentioned encryption method.

In some embodiments, the electronic device may obtain the preset encryption method of the second information before receiving the configuration file sent by the third-party platform; for example, the preset encryption method may be an encryption method of the second information agreed between the electronic device and the third-party platform.

Here, after the third-party platform and the electronic device agree on the encryption method of the second information, the third-party platform, after generating the second information, can encrypt the second information by using the agreed encryption method to obtain the encrypted information; then, the configuration file including the encrypted information can be sent to the electronic device.

In some embodiments of the present application, the above encryption method and decryption method may be set according to actual conditions. For example, the encryption method and the decryption method may be determined based on a symmetric encryption method such as the Data Encryption Standard (DES), or may be determined based on an asymmetric encryption method.

FIG. 3 is a flowchart of implementing encrypted transmission of information in a configuration file in an embodiment of the present application. Referring to FIG. 3, when the encryption method and the decryption method are determined based on DES, the process of implementing encrypted transmission of information in the configuration file may include:

Step 301: The electronic device sends the public key and the private key to the third-party platform.

In this embodiment of the present application, the electronic device may agree with the third-party platform on the above-mentioned second information; the electronic device may store the public key, the private key and the agreed second information in a database to facilitate subsequent verification.

Step 302: The third-party platform encrypts the second information by using the private key.

In the embodiment of the present application, after receiving the private key, the third-party platform does not need to directly encrypt the file package or the classes in the file package; instead, after writing the second information into the configuration file, it uses the private key to encrypt the second information.

Step 303: The third-party platform writes the public key corresponding to the private key into the configuration file, and sends the configuration file to the electronic device.

In the embodiment of the present application, after encrypting the second information of the configuration file with the private key, and writing the public key corresponding to the private key into the configuration file, the third-party platform can send the configuration file to the electronic device.

In other embodiments, the configuration file further includes an identifier of the feature extraction method of the target feature. Correspondingly, the third-party platform can also use the private key to encrypt the identifier of the feature extraction method of the target feature; the identifier of the feature extraction method may be information such as a name.

Step 304: The electronic device searches for the private key corresponding to the public key.

In the embodiment of the present application, after receiving the configuration file, the electronic device can read the path information and the public key in the configuration file, and search the database for the private key corresponding to the public key.

Step 305: The electronic device decrypts the encrypted information in the configuration file by using the private key.

In this embodiment of the present application, both of the above steps 304 and 305 may be implemented by a program running in the electronic device.

If the identifier of the file package and/or the identifier of the custom class in the decrypted information is consistent with the agreed second information, it means that the file package is a correct data package.

It can be seen that, in the embodiment of the present application, by agreeing on the encryption method of the second information in the configuration file, the third-party platform can encrypt the second information, and after receiving the configuration file sent by the third-party platform, the electronic device can decrypt it based on the decryption method corresponding to the encryption method agreed with the third-party platform. Encrypted transmission of the second information is thus realized, which helps improve the security of the second information and reduces the risk of the second information being attacked.
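The exchange in steps 301 to 305 can be sketched in Java as follows. This is only an illustration under stated assumptions: the "public key" is treated as a lookup identifier stored by the electronic device, the "private key" is treated as the shared 8-byte DES secret, the second information is a single string, and the names `SecondInfoCodec`, `register`, `encrypt` and `verify` are hypothetical. The JDK transformation `DES/ECB/PKCS5Padding` is used only because DES is the method named in the text.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical device-side store and verifier for the encrypted second information.
final class SecondInfoCodec {
    private static final String TRANSFORMATION = "DES/ECB/PKCS5Padding";

    // Stands in for the database of step 301: public key -> secret / agreed second information.
    private final Map<String, byte[]> secretByPublicKey = new HashMap<>();
    private final Map<String, String> agreedInfoByPublicKey = new HashMap<>();

    void register(String publicKey, byte[] secret, String agreedSecondInfo) {
        secretByPublicKey.put(publicKey, secret.clone());
        agreedInfoByPublicKey.put(publicKey, agreedSecondInfo);
    }

    // Step 302 (third-party side): encrypt the second information, Base64-encode the result.
    static String encrypt(byte[] secret, String secondInfo) {
        byte[] plain = secondInfo.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(run(Cipher.ENCRYPT_MODE, secret, plain));
    }

    // Steps 304 and 305 (device side): look up the secret, decrypt, compare with the agreed value.
    boolean verify(String publicKey, String encryptedInfo) {
        byte[] secret = secretByPublicKey.get(publicKey);
        if (secret == null) {
            return false; // unknown public key: the file package is ignored
        }
        try {
            byte[] plain = run(Cipher.DECRYPT_MODE, secret, Base64.getDecoder().decode(encryptedInfo));
            String decrypted = new String(plain, StandardCharsets.UTF_8);
            return decrypted.equals(agreedInfoByPublicKey.get(publicKey));
        } catch (RuntimeException e) {
            return false; // malformed or undecryptable input
        }
    }

    private static byte[] run(int mode, byte[] secret, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(secret, "DES")); // DES keys are 8 bytes
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A file package would be accepted only when `verify` returns true for the public key and encrypted information read from the configuration file.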

In some embodiments of the present application, the electronic device may predetermine an abstract class, and require the custom class to inherit the predetermined abstract class; for example, the electronic device may agree with the third-party platform that the custom class inherits the predetermined abstract class.

Here, an abstract class is a class that cannot be instantiated as an object; inheritance is a concept in object-oriented software technology by which a subclass has the properties and methods of its parent class, so that the subclass has the same behavior as the parent class.

In practical applications, the electronic device can, through interaction with the third-party platform, agree that the custom class in the file package inherits the abstract class. It is understandable that, although the electronic device and the third-party platform agree that the custom class inherits the predetermined abstract class, when the third-party platform is maliciously attacked or does not inherit the abstract class as agreed, the classes in the file package provided by the third-party platform do not actually inherit the above-mentioned abstract class.

In the current JAVA-related technology, if the custom class in the file package provided by the third-party platform does not inherit the predetermined abstract class, the electronic device may be unable to obtain the first information from the custom class.

In view of the above technical problem, in the embodiment of the present application, an implementation manner of obtaining the first information from the custom class may be: instantiating the custom class as an object, and, if the object belongs to the abstract class, obtaining the first information from the loaded custom class.

It should be noted that in the case where it is determined that the object does not belong to the abstract class, the received file package can be ignored.

In some embodiments of the present application, after determining that the received file package is a correct data package, the electronic device needs to determine whether the class in the file package inherits the above-mentioned predetermined abstract class. The custom class loader URLClassloader supports the JAVA reflection function by setting the setAccessible parameter; in this way, the custom class loader URLClassloader can be used to load the custom class in the file package and instantiate the loaded custom class as an object. Then, the operator java.getInstanceOf() can be used to judge whether the object instantiated from the custom class belongs to the abstract class. If the object belongs to the abstract class, the class in the file package inherits the abstract class, and the first information can be obtained from the custom class; if the object does not belong to the abstract class, the class in the file package does not inherit the abstract class, and the file package can be ignored.
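The inheritance check can be sketched as follows. The sketch uses the standard Java `instanceof` operator for the membership test; the abstract class `AbstractFeatureExtractor`, its method `extract`, and the two example classes are hypothetical stand-ins for whatever the electronic device and the third-party platform actually agree on.

```java
import java.lang.reflect.Constructor;

// Hypothetical abstract class agreed between the device and the third-party platform.
abstract class AbstractFeatureExtractor {
    abstract double extract(String documentText);
}

// Hypothetical custom class that follows the agreement.
class SentenceCountExtractor extends AbstractFeatureExtractor {
    @Override
    double extract(String documentText) {
        // Counts the segments separated by sentence-ending punctuation.
        return documentText.split("[.!?]").length;
    }
}

// Hypothetical custom class that does not inherit the agreed abstract class.
class RogueExtractor {
}

final class InheritanceCheck {

    // Returns the instantiated extractor if the class inherits the agreed abstract
    // class, or null to signal that the file package should be ignored.
    static AbstractFeatureExtractor instantiateIfValid(Class<?> customClass) {
        try {
            Constructor<?> ctor = customClass.getDeclaredConstructor();
            ctor.setAccessible(true); // allow non-public constructors
            Object obj = ctor.newInstance();
            return (obj instanceof AbstractFeatureExtractor)
                    ? (AbstractFeatureExtractor) obj
                    : null;
        } catch (ReflectiveOperationException e) {
            return null; // no usable constructor or instantiation failed
        }
    }
}
```

Only an object that passes this check is used to obtain the feature extraction method; any other class is discarded without being invoked further.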

In some embodiments of the present application, when the identifier of the target feature is the same as the identifier of the default feature, the target feature is the default feature. In this case, the target feature is extracted from the document to be processed based on the predetermined extraction method of the default feature.

In some embodiments, when the target feature of the document to be processed includes multiple features, every feature in the target feature may be a default feature, or no feature in the target feature may be a default feature, or some of the features in the target feature may be default features while the others are not; it can be seen that, regardless of whether the target features are default features, the embodiments of the present application provide corresponding feature extraction methods.

When the method of the embodiment of the present application is used for document processing, program code needs to be deployed in the electronic device only for the extraction method of the default feature; when the target feature is not the default feature, only the configuration file and the file package sent by the third-party platform are required, and the corresponding target feature can be extracted based on the reflection mechanism of the JAVA language.

If the identifier of the target feature in the configuration file is only the identifier of the default feature, only the default feature is required, and no new feature needs to be extracted from the document to be processed. If non-default features need to be extracted from the document to be processed, the third-party platform can write the identifiers of the non-default features into the configuration file and send the configuration file and the corresponding file package to the electronic device; the electronic device can then extract the new non-default features based on the file package. That is to say, the third-party platform can determine the content of the configuration file and the content of the file package according to the extraction requirements for the target feature of the document to be processed. When the target feature to be extracted changes, only the identifier of the target feature in the configuration file and the content of the file package need to be changed.

In some embodiments, in order to achieve document quality assessment, most of the target features to be extracted may be default features, while new non-default features may need to be extracted for different types of documents. In this case, for different types of documents, the third-party platform can send different jar packages to the electronic device and determine different contents of the configuration file. In this way, the electronic device can directly use the feature extraction methods provided by the third-party platform to extract non-default features according to the different jar packages. Compared with the solution in the related art, which requires new program code to be written and run locally on the electronic device, labor cost and time cost are saved.

In some embodiments of the present application, the above-mentioned document processing method may be implemented by a main thread running on the electronic device, and an exemplary description is given below with reference to FIG. 4. FIG. 4 is a flowchart of another optional document processing method of the embodiment of the present application. As shown in FIG. 4, the main thread of the electronic device can be denoted as the thread epicDocCalculate, and the document processing method implemented based on the main thread of the electronic device can include:

Step 401: Read the configuration file and the file package.

In this embodiment of the present application, the main thread of the electronic device can read the configuration file and the file package sent by the third-party platform.

Step 402: Determine whether the identifier of the target feature is the same as the identifier of the default feature. When the determination result is yes, step 403 is performed; when the determination result is no, step 404 is performed.

In this embodiment of the present application, the main thread of the electronic device may determine, based on the configuration file, whether each target feature identifier of the document to be processed is the same as the default feature identifier.

Step 403: Extract default features.

In this embodiment of the present application, the default feature extraction may be implemented based on a predetermined extraction manner of the default feature.

Step 404: Determine whether the file package and the class in the file package are correct, if both the file package and the class in the file package are correct, go to step 405; if the file package or the class in the file package is incorrect, return to step 401.

In this embodiment of the present application, it may be determined whether the file package and the class in the file package are correct based on the foregoing recorded content, which will not be repeated here.

Step 405: Extract target features from the document to be processed based on the first information in the file package.

It can be seen that, regardless of whether the target feature is the default feature, the target feature extraction can be achieved based on steps 401 to 405.
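The branching of steps 402 to 405 can be illustrated with the sketch below. It is a simplification under stated assumptions: extractors are modelled as functions from the document text to a number, the result of step 404 is passed in as a boolean, a failed check simply skips the non-default features instead of returning to step 401, and the name `FeatureDispatcher` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical dispatcher mirroring the decision in steps 402 to 405.
final class FeatureDispatcher {
    private final Map<String, ToDoubleFunction<String>> defaultExtractors;
    private final Map<String, ToDoubleFunction<String>> externalExtractors;
    private final boolean packageVerified; // outcome of step 404, computed elsewhere

    FeatureDispatcher(Map<String, ToDoubleFunction<String>> defaultExtractors,
                      Map<String, ToDoubleFunction<String>> externalExtractors,
                      boolean packageVerified) {
        this.defaultExtractors = defaultExtractors;
        this.externalExtractors = externalExtractors;
        this.packageVerified = packageVerified;
    }

    Map<String, Double> extractAll(List<String> targetFeatureIds, String document) {
        Map<String, Double> values = new LinkedHashMap<>();
        for (String id : targetFeatureIds) {
            // Step 402: is the identifier one of the default feature identifiers?
            ToDoubleFunction<String> extractor = defaultExtractors.get(id); // step 403
            if (extractor == null && packageVerified) {
                // Steps 404 and 405: use the method supplied in the verified file package.
                extractor = externalExtractors.get(id);
            }
            if (extractor != null) {
                values.put(id, extractor.applyAsDouble(document));
            }
        }
        return values;
    }
}
```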

Of course, in other embodiments of the present application, after acquiring the document to be processed, the electronic device may not receive the configuration file sent by the third-party platform, but instead extract the default features directly from the document to be processed based on the predetermined extraction method of the default feature.

In some embodiments of the present application, after the target feature is extracted, the document to be processed may also be scored based on the target feature to obtain a quality score value of the document to be processed, so as to realize the quality assessment of the document to be processed.

In some embodiments, the target feature includes at least two features; the configuration file includes weight information for each of the at least two features.

Correspondingly, an implementation manner of performing quality scoring on the document to be processed based on the target feature to obtain the quality score value of the document to be processed may include: performing a weighted summation of the at least two features based on the weight information of each feature.

In this embodiment of the present application, the quality score value of the document to be processed can be calculated according to formula (1).

$$S = \sum_{i=1}^{n} w_i f_i \tag{1}$$

Among them, $S$ represents the quality score value of the document to be processed, $f_i$ represents the i-th feature, $w_i$ represents the weight of the i-th feature in the above at least two features, and $n$ represents the number of features in the above at least two features.
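The weighted sum of feature values and weights described here can be written as a short Java sketch; the class name `QualityScore` and the array-based signature are assumptions made for illustration.

```java
// Hypothetical helper computing S as the sum of w_i * f_i over all n features.
final class QualityScore {

    static double score(double[] featureValues, double[] weights) {
        if (featureValues.length != weights.length) {
            throw new IllegalArgumentException("Each feature needs exactly one weight");
        }
        double s = 0.0;
        for (int i = 0; i < featureValues.length; i++) {
            s += weights[i] * featureValues[i];
        }
        return s;
    }
}
```

For instance, two features with values 1.0 and 0.5 and weights 0.4 and 0.6 give a score of 0.4 × 1.0 + 0.6 × 0.5 = 0.7.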

In some embodiments, regardless of whether a feature in the target feature is a default feature, the third-party platform may determine the weight of the target feature according to actual requirements, or may determine the weight of the target feature according to the initial weight of the target feature sent by the electronic device.

In some embodiments, the electronic device may pre-determine the initial weight of the target feature, and send the initial weight of the target feature to the third-party platform; the third-party platform may directly use the initial weight as the weight of the corresponding feature, or may modify the initial weight to obtain the weight of the corresponding feature.

The contents of the two configuration files are exemplarily described below through Table 1 and Table 2.

Table 1

Key	Value	Explanation
PublicKey	!#abc$dce	9 random characters
ClassLocation	/lib/mycalculator.jar	jar package location
ClassName	myAlgorithm	Name of the custom class (encrypted)
featureName	[A,B,C,D]	Default feature extraction method
Feature weights (featureWeight)	[0.1,0.2,0.1,0.1]	Initial weights
Non-default feature name (ExternalFeatureName)	[D,E,F]	Non-default features (encrypted)
Non-default feature weights (ExternalFeatureWeight)	[0.1,0.2,0.2]	Non-default feature weights

In Table 1, PublicKey represents the public key, ClassLocation represents the path of the jar package, ClassName represents the class name, featureName represents the default feature name, featureWeight represents the feature weight, ExternalFeatureName represents the non-default feature name, and ExternalFeatureWeight represents the non-default feature weight. A, B, C and D represent feature A, feature B, feature C and feature D, respectively, which are different default features; the weights of feature A, feature B, feature C and feature D are the initial weights determined by the electronic device, and are 0.1, 0.2, 0.1, and 0.1, respectively. D, E, and F represent feature D, feature E, and feature F, which are all non-default features; in Table 1, the weights of feature D, feature E, and feature F are 0.1, 0.2, and 0.2, respectively.

Table 2

Key	Value	Explanation
ClassLocation	/lib/engCalculator.jar	jar package location
ClassName	myAlgorithm	Name of the custom class (encrypted)
featureName	[A1]	Default feature extraction method
featureWeight	[0.4]	Initial weight
Non-default feature name (ExternalFeatureName)	[A2,A3,A4]	Non-default features (encrypted)
Non-default feature weights (ExternalFeatureWeight)	[0.2,0.2,0.2]	Non-default feature weights

In Table 2, the meanings of ClassLocation, ClassName, featureName, ExternalFeatureName, and ExternalFeatureWeight are the same as those in Table 1 and are not repeated here. A1, A2, A3, and A4 represent feature A1, feature A2, feature A3, and feature A4, respectively. Feature A1 is a default feature; its weight is the initial weight determined by the electronic device, which is 0.4. Feature A2, feature A3 and feature A4 are all non-default features; in Table 2, their weights are 0.2, 0.2, and 0.2, respectively.
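To illustrate how such key-value content might be read, the sketch below parses the entries of Table 2 from a `key=value` text with `java.util.Properties`. The on-disk format of the configuration file is not specified in the present application, so the `key=value` layout, the bracketed list syntax, and the names `ConfigSketch`, `parse`, `names`, `weights` and `sum` are all assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical reader for a key=value rendering of the configuration file.
final class ConfigSketch {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Turns "[A2,A3,A4]" into {"A2", "A3", "A4"}; "[]" yields an empty array.
    static String[] names(String bracketed) {
        String inner = bracketed.trim().replaceAll("^\\[|\\]$", "").trim();
        return inner.isEmpty() ? new String[0] : inner.split("\\s*,\\s*");
    }

    // Turns "[0.2,0.2,0.2]" into {0.2, 0.2, 0.2}.
    static double[] weights(String bracketed) {
        String[] parts = names(bracketed);
        double[] out = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Double.parseDouble(parts[i]);
        }
        return out;
    }

    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

With the values of Table 2, the default and non-default weights parsed this way add up to 1 (0.4 + 0.2 + 0.2 + 0.2).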

The implementation manner of determining the initial weight of the default feature is exemplarily described below.

In the embodiment of the present application, when the default feature includes multiple features, multiple different candidate weight combinations may be predetermined for the default feature, where each candidate weight combination includes a weight for each feature in the default feature, and the sum of the weights in each candidate weight combination is equal to 1; one weight combination is selected from the multiple candidate weight combinations as the initial weight of the default feature.

In some embodiments, an implementation manner of selecting a weight combination from the above-mentioned multiple candidate weight combinations may be: obtaining a manual score value for a pre-acquired sample document; for each candidate weight combination, performing a weighted sum operation on the feature values of the sample document to obtain the quality score value of the sample document; and selecting one candidate weight combination from the candidate weight combinations that satisfy a set condition, the set condition being that the absolute value of the difference between the manual score value and the quality score value of the sample document is less than a set value. In one embodiment, the candidate weight combination for which the quality score value is closest to the manual score value of the sample document may be selected from the candidate weight combinations that satisfy the set condition.

In some embodiments, the default features include feature A5 and feature A6; for the weight of feature A5, multiple weights are determined by traversing from 0.1 to 0.9 with a preset step of 0.05; for each weight of feature A5, the corresponding weight of feature A6 is determined, thereby obtaining each candidate weight combination; the sum of the weights of the features in each candidate weight combination is equal to 1.

Table 3

Feature A5 weight	Feature A6 weight
0.1	0.9
0.15	0.85
0.20	0.80
……	……
0.9	0.1

Each candidate weight combination of feature A5 and feature A6 is shown in Table 3, and the same row of Table 3 represents one candidate weight combination.

After each candidate weight combination of feature A5 and feature A6 is obtained, the absolute value of the difference between the manual score value and the quality score value of the sample document can be determined for each candidate weight combination; in the case where a feature indicates the number of words in the document, Table 4 shows the manual score value and quality score value of the sample document corresponding to each candidate weight combination.

Table 4

Based on the manual rating value and the quality rating value shown in Table 4, and according to the content described above, one weight combination may be selected from the multiple candidate weight combinations as the initial weight of the default feature.
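The enumeration and selection of candidate weight combinations for two features can be sketched as follows. The feature values, the manual score and the threshold passed to `select` are hypothetical inputs (the values of Table 4 are not reproduced here), and the names `WeightSearch`, `candidates` and `select` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical search over two-feature weight combinations that sum to 1.
final class WeightSearch {

    // First weight runs from 0.1 to 0.9 in steps of 0.05; the second is 1 minus the first.
    static List<double[]> candidates() {
        List<double[]> out = new ArrayList<>();
        for (int k = 2; k <= 18; k++) {
            double w1 = k / 20.0; // integer counter avoids accumulating rounding error
            out.add(new double[] {w1, 1.0 - w1});
        }
        return out;
    }

    // Among the combinations whose |manual score - quality score| is below the set
    // value, returns the one with the smallest difference, or null if none qualifies.
    static double[] select(double f1, double f2, double manualScore, double setValue) {
        double[] best = null;
        double bestDiff = Double.MAX_VALUE;
        for (double[] w : candidates()) {
            double qualityScore = w[0] * f1 + w[1] * f2;
            double diff = Math.abs(manualScore - qualityScore);
            if (diff < setValue && diff < bestDiff) {
                best = w;
                bestDiff = diff;
            }
        }
        return best;
    }
}
```

The same procedure extends to more than two features by enumerating every combination of weights whose sum is 1.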

In other embodiments, when the target feature includes both a default feature and a non-default feature, the electronic device may also determine the initial weights of the default feature and the non-default feature at the same time, and send these initial weights to the third-party platform; the third-party platform can directly use the initial weights of the default feature and the non-default feature as the weights of the corresponding features, or can modify them to obtain the weights of the corresponding features.

In some embodiments, the default feature includes feature B1, and the non-default feature is feature B2; for the weight of feature B1, multiple weights are determined by traversing from 0.1 to 0.9 with a preset step of 0.05; for each weight of feature B1, the weight of feature B2 is determined to obtain each candidate weight combination; each candidate weight combination includes the weight of feature B1 and the weight of feature B2, and the sum of the two weights in each candidate weight combination is equal to 1.

After each candidate weight combination of feature B1 and feature B2 is obtained, the absolute value of the difference between the manual score value and the quality score value of the sample document can be determined for each candidate weight combination; when the sample document is an English document, feature B1 represents the number of words, and feature B2 represents the average sentence length, Table 5 shows the manual score value and quality score value of the sample document corresponding to each candidate weight combination.

Table 5

Based on the manual rating value and the quality rating value shown in Table 5, and according to the content described above, one weight combination can be selected from the multiple candidate weight combinations as the initial weight of the default feature and the non-default feature.

Two implementations for deriving the quality score value of the document to be processed are exemplarily described below.

The first implementation

The document to be processed is a Chinese document, and the target features of the document to be processed include a length-related feature, a template-related feature, and a part-of-speech-related feature; the length-related feature represents the number of words in the document to be processed, the template-related feature represents the similarity between the document to be processed and a preset template, and the part-of-speech-related feature represents the ratio of the number of words of preset parts of speech to the number of all words in the document to be processed. For example, the preset parts of speech include verbs and nouns.

In the embodiment of the present application, a plurality of different word count intervals may be predetermined, each corresponding to a value. In this way, the value of the length-related feature can be obtained by discretizing the word count.

In some embodiments, the value of the length-related feature can be determined according to Table 6.

Table 6

Word count	Value of the length-related feature
Word count < 100	0
100 ≤ word count < 500	0.2
500 ≤ word count < 900	0.4
900 ≤ word count < 1300	0.6
1300 ≤ word count < 1700	0.8
1700 ≤ word count < 2000	1
Word count > 2000	1
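The mapping of Table 6 amounts to a simple threshold lookup, sketched below. The class name `LengthFeature` is hypothetical, and since the table does not list a word count of exactly 2000, the sketch assumes it takes the same value 1 as the neighbouring rows.

```java
// Hypothetical discretization of the word count into the length-related feature value.
final class LengthFeature {

    static double value(int wordCount) {
        if (wordCount < 100) {
            return 0.0;
        }
        if (wordCount < 500) {
            return 0.2;
        }
        if (wordCount < 900) {
            return 0.4;
        }
        if (wordCount < 1300) {
            return 0.6;
        }
        if (wordCount < 1700) {
            return 0.8;
        }
        // 1700 and above: both remaining rows of the table map to 1
        // (a count of exactly 2000 is assumed to behave the same way).
        return 1.0;
    }
}
```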

In this embodiment of the present application, Apache POI can be used to extract content attribute data from the document to be processed and from the preset template, and the content attribute data can include at least one of the following: main title, subtitle, body text, summary, title No. 1, title No. 2, title No. 3, title No. 4, title No. 5, and so on; the content attribute data can then be converted into a document feature vector.

In some embodiments, the content attribute data of the preset template is (title, title No. 1, text, summary), and the document feature vector of the preset template is [1, 1, 1, 1]. When the content attribute data of the document to be processed does not contain any of the title, title No. 1, text and summary, the document feature vector of the document to be processed is set to a vector of all zeros. When the content attribute data of the document to be processed contains any one of the title, title No. 1, text and summary, it is determined, for each part of the content attribute data of the document to be processed, whether that part belongs to the content attribute data of the preset template; if so, the value of the corresponding vector component in the document feature vector is 1; if not, the value of the corresponding vector component is -1.

For ease of understanding, the following three examples are used for illustration. In the first example, the document to be processed is document 1, and the content attribute data of document 1 is (title, title No. 1, text, summary); by comparing the content attribute data of the preset template and of document 1, it can be determined that the document feature vector of document 1 is [1, 1, 1, 1]. In the second example, the document to be processed is document 2, and the content attribute data of document 2 is (title, title No. 3, title No. 4, title No. 5, text, summary); by comparing the content attribute data of the preset template and of document 2, it can be determined that the document feature vector of document 2 is [1, -1, -1, -1, 1, 1]. In the third example, the document to be processed is document 3, and the content attribute data of document 3 is (title No. 3, title No. 4, title No. 5). The content attribute data of document 3 is completely different from that of the preset template, that is, it does not include any of the title, title No. 1, text and summary. Therefore, it can be determined that the document feature vector of document 3 is [0, 0, 0].
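
The three examples above can be reproduced with a short sketch. It assumes the content attribute data has already been extracted (for example by Apache POI) into a sequence of part names; the function name `document_feature_vector` and the string labels are hypothetical.

```python
from typing import List, Sequence

# Content attribute data of the preset template, as given in the text.
TEMPLATE_PARTS = ("title", "title No. 1", "text", "summary")


def document_feature_vector(
    parts: Sequence[str], template_parts: Sequence[str] = TEMPLATE_PARTS
) -> List[int]:
    """Build the document feature vector from the document's content attribute data.

    - If no part of the document appears in the template, return all zeros.
    - Otherwise each part maps to 1 if it belongs to the template, else -1.
    """
    template = set(template_parts)
    if not any(part in template for part in parts):
        return [0] * len(parts)
    return [1 if part in template else -1 for part in parts]


if __name__ == "__main__":
    doc1 = ("title", "title No. 1", "text", "summary")
    doc2 = ("title", "title No. 3", "title No. 4", "title No. 5", "text", "summary")
    doc3 = ("title No. 3", "title No. 4", "title No. 5")
    for doc in (doc1, doc2, doc3):
        print(document_feature_vector(doc))
```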

After the document feature vectors of the document to be processed and of the preset template are obtained, the similarity between the document to be processed and the preset template can be determined based on these two document feature vectors; that is, the value of the template-related feature can be determined.

In some embodiments, when the document feature vectors of the document to be processed and of the preset template have the same dimension, the similarity between the document to be processed and the preset template may be the cosine similarity, which is calculated by formula (2):

cos(θ) = (G·H) / (||G|| × ||H||) (2);

Among them, G and H represent the document feature vectors of the document to be processed and of the preset template, respectively; ||G|| represents the length of the vector G; ||H|| represents the length of the vector H; G·H represents the dot product of the vector G and the vector H; and cos(θ) represents the cosine similarity between the document to be processed and the preset template. It can be seen that cos(θ) is the value of the template-related feature.

It can be understood that the cosine similarity is the cosine of the angle between the two vectors. A large cosine similarity means that the vector G and the vector H are relatively similar; conversely, a small cosine similarity means that there is a larger difference between the vector G and the vector H.
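
As a minimal sketch, the cosine similarity can be computed from the quantities named above (the dot product G·H and the vector lengths ||G|| and ||H||). The function name is hypothetical, and returning 0 when either vector is all zeros is an assumption of this sketch, since the cosine is undefined in that case and the text does not say how it is handled.

```python
import math
from typing import Sequence


def cosine_similarity(g: Sequence[float], h: Sequence[float]) -> float:
    """Cosine of the angle between vectors g and h (same dimension required)."""
    if len(g) != len(h):
        raise ValueError("vectors must have the same dimension")
    norm_g = math.sqrt(sum(x * x for x in g))
    norm_h = math.sqrt(sum(y * y for y in h))
    if norm_g == 0 or norm_h == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    dot = sum(x * y for x, y in zip(g, h))
    return dot / (norm_g * norm_h)


if __name__ == "__main__":
    template = [1, 1, 1, 1]
    print(cosine_similarity([1, 1, 1, 1], template))   # identical vectors
    print(cosine_similarity([1, -1, 1, 1], template))  # one mismatching part
```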

In some embodiments, when the document to be processed is the above-mentioned document 1, it can be determined according to formula (2) that the cosine similarity between the document to be processed and the preset template is 1, that is, the value of the template-related feature of the document to be processed is 1.

In the embodiments of the present application, the part-of-speech-related feature may be determined according to the ratio of the number of nouns and verbs in the document to be processed to the number of all words in the document to be processed; in some embodiments, the number of nouns and the number of verbs in the document to be processed is 20, the total number of words is 50, and the value of the part-of-speech-related feature is 0.6.

In some embodiments, the word count of the document to be processed is greater than 2000, the document feature vector of the preset template is [1, 1, 1, 1], the document feature vector of the document to be processed is [1, 1, 1, 1], and the ratio of nouns and verbs to all words in the document to be processed is 0.6. It can then be determined that the values of the length-related feature, the template-related feature and the part-of-speech-related feature of the document to be processed are 1, 1 and 0.6, respectively. When the weights of the length-related feature, the template-related feature and the part-of-speech-related feature are 0.2, 0.4 and 0.4, respectively, the quality score value of the document to be processed can be calculated according to formula (1) as 0.2 × 1 + 0.4 × 1 + 0.4 × 0.6 = 0.84. In some embodiments, the quality score value can also be multiplied by 100 to obtain the quality score value under the percentile system; here, the quality score value of the document to be processed under the percentile system is 84.
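
The arithmetic of this example can be checked with a short sketch. It assumes that formula (1) is the weighted sum of feature values described in the claims; the function name `quality_score` and the dictionary keys are hypothetical.

```python
from typing import Dict


def quality_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (assumed form of formula (1))."""
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"no value for features: {sorted(missing)}")
    return sum(weights[name] * values[name] for name in weights)


if __name__ == "__main__":
    # Feature values and weights taken from the worked example in the text.
    values = {"length": 1.0, "template": 1.0, "part_of_speech": 0.6}
    weights = {"length": 0.2, "template": 0.4, "part_of_speech": 0.4}
    score = quality_score(values, weights)
    print(round(score, 2))        # score on the 0..1 scale
    print(round(score * 100, 1))  # score under the percentile system
```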

The second implementation

The document to be processed is an English document, and the target features of the document to be processed include feature C1, feature C2, feature C3 and feature C4. Feature C1 is a default feature and represents the number of words in the document to be processed. Features C2, C3 and C4 are non-default features: feature C2 represents the average sentence length of the document to be processed, feature C3 represents the number of document errors in the document to be processed, and feature C4 represents the number of advanced words in the document to be processed. Here, document errors include but are not limited to spelling errors, punctuation errors, and failure to capitalize the first letter of the first word of a sentence. An advanced word is a word found in a predetermined advanced vocabulary list. In practical applications, users can determine the advanced vocabulary list in advance according to the content of the document to be processed.

In some embodiments, a plurality of different word count intervals may be predetermined, with each word count interval corresponding to a value. In this way, the value of feature C1 can be obtained by discretizing the word count; for example, on the basis of Table 6, the character count can be replaced by the word count to obtain a plurality of word count intervals and the value corresponding to each interval.

In some embodiments, after the length of each sentence in the document to be processed is obtained, the sentence lengths can be averaged to obtain the average sentence length. To determine the value corresponding to the average sentence length, a plurality of sentence length intervals can be predetermined, with each sentence length interval corresponding to a value. In this way, the value of feature C2 can be obtained by discretizing the average sentence length.

In some embodiments, the value corresponding to the average sentence length can be obtained according to Table 7.

Table 7

Average sentence length	Value of feature C2
Average sentence length < 5	0
5 ≤ average sentence length < 7	0.2
7 ≤ average sentence length < 9	0.4
9 ≤ average sentence length < 11	0.6
11 ≤ average sentence length < 13	0.8
13 ≤ average sentence length	1

In some embodiments, after the number of document errors is obtained, the number of document errors can be used as the independent variable of an exponential function, and the value of the dependent variable of the exponential function can be used as the value of feature C3. Here, the base of the exponential function is greater than 0 and less than 1. It can be understood that the larger the number of document errors, the smaller the value of feature C3.

Here, the exponential function can be the following formula (3):

Y = R^X (3);

Among them, X represents the number of document errors, Y represents the value of feature C3, and R∈(0,1), for example, the value of R is 0.9.
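
Formula (3) translates directly into code. The sketch below uses the example base R = 0.9 from the text as a default; the function name `error_feature` is hypothetical.

```python
def error_feature(error_count: int, base: float = 0.9) -> float:
    """Value of feature C3 per formula (3): Y = R^X with R in (0, 1)."""
    if not 0 < base < 1:
        raise ValueError("base R must lie strictly between 0 and 1")
    if error_count < 0:
        raise ValueError("error count must be non-negative")
    return base ** error_count


if __name__ == "__main__":
    # The value decays from 1 toward 0 as the number of errors grows.
    for x in (0, 1, 2, 10):
        print(x, round(error_feature(x), 4))
```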

In some embodiments, after the number of advanced words in the document to be processed is obtained, a plurality of intervals of the number of advanced words may be predetermined, with each interval corresponding to a value. In this way, the value of feature C4 can be obtained by discretizing the number of advanced words; in one example, when the number of advanced words is greater than or equal to 20, the value of feature C4 is 1.

In some embodiments, the number of words in the document to be processed is 700, the average sentence length is 20, the number of document errors is 2, the number of advanced words is 20, the total number of sentences is 40, and the value of R is 0.9. The values of feature C1, feature C2, feature C3 and feature C4 of the document to be processed are then 0.4, 1, 0.81 and 1, respectively. When the weights of feature C1, feature C2, feature C3 and feature C4 are 0.4, 0.2, 0.2 and 0.2, respectively, the quality score value of the document to be processed can be calculated according to formula (1) as 0.4 × 0.4 + 0.2 × 1 + 0.2 × 0.81 + 0.2 × 1 = 0.722. In some embodiments, the quality score value can also be multiplied by 100 to obtain the quality score value under the percentile system; here, the quality score value of the document to be processed under the percentile system is 72.2.
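
The whole second implementation can be sketched end to end, from raw statistics to the quality score. The word count intervals follow Table 6 and the sentence length intervals follow Table 7. The intervals for the number of advanced words are hypothetical apart from the stated rule that 20 or more maps to 1, and formula (1) is again assumed to be a weighted sum. All names are illustrative.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def discretize(x: float, bounds: Sequence[float], values: Sequence[float]) -> float:
    """Return the value of the interval that x falls into (bounds are lower bounds)."""
    return values[bisect_right(bounds, x)]


WORD_BOUNDS = [100, 500, 900, 1300, 1700]    # Table 6, applied to word counts
SENTENCE_BOUNDS = [5, 7, 9, 11, 13]          # Table 7
STEP_VALUES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
# Hypothetical intervals; only ">= 20 maps to 1" comes from the text.
ADVANCED_BOUNDS = [5, 10, 15, 20]
ADVANCED_VALUES = [0.0, 0.25, 0.5, 0.75, 1.0]


def english_features(words: int, avg_sentence_len: float,
                     errors: int, advanced: int, r: float = 0.9) -> Dict[str, float]:
    """Compute features C1..C4 from the raw document statistics."""
    return {
        "C1": discretize(words, WORD_BOUNDS, STEP_VALUES),
        "C2": discretize(avg_sentence_len, SENTENCE_BOUNDS, STEP_VALUES),
        "C3": r ** errors,  # formula (3)
        "C4": discretize(advanced, ADVANCED_BOUNDS, ADVANCED_VALUES),
    }


def weighted_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the feature values (assumed form of formula (1))."""
    return sum(weights[name] * features[name] for name in weights)


if __name__ == "__main__":
    feats = english_features(words=700, avg_sentence_len=20, errors=2, advanced=20)
    weights = {"C1": 0.4, "C2": 0.2, "C3": 0.2, "C4": 0.2}
    score = weighted_score(feats, weights)
    print(feats)
    print(round(score, 3), round(score * 100, 1))
```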

The embodiments of the present application can be applied to any document management scenario. In the case where the documents to be processed are plan documents, the document processing method of the embodiments of the present application proceeds as follows. First, communication between the electronic device and the third-party platform is established based on the network communication structure shown in FIG. 1. Then, the third-party platform sends the configuration file and the file package to the electronic device, and the electronic device extracts the target features according to the configuration file and the file package, using technologies such as NLP. Finally, the quality of the plan documents can be evaluated and audited based on the extracted target features, which helps to further optimize the plan documents.

On the basis of the document processing method proposed in the foregoing embodiments, an embodiment of the present application also proposes a document processing apparatus. FIG. 5 is a schematic diagram of an optional composition structure of the document processing apparatus according to the embodiment of the present application. As shown in FIG. 5, the document processing apparatus 500 may include:

The first obtaining module 501 is configured to obtain documents to be processed;

The receiving module 502 is configured to receive a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform; the file package includes first information representing the feature extraction method of the target feature;

The second obtaining module 503 is configured to obtain the file package based on the path information of the file package when the identifier of the target feature is different from the identifier of the default feature;

The processing module 504 is configured to extract the target feature from the document to be processed based on the first information in the file package.

The second obtaining module 503 is further configured to load the custom class in the file package through the reflection mechanism of the programming language, and obtain the first information from the loaded custom class.

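The second obtaining module 503 is configured to load the custom class in the file package through the reflection mechanism of the programming language, including: in the case where it is determined that the second information in the configuration file is information pre-agreed with the third-party platform, loading the custom class in the file package through the reflection mechanism of the programming language; the second information includes an identifier of the file package and/or an identifier of the custom class.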

In some embodiments of the present application, the second obtaining module 503 is further configured to obtain a preset encryption method of the second information, and to decrypt the encrypted information in the configuration file based on the decryption method corresponding to the encryption method of the second information, to obtain the second information; wherein the encrypted information is obtained by encrypting the second information based on the encryption method.

In some embodiments of the present application, the second obtaining module 503 is further configured to predetermine an abstract class, and set the custom class to inherit the predetermined abstract class;

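The second obtaining module 503 is configured to obtain the first information from the loaded custom class, including: instantiating the custom class as an object, and obtaining the first information from the loaded custom class when the object belongs to the abstract class.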

In some embodiments of the present application, the processing module 504 is further configured to, in the case that the identifier of the target feature is the same as the identifier of the default feature, extract the target feature from the document to be processed based on the predetermined extraction method of the default feature.
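
The division of work between default features and features loaded from a third-party file package can be illustrated with the following sketch. It is not the implementation of the embodiments: the abstract class `FeatureExtractor`, the module name `third_party_pkg`, the class `AvgSentenceLength` and the feature identifiers are all hypothetical, and Python's `importlib` stands in for the reflection mechanism of whatever programming language is actually used. The third-party package is simulated in memory so that the sketch runs on its own.

```python
import importlib
import sys
import types
from abc import ABC, abstractmethod


class FeatureExtractor(ABC):
    """Predetermined abstract class that every custom class must inherit."""

    @abstractmethod
    def extract(self, document: str) -> float:
        """Return the feature value for the given document text."""


# Extraction methods of default features, known locally (hypothetical example).
DEFAULT_EXTRACTORS = {"word_count": lambda doc: float(len(doc.split()))}


def load_extractor(module_name: str, class_name: str) -> FeatureExtractor:
    """Load a custom class by name, instantiate it and check its type."""
    module = importlib.import_module(module_name)
    custom_class = getattr(module, class_name)
    obj = custom_class()
    if not isinstance(obj, FeatureExtractor):
        raise TypeError(f"{class_name} does not inherit FeatureExtractor")
    return obj


def extract_feature(feature_id: str, document: str,
                    module_name: str = "", class_name: str = "") -> float:
    """Use the local method for default features, otherwise the loaded class."""
    if feature_id in DEFAULT_EXTRACTORS:
        return DEFAULT_EXTRACTORS[feature_id](document)
    return load_extractor(module_name, class_name).extract(document)


# --- In-memory stand-in for a file package supplied by a third-party platform ---
class AvgSentenceLength(FeatureExtractor):
    def extract(self, document: str) -> float:
        sentences = [s for s in document.split(".") if s.strip()]
        return sum(len(s.split()) for s in sentences) / len(sentences)


_pkg = types.ModuleType("third_party_pkg")
_pkg.AvgSentenceLength = AvgSentenceLength
_pkg.NotAnExtractor = type("NotAnExtractor", (), {})  # does not inherit the abstract class
sys.modules["third_party_pkg"] = _pkg

if __name__ == "__main__":
    text = "One two three. Four five."
    print(extract_feature("word_count", text))
    print(extract_feature("avg_sentence_length", text,
                          "third_party_pkg", "AvgSentenceLength"))
```

The `isinstance` check corresponds to verifying that the instantiated object belongs to the predetermined abstract class before its extraction method is trusted.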

In some embodiments of the present application, the processing module 504 is further configured to perform a quality score on the document to be processed based on the target feature, and obtain a quality score value of the document to be processed.

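The processing module 504 is configured to perform a quality score on the document to be processed based on the target feature and obtain a quality score value of the document to be processed, including: in the case where the target feature includes at least two features and the configuration file includes weight information of each of the at least two features, performing a weighted sum operation on the at least two features based on the weight information of each of them, to obtain the quality score value of the document to be processed.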

In some embodiments of the present application, the processing module 504 is configured to extract the target feature from the document to be processed, including:

In practical applications, the first obtaining module 501, the receiving module 502, the second obtaining module 503, and the processing module 504 can all be implemented by a processor, and the processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that other electronic components may also implement the functions of the above processor, which is not limited in the embodiments of the present application.

It should be noted that the descriptions of the above apparatus embodiments are similar to the descriptions of the above method embodiments, and have similar beneficial effects to the method embodiments. For technical details not disclosed in the device embodiments of the present application, please refer to the descriptions of the method embodiments of the present application for understanding.

It should be noted that, in the embodiments of the present application, if the above document processing method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a terminal, a server, etc.) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. As such, the embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, the embodiments of the present application further provide a computer program product, where the computer program product includes computer-executable instructions, and the computer-executable instructions are used to implement any one of the document processing methods provided by the embodiments of the present application.

Correspondingly, an embodiment of the present application further provides a computer storage medium, where computer-executable instructions are stored on the computer storage medium, and the computer-executable instructions are used to implement any one of the document processing methods provided in the foregoing embodiments.

An embodiment of the present application further provides an electronic device, and FIG. 6 is an optional structural schematic diagram of the electronic device provided by the embodiment of the present application. As shown in FIG. 6 , the electronic device 60 includes:

The memory 601 is configured to store executable instructions;

The processor 602 is configured to implement any one of the above document processing methods when executing the executable instructions stored in the memory 601.

The above-mentioned processor 602 may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.

The above-mentioned computer-readable storage medium/memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); it may also be any of various terminals that include one or any combination of the above memories, such as mobile phones, computers, tablet devices, and personal digital assistants.

It should be pointed out here that the descriptions of the above storage medium and device embodiments are similar to the descriptions of the above method embodiments, and have similar beneficial effects to the method embodiments. For technical details not disclosed in the embodiments of the storage medium and device of the present application, please refer to the description of the method embodiments of the present application to understand.

It is to be understood that reference throughout the specification to "some embodiments" means that a particular feature, structure or characteristic associated with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of "in some embodiments" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The serial numbers of the embodiments of the present application are only for description and do not represent the advantages or disadvantages of the embodiments.

It should be noted that, herein, the terms "comprising", "including" or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or device that includes the element.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are only illustrative. For example, the division of the units is only a division by logical function; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.

In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve separately as a unit, or two or more units may be integrated into one unit; the integrated unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.

Alternatively, if the above-mentioned integrated units of the present application are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related technologies, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a device to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.

The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined under the condition of no conflict to obtain new method embodiments.

The features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

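The above are only embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered within the protection scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.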

Industrial Applicability

Embodiments of the present application provide a document processing method, apparatus, device, and computer-readable storage medium. The method includes: obtaining a document to be processed; receiving a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform, and the file package includes first information representing the feature extraction method of the target feature; in the case where the identifier of the target feature is different from the identifier of the default feature, obtaining the file package based on the path information of the file package; and extracting the target feature from the document to be processed based on the first information in the file package. It can be seen that, in the embodiments of the present application, when the target feature of the document to be processed needs to be extracted and the target feature is not a default feature, there is no need to write and run new program code locally in order to extract the target feature; instead, the extraction method of the target feature can be obtained directly from the third-party platform, which reduces time and labor costs to a certain extent.

Claims

A document processing method, the method comprising:

obtaining a document to be processed;

receiving a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform; the file package includes first information representing a feature extraction method of the target feature;

in the case where the identifier of the target feature is different from the identifier of a default feature, obtaining the file package based on the path information of the file package;

extracting the target feature from the document to be processed based on the first information in the file package.
The document processing method according to claim 1, wherein the file package includes a custom class, and the first information is located in the custom class;

The method further includes: loading the custom class in the file package through a reflection mechanism of a programming language, and acquiring the first information from the loaded custom class.
The document processing method according to claim 2, wherein the configuration file further includes second information, and the second information includes: an identifier of the file package and/or an identifier of the custom class;

The loading of the custom class in the file package through the reflection mechanism of the programming language includes:

In the case where it is determined that the second information in the configuration file is information pre-agreed with the third-party platform, the custom class in the file package is loaded through the reflection mechanism of the programming language.
The document processing method according to claim 3, wherein the method further comprises:

obtaining a preset encryption method of the second information;

decrypting the encrypted information in the configuration file based on the decryption method corresponding to the encryption method of the second information, to obtain the second information; wherein the encrypted information is obtained by encrypting the second information based on the encryption method.
The document processing method according to claim 2, wherein the method further comprises:

Predetermining an abstract class, and setting the custom class to inherit the predetermined abstract class;

The obtaining the first information from the loaded custom class includes:

The custom class is instantiated as an object, and when the object belongs to the abstract class, the first information is obtained from the loaded custom class.
The document processing method according to claim 1, wherein the method further comprises:

In the case that the identifier of the target feature is the same as the identifier of the default feature, the target feature is extracted from the document to be processed based on a predetermined extraction method of the default feature.
The document processing method according to any one of claims 1 to 6, wherein the method further comprises:

A quality score is performed on the document to be processed based on the target feature, and a quality score value of the document to be processed is obtained.
The document processing method according to claim 7, wherein the target feature includes at least two features; the configuration file includes weight information of each of the at least two features;

Performing a quality score on the document to be processed based on the target feature to obtain a quality score value of the document to be processed, including:

Based on the weight information of each of the at least two features, a weighted sum operation is performed on each of the at least two features to obtain a quality score value of the document to be processed.
The document processing method according to claim 8, wherein the extracting the target feature from the document to be processed comprises:

discretizing the word count of the document to be processed according to a plurality of predetermined word count intervals to obtain a length-related feature, each of the word count intervals corresponding to a value; extracting the document feature vector of the document to be processed, and using the cosine similarity between the document feature vector of the document to be processed and the document feature vector of the preset template as a template-related feature; determining a part-of-speech-related feature according to the ratio of the number of words of a preset part of speech in the document to be processed to the number of all words in the document to be processed;

At least two of length-related features, template-related features, and part-of-speech-related features are used as the target features.
The document processing method according to claim 8, wherein the extracting the target feature from the document to be processed comprises:

discretizing the word count of the document to be processed according to a plurality of predetermined word count intervals to obtain a first feature, each of the word count intervals corresponding to a value; discretizing the average sentence length of the document to be processed according to a plurality of predetermined sentence length intervals to obtain a second feature, each of the sentence length intervals corresponding to a value; using the number of document errors of the document to be processed as the independent variable of an exponential function to obtain the value of the exponential function, and using the value of the exponential function as a third feature; discretizing the number of advanced words of the document to be processed according to a plurality of predetermined intervals of the number of advanced words to obtain a fourth feature, each of the intervals of the number of advanced words corresponding to a value, an advanced word being a word located in a predetermined advanced vocabulary list;

At least two of the first feature, the second feature, the third feature, and the fourth feature are used as the target feature.
A document processing device comprising:

The first obtaining module is configured to obtain the document to be processed;

a receiving module, configured to receive a configuration file sent by a third-party platform, where the configuration file includes an identifier of a target feature of the document to be processed and path information of a file package provided by the third-party platform; the file package includes first information representing a feature extraction method of the target feature;

a second acquiring module, configured to acquire the file package based on the path information of the file package when the identifier of the target feature is different from the identifier of the default feature;

A processing module, configured to extract the target feature from the document to be processed based on the first information in the file package.
The apparatus of claim 11, wherein the file package includes a custom class, and the first information is located in the custom class;

The second obtaining module is further configured to load the custom class in the file package through the reflection mechanism of the programming language, and obtain the first information from the loaded custom class.
The apparatus according to claim 12, wherein the configuration file further includes second information, the second information including: an identifier of the file package and/or an identifier of the custom class;

The second obtaining module being configured to load the custom class in the file package through the reflection mechanism of the programming language includes:

Loading the custom class in the file package through the reflection mechanism of the programming language in the case where it is determined that the second information in the configuration file is information pre-agreed with the third-party platform.
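A check of the second information before any class is loaded might look like the sketch below. The key names `package_id` and `class_id`, the agreed identifier sets, and the choice of `PermissionError` are all assumptions made for illustration; the source only requires that the information match what was pre-agreed with the third-party platform.

```python
# Hypothetical identifiers agreed with the third-party platform in advance.
AGREED_PACKAGE_IDS = {"feature-pkg-v1"}
AGREED_CLASS_IDS = {"WordCountFeature"}


def is_pre_agreed(second_info):
    """Return True only if every identifier present was agreed in advance.

    The second information may carry a package identifier, a class
    identifier, or both; an empty record is rejected.
    """
    package_id = second_info.get("package_id")
    class_id = second_info.get("class_id")
    if package_id is None and class_id is None:
        return False
    if package_id is not None and package_id not in AGREED_PACKAGE_IDS:
        return False
    if class_id is not None and class_id not in AGREED_CLASS_IDS:
        return False
    return True


def load_if_agreed(second_info, loader):
    """Call loader() (for example, a reflection-based class loader) only after the check passes."""
    if not is_pre_agreed(second_info):
        raise PermissionError("second information was not pre-agreed; refusing to load")
    return loader()
```

Performing the check before loading matters because loading a class from a third-party package executes third-party code.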
The apparatus according to claim 13, wherein the second obtaining module is further configured to obtain a preset encryption method of the second information, and to decrypt the encrypted information in the configuration file based on the decryption method corresponding to the encryption method of the second information to obtain the second information; wherein the encrypted information is obtained by encrypting the second information based on the encryption method.
The apparatus according to claim 12, wherein the second obtaining module is further configured to predetermine an abstract class, and set the custom class to inherit the predetermined abstract class;

The second obtaining module is configured to obtain the first information from the loaded custom class, including:

The custom class is instantiated as an object, and when the object belongs to the abstract class, the first information is obtained from the loaded custom class.
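The abstract-class check can be sketched with Python's `abc` module. The class names `FeatureExtractor`, `SentenceCount`, and `Unrelated`, and the method name `extract`, are hypothetical; the sketch only shows the pattern of instantiating the custom class and accepting it when the object is an instance of the predetermined abstract class.

```python
from abc import ABC, abstractmethod


class FeatureExtractor(ABC):
    """Predetermined abstract class that every custom class is expected to inherit."""

    @abstractmethod
    def extract(self, document: str) -> float:
        """Return the feature value for the given document."""


class SentenceCount(FeatureExtractor):
    """Example custom class that follows the agreed contract."""

    def extract(self, document: str) -> float:
        return float(document.count("."))


class Unrelated:
    """Has a similar method but does not inherit the abstract class."""

    def extract(self, document: str) -> float:
        return 0.0


def get_extraction_method(custom_class):
    """Instantiate custom_class and return its extraction method if the type check passes."""
    instance = custom_class()
    if not isinstance(instance, FeatureExtractor):
        raise TypeError(f"{custom_class.__name__} does not inherit FeatureExtractor")
    return instance.extract
```

For example, `get_extraction_method(SentenceCount)` returns a callable, while `get_extraction_method(Unrelated)` raises `TypeError`.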
The apparatus according to claim 11, wherein the processing module is further configured to, in the case that the identifier of the target feature is the same as the identifier of the default feature, extract the target feature from the document to be processed based on a predetermined extraction method of the default feature.
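The choice between the default extraction method and a method from the file package can be sketched as a simple dispatch on the feature identifier. The identifier value `"default"`, the word-count default, and the `load_custom_method` callback are assumptions made for illustration.

```python
DEFAULT_FEATURE_ID = "default"  # hypothetical identifier of the default feature


def default_extract(document):
    """Predetermined extraction method for the default feature (word count, as an assumption)."""
    return float(len(document.split()))


def extract_target_feature(document, feature_id, load_custom_method):
    """Use the built-in method for the default feature, otherwise a loaded one.

    load_custom_method is a zero-argument callable that obtains the file
    package and returns its extraction method; it is only invoked when the
    identifiers differ.
    """
    if feature_id == DEFAULT_FEATURE_ID:
        return default_extract(document)
    return load_custom_method()(document)
```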
The apparatus according to any one of claims 11 to 16, wherein the processing module is further configured to perform quality scoring on the document to be processed based on the target feature to obtain a quality score value of the document to be processed.
The apparatus of claim 17, wherein the target feature includes at least two features; the configuration file includes weight information for each of the at least two features;

The processing module being configured to perform quality scoring on the document to be processed based on the target feature to obtain a quality score value of the document to be processed includes:

Based on the weight information of each of the at least two features, a weighted sum operation is performed on each of the at least two features to obtain a quality score value of the document to be processed.
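The weighted sum can be sketched as follows, assuming that both the feature values and the weights from the configuration file are keyed by feature identifier. The function name and the dictionary layout are hypothetical.

```python
def quality_score(features, weights):
    """Return the weighted sum of feature values.

    features maps a feature identifier to its extracted value; weights maps
    the same identifiers to the weights given in the configuration file.
    """
    if len(features) < 2:
        raise ValueError("at least two features are required")
    missing = set(features) - set(weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in features.items())
```

With features `{"length": 0.5, "errors": 1.0}` and weights `{"length": 0.4, "errors": 0.6}`, the score is 0.4 × 0.5 + 0.6 × 1.0, or approximately 0.8.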
The apparatus according to claim 18, wherein the processing module, configured to extract the target feature from the document to be processed, comprises:

The word count of the document to be processed is discretized according to a plurality of predetermined word count intervals to obtain a length-related feature, each of the word count intervals corresponding to a value; a document feature vector of the document to be processed is extracted, and the cosine similarity between the document feature vector of the document to be processed and a document feature vector of a preset template is used as a template-related feature; a part-of-speech-related feature is determined according to the ratio of the number of words of a preset part of speech in the document to be processed to the number of all words in the document to be processed;

At least two of the length-related feature, the template-related feature, and the part-of-speech-related feature are used as the target feature.
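The template-related and part-of-speech-related features can be sketched as below. How the document feature vector is produced and how words are tagged are not specified in the text, so the sketch takes plain numeric vectors and pre-tagged `(word, tag)` pairs as inputs; the function names and tag labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def pos_ratio(tagged_words, target_tags):
    """Share of words whose part-of-speech tag is in target_tags."""
    if not tagged_words:
        return 0.0
    hits = sum(1 for _, tag in tagged_words if tag in target_tags)
    return hits / len(tagged_words)
```

The first function would be applied to the document vector and the template vector; the second to the tagged words of the document and the preset parts of speech.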
The apparatus according to claim 18, wherein the processing module, configured to extract the target feature from the document to be processed, comprises:

The word count of the document to be processed is discretized according to a plurality of predetermined word count intervals to obtain a first feature, each of the word count intervals corresponding to a value; the average sentence length of the document to be processed is discretized according to a plurality of predetermined sentence length intervals to obtain a second feature, each of the sentence length intervals corresponding to a value; the number of errors in the document to be processed is used as the independent variable of an exponential function to obtain a value of the exponential function, and the value of the exponential function is used as a third feature; the number of advanced words in the document to be processed is discretized according to a plurality of predetermined advanced-word count intervals to obtain a fourth feature, each of the advanced-word count intervals corresponding to a value, an advanced word being a word located in a predetermined advanced vocabulary list;

At least two of the first feature, the second feature, the third feature, and the fourth feature are used as the target feature.
An electronic device comprising:

a memory configured to store executable instructions;

a processor, configured to implement the document processing method according to any one of claims 1 to 10 when executing the executable instructions stored in the memory.
A computer-readable storage medium storing executable instructions configured to implement the document processing method according to any one of claims 1 to 10 when executed by a processor.
A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the document processing method according to any one of claims 1 to 10.