CN113657088A - Interface document analysis method and device, electronic equipment and storage medium - Google Patents

Interface document analysis method and device, electronic equipment and storage medium

Info

Publication number
CN113657088A
Authority
CN
China
Prior art keywords
document
interface
document type
processed
type
Legal status
Pending
Application number
CN202110940395.3A
Other languages
Chinese (zh)
Inventor
林思聪
卓泽城
张晓聪
刘晨晖
缪恒锋
陈梦林
洪赛丁
章文俊
杨哲
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110940395.3A
Publication of CN113657088A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides an interface document parsing method and device, an electronic device, a storage medium, and a program product. It relates to the field of computer technology, and in particular to cloud services. The specific implementation scheme is as follows: identifying the document type of an interface document to be processed; determining a target parsing mode matching the document type; and parsing the interface document to be processed according to the target parsing mode to obtain interface attribute information.

Description

Interface document analysis method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for parsing an interface document, an electronic device, a storage medium, and a program product.
Background
With the continuous development of the Internet, more and more application programs have emerged, bringing convenience to people's lives and work. Application programming interfaces enable data access and service linkage between different application programs and play an important role in improving the user experience.
Disclosure of Invention
The disclosure provides an interface document parsing method, an interface document parsing device, an electronic device, a storage medium and a program product.
According to an aspect of the present disclosure, there is provided an interface document parsing method, including: identifying the document type of an interface document to be processed; determining a target parsing mode matching the document type; and parsing the interface document to be processed according to the target parsing mode to obtain interface attribute information.
According to another aspect of the present disclosure, there is provided an interface document parsing apparatus, including: a type identification module configured to identify the document type of an interface document to be processed; a mode determination module configured to determine a target parsing mode matching the document type; and a parsing module configured to parse the interface document to be processed according to the target parsing mode to obtain interface attribute information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 schematically illustrates an exemplary system architecture to which the interface document parsing method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of an interface document parsing method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow diagram of an interface document parsing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a signaling diagram of an interface document parsing method according to another embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an interface document parsing apparatus according to an embodiment of the present disclosure; and
FIG. 6 schematically illustrates a block diagram of an electronic device adapted to implement an interface document parsing method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
A data service management platform aims to provide input data from different data sources and in different data forms to external parties in a standardized form. Normalizing the API (Application Programming Interface) documents of API services is an indispensable part of this task.
In actual production, a conventional approach to normalization provides a uniform document-writing tool, with which API interface documents of different document types and content formats are rewritten into a normalized API interface document. Such a tool generally requires the original API developer or service provider to enter the content a second time, that is, to re-enter the content of each module according to the normalized document format. However, because the data service management platform serves many clients, this secondary entry spends a great deal of manpower on repetitive work, and transcribing documents by hand easily introduces errors through human oversight.
Another way to achieve normalization is to have users provide their files as a compressed package divided into modules. However, this approach cannot guarantee a uniform document appearance: the API document modules are merely spliced together, so document content cannot be dynamically added or removed as needed.
The disclosure provides an interface document parsing method, an interface document parsing device, an electronic device, a storage medium and a program product.
According to an embodiment of the present disclosure, there is provided an interface document parsing method, including: identifying the document type of an interface document to be processed; determining a target parsing mode matching the document type; and parsing the interface document to be processed according to the target parsing mode to obtain interface attribute information.
According to embodiments of the present disclosure, the interface document to be processed is identified and parsed automatically to obtain the interface attribute information. This reduces the workload of secondary entry and improves the accuracy and flexibility of normalization.
In the technical solution of the present disclosure, the collection, storage, use, and other processing of users' personal information comply with relevant laws and regulations and do not violate public order and good morals.
Fig. 1 schematically illustrates an exemplary system architecture to which the interface document parsing method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied. It is provided to help those skilled in the art understand the technical content of the present disclosure, and it does not mean that embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios. For example, in another embodiment, the exemplary system architecture may include a terminal device that implements the interface document parsing method and apparatus provided in embodiments of the present disclosure without interacting with a server.
As shown in FIG. 1, the system architecture 100 according to this embodiment may include a terminal device 101, a network 102, a client 103, and a data service management platform 104. The network 102 is the medium that provides communication links among the terminal device 101, the client 103, and the data service management platform 104. It may include various connection types, such as wired and/or wireless communication links.
A user may use the terminal device 101 to interact with the data service management platform 104 and the client 103 over the network 102, for example to receive or send messages. Various client applications may be installed on the terminal device 101, such as (by way of example only) a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software.
The terminal device 101 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The data service management platform 104 may be a server providing various services. For example, it may receive an interface document sent by the client 103, then identify and parse the document to form a normalized interface document in a uniform form.
It should be noted that the interface document parsing method provided by the embodiment of the present disclosure may be generally executed by the data service management platform 104. Accordingly, the interface document parsing apparatus provided by the embodiment of the present disclosure may also be disposed in the data service management platform 104.
For example, the data service management platform 104 may use the interface document parsing method to identify and parse the API interface document provided by the client 103 and obtain the interface attribute information. When a user sends a request through the terminal device 101, the data service management platform 104 can then use the interface attribute information to connect to the API provided by the client 103, retrieve the client 103's information in real time, and send it to the terminal device 101 for presentation to the user.
It should be understood that the numbers of terminal devices, networks, clients, and data service management platforms in FIG. 1 are merely illustrative. There may be any number of each, as required by the implementation.
FIG. 2 schematically shows a flow diagram of an interface document parsing method according to an embodiment of the disclosure.
As shown in fig. 2, the method includes operations S210 to S230.
In operation S210, a document type of the interface document to be processed is identified.
In operation S220, a target parsing mode matching the document type is determined.
In operation S230, the interface document to be processed is parsed according to the target parsing mode to obtain the interface attribute information.
According to an embodiment of the present disclosure, the interface document to be processed may be an API interface document to be processed.
According to an embodiment of the present disclosure, the interface document to be processed may be a standardized document written according to a standard API document specification, i.e., of the standard document type, but is not limited thereto. It may also be written according to custom rules, i.e., of the custom document type, or written without following any particular rules. The interface document to be processed can therefore be divided into different document types according to how it was written.
According to an embodiment of the present disclosure, the identification operation determines the target document type of the interface document to be processed from among a plurality of document types. The interface document to be processed is then parsed using the target parsing mode matching the target document type to obtain a parsing result.
According to an embodiment of the present disclosure, the interface attribute information may be part or all of the parsing result. The interface attribute information makes it possible to connect to the target interface and call data through it.
According to embodiments of the present disclosure, identifying and parsing the interface document to be processed yields the interface attribute information automatically. This frees up manpower and improves efficiency.
In a related technique in this field, the document type of the interface document to be processed is not identified. Instead, all documents are parsed directly in a uniform parsing mode to obtain the interface attribute information. For example, documents of both the standard document type and the custom document type are parsed using the parsing mode matching the standard document type. A document of the custom document type that is parsed this way may be parsed neither completely nor accurately.
According to embodiments of the present disclosure, the document type of the interface document to be processed is identified first, and the document is then parsed using the target parsing mode matching that type. For example, a document of the standard document type is parsed using the target parsing mode matching the standard document type. Because the parsing mode matches the document type, parsing is more complete and more accurate.
The method shown in FIG. 2 is further described below with reference to FIGS. 3 and 4 and specific embodiments.
According to an embodiment of the present disclosure, the document type may include any one of a standard document type, a custom document type, and a general document type.
According to embodiments of the present disclosure, a standard document type may refer to a type of document generated according to standardized rules. Examples are interface documents written using rules widely recognized in the field, such as Swagger, OpenAPI, and Postman interface documents.
According to an embodiment of the present disclosure, each standard document type has a fixed, canonical layout format, which can serve as the criterion for identification.
For example, the layout format of the interface document to be processed is compared with the layout formats of interface documents of known standard document types. If a match is found, that standard document type is determined as the document type of the interface document to be processed.
According to an exemplary embodiment of the present disclosure, the layout formats of the known standard document types can be categorized and unified into the preset layout format rules used for this judgment.
According to an embodiment of the present disclosure, operation S210 may include identifying a layout format of the interface document to be processed, and determining that the document type of the interface document to be processed is a standard document type if the layout format of the interface document to be processed satisfies a preset layout format rule.
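A minimal sketch of the layout-format check in operation S210 follows. It treats a few top-level fields as the preset layout format rules: the `swagger` version field of Swagger 2.0, the `openapi` version field of OpenAPI 3.x, and the Postman collection schema URL under `info`. These particular rules, the function name `matches_standard_layout`, and the restriction to JSON input are assumptions for illustration. The disclosure does not spell out the rules, and YAML documents would need a YAML loader.

```python
import json
from typing import Optional


def matches_standard_layout(text: str) -> Optional[str]:
    """Return the standard document type whose (assumed) layout rule `text`
    satisfies, or None if no rule matches."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return None  # this sketch only checks JSON-serialized documents
    if not isinstance(doc, dict):
        return None
    # Swagger 2.0: top-level "swagger" version field plus a "paths" object.
    if str(doc.get("swagger", "")).startswith("2.") and "paths" in doc:
        return "swagger"
    # OpenAPI 3.x: top-level "openapi" version field plus a "paths" object.
    if str(doc.get("openapi", "")).startswith("3.") and "paths" in doc:
        return "openapi"
    # Postman collection v2.x: schema URL under "info" plus an "item" list.
    info = doc.get("info")
    if isinstance(info, dict) and "schema.getpostman.com" in str(info.get("schema", "")) and "item" in doc:
        return "postman"
    return None
```

A non-None result would classify the document as the standard document type. A None result hands it on to the custom and general checks.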
According to an embodiment of the present disclosure, operation S220 may include, in a case where the document type is a standard document type, determining that the target parsing manner matching the document type is a standard parsing manner.
According to an embodiment of the present disclosure, operation S230 may include parsing the interface document to be processed in a standard parsing manner to obtain the interface attribute information.
For example, a standard parser matched with the standard document type may be called to parse the interface document to be processed according to a standard parsing manner, so as to obtain the interface attribute information.
According to an embodiment of the present disclosure, the specific parsing method is not limited; any method suitable for interface documents of the standard document type may be used. For example, a standard parser parses the syntactic structure of the interface document to be processed into a node tree. It then analyzes each node in the tree and extracts the interface attribute information contained in that node.
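For an OpenAPI or Swagger document serialized as JSON, the node-tree walk described above might look like the following. This is a sketch under assumptions. The output fields (`path`, `method`, `function`, `request_params`, `return_codes`) are hypothetical, and `$ref` references are not resolved, so a referenced parameter appears with a `None` name.

```python
import json
from typing import Any, Dict, List

# Keys of an OpenAPI path item that denote operations.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def parse_openapi(text: str) -> List[Dict[str, Any]]:
    """Walk paths -> operations in an OpenAPI/Swagger JSON document and
    collect interface attribute information for each operation."""
    doc = json.loads(text)
    interfaces = []
    for path, path_item in doc.get("paths", {}).items():
        # Path-level parameters apply to every operation under this path.
        shared_params = path_item.get("parameters", [])
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            params = shared_params + op.get("parameters", [])
            interfaces.append({
                "path": path,
                "method": method.upper(),
                "function": op.get("summary") or op.get("description", ""),
                "request_params": [p.get("name") for p in params],
                "return_codes": sorted(op.get("responses", {})),
            })
    return interfaces
```

Each dictionary corresponds to one operation node. Its fields map roughly onto the interface function, request mode, request parameter, and return result information named elsewhere in this description.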
According to another embodiment of the present disclosure, a custom document type may refer to a type of document generated according to user-defined rules.
According to an embodiment of the present disclosure, operation S210 may include searching for target text information at a preset position of the interface document to be processed to determine whether its document type is the custom document type. If the target text information is found, the document type is determined to be the custom document type.
According to an embodiment of the present disclosure, the target text information may be identification information that marks the document type of the interface document to be processed. It may be a character string or another type of text information. Any text that can serve this identifying role may be used, and further details are omitted here.
According to an embodiment of the present disclosure, in practice, it can be agreed with the user in advance that the document type is marked with the target text information at a certain position of the interface document. The preset position may be the document header but is not limited thereto. It may be any position in the interface document, as long as the marker does not interfere with identifying and parsing the interface attribute information.
According to the embodiment of the present disclosure, by marking the target text information at the preset position, the identification operation of the document type can be made fast and accurate.
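Here is a minimal sketch of searching the preset position for the target text information. It assumes the agreed marker is a line of the form `@doc-type: custom/<template>` within the first few non-empty lines of the document header. Both the marker syntax and the `header_lines` window are hypothetical conventions, not ones defined by the disclosure.

```python
import re
from typing import Optional

# Hypothetical marker agreed with the user, e.g. "@doc-type: custom/order-api".
CUSTOM_MARKER = re.compile(r"^@doc-type:\s*custom/([\w.-]+)\s*$")


def find_custom_marker(text: str, header_lines: int = 3) -> Optional[str]:
    """Search the preset position (the first `header_lines` non-empty lines)
    for the target text information; return the template name or None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for line in lines[:header_lines]:
        match = CUSTOM_MARKER.match(line)
        if match:
            return match.group(1)
    return None
```

Restricting the search to a small header window keeps identification fast. It also means a marker-like string in the body of the document is not mistaken for the type marker.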
According to another embodiment of the present disclosure, operation S220 may include, in a case that the document type is a custom document type, determining that the target parsing mode matching the document type is a custom parsing mode.
According to another embodiment of the present disclosure, operation S230 may include parsing the interface document to be processed in a custom parsing manner to obtain the interface attribute information.
According to an embodiment of the present disclosure, a custom parser matching the custom document type can be designed. It is used to parse interface documents of that type and identify the interface attribute information in them.
According to an embodiment of the present disclosure, the parsing approach of the custom parser is not limited. Like the standard parser, it can analyze the syntactic structure of the interface document to be processed as a syntax tree and obtain the interface attribute information contained in the nodes. The difference is that the syntactic structure and rules of the custom parser differ from those of the standard parser.
According to an embodiment of the present disclosure, an interface document to be processed of the general document type may refer to one whose specific document type cannot be identified.
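A custom parser differs from a standard one only in its grammar. The sketch below illustrates this for a hypothetical user-defined format. Each `Interface:` line opens a new record, `Method:`, `Param:`, and `Return:` lines fill it, and `|` separates sub-fields. The format and field names are invented for this example, and a line-oriented grammar stands in for the syntax tree mentioned above.

```python
from typing import Dict, List, Optional


def parse_custom(text: str) -> List[Dict]:
    """Parse a hypothetical user-defined format such as:

        @doc-type: custom/order-api
        Interface: create order
        Method: POST
        Param: user_id | string | buyer id
        Return: order_id | string
    """
    records: List[Dict] = []
    current: Optional[Dict] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("@doc-type:"):
            continue  # skip blank lines and the type marker itself
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line under this grammar
        key, value = key.strip().lower(), value.strip()
        if key == "interface":
            # Each "Interface:" line opens a new interface record.
            current = {"function": value, "method": None, "params": [], "returns": []}
            records.append(current)
        elif current is None:
            continue  # fields before the first "Interface:" line are ignored
        elif key == "method":
            current["method"] = value.upper()
        elif key == "param":
            current["params"].append([f.strip() for f in value.split("|")])
        elif key == "return":
            current["returns"].append([f.strip() for f in value.split("|")])
    return records
```

The code splits only at the first colon, so values that contain colons, such as URLs, stay intact.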
According to another embodiment of the present disclosure, operation S210 may include identifying a layout format of the interface document to be processed and searching for target text information from a preset position of the interface document to be processed, and determining that the document type is a general document type in a case that the layout format of the interface document to be processed does not satisfy a preset layout format rule and the target text information is not searched.
According to another embodiment of the present disclosure, operation S220 may include, in a case that the document type is a general document type, determining that the target parsing mode matching the document type is a general parsing mode.
According to another embodiment of the present disclosure, operation S230 may include parsing the interface document to be processed in a general parsing manner to obtain the interface attribute information.
According to an embodiment of the present disclosure, a universally applicable parsing mode, such as a character-stream recognition scheme, can be designed to recognize the interface attribute information in the interface document to be processed.
For example, word segmentation is performed on the interface document to be processed to obtain a word segmentation result. The word segmentation result is then recognized to determine the interface attribute information.
According to an embodiment of the present disclosure, the interface document to be processed can be divided into paragraphs and short sentences before word segmentation, for example, by splitting sentences at punctuation marks. Splitting sentences first makes the subsequent word segmentation simpler and more effective.
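Sentence splitting at punctuation, as described above, can be sketched with a regular expression. The boundary set is an assumption, not a rule from the disclosure. It covers full-width `。！？；`, ASCII `!?;`, and an ASCII period only when whitespace follows, so that hosts such as `api.example.com` are not split.

```python
import re
from typing import List

# Assumed boundaries: full-width 。！？； and ASCII !?; always end a sentence;
# an ASCII period ends one only when whitespace follows (keeps "api.example.com" whole).
_BOUNDARY = re.compile(r"(?<=[。！？；!?;])\s*|(?<=\.)\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into paragraphs (one per line), then into short sentences."""
    sentences = []
    for paragraph in text.splitlines():
        for piece in _BOUNDARY.split(paragraph):
            piece = piece.strip()
            if piece:
                sentences.append(piece)
    return sentences
```

The pattern can match an empty string right after full-width punctuation, which has no following space in Chinese text. Splitting on such empty matches requires Python 3.7 or later.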
According to an embodiment of the present disclosure, a word segmentation model can be used to segment the interface document to be processed. The model may be a Hidden Markov Model (HMM) but is not limited thereto. Any segmentation model architecture known in the art that achieves the segmentation effect may be applied.
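HMM-based segmentation is commonly framed as a tagging problem. Each character is tagged B, M, or E (begin, middle, or end of a multi-character word) or S (single-character word), and the Viterbi algorithm decodes the most likely tag sequence. The sketch below implements that decoder. The toy start, transition, and emission log-probabilities are hand-set for the example; a real model would estimate them from a segmented corpus. The sample string 请求参数返回结果 means "request parameter return result".

```python
import math
from typing import Dict, List

STATES = "BMES"  # B/M/E: begin/middle/end of a multi-character word; S: single-character word
NEG_INF = -math.inf


def viterbi_segment(text: str, start: Dict[str, float], trans: Dict[str, Dict[str, float]],
                    emit: Dict[str, Dict[str, float]], unseen: float = -20.0) -> List[str]:
    """Decode the most likely BMES tag sequence for `text` and cut words after
    every E or S tag. All probabilities are natural-log values; a missing start or
    transition entry means 'impossible', a missing emission entry scores `unseen`."""
    if not text:
        return []
    # score[t][s]: best log-probability of a tag path ending in state s at position t
    score = [{s: start.get(s, NEG_INF) + emit[s].get(text[0], unseen) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(text)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[t - 1][p] + trans[p].get(s, NEG_INF))
            score[t][s] = score[t - 1][prev] + trans[prev].get(s, NEG_INF) + emit[s].get(text[t], unseen)
            back[t][s] = prev
    # A complete segmentation must end on E or S; follow back-pointers from there.
    tags = [max("ES", key=lambda s: score[-1][s])]
    for t in range(len(text) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    tags.reverse()
    words, buf = [], ""
    for ch, tag in zip(text, tags):
        buf += ch
        if tag in "ES":
            words.append(buf)
            buf = ""
    return words


# Toy, hand-set parameters (a real model estimates these from a segmented corpus).
LOG = math.log
TOY_START = {"B": LOG(0.6), "S": LOG(0.4)}
TOY_TRANS = {"B": {"M": LOG(0.3), "E": LOG(0.7)}, "M": {"M": LOG(0.4), "E": LOG(0.6)},
             "E": {"B": LOG(0.5), "S": LOG(0.5)}, "S": {"B": LOG(0.5), "S": LOG(0.5)}}
TOY_EMIT = {"B": {c: -1.0 for c in "请参返结"}, "E": {c: -1.0 for c in "求数回果"}, "M": {}, "S": {}}

print(viterbi_segment("请求参数返回结果", TOY_START, TOY_TRANS, TOY_EMIT))  # → ['请求', '参数', '返回', '结果']
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long sentences.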
According to an embodiment of the present disclosure, the word segmentation result may be recognized using a BERT (Bidirectional Encoder Representations from Transformers) model to determine the interface attribute information.
According to an embodiment of the present disclosure, historical interface documents and their corresponding labels (i.e., their interface attribute information) can be used as training samples to train the BERT model. This yields a model suited to identifying the interface attribute information in the interface document to be processed.
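The model itself is out of scope for a short example, but the step from per-token predictions to interface attribute information can be sketched. This assumes the fine-tuned model emits one BIO label per token, for example `B-METHOD`, `I-RETURN`, or `O`; the label set is hypothetical. The function below groups labeled tokens into attribute values.

```python
from typing import Dict, List, Optional


def decode_bio(tokens: List[str], labels: List[str], sep: str = " ") -> Dict[str, List[str]]:
    """Group tokens labeled B-X / I-X into attribute values keyed by X.
    An I-X that does not continue an open X span starts a new span."""
    entities: Dict[str, List[str]] = {}
    span_type: Optional[str] = None
    span: List[str] = []

    def flush() -> None:
        # Emit the currently open span, if any, under its attribute type.
        if span_type is not None:
            entities.setdefault(span_type, []).append(sep.join(span))

    for token, label in zip(tokens, labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != span_type):
            flush()
            span_type, span = label[2:], [token]
        elif label.startswith("I-"):
            span.append(token)
        else:  # "O": token lies outside any attribute span
            flush()
            span_type, span = None, []
    flush()
    return entities
```

For Chinese segmentation results, `sep=""` rejoins the words of a span without spaces.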
With the interface document parsing method provided by embodiments of the present disclosure, interface documents are divided into multiple document types, and each type is parsed with its own parsing mode. The parsing is therefore well targeted and closely matched to each document, and it yields complete and accurate results.
Fig. 3 schematically shows a flow diagram of an interface document parsing method according to another embodiment of the present disclosure.
As shown in FIG. 3, identifying the document type of the to-be-processed interface document 310 can result in the document type being a standard document type 320, a custom document type 330, or a generic document type 340.
In the case that the document type is the standard document type 320, the standard parser is used to parse the interface document 310 to obtain the interface attribute information 350.
In the case that the document type is the custom document type 330, the custom parser is used to parse the interface document 310 to obtain the interface attribute information 350.
In the case that the document type is the general document type 340, the to-be-processed interface document 310 is parsed in a general parsing manner to obtain the interface attribute information 350.
It should be noted that, before the document type of the interface document to be processed is identified, it may first be determined whether the information from the client specifies a document type. If a document type is specified, the interface document to be processed is parsed directly in the parsing mode matching that type. If not, the operation of identifying the document type of the interface document to be processed is performed.
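The flow of FIG. 3, including the shortcut for a client-specified document type, can be sketched as a dispatch table that maps each document type to a parser. The identification rules here are deliberately simplified stand-ins. A JSON object with an `openapi` or `swagger` key counts as standard, and a first line starting with a hypothetical `@doc-type: custom` marker counts as custom. The parsers are injected as plain callables.

```python
import json
import re
from typing import Callable, Dict, Optional

Parser = Callable[[str], dict]
# Hypothetical header marker agreed with the user for custom-type documents.
CUSTOM_MARKER = re.compile(r"^@doc-type:\s*custom\b")


def identify_document_type(text: str) -> str:
    """S210 (simplified): standard, then custom, else general."""
    try:
        doc = json.loads(text)
        if isinstance(doc, dict) and ("openapi" in doc or "swagger" in doc):
            return "standard"
    except json.JSONDecodeError:
        pass  # not JSON, so not a standard document under this simplified rule
    first_line = text.lstrip().split("\n", 1)[0]
    if CUSTOM_MARKER.match(first_line):
        return "custom"
    return "general"


def parse_interface_document(text: str, parsers: Dict[str, Parser],
                             specified_type: Optional[str] = None) -> dict:
    """Skip identification when the client specified a known type; otherwise
    identify it (S210), pick the matching parser (S220), and parse (S230)."""
    if specified_type in parsers:
        doc_type = specified_type
    else:
        doc_type = identify_document_type(text)
    return parsers[doc_type](text)
```

Because the parsers are passed in, a new document type can be supported by adding one entry to the table, without changing the dispatch logic.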
According to embodiments of the present disclosure, multiple document types are supported and each is matched with its own parsing mode. This gives a close match between document and parser and a wide range of application.
According to an embodiment of the present disclosure, interface documents to be processed come in various content formats. Some are text formats, such as XML, Word, and JSON. Others are non-text formats, such as PDF and image formats. Both kinds occur in practice, and non-text formats are not convenient for performing the interface document parsing method provided by embodiments of the present disclosure.
According to an embodiment of the present disclosure, a format conversion operation can be performed on the interface document to be processed to make document type identification, parsing, and similar operations easier. One example is converting a non-text format into a text format.
According to the embodiment of the present disclosure, the content format of the interface document to be processed may be identified before performing operation S210, i.e., the operation of identifying the document type of the interface document to be processed; under the condition that the content format of the interface document to be processed is determined to be a text format, directly executing the operation of identifying the document type of the interface document to be processed; and under the condition that the content format of the interface document to be processed is determined to be a non-text format, converting the content format of the interface document to be processed into a text format.
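Here is a sketch of the content-format check that precedes operation S210. It assumes formats are recognized by their leading magic bytes: `%PDF-` for PDF, and the PNG and JPEG signatures for images. Anything that decodes as UTF-8 is treated as text. The OCR step and the PDF-to-image step are passed in as callables (`ocr`, `pdf_to_images`). These are hypothetical placeholders for whatever image-recognition network or converter is used.

```python
from typing import Callable, Iterable

PDF_MAGIC = b"%PDF-"
IMAGE_MAGICS = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")  # PNG, JPEG signatures


def detect_content_format(data: bytes) -> str:
    """Classify raw bytes as 'pdf', 'image', 'text' (valid UTF-8) or 'unknown'."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(IMAGE_MAGICS):
        return "image"
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"


def to_text(data: bytes,
            pdf_to_images: Callable[[bytes], Iterable[bytes]],
            ocr: Callable[[bytes], str]) -> str:
    """Return text ready for document type identification, converting
    non-text formats first. `pdf_to_images` and `ocr` are hypothetical hooks."""
    fmt = detect_content_format(data)
    if fmt == "text":
        return data.decode("utf-8")
    if fmt == "image":
        return ocr(data)
    if fmt == "pdf":
        # PDF -> page images -> recognized text, one page per block
        return "\n".join(ocr(page) for page in pdf_to_images(data))
    raise ValueError("unsupported content format")
```

A `.docx` Word file is a ZIP container, so it would normally need its own text extractor rather than the UTF-8 check used here.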
According to an embodiment of the present disclosure, the format conversion method is not particularly limited, and any method known in the art may be used. For example, for an image containing table content, the text information in the image can be recognized to obtain the interface document to be processed in text format. As another example, a PDF document can first be converted into images. The text information is then extracted from those images to obtain the interface document to be processed in text format.
According to an exemplary embodiment of the present disclosure, a format conversion operation of converting an image format into a text format may be accomplished by constructing a neural network for image recognition.
With the interface document parsing method provided by embodiments of the present disclosure, interface documents to be processed in non-text formats can be converted into text format. Many input formats can therefore be processed, and the range of application is wide.
According to an embodiment of the present disclosure, the interface attribute information includes one or more of interface function information, request mode information, request parameter information, return result information, and remark information. It is not limited thereto and may be any entity information required to call the application programming interface.
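One way to hold the interface attribute information listed above is a small record type. The field names below are assumptions chosen to mirror the listed categories (interface function, request mode, request parameters, return result, remarks). They are not a schema defined by the disclosure.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RequestParam:
    name: str
    data_type: str = "string"
    required: bool = False
    description: str = ""


@dataclass
class InterfaceAttributes:
    function: str                                             # interface function information
    method: str                                               # request mode, e.g. "GET" or "POST"
    params: List[RequestParam] = field(default_factory=list)  # request parameter information
    returns: str = ""                                         # return result information
    remarks: Optional[str] = None                             # remark information


attrs = InterfaceAttributes("create order", "POST", [RequestParam("user_id", required=True)])
print(asdict(attrs)["params"][0])  # → {'name': 'user_id', 'data_type': 'string', 'required': True, 'description': ''}
```

`asdict` converts nested dataclasses recursively. That makes it easy to serialize the record into a normalized document or an access log entry.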
According to an embodiment of the present disclosure, the interface document parsing method may further include the following operations: extracting the identification information of the interface document to be processed; searching a historical access information set, based on the identification information, for historical access information matching the identification information; if matching historical access information is found, extracting the interface attribute information from that historical access information; and if no matching historical access information is found, performing the operation of identifying the document type of the interface document to be processed.
According to an embodiment of the present disclosure, the identification information may be extracted from the interface document to be processed in response to acquiring that document. The present disclosure is not limited thereto: the identification information may also be extracted from request information sent by the client to request parsing of the interface document to be processed.
According to an embodiment of the present disclosure, the type of the identification information is not limited, as long as it uniquely identifies the interface document to be processed.
According to an embodiment of the present disclosure, the historical access information set may be a set of access logs accumulated over a long period of time. Each access log may record interface attribute information about an API interface document, as well as the identification information of that API interface document. A target access log matching the identification information of the interface document to be processed can be retrieved from the access log set, and the interface attribute information can then be extracted from the target access log.
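The lookup-before-parse logic can be sketched as below. This assumes, purely for illustration, that each access log is a dictionary with a `doc_id` key and an already structured `attributes` entry; in the disclosure the attributes may instead have to be extracted from the log with a model. The `parse` callback stands in for the type identification and parsing path, and when several logs match, this sketch simply uses the first one.

```python
from typing import Callable, Dict, List, Optional


def lookup_or_parse(doc_id: str,
                    doc_text: str,
                    access_logs: List[Dict],
                    parse: Callable[[str], Dict]) -> Dict:
    """Reuse attributes from a matching access log, otherwise parse the document."""
    # Search the access-log set for an entry recorded under the same identifier.
    match: Optional[Dict] = next(
        (log for log in access_logs if log.get("doc_id") == doc_id), None)
    if match is not None:
        # Hit: take the interface attribute information recorded in the log.
        return dict(match["attributes"])
    # Miss: fall back to identifying the document type and parsing the document.
    return parse(doc_text)
```

For a large, long-lived log set, a dictionary index keyed by identifier would avoid the linear scan used here.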
According to an embodiment of the present disclosure, the interface attribute information may be extracted from the target access log using a word segmentation model (e.g., an HMM) or a classification model (e.g., BERT). The present disclosure is not limited thereto; any model architecture may be used, as long as it can extract the interface attribute information from the target access log.
According to an embodiment of the present disclosure, when a target access log matching the identification information is found and the interface attribute information is extracted from it, that interface attribute information may be used directly as the final interface attribute information, and the subsequent operation of identifying the document type of the interface document to be processed may be skipped. The present disclosure is not limited thereto. Alternatively, the interface attribute information extracted from the target access log may serve as pre-judged interface attribute information. After the document type of the interface document to be processed has been identified and the document has been parsed with the parsing mode matching that type, the parsed interface attribute information is combined with the pre-judged interface attribute information. The pre-judged information then helps verify the accuracy of the parsed result, so that errors can be corrected promptly and unresolved items can be supplemented promptly.
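A minimal sketch of combining parsed attributes with pre-judged attributes follows. The reconciliation policy is an assumption, since the text does not say which source wins on disagreement: here a pre-judged value fills any field the parser left empty, and a field where both sources disagree is reported as a conflict for revision rather than silently overwritten.

```python
from typing import Dict, List, Tuple


def reconcile(parsed: Dict[str, str],
              prejudged: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Supplement parsed attributes and report fields that disagree."""
    merged = dict(parsed)
    conflicts: List[str] = []
    for key, hint in prejudged.items():
        value = merged.get(key)
        if not value:
            merged[key] = hint      # supplement a result the parser could not resolve
        elif value != hint:
            conflicts.append(key)   # disagreement: flag the field for revision
    return merged, conflicts
```

Flagging instead of overwriting keeps the freshly parsed document authoritative while still surfacing possible parsing errors.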
According to an embodiment of the present disclosure, the interface attribute information may consist of modular, separate pieces of information, which may be combined to obtain a complete normalized interface document.
According to an embodiment of the present disclosure, the modular pieces of interface attribute information may alternatively be placed into the corresponding positions of an interface document template to obtain a normalized interface document.
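Filling attribute information into a document template can be illustrated with the standard library's `string.Template`. The template layout and the attribute keys are hypothetical; the disclosure does not specify what the normalized interface document looks like.

```python
from string import Template

# Hypothetical layout of a normalized interface document.
DOC_TEMPLATE = Template(
    "Interface: $function\n"
    "Method: $request_method\n"
    "Parameters:\n$request_params\n"
    "Returns: $return_result\n"
    "Remarks: $remarks\n"
)


def assemble(attrs: dict) -> str:
    """Place each piece of attribute information at its position in the template."""
    params = attrs.get("request_params", {})
    param_lines = "\n".join(f"  - {name}: {desc}" for name, desc in params.items())
    return DOC_TEMPLATE.substitute(
        function=attrs.get("function", ""),
        request_method=attrs.get("request_method", ""),
        request_params=param_lines or "  (none)",
        return_result=attrs.get("return_result", ""),
        remarks=attrs.get("remarks", ""),
    )
```

Missing pieces render as empty fields, so a partially parsed document still yields a well-formed normalized document.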
According to an embodiment of the present disclosure, after the interface attribute information is assembled into the normalized interface document, the normalized interface document may be stored to facilitate subsequent querying and calling.
According to an exemplary embodiment of the present disclosure, once the interface attribute information is obtained, it may be sent to the client for display to the user. The user may check the extracted interface attribute information, revise it if errors or problems are found, and return the revised interface attribute information through the client. The revised interface attribute information is then received from the client, and a normalized interface document is generated based on it. If the user confirms that there are no errors, a confirmation message is returned directly without revision.
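Handling the client's reply can be sketched as a merge step. Two assumptions are made for illustration: a confirmation without revision arrives as `None` or an empty mapping, and revisions are restricted to a hypothetical whitelist of attribute names so that a malformed reply is rejected.

```python
from typing import Dict, Optional

# Hypothetical whitelist of attribute names the client may revise.
ALLOWED_FIELDS = {"function", "request_method", "request_params", "return_result", "remarks"}


def apply_client_reply(extracted: Dict[str, object],
                       revisions: Optional[Dict[str, object]]) -> Dict[str, object]:
    """Merge the user's reply into the extracted attribute information."""
    if not revisions:
        # Confirmation message: no revision, keep the extracted attributes.
        return dict(extracted)
    unknown = set(revisions) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown attribute(s) in revision: {sorted(unknown)}")
    final = dict(extracted)
    final.update(revisions)  # user corrections and supplements take precedence
    return final
```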
Fig. 4 schematically shows a signaling diagram of an interface document parsing method according to another embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S410 to S440.
In operation S410, the data service management platform receives an interface document to be processed from the client.
In operation S420, the data service management platform identifies the document type of the interface document to be processed, determines a target parsing mode matching the document type, and parses the interface document to be processed according to the target parsing mode to obtain interface attribute information.
In operation S430, the interface attribute information is sent to the client so that the user can check it and revise it if errors or problems are found.
In operation S440, the data service management platform receives the revised interface attribute information from the client.
According to an embodiment of the present disclosure, sending the parsing result to the client for checking further improves the accuracy with which the interface document to be processed is parsed, and improves the experience of subsequently using the normalized interface document.
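The S410 to S440 exchange can be simulated in a single process to make the message order concrete. This is only a sketch: the network round trip is replaced by a `client_review` callback, the `parse` callback stands in for all of operation S420, and the returned trace strings are illustrative labels rather than a real protocol.

```python
from typing import Callable, Dict, List, Tuple


def run_parsing_session(
    document: str,
    parse: Callable[[str], Dict[str, str]],
    client_review: Callable[[Dict[str, str]], Dict[str, str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Simulate one platform/client session and record the signaling order."""
    trace: List[str] = []
    trace.append("S410 client -> platform: interface document")
    attributes = parse(document)
    trace.append("S420 platform: identify type, choose parsing mode, parse")
    trace.append("S430 platform -> client: interface attribute information")
    revised = client_review(dict(attributes))  # user checks and possibly revises
    trace.append("S440 client -> platform: revised attribute information")
    return revised, trace
```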
FIG. 5 schematically shows a block diagram of an interface document parsing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the interface document parsing apparatus 500 may include a type identification module 510, a manner determination module 520, and a parsing module 530.
The type identification module 510 is configured to identify the document type of the interface document to be processed.
The manner determination module 520 is configured to determine a target parsing mode matching the document type.
The parsing module 530 is configured to parse the interface document to be processed according to the target parsing mode to obtain interface attribute information.
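The three modules of apparatus 500 can be pictured as a small class that wires pluggable components together. The class name, constructor parameters, and the registry of parsers are hypothetical; the sketch only shows the identify, determine, parse sequence.

```python
from typing import Callable, Dict

Parser = Callable[[str], Dict[str, str]]


class InterfaceDocumentParser:
    """Sketch of apparatus 500 with its three modules as pluggable parts."""

    def __init__(self, identify_type: Callable[[str], str], parsers: Dict[str, Parser]):
        self.identify_type = identify_type  # type identification module 510
        self.parsers = parsers              # registry used by module 520

    def determine_mode(self, doc_type: str) -> Parser:
        # Manner determination module 520: pick the parser matching the type.
        try:
            return self.parsers[doc_type]
        except KeyError:
            raise ValueError(f"no parsing mode registered for {doc_type!r}") from None

    def parse(self, document: str) -> Dict[str, str]:
        # Parsing module 530: apply the selected parsing mode.
        doc_type = self.identify_type(document)
        return self.determine_mode(doc_type)(document)
```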
According to an embodiment of the present disclosure, the document type includes at least one of: standard document type, custom document type, general document type.
According to an embodiment of the present disclosure, the type identification module may include any one of a first type determining unit, a second type determining unit, and a third type determining unit.
The first type determining unit is configured to determine that the document type is a standard document type when the layout format of the interface document to be processed satisfies a preset layout format rule, where the standard document type is the type of a document generated based on a standardized rule.
The second type determining unit is configured to determine that the document type is a custom document type when target text information is found, where the custom document type is the type of a document generated according to a user-defined rule, and the target text information is text information searched for at a preset position of the interface document to be processed.
The third type determining unit is configured to determine that the document type is a general document type when the layout format of the interface document to be processed does not satisfy the preset layout format rule and the target text information is not found.
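The three type rules can be sketched as below. Both concrete rules are assumptions chosen for illustration, since the disclosure does not state them: the "preset layout format rule" is modeled as a JSON object declaring an `openapi` or `swagger` key, and the "target text information at a preset position" is modeled as a hypothetical marker line at the start of the document.

```python
import json

# Hypothetical target text expected at the preset position (the first line).
CUSTOM_MARKER = "#custom-api-doc"


def matches_layout_rule(text: str) -> bool:
    """Hypothetical layout rule: a JSON object that declares an OpenAPI/Swagger version."""
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(data, dict) and ("openapi" in data or "swagger" in data)


def identify_document_type(text: str) -> str:
    if matches_layout_rule(text):
        return "standard"
    first_line = text.lstrip().splitlines()[0] if text.strip() else ""
    if first_line.strip() == CUSTOM_MARKER:
        return "custom"
    # Neither the layout rule nor the target text matched.
    return "general"
```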
According to an embodiment of the present disclosure, the manner determination module may include any one of a first mode determining unit, a second mode determining unit, and a third mode determining unit.
The first mode determining unit is configured to determine, when the document type is the standard document type, that the target parsing mode matching the document type is a standard parsing mode.
The second mode determining unit is configured to determine, when the document type is the custom document type, that the target parsing mode matching the document type is a custom parsing mode.
The third mode determining unit is configured to determine, when the document type is the general document type, that the target parsing mode matching the document type is a general parsing mode.
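The one-to-one correspondence between document types and parsing modes can be written as an explicit mapping over enumerations. The enum names and string values are illustrative assumptions; keeping the mapping total over `DocumentType` mirrors the fact that each of the three determining units covers exactly one type.

```python
from enum import Enum


class DocumentType(Enum):
    STANDARD = "standard"
    CUSTOM = "custom"
    GENERAL = "general"


class ParsingMode(Enum):
    STANDARD = "standard parsing"
    CUSTOM = "custom parsing"
    GENERAL = "general parsing"


# One entry per mode determining unit.
MODE_FOR_TYPE = {
    DocumentType.STANDARD: ParsingMode.STANDARD,
    DocumentType.CUSTOM: ParsingMode.CUSTOM,
    DocumentType.GENERAL: ParsingMode.GENERAL,
}


def determine_parsing_mode(doc_type: DocumentType) -> ParsingMode:
    return MODE_FOR_TYPE[doc_type]
```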
According to an embodiment of the present disclosure, when the target parsing mode matching the document type is the general parsing mode, the parsing module may include a word segmentation unit and a recognition unit.
The word segmentation unit is configured to perform word segmentation on the interface document to be processed to obtain a word segmentation result.
The recognition unit is configured to recognize the word segmentation result and determine the interface attribute information.
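A toy version of the general parsing mode is shown below. It swaps the HMM or BERT models mentioned earlier for a plain regular-expression tokenizer and keyword rules, so it only illustrates the segment-then-recognize structure. The `path` attribute is an extra hypothetical field, which the open-ended definition of interface attribute information permits.

```python
import re
from typing import Dict, List

HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "PATCH"}
# URL paths are kept as single tokens; other words split on non-word characters.
TOKEN_RE = re.compile(r"/[\w/{}.-]*|\w+")


def segment(text: str) -> List[str]:
    """Word segmentation stand-in (regex, not a statistical model)."""
    return TOKEN_RE.findall(text)


def recognize(tokens: List[str]) -> Dict[str, str]:
    """Keyword-rule recognition of a few interface attributes."""
    attrs: Dict[str, str] = {}
    for tok in tokens:
        if tok.upper() in HTTP_METHODS and "request_method" not in attrs:
            attrs["request_method"] = tok.upper()
        elif tok.startswith("/") and "path" not in attrs:
            attrs["path"] = tok
    return attrs
```

For example, `recognize(segment("Query a user: GET /api/user/{id} returns JSON"))` yields the request method `GET` and the path `/api/user/{id}`.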
According to an embodiment of the present disclosure, the interface document parsing apparatus may further include an extraction module, a search module, a first determination module, and a second determination module.
The extraction module is configured to extract the identification information of the interface document to be processed.
The search module is configured to search a historical access information set, based on the identification information, for historical access information matching the identification information.
The first determination module is configured to extract the interface attribute information from the historical access information when historical access information matching the identification information is found.
The second determination module is configured to perform the operation of identifying the document type of the interface document to be processed when no historical access information matching the identification information is found.
According to an embodiment of the present disclosure, the interface attribute information may include at least one of: interface function information, request mode information, request parameter information, return result information and remark information.
According to an embodiment of the present disclosure, the interface document parsing apparatus may further include a format identification module, an execution module, and a conversion module.
The format identification module is configured to identify the content format of the interface document to be processed.
The execution module is configured to perform the operation of identifying the document type of the interface document to be processed when the content format of the interface document to be processed is a text format.
The conversion module is configured to convert the content format of the interface document to be processed into a text format when the content format of the interface document to be processed is a non-text format.
According to an embodiment of the present disclosure, the interface document parsing apparatus may further include a sending module, a receiving module, and a generating module.
The sending module is configured to send the interface attribute information to the client.
The receiving module is configured to receive the revised interface attribute information from the client.
The generating module is configured to generate a normalized interface document based on the revised interface attribute information.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method described above.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the methods and processes described above, such as the interface document parsing method. For example, in some embodiments, the interface document parsing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the interface document parsing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the interface document parsing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An interface document parsing method, comprising:
identifying a document type of an interface document to be processed;
determining a target parsing mode matching the document type; and
parsing the interface document to be processed according to the target parsing mode to obtain interface attribute information.
2. The method of claim 1, wherein the identifying a document type of the interface document to be processed comprises any one of:
determining that the document type is a standard document type when a layout format of the interface document to be processed satisfies a preset layout format rule, wherein the standard document type is a type of a document generated based on a standardized rule;
determining that the document type is a custom document type when target text information is found, wherein the custom document type is a type of a document generated according to a user-defined rule, and the target text information is text information searched for at a preset position of the interface document to be processed; and
determining that the document type is a general document type when the layout format of the interface document to be processed does not satisfy the preset layout format rule and the target text information is not found.
3. The method of claim 2, wherein the determining a target parsing mode matching the document type comprises any one of:
determining that the target parsing mode matching the document type is a standard parsing mode when the document type is the standard document type;
determining that the target parsing mode matching the document type is a custom parsing mode when the document type is the custom document type; and
determining that the target parsing mode matching the document type is a general parsing mode when the document type is the general document type.
4. The method of claim 3, wherein the target parsing mode matching the document type is the general parsing mode, and
the parsing the interface document to be processed according to the target parsing mode to obtain interface attribute information comprises:
performing word segmentation on the interface document to be processed to obtain a word segmentation result; and
recognizing the word segmentation result to determine the interface attribute information.
5. The method of claim 1, further comprising:
extracting identification information of the interface document to be processed;
searching a historical access information set, based on the identification information, for historical access information matching the identification information;
extracting the interface attribute information from the historical access information when historical access information matching the identification information is found; and
performing the operation of identifying the document type of the interface document to be processed when no historical access information matching the identification information is found.
6. The method of claim 1, further comprising:
identifying a content format of the interface document to be processed;
performing the operation of identifying the document type of the interface document to be processed when the content format of the interface document to be processed is a text format; and
converting the content format of the interface document to be processed into a text format when the content format of the interface document to be processed is a non-text format.
7. The method of claim 1, wherein the document type includes any of:
standard document type, custom document type, general document type.
8. The method of claim 1, wherein the interface attribute information comprises at least one of:
interface function information, request mode information, request parameter information, return result information and remark information.
9. The method of claim 1, further comprising:
sending the interface attribute information to a client;
receiving revised interface attribute information from the client; and
generating a normalized interface document based on the revised interface attribute information.
10. An interface document parsing apparatus comprising:
a type identification module configured to identify a document type of an interface document to be processed;
a mode determination module configured to determine a target parsing mode matching the document type; and
a parsing module configured to parse the interface document to be processed according to the target parsing mode to obtain interface attribute information.
11. The apparatus of claim 10, wherein the type identification module comprises any one of:
a first type determining unit configured to determine that the document type is a standard document type when a layout format of the interface document to be processed satisfies a preset layout format rule, wherein the standard document type is a type of a document generated based on a standardized rule;
a second type determining unit configured to determine that the document type is a custom document type when target text information is found, wherein the custom document type is a type of a document generated according to a user-defined rule, and the target text information is text information searched for at a preset position of the interface document to be processed; and
a third type determining unit configured to determine that the document type is a general document type when the layout format of the interface document to be processed does not satisfy the preset layout format rule and the target text information is not found.
12. The apparatus of claim 11, wherein the mode determination module comprises any one of:
a first mode determining unit configured to determine, when the document type is the standard document type, that the target parsing mode matching the document type is a standard parsing mode;
a second mode determining unit configured to determine, when the document type is the custom document type, that the target parsing mode matching the document type is a custom parsing mode; and
a third mode determining unit configured to determine, when the document type is the general document type, that the target parsing mode matching the document type is a general parsing mode.
13. The apparatus of claim 12, wherein the target parsing mode matching the document type is the general parsing mode, and
the parsing module comprises:
a word segmentation unit configured to perform word segmentation on the interface document to be processed to obtain a word segmentation result; and
a recognition unit configured to recognize the word segmentation result and determine the interface attribute information.
14. The apparatus of claim 10, further comprising:
an extraction module configured to extract identification information of the interface document to be processed;
a search module configured to search a historical access information set, based on the identification information, for historical access information matching the identification information;
a first determination module configured to extract the interface attribute information from the historical access information when historical access information matching the identification information is found; and
a second determination module configured to perform the operation of identifying the document type of the interface document to be processed when no historical access information matching the identification information is found.
15. The apparatus of claim 10, further comprising:
a format identification module configured to identify a content format of the interface document to be processed;
an execution module configured to perform the operation of identifying the document type of the interface document to be processed when the content format of the interface document to be processed is a text format; and
a conversion module configured to convert the content format of the interface document to be processed into a text format when the content format of the interface document to be processed is a non-text format.
16. The apparatus of claim 10, further comprising:
a sending module configured to send the interface attribute information to a client;
a receiving module configured to receive revised interface attribute information from the client; and
a generating module configured to generate a normalized interface document based on the revised interface attribute information.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
18. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
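The history-lookup branch of claim 14 (the first and second determination modules) can be illustrated with a minimal Python sketch. It assumes the history access information is a dictionary keyed by the document's identification information (for example a URL or content hash), and that document type identification plus attribute extraction are supplied as a single caller-provided function; `analyze_interface_document` and `identify_and_extract` are hypothetical names, not the applicant's implementation. Writing the new result back into the history is also an assumption, since the claim does not say how history records are created.

```python
from typing import Callable, Dict

# Interface attribute information, e.g. {"path": "/v1/user", "method": "GET"}.
Attributes = Dict[str, str]


def analyze_interface_document(
    identification: str,
    document_text: str,
    history: Dict[str, Attributes],
    identify_and_extract: Callable[[str], Attributes],
) -> Attributes:
    """Sketch of claim 14: reuse history access information when it exists."""
    cached = history.get(identification)
    if cached is not None:
        # First determination module: a matching history record was found,
        # so the interface attribute information is taken from it directly.
        return cached
    # Second determination module: no match, so fall back to identifying the
    # document type (delegated to the hypothetical identify_and_extract).
    attributes = identify_and_extract(document_text)
    # Assumption: the result is recorded for later lookups; the claim is silent on this.
    history[identification] = attributes
    return attributes
```

With this design, a second request for the same identification skips document type identification entirely, which is the saving the claim's two branches imply.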
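The format check of claim 15 (format identification, execution and conversion modules) can be sketched as follows. The claim does not say how the content format is recognised; the heuristic here (content counts as text if it decodes as UTF-8 and contains no NUL bytes) is an assumption, and `convert_to_text` is a hypothetical stand-in for whatever PDF text extraction or OCR step the conversion module would use.

```python
from typing import Callable


def is_text_format(raw: bytes) -> bool:
    """Assumed heuristic, not the patent's method: treat content as text
    if it decodes as UTF-8 and contains no NUL bytes."""
    if b"\x00" in raw:
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def prepare_for_type_identification(
    raw: bytes,
    convert_to_text: Callable[[bytes], str],
) -> str:
    """Return text on which document type identification can run."""
    if is_text_format(raw):
        # Execution module: already text, so type identification runs directly.
        return raw.decode("utf-8")
    # Conversion module: non-text content (e.g. a PDF or an image) is first
    # converted by the hypothetical convert_to_text callable.
    return convert_to_text(raw)
```

Keeping the converter as a parameter leaves the choice of extraction tool open, which matches the claim's silence on how conversion is done.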
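The review loop of claim 16 (sending, receiving and generating modules) reduces to a round trip followed by serialisation. In this minimal sketch the client exchange is modelled as one callable that receives the extracted attributes and returns the revised ones; the network transport is omitted. Sorted-key JSON as the "normalized interface document" is purely an assumed target format, and `normalize_with_review` and `review` are hypothetical names.

```python
import json
from typing import Callable, Dict

Attributes = Dict[str, str]


def normalize_with_review(
    attributes: Attributes,
    review: Callable[[Attributes], Attributes],
) -> str:
    """Sketch of claim 16: send, receive a revision, emit a normalized document."""
    # Sending + receiving modules: the client round trip is modelled as a
    # single callable; a copy is passed so the original stays untouched.
    revised = review(dict(attributes))
    # Generating module: sorted-key JSON is an assumed normalized format;
    # the claim does not name one.
    return json.dumps(revised, sort_keys=True, indent=2, ensure_ascii=False)
```

For example, a reviewer that upper-cases `"method": "get"` yields a document whose `method` field reads `"GET"`, while the originally extracted attributes remain unchanged.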
CN202110940395.3A 2021-08-16 2021-08-16 Interface document analysis method and device, electronic equipment and storage medium Pending CN113657088A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110940395.3A CN113657088A (en) 2021-08-16 2021-08-16 Interface document analysis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110940395.3A CN113657088A (en) 2021-08-16 2021-08-16 Interface document analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113657088A (en) 2021-11-16

Family

ID=78479349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110940395.3A Pending CN113657088A (en) 2021-08-16 2021-08-16 Interface document analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113657088A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153717A (en) * 2017-12-29 2018-06-12 北京仁和汇智信息技术有限公司 Structured processing method and device for scientific and technical papers in Word documents
CN111104557A (en) * 2019-11-22 2020-05-05 黄琴 Heterogeneous document processing system and method based on standard document markup language specification
CN111651552A (en) * 2020-06-08 2020-09-11 中国工商银行股份有限公司 Structured information determination method and device and electronic equipment
CN111832396A (en) * 2020-06-01 2020-10-27 北京百度网讯科技有限公司 Document layout analysis method and device, electronic equipment and storage medium
CN113204621A (en) * 2021-05-12 2021-08-03 北京百度网讯科技有限公司 Document storage method, document retrieval method, device, equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114791996A (en) * 2022-04-15 2022-07-26 北京百度网讯科技有限公司 Information processing method, device, system, electronic device and storage medium
CN114791996B (en) * 2022-04-15 2023-06-23 北京百度网讯科技有限公司 Information processing method, device, system, electronic equipment and storage medium
CN116934438A (en) * 2023-04-14 2023-10-24 济南明泉数字商务有限公司 AI auction decision method and system based on ChatGPT model and computing power
CN116628451A (en) * 2023-05-31 2023-08-22 江苏华存电子科技有限公司 High-speed analysis method for information to be processed
CN116628451B (en) * 2023-05-31 2023-11-14 江苏华存电子科技有限公司 High-speed analysis method for information to be processed

Similar Documents

Publication Publication Date Title
WO2020108063A1 (en) Feature word determining method, apparatus, and server
WO2020207167A1 (en) Text classification method, apparatus and device, and computer-readable storage medium
CN113657088A (en) Interface document analysis method and device, electronic equipment and storage medium
CN110263009B (en) Method, device and equipment for generating log classification rule and readable storage medium
CN111143505B (en) Document processing method, device, medium and electronic equipment
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
CN114861677B (en) Information extraction method and device, electronic equipment and storage medium
CN114495143B (en) Text object recognition method and device, electronic equipment and storage medium
EP4141697A1 (en) Method and apparatus of processing triple data, method and apparatus of training triple data processing model, device, and medium
CN115099239B (en) Resource identification method, device, equipment and storage medium
CN116955561A (en) Question answering method, question answering device, electronic equipment and storage medium
CN114385694A (en) Data processing method and device, computer equipment and storage medium
CN114461665B (en) Method, apparatus and computer program product for generating a statement transformation model
CN114880498B (en) Event information display method and device, equipment and medium
CN114444514B (en) Semantic matching model training method, semantic matching method and related device
CN116340172A (en) Data collection method and device based on test scene and test case detection method
CN113254578B (en) Method, apparatus, device, medium and product for data clustering
CN115470034A (en) Log analysis method, device and storage medium
CN115658903A (en) Text classification method, model training method, related device and electronic equipment
US20210342379A1 (en) Method and device for processing sentence, and storage medium
CN115292506A (en) Knowledge graph ontology construction method and device applied to office field
CN114239562A (en) Method, device and equipment for identifying program code blocks in document
CN113761906B (en) Method, apparatus, device and computer readable medium for parsing document
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination