CN111553150A - Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document - Google Patents

Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document

Info

Publication number
CN111553150A
Authority
CN
China
Prior art keywords
real
information
historical
message
api
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010254667.XA
Other languages
Chinese (zh)
Inventor
刘劲柏
杨超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010254667.XA
Publication of CN111553150A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic API (application program interface) document parsing and configuration method, an electronic device and a storage medium. The method comprises the following steps: obtaining historical message information from API interface document samples; segmenting and labeling the historical message information, and training a preset automatic word segmentation and labeling model on the labeled historical message information; acquiring real-time message information and processing it with the automatic word segmentation and labeling model to obtain a real-time message body; extracting the request parameters, return parameters and IP addresses from the real-time message body; and storing the request parameters, return parameters and IP addresses in a preset excel database in Json format. The technical scheme provided by the invention enables automatic parsing and configuration of API interface documents and improves the efficiency of processing them.

Description

Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document
Technical Field
The invention relates to the technical field of data analysis and configuration, in particular to a method, a system, a device and a storage medium for automatic API (application program interface) document analysis and configuration.
Background
Text data is composed of specific data units, such as words, phrases, sentences and paragraphs, or combinations of these units. A text message is a message text composed of such data units.
In conventional system integration, engineers often need to read a large number of API interface documents and then carry out code development or system configuration according to specific information in those documents. However, because the data types of the information in API interface documents vary, and the specific information is obtained by manual reading, omissions and transcription errors often occur.
In practical applications, regarding the extraction of specific information from the API interface document, some enterprises have introduced a text information extraction technology, which is a technology for extracting specific information from text information. The extraction of noun phrases, names of people, names of places, etc. required in text data belongs to the category of text information extraction technology. However, the conventional text information extraction technology can only extract specific information with simple structure characteristics from text information, and cannot accurately extract all the specific information in the text information, so that the utilization rate of the specific information is seriously reduced.
In addition, existing text information extraction technology places requirements on the data type of the text information: text of different data types requires different feature extraction methods and yields different extracted keywords. For example, an API interface document may contain data of many types, including xml, pdf, word and json, and each type needs its own feature extraction method. Furthermore, in an API interface document some kinds of data are embedded in other kinds; for example, IP address information and parameter information do not always appear alongside corresponding keywords but are hidden in the message information, and therefore cannot be extracted using conventional text information extraction technology.
Therefore, although some enterprises already use the traditional text information extraction technology in system integration, the technology cannot automatically extract all key information in various documents, such as interface parameters, IP addresses, Json message samples, XML message samples and the like, so as to complete the description of the API.
Based on the above problems, a method for automatically analyzing all specific information in an API interface document with high efficiency is needed.
Disclosure of Invention
The invention provides an automatic API (application program interface) document parsing and configuring method, a system, an electronic device and a computer storage medium, and mainly aims to solve the problem that all key information in various documents cannot be automatically extracted by using a traditional text information extraction technology in the existing system integration.
In order to achieve the above object, the present invention provides an automatic API interface document parsing configuration method, which includes the following steps:
preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
carrying out segment labeling on the historical message information so as to label the beginning of the message, the end of the message, the message body and the non-message body in the historical message information, and training a preset automatic word segmentation labeling model according to the segmented and labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
In addition, the invention also provides an automatic API document parsing and configuring system, which comprises:
the preprocessing unit is used for preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
the model training unit is used for carrying out segment labeling on the historical message information so as to label the beginning of the message, the end of the message, the message body and the non-message body in the historical message information, and training a preset automatic word segmentation labeling model according to the segmented and labeled historical message information;
the model application unit is used for acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model so as to acquire a real-time message body of the real-time message information;
the real-time message body processing unit is used for extracting request parameters, return parameters and IP addresses in the real-time message body;
and the data storage unit is used for converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information and storing the first Json format data information into a preset excel database.
In addition, to achieve the above object, the present invention also provides an electronic device, including: a memory, a processor, and an automation API interface document parsing configuration program stored in the memory and executable on the processor, the automation API interface document parsing configuration program when executed by the processor implementing the steps of:
preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
carrying out segment labeling on the historical message information so as to label the beginning of the message, the end of the message, the message body and the non-message body in the historical message information, and training a preset automatic word segmentation labeling model according to the segmented and labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, in which an automatic API interface document parsing configuration program is stored, and when the automatic API interface document parsing configuration program is executed by a processor, the steps of the above automatic API interface document parsing configuration method are implemented.
According to the automatic API interface document parsing and configuration method, electronic device and computer readable storage medium described above, historical message information is obtained by preprocessing and then segmented and labeled; the labeled historical message information is used to train the preset automatic word segmentation labeling model; and the trained model is then used to parse the API interface document to be configured, with the parsed data stored in the excel database. This achieves automatic parsing and configuration of the API interface document to be configured and significantly improves the efficiency of parsing API interface documents.
Drawings
FIG. 1 is a flowchart of a preferred embodiment of a method for parsing and configuring an API document according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the internal logic of an automated API interface document parsing configuration program according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details.
Specific embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Example 1
To illustrate the API interface document parsing and configuration method provided by the present invention, FIG. 1 shows the flow of the method.
As shown in fig. 1, the method for parsing and configuring the API interface document according to the present invention includes:
S110: acquiring a historical API interface document sample stored in the system, and preprocessing the historical API interface document sample to acquire all historical message information in it; and establishing a corresponding sample training set according to the historical message information.
Specifically, the historical API interface document sample may be obtained from a historical API interface document database preset in the system, which is a dedicated database for storing historical API interface documents. In addition, to ensure the precision of the model trained later, a sufficient number of historical API interface document samples must be obtained; according to experimental verification, the method requires at least 2000 historical API interface document samples.
It should be noted that a historical API interface document sample is generally a document in pdf or word format, and the information it contains mainly includes historical IP address information, historical parameter information, historical message information and historical code block information. In practice, the historical IP address information, historical parameter information and historical message information are all textual or numeric data, and the message information is long; in many cases the historical message information therefore contains a certain amount of historical IP address information and historical parameter information, which is hidden in the message and difficult to separate from it using conventional information extraction methods. For this reason, the invention first preprocesses the API interface document sample with conventional information extraction technology to obtain preliminary historical message information, then uses the historical message information to train the model, and finally uses the model to accurately extract the IP address information and parameter information hidden in the real-time message information.
Specifically, when preprocessing the historical API interface document sample, structured keyword extraction may be performed using existing information extraction techniques (including keyword matching, key identifier extraction and the like). For example, the sample may be searched for keywords such as IP address, request message, return message, request parameter and return parameter, so as to preliminarily obtain the historical message information (including historical request messages and historical return messages), the historical parameter information (including historical request parameters and historical return parameters) and the historical IP address information. It should be further noted that the invention uses the historical API interface document samples only to train the model, and training requires only the historical message information; in practice, therefore, only the historical message information needs to be acquired by keyword matching, which improves the working efficiency of the whole system.
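The keyword search described above can be sketched as follows. This is a minimal illustration, assuming the document has already been converted to plain text and that each section is introduced by a heading of the form "keyword:"; the heading layout and the function names `extract_sections` and `extract_messages` are hypothetical and do not come from the patent.

```python
import re

# Keywords named in the description; the "keyword:" heading layout is an assumption.
KEYWORDS = ["IP address", "request message", "return message",
            "request parameter", "return parameter"]

_HEADING = re.compile(
    r"^(%s)s?\s*[:：]\s*" % "|".join(re.escape(k) for k in KEYWORDS),
    re.IGNORECASE | re.MULTILINE,
)

def extract_sections(text):
    """Split plain text into {keyword: [content, ...]} using keyword headings as boundaries."""
    matches = list(_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = match.group(1).lower()
        sections.setdefault(key, []).append(text[match.end():end].strip())
    return sections

def extract_messages(text):
    """Keep only the message information, which is all the training step needs."""
    sections = extract_sections(text)
    return sections.get("request message", []) + sections.get("return message", [])
```

Calling `extract_messages` on each converted sample would yield the raw request and return messages used to build the training set; the other sections are ignored at this stage, in line with the efficiency remark above.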
S120: carrying out segment labeling on all historical message information in the sample training set so as to label the beginning of the message, the end of the message, the message body and the non-message body in the historical message information, and training a preset automatic word segmentation labeling model according to the segmented and labeled sample training set.
The process of carrying out segmentation labeling on all historical message information in the sample training set comprises the following three steps:
First, each piece of historical message information is divided into short sentences at the commas, and the short sentences are separated by blank lines.
Then, each short sentence is divided into a combination of message bodies and non-message bodies. Message bodies are content words (words that carry meaning of their own), such as the noun "interface", the letters "IP" and the number "123"; non-message bodies are function words (words without meaning of their own), such as auxiliary words (for example "in") and conjunctions (for example "and"). The parts are separated from one another by spaces.
Finally, each part is labeled according to a preset labeling rule to form a labeled short-sentence sequence. Specifically, the beginning of the message may be marked S, the end of the message E, a message body I and a non-message body O, with each I or O mark placed after the corresponding part. For example, for the short message sentence "interface IP is: 10.21.34.12", the labeled sequence is "S (interface) I (IP) I (is) O (10.21.34.12) I E", where the words in parentheses are the historical message information, the parentheses serve only to distinguish the marks from the message information, and S, I, O and E are the marks.
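The three labeling steps can be sketched in a few lines. This is only an illustration of the S/I/O/E scheme: the function-word list `FUNCTION_WORDS` and the handling of ':' are assumptions made here, since the description gives only a few examples of non-message-body words.

```python
import re

# Hypothetical function-word list; the description names only a few examples.
FUNCTION_WORDS = {"is", "in", "and", "of", "the"}

def label_phrase(phrase):
    """Label each space-separated part: I = message body, O = non-message body."""
    # Treat ':' as a separator so that "is:" is recognised as "is" (assumption).
    tokens = phrase.replace(":", " ").split()
    return [(tok, "O" if tok.lower() in FUNCTION_WORDS else "I") for tok in tokens]

def label_message(message):
    """Split a message at commas and return one labeled sequence per short sentence."""
    sequences = []
    for phrase in re.split(r"[,，]", message):
        pairs = label_phrase(phrase)
        if not pairs:
            continue  # skip empty short sentences
        body = " ".join("(%s) %s" % (tok, lab) for tok, lab in pairs)
        sequences.append("S %s E" % body)
    return sequences

print(label_message("interface IP is: 10.21.34.12"))
# ['S (interface) I (IP) I (is) O (10.21.34.12) I E']
```

In the patent's workflow these labels are prepared for the training set; a rule-based labeler like this one only reproduces the format of the worked example.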
It should be noted that the automatic word segmentation and labeling model trained on the segmented and labeled sample training set has at least two functions: Chinese word segmentation and automatic labeling.
Chinese word segmentation is the process of recombining a continuous character sequence into a word sequence according to a given specification. In English, spaces act as natural delimiters between words; in Chinese, characters, sentences and paragraphs can be delimited by explicit delimiters, but words have no formal delimiter.
In the later stage, the message information to be processed must first be segmented automatically before sequence labeling and subsequent data use can take place. Therefore, an automatic word segmentation model is established and trained first; once its precision reaches a preset precision value, an automatic labeling model that provides the labeling function is added to it, and subsequent data processing is performed.
Preferably, the present application uses a Hidden Markov Model (HMM) as the automatic word segmentation model. An HMM is a statistical model that describes a Markov process with hidden, unknown parameters. The difficulty lies in determining the hidden parameters of the process from the observable parameters; the hidden parameters are then used for further analysis.
Hidden Markov models are mainly used to solve three problems: the evaluation problem, the decoding problem and the learning problem. The decoding problem asks how to find a hidden state sequence that is optimal in some sense, given an observation sequence $O = O_1 O_2 O_3 \dots O_t$ and model parameters $\lambda = (A, B, \pi)$. Here the object of interest is the hidden states of the Markov model, which cannot be observed directly but are the more valuable information; they are usually found using the Viterbi algorithm.
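For reference, the decoding problem and the standard Viterbi recursion can be written as below. The symbols $a_{ij}$, $b_j(\cdot)$ and $\pi_i$ denote entries of A, B and π, $Q$ is a hidden state sequence and $\delta_k(j)$ is the probability of the best partial path ending in state $j$ at step $k$; these symbols are introduced here for illustration and do not appear in the patent text.

```latex
Q^{*} = \arg\max_{Q} P(Q \mid O, \lambda), \qquad
\delta_1(i) = \pi_i \, b_i(O_1), \qquad
\delta_k(j) = \max_{i} \bigl[ \delta_{k-1}(i) \, a_{ij} \bigr] \, b_j(O_k), \quad 2 \le k \le t
```

The optimal sequence $Q^{*}$ is recovered by backtracking from the state that maximizes $\delta_t(j)$.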
A practical example of this type of problem is Chinese word segmentation, that is, deciding how a sentence should properly be divided into its constituents. For example, the sentence "developing country" (发展中国家) could be divided as "developing / middle / country" (发展-中-国家) or as "developing / China / home" (发展-中国-家). This problem can be solved with a hidden Markov model: the segmentation of the sentence is treated as the hidden state and the sentence itself as the observation, and the most probable segmentation is found by building an HMM.
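A minimal sketch of Viterbi decoding is given below, applied to the I/O labels of this document rather than to Chinese characters. All probabilities are toy values invented for illustration; a real model would estimate them from the labeled training set, and a production implementation would normally work with log-probabilities.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best hidden state path, its probability) for an observation list."""
    # delta[k][s]: probability of the best path that ends in state s at step k.
    delta = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for k in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            prev, score = max(
                ((p, delta[k - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            delta[k][s] = score * emit_p[s].get(obs[k], 0.0)
            back[k][s] = prev
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for k in range(len(obs) - 1, 0, -1):
        path.append(back[k][path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Toy parameters (assumptions, not taken from the patent).
STATES = ["I", "O"]
START_P = {"I": 0.8, "O": 0.2}
TRANS_P = {"I": {"I": 0.6, "O": 0.4}, "O": {"I": 0.9, "O": 0.1}}
EMIT_P = {
    "I": {"interface": 0.3, "IP": 0.3, "10.21.34.12": 0.4},
    "O": {"is": 1.0},
}

path, prob = viterbi(["interface", "IP", "is", "10.21.34.12"],
                     STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['I', 'I', 'O', 'I']
```

The decoded path matches the labels of the worked example "interface IP is: 10.21.34.12".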
After the hidden markov model is established, the hidden markov model can be trained by using a short sentence in the sample training set until the precision of the hidden markov model reaches a preset precision value.
Specifically, the precision of the hidden Markov model can be checked using the following formula:
[Formula image: Figure BDA0002436822460000071]
where $X_{new,1}, \dots, X_{new,n}$ are the n words of an input short sentence and $Y_{new,1}, \dots, Y_{new,n}$ are the labels output by the model; $X_{new,1}$ is the first word of the input short sentence, $Y_{new,1}$ is the first output label, and $Y_{new,i}$ is the i-th output label.
When the probability given by the formula exceeds a preset precision value (for example, eighty percent), the hidden Markov model is judged to have reached the preset precision, and training can stop.
It should be noted that the hidden Markov model is well-established prior art; the innovation of the present application lies in applying it to the sentence division and labeling of message information. The specific technique for training a hidden Markov model with the short sentences in a sample training set is therefore not described in detail here.
S130: after the training of the automatic word segmentation and labeling model is finished, an API interface document to be configured is obtained, then the real-time message information in the API interface document to be configured is extracted through an information extraction technology, and finally word segmentation processing and labeling processing are carried out on the real-time message information through the automatic word segmentation and labeling model so as to obtain the real-time message bodies in all message information.
It should be noted that the API interface document to be configured is an API interface document generated by the system in real time. To acquire the real-time message information in it, the document must be preprocessed using an information extraction technique; this preprocessing is similar to that in step S110, in that structured keyword extraction is used to obtain the real-time message information preliminarily. Since the preprocessing in step S110 has already been described in detail, the preprocessing in step S130 is not described again here.
S140: after the real-time message body is obtained by using the automatic word segmentation labeling model, the real-time message body is processed by using the information extraction technology again to obtain the request parameter, the return parameter and the IP address of the real-time message body.
Specifically, the request parameters or return parameters in the real-time message body may be determined from message-specific fields such as POST, GET HTTP, Content-Type, Accept: and Referer:, together with symbols such as '&', '=', '{', '}' and ':'. That is, if a phrase labeled I (a real-time message body; the same applies hereinafter) contains one or more of the symbols '&', '=', '{', '}' and ':', and several phrases adjacent to it contain message-specific fields such as POST, GET HTTP, Content-Type, Accept: or Referer:, the phrase labeled I is judged to be a request parameter or a return parameter.
More specifically, if several phrases adjacent to the phrase labeled I contain message-specific fields such as POST, GET HTTP or Content-Type, the phrase labeled I is judged to be a request parameter; if several phrases adjacent to the phrase labeled I contain Accept: or Referer:, the phrase labeled I is judged to be a return parameter.
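The rule above can be sketched as a small classifier over the list of phrases labeled I. The field-to-category mapping follows the description as stated; the window size of two neighbouring phrases and the treatment of "GET HTTP" as two separate tokens are assumptions, and `classify_phrase` is a hypothetical name.

```python
PARAM_SYMBOLS = ("&", "=", "{", "}", ":")
# "GET HTTP" from the description is split into two tokens here (assumption).
REQUEST_FIELDS = ("POST", "GET", "HTTP", "Content-Type")
RETURN_FIELDS = ("Accept:", "Referer:")

def classify_phrase(phrases, index, window=2):
    """Classify phrases[index] as 'request', 'return' or None from its neighbours."""
    phrase = phrases[index]
    # A parameter candidate must contain at least one of the parameter symbols.
    if not any(sym in phrase for sym in PARAM_SYMBOLS):
        return None
    lo = max(0, index - window)
    hi = min(len(phrases), index + window + 1)
    neighbours = phrases[lo:index] + phrases[index + 1:hi]
    if any(field in n for n in neighbours for field in REQUEST_FIELDS):
        return "request"
    if any(field in n for n in neighbours for field in RETURN_FIELDS):
        return "return"
    return None

print(classify_phrase(["POST", "/api/login", "id=1&name=x"], 2))  # request
```

Checking request fields before return fields is a design choice made for this sketch; the description does not say which rule wins when both kinds of field are adjacent.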
In addition, IP addresses in phrases labeled I can be extracted by keyword matching according to the data characteristics of an IP address. For example, if a phrase labeled I after word segmentation is 10.21.34.12, the IP address can be extracted by matching its pattern of consecutive digits separated by dots.
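Matching runs of digits separated by dots can be illustrated with a regular expression. The sketch below assumes IPv4 addresses only and adds a 0 to 255 range check on each part, which the description does not mention; `extract_ips` is a hypothetical helper name.

```python
import re

# Four groups of 1-3 digits separated by dots, not embedded in a longer dotted number.
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")

def extract_ips(phrases):
    """Return all IPv4-looking strings found in the given phrases."""
    found = []
    for phrase in phrases:
        for candidate in IPV4_RE.findall(phrase):
            # Reject candidates such as 999.1.1.1 whose parts exceed 255.
            if all(int(part) <= 255 for part in candidate.split(".")):
                found.append(candidate)
    return found

print(extract_ips(["interface", "IP", "10.21.34.12"]))  # ['10.21.34.12']
```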
Although the keyword matching used in step S140 works on the same principle as the keyword matching used in step S130, this processing step cannot be omitted. The keyword matching in step S130 is applied while preprocessing the API interface document to be configured, and its keywords are IP address, request message, return message, request parameter, return parameter and the like. The keyword matching in step S140, by contrast, is applied to the phrases labeled I, using keywords that technicians determine from the data characteristics of the various parameters (such as POST, GET HTTP and Content-Type). Because the real-time message information has first been segmented, the parameters that sit close together in it are separated from one another, so that the start and end of each parameter can be determined accurately. If the information extraction technique were applied directly to real-time message information that had not been segmented in advance, the keywords or key characters corresponding to the same parameter could appear in several places in the real-time message body, and once one parameter was matched incorrectly the other parameters would likely be extracted incorrectly as well.
S150: converting the data information of the real-time message body acquired in S140, such as the request parameters, return parameters and IP addresses, into first Json format data information, and writing the first Json format data information into a preset excel database in a loop.
The preset excel database is used by staff for later data retrieval and configuration. Different types of data are stored at different positions in the excel database; because the database has a fixed structure, staff only need to retrieve the data at the corresponding positions according to their needs. They no longer have to follow the traditional approach of first checking the content of each document and then converting formats before the data can be used.
It should be noted that converting various types of data information into Json format is a common technical means in the art. For example, the json.dumps() function in Python may be used, which encodes Python data types such as lists into Json format. Since the specific use of json.dumps() is common knowledge in the art, it is not described again here.
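As a brief illustration of this conversion, the extracted values can be collected into a dictionary and encoded with `json.dumps()`. The record layout and the key names below are assumptions chosen for this sketch; the patent does not specify a schema.

```python
import json

def to_json_record(request_params, return_params, ip_addresses):
    """Encode the extracted values as one JSON string (hypothetical key names)."""
    record = {
        "request_parameters": request_params,
        "return_parameters": return_params,
        "ip_addresses": ip_addresses,
    }
    # ensure_ascii=False keeps any Chinese text readable in the output.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)

print(to_json_record(["id=1&name=x"], ['{"code": 0}'], ["10.21.34.12"]))
```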
In addition, the Json format data information can be written into the preset excel database in a loop using the functions of the POI SDK for Java. The POI SDK is an existing open-source library, and this scheme uses it only to write data into the excel database, so the specific execution of its functions is not described again here.
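The loop-writing step can be sketched without the Java POI SDK. The example below is a stand-in, not the patent's implementation: it uses Python's standard `csv` module to write one row per Json record into a spreadsheet-compatible file, with each data type in its own column. The column names are the same hypothetical keys assumed for illustration.

```python
import csv
import io
import json

# Hypothetical column layout: one fixed position per data type.
COLUMNS = ["request_parameters", "return_parameters", "ip_addresses"]

def write_records(json_records, stream):
    """Write each JSON record as one row, placing every data type in its own column."""
    writer = csv.writer(stream)
    writer.writerow(COLUMNS)
    for raw in json_records:
        record = json.loads(raw)
        # Missing data types are written as an empty JSON list.
        writer.writerow([json.dumps(record.get(col, [])) for col in COLUMNS])

buffer = io.StringIO()
write_records(['{"ip_addresses": ["10.21.34.12"]}'], buffer)
print(buffer.getvalue())
```

A CSV file opens directly in Excel; writing a native workbook would require a spreadsheet library such as the POI SDK named in the description.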
In addition, it should be noted that the API interface document to be configured also contains real-time IP address information, real-time parameter information, real-time Json code block information and real-time Xml code block information. To improve the completeness of the excel database and facilitate later use of the data, an information extraction technique may be used during the preprocessing in step S140 to obtain this real-time IP address information, real-time parameter information, real-time Xml code block information and real-time Json code block information. Then, in step S150, this information (excluding the Json code block information) is converted into Json format together with the request parameters, return parameters and IP addresses in the real-time message body to form second Json format data information, and the second Json format data information and the Json code block information are written into the preset excel database in a loop. In this way, the information in every API interface document generated by the system in real time can be stored in the preset excel database, ensuring that the data of each API interface document is complete.
Specifically, the API interface document to be configured may be processed by using a keyword matching technique to obtain the real-time IP address information and real-time parameter information in the API interface document to be configured. For example, the keyword "IP address" is used to extract the real-time IP address information, and the keywords "request parameter" and "return parameter" are used to extract the real-time parameter information.
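The keyword matching described above might look like the following sketch. The document layout (a keyword, a colon, and the value on one line), the English keywords, the comma-separated parameter names, and the helper name extract_by_keywords are all assumptions made for illustration; the source does not specify them.

```python
import re

# Assumed layout: "<keyword>: <value>" on a single line.
PATTERNS = {
    "ip_address": re.compile(r"IP address\s*:\s*(\d{1,3}(?:\.\d{1,3}){3})", re.IGNORECASE),
    "request_params": re.compile(r"request parameters?\s*:\s*(.+)", re.IGNORECASE),
    "return_params": re.compile(r"return parameters?\s*:\s*(.+)", re.IGNORECASE),
}


def extract_by_keywords(document):
    """Return the first value found after each keyword, or None if absent."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(document)
        if match is None:
            result[field] = None
        elif field == "ip_address":
            result[field] = match.group(1)
        else:
            # Parameter names are assumed to be comma-separated.
            names = match.group(1).split(",")
            result[field] = [name.strip() for name in names if name.strip()]
    return result


sample = """Interface: query order
IP address: 192.0.2.10
Request parameters: orderId, token
Return parameters: code, message
"""
info = extract_by_keywords(sample)
```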
In addition, the API interface document to be configured can be processed by using a key identifier extraction technology to acquire the real-time Json code block information and real-time Xml code block information in the API interface document to be configured. For example, a Json code block is determined by confirming that symbols such as '{', '}', and ',' all appear, several times each, in the same part of the document. For an Xml code block, the presence of <?xml version="1.0" in the document is enough to determine the location of the XML code block; a stack is then used to record all corresponding '<XXX>' and '</XXX>' pairs, and finally the first <XXX> and its corresponding </XXX> are used to determine the end of the XML code.
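A minimal sketch of the stack-based Xml boundary detection follows. It assumes well-nested tags with no comments or CDATA sections, and the function name find_xml_block is hypothetical; a real parser would need to handle many more cases.

```python
import re

XML_DECL = '<?xml version="1.0"'
# Group 1 is "/" for a closing tag, group 2 is the tag name,
# group 3 is "/" for a self-closing tag.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def find_xml_block(document):
    """Return the Xml code block that starts at the declaration, or None."""
    start = document.find(XML_DECL)
    if start == -1:
        return None
    decl_end = document.find("?>", start)
    if decl_end == -1:
        return None
    stack = []  # names of tags that are opened but not yet closed
    for match in TAG.finditer(document, decl_end + 2):
        closing, name, self_closing = match.groups()
        if closing:
            if not stack or stack.pop() != name:
                return None  # mismatched tags: not a well-formed block
        elif not self_closing:
            stack.append(name)
        if not stack:
            # The first opened tag has just been closed, so the block ends here.
            return document[start:match.end()]
    return None


sample = (
    "Response example:\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<response><code>0</code><items><item/></items></response>\n"
    "The fields are described below."
)
block = find_xml_block(sample)
```

The stack empties exactly when the closing tag of the first opened element is reached, which is how the end of the block is found without knowing the root tag name in advance.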
According to the embodiment, the method for analyzing and configuring the API documents provided by the invention at least has the following advantages:
1. by combining the traditional information extraction technology with a preset automatic word segmentation and labeling model, the API interface document to be configured can be parsed and configured automatically, which significantly improves the efficiency of parsing API interface documents;
2. by storing each piece of Json format data information in a preset excel database, the Json format data information is stored by category, which facilitates later use of each type of data;
3. the IP address information, the parameter information, and the Xml code block information obtained through preprocessing are all converted into second Json format data information, which is then stored in the excel database together with the Json code block information, thereby improving the integrity of the excel database.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Example 2
Corresponding to the method, the application also provides an automatic API document parsing configuration system, which comprises:
the preprocessing unit is used for preprocessing the historical API document sample to acquire historical message information in the historical API document sample;
the model training unit is used for carrying out segmentation marking on the historical message information so as to realize marking on the beginning, the end, the body and the non-body of the message in the historical message information, and training a preset automatic word segmentation marking model according to the segmented and marked historical message information;
the model application unit is used for acquiring the real-time message information in the API document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model so as to acquire the real-time message body of the real-time message information;
the real-time message body processing unit is used for extracting the request parameters, the return parameters and the IP addresses in the real-time message body;
and the data storage unit is used for converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information and storing the first Json format data information into a preset excel database.
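The labeling of message beginnings, ends, bodies, and non-bodies performed by the model training unit can be sketched as follows. The tags S, I, E, and O follow the labeling rule stated in claim 3. The whitespace tokenization, the sample sentence, the hand-made body flags, and the helper name label_tokens are assumptions for illustration.

```python
def label_tokens(tokens, is_body):
    """Tag each token: S = body start, I = inside body, E = body end, O = non-body.

    `is_body` holds one boolean per token. A body span consisting of a single
    token is tagged S here, which is an assumption; the source does not cover
    that case.
    """
    tags = []
    for i, body in enumerate(is_body):
        if not body:
            tags.append("O")
            continue
        prev_body = i > 0 and is_body[i - 1]
        next_body = i + 1 < len(is_body) and is_body[i + 1]
        if not prev_body:
            tags.append("S")  # first token of a body span
        elif not next_body:
            tags.append("E")  # last token of a body span
        else:
            tags.append("I")  # token strictly inside a body span
    return list(zip(tokens, tags))


tokens = "the request carries orderId token sign to the server".split()
# Hand-made annotation: only "orderId token sign" belongs to the message body.
is_body = [False, False, False, True, True, True, False, False, False]
labeled = label_tokens(tokens, is_body)
```

Sequences labeled this way are the kind of training input the automatic word segmentation labeling model would consume.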
Example 3
The present invention also provides an electronic device 70. Referring to fig. 2, a schematic structural diagram of an electronic device 70 according to a preferred embodiment of the invention is shown.
In the embodiment, the electronic device 70 may be a terminal device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 70 includes: a processor 71 and a memory 72.
The memory 72 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 70, such as a hard disk of the electronic device 70. In other embodiments, the readable storage medium may be an external memory of the electronic device 70, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the electronic device 70.
In the present embodiment, the readable storage medium of the memory 72 is generally used for storing an automation API interface document parsing configuration program 73 installed in the electronic device 70. The memory 72 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 71 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, and is used for executing program code stored in the memory 72 or processing data, for example, executing the automation API interface document parsing configuration program 73.
In some embodiments, the electronic device 70 is a terminal device such as a smartphone, a tablet computer, or a portable computer. In other embodiments, the electronic device 70 may be a server.
Fig. 2 only shows the electronic device 70 with components 71-73, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
Optionally, the electronic device 70 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or a headset; optionally, it may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 70 may further include a display, which may also be referred to as a display screen or a display unit. In some embodiments, the display device may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 70 and for displaying a visualized user interface.
Optionally, the electronic device 70 may further include a touch sensor. The area provided by the touch sensor for the user to perform touch operation is referred to as a touch area. Further, the touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.
The area of the display of the electronic device 70 may be the same as or different from the area of the touch sensor. Optionally, the display is stacked with the touch sensor to form a touch display screen. The device detects touch operation triggered by a user based on the touch display screen.
Optionally, the electronic device 70 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the apparatus embodiment shown in FIG. 2, the memory 72, which is a type of computer storage medium, may include an operating system, and an automation API document parsing configuration program 73; the processor 71, when executing the automation API document parsing configuration program 73 stored in the memory 72, performs the following steps:
preprocessing a historical API document sample to obtain historical message information in the historical API document sample;
segment labeling is carried out on the historical message information so as to realize labeling of the beginning, the end, the body and the non-body of the message in the historical message information, and a preset automatic word segmentation labeling model is trained according to the segmented labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through an automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
In this embodiment, fig. 3 is a schematic diagram of the internal logic of the automation API interface document parsing configuration program according to the present invention. As shown in fig. 3, the automation API interface document parsing configuration program 73 of fig. 2 may be further divided into one or more modules, which are stored in the memory 72 and executed by the processor 71 to implement the present invention. A module, as referred to herein, is a series of computer program instruction segments capable of performing a specified function. The automation API interface document parsing configuration program 73 may be divided into: a preprocessing module 74, a model training and application module 75, a real-time message body processing module 76, and a data storage module 77. The functions or operational steps performed by the modules 74-77 are similar to those described above and are only summarized here:
The preprocessing module 74 is configured to preprocess the historical API interface document sample to obtain historical message information in the historical API interface document sample.
The model training and application module 75 is configured to segment and label the historical message information so as to label the beginning of the message, the end of the message, the message body, and the non-message body in the historical message information, and to train a preset automatic word segmentation labeling model according to the segmented and labeled historical message information; it is further configured to acquire real-time message information in the API interface document to be configured, and to perform word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model so as to acquire a real-time message body of the real-time message information.
The real-time message body processing module 76 is configured to extract the request parameters, the return parameters, and the IP addresses in the real-time message body.
The data storage module 77 is configured to convert the request parameters, the return parameters, and the IP addresses in the real-time message body into first Json format data information, and to store the first Json format data information in a preset excel database.
Example 4
The present invention further provides a computer-readable storage medium, in which an automatic API interface document parsing configuration program 73 is stored, and when being executed by a processor, the automatic API interface document parsing configuration program 73 implements the following operations:
preprocessing a historical API document sample to obtain historical message information in the historical API document sample;
segment labeling is carried out on the historical message information so as to realize labeling of the beginning, the end, the body and the non-body of the message in the historical message information, and a preset automatic word segmentation labeling model is trained according to the segmented labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through an automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
The specific implementation of the computer-readable storage medium provided by the present invention is substantially the same as the specific implementation of the above-mentioned method for configuring and parsing API documents, and the electronic device, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An automatic API document parsing configuration method applied to an electronic device is characterized by comprising the following steps:
preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
segment labeling is carried out on the historical message information so as to realize labeling of the beginning of the message, the end of the message, the message body and the non-message body in the historical message information, and a preset automatic word segmentation labeling model is trained according to the segmented and labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
2. The method according to claim 1, wherein preprocessing the sample of the historical API document comprises:
and carrying out keyword matching processing on the historical API interface document to obtain historical message information in the historical API interface document.
3. The method for parsing and configuring API document according to claim 1 or 2, wherein the step of segment labeling the historical message information comprises:
performing sentence splitting on the historical message information at commas, so that the historical message information is divided into short sentences, wherein the short sentences are separated by blank lines;
dividing each short sentence into a combination of message bodies and non-message bodies, wherein a message body is a content word, a non-message body is a function word, and the message bodies and non-message bodies are separated by spaces;
and labeling each divided short sentence according to a preset labeling rule to form a short sentence label sequence, wherein the message body is labeled I, the beginning of the message body is labeled S, the end of the message body is labeled E, and the non-message body is labeled O.
4. The method according to claim 1, wherein the automatic segmentation tagging model comprises an automatic segmentation model and an automatic tagging model, and the process of training the automatic segmentation tagging model according to the segmented and tagged historical message information comprises:
firstly, training the automatic word segmentation model by using the history message information after segmentation labeling;
and adding the automatic tagging model to the automatic word segmentation model to form a trained automatic word segmentation tagging model.
5. The method according to claim 4, wherein the automatic word segmentation model is a hidden Markov model, and the hidden Markov model is trained by using segmented labeled historical message information until the precision of the hidden Markov model reaches a preset precision value.
6. The method according to claim 1, wherein the API interface document to be configured includes real-time message information; and
the process of acquiring the real-time message information in the API interface document to be configured comprises the following steps: and performing keyword matching processing on the API interface document to be configured to acquire real-time message information in the API interface document to be configured.
7. The method for parsing and configuring the API interface document according to claim 6, wherein the API interface document to be configured further includes real-time IP address information, real-time parameter information, real-time Json code block information, and real-time Xml code block information; and
the process of acquiring the real-time message information in the API interface document to be configured further comprises:
performing keyword matching processing on the API interface document to be configured to acquire real-time IP address information and real-time parameter information in the API interface document to be configured;
extracting key identifiers from the API interface document to be configured to obtain real-time Json code block information and real-time Xml code block information in the API interface document to be configured; and
after converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information and storing the first Json format data information in a preset excel database, the method further comprises the following steps:
and converting the real-time IP address information, the real-time parameter information and the Xml code block information into second Json format data information, and storing the second Json format data information and the real-time Json code block information into the excel database.
8. An automated API interface document parsing configuration system, the parsing configuration system comprising:
the preprocessing unit is used for preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
the model training unit is used for carrying out segmentation marking on the historical message information so as to realize marking on the beginning, the end, the body and the non-body of the message in the historical message information, and training a preset automatic word segmentation marking model according to the segmented and marked historical message information;
the model application unit is used for acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model so as to acquire a real-time message body of the real-time message information;
the real-time message body processing unit is used for extracting request parameters, return parameters and IP addresses in the real-time message body;
and the data storage unit is used for converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information and storing the first Json format data information into a preset excel database.
9. An electronic device, comprising: a memory, a processor, and an automation API interface document parsing configuration program stored in the memory and executable on the processor, the automation API interface document parsing configuration program when executed by the processor implementing the steps of:
preprocessing a historical API document sample to acquire historical message information in the historical API document sample;
segment labeling is carried out on the historical message information so as to realize labeling of the beginning, the end, the body and the non-body of the message in the historical message information, and a preset automatic word segmentation labeling model is trained according to the segmented and labeled historical message information;
acquiring real-time message information in an API (application program interface) document to be configured, and performing word segmentation processing and labeling processing on the real-time message information through the automatic word segmentation labeling model to acquire a real-time message body of the real-time message information;
extracting request parameters, return parameters and IP addresses in the real-time message body;
and converting the request parameters, the return parameters and the IP addresses in the real-time message body into first Json format data information, and storing the first Json format data information into a preset excel database.
10. A computer-readable storage medium having stored therein an automated API interface document parsing configuration program which, when executed by a processor, performs the steps of the automated API interface document parsing configuration method of any one of claims 1 through 7.
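Claim 5 names a hidden Markov model as the automatic word segmentation model. The sketch below shows one common way to fit such a model on tagged sequences, by counting with add-one smoothing, and to decode with the Viterbi algorithm. It is not the patent's implementation: the training data, the smoothing choice, and the names train_hmm and viterbi are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

TAGS = ["S", "I", "E", "O"]


def train_hmm(tagged_sentences):
    """Estimate log-probabilities from (token, tag) sequences with add-one smoothing."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = None
        for token, tag in sentence:
            vocab.add(token)
            emit[tag][token] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def log_p(counter, key, size):
        return math.log((counter[key] + 1) / (sum(counter.values()) + size))

    return {
        "start": {t: log_p(start, t, len(TAGS)) for t in TAGS},
        "trans": {a: {b: log_p(trans[a], b, len(TAGS)) for b in TAGS} for a in TAGS},
        "emit": lambda tag, token: log_p(emit[tag], token, vocab_size),
    }


def viterbi(model, tokens):
    """Return the most likely tag sequence for `tokens` under the model."""
    if not tokens:
        return []
    scores = [{t: model["start"][t] + model["emit"](t, tokens[0]) for t in TAGS}]
    back = []  # back[i][t] is the best previous tag when token i+1 has tag t
    for token in tokens[1:]:
        prev_scores, row, pointers = scores[-1], {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev_scores[p] + model["trans"][p][t])
            row[t] = prev_scores[best] + model["trans"][best][t] + model["emit"](t, token)
            pointers[t] = best
        scores.append(row)
        back.append(pointers)
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for pointers in reversed(back):  # follow the back-pointers to the start
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Tiny hand-made training set (illustrative only).
training = [
    [("the", "O"), ("request", "O"), ("carries", "O"),
     ("orderId", "S"), ("token", "I"), ("sign", "E")],
    [("it", "O"), ("returns", "O"), ("code", "S"), ("message", "I"), ("data", "E")],
]
model = train_hmm(training)
predicted = viterbi(model, ["the", "request", "carries", "orderId", "token", "sign"])
```

In practice, training would continue on more labeled samples until accuracy on held-out data reaches the preset precision value mentioned in claim 5.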
CN202010254667.XA 2020-04-02 2020-04-02 Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document Pending CN111553150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010254667.XA CN111553150A (en) 2020-04-02 2020-04-02 Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document

Publications (1)

Publication Number Publication Date
CN111553150A true CN111553150A (en) 2020-08-18

Family

ID=72007385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010254667.XA Pending CN111553150A (en) 2020-04-02 2020-04-02 Method, system, device and storage medium for analyzing and configuring automatic API (application program interface) document

Country Status (1)

Country Link
CN (1) CN111553150A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287069A (en) * 2019-05-21 2019-09-27 平安银行股份有限公司 ESB automatic interface testing method, server and computer readable storage medium
CN110688315A (en) * 2019-09-26 2020-01-14 招商局金融科技有限公司 Interface code detection report generation method, electronic device, and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667278A (en) * 2020-12-25 2021-04-16 山东众阳健康科技集团有限公司 Hospital medical insurance interface configuration method and system
CN112667278B (en) * 2020-12-25 2024-01-12 众阳健康科技集团有限公司 Hospital medical insurance interface configuration method and system
CN112765939A (en) * 2021-02-04 2021-05-07 浪潮云信息技术股份公司 Policy and law and regulation analysis method and system based on regular expression matching algorithm
CN113643013A (en) * 2021-08-11 2021-11-12 中国工商银行股份有限公司 Model establishing method, business processing method, device, electronic equipment and medium
CN115964028A (en) * 2021-10-12 2023-04-14 讯联数据(无锡)有限公司 Quick access method and system for third-party payment interface
CN115964028B (en) * 2021-10-12 2023-11-03 讯联数据(无锡)有限公司 Rapid access method and system for third party payment interface
CN115374239A (en) * 2022-07-13 2022-11-22 北京中海住梦科技有限公司 Legal and legal analysis method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination