CN111245899B

CN111245899B - Method and system for processing illegal message in web service environment

Info

Publication number: CN111245899B
Application number: CN201911406333.3A
Authority: CN
Inventors: 郑刚; 罗秀; 周跃林; 邓斯玉; 文兆明; 侯方
Original assignee: Guangzhou Aerospace Software Branch Of Aerospace Information Co ltd
Current assignee: Guangzhou Aerospace Software Branch Of Aerospace Information Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2022-09-20
Anticipated expiration: 2039-12-31
Also published as: CN111245899A

Abstract

The invention discloses a method and a system for processing illegal messages in a web service environment. The method comprises the following steps: acquiring, through an API (application programming interface), message data in a preset format returned by the server; scanning and parsing the characters of the message data in the preset format one by one using a character parser, and judging, based on a preset legal character encoding range condition, whether the currently parsed character is an illegal character; and, when the currently parsed character is determined to be an illegal character, calling an exception handling interface, judging, based on a preset illegal character processing mechanism, whether the illegal character meets the Unicode character encoding range condition, and, when it does, having the exception handling interface return a normal character identifier to the character parser so that the character parser can parse the illegal character as a normal character. The method is simple to implement, effectively avoids the termination of operation and the loss of data caused by the system throwing an exception on illegal data, and effectively ensures the integrity of the xml data.

Description

Method and system for processing illegal message in web service environment

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a method and a system for processing illegal messages in a web service environment.

Background

For a large number of developers, data exchange between different systems on the internet has always been a time-consuming problem. Each upgrade of a hardware or software platform often requires a large amount of data to be converted, and even data loss occurs. Therefore, it is becoming more and more important to find a data exchange method with simple encoding, high compatibility and easy reading and writing. xml, a format for data storage in plain text format, provides a software and hardware independent method of data storage. The data exchange in the xml format can not only reduce the complexity of system development, but also make the sharing and exchange of different application data easier. In addition, the xml format data also supports the use of various reading devices (such as palm computers, voice devices, news readers and the like), and can be used by blind persons or other disabled persons. Xml is therefore widely used in data exchange scenarios for different systems.

However, when xml exchange data is encoded in Unicode, the WebService client cannot properly parse special characters such as control characters, rarely used characters expressed as surrogate pairs, and emoji. The system therefore reports an illegal character exception, and may even terminate after the exception is thrown.

The cause of this problem is that WebService transmits data in xml format. The W3C specifies that the range of legal xml characters covers 63457 characters of commonly used languages such as Latin, English, Chinese and Japanese; special characters such as control characters, and emoji and rarely used characters expressed as surrogate pairs, are outside this range. Therefore, when the WebService client parses an xml message containing the codes of such special characters, the system cannot recognize them and reports an error. Even the mainstream web browsers output exception information when parsing such xml.
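The failure described in this background can be reproduced with an off-the-shelf parser. The following is a minimal sketch only: it uses Python's standard xml.etree.ElementTree rather than the WebService stack discussed in this document (an assumption made purely for illustration), and the helper name try_parse is hypothetical. It shows a control character inside element content being rejected as not well-formed.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return (True, None) if the parser accepts the text, else (False, message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        # The stock parser aborts here; nothing after the bad character is read.
        return False, str(exc)


# A plain message parses; the same message with U+0001 in it does not.
ok_plain, _ = try_parse("<msg>hello</msg>")
ok_ctrl, error_text = try_parse("<msg>bad\x01char</msg>")
print(ok_plain, ok_ctrl, error_text)
```

The point of the sketch is only that a conforming parser stops at the first out-of-range character, which is the behaviour the invention sets out to change.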

Therefore, a method for processing illegal messages in a web service environment is needed.

Disclosure of Invention

The invention provides a method and a system for processing an illegal message in a web service environment, which aim to solve the problem of how to process the illegal message caused by special characters.

In order to solve the above problem, according to an aspect of the present invention, there is provided a method for processing an illegal packet in a web service environment, the method including:

acquiring message data in a preset format returned by the server through an API (application program interface);

scanning and parsing the characters of the message data in the preset format one by one using a character parser, and, while parsing each character, judging whether the currently parsed character is an illegal character based on a preset legal character encoding range condition;

and when the currently parsed character is determined to be an illegal character, calling an exception handling interface, judging whether the illegal character meets the Unicode character encoding range condition based on a preset illegal character processing mechanism, and, when the illegal character meets the Unicode character encoding range condition, returning a normal character identifier from the exception handling interface to the character parser so that the character parser can parse the illegal character as a normal character.

Preferably, the obtaining, through the API interface, the message data in the preset format returned by the server includes:

acquiring first message data returned by a server through an API (application programming interface) interface, and carrying out integrity check on the first message data;

and when the first message data passes the integrity check, merging the first message data to acquire the message data in the preset format.

Preferably, scanning and parsing the characters of the message data in the preset format one by one using a character parser, and judging, while parsing each character, whether the currently parsed character is an illegal character based on a preset legal character encoding range condition, includes:

scanning and parsing the characters of the message data in the preset format one by one using the character parser;

judging whether the current character data is legal using the legal character encoding range condition that the W3C specifies for the preset format; if the current character data is legal, scanning the next character data in sequence; and if the current character data is an illegal character, calling an exception handling interface based on a preset illegal character handling mechanism.

Preferably, the illegal characters comprise: control characters, rarely used characters expressed as surrogate pairs, and emoji; the preset format is the xml format; and the preset legal character encoding range condition is a judgment rule based on the xml legal character encoding range specified by the W3C.
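As an illustration of this judgment rule, the sketch below encodes the Char production of the W3C XML 1.0 recommendation as a predicate and applies it to 16-bit code units. It assumes, for the sake of the example, a scanner that examines UTF-16 code units one at a time, which is why the two halves of a surrogate pair are each reported as illegal. The names is_xml_legal_char, utf16_units and classify are hypothetical and do not come from the patent.

```python
import struct


def is_xml_legal_char(value: int) -> bool:
    """True if value lies in the XML 1.0 Char production published by the W3C."""
    return (
        value in (0x9, 0xA, 0xD)
        or 0x20 <= value <= 0xD7FF
        or 0xE000 <= value <= 0xFFFD
        or 0x10000 <= value <= 0x10FFFF
    )


def utf16_units(text: str) -> list:
    """Split text into the 16-bit code units a UTF-16 based scanner would see."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def classify(value: int) -> str:
    """Label a value with the category of illegal character it falls into."""
    if is_xml_legal_char(value):
        return "legal"
    if 0 <= value < 0x20:
        return "control character"
    if 0xD800 <= value <= 0xDFFF:
        return "surrogate code unit"
    return "other illegal value"


# An emoji outside the Basic Multilingual Plane becomes two surrogate code units.
emoji_units = utf16_units("\U0001F600")
labels = [classify(unit) for unit in emoji_units]
print([hex(unit) for unit in emoji_units], labels)
```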

Preferably, wherein the method further comprises:

and when the illegal character does not meet the Unicode character encoding range condition, the exception handling interface returns an abnormal character identifier to the character parser, and the character parser throws an exception.

According to another aspect of the present invention, there is provided a system for processing an illegal message in a web service environment, the system comprising:

the message data acquisition unit is used for acquiring message data in a preset format returned by the server through the API;

the illegal character judgment unit is used for scanning and parsing the characters of the message data in the preset format one by one using a character parser, and, while parsing each character, judging whether the currently parsed character is an illegal character based on a preset legal character encoding range condition;

and the illegal character processing unit is used for calling an exception handling interface when the currently parsed character is determined to be an illegal character, judging whether the illegal character meets the Unicode character encoding range condition based on a preset illegal character processing mechanism, and, when the illegal character meets the Unicode character encoding range condition, returning a normal character identifier from the exception handling interface to the character parser so that the character parser can parse the illegal character as a normal character.

Preferably, the message data acquisition unit acquiring, through the API, the message data in the preset format returned by the server includes:

acquiring first message data returned by a server through an API (application program interface), and carrying out integrity check on the first message data;

Preferably, the illegal character judgment unit scanning and parsing the characters of the message data in the preset format one by one using a character parser, and judging, while parsing each character, whether the currently parsed character is an illegal character based on a preset legal character encoding range condition, includes:

judging whether the current character data is legal using the legal character encoding range condition that the W3C specifies for the preset format; if the current character data is legal, continuing to scan the next character data in sequence; and if the current character data is an illegal character, calling an exception handling interface based on a preset illegal character handling mechanism.

Preferably, the illegal character processing unit further comprises:

and when the illegal character does not meet the Unicode character encoding range condition, the exception handling interface returns an abnormal character identifier to the character parser, and the character parser throws an exception.

The invention provides a method and a system for processing illegal messages in a web service environment. When an illegal message is processed, the rules for handling abnormal data in the background are modified so that WebService can normally return the illegal character data produced by control characters, rarely used characters expressed as surrogate pairs, and emoji, instead of simply throwing exception information or directly filtering that data out. Compared with directly throwing exception information or filtering the data, the method is simple to implement: when a server and a client call WebService, developers do not need to modify upper-layer service code, yet the termination of operation and the loss of data caused by the system throwing an exception on illegal data are effectively avoided, and the integrity of the xml data is effectively ensured.

Drawings

A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:

FIG. 1 is a flow chart of a method 100 for processing illegal messages in a web services environment according to an embodiment of the present invention; and

FIG. 2 is a schematic structural diagram of an illegal message processing system 200 in a web service environment according to an embodiment of the present invention.

Detailed Description

The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the present invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.

Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.

Fig. 1 is a flowchart of an illegal message processing method 100 in a web service environment according to an embodiment of the present invention. As shown in fig. 1, compared with directly throwing exception information or filtering data, the method for processing an illegal message in a web service environment provided by the embodiment of the present invention is simple to implement: when a server or a client calls WebService, developers do not need to modify upper-layer service code, yet the termination of operation and the loss of data caused by the system throwing an exception on illegal data are effectively avoided, and the integrity of the xml data is effectively ensured. The method 100 for processing an illegal message in a web service environment provided by the embodiment of the invention starts at step 101, in which message data in a preset format returned by a server is obtained through an API (application programming interface).

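Preferably, acquiring, through the API, the message data in the preset format returned by the server includes: acquiring first message data returned by the server through the API and performing an integrity check on the first message data; and, when the first message data passes the integrity check, merging the first message data to obtain the message data in the preset format.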

In an embodiment of the present invention, the preset format is the xml format, and the illegal characters are characters outside the xml legal encoding range specified by the W3C, such as control characters, rarely used characters expressed as surrogate pairs, and emoji. The xml message data is received from the server through the API of the WebService, and a complete, temporary xml file is finally formed through data verification, merging and similar steps. This file is mainly used for the subsequent parsing of character codes. When receiving the xml file, the client needs to judge whether the xml data has been received completely. If reception fails, the xml message data needs to be received again.
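Step 101 can be pictured with the following minimal sketch. The patent does not say how the integrity check is performed or how the message is split, so everything concrete here is an assumption for illustration: the Fragment record, the CRC32 checksum carried with each fragment, the retry limit, and the FlakyServer stand-in for the WebService API are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fragment:
    index: int      # position of the fragment within the message
    payload: bytes  # raw bytes of this part of the xml message
    crc32: int      # checksum declared by the sender (assumed scheme)


def fragment_is_intact(fragment: Fragment) -> bool:
    """Integrity check: recompute the checksum and compare with the declared one."""
    return zlib.crc32(fragment.payload) == fragment.crc32


def receive_message(fetch: Callable[[], List[Fragment]], max_attempts: int = 3) -> str:
    """Fetch fragments, verify them, and merge them; fetch again on failure."""
    for _ in range(max_attempts):
        fragments = fetch()
        if fragments and all(fragment_is_intact(f) for f in fragments):
            ordered = sorted(fragments, key=lambda f: f.index)
            return b"".join(f.payload for f in ordered).decode("utf-8")
    raise ConnectionError("xml message could not be received completely")


def make_fragments(text: str, size: int) -> List[Fragment]:
    """Split a message into checksummed fragments (test helper)."""
    data = text.encode("utf-8")
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    return [Fragment(i, part, zlib.crc32(part)) for i, part in enumerate(parts)]


class FlakyServer:
    """Stand-in for the server: the first response carries a bad checksum."""

    def __init__(self, text: str):
        self.text = text
        self.calls = 0

    def fetch(self) -> List[Fragment]:
        self.calls += 1
        fragments = make_fragments(self.text, 8)
        if self.calls == 1:
            fragments[0].crc32 ^= 1  # simulate a damaged first transfer
        return fragments


server = FlakyServer("<msg>hello world</msg>")
merged = receive_message(server.fetch)
print(server.calls, merged)
```

The merged string plays the role of the temporary xml file that the later steps scan character by character.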

In step 102, the characters of the message data in the preset format are scanned and parsed one by one using a character parser, and, while each character is parsed, whether the currently parsed character is an illegal character is judged based on a preset legal character encoding range condition.

In an embodiment of the present invention, the axis character parser is used to scan and parse the character codes of the returned xml resource object one by one. Whether the current xml character code is legal is judged using the xml legal character encoding range. If the current xml character data is legal, the next character code is scanned in sequence. If the current xml character code is illegal, the system calls an exception handling interface based on a preset illegal character handling mechanism.

Specifically, to prevent the system from throwing an exception because of illegal data in the xml, the client needs to scan the xml for illegal characters. This mainly relies on the axis character parser: the underlying axis parser scans and parses the character codes of the returned xml message one by one, and judges whether the current xml character code is legal using the xml legal character encoding range specified by the W3C. If the current xml character data is legal, the next character code is scanned in sequence. If the current xml character code is illegal, the system calls an exception handling interface, and the preset illegal character handling mechanism processes the illegal message data.
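The scanning loop of step 102 might look like the sketch below. It is not the axis parser itself: it is a stand-alone illustration that assumes the scanner walks UTF-16 code units, and the names scan and on_illegal, as well as the callback signature, are hypothetical. Legal units are simply passed over, and every illegal unit is handed to the hook, which is where the exception handling interface would be called.

```python
import struct
from typing import Callable, List, Tuple


def is_xml_legal_char(value: int) -> bool:
    """True if value lies in the XML 1.0 Char production published by the W3C."""
    return (
        value in (0x9, 0xA, 0xD)
        or 0x20 <= value <= 0xD7FF
        or 0xE000 <= value <= 0xFFFD
        or 0x10000 <= value <= 0x10FFFF
    )


def utf16_units(text: str) -> List[int]:
    """Split text into 16-bit code units (assumed unit of scanning)."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def scan(text: str, on_illegal: Callable[[int, int], None]) -> int:
    """Scan one code unit at a time and report each illegal unit to on_illegal.

    Returns the number of code units scanned.
    """
    units = utf16_units(text)
    for position, unit in enumerate(units):
        if not is_xml_legal_char(unit):
            on_illegal(position, unit)  # hand over to the exception handling hook
        # legal units need no action; the loop moves on to the next one
    return len(units)


# Record what the hook is called with for a message holding U+0001 and an emoji.
found: List[Tuple[int, int]] = []
scanned = scan("<m>a\x01\U0001F600</m>", lambda pos, unit: found.append((pos, unit)))
print(scanned, [(pos, hex(unit)) for pos, unit in found])
```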

In step 103, when it is determined that the currently parsed character is an illegal character, an exception handling interface is called, whether the illegal character meets the condition of the Unicode character encoding range is judged based on a preset illegal character handling mechanism, and when the illegal character meets the condition of the Unicode character encoding range, the exception handling interface returns a normal character identifier to the character parser, so that the character parser can parse the illegal character as a normal character.

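Preferably, the method further comprises: when the illegal character does not meet the Unicode character encoding range condition, the exception handling interface returns an abnormal character identifier to the character parser, and the character parser throws an exception.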

In the embodiment of the invention, when an illegal character occurs, the exception handling mechanism does not interrupt the xml parsing and can process the current illegal character further. To keep the system from stopping because of illegal character data, the illegal character handling mechanism does not simply throw an exception for, or filter out, the illegal character. Instead, it hooks in a custom exception handling interface that analyzes the attributes of the illegal message and modifies its data properties, so that the xml parser can treat the illegal character as a normal character and the WebService interface returns the character data normally. After the current illegal data has been processed, axis continues to scan the code of the next character until the codes of all characters in the xml have been correctly parsed and processed. In this way, all illegal character codes in the xml are processed correctly, and no unprocessed illegal character codes remain in the xml.

The Unicode character encoding space is #x0000-#x10FFFF, and the W3C standard specifies the range of xml legal characters as Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The codes of the control characters, the rarely used characters expressed as surrogate pairs, and the emoji that we commonly encounter are not in the xml legal encoding range specified by the W3C. Therefore, when the xml parsing library parses such characters, it regards their codes as 'illegal messages' and throws an exception. Such illegal codes are normally processed as follows: when the xml parser encounters an abnormal character while parsing character codes, it calls a method of an exception handling interface (such as xmlerrorHandler) to make a decision on the character code, and an error identifier is returned for a code identified as an illegal character. The actual modification made by the invention is to change the judgment rule and the exception handling flow for abnormal xml characters according to the properties of the character. First, a custom exception handling interface method is hooked into the exception handler, and this method judges from the code of the illegal character whether the code lies within the Unicode character encoding range. If it does, the exception flow is not invoked directly and a normal character identifier is returned instead; otherwise, an error identifier is returned. When the xml parser finds that the exception handling interface has returned the normal character identifier, it treats the illegal character as a normal character in subsequent processing. Otherwise, the exception continues to be thrown to the upper layer. Through this processing, Unicode characters can be transmitted.
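The modified decision flow can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation or the axis API: the identifiers NORMAL_CHAR and ERROR_CHAR, the handler functions, and the parse loop are hypothetical, and characters are modelled as plain integers so that a value beyond #x10FFFF can be represented. The default handler reproduces the original behaviour (always an error identifier); the replacement handler returns the normal identifier whenever the code lies inside the Unicode encoding space.

```python
from typing import Callable, Iterable, List

NORMAL_CHAR = "normal"  # hypothetical identifier: treat as a normal character
ERROR_CHAR = "error"    # hypothetical identifier: keep the exception flow
UNICODE_MAX = 0x10FFFF  # upper bound of the Unicode encoding space


def is_xml_legal_char(value: int) -> bool:
    """True if value lies in the XML 1.0 Char production published by the W3C."""
    return (
        value in (0x9, 0xA, 0xD)
        or 0x20 <= value <= 0xD7FF
        or 0xE000 <= value <= 0xFFFD
        or 0x10000 <= value <= 0x10FFFF
    )


def default_handler(value: int) -> str:
    """Original behaviour: every illegal character is an error."""
    return ERROR_CHAR


def unicode_range_handler(value: int) -> str:
    """Modified behaviour: accept anything inside the Unicode encoding space."""
    return NORMAL_CHAR if 0 <= value <= UNICODE_MAX else ERROR_CHAR


class IllegalCharacterError(ValueError):
    """Raised to the upper layer when the handler returns the error identifier."""


def parse(values: Iterable[int], handler: Callable[[int], str]) -> List[int]:
    """Parse values one by one, consulting handler only for illegal ones."""
    accepted = []
    for position, value in enumerate(values):
        if not is_xml_legal_char(value) and handler(value) != NORMAL_CHAR:
            raise IllegalCharacterError(
                "illegal character 0x%X at position %d" % (value, position)
            )
        accepted.append(value)  # legal, or accepted as normal by the handler
    return accepted


# '<a>' followed by a control character and the two code units of an emoji.
message = [0x3C, 0x61, 0x3E, 0x01, 0xD83D, 0xDE00]
kept = parse(message, unicode_range_handler)
print(kept == message)
```

With default_handler the same message raises IllegalCharacterError at position 3, and with unicode_range_handler a value such as 0x110000 still raises, which corresponds to the branch where the exception continues to be thrown to the upper layer.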

By hooking in a custom exception handling mechanism, the character identifier returned by the exception handling interface is changed to a normal character identifier according to the attributes of the illegal character, so that the WebService interface can return the character data normally. This avoids the system throwing an exception and stopping on illegal data, avoids data loss, and ensures the integrity of the xml data.

Fig. 2 is a schematic structural diagram of an illegal message processing system 200 in a web service environment according to an embodiment of the present invention. As shown in fig. 2, a system 200 for processing an illegal packet in a web service environment according to an embodiment of the present invention includes: a message data acquisition unit 201, an illegal character judgment unit 202, and an illegal character processing unit 203.

Preferably, the message data obtaining unit 201 is configured to obtain, through an API interface, message data in a preset format returned by the server.

Preferably, the acquiring unit 201 of the message data acquires the message data in the preset format returned by the server through the API interface, and includes:

Preferably, the illegal character judgment unit 202 is configured to scan and parse the characters of the message data in the preset format one by one using a character parser, and, while parsing each character, to judge whether the currently parsed character is an illegal character based on a preset legal character encoding range condition.

Preferably, the illegal character determining unit 202 scans and analyzes the characters of the message data in the preset format one by using a character analyzer, and determines whether the currently analyzed character is an illegal character based on a preset legal character encoding range condition in an analysis process of each character, including:

Preferably, the illegal character processing unit 203 is configured to, when it is determined that the currently parsed character is an illegal character, invoke an exception handling interface, determine whether the illegal character meets a Unicode character coding range condition based on a preset illegal character processing mechanism, and when the illegal character meets the Unicode character coding range condition, return a normal character identifier to the character parser by the exception handling interface, so that the character parser can parse the illegal character as a normal character.

Preferably, the illegal character processing unit 203 further comprises:

The system 200 for processing an illegal packet in a web service environment according to an embodiment of the present invention corresponds to the method 100 for processing an illegal packet in a web service environment according to another embodiment of the present invention, and is not described herein again.

The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.

Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A method for processing illegal messages in a web service environment is characterized by comprising the following steps:

when the current analyzed character is determined to be an illegal character, calling an exception handling interface, judging whether the illegal character accords with the Unicode character encoding range condition or not based on a preset illegal character processing mechanism, and returning a normal character identifier to the character analyzer by the exception handling interface when the illegal character accords with the Unicode character encoding range condition so that the character analyzer can analyze the illegal character as a normal character;

the obtaining of the message data in the preset format returned by the server through the API interface includes:

2. The method according to claim 1, wherein the step of scanning and parsing the characters of the message data in the preset format one by using a character parser, and judging whether the currently parsed character is an illegal character based on a preset legal character encoding range condition in the parsing process of each character comprises:

3. The method of claim 1, wherein the illegal characters comprise: control characters, rarely used characters expressed as surrogate pairs, and emoji; the preset format is the xml format; and the preset legal character encoding range condition is a judgment rule based on the xml legal character encoding range specified by the W3C.

4. The method of claim 1, further comprising:

5. A system for processing illegal messages in a web services environment, said system comprising:

an illegal character processing unit, configured to call an exception handling interface when it is determined that a currently parsed character is an illegal character, judge whether the illegal character conforms to a Unicode character coding range condition based on a preset illegal character processing mechanism, and return a normal character identifier to the character parser by the exception handling interface when the illegal character conforms to the Unicode character coding range condition, so that the character parser can parse the illegal character as a normal character;

the message data obtaining unit obtains the message data in the preset format returned by the server through the API, and the message data obtaining unit comprises:

6. The system according to claim 5, wherein the illegal character determining unit scans and parses the characters of the message data in the preset format one by using a character parser, and determines whether the currently parsed character is an illegal character based on a preset legal character encoding range condition in a parsing process of each character, including:

7. The system of claim 5, wherein the illegal characters comprise: control characters, rarely used characters expressed as surrogate pairs, and emoji; the preset format is the xml format; and the preset legal character encoding range condition is a judgment rule based on the xml legal character encoding range specified by the W3C.

8. The system of claim 5, wherein the illegal character processing unit further comprises: