WO2016023471A1 - Methods for processing handwritten input characters, splitting and merging data, and encoding and decoding processing - Google Patents

Methods for processing handwritten input characters, splitting and merging data, and encoding and decoding processing

Info

Publication number
WO2016023471A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
encoding
stroke
character
metadata
Prior art date
Application number
PCT/CN2015/086672
Other languages
English (en)
Chinese (zh)
Inventor
张锐
Original Assignee
张锐
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 张锐 filed Critical 张锐
Priority to CN201580042761.6A (patent CN106575166B)
Priority to CN202310088220.3A (patent CN116185209A)
Publication of WO2016023471A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]

Definitions

  • the present invention relates to data processing technologies, and in particular, to a method for processing handwritten input characters, data splitting and merging, and encoding and decoding.
  • text encoding is the most basic form of encoding: it supports human input, viewing, editing, and modification, as well as computer analysis and processing.
  • from the early ASCII text encoding standards to today's Unicode,
  • standardized text encoding is the basis for transferring information between people, machines, and various systems.
  • however, the existing standardized text encodings are far from sufficient.
  • standard text encodings and their corresponding text input methods have gradually become the bottleneck of natural human output into the digital world.
  • Standard text-based encoding allows humans to participate in data creation, viewing, debugging, and modification; it facilitates integration and exchange between different systems, speeds up system development, and reduces the cost of system troubleshooting.
  • however, the text format is redundant for expressing symbolized data and binary data.
  • as the complexity of the structures a system must express grows, the complexity of text-based markup and syntax grows greatly, and data redundancy increases as well.
  • moreover, because a specific text encoding standard has a limited number of codes, conflicts between data content and grammar marks in the encoding are inevitable, and text escaping introduces further data redundancy.
  • for computers, binary data is the natural form of data representation. Human-defined text-format data is also converted into binary data to reduce redundancy and improve processing and transmission efficiency. There are also some general binary encoding methods, such as the ASN.1 standards of the International Organization for Standardization and the International Telecommunication Union, Google's Protocol Buffers, Apache's Thrift and Avro, as well as BSON, MessagePack, and so on. However, in contrast to text-based encoding, binary data has the disadvantages of being relatively closed, unfavorable for exchange, and unfavorable for human participation.
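The redundancy gap between text and binary encodings described above can be illustrated with a small sketch. The record layout, field names, and `struct` schema below are illustrative assumptions, not part of the invention:

```python
import json
import struct

# A small record: a 32-bit id, a 64-bit float, and a flag.
record = {"id": 123456, "value": 3.14159, "active": True}

# Text encoding: self-describing but redundant (field names, quotes, separators).
text_encoded = json.dumps(record).encode("utf-8")

# Binary encoding: compact, but the schema ("<Id?" = little-endian uint32,
# double, bool) must be agreed on out of band -- opaque without it.
binary_encoded = struct.pack("<Id?", record["id"], record["value"], record["active"])

print(len(text_encoded), len(binary_encoded))
```

The binary form is a fraction of the size of the JSON form, but anyone without the `"<Id?"` schema sees only 13 opaque bytes, which is exactly the exchange/participation trade-off the passage describes.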
  • for encoding, whether text encoding or binary encoding, there are two purposes. One is to describe the data object itself, also called serialization, referred to here as the content encoding of the data object.
  • the aforementioned coding standards and methods are mainly used for content encoding.
  • the other purpose is to refer to the data object, referred to as reference encoding.
  • text-based reference encodings include URNs, URLs, and the object identifier (OID) in ASN.1; binary-based reference encodings include database keys, UUID/GUID, IP addresses, MAC addresses, MD5, SHA-1, and so on.
  • a first aspect of the present invention provides a method for processing handwritten input characters, including:
  • the technical effect of the first aspect is a method for processing handwritten input characters that recognizes each character as it is written: the user does not need to issue an explicit or implicit "start a single character input" or "end a single character input" command to distinguish characters. There is therefore no need to pause or interact with the system during writing, so the writing process is smooth and efficient.
  • moreover, in the method, the character to which a stroke belongs is determined directly from the stroke's input position, without recognition against standard characters, so the personalized information, writing style, and characteristics of the user's handwriting input are preserved.
  • a second aspect of the present invention provides a data splitting method, including:
  • the metadata in the data object corresponding to the data identifier to be stored is obtained according to a preset metadata stripping protocol, and the obtained metadata is stripped from the data object.
  • the data content is divided into at least two data segments according to a preset data content splitting specification.
  • the technical effect of the second aspect is a data splitting method that separates the metadata in the user's original data from the data content and divides the content into multiple data segments, thereby increasing the difficulty of illegally acquiring the user's original data and making data storage more secure.
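A minimal sketch of this splitting idea follows. The dict-based data object, the `"content"` field name, and the fixed segment size are all hypothetical stand-ins for the patent's preset stripping protocol and splitting specification:

```python
def split_data_object(data_object, segment_size):
    """Strip metadata from a data object, then split its content into segments."""
    # Metadata stripping: separate descriptive fields from the raw content.
    metadata = {k: v for k, v in data_object.items() if k != "content"}
    content = data_object["content"]
    # Content splitting: fixed-size segments stand in for the preset specification.
    segments = [content[i:i + segment_size]
                for i in range(0, len(content), segment_size)]
    return metadata, segments

meta, segments = split_data_object(
    {"owner": "user1", "type": "text", "content": b"hello, cloud storage"},
    segment_size=8,
)
```

Stored separately, neither the metadata nor any single segment reveals the original object, which is the security property the aspect claims.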
  • a third aspect of the present invention provides a data merging method comprising:
  • the identification information includes positioning information, and the positioning information is used to locate a storage address of part of the data information in the data object;
  • the technical effect of the third aspect is a data merging method: according to the positioning information included in the identification information of a data object acquisition request, the split data information stored in each storage body is located step by step, and each piece of data information is combined according to a preset merge rule to recover the user's original data object. This ensures that data dispersed across storage bodies can be acquired efficiently and safely, and that the user can reliably merge the scattered data back into the original data.
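The merging side can be sketched in the same spirit. The storage layout, the `(storage body, address)` positioning format, and the concatenation merge rule are illustrative assumptions, not the patent's actual scheme:

```python
def merge_data_object(identification, storages, merge_rule):
    """Locate each split segment via its positioning info, then merge per rule."""
    # Positioning info: (storage body, address) pairs, resolved step by step.
    parts = [storages[body][address]
             for body, address in identification["positions"]]
    return merge_rule(parts)

# Two storage bodies, each holding part of the original content at some address.
storages = {"bank_a": {0: b"hello, "}, "bank_b": {7: b"world"}}
identification = {"positions": [("bank_a", 0), ("bank_b", 7)]}
restored = merge_data_object(identification, storages, merge_rule=b"".join)
```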
  • a fourth aspect of the present invention provides a coding processing method, including:
  • the technical effect of the fourth aspect is: a data object to be encoded and its metadata are obtained according to a received encoding processing request, and an object encoding of the data object is obtained according to the encoding warehouse, the data object, and its metadata. Since the data object can be encoded according to its metadata and the encoding warehouse, flexible and diverse encoding methods are realized.
  • a fifth aspect of the present invention provides a decoding processing method, including:
  • the technical effect of the fifth aspect is: a decoding processing request is received, the object encoding to be decoded is acquired according to the request, and the object encoding is disassembled to obtain a meta-encoding, or a meta-encoding and an instance encoding.
  • the encoding warehouse is queried to obtain the corresponding metadata and encoding specification according to the meta-encoding, and the data object corresponding to the object encoding is recovered from the metadata and encoding specification, or from the metadata, the encoding specification, and the instance encoding. Since encoding is realized through the metadata and the encoding warehouse, not only is a flexible encoding method achieved, but space is also saved to a certain extent.
  • the disassembled meta-encoding and the encoding warehouse effectively improve the efficiency of decoding.
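A toy round-trip can make the meta-encoding/instance-encoding split concrete. The one-byte meta-code and the warehouse contents below are invented for illustration; the patent's actual encoding warehouse and disassembly rules are far more general:

```python
# Toy encoding warehouse: meta-code -> (metadata, decoder for the instance code).
warehouse = {
    0x01: ("utf8-text", lambda b: b.decode("utf-8")),
    0x02: ("big-endian-int", lambda b: int.from_bytes(b, "big")),
}

def encode_object(meta_code, instance_code):
    # Object encoding = one meta-code byte followed by the instance encoding.
    return bytes([meta_code]) + instance_code

def decode_object(object_code):
    # Disassemble into meta-code and instance code, then query the warehouse.
    meta_code, instance_code = object_code[0], object_code[1:]
    metadata, decoder = warehouse[meta_code]
    return metadata, decoder(instance_code)

code = encode_object(0x02, (1024).to_bytes(2, "big"))
metadata, value = decode_object(code)
```

Note how the warehouse, not the object encoding itself, carries the metadata: this is why the disassembled meta-encoding keeps the encoded form compact while still allowing flexible decoding.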
  • FIG. 1A is a flowchart of an embodiment of a method for processing handwritten input characters according to the present invention
  • FIG. 1B is a schematic diagram 1 of a character in a method for processing handwritten input characters according to an embodiment of the present invention
  • FIG. 1C is a schematic diagram 2 of a character in a method for processing handwritten input characters according to an embodiment of the present invention
  • FIG. 1D is a schematic diagram of a method for processing handwritten input characters according to an embodiment of the present invention
  • FIG. 1E is a schematic diagram of a state in which a character is inserted in a method for processing handwritten input characters according to an embodiment of the present invention
  • FIG. 1F is a schematic diagram of an editing mode under a selection processing command in an embodiment of a method for processing handwritten input characters according to the present invention
  • FIG. 1G is a schematic diagram of a blank character in an embodiment of a method for processing handwritten input characters according to the present invention
  • FIG. 1H is a flowchart of text editing in an embodiment of a method for processing handwritten input characters according to the present invention
  • 1I is a flowchart of a handwriting program source code conversion method in an embodiment of a method for processing handwritten input characters provided by the present invention
  • FIG. 1J is a detailed flowchart of “standard code conversion for B” in the handwriting program source code conversion method shown in FIG. 1I;
  • FIG. 1K is a schematic diagram of a handwriting program in an embodiment of a method for processing handwritten input characters according to the present invention
  • 1L is a schematic structural diagram of an embodiment of a device for processing handwritten input characters according to the present invention.
  • FIG. 2A is a flowchart of a data splitting method according to an exemplary embodiment
  • 2B-1 is a flowchart of a data splitting method according to another exemplary embodiment
  • 2B-2 is a structural diagram of a system in which a data object of the data splitting method is audio data according to the present invention
  • 2B-3 is a time domain analysis diagram of data objects of the data splitting method according to the present invention.
  • 2B-4 is a diagram of a speech text coding table in which a data object of the data splitting method is audio data according to the present invention
  • 2B-5 is a schematic diagram showing a voice text of a data object of the data splitting method according to the present invention.
  • 2B-6 is another schematic diagram showing the voice text of the data object in the data splitting method according to the present invention.
  • 2B-7 is still another schematic diagram of a voice text of a data object of the data splitting method according to the present invention.
  • 2B-8 is still another schematic diagram of a voice text of a data object in which the data object is a data splitting method according to the present invention
  • 2C is a diagram showing the positional relationship of a data splitting method in a computer system hierarchy according to the present invention.
  • 2D is a flowchart of a data merging method according to an exemplary embodiment
  • 2E is a flowchart of a data merging method according to another exemplary embodiment
  • 2F is a schematic structural diagram of a data splitting apparatus according to an exemplary embodiment
  • 2G is a schematic structural diagram of a data splitting apparatus according to another exemplary embodiment
  • 2H is a schematic structural diagram of a data combining apparatus according to an exemplary embodiment
  • 2I is a schematic structural diagram of a data merging device according to another exemplary embodiment
  • 2J is an exemplary data splitting flowchart
  • 2K is another exemplary data splitting flowchart
  • 2L is an exemplary data merge flowchart
  • 2M is a schematic diagram of an exemplary data split description language definition
  • 2N is a flow chart of an exemplary data split description language visualization
  • Figure 2O is a diagram showing the relationship between concepts in the three concepts of the present invention.
  • FIG. 3 is a schematic diagram of a meta model in the prior art
  • FIG. 4 is a schematic structural diagram of an encoding system of the present invention.
  • FIG. 5C is a flowchart of Embodiment 1 of a coding processing method according to the present invention.
  • FIG. 5D is a flowchart of a specific implementation manner of step 102C in FIG. 5C;
  • Figure 7 is a schematic diagram of the core coding metamodel
  • 8 is a conceptual model of object coding, meta-encoding, instance coding (that is, object reference coding removes the meta-coded part), and data objects and coding meta-objects;
  • FIG. 9 is a diagram showing an example of meta-encoding in the embodiment.
  • Figure 10 is a diagram showing an example of a layer-by-layer correlation of a coded meta-object (variable-length coding of 16-bit word length);
  • FIG. 11 is a schematic diagram of a meta model corresponding to a code
  • Figure 12 is a schematic diagram of a conceptual model of the object encoding
  • FIG. 13 is a flowchart of Embodiment 2 of an encoding processing method according to the present invention.
  • FIG. 14 is a flowchart of Embodiment 3 of a coding processing method according to the present invention.
  • FIG. 15 is a schematic diagram of a glyph corresponding to a non-standard character encoding stored in an encoding warehouse in the handwriting input system of the embodiment;
  • 16 is a core conceptual diagram of an encoding metamodel of an exemplary context-dependent object encoding system
  • 17 is a schematic diagram of a basic object that can be applied to a basic coding space
  • 18 is a schematic diagram showing the coding structure of a 128 fixed length coding scheme
  • Figure 19 is a schematic diagram of four binary bits being four spatial bits
  • Figure 20 is a diagram showing an example of a coding scheme
  • 21 is a diagram showing an example of a coding scheme of UTF-8;
  • Figure 22 is a schematic diagram of object coding consisting of element coding and example coding
  • Figure 23 is a detailed view of the encoding
  • Figure 24 is a rendering result diagram
  • FIG. 25 is a schematic diagram of OTF-8 code points outside of UTF-8;
  • Figure 26 is a schematic diagram of the coding to be defined
  • FIG. 27 is a flowchart of Embodiment 4 of a coding processing method according to the present invention.
  • Figure 29 is a schematic diagram of coding combination
  • FIG. 30 is a flowchart of Embodiment 5 of a coding processing method according to the present invention.
  • Figure 31 is a handwriting input program
  • FIG. 33 is a flowchart of Embodiment 2 of a decoding processing method according to the present invention.
  • FIG. 34 is a flowchart of Embodiment 3 of a decoding processing method according to the present invention.
  • FIG. 35 is a flowchart of Embodiment 4 of a decoding processing method according to the present invention.
  • Figure 36 is the content of the handwritten input
  • Figure 37 is a schematic view showing the length of the character pitch
  • Figure 38 is a schematic diagram of a decoding process
  • Figure 39 is a diagram showing an example of a mixed encoded content display
  • Figure 40 is a schematic diagram of the contents of the output
  • Figure 41 is a schematic view of a selection stroke falling on the character output result
  • Figure 42 is a schematic diagram of adding a standard smiley face icon
  • Figure 43 is a schematic view of an online Go game
  • 44 is a schematic structural diagram of a first embodiment of an encoding processing system according to the present invention.
  • FIG. 45 is a schematic structural diagram of a first embodiment of a decoding processing system according to the present invention.
  • 46 is a schematic structural diagram of a word processing system mainly based on an object coding system
  • 47 is a schematic diagram of an architecture of an in-application deployment
  • 49 is a schematic structural diagram of a mobile external device deployment
  • Figure 50 is a schematic diagram of an architecture in which an application shares the same code repository
  • Figure 51 is a diagram showing an example of a network deployment of a code repository being a private cloud deployment or an internal server deployment;
  • Figure 52 is a schematic diagram of the architecture of a point-to-point deployment
  • Figure 53 is a schematic diagram of a hybrid deployment architecture
  • Figure 54 is an architectural diagram of an extended operating system to allow legacy applications to support object encoding
  • Figure 55 is a diagram showing the interaction of an object encoding system and an application system based on the present invention.
  • cloud storage systems and related applications have emerged.
  • the so-called cloud storage system refers to storing user data in a server in the cloud.
  • users can use different terminal devices to access data in the cloud storage at any time, eliminating the migration of data between different terminal systems.
  • users no longer need to upgrade storage devices themselves; cloud storage services provide sufficient scalability to handle a variety of storage needs.
  • Traditional data maintenance tasks, such as data backup and encryption are also transferred to cloud storage servers, which are often more professional and efficient.
  • some data usage patterns different from traditional applications also appear, such as data sharing and network collaboration.
  • a desktop agent is a cloud storage client that is based on a file system.
  • the desktop agent synchronizes a specific folder on the terminal with the cloud storage: files stored in the folder are automatically uploaded to the server by the agent, and other uploaded files received by the server are automatically downloaded through the agent into the corresponding folder.
  • files of the same user are automatically synchronized on different terminals. Users can seamlessly use the data in this folder across platforms in a traditional way.
  • the desktop agent can also automatically synchronize shared folders to different users' terminals, thus facilitating convenient data sharing and cooperation.
  • Dropbox is a typical desktop proxy.
  • Cloud storage systems bring convenient and efficient data access and sharing, but data stored in the cloud raises an unavoidable concern: the protection of security and privacy. The security of core data depends entirely on the cloud storage system, and for this reason many organizations and individuals refuse to put their data, or at least their critical data, into cloud storage systems. There are two main hidden dangers. One is that data in cloud storage is protected only by the user's identity authentication: once a user's identity is stolen, all of that user's cloud data is exposed to the thief.
  • the invention mainly relates to a data processing method, system, and application that effectively solve the above problems in the following aspects.
  • it involves innovations in three aspects: (1) a novel handwriting input method and system, in particular a method for splitting handwritten input characters; (2) an object-based open codec solution that can freely and openly encode or decode any data object with any encoding method; and (3) an object-based data splitting/merging method that separates the metadata and/or encoded data of a data object from the corresponding data content to guarantee the security of the data content.
  • These technical solutions can be implemented separately or in combination, and can also be combined with other technical fields.
  • the invention has broad application prospects and great application value.
  • the specific plan is as follows:
  • the invention provides a data object based encoding method, the method comprising:
  • the generating-object-encoding step in step c) includes: generating a meta-encoding and/or an instance encoding for the data object according to a predetermined rule, and generating the object encoding from the meta-encoding and/or the instance encoding.
  • a step of compressing and/or encrypting the data object is further included before step a), and after step c), a step of encrypting the generated object encoding is further included.
  • the meta-encoding comprises one of the following encodings, or a combination and/or nesting of two or more: spatial encoding, context encoding, and type encoding.
  • the method further includes a data splitting step of splitting a large data object into small data blocks (or data segments) according to a predetermined rule; steps a) to c) are performed on each split data block during or after the data splitting process until the encoding of all data blocks is completed.
  • the invention also provides a data object based decoding method, the method comprising:
  • the step of decoding the object in step b) comprises: disassembling the object encoding into a meta-encoding and/or an instance encoding according to the predetermined rule used at the time of encoding.
  • an authorization verification step for acquiring the predetermined encoding rule and/or the object encoding is further included.
  • the invention also provides a handwritten input character splitting method, the method comprising:
  • step c) is performed in one of the following cases: 1) during the input and writing of the current stroke; 2) after the input of the current stroke is completed (i.e., after the pen is lifted); or 3) after the current line is entered.
  • the current stroke is only compared with the strokes and/or characters within the predetermined range one by one.
  • step c) comprises:
  • if the currently entered stroke is spatially the first stroke in the row/column and is not associated with other characters (or strokes) already entered in the current row/column, or if it is spatially the last stroke in the row/column and is not associated with other characters (or strokes) already entered in the current row/column, a new character is created for the stroke; if the current stroke is neither spatially the first nor the last stroke in the row/column, the spacing between the current stroke and all characters it passes is compared, and the stroke is attributed to the associated character(s) (or stroke(s)).
  • a threshold (MIN_GAP) for the minimum distance between a stroke and a character, or between two strokes, is preset; the spacing between each stroke and the other characters or strokes already entered is compared with the threshold to determine the association between them.
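A minimal sketch of this MIN_GAP association rule, assuming horizontal row-based writing, bounding-box gaps, and a merge-into-first-related-character policy (the threshold value and data structures are illustrative assumptions, not the patent's specification):

```python
MIN_GAP = 12.0  # hypothetical threshold, in pixels

def bbox_x(points):
    """Horizontal extent of a stroke given as a list of (x, y) points."""
    xs = [p[0] for p in points]
    return min(xs), max(xs)

def horizontal_gap(stroke_a, stroke_b):
    """Gap along the writing direction between two strokes' bounding boxes."""
    a_min, a_max = bbox_x(stroke_a)
    b_min, b_max = bbox_x(stroke_b)
    if a_max < b_min:
        return b_min - a_max
    if b_max < a_min:
        return a_min - b_max
    return 0.0  # overlapping boxes: zero gap

def assign_stroke(stroke, characters):
    """Attach the stroke to a character closer than MIN_GAP, else start a new one."""
    related = [c for c in characters
               if any(horizontal_gap(stroke, s) < MIN_GAP for s in c)]
    if related:
        related[0].append(stroke)   # simplest policy: join the first related character
        return characters
    return characters + [[stroke]]  # no association: create a new character

chars = []
chars = assign_stroke([(0.0, 0.0), (10.0, 5.0)], chars)    # first stroke
chars = assign_stroke([(15.0, 0.0), (20.0, 5.0)], chars)   # gap 5 < MIN_GAP: same char
chars = assign_stroke([(50.0, 0.0), (60.0, 5.0)], chars)   # gap 30: new char
```

Because association is purely positional, no pause or "end of character" command is needed, which is the effect claimed for the first aspect.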
  • the method further includes: recording, when receiving each input stroke, the input time and the input position information of each stroke.
  • the input time includes a pen down time and a pen up time
  • the input position includes at least: the position when the pen is dropped, the position when the pen is lifted, and the coordinate position of each point in the stroke's handwriting.
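The recorded per-stroke information might be modeled as follows; the class name, field names, and types are assumptions for illustration only:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StrokeRecord:
    """Per-stroke input information recorded as the user writes (illustrative)."""
    pen_down_time: float   # timestamp when the pen touched down
    pen_up_time: float     # timestamp when the pen lifted
    points: List[Tuple[float, float]] = field(default_factory=list)  # handwriting trace

    @property
    def pen_down_position(self) -> Tuple[float, float]:
        return self.points[0]

    @property
    def pen_up_position(self) -> Tuple[float, float]:
        return self.points[-1]

s = StrokeRecord(0.00, 0.35, [(5.0, 12.0), (6.5, 14.0), (8.0, 15.5)])
```

Keeping the full point trace, rather than only a recognized standard character, is what lets the system preserve the writer's personal style.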
  • the invention also provides an object-based data object splitting method, the method comprising:
  • the data splitting/stripping protocol comprises at least one of the following, or a combination of two or more: 1) a data content splitting protocol, recording the method and process of splitting the data content; 2) a metadata stripping protocol, recording the method and process of separating the corresponding metadata from the data object; 3) if an encoding is generated during the data splitting process, an encoding separation protocol, recording the encoding rules and encoding processes between the corresponding encoding and the encoded object.
  • after step c), the method further comprises a step d): reassembling the split data segments.
  • At least a part of the metadata constitutes split metadata.
  • the invention also provides an object-based data object merging method, the method comprising:
  • the method further includes a storing step: the split/stripped data segments are stored separately in different storage bodies or under different secure channels.
  • FIG. 1A is a flowchart of an embodiment of a method for processing handwritten input characters according to the present invention.
  • the method for processing handwritten input characters provided by this embodiment is closer to people's natural writing habits than existing handwriting input systems, while completely and faithfully preserving the writer's writing style and features.
  • the method in this embodiment may include:
  • Step 101A: in the currently activated first target row/column, acquire a stroke input by the user and the corresponding input information, where the input information includes the input position of the stroke in the first target row/column.
  • the execution subject in this embodiment may be a conventional handwriting input device such as a touch screen, a handwriting screen, or another suitable handwriting device, or a device directly adapted to the handwriting system of this embodiment.
  • the present embodiment may employ a touch screen type handwriting input device, that is, an input device that can directly input information on the screen by handwriting or by means of a dedicated or non-dedicated writing tool.
  • the embodiment can be applied to any writing mode, and the writing mode can be set by the user or the default setting.
  • the writing manners described in this embodiment may include, but are not limited to: writing in rows (corresponding to the common horizontal format with left-to-right, top-down writing habits); writing in columns (corresponding to the vertical format with top-down, right-to-left writing habits); other user-defined writing formats, for example a right-to-left format set for Arabic writers; or a top-down, left-to-right format; and so on.
  • each stroke of the user and its input position can be recorded in chronological order.
  • the system automatically records each stroke and its input position on the panel; for example, the pixel position on the handwriting input screen can be used.
  • for the corresponding input position, other positioning algorithms or position-determining methods may also be employed, as long as the input position of each stroke can be uniquely determined.
  • a target row/column which can be used as a constraint range for the user's handwriting input, that is, when a row/column is activated, it becomes a target row/column.
  • the user can be prohibited from handwriting in areas outside the target row/column; alternatively, the user may be allowed to input at any position, with appropriate handling when an input stroke exceeds the boundary of the target row/column.
  • the method provided in this embodiment constrains input in units of rows (horizontal) or columns (vertical); that is, the current input is limited to a specific row or column, and no stroke or character spans lines or columns. Based on this row or column constraint, the input forms a stream of characters in input order.
  • the method provided by the embodiment is closer to the natural writing habits of the people, so that the writing experience of the user can be more natural and smooth.
  • the range of the target row/column may be displayed on the handwriting input screen, for example by highlighting the target row/column, or by displaying row/column lines or a grid pattern in a text or letter format, to indicate the location of the target row/column into which the user can currently input.
  • the currently activated first target row/column may be selected or created. Selecting or creating the currently activated first target row/column can take many forms, and the present embodiment gives the following two.
  • the location range of each row/column is determined, which may specifically include:
  • the row height/column width information is a default value or determined by the user input
  • the position range of each row refers to its relative top and bottom edge positions in the handwriting input screen, and the position range of each column refers to its relative left and right positions in the handwriting input screen.
  • the handwriting input screen can be divided into a plurality of rows/columns, and the range of positions of each row/column can be determined.
  • the strokes can be input based on the divided rows/columns.
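The row division described above can be sketched simply; the function name and the (top, bottom) pixel-range representation are illustrative assumptions, and a real implementation would also handle margins, scrolling, and user-set row heights:

```python
def divide_into_rows(screen_height, row_height):
    """Divide the handwriting screen into rows; each row is a (top, bottom) range."""
    rows = []
    top = 0
    while top < screen_height:
        # Clamp the last row to the screen edge.
        rows.append((top, min(top + row_height, screen_height)))
        top += row_height
    return rows

rows = divide_into_rows(screen_height=100, row_height=30)
```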
  • the target row/column can be selected by the user.
  • the target row/column selected by the user may specifically include:
  • a target row/column selection message input by the user is received, where the message includes an identifier of the target row/column to be input by the user;
  • a row/column corresponding to the identifier of the target row/column to be input by the user is used as the currently activated first target row/column.
  • the identifier of the target row/column to be input by the user may be any coordinate point clicked by the user, in which case the row/column where the coordinate point is located is the target; or the identifier may be a row/column number, for example the 10th row or the 10th column, in which case the row/column corresponding to that number is used as the currently activated first target row/column.
  • the user can select the target row/column through the connected input device. For example, when an external keyboard is used, the user can select a target row/column through the keyboard; when an external mouse is connected, the user can select a different target row/column by moving the mouse; or, when an external stylus is used, the target row/column can be selected by the pointing of the stylus before the pen contacts the handwriting input screen.
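Mapping a clicked coordinate to its row, as in the first selection mode above, can be sketched as follows (illustrative; assumes horizontal rows given as (top, bottom) pixel ranges):

```python
def select_target_row(rows, click_point):
    """Return the index of the row whose vertical range contains the click, or None."""
    x, y = click_point
    for index, (top, bottom) in enumerate(rows):
        if top <= y < bottom:
            return index
    return None  # click fell outside every row

rows = [(0, 30), (30, 60), (60, 90)]
```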
  • Target row/column selection, mode 2: activate a target row/column based on the characters previously entered by the user.
  • the method may specifically include:
  • the position range refers to the relative top and bottom edge positions of the first target row in the handwriting input screen, or the relative left and right positions of the first target column in the handwriting input screen.
  • an appropriate threshold can be set for the width of the first target row/column to meet the needs of a particular user.
  • the natural writing line of the writer may be habitually inclined to the right or to the lower right.
  • the boundary of at least one character that the user has input may be appropriately extended upward or downward by a distance.
  • the two methods of selecting the target row/column provided above are simple and fast; the second method can satisfy the user's personalized input and the handwritten text input in the graphic system.
  • Step 102A: for each stroke, create a new character for the stroke, or determine the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column.
  • This embodiment adopts a text division or division manner different from the prior art, that is, the attribution of the current input stroke is determined based on the correlation between each input stroke and other characters or strokes. Therefore, the method provided in this embodiment can save the user's tedious interaction process by inputting characters, thereby greatly simplifying the input operation.
  • a character here refers to an independent character object having a two-dimensional shape. This includes not only standard characters of ideographic scripts, such as single Chinese characters, or Japanese, Korean, Arabic, Vietnamese, or Burmese characters or parts thereof (for example, radicals), and standard words of phonetic scripts, such as English letters or German, French, Russian, or Spanish words; but also computer characters based on conventional standard codes, such as ASCII characters, Unicode characters, or strings; combinations of handwritten characters with standard characters and strings; and any graphic or image input by the user, such as a "heart" pattern, a photo, arbitrary graffiti, or any other written expression.
  • FIG. 1B is a schematic diagram 1 of a character in a method for processing handwritten input characters according to an embodiment of the present invention.
  • FIG. 1C is a second schematic diagram of a character in a method for processing handwritten input characters according to an embodiment of the present invention.
  • Five characters are shown in FIG. 1B, including "stroke characters", that is, handwritten characters input by the user (the first, third, and fourth characters), and "graphic characters", that is, arbitrary graphic or image information input by the user (the second and fifth characters).
  • other characters such as “standard characters” (any of the existing standard fonts), “combined characters” (mixed characters of various characters mixed together), and the like can be input in this embodiment.
  • "combined characters" can also directly include stylus strokes: when a handwritten stroke is written directly on a non-"stroke character", a "combined character" is formed. As shown in FIG. 1C, the word " ⁇ " is a combination of a standard character and stroke characters.
  • the strokes input in the first target row/column can be automatically divided according to the intrinsic convention of the set language (for example, based on the writing or typesetting manner of each language, etc.).
  • determining the character to which each stroke belongs is the process of splitting the input into characters (that is, the character-division operation). This splitting can be performed while inputting: as the user writes naturally, the system determines which character each stroke that has been input belongs to, achieving the effect of dividing characters while inputting.
  • one of the following methods may be selected: (1) from the moment the user puts pen down, judge the attribution of the stroke in real time from the dot matrix of the input stroke; (2) judge the attribution of each stroke after its input is completed (that is, when the pen is lifted); (3) after the input of one line is completed, or when it is determined that the user has paused input for a longer time, judge all previously entered strokes one by one, and attribute the strokes with the highest or strongest correlation to the same character.
  • if the stroke is the first stroke in the first target row/column, a new character can be created for the stroke; if the stroke is not the first stroke in the first target row/column, a new character may be created for the stroke, or the character to which the stroke belongs may be determined, according to the input position of the stroke in the first target row/column and the other characters in the first target row/column.
  • in the method for processing handwritten input characters provided in this embodiment, the stroke input by the user and the corresponding input information are acquired in the currently activated first target row/column, and a new character is created for the stroke, or the character to which the stroke belongs is determined, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column. This achieves the effect of dividing characters while inputting: the user does not need to distinguish different characters by means of explicit or implicit "start single character input" or "end single character input" commands, so the writing process is smooth and efficient. Moreover, the character to which a stroke belongs is determined directly from the input position of the stroke, without standard character recognition, so the personalized information, writing style, and features of the user's handwriting input are retained.
  • the present embodiment can make the handwriting input more natural and smooth, it is more convenient for the elderly and children who are unfamiliar with electronic input devices such as computers, mobile phones, tablet computers, laptop computers, notebooks, and iPads to use these devices.
  • the handwritten input character processing method in this embodiment adopts a pen/paper model.
  • the user can directly activate any line in the page for input.
  • the system can process empty lines between segments of handwritten input as empty paragraphs. For the user, there is only the command to change the input line; there is no concept of carriage return or line feed.
  • the line break function can be implemented in multiple manners. In this embodiment, the following four types are provided:
  • the second target row/column is the currently activated target row/column, and the second target row/column is the next row/column of the first target row/column.
  • the position of the line break can be determined by a preset interaction mode. For example, it may be stipulated in advance that, each time natural writing reaches the end of a line, the user confirms the end of the line by clicking a corresponding position or button at the right border of the input box or screen twice or three times in succession.
  • a command button can be set at the end of the first target row/column, and when the user clicks the command button, the next row/column is automatically activated for editing.
  • setting the second target row/column as the currently activated target row/column, to enable acquisition of the strokes input by the user in the second target row/column; the second target row/column is the next row/column after the first target row/column.
  • setting the first target row/column and the second target row/column as the currently activated target rows/columns simultaneously; the second target row/column is the next row/column after the first target row/column.
  • the user's stroke may span multiple rows/columns.
  • the row/column to which such a stroke belongs must be determined by a rule: it can be the row/column where the stroke's starting point is located, the row/column where its end point is located, or the row/column containing the largest proportion of the stroke.
  • this contradiction can also be alleviated by increasing the row/column spacing between adjacent two rows/columns.
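The three attribution rules just listed can be sketched directly. This is a minimal 1-D model under assumed names (`stroke_row`, `row_height`); the patent does not prescribe a data representation.

```python
from collections import Counter

# Sketch of the rules for assigning a stroke that spans two rows to one
# row: by its start point, by its end point, or by the row holding the
# largest share of its sampled points.

def row_of(point, row_height=40):
    """Row index containing a single (x, y) point."""
    return int(point[1] // row_height)

def stroke_row(points, rule="start", row_height=40):
    if rule == "start":
        return row_of(points[0], row_height)     # row of the pen-down point
    if rule == "end":
        return row_of(points[-1], row_height)    # row of the pen-up point
    # "majority": the row containing the largest proportion of the stroke
    counts = Counter(row_of(p, row_height) for p in points)
    return counts.most_common(1)[0][0]
```

Increasing `row_height` (the row spacing) makes cross-row strokes rarer, which is the mitigation the text mentions.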
  • the first target row/column and the second target row/column are simultaneously used as the currently activated target row/column, the first target row/column and the second target row/column are both Partial area activation;
  • the starting position of the active area of the second target row/column is set between the starting position and the end position of the active area of the first target row/column, so that the two active areas overlap.
  • the user decides whether or not to break the line by fully controlling the position of the handwriting panel representing the active area within the segment.
  • the handwriting panel itself has the feature of automatically breaking lines within the paragraph.
  • the system will move some or all of the handwriting panel to the next line, depending on its position within the paragraph and its relationship with the current line. As the position of the panel within the paragraph changes, the content presented in the handwriting panel changes accordingly. When the handwriting panel has been moved to the last line of the paragraph, re-triggering the panel's automatic line break actually starts a new paragraph.
  • FIG. 1D is a schematic diagram of a method for processing handwritten input characters according to an embodiment of the present invention, in which two adjacent rows are simultaneously activated.
  • the position in the box in the figure is the active area.
  • the active area is a logically continuous area spanning two adjacent rows/columns, and the user can only input within it. Since the active areas of the two adjacent rows/columns overlap, cross-row/column strokes are avoided.
  • the active area can also be switched to the full row/column range (the first target row/column or the second target row/column) according to the user's interaction.
  • if the currently activated target line is not the first line of the paragraph, the target line and the relevant area of the previous line may be activated simultaneously when the distance between the input position of the stroke in the target line and the start position of the line is less than a certain threshold; if the currently activated target line is not the last line of the paragraph, the target line and the relevant area of the next line can be activated simultaneously when the distance between the input position of the stroke and the end position of the line is less than a certain threshold. At the end of a paragraph, the user may need to issue a "line extension" command, which appends a blank line belonging to the paragraph, in order to enable the function of simultaneously activating two adjacent lines.
  • of the four modes above, the first and the fourth involve the user actively breaking the line, transferring the target row/column through interaction with the user, which is more accurate; the second and the third break lines automatically, requiring no additional interaction: as long as the user's writing roughly follows the rows or columns, the end position of each row/column can be recognized automatically without the user interactively confirming it, so that the entire handwriting input screen can be used like ordinary paper, greatly improving the user's input experience.
  • a line break means that the current paragraph has not ended, but since handwritten characters have reached the end of the line, the next line needs to be activated. The end of a paragraph means the paragraph is finished; when a paragraph end is judged, a blank line can be inserted after the current line and the line after that blank line activated as the first line of the next paragraph, so that the user can input there; alternatively, the next row/column can be activated directly as the first line of the next paragraph for input.
  • any one of the above-mentioned line break modes one, two, and three may be used to perform line break.
  • some interaction with the user is required.
  • paragraph extension command only makes sense on the last line of the paragraph or the last line inserted.
  • the current edit line and all other lines in the paragraph to which it belongs are given a distinct visual state to distinguish them from other paragraphs.
  • the new characters created from the acquired strokes, or the attribution results, are saved at preset intervals. The strokes input by the user and the corresponding input information may be saved in a first memory, and the divided characters in a second memory, which records, for each saved character, the strokes composing it.
  • the strokes and their input information and corresponding characters may all be stored in one memory, which is not limited in this embodiment.
  • any suitable storage method may be employed as long as it can effectively distinguish the characters to which each stroke belongs and each different character.
  • information such as input strokes and divided characters can be stored in a temporary storage location or space of the system (such as the system's RAM or flash memory) during input, and when the input of a target row/column ends, all of the divided character and stroke information in that target row/column is stored in a specified permanent storage location or space.
  • the input information corresponding to the stroke further includes one or a combination of the following: the input time of the stroke, the input strength of the stroke, and the input speed of the stroke.
  • the input time includes the pen-down time and pen-up time of the stroke, and the dwell time of each point along the stroke; the input position includes at least the position when the pen is put down, the position when the pen is lifted, and the coordinate position of each point along the stroke.
  • information such as the input time, strength, and speed of each stroke can be recorded as needed to further refine the input information. The strokes and their corresponding input time, strength, and speed can be stored in the form of a list in a separate stroke database.
  • since the present embodiment records and retains the detailed input information of each stroke in stroke order at the time of writing, it can completely record and retain all the writing habits associated with each user, such as stroke-order style, stroke style, word spacing, and other writing features, making applications such as handwriting identification straightforward.
  • this embodiment also shows great advantages for missing strokes. For example, suppose the user, when entering the character "I", forgets to input the " ⁇ " (dot) in the upper right corner and notices the missing stroke only after inputting other characters. The user can then, just as when writing on paper, add the " ⁇ " at the corresponding upper-right position of the original character. Although the input time of the " ⁇ " differs from that of the character's other strokes, it can be judged from the position information that the " ⁇ " belongs to the previously entered character "I".
  • since the present embodiment completely retains all the input information, including the input time, position, strength, speed, and word spacing of each stroke, it also provides wider scope for application services such as subsequent editing and other processing.
  • the step 102A of creating a new character for the stroke, or determining the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column, may specifically include:
  • the stroke is associated with at least one character
  • the stroke is attributed according to the associated at least one character.
  • the specified characters in this embodiment may be all the characters already present in the first target row/column; or the specified characters may be the characters in a to-be-compared area in the first target row/column, where the distance between the boundary position of the area to be compared and the stroke is less than a second preset threshold. Comparing the stroke only with characters within a certain surrounding range can effectively reduce the amount of calculation and improve the efficiency of stroke attribution determination.
  • Relevance judging manner 1: determine the relevance of the stroke to a character by judging whether the stroke coincides with the character. Specifically, the step 102A of creating a new character for the stroke, or determining the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column, may specifically include:
  • the stroke is associated with at least one character
  • the stroke is attributed according to the associated at least one character.
  • strokes that intersect each other can be used as strokes of the same character, and the strokes are assigned to the same character, which is simple and quick.
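The intersection test of relevance judging manner 1 can be sketched as follows. Approximating both strokes and characters by axis-aligned bounding boxes is an assumption of this sketch, as are the function names; the text only requires that coinciding (intersecting) strokes share a character.

```python
# Hedged sketch of "relevance manner 1": a stroke that coincides with an
# existing character is attributed to it; otherwise the caller creates a
# new character.

def bbox(points):
    """Axis-aligned bounding box (x0, y0, x1, y1) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def attribute_by_overlap(stroke, characters):
    """Return the index of the first character (a list of points) whose
    box the stroke intersects, or None if the stroke coincides with no
    character."""
    sb = bbox(stroke)
    for i, ch in enumerate(characters):
        if boxes_overlap(sb, bbox(ch)):
            return i
    return None
```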
  • Relevance judging manner 2: determine the relevance between the stroke and a character by calculating the distance between the stroke and the character's boundary.
  • Specifically, the step 102A of creating a new character for the stroke, or determining the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column, may specifically include:
  • the stroke is associated with at least one character
  • the stroke is attributed according to the associated at least one character.
  • the characters to which the strokes belong can be determined by comparison with a preset third preset threshold.
  • if the distance is less than the third preset threshold, the stroke may be considered to belong to the adjacent character; otherwise a new character may be created for the stroke.
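The threshold comparison of relevance judging manner 2 can be sketched as below. The gap metric and the default threshold value are illustrative; the text only specifies comparing a stroke-to-boundary distance against the third preset threshold.

```python
# Sketch of "relevance manner 2": the gap between a new stroke's bounding
# box and the nearest character's box is compared against a threshold;
# within it, the stroke joins that character, otherwise None is returned
# and the caller creates a new character.

def box_gap(a, b):
    """Separation between two boxes (x0, y0, x1, y1); 0 if they touch."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    dx = max(bx0 - ax1, ax0 - bx1, 0)
    dy = max(by0 - ay1, ay0 - by1, 0)
    return max(dx, dy)

def attribute_by_distance(stroke_box, char_boxes, threshold=5):
    """Index of the nearest character if it is within `threshold`."""
    gaps = [(box_gap(stroke_box, cb), i) for i, cb in enumerate(char_boxes)]
    if not gaps:
        return None
    gap, i = min(gaps)
    return i if gap <= threshold else None
```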
  • Relevance judging manner 3: determine the correlation between the stroke and a character by calculating the distance between the stroke and each stroke in the character.
  • Specifically, creating a new character for the stroke, or determining the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column, may specifically include:
  • the stroke is associated with at least one character
  • the stroke is attributed according to the associated at least one character.
  • the performing the attribution processing on the stroke according to the at least one associated character may include:
  • if there are at least two characters associated with the stroke, the at least two characters are merged and the stroke is attributed to the merged character.
  • when a stroke can be attributed to the characters on its left and right at the same time, this indicates that the stroke should be merged with those characters to form one glyph; for example, the positional relationship between the middle stroke of the character "tree" and the "wood" component on its left and the "inch" component on its right.
  • the preset threshold may not be set as long as the characters can be divided.
  • the association between the stroke and the character can be divided into strong and weak, and the attribution of the stroke is judged according to the strength of the association.
  • the performing the attribution processing on the stroke according to the at least one associated character may include:
  • if there are at least two characters most strongly associated with the stroke, the at least two characters are merged, and the stroke is attributed to the merged character.
  • obtaining the character most strongly associated with the stroke from the associated at least one character may include: sorting the at least one character associated with the stroke by distance in ascending order, and taking the character corresponding to the minimum distance as the character most strongly associated with the stroke; or,
  • by default, strokes in an upper-lower positional relationship can be attributed to the same character, so only the positional relationship between the stroke and the adjacent left and right characters needs to be judged; or, by default, strokes in a left-right positional relationship can be attributed to the same character, so only the positional relationship between the stroke and the adjacent upper and lower characters needs to be judged.
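The strongest-association selection and the merge rule above can be sketched together. The tolerance for treating two distances as "equally strongest" and the `merge` callback are assumptions of this sketch; the text only specifies sorting by distance and merging when at least two characters tie for the strongest association.

```python
# Sketch of selecting the most strongly associated character(s): distances
# are sorted ascending; the minimum-distance character is the strongest.
# If the stroke is (near-)equally close to two characters -- e.g. a stroke
# bridging a left and a right component -- both are merged and the stroke
# joins the merged character.

def strongest_characters(distances, tolerance=1.0):
    """distances: list of (distance, char_id). Returns the ids of all
    characters within `tolerance` of the minimum distance."""
    ranked = sorted(distances)
    best = ranked[0][0]
    return [cid for d, cid in ranked if d - best <= tolerance]

def attribute_strongest(distances, merge, tolerance=1.0):
    """Attribute a stroke: merge tied characters, else pick the nearest."""
    ids = strongest_characters(distances, tolerance)
    if len(ids) >= 2:
        return merge(ids)   # the merged character receives the stroke
    return ids[0]
```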
  • the methods described in the above manners may also be used in combination; for example, some strokes are judged by relevance judging manner 1, some by relevance judging manner 2, and the remaining strokes by relevance judging manner 3.
  • for example, relevance judging manner 1 may first be used to determine whether the stroke is associated with the other characters already entered in the first target row/column; if not, a new character is created for the stroke. If the current stroke is neither the first stroke nor the last stroke in the first target row/column, the distance between the currently input stroke and all previously input characters or strokes may be compared according to relevance judging manner 2 or manner 3, and the currently entered stroke attributed to the associated character or characters based on the result of the comparison.
  • the first preset threshold, the second preset threshold, the third preset threshold, and the fourth preset threshold may be determined by the user according to their own writing habits, and may also adopt a system default value.
  • the system can also provide visual information to assist automatic segmentation, such as grid-based character segmentation: the character to which the current input stroke should belong is determined based on the correlation between the current input stroke and the corresponding composition grid in the current input line.
  • composition grids can also be used to determine the attribution of the stroke. Specifically, before acquiring the stroke input by the user and the corresponding input information in step 101A, the first target row/column may be divided into multiple composition grids.
  • the input position in the first target row/column according to the stroke in step 102A, or the input position of the stroke in the first target row/column and the first target The character specified in the row/column, creating a new character for the stroke or determining the character to which the stroke belongs, including:
  • if a character already exists in the composition grid, the stroke is attributed to that existing character; otherwise, a new character is created in the composition grid, and the stroke is attributed to the new character.
  • if the stroke spans a single composition grid that contains no character, a new character is created for the stroke and the new character belongs to that composition grid; if the stroke spans at least two composition grids, it is determined whether a character exists in those grids, and if none of the at least two composition grids contains a character, a new character is created for the stroke, belonging to the at least two composition grids.
  • if exactly one of the at least two composition grids contains a character, the stroke is attributed to that character; if multiple characters exist across the at least two composition grids, the characters in those composition grids are merged and the stroke is attributed to the merged character.
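The composition-grid attribution just described can be sketched as below. The data shapes (a per-grid character map, grids indexed by x-coordinate) and all names are assumptions of this sketch, not fixed by the text.

```python
# Sketch of composition-grid attribution: a stroke inside one grid joins
# that grid's existing character or starts a new one; a stroke spanning
# several grids joins the single existing character, or merges multiple
# existing characters.

def grids_of(stroke, grid_width=40):
    """Sorted grid indexes (by x-coordinate) the stroke's points fall in."""
    return sorted({int(x // grid_width) for x, _ in stroke})

def attribute_in_grids(stroke, chars_by_grid, grid_width=40):
    """chars_by_grid: dict mapping grid index -> char_id (absent = empty).
    Returns ("new", grids), ("join", char_id), or ("merge", char_ids)."""
    grids = grids_of(stroke, grid_width)
    occupied = [g for g in grids if chars_by_grid.get(g) is not None]
    if not occupied:
        return ("new", grids)           # new character spanning `grids`
    ids = {chars_by_grid[g] for g in occupied}
    if len(ids) == 1:
        return ("join", ids.pop())      # join the single existing character
    return ("merge", sorted(ids))       # merge; stroke joins merged character
```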
  • each input character in this embodiment is divided and stored as a glyph object (a non-standard, that is, handwritten, character); in other words, each segmented input character is treated as a non-standard glyph object. On the other hand, if the handwritten content is ultimately only used for human reading (with emphasis on retaining the original input form), division errors need not be corrected.
  • This embodiment provides a corrective method, which specifically includes:
  • the correction request including a character to be corrected, or a character to be corrected and a stroke to be corrected;
  • the specific content of the correction request may be different according to different scenarios.
  • the following scenarios are provided:
  • Scenario 1: merging two characters into one, that is, the correction request is a merge correction request, and the characters to be corrected are at least two characters to be merged;
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • Scenario 2: splitting one character into a plurality of characters, that is, the correction request is a split correction request, and the character to be corrected is the character to be split;
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • Scenario 3: changing a stroke attributed to one character so that it belongs to another character, that is, the correction request is an attribution correction request, the character to be corrected is the character to receive the stroke, and the stroke to be corrected is at least one stroke to be corrected;
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • the at least one stroke to be corrected is attributed to the character to receive it.
  • characters that have been split incorrectly can thus be re-split by interacting with the user, thereby improving the accuracy of character splitting.
  • each character may be a single word or a combination of multiple words.
  • since the method provided by this embodiment also records the stroke order (based on time) of each stroke written by the user and the shape feature of the corresponding stroke, it is easy to find characters with the same or similar stroke order and stroke shape features from this information; characters meeting suitable threshold conditions can be treated as the same character. This makes matching, finding, and searching characters straightforward, and even the characters entered by the user can be searched.
  • the functions of finding and inserting can also be added.
  • the search function may specifically include the following steps:
  • the characters to be searched are compared with the locally saved characters according to the number of strokes of the character to be searched and the stroke feature, and characters matching the characters to be searched are obtained.
  • the split handwritten character characters can be obtained.
  • handwritten text search based on pattern matching can be performed: each character in the search source is matched one by one against the character to be found, and matching characters are found by matching the number of strokes and the stroke order.
  • the strokes of the character to be searched are matched one-to-one with the strokes of locally saved characters, that is, the curves are matched; if they are inconsistent, the final matching result is a failure, and if they are consistent, the final matching result is a success.
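The search procedure above can be sketched as follows. Reducing the per-stroke curve comparison to a pointwise tolerance check on equal-length point sequences is an assumption of this sketch; the text only requires filtering by stroke count and then matching strokes one-to-one in stroke order.

```python
# Sketch of handwritten-character search: candidates are first filtered
# by stroke count, then strokes are compared one-to-one in stroke order;
# every stroke pair must match for the character to match.

def strokes_match(a, b, tol=3):
    """Crude curve match: same point count, each point within `tol`."""
    if len(a) != len(b):
        return False
    return all(abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol
               for p, q in zip(a, b))

def find_character(query, saved):
    """query: list of strokes (each a list of points);
    saved: dict name -> list of strokes. Returns matching names."""
    hits = []
    for name, strokes in saved.items():
        if len(strokes) != len(query):
            continue                      # stroke-count filter
        if all(strokes_match(s, q) for s, q in zip(strokes, query)):
            hits.append(name)             # every stroke pair matched
    return hits
```

A replace function would follow the same matching step and then substitute the stored strokes of each hit, as the text notes.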
  • any character analysis or other matching method in the prior art can be used to implement the character search function, which is not limited in this embodiment.
  • the function of replacing characters can also be implemented based on the same principle as the search function, and will not be described here.
  • the insertion function of the handwritten text input editing may specifically include the following steps:
  • the insertion request including a target row/column to be inserted, a to-be-inserted position in the target row/column to be inserted, and a character to be inserted;
  • when the user needs to add a character at a position in a line that has become inactive, for example between the 3rd and 4th characters of a line, the user first activates that line, and the system provides an auxiliary interface at the character gaps of the line that accepts user input; the user activates the auxiliary interface between the 3rd and 4th characters of the line and performs the insertion operation at that character interval.
  • FIG. 1E is a schematic diagram of a state in which a character is inserted in an embodiment of a method for processing handwritten input characters according to the present invention.
  • the existing characters after the insertion position can be moved to the next line, leaving the space from the insertion position to the end of the current line available for writing. The insertion line is marked with a right arrow, and clicking the right arrow exits the insertion state. Before the insertion is complete, the user can only input between the two insertion markers.
  • inserts can be nested, that is, an insertion can be made inside an insertion. Insert rows have a different visual state than normal rows to help users keep track of the current editing state.
  • the selection processing command includes any one or a combination of the following: performing copy processing on the at least one character, performing cut processing on the at least one character, and performing replacement processing on the at least one character And performing a merge process on the at least one character.
  • FIG. 1F is a schematic diagram of an editing mode under a selection processing command in an embodiment of a method for processing handwritten input characters according to the present invention. As shown in FIG. 1F, functions such as inserting, pasting, selecting all, selecting, and merging can be displayed on the handwriting input screen to facilitate the user to perform corresponding operations.
  • this embodiment may also support inserting or adding strokes or comments on input characters, or deleting certain characters, and the like.
  • the searching, inserting, and copying functions provided in this embodiment effectively avoid the drawbacks of existing handwriting input systems, which are less intuitive and difficult to modify.
  • the number of the first target rows/columns is plural;
  • the active areas corresponding to the plurality of the first target rows/columns do not overlap and are not in contact with each other.
  • multiple users can input in the active areas corresponding to the plurality of first target rows/columns, respectively, satisfying the function that the large-size handwriting input screen allows multiple people to simultaneously input.
  • this embodiment is compatible with existing input devices such as keyboards and mice, and hybrid input is implemented by mode switching.
  • the mode switching method in this embodiment may specifically include:
  • the handwriting mode is switched to the target mode, and in the target mode, at least one standard character input by the user is received.
  • the target mode may be a keyboard input mode, a mouse input mode, or other existing input modes.
• mixed typesetting can be implemented by using an existing keyboard to add standard code characters or insert other symbols or information within the input limits of a row or column (see the handwritten text mixing example in the present application).
  • keyboards can be activated by means of appropriate touch buttons or operations (eg, clicks) to allow the user to freely switch between handwriting input and other conventional input devices such as a keyboard.
  • a division form of a standard code may be used, or a division manner of characters in the present invention may be used.
  • the active area can also automatically move with the user's input. For example, the active area is always repositioned with the position of the user's last stroke as the midpoint of the active area. In this way, in most cases, the active area will automatically move as the user writes, so that the location of the active area does not need to be manually set.
  • the system will have a flashing cursor to indicate the current input position.
  • the system displays the active area to indicate the range that can be currently input.
• the two can be converted to each other according to certain rules. For example, when switching from standard character input to handwriting input, the system sets the position of the active area with the cursor position as the midpoint of the active area; when switching from handwriting input to standard character input, the character position closest to the midpoint of the active area is set as the current input position.
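The two conversion rules just described can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; the extent model (a character as a `(left, width)` pair along the layout direction) and all function names are assumptions.

```python
# Hypothetical sketch of the mode-switching position rules: characters are
# modeled as (left, width) extents along the layout direction.

def active_area_from_cursor(cursor_x, area_width):
    """Standard input -> handwriting: center the active area on the cursor."""
    half = area_width / 2.0
    return (cursor_x - half, cursor_x + half)

def cursor_from_active_area(area, char_extents):
    """Handwriting -> standard input: pick the character position closest
    to the midpoint of the active area as the new input position."""
    midpoint = (area[0] + area[1]) / 2.0
    # A character "position" here is taken to be its left edge.
    return min((left for left, _w in char_extents),
               key=lambda left: abs(left - midpoint))
```

For example, a cursor at 100 with an area width of 40 yields the active area (80, 120); switching back, the character whose left edge is nearest the midpoint 100 becomes the input position.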
  • Control characters exist in the standard code (such as ASCII code) character set.
• control characters may be standard control characters, such as spaces, tabs, and line breaks; or non-standard control characters, such as the blank-spacing characters described herein.
  • standard control characters are similar to the prior art.
  • this embodiment additionally provides the function of blank characters.
• the blank spacing information between characters can be preserved, for example, the size of the space between the left and right characters in horizontal format, or between the upper and lower characters in vertical format; the blank spacing can be created directly as a whitespace character carrying the blank spacing information.
• the horizontal baseline of the target line where the character is located may be taken as the horizontal baseline of the character, and the position of the leftmost part of the character (such as a graphic, image, stroke, etc.) is set as the starting position of the character.
• each part in the character records its position relative to this baseline and starting position, with the typesetting direction as the positive direction. In this way, the same character content can appear at different positions in the text.
• as long as the corresponding character origin coordinates are correctly calculated according to the line of the character and the position of the character in the line, all the internal parts can be drawn correctly.
• for other typesetting directions, the starting position of each character can be set in a similar manner, and the positions of the character's internal parts use coordinates relative to the starting position.
  • FIG. 1G is a schematic diagram of a blank character in an embodiment of a method for processing handwritten input characters according to the present invention.
  • a custom space character is introduced, and the word spacing is saved as a parameter/content.
• the numbers 12, 16, and 10 in FIG. 1G are the numeric values of the blank characters, indicating the length information of each blank character. During analysis and processing (such as recognition, skipping, etc.) they can be treated differently. Similarly, time-based whitespace characters can be added to the text of voice input.
  • the maximum coordinate of the character entered by the user along the layout direction is the width of the character.
• the character width may either be stored or recovered from the position information of all the internal parts of the character.
• when formatting text, as long as the width information of all characters (including control characters) is obtained, the starting position of every character in its row/column can be restored, providing a basis for further text rendering.
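The width model above can be sketched in a few lines. This is an illustrative assumption, not the patent's code: a character's width is taken as the maximum coordinate of its internal parts along the layout direction, and a row's starting positions are restored from widths alone, with blank characters contributing their stored spacing as width.

```python
# Illustrative sketch of the width model: internal parts in
# character-internal coordinates, row positions restored from widths.

def char_width(parts):
    """parts: list of (x, part_width) pairs, x measured from the
    character's starting position along the layout direction."""
    return max(x + w for x, w in parts) if parts else 0

def restore_row_positions(widths):
    """Given the widths of all characters in a row (blank characters
    simply contribute their spacing value), return each starting position."""
    positions, cursor = [], 0
    for w in widths:
        positions.append(cursor)
        cursor += w
    return positions
```

For example, the widths 12, 16, and 10 from FIG. 1G, preceded by a 9-unit character, place the characters at positions 0, 9, 21, and 37.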
• control characters and blank characters are introduced. These control characters have models, codes, glyphs, and meanings similar to those of the characters handwritten by the user. Therefore, the theory, methods, and tools for processing handwritten input characters can be applied directly or indirectly to control characters. Further, the characters handwritten by the user and the control characters can be mixed and processed together; on this basis, the splitting of characters becomes even more significant.
  • the object processed in this embodiment may be a stroke character, a standard character, a graphic character, a combined character or a control character input by the user, or may be a mixture of a plurality of characters.
  • FIG. 1H is a flow chart of text editing in an embodiment of a method for processing handwritten input characters according to the present invention. As shown in FIG. 1H, the text editing in this embodiment may specifically include the following steps:
  • Step 601A Determine the open mode: if the existing document is opened, step 602A is performed; if the new document is created, step 603A is performed.
  • This embodiment is mainly used to provide personalized handwritten character input for related documents, and there are mainly two ways of entering the handwriting input system: a method with document data and a method without document data.
  • the former is to open an existing document, and the latter is to create a new document.
  • Step 602A loading document data and performing typesetting according to the typesetting constraint, and executing step 604A.
• the related data of the characters may be loaded hierarchically. For example, when formatting a character, all that is required is the width of the character (or the height, for column-based layout), so in this step only the width information of the character needs to be loaded. Other information, such as stroke drawing information or contour information, can be loaded on demand later, which saves system resources (memory, network traffic, etc.). Then step 604A is performed.
  • Step 603A initializing the handwritten document, and executing step 604A.
• Step 604A Initialize (empty) the sequence of handwritten text objects representing the character input line, referred to below as the AL (Active Line).
  • Step 605A presenting the document content, and performing step 606A.
  • the presented content includes multiple parts: visual information of the document itself (including visual information of handwritten characters, such as the position and shape of characters), visual information of the document presentation environment (such as background, shading, paper border, etc.), Visual information related to document editing (such as selected area, cursor or active area indicating input focus, auxiliary lines, etc.). It is mentioned in step 602A that the visualized data of the handwritten characters must be loaded when it needs to be presented. For characters that do not need to be rendered, their corresponding visualization data may not be loaded.
• after the character stream is loaded from the storage area into memory, it needs to be typeset before being displayed.
  • the typesetting here refers to line breaks.
• the line can be broken at a paragraph mark/newline (hard return); the position of each character is calculated in each row/column, and the total length of the input text content is accumulated. The line breaks when the position exceeds the maximum position of the line (soft return), and the break is made at the last breakable position.
• a line can be broken after punctuation (punctuation cannot be the first character after a soft return);
• a line can be broken at blank spacing, with the first character of the next line being the following non-whitespace character (a whitespace character cannot be the first character after a soft return);
  • East Asian characters can be directly broken before and after;
  • Handwritten characters can be broken directly before and after.
• whitespace characters can be converted to blank spacing with a standard length. Consecutive blank spacings can be merged directly, which greatly simplifies the typesetting algorithm. Blank spacing is handled in the same way as whitespace characters.
  • the document model after typesetting includes information for each display line.
• each line includes positioned words (including characters, East Asian characters, and handwritten characters). Blank characters do not need to appear in this model; the relevant information is implicit in the position attributes of the words (left border, right border = left border + width). Therefore, blank characters (including blanks arising from handwriting spacing, standard spaces, tab characters, etc.) can be discarded after typesetting.
• the text in each line changes with the user's input. Input and erased strokes may change the spacing of characters or generate new characters. As long as the character coordinates are correct, the spacing will be generated correctly. Only when the edited content needs to be stored is it necessary to calculate and generate whitespace characters and insert them at the appropriate locations.
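The line-breaking rules above can be sketched as a greedy pass. This is a minimal illustration under stated assumptions, not the patent's algorithm: tokens are `(kind, width)` pairs, every `'word'` token is treated as directly breakable (as for East Asian and handwritten characters), consecutive blanks are collapsed by dropping later duplicates, and a blank may not start or end a soft-broken line.

```python
# Minimal greedy line-breaking sketch: hard returns, soft returns at the
# maximum line width, and blank merging/dropping at soft breaks.

def break_lines(tokens, max_width):
    """tokens: list of (kind, width), kind in {'word', 'blank', 'hard'}."""
    lines, line, used = [], [], 0
    for kind, width in tokens:
        if kind == 'hard':                       # paragraph mark: hard return
            lines.append(line); line, used = [], 0
            continue
        if kind == 'blank' and line and line[-1][0] == 'blank':
            continue                             # collapse consecutive blanks
        if used + width > max_width and line:    # soft return
            while line and line[-1][0] == 'blank':
                line.pop()                       # blank cannot end the line
            lines.append(line); line, used = [], 0
            if kind == 'blank':
                continue                         # blank cannot start the line
        line.append((kind, width)); used += width
    if line:
        lines.append(line)
    return lines
```

For example, with a line width of 10, a 5-wide word, a blank, and a 6-wide word break so that the second word starts the next line and the trailing blank is discarded.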
  • Step 606A receiving the command, and performing different operations according to the command.
  • the commands here can be commands entered by the user, or they can be system commands or commands passed by other application systems.
• if the command is a text encoding typesetting command, step 607A is performed; if the command is a start handwriting input command, step 608A is performed; if the command is an end handwriting input command, step 610A is performed; if the command is a system exit command, step 612A is performed.
  • Step 607A Typesetting the text content according to the command.
  • the typesetting constraint and the typesetting direction can also be stored in the information of each character.
  • the internal relative position of all the characters in the current typesetting mode can be adjusted according to this information, thereby correctly drawing the character.
  • the horizontally typed characters are stepped according to the width (that is, the line length is accumulated from left to right according to the typesetting direction), and the vertically typed characters are stepped according to the height. Therefore, in the specific implementation, it is necessary to distinguish between horizontal characters and vertical characters.
• for horizontally typeset characters, an internal coordinate system with the line baseline (alignment line) as the horizontal axis and the leftmost stroke point as the vertical axis may be used.
• for vertically typeset characters, the column axis may serve as one axis and the highest stroke point determine the other, forming the character's internal coordinate system. In this way, different characters keep their original alignment state in the corresponding layout drawing.
• when the typesetting direction changes, the system can automatically perform coordinate conversion. Although the original alignment between characters cannot be preserved, each character can still be rendered normally.
• another example is changing a free text layout into ordinary typesetting.
• the layout type can be marked in the character's type attribute, and the internal coordinate system of each character can then take the lower left corner of the corresponding composition as its origin (in fact, any point can be used, such as the center point).
• in this way, each character is aligned with its corresponding composition.
• the handwritten text of such a free layout has no interval/space characters, whereas ordinary typesetting does.
• when changing such a layout into ordinary typesetting, each character can be recalculated, its coordinate system replaced (for example, with the system above that takes the baseline and the leftmost intersection as origin), and the corresponding interval characters inserted between the characters according to the new coordinate system.
  • Step 608A activate the target row/column, and perform step 609A.
• the target row/column can be activated, the text objects in the target row/column are activated (their stroke information loaded), and the object sequence is assigned to the AL.
  • the input of the handwritten characters is performed under the constraint of the row/column. Even if the input spans multiple rows/columns, the corresponding characters must eventually be stored in a specific location on a particular row. Therefore, the target row/column of character input can be presented in a visual manner, and the user can also avoid cross-line input through specific settings, such as auxiliary panel, full-screen line editing, and the like.
  • Step 609A Perform handwriting input under the constraint of the activated target row/column, and return to step 605A.
  • handwriting input can be performed under the constraint of the activated target row/column, and each stroke input is automatically combined with the AL according to a certain rule to form a new sequence of handwritten characters (ie, the AL is updated).
• the input process of handwritten characters mainly combines the input strokes automatically into different characters according to the spatial constraints in the row/column.
• the word spacing effect can be realized by a word spacing constraint or a text constraint.
• Step 610A Store the content of the character objects in the AL, and execute step 611A.
• the content of the character objects in the AL is stored, and if necessary, the text content related to the AL can be re-typeset.
• at this point, the character objects in the AL are determined (previously they changed dynamically with stroke input). Some of these character objects have not changed, the content (strokes) of some has changed, and some are brand-new characters; both changed and brand-new characters are treated as new characters.
• the sequence of characters corresponding to the final AL needs to be updated to its corresponding position in the document. If the storage method of encoding and content splitting is used, the content of each new character first needs to be stored in the encoding library to obtain the corresponding encoding; the new encoding sequence is then saved to the appropriate location in the document (typically the in-memory document model).
  • step 611A the AL is cleared, and the process returns to step 605A.
  • Step 612A the end.
• the method for processing handwritten input characters provided by this embodiment makes it convenient for the user to edit and process handwritten characters, thereby further improving the user's input experience.
• corresponding row and column scrolling rulers can be provided to expand the input range of the panel up, down, left, or right, that is, the input range space of the rows and columns. Also, when a ruler is moved, the corresponding target row/column can be displayed and/or activated accordingly.
  • the function of encoding can also be added in this embodiment.
  • the coding function in this embodiment may include:
• querying the mapping table in the encoding warehouse to obtain the standard language parameters corresponding to the glyphs.
• the standard language parameters include one or a combination of the following: numbers, symbols, keywords, public identifiers, and private identifiers.
  • This embodiment can implement the function of encoding characters generated during the handwriting input process, which will be described in detail below.
• a character can refer to a handwritten character of an ideographic script, such as a single Chinese, Japanese, Korean, Arabic, Vietnamese, or Burmese character, or a part thereof (such as a radical); or a handwritten word of a phonetic script, such as letters or words in English, German, French, Russian, Spanish, etc.; it can also be a computer character based on a traditional standard code, such as an ASCII character, a Unicode character or string, or even a control character such as a space, tab, or line break; it can also refer to a non-standard control character, such as the spacing between handwritten characters described herein; it can also be a mixture of handwritten characters and standard and/or synthesized characters or strings; it can even be any graphic or image input by the user, such as a "heart" pattern, a photo, arbitrary graffiti, or any other written expression.
• all character objects input in the above manner will be recognized and processed as characters in a unified manner.
• the glyphs referred to in the present invention are similar to the concept of characters in a standard font, except that the present invention generates non-standard glyphs. Since the object of the present invention is not to generate a standard font, the glyphs generated by the system of the present invention are likely to include erroneous splittings of characters or words, or mergings between them, and may also include any graphics or images input by the user.
• modern high-level programming languages can be divided into two types: compiled and interpreted.
• the former converts the source code through a series of compilation and transformation steps, generating a binary file that encapsulates the instruction sequence of the target machine (which can be a virtual machine).
  • Binary files need to be loaded into the target system for execution.
• interpreted execution refers to an interpreter running in the target system, which reads the source code and runs it directly through a series of internal processing steps.
• typical scripting languages include JavaScript, Lua, Tcl, and so on.
• many traditional programming languages are compiled languages, such as C, C++, Objective-C, Java, C#, Go, Swift, and so on.
• whether for a compiler or an interpreter, the core components that process program source code have very similar, even identical, front ends.
  • the so-called front end refers to the conversion of source code into an internal intermediate form.
  • the backend refers to converting the intermediate form into machine code, and for the interpreter, the intermediate form is executed by the execution engine.
• there is also processing and optimization of the intermediate form, which is called the middle end.
• the focus of this article is the front-end part, so in general we do not distinguish between compilation and interpretation.
• the front end here is collectively referred to as the compilation front end.
  • the compilation front end can generally include four processes: lexical scanning, parsing, semantic analysis, and intermediate code generation.
• the lexical scanner converts the source code into a token stream; the parser converts the token stream into an abstract syntax tree; semantic analysis annotates the abstract syntax tree with semantic tags; and the intermediate code generator converts the annotated abstract syntax tree into the compiler's intermediate form.
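These front-end stages can be illustrated with a toy expression language. This sketch is an illustration only (not the patent's code): it handles flat `+`/`*` expressions with left association and no precedence, semantic analysis is omitted for brevity, and the "intermediate form" is a simple postfix instruction list.

```python
# Toy front-end pipeline: scan -> parse -> intermediate code generation.
import re

def scan(src):
    """Lexical scanning: source text -> token stream."""
    return re.findall(r'\d+|[+*]', src)

def parse(tokens):
    """Parsing: token stream -> nested-tuple syntax tree, left-associative."""
    tree = tokens[0]
    i = 1
    while i < len(tokens):
        tree = (tokens[i], tree, tokens[i + 1])  # (operator, left, right)
        i += 2
    return tree

def gen(tree):
    """Intermediate code generation: tree -> postfix instruction list."""
    if isinstance(tree, str):
        return [('push', int(tree))]
    op, left, right = tree
    return gen(left) + gen(right) + [('op', op)]
```

For example, `scan("1+2")` yields the tokens `1`, `+`, `2`, which parse to the tree `('+', '1', '2')` and generate a push-push-add instruction sequence.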
  • the handwritten text system brings a new way of text input, which is safe and convenient.
• the input and editing results are still character streams, encoded not in a standard code but in the individual code of the person who input them.
• the solution is quite straightforward: convert the person-specific proprietary encoding into standard encoding, that is, convert the handwritten source code into source code that a normal compilation front end can recognize. Therefore, a conversion process is added before the traditional compilation front end to process the handwritten source code; the entire process can then generally include five stages: handwritten source code conversion, lexical scanning, syntax analysis, semantic analysis, and intermediate code generation.
• this encoding conversion process mainly converts and matches the handwritten source code according to established rules and generates the corresponding standard code content, decoupled from the glyphs in the font library.
• the process is mainly divided into two parts: control-character conversion and glyph conversion.
• control characters in programming languages mainly include spaces, tabs, carriage returns, line feeds, and so on. Since handwritten text can use the same or similar control characters as normal text, this conversion is very straightforward. For example, a handwritten spacing code is directly converted into a standard blank character, while a handwritten line break that already uses the standard line feed code can be retained without conversion.
• the glyph conversion mainly converts the personalized glyph codes in the handwritten source code into the corresponding standard codes.
  • the basis of this conversion is the glyphs in the corresponding text font library.
• the glyph matching services of the handwritten text system are needed, including digital symbol mapping, keyword mapping, interface identifier mapping, and private identifier generation and mapping.
• digital symbol mapping performs glyph search and matching in the handwritten source code based on the user-defined glyph digital symbol mapping table, replacing matches with the corresponding standard code numbers and symbols.
  • the symbols referred to herein refer to punctuation marks used in programming languages, such as addition, subtraction, multiplication and division, greater than, equal to, less than symbols, various brackets, and the like.
  • this glyph digital symbol mapping table is the key to digital symbol mapping.
  • This table is a personalized setting.
• everyone's writing habits, strokes, and glyphs differ, so it only makes sense to search for and match glyphs of the same person. Therefore, each programmer has their own glyph digital symbol mapping table, which can only map handwritten source code written by that programmer.
• programmers need to authorize specific users/accounts to share their glyph digital symbol mapping tables before their handwritten source code can be compiled/run by others. In fact, this is an extension of the security of handwritten text into software development and execution.
  • the glyph digital symbol mapping table can be a many-to-one mapping. In other words, multiple glyphs can correspond to the same number and symbol.
• the glyph digital symbol mapping table of a specific user for a specific programming language should in principle only be appended to, not deleted from or modified. Moreover, its contents cannot conflict with each other; for example, the same glyph is not allowed to correspond to different numbers or symbols.
  • numbers and symbol characters in standard codes are not composed of characters in the alphabet. Therefore, when compiling a front-end lexical scan, the symbol characters are often specially processed, and one symbol can directly terminate the previous lexical mark; the identifier often cannot start with a numeric character. Similarly, we also need a special convention for the opponent to write, in order to facilitate processing. For example, it can be agreed that numbers and symbols can only correspond to independent glyphs, and cannot correspond to combinations of multiple glyphs.
  • the glyph digital symbol mapping table is generally predefined by the user.
  • the keyword mapping is also based on the mapping of the glyphs of the mapping table to the standard code.
• this mapping table is the glyph keyword mapping table, a personal, many-to-one table.
• keywords are also crucial for the recognition and parsing of programming languages, since they determine the location and number of related syntax elements. Therefore, the content of the glyph keyword mapping table is generally predefined by the user, and can also be filled in interactively during handwritten source conversion.
  • keyword mapping allows one keyword to correspond to a combination of multiple glyphs, that is, different combinations of the same glyphs can correspond to different keywords.
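The symbol and keyword mapping rules above can be sketched together. This is a hypothetical illustration, not the patent's implementation: the glyph codes (`g1`, `g2`, …) and both tables are invented, symbols map from a single glyph only (per the convention above), keywords may map from multi-glyph combinations, and longest-first matching is one plausible way to resolve overlapping combinations.

```python
# Sketch of glyph conversion: single-glyph symbol mapping plus
# longest-match multi-glyph keyword mapping. Tables are illustrative.

SYMBOL_MAP = {'g1': '+', 'g2': '='}                       # glyph -> symbol
KEYWORD_MAP = {('g3', 'g4'): 'function', ('g3',): 'if'}   # glyph tuple -> keyword

def convert_glyphs(glyphs):
    out, i = [], 0
    max_len = max(len(k) for k in KEYWORD_MAP)
    while i < len(glyphs):
        # Try the longest keyword combination first.
        for n in range(min(max_len, len(glyphs) - i), 0, -1):
            seq = tuple(glyphs[i:i + n])
            if seq in KEYWORD_MAP:
                out.append(KEYWORD_MAP[seq])
                i += n
                break
        else:
            g = glyphs[i]
            out.append(SYMBOL_MAP.get(g, g))  # symbol map, or keep glyph code
            i += 1
    return out
```

Note how the glyph `g3` alone maps to `if`, while the combination `g3 g4` maps to `function`, matching the statement that different combinations of the same glyphs can correspond to different keywords.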
  • interface identifier mapping also maps glyphs to standard codes.
• the key here is also a mapping table: the glyph interface identifier mapping table.
• for traditional high-level programming languages, there are more or fewer built-in or third-party libraries. We need to use the corresponding identifiers to access system constants, system functions, standard library functions, class libraries, and so on. These identifiers are usually composed of standard code characters.
• the glyph interface identifier mapping table is a mapping table between the user's handwriting and the corresponding identifiers. In addition, some of the symbols in the handwritten code may themselves become interfaces, used and accessed by others; in this case we also need to provide the corresponding standard code identifiers.
  • the set of target keywords (including system punctuation) mapped to is a well-defined closed, finite set.
  • the target identifier set is an infinite, open collection.
  • the content of the glyph identifier can be pre-defined by the user or interactively during handwritten source conversion.
• the glyph private identifier mapping table is automatically generated by the system.
• the content of this mapping table is the correspondence between the glyphs of the above user-defined symbols and the corresponding generated standard code identifiers.
• in our handwritten text scheme, handwritten text encoding and standard encoding are allowed to be mixed within the same content, and handwriting programming also allows such content. During source code conversion, the standard-code parts are skipped directly without conversion. Here, to prevent mutual interference between the standard code generated from handwritten text and the original standard code, a blank character needs to be inserted during conversion between standard text and directly adjacent non-control handwritten text.
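The separator rule above can be sketched as follows. This is an illustrative simplification, not the patent's code: segments are tagged `'standard'` or `'handwritten'` (the tagging scheme is an assumption), and a single blank is inserted at every boundary between segments of different kinds, ignoring the control-character refinement for brevity.

```python
# Sketch of the boundary-blank rule for mixed standard/handwritten content.

def join_segments(segments):
    """segments: list of (kind, text), kind in {'standard', 'handwritten'}."""
    out = []
    for i, (kind, text) in enumerate(segments):
        if i > 0 and segments[i - 1][0] != kind:
            out.append(' ')  # prevent token fusion at the boundary
        out.append(text)
    return ''.join(out)
```

Without the inserted blank, a standard-code token and the standard code generated from adjacent handwritten glyphs could fuse into one unintended token during lexical scanning.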
• handwritten text is not constrained by the glyphs of standard coded text; the user can use any glyph or symbol. So in handwriting programming, any glyph or symbol can be used as a keyword or identifier. In use, however, attention must be paid to conflicts between keywords and identifiers: if an identifier uses the same glyph as a keyword, the conversion result will often be a syntax error. By using special glyphs or symbols for keywords, this conflict can be avoided very well.
  • FIG. 1I is a flowchart of a handwriting program source code conversion method in a method for processing handwritten input characters according to an embodiment of the present invention.
  • FIG. 1J is a detailed flowchart of “standard code conversion for B” in the handwriting program source code conversion method shown in FIG. 1I.
  • the entire conversion process has five inputs: a handwritten program source file, a handwritten character library, a glyph numeric symbol mapping table, a glyph keyword mapping table, and a glyph interface identifier mapping table.
• the glyph private identifier mapping table is only produced during the conversion process and need not be provided in advance.
• the source target location mapping table is very important, because the compilation and interpretation processes after conversion take the generated standard code object file as input, and the corresponding system messages are also given based on location information in that text file. With this source target location mapping table, we can directly convert such information into the corresponding locations within the handwritten source file. This provides the foundation for the entire handwriting programming environment and related aids.
  • the output is mainly a standard code program text file.
  • the conversion process can be integrated with the existing compilation front end, and the process of writing a file can be skipped, and a standard code character stream is generated in the memory for further processing.
  • the previous conversion process assumes that the glyph interface identifier mapping table is pre-defined.
• the optimized conversion process can generate intermediate files (with complete numeric, symbol, and keyword conversion) without the glyph interface identifier mapping table, and then handle handwritten identifiers intelligently according to the results of lexical analysis, parsing, and semantic analysis.
• a processing rule can be employed: for a handwritten symbol that has already been defined, its standard code identifier is generated automatically; for an undefined handwritten symbol, the user is queried interactively for its identifier definition, and the glyph interface identifier mapping table is generated automatically from the user's input.
• a deeply integrated compiler can be used inside the handwritten text editor and can also implement functions such as syntax coloring and grammatical intelligence, so as to finally realize an integrated development environment based on handwritten characters.
  • FIG. 1K is a schematic diagram of a handwriting program in an embodiment of a method for processing handwritten input characters according to the present invention.
  • the handwriting program in Fig. 1K corresponds to the programming language Lua language, which is an embedded scripting language.
  • the corresponding font library code can be as shown in Table 1, Table 2 and Table 3.
• before the code is converted, the user prepares the glyph digital symbol mapping table shown in Table 4.
  • the glyph keyword mapping table is shown in Table 5.
  • the glyph interface identifier mapping table is shown in Table 6.
  • the system sets a syntax interval threshold of 20.
  • the private identifier auto-generation rule is two underscores (_) followed by a glyph code sequence connected by an underscore.
• the first identifier is actually comment content and is meaningless. If an optimized conversion process is used, its conversion can be omitted directly once it is identified as a comment.
  • This generated program can be interpreted and executed normally by the traditional Lua interpreter, and its execution semantics are exactly the same as those in the handwritten source code.
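The private identifier auto-generation rule of this example (two underscores followed by the glyph code sequence joined by underscores) can be sketched in one line. The integer glyph codes used below are illustrative only.

```python
# Sketch of the example's private-identifier rule: '__' + codes joined by '_'.

def private_identifier(glyph_codes):
    return '__' + '_'.join(str(c) for c in glyph_codes)
```

For example, a handwritten symbol composed of glyphs with codes 12, 7, and 301 would receive the generated standard code identifier `__12_7_301`, which is a legal identifier in Lua and most other languages.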
  • the method may further include:
• metadata of the handwritten text to be saved is obtained according to a preset metadata stripping protocol, and the obtained metadata is stripped from the handwritten text;
  • the handwritten text is divided into at least two pieces of data according to a preset data content splitting specification.
  • the method may further include:
• the encoding warehouse selects or creates an encoding specification according to at least a portion of the metadata, and generates a meta encoding corresponding to the metadata according to the encoding specification;
• the handwritten text is encoded according to the encoding specification to obtain an instance encoding, and a text encoding corresponding to the handwritten text is acquired according to the meta encoding and the instance encoding;
• the text encoding returned by the encoding warehouse is received, the text encoding being in a reference encoding form or a content encoding form.
  • For the data splitting procedure, refer to the detailed description of the data splitting method embodiment in this specification; for the specific encoding process, refer to the detailed description of the subsequent encoding processing method embodiment. Details are not repeated here.
  • FIG. 1L is a schematic structural diagram of an embodiment of a device for processing handwritten input characters according to the present invention.
  • the processing device for handwriting input characters in this embodiment may include:
  • The acquiring module 1001A is configured to collect, in the currently activated first target row/column, the strokes input by the user and the corresponding input information, where the input information includes the input position of each stroke in the first target row/column;
  • an attribution module 1002A is configured to, for each stroke, create a new character for the stroke or determine the character to which the stroke belongs, according to the input position of the stroke in the first target row/column, or according to that input position together with a character specified in the first target row/column.
  • the handwriting input character processing device in this embodiment may be used to perform the method for processing the handwritten input character shown in FIG. 1A.
  • the specific implementation principle may refer to the foregoing embodiment, and details are not described herein again.
  • The handwriting input character processing apparatus acquires the strokes input by the user in the currently activated first target row/column together with the corresponding input information, and, according to the input position of each stroke in the first target row/column, or according to that input position and a character specified in the first target row/column, creates a new character for the stroke or determines the character to which the stroke belongs. This achieves the effect of composing characters while writing: the user does not need to separate characters with explicit or implicit "start single text input" or "end single text input" commands, so the writing process is smooth and efficient. Because the character to which a stroke belongs is determined directly from the stroke's input position, no standardized character recognition is required, and the personalized information, writing style and features of the user's handwriting input are preserved.
  • the collection module 1001A is further configured to:
  • the row height/column width information is a default value or determined by the user input
  • the position range of each row/column refers to the relative top-edge and bottom-edge positions of each row in the handwriting input screen, or the relative left-edge and right-edge positions of each column in the handwriting input screen;
  • receiving a target row/column selection message input by the user, where the target row/column selection message includes an identifier of the target row/column to be input by the user;
  • a row/column corresponding to the identifier of the target row/column to be input by the user is used as the currently activated first target row/column.
  • the acquisition module 1001A is further configured to:
  • the position range refers to a relative top side position and a bottom side position of the first target line in the handwriting input screen or a relative left side position and a right side position of the first target column in the handwriting input screen.
  • the collection module 1001A is further configured to:
  • the second target row/column is the currently activated target row/column, and the second target row/column is the next row/column of the first target row/column.
  • the acquisition module 1001A is further configured to:
  • the second target row/column is set as the currently activated target row/column, to enable acquisition of the strokes input by the user in the second target row/column;
  • the second target row/column is the next row/column of the first target row/column.
  • the acquisition module 1001A is further configured to:
  • the first target row/column and the second target row/column are simultaneously the currently activated target rows/columns;
  • the second target row/column is the next row/column of the first target row/column.
  • when the first target row/column and the second target row/column are simultaneously the currently activated target rows/columns, both the first target row/column and the second target row/column are activated only in a partial region;
  • a starting position of the active area of the first target row/column is set between an end position of an active area of the second target row/column and an end position of an active area of the first target row/column.
  • the attribution module 1002A is specifically configured to:
  • determine at least one character associated with the stroke; and
  • perform attribution processing on the stroke according to the associated at least one character.
  • the specified character is all characters that are already in the first target row/column
  • the specified character is a character in the area to be compared in the first target row/column, wherein a distance between a boundary position of the area to be compared and the stroke is less than a second preset threshold.
  • determining the relevance between the stroke and the character may include:
  • comparing the input position of the stroke in the first target row/column with the position information corresponding to a character specified in the first target row/column, and determining the relevance between the stroke and the character accordingly;
  • if there are at least two characters associated with the stroke, the at least two characters are merged and the stroke is attributed to the merged character.
  • the performing the attribution processing on the stroke according to the at least one associated character may include:
  • the at least two characters most strongly associated with the stroke are merged, and the stroke is attributed to the merged character.
  • The characters associated with the stroke are sorted by distance in ascending order, and the character corresponding to the minimum distance is taken as the character most strongly associated with the stroke; or,
  • At least one character associated with the stroke is sorted and the first character is used as the character most strongly associated with the stroke.
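  • The attribution logic above (associate by distance, create a new character when nothing is near, merge when a stroke bridges two characters) can be sketched as follows. This is an illustrative sketch only: the names, the 1-D extent model, and the threshold value are assumptions, not the patent's implementation.

```python
# Strokes and characters are modeled as 1-D extents along the writing direction.
ASSOCIATION_THRESHOLD = 20  # hypothetical distance threshold

def interval_distance(a, b):
    """Gap between two 1-D intervals; 0 when they overlap."""
    return max(b[0] - a[1], a[0] - b[1], 0)

def attribute_stroke(stroke, characters):
    """Attribute `stroke` (an extent) to a character, creating or merging as needed."""
    related = [c for c in characters
               if interval_distance(stroke, c["range"]) < ASSOCIATION_THRESHOLD]
    if not related:
        # No associated character: create a new character for the stroke.
        new_char = {"range": stroke, "strokes": [stroke]}
        characters.append(new_char)
        return new_char
    if len(related) > 1:
        # Two or more associated characters: merge them, then attribute the stroke.
        merged = {"range": (min(c["range"][0] for c in related),
                            max(c["range"][1] for c in related)),
                  "strokes": [s for c in related for s in c["strokes"]]}
        for c in related:
            characters.remove(c)
        characters.append(merged)
        related = [merged]
    target = related[0]
    target["strokes"].append(stroke)
    target["range"] = (min(target["range"][0], stroke[0]),
                       max(target["range"][1], stroke[1]))
    return target
```

  • For example, strokes at extents (0, 10) and (12, 20) fall within the threshold of each other and join one character, while a stroke at (100, 110) starts a new one.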
  • the collection module 1001A is further configured to:
  • the attribution module 1002A can be specifically configured to:
  • the stroke is attributed to an existing character in the composition; otherwise, a new character is created in the composition, the stroke being attributed to the new character.
  • the collection module 1001A is further configured to:
  • the characters to be searched are compared with the locally saved characters according to the number of strokes of the character to be searched and the stroke feature, and characters matching the characters to be searched are obtained.
  • the collection module 1001A is further configured to:
  • the new characters created from, or the attributions determined for, the acquired strokes are saved at every preset time interval;
  • the acquisition module 1001A is also used to:
  • the saved characters are stored in the second memory, and for each saved character, the characters include a stroke constituting the character and an index corresponding to the stroke;
  • the index corresponding to the stroke points to the input information corresponding to the stroke in the first memory.
  • the input information corresponding to the stroke further includes one or a combination of the following: an input time of the stroke, an input strength of the stroke, and an input speed of the stroke.
  • the input time includes a pen down time and a pen up time of the stroke, and a dwell time of each point in the stroke of the stroke;
  • the input position includes at least: a position when the pen is dropped, a position when the pen is lifted, and a coordinate position of each point in the handwriting of the stroke.
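  • The per-stroke input information described above (pen-down/pen-up times, per-point dwell times, trajectory coordinates, pressure, speed) can be sketched as a simple record type. Field names here are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StrokeInput:
    pen_down_time: float                          # when the pen touched down
    pen_up_time: float                            # when the pen lifted
    trajectory: List[Tuple[float, float]]         # coordinate of each point
    dwell_times: List[float] = field(default_factory=list)  # per-point dwell
    pressure: List[float] = field(default_factory=list)     # input strength
    speed: List[float] = field(default_factory=list)        # input speed

# One stroke with three trajectory points:
s = StrokeInput(0.00, 0.35, [(10, 12), (11, 14), (13, 15)])
```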
  • the collection module 1001A is further configured to:
  • the correction request including a character to be corrected, or a character to be corrected and a stroke to be corrected;
  • the correction request is a merge correction request, and the character to be corrected is at least two characters to be merged;
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • the correction request is a split correction request
  • the character to be corrected is a character to be split
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • the correction request is a home correction request
  • the character to be corrected is a character to be vested
  • the stroke to be corrected is at least one stroke to be corrected
  • the correcting processing is performed on the character to be corrected according to the correcting request, including:
  • At least one stroke to be corrected is attributed to the to-be-vested character.
  • the collection module 1001A is further configured to:
  • the insertion request including a target row/column to be inserted, a to-be-inserted position in the target row/column to be inserted, and a character to be inserted;
  • the collection module 1001A is further configured to:
  • the selection processing command includes any one or a combination of the following: performing copy processing on the at least one character, performing cut processing on the at least one character, and performing replacement processing on the at least one character And performing a merge process on the at least one character.
  • the number of the first target rows/columns is plural;
  • the active areas corresponding to the plurality of the first target rows/columns do not overlap and are not in contact with each other.
  • the collection module 1001A is further configured to:
  • the handwriting mode is switched to the target mode, and in the target mode, at least one standard character input by the user is received.
  • the collection module 1001A is further configured to:
  • querying the mapping table in the encoding warehouse to obtain the standard language parameters corresponding to the glyphs.
  • the standard language parameters include one or a combination of the following: numbers, symbols, keywords, public identifiers, and private identifiers.
  • the data splitting of the present invention is a solution that can effectively solve the above problems.
  • FIG. 2A is a flowchart of a data splitting method according to an exemplary embodiment. As shown in FIG. 2A, the present invention provides a data splitting method, including:
  • Step 101B: when a storage request carrying the identifier of the data to be stored is received, the metadata in the data object corresponding to the identifier of the data to be stored is obtained according to the preset metadata stripping protocol.
  • Step 102B Strip the acquired metadata from the data object.
  • Step 103B: Split the data content into at least two data segments according to the preset data content splitting specification.
  • the method may further include:
  • step 104B the metadata and each data segment are separately stored in different storage bodies or in different secure channels.
  • In the data splitting method of this embodiment, when a storage request carrying the identifier of the data to be stored is received, the metadata in the corresponding data object is obtained according to the preset metadata stripping rule and stripped from the data object; the data content is split into multiple data segments according to the preset data content splitting specification; and the metadata and each data segment are stored separately in different storage bodies or different secure channels.
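  • Steps 101B through 104B can be sketched end-to-end as follows. This is a toy illustration: the protocol keys, the fixed four-character segment size, and the store names are all assumptions made for the example.

```python
STRIP_PROTOCOL = {"metadata_keys": ["filename", "filetype"]}  # hypothetical
SEGMENT_SIZE = 4                                              # hypothetical

def split_for_storage(data_object):
    # Steps 101B/102B: pull the agreed metadata out of the data object.
    metadata = {k: data_object.pop(k)
                for k in STRIP_PROTOCOL["metadata_keys"] if k in data_object}
    # Step 103B: split the remaining data content into segments.
    content = data_object.pop("content")
    segments = [content[i:i + SEGMENT_SIZE]
                for i in range(0, len(content), SEGMENT_SIZE)]
    return metadata, segments

obj = {"filename": "report", "filetype": "txt", "content": "ACBDACXY"}
metadata, segments = split_for_storage(obj)

# Step 104B: metadata and each segment go to separate storage bodies
# (or secure channels); here modeled as distinct in-memory stores.
meta_store = {"report": metadata}
segment_stores = [[seg] for seg in segments]
```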
  • FIG. 2B-1 is a flowchart of a data splitting method according to another exemplary embodiment. As shown in FIG. 2B-1, the present invention provides a data splitting method, including:
  • Step 201B Receive a storage request carrying an identifier of the data to be stored.
  • the data splitting method may be applied to a device such as a terminal (client device) or a network (server device).
  • the device receives a storage request carrying a data identifier to be stored
  • The storage request may be triggered by a terminal application, for example the mail system, desktop agent, and other applications mentioned above; here the mail system is taken as an example.
  • When the mail system sends file data, it receives a storage request carrying the identifier of the data to be stored, and the data splitting device of the mail system first performs splitting processing on the file data, so that the mail recipient must obtain the file data fragments from each specified storage body in order to reconstruct the complete file data.
  • The storage request may also be triggered by the user; the data splitting device then receives the storage request carrying the identifier of the data to be stored, and splits the file.
  • The identifier of the data to be stored may be the name of the file data, or identifier information such as a message digest (e.g., an MD5 code).
  • Step 202B If the metadata specified in the preset metadata stripping protocol includes: attribute information, determine, in the data object corresponding to the data identifier to be stored, the attribute information content that matches the attribute information as metadata.
  • The process of stripping metadata separates the metadata of the data object, especially key metadata, from its original location in the data object, so that the remaining data content and/or other metadata alone cannot yield the complete data object.
  • the key metadata is security-related metadata. Once these key metadata are missing, the system will not be able to read, identify, decode or restore the corresponding data objects.
  • For example, the file type is key metadata: once the file extension is removed, the system cannot open the file content normally.
  • Storing file type information and file content data in different cloud storages will cause certain difficulties for malicious attackers or service providers to obtain complete data.
  • Different types of data have different key metadata.
  • For tabular data such as a spreadsheet or database table, the header field names are key metadata.
  • Metadata can also cover a wider range: as long as it benefits the security of the data, any information related to the data content can be separated from the content itself as metadata.
  • The metadata includes attribute information. Attribute information is information capable of identifying the unique properties of a data object, composed of descriptive information that helps find and open the data object. Attributes are not included in the actual content (data content) of the data object; rather, they provide information about it. They can include the size of the data object, the data type, the dates of creation and modification, the author, the rating, and much more. Since attribute information can be set by the person skilled in the art according to the nature of the data object, the content listed above is only an example and is not a limitation on the content of the attribute information.
  • the metadata agreed in the preset metadata stripping protocol includes: a data content identifier and a keyword
  • the data content matching the keyword is determined as metadata from the data content in the data object according to the data content identifier.
  • the data content identifier is used to prompt the extraction location of the metadata from the data content portion, and the keyword is used to indicate the data content that needs to be extracted specifically; the data content matched with the keyword may be key information or sensitive information contained in the data.
  • a number of keywords associated with the account information can be set to extract sensitive information in the account as metadata storage. For example: account number, user ID, user phone, address, etc.
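  • The keyword-driven extraction described above can be sketched as follows. The "key: value" line format and the keyword list are assumptions made for illustration; a real protocol would use the data content identifier to locate the extraction region.

```python
import re

KEYWORDS = {"account", "phone", "address"}  # hypothetical keyword list

def extract_sensitive(content):
    """Split content lines into (stripped metadata, remaining content)."""
    metadata, remaining = {}, []
    for line in content.splitlines():
        m = re.match(r"(\w+):\s*(.+)", line)
        if m and m.group(1).lower() in KEYWORDS:
            metadata[m.group(1).lower()] = m.group(2)  # stripped as metadata
        else:
            remaining.append(line)                     # stays in data content
    return metadata, "\n".join(remaining)

meta, rest = extract_sensitive("account: 12345\namount: 9.99\nphone: 555-0100")
# meta == {'account': '12345', 'phone': '555-0100'}; rest == 'amount: 9.99'
```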
  • the metadata agreed in the preset metadata stripping protocol includes: attribute information, a data content identifier, and a keyword
  • the attribute information content matching the attribute information in the data object is determined as metadata, and according to the data content identifier, From the data content in the data object, the data content matching the keyword is determined as metadata.
  • the strategy for generating the default metadata stripping protocol can be determined by the developer, or it can allow the user to define the applicable protocol.
  • In the latter case, the system should present the metadata to the user as comprehensively as possible, so that the user can configure the protocol on the basis of this information.
  • the preset metadata stripping protocol is built into the data splitting system. As in the previous mail client example, the preset metadata stripping protocol can be built into the mail system application.
  • the preset metadata stripping protocol may also be stored with the metadata as part of the metadata content, so that when the recipient merges the data, the data object is merged with reference to the preset metadata stripping protocol.
  • the attachment file (data object) to be sent is split, and the metadata of the attachment file may be: file name, file type, file size, creation time, and the like.
  • the result of file metadata stripping is stored in the file meta information system.
  • The method used to divide the file content, together with the segmentation result information such as the hash value or ID of each file fragment and the storage location of each fragment, is also stored in the file meta information system and associated with the corresponding file metadata.
  • All of the content stored in the file meta information system constitutes an instance of this split/peel protocol.
  • Step 203B Detach the acquired metadata from the data object.
  • Stripping, also referred to as splitting off, means separating from the data object the metadata selected by the split/peel protocol associated with that data object.
  • the system will separate the metadata from the data object based on the default metadata stripping protocol (which can be system default or user-selected or user-defined).
  • The protocol records rules, constraints, and methods related to metadata split/peel processing, for example but not limited to: the stripping location of the metadata, the stripping method of the metadata, the encoding scheme, information related to stripping encoding, content splitting rules, and other data and/or information related to content splitting.
  • the metadata may be a complete set or a subset of the metadata of the data object.
  • the type of metadata please refer to the various situations in step 202B above.
  • Traditional data splitting divides a data object into multiple segments according to a predetermined rule and saves them separately.
  • Such a method cannot achieve finer-grained protection, and cannot separate the important information (metadata) closely related to the data object from the data content itself.
  • The invention adopts a new data splitting method to split data objects: it not only splits the data object at a finer granularity (for example, by characters or even by bits), but can also strip away the important information (i.e., metadata) closely bound to the data object and the data content itself.
  • The stripped metadata, data content, and/or the encodings to be discussed later can be stored separately in different storage locations or spaces, or under different secure channels, thereby realizing secure data storage more reliably.
  • Step 204B: Split the data content into at least two data segments according to the preset data content splitting specification.
  • Content splitting refers to dividing the data content in a data object into several (more than one) segments according to certain rules.
  • the figurative metaphor is like tearing a piece of paper into pieces.
  • Content splitting is optional and can be applied according to actual needs; applications that do not require high content confidentiality may skip it.
  • The content splitting method can draw on RAID disk array technology, which divides data into multiple blocks and writes them to multiple disks in parallel to improve disk read/write speed and throughput.
  • Content splitting can be divided into domain-related content splitting and domain-independent content splitting.
  • Domain-related content splitting splits the data mainly according to the characteristics of specific domain data; for example, structural splitting for specific file formats, or splitting out key or sensitive information within the data. The latter may overlap somewhat with metadata stripping (when the metadata resides within the data content).
  • the bank's statement can be stripped of the account information as metadata, or the account information can be split as a data segment for split storage.
  • the preset data content splitting protocol may include at least one of a disk array RAID splitting algorithm and an information dispersed IDA algorithm.
  • Algorithm researcher Michael O. Rabin first proposed the Information Dispersal Algorithm (IDA) in 1989. It slices data at the bit level so that the data is unrecognizable while being transmitted or stored in an array; only a user/device holding the correct key can access it, and the information is reassembled when accessed with the correct key.
  • information-distributed IDA algorithms and related derivative algorithms have been widely used.
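  • As a hedged illustration of the splitting idea, the following sketch stripes bytes round-robin across n stores in the spirit of RAID-style splitting. Note this is not Rabin's full IDA, which additionally adds coded redundancy so any m of n fragments suffice for reconstruction; here all stripes are needed.

```python
def stripe(data: bytes, n: int):
    """Deal bytes round-robin across n stores, so no single store
    holds recognizable contiguous data."""
    return [data[i::n] for i in range(n)]

def unstripe(stripes):
    """Reassemble the original bytes from all n stripes."""
    n = len(stripes)
    out = bytearray(sum(len(s) for s in stripes))
    for i, s in enumerate(stripes):
        out[i::n] = s
    return bytes(out)

parts = stripe(b"ACBDAC", 3)        # [b'AD', b'CA', b'BC']
assert unstripe(parts) == b"ACBDAC"
```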
  • Step 205B Perform separation processing on each data segment according to a preset encoding separation specification to obtain a code corresponding to each data segment.
  • Encoding each data segment separately to obtain the code corresponding to each data segment may include:
  • querying an encoding warehouse according to a preset encoding protocol, selecting or creating an encoding specification according to at least a part of the metadata, generating a meta encoding corresponding to the metadata according to the encoding specification, and encoding each data segment according to the encoding protocol to obtain the instance code corresponding to each data segment; or,
  • sending each data segment and the metadata to the encoding warehouse according to the preset encoding separation protocol, so that the encoding warehouse selects or creates an encoding specification according to at least a part of the metadata, generates the meta encoding corresponding to the metadata according to the encoding protocol, and encodes the respective data segments to obtain the instance codes; and receiving the meta encoding and the instance codes returned by the encoding warehouse.
  • Step 206B Arrange the respective codes according to the original order of the data segments in the data content to obtain the coded arrangement order information.
  • the data splitting method of the present invention covers two different data processing means, one is the stripping of metadata and encoding, and the other is the splitting of data content.
  • the stripping of metadata has been explained in the foregoing.
  • The stripping of encoding here means: the data content is split into n pieces of data, the n blocks are stored separately, and each block obtains a corresponding code (number); blocks may repeat, and the codes (numbers) are arranged in the order in which the blocks appear.
  • This encoding (numbering) sequence contains the encoding information as well as the encoding ordering information, and the encoding result can be stored in another secure channel.
  • The encoding differs from the data fragments themselves, and splitting it out can likewise be called stripping.
  • the metadata portion and/or the encoded portion are further split processed to achieve a finer-grained protection effect.
  • the above-mentioned stripping and splitting can be combined indefinitely, depending on system requirements and processing capabilities.
  • Code stripping is based on content splitting; that is, content splitting divides some or all of the data content according to certain rules, and the addressing mode of each split data block is then encoded.
  • the final encoded result is formed into separate data.
  • reference codes for data are ubiquitous. Such as the key (Address) of the data record in the database; the abbreviated URL (http://dwz.cn/mzot4) for the URL input and reference; the access identifier used in the cloud storage programming interface (API), and so on.
  • All of these encoding methods can be used for the encoding mentioned above. If the split result of a data part is encoded, the encoded result replaces the original corresponding data.
  • The encoding may also not be based on content splitting. For example, for data with a low security level it is not necessary to split the data content; it is then sufficient to give the entire data content one code if necessary, though it may still be necessary to separate the code from the data content. It can be seen that the code stripping of this embodiment differs both from traditional content splitting and from existing data reference encoding; it is a combination of the two. As long as the coding results (both the codes themselves and their combination order) are separated from the data content, the security risk of the data is reduced to some extent. For example: given 6 bytes of data ACBDAC, split it into two-byte blocks and store them in the database; AC returns code 1 and BD returns code 2. The encoded result of this data is the sequence 1, 2, 1, not just 1 and 2. The numbers 1 and 2 are the codes, while the arrangement 1, 2, 1 is the code arrangement order information.
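  • The ACBDAC example above can be reproduced directly: split the data into two-byte blocks, store each distinct block once in a code table, and keep only the code sequence (the function names are illustrative):

```python
def encode_chunks(data, size=2):
    """Split data into fixed-size blocks; assign each distinct block a code
    starting from 1; return the code table and the code sequence."""
    table, codes = {}, []
    for i in range(0, len(data), size):
        block = data[i:i + size]
        if block not in table:
            table[block] = len(table) + 1  # codes 1, 2, ...
        codes.append(table[block])
    return table, codes

def decode_chunks(table, codes):
    """Reassemble the original data from the code table and code sequence."""
    inverse = {v: k for k, v in table.items()}
    return "".join(inverse[c] for c in codes)

table, codes = encode_chunks("ACBDAC")
# table == {'AC': 1, 'BD': 2}; codes == [1, 2, 1]
assert decode_chunks(table, codes) == "ACBDAC"
```

  • Only someone holding both the code table and the sequence 1, 2, 1 can reconstruct ACBDAC, which is why the two parts are stored in separate channels.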
  • the above-mentioned metadata, encoding, and data content stripping/split methods are not mutually exclusive, and they can be used in combination.
  • In the simplest case, the encoding and the other metadata may be kept together, as long as they are separated from the data content portion; more preferably, the three parts (metadata, encoding, data content) are each separated according to their own splitting rules.
  • Steps 202B to 206B do not impose a fixed order on content splitting, metadata stripping, and encoding stripping; they may be performed separately, at different times, or simultaneously.
  • the encoding operation of the present invention needs to be performed during or after the content splitting process.
  • Metadata stripping can be done before the content is split, or after the content splitting and code assignment are completed.
  • Other data processing methods, such as data compression and encryption, may also be mixed in. A description of compression and encryption can be added to the various protocols mentioned above, but in that case the splitting is best re-executed after the compression and/or encryption has been performed.
  • A further splitting step for the metadata itself is also possible.
  • Step 207B Store the metadata, the code corresponding to each data segment, and the coded sequence information into different storage banks or different secure channels.
  • When the metadata agreed in the preset metadata stripping protocol includes a data object identifier, obtaining the metadata in the data object corresponding to the identifier of the data to be stored, according to the preset metadata stripping rule, includes: parsing the data object to generate a data object identifier uniquely corresponding to the data object.
  • When the data object is audio data, step 204B, dividing the data content into at least two data segments according to the preset data content splitting specification, may include: performing splitting processing on the audio data using a time-domain analysis method or a frequency-domain analysis method to obtain audio data objects to be encoded, where the audio data objects to be encoded include sound-wave segments and/or silent segments.
  • In computer systems, voice data and its processing have always been second-class citizens, mainly because of the current input, storage and processing methods for voice data and the corresponding technical limitations. People now mainly process and use voice input through computers and networks in two ways: voice calls and speech recognition.
  • A voice call mainly refers to converting the voice signal produced by a person into a digital signal through a computer sound-capture device, then processing, transmitting and storing it through a computer and a computer network or communication network (here mainly packet-switched voice technology, such as VoLTE; circuit-switched voice technology has nothing to do with the problems discussed here), and finally playing it back through a digital audio playback device.
  • Voice calls can be real-time or non-real-time; they can be one-way or two-way.
  • the main problem with current voice calls is the large amount of data, which is not easy to transfer and store.
  • the current audio sampling rates of sound cards are mainly 11KHz, 22KHz, and 44.1KHz.
  • the sound obtained at 11KHz is called telephone sound quality (the telephone uses 8KHz sampling rate), which basically makes people distinguish the voice of the caller; 22KHz is called broadcast sound quality; 44KHz is CD sound quality.
  • Another sampling parameter is the sampling resolution, which refers to the precision with which a sound signal (generally the amplitude of the sound wave) is measured.
  • Common resolutions are 8-bit and 16-bit: 8 bits divide the sound signal into 256 levels,
  • while 16 bits divide it into more than 60,000 levels. It can be calculated that one second of 8-bit stereo (left and right channel) audio sampled at 11KHz amounts to 22 KB of data. This is equivalent to the data volume of more than 10,000 Chinese characters.
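  • The data-rate figure above follows from simple arithmetic (variable names are illustrative):

```python
# One second of 8-bit stereo audio sampled at 11 KHz:
sample_rate = 11_000     # samples per second (11 KHz)
bytes_per_sample = 1     # 8-bit resolution
channels = 2             # stereo: left and right

bytes_per_second = sample_rate * bytes_per_sample * channels
print(bytes_per_second)  # 22000 bytes, i.e. about 22 KB
```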
  • Audio files such as MP3, WMA, MOV, etc.
  • network streaming protocols such as RTSP, MMS, RTP, RSVP, etc.
  • Speech recognition: as we already know, textual data is the first-class citizen of current computer systems. Text data is standardized, easy to store, and easy to view, find, retrieve, and process. Therefore, speech recognition, which converts speech input into text data, can make more efficient use of the input data.
  • the human natural voice output contains information other than the corresponding text content.
  • After speech is recognized and converted into standard text content, the original speech data is generally not retained, and this part of the information is in fact lost. This information mainly includes voice, intonation, tone, pauses, and so on, which may carry emotion and mood.
  • The recognition rate remains a major obstacle preventing speech recognition from becoming a primary means of computer input.
  • Voice call data preserves the original voice information, but its volume is large and it is not conducive to automatic analysis and processing by computers.
  • Although speech recognition can generate text data that is convenient for computers to transmit, store, analyze and process, some of the original speech information is lost in the process; moreover, the accuracy and reliability of current speech recognition are not guaranteed, and there is no effective way to obtain voice sample data from most people to improve the recognition rate.
  • This embodiment proposes a compromise: process the original voice data so that both the original voice data and text data are saved, facilitating transmission, storage, analysis and processing by computers.
  • This text data is not a standard text encoding, but a private encoding for a specific person.
  • The voice data corresponding to the codes is stored in a dedicated text code warehouse, where it is encoded separately for each user. Users can set access rights to their own voice data for other users.
  • the system is roughly divided into two parts: the code repository and related services surrounding the data.
  • The process of voice input is as follows:
  1. The user logs into the code warehouse and selects the voice text input system;
  2. The voice text input system registers a series of encoders for the current user with the code warehouse;
  3. The user inputs continuous speech into the voice text input system;
  4. The voice text input system stores the user's input in an input buffer;
  5. The voice text input system divides the voice data in the input buffer according to certain rules to form different data objects;
  6. The voice text input system submits the data to the code warehouse through the corresponding encoder and obtains the corresponding codes;
  7. The voice text input system stores the obtained codes in the text input result and clears the corresponding input buffer content;
  8. Steps 3 to 7 repeat, with the voice text input system continuously obtaining user input and its corresponding codes;
  9. When the user stops inputting and no data remains in the input buffer, the voice input process is complete.
  • FIG. 2B-3 shows a time-domain analysis of a piece of audio data. A span whose amplitude stays below a certain threshold (here 0.005) for at least a certain duration (here 20 ms) is defined as silence. For silences shorter than 50 ms, we split directly at the midpoint: the first half belongs to the preceding segment and the second half to the following one. For silences of 50 ms or longer, we split at both the beginning and the end of the silence.
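The segmentation rule just described can be sketched as follows. This is an illustrative implementation under the stated parameters (amplitude threshold 0.005, minimum silence 20 ms, midpoint split below 50 ms); the function names and the list-of-floats representation of audio are my own assumptions:

```python
def find_silences(samples, rate, amp_thresh=0.005, min_ms=20):
    """Return (start, end) sample-index pairs where |amplitude| stays
    below amp_thresh for at least min_ms milliseconds."""
    min_len = int(rate * min_ms / 1000)
    runs, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < amp_thresh:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs

def cut_points(samples, rate, split_ms=50, **kw):
    """Silences shorter than split_ms are cut once at their midpoint;
    longer silences are cut at both ends, so the silence itself becomes
    a mute segment of its own."""
    cuts, split_len = [], int(rate * split_ms / 1000)
    for a, b in find_silences(samples, rate, **kw):
        if b - a < split_len:
            cuts.append((a + b) // 2)
        else:
            cuts.extend([a, b])
    return cuts
```

For example, at a 1 kHz sample rate (1 sample = 1 ms), a 30 ms quiet stretch yields one midpoint cut, while a 60 ms stretch yields cuts at both of its edges.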
  • This method of separating codes from content makes it easy to place the codes and the data content in different secure channels, providing inherent security.
  • the voice data stored in the code warehouse is directly related to a specific person, and naturally can be well used as a training sample for analysis and organization.
  • Existing speech analysis and recognition technology can extract much useful information, such as pitch, timbre, volume, syllables, etc., and can derive more effective feature parameters such as MFCC and LPCC parameters.
  • These can be stored in the code repository to provide further coding services for the corresponding speech coding.
  • Voice text output: for the obtained voice text content, i.e. the encoding result, there are two different output modes: graphic output based on text display, and audio playback based on voice.
  • Graphic output, graphic output of voice text refers to the presentation of voice text in the way of ordinary text, that is, text layout output.
  • The advantage is that voice text can be processed using existing word processing methods and tools.
  • Support for voice text output also allows voice text to appear in the same document as traditional text and other text forms (such as graphic text, image text, etc.), supporting richer applications.
  • The system can present a continuous run of voice text codes (including speech data codes, mute duration codes, etc.) as a whole, for example: "+ an unauthorized voice text (9 characters, 4 mute characters; total mute duration 2'369)"
  • the system can also provide relevant search functions, such as a silent search (with or without constraints).
  • The system can display more relevant information and allow the user to play the voice content, for example: "+ voice content, duration 8'' (5 voice characters, 4 mute characters; total mute duration 2'369)". When the user expands the voice text, more details can be obtained, as shown in Figure 2B-7.
  • Voice text is graphically output and can be visualized in a variety of formats, such as displaying waveforms, spectrograms, visualization durations, etc., depending on the specific application requirements.
  • The results of analyzing the voice characters, or semantic tags added by the user to the characters, can also be presented simultaneously.
  • the third and fourth audio characters are also displayed based on the results of the Chinese Pinyin phonetic analysis.
  • the associated system text search can also provide more search control, such as searching based on semantic tags entered by the user.
  • the system decomposes its metacode according to the target character encoding.
  • the system submits a character meta code to the code repository.
  • The code warehouse checks access rights according to the meta code and the current user. If access is denied, an error message is returned to the system, the system performs graphic output based only on the character code, and the process ends. If access is allowed, the corresponding encoding metadata is returned to the system and the process continues.
  • the system decomposes the instance code according to the target character encoding.
  • the system parses the instance code according to the encoded metadata. Specifically, if it is a mute character, the instance code is parsed into a mute duration; if it is an audio character, the character code is submitted to the code repository.
  • the encoding repository checks the access rights according to the audio encoding settings and the current user. If access is disabled, an error message is returned; if access is allowed, the corresponding voice data is obtained and returned to the system.
  • the system outputs the characters according to the parsed or obtained data.
  • the waveform data is recovered according to the voice data, and played out.
  • To output continuous voice text, the system needs to obtain all corresponding voice characters and related data, and graphically output their visualized form according to certain typographic rules. When a play request from the user is received, a play buffer is established and the audio data is played back in turn (taking the playback of mute characters into account).
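The per-character decoding flow described in the steps above can be sketched as follows. The code warehouse is modeled as plain dicts and all names (META, AUDIO, ACL, render_character) are illustrative assumptions, not the patent's actual interfaces:

```python
# Hypothetical sketch of the per-character output flow: decompose the
# character code into a meta code and an instance code, check access
# rights, then either parse a mute duration or fetch the voice data.

META = {"mute": {"kind": "mute"}, "voice": {"kind": "audio"}}
AUDIO = {"a1": b"\x01\x02"}                      # audio code -> voice data
ACL = {("voice", "alice"): True, ("mute", "alice"): True}

def render_character(char_code, user):
    meta_code, instance_code = char_code          # decompose the code
    if not ACL.get((meta_code, user), False):     # permission check
        return ("error", "access denied")
    meta = META[meta_code]
    if meta["kind"] == "mute":                    # mute character: instance
        return ("mute", int(instance_code))       # code is a duration
    return ("audio", AUDIO[instance_code])        # audio character: fetch data

print(render_character(("mute", "120"), "alice"))   # ('mute', 120)
print(render_character(("voice", "a1"), "alice"))   # ('audio', b'\x01\x02')
print(render_character(("voice", "a1"), "eve"))     # ('error', 'access denied')
```

The access check happens before any voice data is returned, mirroring the text's point that a user without rights sees only the code, never the content.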
  • Voice playback: the voice playback output of voice text is similar to the playback of traditional audio data, and need not consider the graphic layout of text. However, playback of voice text is also subject to the user's access rights: voice text can be played only if the user has obtained access rights to the corresponding voice data.
  • rich search positioning can be performed on voice text, such as searching according to voice duration, mute duration, semantic tags, mixed text in voice text, and the like.
  • Voice text editing: encoding audio data as text makes it possible to edit voice data in the manner of traditional text editing.
  • The user can conveniently delete, insert, or modify any character, and can also perform traditional text operations such as searching, replacing, copying and pasting.
  • Some of these operations require the use of specialized audio services. For example, change the mute duration, divide an audio character into multiples, combine multiple speech characters into one, and so on.
  • Noise cancellation: audio data recorded in ordinary environments generally contains ambient noise. After it is segmented, encoded, and later played back, will noisy voice characters sound strange next to noiseless mute characters?
  • the sound frequency that the human ear can recognize ranges from 20 Hz to 20 kHz.
  • The frequency of sound produced by the human vocal organs is roughly 80 Hz to 3400 Hz, while ordinary human speech usually lies between 300 Hz and 3000 Hz; for a specific individual, the range is generally even narrower.
  • The volume of normal indoor conversation is between 20 and 60 decibels. Based on this frequency range, we can automatically remove high-frequency and low-frequency noise, and with a low-decibel threshold we can perform voice detection and automatically obtain silent sections. Spectrum analysis of the silent sections then allows noise filtering over the entire audio data. Note that some silent segments may share frequency ranges with the voice data; during automatic filtering we must ensure that non-silent audio is not turned into low-decibel silence.
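A minimal sketch of the last point, showing only the amplitude-gating part: sustained low-amplitude stretches are silenced, while short quiet dips inside speech are left untouched. A real implementation would additionally band-limit to roughly 80-3400 Hz and subtract the noise spectrum estimated from the silent sections; the thresholds and function name here are illustrative assumptions:

```python
def gate_silence(samples, rate, amp_thresh=0.01, min_ms=20):
    """Zero out sustained low-amplitude (silent) stretches while leaving
    voiced audio untouched. Only a quiet run lasting at least min_ms is
    treated as silence, so non-silent audio is never gated away."""
    min_len = int(rate * min_ms / 1000)
    out = list(samples)
    i = 0
    while i < len(out):
        if abs(out[i]) < amp_thresh:
            j = i
            while j < len(out) and abs(out[j]) < amp_thresh:
                j += 1
            if j - i >= min_len:          # sustained quiet: silence it
                for k in range(i, j):
                    out[k] = 0.0
            i = j                         # short dips are kept as-is
        else:
            i += 1
    return out
```

The `min_len` guard implements the text's caution: a brief low-amplitude moment inside a voiced segment is not mistaken for silence and erased.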
  • Real-time voice calls: since this method is based on segmenting voice data, is it inapplicable to voice applications with strict real-time requirements? For applications that can tolerate a delay of a few seconds, the method still applies. Where real-time requirements are strict, speech segmentation is not possible; however, even for those applications the method can be used to record the call, avoiding the large data volume and editing difficulty of traditional voice recordings.
  • Voice transmission in traditional voice call applications, voice data can be directly transmitted to the receiver.
  • With this method, the voice text is transmitted to the receiver, and the receiver obtains the actual voice data from the code warehouse. Will this process be inefficient?
  • After transmitting the voice codes, the sender can hide some or all of the underlying voice data.
  • The receiver then cannot play the content, in whole or in part, even though the voice codes were received. This is not possible in traditional voice call applications.
  • Regarding actual data volume: the encoded form of the audio is indeed much smaller than the original audio, but for users who ultimately need to use or play the original voice content, the total amount of data has not decreased; it has even increased (by the voice text encoding). Is this a defect of the method? Admittedly, for a specific segment of speech, if playback must fully restore the original input, the data volume is not reduced (ignoring noise cancellation). However, by centralizing personalized voice data in the code warehouse, significant redundancy emerges, and by processing this redundancy, storage and transmission efficiency can be greatly improved. We elaborate on this below.
  • The sounds a person can produce in a lifetime are limited.
  • Considering the constraints of language, the basic elements/syllables are even more limited.
  • The combinations of these elements are also very limited.
  • Thus the specific phonemes that can be formed are limited.
  • The voice data is cut into a sequence of sound frames.
  • A sound frame is generally 10 ms to 40 ms long, and frames may overlap somewhat. Appropriate frame segmentation facilitates audio analysis and further parameterization of the audio data for eventual reuse.
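Frame segmentation with overlap can be sketched as follows. The 25 ms frame length and 10 ms hop are common illustrative choices within the 10-40 ms range mentioned above, not values taken from the patent:

```python
def frames(samples, rate, frame_ms=25, hop_ms=10):
    """Cut audio into overlapping frames: a new frame starts every
    hop_ms milliseconds and spans frame_ms milliseconds, so adjacent
    frames overlap by frame_ms - hop_ms."""
    flen = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    return [samples[i:i + flen]
            for i in range(0, max(len(samples) - flen, 0) + 1, hop)]
```

At a 1 kHz sample rate, 100 samples yield eight 25-sample frames whose starts are 10 samples apart, each overlapping its neighbor by 15 samples; such frames are the usual input to feature extraction (e.g. the MFCC/LPCC parameters mentioned earlier).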
  • Some existing audio fingerprint extraction and matching methods can be used to detect redundant voice data well, to implement content normalization, search matching and other services in the code warehouse. For example, Google's Waveprint method (patent US 8411977 B1).
  • Non-speech audio data: the emphasis here has been on voice data. Is this method also applicable to non-speech audio data, such as music or the audio tracks of video?
  • The method described here does not change the original data; it only divides and encodes it.
  • The original content is divided into an encoded stream and the corresponding audio data in the code warehouse.
  • Final playback can still fully restore and play the original audio. In this sense, there is no problem with applying this method.
  • However, text obtained by this method is personal and tied to a particular user, which underpins subsequent speech analysis, recognition and other highly personalized services for that user. If music or other sounds unrelated to the individual user are stored in the code warehouse and associated with the user, subsequent personalized services will suffer. It is therefore better to divert non-speech audio into other channels: use other coding classifications for other audio data, for example instrument-related coding for music, and finally mix together the multi-channel data of the different audio characters.
  • The method further comprises: generating a unique coding-order-information identifier based on the coding arrangement order information, and/or generating a unique identifier for each data segment; the coding-order-information identifier and/or each data-segment identifier is stored as part of the metadata.
  • The data object identifier, the coding-order-information identifier, and the data-segment identifiers are respectively hash values (e.g., MD5, SHA-1) of the data object, the coding order information, and the content of each data segment, or globally unique identifiers (UUID/GUID) generated by the system, or any other globally unique encoding.
  • the identifier can be used to perform integrity check on its corresponding content to verify whether the identifier matches its corresponding information, and whether the corresponding information is complete.
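The two identifier styles and the integrity check just described can be sketched with the standard library; the function names are my own:

```python
import hashlib
import uuid

def content_id(data: bytes, algo: str = "sha1") -> str:
    """Content-derived identifier (MD5/SHA-1 style, as in the text)."""
    return hashlib.new(algo, data).hexdigest()

def random_id() -> str:
    """System-generated globally unique identifier (UUID/GUID style)."""
    return str(uuid.uuid4())

def verify(identifier: str, data: bytes, algo: str = "sha1") -> bool:
    """Integrity check: does the identifier match its content?"""
    return content_id(data, algo) == identifier

fragment = b"some data fragment"
fid = content_id(fragment)
print(verify(fid, fragment))                 # True
print(verify(fid, fragment + b"tampered"))   # False
```

A hash identifier supports the integrity verification mentioned in the text (any change to the fragment changes the hash), whereas a UUID only guarantees uniqueness and cannot by itself verify content.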
  • Data splitting refers to dividing a complete piece of data into two or more parts, which are then stored in different storage systems.
  • The purpose of data splitting in the present invention is not merely storage, but data splitting for data security.
  • Users may not trust data stored with a single cloud provider, but through data splitting, a piece of data can be stored across one or more vendors, and only if all of the data (including the metadata and every data fragment) is leaked can the data itself leak. This greatly increases the difficulty of illegally reassembling the data.
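The "no single fragment leaks anything" property can be illustrated with a two-way XOR split; this is a minimal illustration of the principle, not the splitting algorithm claimed by the patent (which uses RAID/IDA-style schemes):

```python
import os

def split_two_ways(data: bytes):
    """Split data into two shares: a random pad and data XOR pad.
    Either share alone is statistically independent of the data, so a
    leak from a single provider reveals nothing; both shares are needed
    to merge the original back."""
    pad = os.urandom(len(data))
    share = bytes(a ^ b for a, b in zip(data, pad))
    return pad, share

def merge_two_ways(pad: bytes, share: bytes) -> bytes:
    """Recombine the two shares into the original data."""
    return bytes(a ^ b for a, b in zip(pad, share))

secret = b"user document"
s1, s2 = split_two_ways(secret)   # store s1 and s2 with different vendors
assert merge_two_ways(s1, s2) == secret
```

An attacker who compromises only one of the two storage providers obtains a uniformly random byte string, which matches the text's claim that all pieces must leak before the data does.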
  • the data splitting of the present invention allows the end user of the data (i.e., the user entitled to own the data) to directly intervene and control.
  • The data splitting method is built on top of the operating system (including cloud operating systems): specifically, in application systems dedicated to splitting, or in the splitting services of other application systems.
  • the storage system is built on the storage physical device, the infrastructure under the operating system.
  • the data splitting method of the present invention will eventually use a data storage system.
  • FIG. 2C shows the position of the data splitting method of the present invention in the computer system hierarchy, i.e., where the field of application of the present invention sits within that hierarchy.
  • the splitting and merging of data can be done at the terminal or by the server or service provider.
  • The data obtainable from any single cloud storage server is incomplete and insufficient to threaten the user's privacy and confidentiality.
  • An attacker needs to obtain the identity of the same user in different cloud storage services in order to get different pieces of data that make up the complete data. This difficulty is often much greater than cracking a single system.
  • With the merge specification, the fragment data can be restored to the original complete data. This gives the user's data an extra layer of protection.
  • Of course, a hacker could still attack the user's terminal system to obtain the complete data before it is split or after it is merged.
  • the mail server can be a conventional mail server
  • When an attachment is added to the mail, the content of the attached file is split.
  • several of them are stored in the cloud storage specified by the user, and several others are saved in the mail as ordinary attachments.
  • The mail cloud application system can register the metadata and split information (the default metadata stripping protocol, etc.) of the original attachment file with the file meta-information database (an online service system for which both the sender and the recipient must have accounts), and the corresponding data access links can be set automatically for the sender according to the client's settings.
  • For the recipient, no fragment of the data exists on the terminal side before the attachment is downloaded.
  • The actual data is distributed among the cloud storage, the mail server, and the corresponding metadata in the file meta-information database. Of course, the data also exists on the sender's terminal (if the sender is not using a distributed file system and has not deleted the file).
  • When the recipient downloads the attachment, the system can automatically locate the corresponding item in the file meta-information database from the content stored in the email as an ordinary attachment, then locate the portions in cloud storage, restore them according to the corresponding split method, and finally rebuild the original data on the recipient's client.
  • The account information required by the recipient's mail client is pre-set; at least three accounts are involved: the mail system, the cloud storage system, and the file meta-information system.
  • FIG. 2D is a flowchart of a data merging method according to an exemplary embodiment.
  • the present invention provides a data merging method, including:
  • Step 401B Receive a data object acquisition request carrying the identification information.
  • the identification information includes positioning information, and the positioning information is used to locate a storage address of part of the data information in the data object.
  • Step 402B: Acquire the storage content corresponding to the positioning information, and obtain further data information from other storage content according to the positioning information contained in the acquired content, until all data information of the data object is obtained.
  • Step 403B Combine the acquired data information according to the preset merge rule in the acquired data information to obtain a data object.
  • In the data merging method of this embodiment, upon receiving a data object acquisition request carrying identification information, the storage content indicated by the positioning information in the identification information is obtained, and further data information is acquired from other storage content according to the positioning information it contains,
  • until all the data information constituting the data object is acquired.
  • The obtained data information is then merged to recover the complete data object.
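Steps 401B-403B describe a chain: each stored item carries part of the data plus positioning information pointing at the next item. A minimal sketch, with the dict-based "storage" and all names being illustrative assumptions:

```python
# Each stored item holds a piece of the data object and the positioning
# info (address) of the next piece; following the chain gathers all the
# pieces (step 402B), which are then merged in order (step 403B).

STORAGE = {
    "addr1": {"piece": b"Hel", "next": "addr2"},
    "addr2": {"piece": b"lo ", "next": "addr3"},
    "addr3": {"piece": b"world", "next": None},
}

def fetch_object(first_addr, storage):
    pieces, addr = [], first_addr
    while addr is not None:           # follow positioning info
        item = storage[addr]          # acquire the stored content
        pieces.append(item["piece"])
        addr = item["next"]
    return b"".join(pieces)           # merge per the preset rule

print(fetch_object("addr1", STORAGE))   # b'Hello world'
```

In a real deployment each address would point into a different storage body or secure channel, so no single store holds enough positioning information and content to rebuild the object alone.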
  • FIG. 2E is a flowchart of a data merging method according to another exemplary embodiment. As shown in FIG. 2E, the present invention provides a data merging method, including:
  • Step 501B Receive a data object acquisition request carrying the identification information.
  • the identification information includes positioning information, and the positioning information is used to locate a storage address of part of the data information in the data object.
  • the type of data information is one or more of the following combinations: metadata, data fragments, encoding, and encoding order.
  • Step 502B: Acquire the storage content corresponding to the positioning information, and obtain further data information from other storage content according to the positioning information contained in the acquired content, until all data information of the data object is obtained.
  • Step 503B Combine the acquired data information according to the preset merge rule in the acquired data information to obtain a data object.
  • First, one or more pieces of data information are obtained according to the positioning information (a piece of data information may be a split data segment, part or all of the metadata, or part or all of the codes and coding order). Then, according to a specific rule, i.e., the preset merge protocol, corresponding data information is acquired step by step from the information already obtained, and the pieces are combined (metadata and data segments merged, codes arranged in coding order, etc.) to recover the original data object.
  • the specific merger is as follows:
  • A decoding operation is performed according to the merging algorithm in the preset merge protocol to obtain the data segment corresponding to each code; the decoded data segments are then arranged according to the coding order,
  • yielding a data object with the data segments in their original order.
  • If the metadata agreed in the preset merge specification includes attribute information, integrity verification is performed on the data object merged from the data segments according to the attribute information, to confirm that the attributes of the data object match the attribute information in the metadata; or,
  • if the metadata agreed in the preset merge specification includes a data content identifier and a keyword, the data matching the keyword is merged back into the data segment corresponding to the data content identifier, and the data segments are then merged to form the data object; or,
  • if the metadata agreed in the preset merge specification includes attribute information, a data content identifier, and a keyword, the data matching the keyword is merged back into the data content corresponding to the data content identifier, and integrity verification is performed on the merged data object according to the attribute information, to confirm that the attributes of the merged data object match the attribute information in the metadata.
  • Step 504B If the metadata includes a unique identifier of the data object, perform integrity verification on the merged data object according to the unique identifier.
  • The data merging process is essentially the reverse of the data splitting process and operates according to the preset merge specification.
  • The preset merge specification (hereinafter, the merge specification) may coincide with the preset split specification (including the preset metadata stripping protocol, preset data content splitting specification, preset encoding separation specification, etc.);
  • in that case the merge and split protocols are one and the same content.
  • In short, a merge specification is the data information prepared for data recovery; it could also be called a split-merge specification, because it must be ensured that split data can be recovered. The split specification therefore often includes or implies the merge-restoration information.
  • In the mail example, the client can locate the stored content in the file meta-information system, the mail system, the cloud storage, and the like according to the attachment name (i.e., the unique identifier of the data object).
  • The data information includes the split algorithm, the data segments, positioning information, related file metadata items, and so on.
  • The mail system can locate and download the data segments according to the obtained data information, derive the inverse of the split algorithm to merge the data segments and the metadata, and, if codes are present, restore the data segments from the codes to obtain the original user data object content.
  • If the metadata includes a unique identifier of the data object, the file size, file name, file type, creation time, etc. can also be verified from the file metadata.
  • In the mail client example, the information of the split protocol can serve as the merge specification; the specific merge specification, i.e., the inversion process, can be derived from the data split description document.
  • After data splitting, the system retains the appropriate split/stripping protocol and stores the relevant location information (such as storage locations) in the split data segments, or in any storage space designated for access.
  • When merging, the system finds or extracts the corresponding split metadata according to the obtained split/stripping protocol or merge specification, and splices the data segments together based on that protocol, the merge specification, and the metadata, thereby recovering the original data.
  • Performing the decoding operation according to the merging algorithm in the preset merge protocol to obtain the data segments corresponding to the codes includes the following.
  • When splitting, the information of the target data object is divided into three parts: a metadata block, data blocks (i.e., data segments), and an index block (i.e., the codes). Any information dispersal algorithm can be used; for example, the IDA algorithm divides the losslessly compressed content of the source file into four-byte (32-bit) pieces. Note that compression is not required. The divided results are sorted, combined and deduplicated, i.e., duplicates are eliminated, and saved as a data block file containing no duplicates.
  • Each divided data block (data segment) is assigned its index (code) into the data block file, and the indices are saved in the original order as an index file (the codes and their arrangement order).
  • The file names of the data block file and the index file may be hash values (MD5, SHA-1, etc.) of the corresponding file contents, system-generated globally unique identifiers (GUIDs), or any other globally unique codes.
  • the file name, size, date, and other information of the source file, as well as the file name of the data block file and the index file, can be stored in the metabase.
  • When merging, the system can restore the target file through the data merge process: for example, according to the index file, i.e., the codes and their arrangement order, the four-byte pieces at the indexed positions in the data block file are spliced together; the spliced result is decompressed (if previously compressed) to obtain the target file.
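The four-byte split-dedupe-index scheme just described, with its inverse merge, can be sketched as follows. This is an illustrative reading of the example (function names and the metadata dict are my own), not the patent's exact algorithm:

```python
import zlib

BLOCK = 4  # four-byte (32-bit) pieces, as in the IDA example above

def split(data: bytes, compress=True):
    """Split into deduplicated 4-byte blocks plus an index that records
    the original order: metadata block, data blocks, and index block."""
    payload = zlib.compress(data) if compress else data
    pieces = [payload[i:i + BLOCK] for i in range(0, len(payload), BLOCK)]
    blocks = sorted(set(pieces))               # deduplicated "data block file"
    index = [blocks.index(p) for p in pieces]  # "index file": codes in order
    meta = {"compressed": compress, "size": len(data)}
    return meta, blocks, index

def merge(meta, blocks, index):
    """Splice the indexed blocks back in order, decompressing if needed."""
    payload = b"".join(blocks[i] for i in index)
    return zlib.decompress(payload) if meta["compressed"] else payload
```

For uncompressed input like `b"abcdabcdabcd"`, the three identical pieces deduplicate into a single block referenced three times by the index, which is where the storage saving comes from; the metadata, block file, and index file can then be stored in different places, as the mail and desktop-agent examples describe.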
  • A desktop agent can also be established: built on top of the basic cloud-storage desktop agents, it automates the above splitting and merging process and brings convenience to users.
  • Suppose the split-store desktop agent on the user's client runs in the system background alongside agents such as Google Drive and Microsoft's OneDrive.
  • Google Drive has a directory C:\GDrive that automatically syncs with Google's cloud storage;
  • OneDrive has a directory C:\MDrive that automatically syncs with Microsoft's cloud storage.
  • The sync directory corresponding to the split-store desktop agent is C:\DDrive.
  • When a file is saved to C:\DDrive, the desktop agent service detects the file system change, automatically splits the file, saves the data block (data segment) file to C:\GDrive, saves the index file (the codes and their ordering information) to C:\MDrive, and saves the metadata to a proprietary database cloud service.
  • The Google and Microsoft desktop agent services will then automatically sync the block file and the index file to Google's and Microsoft's cloud storage respectively, and on to the user's other terminals.
  • When the split-store desktop agent runs on the corresponding terminal, it detects the changes in the C:\GDrive and C:\MDrive directories, automatically obtains the metadata, merges it with the data block file and the index file into the original file, and saves it in the C:\DDrive directory, thereby achieving synchronized split/merge storage.
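The agent's split-on-save and merge-on-sync behavior can be sketched end to end. Everything here is an assumption for illustration: temporary directories stand in for the C:\GDrive / C:\MDrive sync folders, the returned hash name stands in for the metadata record, and the 4-byte block scheme mirrors the earlier example:

```python
import hashlib, json, pathlib, tempfile

def agent_split(data: bytes, gdrive: pathlib.Path, mdrive: pathlib.Path):
    """Split a saved file: block file -> one provider's sync folder,
    index file -> the other's. Returns the content-hash name, standing
    in for the metadata stored with a third service."""
    pieces = [data[i:i + 4] for i in range(0, len(data), 4)]
    blocks = sorted(set(pieces))               # deduplicated blocks
    index = [blocks.index(p) for p in pieces]  # original order of blocks
    blob = b"".join(blocks)
    name = hashlib.sha1(blob).hexdigest()      # globally unique file name
    (gdrive / name).write_bytes(blob)          # data block file -> "GDrive"
    (mdrive / name).write_text(json.dumps(
        {"index": index, "blen": [len(b) for b in blocks]}))
    return name

def agent_merge(name, gdrive, mdrive) -> bytes:
    """On another terminal: recombine block file + index file."""
    blob = (gdrive / name).read_bytes()
    info = json.loads((mdrive / name).read_text())
    blocks, pos = [], 0
    for n in info["blen"]:                     # re-slice the block file
        blocks.append(blob[pos:pos + n]); pos += n
    return b"".join(blocks[i] for i in info["index"])

with tempfile.TemporaryDirectory() as tmp:
    g = pathlib.Path(tmp, "GDrive"); g.mkdir()
    m = pathlib.Path(tmp, "MDrive"); m.mkdir()
    name = agent_split(b"hello hello hello", g, m)
    assert agent_merge(name, g, m) == b"hello hello hello"
```

Neither sync folder alone suffices: the block file without the index gives unordered fragments, and the index without the blocks gives only positions, matching the security argument made for split storage.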
  • FIG. 2F is a schematic structural diagram of a data splitting apparatus according to an exemplary embodiment.
  • The present invention provides a data splitting apparatus, including: an extracting and stripping module 61B, configured, upon receiving a storage request carrying a to-be-stored data identifier, to obtain the metadata in the data object corresponding to the identifier according to the preset metadata stripping protocol, and to strip the obtained metadata from the data object.
  • the segmentation module 62B is configured to split the data content into at least two data segments according to the preset data content splitting protocol.
  • the storage module 63B is configured to store the metadata and the individual data segments in different storage bodies or in different secure channels.
  • In the data splitting apparatus of this embodiment, upon receiving a storage request carrying a to-be-stored data identifier, the metadata in the corresponding data object is obtained and stripped from
  • the data object; the data content is split into multiple data segments according to the preset data content splitting protocol; and the metadata and each data segment are stored separately in different storage bodies or different secure channels.
  • FIG. 2G is a schematic structural diagram of a data splitting apparatus according to another exemplary embodiment.
  • As shown in FIG. 2G, the extracting and stripping module 61B includes: a receiving submodule 611B,
  • The determining sub-module 612B is configured such that, when the receiving sub-module 611B receives a storage request carrying a to-be-stored data identifier: if the metadata agreed in the preset metadata stripping protocol includes attribute information, the content matching that attribute information in the data object corresponding to the identifier is determined as metadata; or, if the agreed metadata includes a data content identifier and a keyword, the data matching the keyword among the data content of the corresponding data object is determined as metadata according to the data content identifier; or, if the agreed metadata includes attribute information, a data content identifier, and a keyword, the content matching the attribute information in the corresponding data object is determined as metadata, and in addition, according to the data content identifier, the data matching the keyword among the data content is determined as metadata.
  • The stripping sub-module 613B is configured to strip the metadata determined by the determining sub-module 612B from the data object.
  • the stripping module 61B further includes: a parsing sub-module 614B, configured to, when the metadata agreed in the preset metadata stripping protocol includes a data object identifier, parse the data object to generate a data object identifier uniquely corresponding to the data object.
  • the apparatus further includes: an encoding module 64B, configured to perform encoding processing on each data segment according to a preset encoding separation protocol, to obtain a code corresponding to each data segment.
  • the arranging module 65B is configured to arrange the respective codes according to the original order of the data segments in the data content to obtain the coded ordering information.
  • the storage module 63B is specifically configured to store metadata, encoding corresponding to each data segment, and encoding sequence information into different storage bodies or different secure channels.
  • the apparatus further includes: an identifier generating module 66B, configured to generate a unique identifier of the coding order information based on the coding order information, and/or generate a unique identifier of each data segment based on each data segment; the storage module 63B is further configured to store the unique identifier of the coding order information and/or the unique identifiers of the data segments as part of the metadata.
  • the preset data content splitting protocol includes at least one of a RAID (disk array) splitting algorithm and an information dispersal algorithm (IDA).
  • FIG. 2H is a schematic structural diagram of a data merging device according to an exemplary embodiment. As shown in FIG. 2H, the present invention provides a data merging device, including:
  • the receiving module 81B is configured to receive a data object acquisition request that carries the identification information, where the identification information includes positioning information, and the positioning information is used to locate a storage address of the partial data information in the data object.
  • the obtaining module 82B is configured to obtain the storage content corresponding to the positioning information, and then obtain the data information in other storage content according to the positioning information contained in the obtained storage content, until all the data information of the data object is obtained.
  • the processing module 83B is configured to combine the acquired data information according to the preset merge protocol in the acquired data information to obtain a data object.
  • the data merging device of this embodiment receives the data object acquisition request carrying the identification information, obtains the storage content indicated by the positioning information in the identification information, and then obtains the data information in the other storage content according to the positioning information contained in that storage content, until all the data information constituting the data object has been acquired.
  • the obtained data information is combined and processed to obtain a complete data object.
  • FIG. 2I is a schematic structural diagram of a data merging device according to another exemplary embodiment.
  • the data information is one or a combination of the following types: metadata, data segment, encoding, and encoding order information.
  • the processing module 83B includes: a decoding sub-module 831B, configured to perform a decoding operation on each code according to the combining algorithm in the preset merge protocol, to obtain the data segment corresponding to each code.
  • the arranging sub-module 832B is configured to arrange the decoded data segments according to the encoding order to obtain data objects arranged in the original order of the respective data segments.
  • the processing module 83B is specifically configured to: when the metadata agreed in the preset merge protocol includes attribute information, perform an integrity check on the data object merged from the data segments according to the attribute information, to confirm that the attributes of the data object match the attribute information in the metadata;
  • when the metadata agreed in the preset merge protocol includes a data content identifier and a keyword, merge the data matching the keyword into the data segment corresponding to the data content identifier, and then merge the data segments to form the data object;
  • when the metadata agreed in the preset merge protocol includes attribute information, a data content identifier, and a keyword, merge the data matching the keyword into the data content corresponding to the data content identifier, and perform an integrity check on the data object merged from the data segments, to confirm that the attributes of the merged data object match the attribute information in the metadata.
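A correspondingly hedged sketch of the merge side, where decoding is the identity and the integrity check merely compares a length attribute (both are placeholders for whatever the agreed merge protocol specifies):

```python
def merge_segments(metadata, coded_segments, order):
    """Decode each coded segment (identity decoding here), arrange
    the results in the recorded original order, and verify the merged
    object against an attribute carried in the metadata."""
    decoded = {code: data for code, data in coded_segments.items()}
    content = b"".join(decoded[code] for code in order)
    # Integrity check: confirm the merged object's attributes match
    # the attribute information in the metadata.
    if metadata.get("length") != len(content):
        raise ValueError("integrity check failed")
    return content

content = merge_segments({"length": 10},
                         {"c1": b"ABCD", "c2": b"EFGH", "c3": b"IJ"},
                         ["c1", "c2", "c3"])
```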
  • the apparatus further includes: an integrity verification module 84B, configured to include a unique identifier of the data object in the metadata, and perform integrity verification on the merged data object according to the unique identifier.
  • a software/hardware implementation according to the present invention will now be presented with a specific example, in conjunction with the various embodiments of the above splitting and merging methods and apparatus.
  • splitting is primarily about considering how the system distributes data across multiple stores in the system architecture.
  • such systems typically use metadata, encoding, and domain-related splitting of the data content; the application domain can therefore be decomposed naturally, that is, a domain-related splitting method can be used.
  • the data splitting/stripping and merging processes are often built into the system's data access layer and associated with domain-related business logic. Whether the splitting is domain-related or domain-independent, the splitting/stripping methods can vary widely. We therefore introduce the concept of a "data split description language" (which can be used as part of the split/merge protocol) to configure the data splitting process.
  • the system or user can split/strip the data at runtime using a dynamically specified split/strip method.
  • the description of the split/strip method itself (which can be part of the split protocol) can be stored in a particular store as part of the stripped-out metadata. Different data can have different split/strip methods.
  • the merging of data varies from one piece of data to another, and the merging process must be based on an understanding of the split/strip method description.
  • the data split/strip/merge engine is the system component that parses and executes the data split/strip description information to complete the splitting, stripping, and merging of data.
  • At the heart of the data split description language and data split/peel/merge model is the data processor model.
  • a data processor is a software/hardware component that processes data.
  • the splitter implements the splitting function; its counterpart for merging data is called the combiner. Both are data processors.
  • compressors, decompressors, encryptors, decryptors, savers, extractors, etc. are also data processors.
  • the core of a data processor is its processing logic; in addition, it has several input ports (including data input ports and parameter input ports) and several output ports.
  • a data input port corresponds to a data input;
  • an output port corresponds to a data output;
  • a parameter input port corresponds to parameter information needed during data processing.
  • the compressor has one data input port (plus an additional password parameter input port when a compression password is used) and one data output;
  • the splitter has one data input and multiple data outputs;
  • the combiner has multiple data inputs and one data output; the saver has one data input, multiple parameter inputs (corresponding to the storage location, access credentials, etc.), and no output (its processing consists of submitting the input to storage);
  • the extractor has no input and one data output;
  • there is also a very special kind of data processor, the generator, which has no data input (though sometimes parameter inputs) and one or more data outputs; its output data often participates in the overall data processing flow as a parameter of other processors.
  • the distributor has one data input and multiple data outputs, and each output carries the same data as the input.
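The port signatures enumerated above can be summarized in a small table; the counts follow the text, while the tuple representation (and `None` meaning "any number" or "unspecified") is an assumption made for illustration:

```python
# (data_inputs, parameter_inputs, data_outputs) per processor type;
# None means "one or more" / unspecified in the description.
PORT_SIGNATURES = {
    "compressor":  (1, 0, 1),       # plus a password parameter port when a password is used
    "splitter":    (1, 0, None),    # one data input, multiple data outputs
    "combiner":    (None, 0, 1),    # multiple data inputs, one data output
    "saver":       (1, None, 0),    # storage location, access credentials, ...; no output
    "extractor":   (0, 0, 1),       # no input, one data output
    "generator":   (0, None, None), # no data input, sometimes parameter inputs
    "distributor": (1, 0, None),    # each output equals the input
}
```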
  • the output of one processor must be connected to an input of another processor (either a data input or a parameter input).
  • the data generation process of a generator is generally irreversible; in the reverse processing of the system, the generated data can instead be obtained directly or indirectly from storage and from other processors.
  • the data input of a data processor is the data output of its corresponding reverse processor, and the data output is the data input of its reverse processor; the parameter input remains unchanged.
  • the splitter corresponds to the combiner
  • the encryptor corresponds to the decryptor
  • the compressor corresponds to the decompressor
  • the saver corresponds to the extractor
  • the distributor corresponds to itself (the inverse of a distributor involves selecting one of its data input ports), and so on.
  • the whole process of data splitting/stripping/merging is actually implemented by a network of data processors, and its essence can be characterized by the Petri net model.
  • the processing corresponds to a transition, an input port corresponds to a place, and the connection from an output to the next input port corresponds to a directed arc.
  • the directed arc from a data processor's input port to its processing is hidden inside the processor: when all input ports hold data (tokens), the processing is automatically activated and the data flows onward.
  • the aforementioned data split description language is mainly used to describe the assembly flow diagram of the data processor.
  • a document described in a data split description language is called a data split description document.
  • the data flow graph described in a data split description document is itself essentially a data processor; therefore, one data flow graph can be used as a data processor inside another data flow graph.
  • a data split description document actually defines one or more data flow graphs. For a document that is used directly for data splitting, the final entry flow graph needs to be specified.
  • Each data flow graph includes multiple data processors and their connection relationships. The connection relationship is described in the data output port of the data processor.
  • the data flow graph has a specified starting data processor. Data split description documents can be rendered and edited graphically.
  • the data splitting and merging engine splits and merges the data according to the description of the data split description document.
  • the corresponding data splitting process is shown in FIG. 2J: step 1001B, acquiring the metadata of the data object to be separated; step 1002B, creating a data separation description document according to the metadata; step 1003B, reading the data separation description document; step 1004B, instantiating the data separation description document into a data flow graph (instantiating the data processors and establishing the connections between them); step 1005B, passing the data to be separated to the starting data processor of the data flow graph; step 1006B, destroying the data flow graph after execution.
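The steps above can be sketched as a small driver routine; the document format used here (a dict naming processors, their functions, and their downstream connections) is invented purely for illustration and is not the patented description language:

```python
def run_split(description, data):
    """Instantiate a data flow graph from a split description document,
    feed the data to its starting processor, and tear the graph down."""
    # Step 1004B: instantiate processors and wire their connections.
    graph = {name: {"fn": spec["fn"], "to": spec.get("to", [])}
             for name, spec in description["processors"].items()}
    # Step 1005B: pass the data to the starting processor, then let
    # each result flow along the connections to downstream processors.
    results, pending = {}, [(description["start"], data)]
    while pending:
        name, value = pending.pop()
        out = graph[name]["fn"](value)
        results[name] = out
        pending.extend((nxt, out) for nxt in graph[name]["to"])
    graph.clear()  # step 1006B: destroy the flow graph after execution
    return results

doc = {"start": "upper",
       "processors": {"upper": {"fn": str.upper, "to": ["rev"]},
                      "rev": {"fn": lambda s: s[::-1]}}}
out = run_split(doc, "abc")
```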
  • the data splitting and merging engine is mainly responsible for loading the data split description document, instantiating it as an executable data flow graph, and finally passing the data to the flow graph for processing.
  • each data processor is an active object: the instantiated processor object has its own thread/process that continuously checks its execution conditions. Once it finds that all of its input ports have data, it executes automatically and passes the results to the other data processors. After completing these operations, it destroys itself.
  • the flowchart is as shown in FIG. 2K.
  • step 1101B, determining whether data has been transmitted to an input port; if so, performing step 1102B, otherwise performing step 1103B; step 1102B, receiving the input data; step 1103B, determining whether all input ports have data; if an empty input port (usually a parameter port) is found, that is, an input port without any data source, the user is allowed to enter the corresponding information through an interactive interface; if all ports have data, performing step 1104B, otherwise returning to step 1101B; step 1104B, executing the data processing procedure; step 1105B, passing the processing results to the corresponding downstream data processors.
  • the corresponding data merging process is as follows: step 1201B, locating the corresponding data separation description document according to the input information; step 1202B, reading the data separation description document; step 1203B, instantiating the data separation description document into the corresponding reverse data flow graph; and step 1204B.
  • the input information may be a reference code of the data split document, or may be a part of the data content after the split.
  • a hash function (also known as a digest function) can be applied to the document, and the obtained hash value can be used as a reference code for the document. With this code, the corresponding data split document can be retrieved.
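For instance, taking the SHA-1 of the document bytes yields a stable reference code; the choice of SHA-1 here simply follows the concrete example given later in this description:

```python
import hashlib

def reference_code(document_bytes):
    """Derive a reference code for a split description document by
    hashing its content; equal documents yield equal codes, so the
    code can be used to look the document up again."""
    return hashlib.sha1(document_bytes).hexdigest()

code = reference_code(b"split-doc-v1")
same = reference_code(b"split-doc-v1")
```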
  • the data splitting document describes the data splitting process, and the corresponding reverse process needs to be obtained when data is merged.
  • this reversal process actually starts from the final data processor and proceeds by traversing the related data processors backwards along their output ports.
  • the process of reversing the data processor varies by type, but in general, the type is changed to the inverse process type, the data input port becomes the output port, and the output port becomes the data input port.
  • the input parameter port is unchanged.
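Under those rules, reversing a processor is mechanical: swap the data ports, map the type to its inverse, and leave parameter ports alone. The dict-based encoding of a processor below is again an assumption for illustration:

```python
INVERSE_TYPE = {"splitter": "combiner", "encryptor": "decryptor",
                "compressor": "decompressor", "saver": "extractor",
                "distributor": "distributor"}

def reverse_processor(proc):
    """Swap data inputs and outputs, change the type to its inverse
    process type, and leave parameter input ports untouched."""
    return {"type": INVERSE_TYPE[proc["type"]],
            "data_in": proc["data_out"],
            "data_out": proc["data_in"],
            "param_in": proc["param_in"]}

rev = reverse_processor({"type": "splitter",
                         "data_in": ["src"],
                         "data_out": ["blocks", "codes"],
                         "param_in": ["key"]})
```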
  • the data split description language definition is shown in Figure 2M; the data split description language visualization flow chart is shown in Figure 2N; the data split description document sample is shown in Table 1:
  • the specific splitting process is as follows: the data to be split is first DES-encrypted, with the encryption key coming from the system configuration store; the encrypted data is split into block data and encoded data by 4-byte split coding; the encoded data is stored in Amazon S3 cloud storage, and its SHA1 hash value is stored in the metadata database as the key for addressing the corresponding metadata; the block data is stored in a local file whose file name is a GUID generated by the system, and this GUID is also stored as a key in the metadata database.
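That concrete flow can be mimicked end to end in a toy sketch; note the heavy simplifications: an XOR stands in for DES, byte interleaving stands in for the 4-byte split coding, and plain dicts stand in for Amazon S3, the local file system, and the metadata database:

```python
import hashlib, uuid

def split_and_store(data, key, s3, local_files, meta_db):
    """'Encrypt', split into block data and encoded data, then store
    the pieces in different places keyed by SHA-1 hash and GUID."""
    encrypted = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    blocks  = encrypted[0::2]   # toy stand-in for 4-byte split coding
    encoded = encrypted[1::2]
    sha1 = hashlib.sha1(encoded).hexdigest()
    s3[sha1] = encoded                       # "cloud" storage
    guid = str(uuid.uuid4())
    local_files[guid] = blocks               # "local file" storage
    meta_db[sha1] = meta_db[guid] = {"len": len(data)}  # metadata DB keys
    return sha1, guid

s3, files, meta = {}, {}, {}
sha1, guid = split_and_store(b"secret payload", b"k", s3, files, meta)
```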
  • the metadata database related records are shown in Table 2; the split items and metadata mapping tables are shown in Table 3;
  • FIG. 2O illustrates the correlations among the above three concepts and their sub-concepts, together with some specific application examples that can be developed from these concepts. These applications are merely exemplary; practical applications admit many more variations, so the present invention has very broad application prospects.
  • the present invention not only provides a novel handwriting input method and system, but also, by combining the object-based open codec scheme and the object-based data splitting/stripping/merging method of the present invention with traditional data processing systems, constructs a truly open, secure, and efficient data processing system oriented to the future network environment.
  • by way of basic background: the generation and development of the computer have been inseparable from coding technology.
  • coding technology, as a foundation of computing, is widely used in the transmission, storage, and processing of data, and its importance is self-evident.
  • the rise of cloud computing, big data, and the Internet of things are poised to bring new opportunities and challenges to coding technology.
  • the content encoding is a method of digitizing or converting the content of the encoding object.
  • Base64 encoding, the various data compression encodings (lossless, lossy, etc.), image encodings (JPEG, SVG, etc.), and video and audio encodings (PCM, MP3, MP4, etc.) all fall into the category of content encoding.
  • the digitized content of the data itself is directly included in the results of the content encoding and can be analyzed and processed by the computer.
  • structured encoding is a technique for describing the structural information of data; it mainly encodes structured data/document content. For example, HTML, MathML, and SVG are specific structured description languages, and the corresponding coding specification is the meta-language XML. Similar coding specifications include JSON, Protocol Buffers, etc.
  • the result of a reference encoding process is not the data content itself, but a reference to the content or a description of the addressing path of the access object.
  • Huffman coding establishes an optimized reference encoding for source symbols (the content itself). URLs, IP addresses, RFID, barcodes, QR codes, ISBNs, postal codes, etc. are all reference codes.
  • text encoding (especially standard encoding) is essentially reference encoding: each code corresponds to a specific text position in the text encoding scheme. The data of the text itself, such as its sound, shape, and meaning, is reflected only in the coding specification.
  • a computer program can directly process the encoding without encoding the corresponding content (or the corresponding content has been built into the computer program).
  • standardized coding systems such as ASCII and Unicode.
  • such encodings, and combinations of them, themselves already constitute a higher level of data content.
  • Standardized text encoding is such a typical example. Many of today's text-based coding conventions (such as JSON, CSV, XML, etc.) are based on this.
  • OMG, a non-profit standardization organization in the computer field, has successfully defined a set of languages and standards for object modeling.
  • OMG divides models into four levels of abstraction: the meta-meta-model layer (M3), the meta-model layer (M2), the model layer (M1), and the runtime data object layer (M0).
  • the meta-meta-model layer contains the elements needed to define a modeling language;
  • the meta-model layer defines the structure and syntax of a modeling language, which can be specifically mapped to UML (Unified Modeling Language) or object-based programming languages such as Java, C#, etc.;
  • the model layer defines a specific system model, specifically the class or object model we often say;
  • the runtime layer contains the state of model objects at runtime, that is, the objects or instances we commonly speak of.
  • FIG. 3 is a schematic diagram of a meta model in the prior art.
  • a Meta-Object Facility (MOF) is a standardized specification for establishing a metamodel (M2) defined by the OMG.
  • MOF includes a metamodeling language (M3 model) and methods for creating, manipulating models, and metamodels.
  • the object model has multiple levels, static models that represent structure and functionality, and dynamic models that describe runtime behavior.
  • the main focus of this paper is on static models related to coding, including data and interfaces.
  • the object's identifier is actually a reference encoding.
  • the identifier must be unique and in one-to-one correspondence with the object. In this way, the system can locate the corresponding object by addressing with the identifier.
  • in many cases, object reference encodings and object identifiers are the same concept, because their usage goals are consistent.
  • however, a reference code cannot always be used as an object identifier.
  • a reference code only guarantees that the target can be correctly addressed; it does not necessarily guarantee a one-to-one correspondence with the object.
  • there may be a many-to-one situation (one object, multiple encodings). For example, a host can have multiple IP addresses, and the same website can have multiple URLs.
  • reflection refers to the capability of a class of applications to be self-describing and self-controlling. That is, such applications use some mechanism to represent and examine their own behavior, and can adjust or modify the state and related semantics of that behavior according to its status and results.
  • Reflection technology has been supported by modern software development platforms, tools, and programming languages. For example, you can use reflection to get metadata directly from running objects in Java and .Net platforms at runtime.
  • FIG. 4 is a schematic diagram of the architecture of the encoding system of the present invention.
  • the encoding system is mainly divided into three parts: a client, an encoding server, and a data storage end. The encoding server and the data storage end together constitute an encoding warehouse.
  • the client can obtain a corresponding data object by sending an encoding to the encoding warehouse; and sending the new data object to the encoding warehouse, the corresponding encoding can be obtained.
  • the encoding server provides services to the client.
  • An encoding repository can include one or more data stores in which real data is stored.
  • the encoding server can send data queries to the data storage terminal to obtain, update, and insert related data.
  • the code repository provides a centralized encoding service that allows different clients to share data objects and encode meta-objects by reference encoding. Further, a variety of different systems can register new coded meta-objects with the code repository to meet a variety of different coding requirements. This centralized coding service makes data integration and exchange of various systems easier.
  • the code repository has a built-in data access control system that provides different access rights for different data objects and coded meta objects.
  • the encoded meta-objects and data objects can be stored on different data storage ends, and/or be set with different data access rights.
  • the encoded meta information is stored in an encoding repository, and the data object itself may exist in the encoding stream (content encoding) or the storage system of the encoding repository.
  • the reference code of the data object exists in the encoded stream.
  • the data objects in the code stream and the code repository can be placed in different secure channels. The separation of this information has natural security on the one hand and better coding efficiency on the other hand.
  • the data storage end can be implemented by using different storage systems such as file storage, relational database, NoSQL database, and cloud storage.
  • the present invention proposes a new object-based coding and decoding scheme and system, and is also an open solution.
  • object-based open coding schemes can be completely personal and non-standard.
  • "non-standard" here means differing from the traditional standards developed and reused by standards organizations; in essence, it is based on the de facto standard (the coding protocol) of the coding warehouse.
  • This solution not only provides more flexible and diverse data services, but also provides more reliable security for data.
  • the coding scheme of the present invention can encode data of any type and any length, can have any coding format and arbitrary coding word length, and the coding rules can be not fixed, that is, the coding rules can be randomly changed as needed. This makes it possible to create fully personalized coding.
  • the coding scheme of the present invention is an encoding scheme that can encode an arbitrary object and is independent of the length of the object data, the encoding rule, and the length of the encoded word. This greatly breaks through the inherent form and limitations of existing standard coding. This coding scheme can be arbitrarily expanded. The same code can also be reused in different encoding processes without affecting each other, thus greatly improving the utilization of the code.
  • the concept of the coding scheme of the present invention consists in creating an encoding protocol for the data object based on the metadata of the data object and generating the encoding according to the encoding specification.
  • the present invention can acquire the features or structures of the data objects in an encoded manner and generate corresponding codes for the data objects in accordance with the features and/or structures of the encoded objects.
  • in traditional data transmission, any party involved in the transmission, as well as the receiving and storing parties, has the opportunity to obtain all the information in the data.
  • this is not conducive to the confidentiality of the data; it also makes the transmitted data volume large, increasing network bandwidth usage and the CPU processing burden, especially for large-scale data transmission, and thereby reducing data transmission efficiency.
  • Another feature of the present invention is that only the data objects that need to be transmitted are stored in the code repository, and the corresponding data access rights are set to obtain the corresponding reference code.
  • the reference code of the data object can be exported, and only the receiver that has the data access right can get the complete data. This can greatly reduce the amount of data transferred, while increasing the security and reliability of the data.
  • the traditional encryption process does not require any metadata; an encryption algorithm alone converts the original data into content that cannot be normally recognized or displayed.
  • although the invention can also achieve the effect of encryption, it achieves data protection in a completely different way. Specifically, the data content is protected in a code-isolated manner by means of the metadata of the data object.
  • encrypted ciphertext is often the same size as, or larger than, the original plaintext, whereas the present invention only needs to transmit a very small amount of information, such as the corresponding reference code.
  • more useful functions and operating space are thus provided for data processing. For example, without limitation, it can reduce data transmission and network load, and the flexibility of the coding also provides greater convenience for subsequent data processing.
  • encryption needs to convert the original data, by a predetermined rule or algorithm, into a code or data completely different from the original, so that it cannot easily be identified by a third party.
  • the present invention can completely preserve the original form of the data content, and can also realize the security and confidentiality of the data without any modification to the content, which is not possible by the conventional encryption system.
  • the open system of the present invention can assign different encodings to each data segment in the encoding process, and can also set different access rights for different users. This allows for more granular security.
  • a standard character becomes a special object (an object whose encoding metadata is built in), and an object reference encoding becomes a special character: a non-standard character.
  • the present invention can directly accept the digitized result of a human's natural output, divide it into different data objects according to certain rules, and place them in an encoding warehouse to form non-standard characters (in this document, a non-standard character is an object reference encoding based on the encoding warehouse, with the emphasis that the data object is a piece of data obtained by splitting the digitized result of human output).
  • the present invention can establish a proprietary font for a writer by assigning a custom unique code or codes to all, or fragments, of the digitized result of each human individual's natural output.
  • the user can input or add to his or her own font at any time, thereby eliminating the trouble of having to input reference font information in advance, as required in Chinese Patent CN103136769A.
  • the invention can also place object reference encodings in different coding spaces: for example, user coding spaces divided by user, so that different users can use the same reference code to correspond to different data objects in the coding warehouse; coding spaces divided by date; coding spaces divided by geographic location; coding spaces divided by department; or coding spaces divided by online session.
  • a coding space divided by session has a very strong security property: the reference codes of the data exist only in the coding space corresponding to the session, so when the session ends, the coding space disappears and none of the codes in that space can be decoded correctly any longer. With this feature, a "burn after reading" effect can be achieved.
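The "burn after reading" behavior of a session-scoped coding space can be illustrated minimally; the in-memory dicts are, of course, placeholders for the encoding warehouse:

```python
class SessionCodingSpaces:
    """Each session gets its own coding space; ending the session
    destroys the space, so its codes can no longer be decoded."""
    def __init__(self):
        self.spaces = {}

    def encode(self, session, code, data_object):
        self.spaces.setdefault(session, {})[code] = data_object

    def decode(self, session, code):
        space = self.spaces.get(session)
        return space.get(code) if space else None

    def end_session(self, session):
        self.spaces.pop(session, None)  # the whole space disappears

spaces = SessionCodingSpaces()
spaces.encode("s1", "c1", "hello")
before = spaces.decode("s1", "c1")
spaces.end_session("s1")
after = spaces.decode("s1", "c1")
```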
  • introducing the coding space and adopting variable length coding can greatly reduce the storage consumption of the reference code and improve the efficiency of transmission, processing and storage.
  • the new data processing system introduces the concept of an encoding repository.
  • the application can not only query and use the encoding meta-objects already in the encoding repository, but also register and use new encoding meta-objects.
  • the new system breaks through the limitations of existing systems from four different levels.
  • text encoding is non-standardized. Text encodings and the corresponding decoding information are stored in the application system and in the code repository, respectively.
  • the code repository can support different levels of code isolation for users, applications, and content. Therefore, we can authorize the access and use of text content through the access control management of the code repository. In other words, the new data processing system has built-in security.
  • open coding allows us to completely break through these limitations.
  • the corresponding text parser can distinguish which text is the mark and which is the content according to the encoded metadata.
  • anything that can be serially encoded can be stored and encoded by the system, such as music melody, dance action, game data, video subtitles and even computer instructions.
  • the stored results are divided into two parts, one is the data object in the encoding warehouse, which can be multimedia data, or proprietary data, and the other part is the encoded code sequence.
  • the reference encoding of such data objects is not unique to the system.
  • traditional data processing systems based on standardized encoding can also encode arbitrary data, but far less simply, efficiently, and naturally than an object-based coding system.
  • the object coding in the object-based coding system may include a meta-encoding part and an instance-encoding part.
  • the number of meta-codes is very limited. For example, two bytes (16 bits) can encode more than 60,000 meta-codes, which can correspond to more than 60,000 object types; this is enough for most applications.
  • Owing to the arbitrariness of object encoding, for a specific object we can directly use a number as its instance code. For example, 4 bytes (32 bits) can encode more than 4 billion object instances, and since reference codes can be placed in different encoding spaces, 32 bits is sufficient for most systems. That is, 6 bytes can represent the reference encoding of an object in most applications.
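  • The 6-byte reference encoding described above (a 2-byte meta-code plus a 4-byte instance code) can be sketched as follows. This is a minimal Python illustration; the big-endian byte order and the function names are assumptions for illustration, not part of the specification:

```python
import struct

def pack_reference(meta_code: int, instance_code: int) -> bytes:
    """Pack a 16-bit meta-code and a 32-bit instance code into a
    6-byte object reference encoding (big-endian, no padding)."""
    return struct.pack(">HI", meta_code, instance_code)

def unpack_reference(ref: bytes) -> tuple:
    """Inverse of pack_reference: recover (meta_code, instance_code)."""
    return struct.unpack(">HI", ref)

ref = pack_reference(0x0041, 1_000_000)
assert len(ref) == 6
assert unpack_reference(ref) == (0x0041, 1_000_000)
```

  A decoder splits the same 6 bytes back into the meta-code (identifying the object type) and the instance code (locating the object in its coding space).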
  • With variable-length encoding we can often express an object reference encoding in even fewer bytes, by setting a default meta-encoding, using client-side encoding, and so on. By contrast, cloud storage systems must use a dozen or even dozens of bytes to reference a data block in order to prevent data-block conflicts; the present scheme is much simpler and more effective.
  • In the new data processing system we can store the data object corresponding to an object reference encoding in the encoding warehouse, which can greatly improve the storage efficiency of data objects and thereby improve data transmission and processing efficiency.
  • If the HTML of a webpage is re-encoded using the object encoding technique, with the elements and attributes of the standard HTML tags encoded and the relevant meta-information put into the encoding repository, the size of the resulting webpage document is greatly reduced, which saves traffic in the network transmission of web pages.
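  • The idea of re-encoding HTML tags can be sketched as follows. This is purely illustrative: the tiny dictionary stands in for encoding metadata registered in the repository, and the specific byte values are assumptions, not any real mapping:

```python
# Illustrative only: map a few standard HTML tag names to 1-byte codes.
# In the described system this mapping would live in the encoding
# repository as encoding metadata rather than in the document itself.
TAG_CODES = {"html": 0x01, "head": 0x02, "body": 0x03, "div": 0x04, "p": 0x05}
CODE_TAGS = {v: k for k, v in TAG_CODES.items()}

def encode_tag(tag: str) -> int:
    return TAG_CODES[tag]

def decode_tag(code: int) -> str:
    return CODE_TAGS[code]

# "<div>" costs 5 bytes as text but only 1 byte as a tag code:
assert len("<div>".encode()) == 5
assert len(bytes([encode_tag("div")])) == 1
assert decode_tag(encode_tag("div")) == "div"
```

  The size reduction comes from moving the shared vocabulary (tag and attribute names) out of every document and into the repository once.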
  • The encoding scheme used by an object-based data processing system can be personalized and non-standard. This is achieved mainly through the isolation of context coding spaces: different users and different applications have their own context coding space, and personalized encodings are reached by accessing a personalized context coding space.
  • Each object reference code has a one-to-one correspondence with the data objects in the encoding repository.
  • the data processing system can dynamically add data object types and their encodings.
  • the system automatically stores the input to the encoding repository as it is entered and encodes the location of the content in the encoding repository.
  • the output process is based on the object reference code, the input content is taken from the code repository, and it is played back naturally.
  • The writer writes under a natural writing constraint (such as a row constraint or column constraint), and the system divides the written content according to natural word segmentation (such as Chinese character segmentation).
  • a natural writing constraint such as row constraint or column constraint
  • natural word segmentation, such as Chinese character segmentation
  • the division of words, such as word segmentation in phonetic languages
  • The shape of each split character or word is stored in the code warehouse, and its corresponding reference code is generated.
  • These encodings are stored as textual content, i.e., a collection of textual encodings in a specific typographical order.
  • The above handwritten text input process sits between recognition-based handwriting input and non-recognition handwriting input. Like a text recognition system, this process requires the division of characters and words; but unlike one, there is no need to determine the standard code corresponding to the input: "what you input is what you get." This method therefore has no recognition-rate problem; the rate is always 100%, the same as in a non-recognition system. The difference is that the process divides the input content and encodes each piece separately. This allows the encoded results to undergo ordinary word processing in the new system, such as editing, copying, pasting, transferring, searching, and retrieval, just like ordinary text.
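  • This "what you input is what you get" storage step can be sketched as follows. The class and its methods are illustrative assumptions; the point is that segments are stored unrecognized and the returned codes behave like ordinary text for editing operations:

```python
class EncodingRepository:
    """Minimal sketch: stores raw segment data (e.g. a handwritten
    word's stroke image) and hands back a reference code. No character
    recognition is performed, so the stored form is exactly the input."""
    def __init__(self):
        self._objects = {}
        self._next = 0

    def store(self, segment_bytes: bytes) -> int:
        code = self._next
        self._objects[code] = segment_bytes
        self._next += 1
        return code

    def fetch(self, code: int) -> bytes:
        return self._objects[code]

repo = EncodingRepository()
# Two handwritten "words" produced by the segmentation step:
text = [repo.store(b"strokes-of-word-1"), repo.store(b"strokes-of-word-2")]
copied = text + [text[0]]          # word processing on codes: copy/paste
assert repo.fetch(copied[-1]) == b"strokes-of-word-1"
```

  Editing, copying, and searching then operate on the code sequence, while playback fetches the original strokes from the repository.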
  • Data processing systems based on open coding can also be used in optical-recognition-based input systems. In particular, for handwriting input it no longer matters whether the handwriting is scribbled.
  • An optical recognition system based on open coding only needs to segment the input image, store the segments in the code warehouse, and generate the corresponding image object reference encodings. It is worth mentioning that, due to the personalized characteristics of the encoding, the data objects accumulated in the code repository can serve as good training samples; the results of such training can in turn increase the conventional text recognition rate for that particular individual.
  • the data processing system is also applicable to a voice input system.
  • the input sound signal does not need to be identified, and only needs to be simply processed and divided, and can be stored in the code warehouse and encoded accordingly.
  • the data processing system can also be applied to other text input methods, such as Braille, lip language, sign language, and semaphore input.
  • New text input methods can be created based on this new data processing system. For example, on a small touch-screen device, specific gestures can be designed as line breaks, word separators, and end markers, and input can then proceed in full-screen handwriting or by voice. The input content is divided according to the word segmentation, stored in the code warehouse, and the corresponding text codes are obtained.
  • a 3D glove-based sign language input method can be designed. The motion information of the 3D glove is stored as a text content in the code repository, and the code corresponds to the character, and a certain time interval is used as a separation of the actions.
  • the output of the sign language is to play back the 3D glove motion information in the code warehouse through the 3D model.
  • the new data processing system has the following advantages:
  • the first aspect: simple and natural
  • the new data processing system does not require the generation of specific standard encodings, so the simplest and most natural input method can be designed for the average user to directly encode the result into a personalized encoding.
  • the user can input any content he wants to express, including graphics, symbols, sounds, videos and other multimedia data.
  • the text output in the new data processing system does not need to be recognized, which ensures uninterrupted and efficient input. A smooth and natural user input experience is guaranteed.
  • the new data processing system is a non-standardized object-based reference encoding. People can't understand the content from the text coding sequence, and they need to get the specific content information of the code from the code repository.
  • the access control of the code repository ensures the security of the data content.
  • The code repository is essentially a full-featured cryptographic server. Furthermore, the code sequence and the data in the code repository can be placed in different secure channels, which greatly increases the difficulty for a data thief to obtain all of the data.
  • non-standard text based on object encoding can be context-sensitive text.
  • the same encoding can vary from person to person, from application to application, from document to document, from time to time, from location to location, and so on.
  • the application system, and even the individual user can register a new context specification with the code repository, thereby introducing a new coding space to further isolate the text code.
  • the new system has natural security and privacy.
  • The authorized access service of the encoding warehouse can control these special encodings so as to achieve encryption of specific text encodings under specific conditions.
  • The specific conditions here may be rules based on context (time, place, environment, user, application, etc.), achieving complex and flexible text encoding security.
  • The encoding repository can also provide identity authentication and digital copyright protection for users or systems.
  • the third aspect: openness
  • An object-based coded data processing system is a fully open system. Any data object can be placed in the code repository, and its reference code can be recorded in non-standard text.
  • Software developers can register new context object specifications, new encoding spaces, new encoding meta-objects, and new data objects, or add new encoding services to the system, including new non-standard text services (such as new non-standard text input and output, and non-standard text editing).
  • the new data processing system divides, splits, and encodes the same content. In this process, the system can directly filter out useless information, and only retain important information that people pay attention to, such as filtering out noise in the audio, scanning noise points in the text, and so on. Moreover, through the content normalization service, the duplicate content does not need to be repeatedly stored, which greatly reduces the storage space and improves the transmission speed. More importantly, we can use the existing word processing infrastructure and tools to process and process the text-encoded content formed in the new data processing system, such as searching, indexing, editing, and so on.
  • the flexibility of coding deployment means that for the same encoding type, we can selectively configure it into different encoding spaces, thus having different security levels and visibility.
  • The flexibility of access control means that the user or the administrator of the application system can configure access to object codes very flexibly through the access control settings of the code repository. On the one hand, access control can be configured at different coding levels: the coding space, the encoding metadata, or even specific data objects. On the other hand, access control for encodings can be based on different conditions, such as time, location, user, application, and the state of the domain model.
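  • The two dimensions of this access control (coding level and contextual condition) can be sketched as follows. This is a minimal Python illustration; the rule representation and all names are assumptions, not the claimed mechanism:

```python
class AccessControl:
    """Sketch of repository access control: each rule targets a coding
    level ("space", "metadata", or "object") plus a key at that level,
    and may carry a contextual condition (user, time, location, ...)."""
    def __init__(self):
        self._rules = []   # list of (level, key, condition) tuples

    def allow(self, level, key, condition=lambda ctx: True):
        self._rules.append((level, key, condition))

    def check(self, level, key, ctx) -> bool:
        return any(lv == level and k == key and cond(ctx)
                   for lv, k, cond in self._rules)

acl = AccessControl()
# Grant access to the "personal" coding space only for user "alice":
acl.allow("space", "personal", lambda ctx: ctx["user"] == "alice")
assert acl.check("space", "personal", {"user": "alice"})
assert not acl.check("space", "personal", {"user": "bob"})
```

  Rules at the metadata or object level, and conditions on time or location, would follow the same pattern.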
  • the data object encoding and the split storage of content in the new data processing system ensure efficient storage and transmission.
  • the content of the data object needs to be transferred from the encoding repository to the consumer only when it is really needed.
  • the unidentified data object content formed in the new data processing system can be a good personalized identification training sample.
  • the trained text recognition system can more effectively identify personalized non-standard text into corresponding standard codes.
  • the format information of the text can be stored in the code repository.
  • Text format characters use non-standard encoding
  • text data can use standard characters arbitrarily without escaping, which will bring efficient text data transmission and processing.
  • the new data processing system mainly has the following aspects:
  • the first aspect is conducive to the popularity and depth of personal computing.
  • The new data processing system makes text input methods close to natural, traditional writing accessible, solving many people's problem that "computer input is difficult".
  • a safe, natural data processing system is more acceptable to ordinary people.
  • Such computer text input is no longer tied to an individual's cultural background or familiarity with the keyboard, which is conducive to the popularity and depth of personal computing.
  • the second aspect is conducive to the popularity and depth of cloud computing.
  • the third aspect is conducive to the development and popularization of the Internet of Things.
  • The Internet of Things combines intelligent sensing technology, recognition technology, and pervasive computing technology, and has been called the third wave of information industry development after the computer and the Internet.
  • the Internet of Things is an extension of the Internet.
  • the Internet of Things has an urgent need for object addressing coding/identification at the three levels of the sensing layer, the network layer, and the application layer.
  • The number of nodes is huge, their variety is wide, and their processing capability is limited, which poses a huge challenge; a common standard has not yet been formed.
  • a simple and flexible object coding mechanism can well meet these needs.
  • the fourth aspect is conducive to cultural protection and inheritance
  • The keyboard input of existing computer text has left many people able to recognize a character yet unable to write it by hand ("picking up the pen and forgetting the character").
  • the new data processing system maintains the original writing tradition of humans.
  • the fifth aspect is conducive to environmental protection
  • The new data processing system makes the direct input and use of text on electronic devices more natural, convenient, and secure, which is conducive to a paperless environment and ultimately saves paper.
  • FIG. 5C is a flowchart of Embodiment 1 of an encoding processing method provided by the present invention.
  • an execution body of the method in this embodiment is an encoding system, and the method includes:
  • Step 101C Acquire a data object to be encoded and its metadata according to the received encoding processing request.
  • the metadata of the acquired object is mainly the encoded metadata of the acquired object.
  • the encoded metadata can be a subset or a complete set of metadata. For example, but not limited to, the type of object, the corresponding data structure, constraints on storage and transmission, control, and the like.
  • the metadata of the object is the basis of the system and must be extracted from the data in some way.
  • the object's metadata can be automatically obtained using modern software platforms such as reflection mechanisms in Java, .Net, etc.
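  • The text names Java and .Net reflection as examples; the same runtime metadata acquisition can be sketched with Python's introspection as an analogue. The `Stroke` class and the selected fields are illustrative assumptions:

```python
def extract_metadata(obj) -> dict:
    """Obtain simple metadata for an object at runtime via Python
    introspection (an analogue of the Java/.Net reflection mechanisms
    mentioned in the text)."""
    t = type(obj)
    return {
        "type": t.__name__,
        "module": t.__module__,
        "attributes": sorted(vars(obj)) if hasattr(obj, "__dict__") else [],
    }

class Stroke:
    """Hypothetical data object: one handwritten stroke."""
    def __init__(self):
        self.points = []
        self.width = 1

meta = extract_metadata(Stroke())
assert meta["type"] == "Stroke"
assert meta["attributes"] == ["points", "width"]
```

  A real system would extract richer metadata (data structure, storage and transmission constraints, encoding constraints), but the runtime mechanism is the same.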
  • The data object (also referred to herein as an object) is the basic object of data processing in the present invention, that is, the target object to be encoded. It can be data in any form: a single word or symbol or a part of one; an audio, video, or multimedia stream or a fragment thereof; or an encoding itself or a document. It includes at least the metadata portion (or metadata) of the data object, and usually also the content data portion, which is the remainder of the data object after the metadata is stripped away, also called the content of the object, the data content, or the content data. The content data can be related or unrelated to the metadata portion.
  • Metadata is data about data objects, and is a description of the characteristics, attributes, intrinsic logical relationships, and/or structures of data objects. Metadata can appear inside, outside the data, along with the data, or with the data. Metadata may include such things as the type of object, creation and or modification dates, historical version information, data structures, interfaces, storage constraints, transmission constraints, encoding constraints, encoding context constraints, and the like.
  • Specific metadata examples may include, but are not limited to: a description of the assembly; identification (name, version, culture, public key); exported types; other assemblies referenced by the assembly; security permissions; descriptions of types; name, visibility, base class, and implemented interfaces; members (methods, fields, properties, events, nested types); attributes; other descriptive elements that modify types and members; header and/or table structure information for tables; palettes in drawing files; and more.
  • Metadata is different for different data objects. For example, for the metadata portion of the data object we call it the metadata of the data object; for the metadata portion of the encoding object mentioned later we can call it the encoding metadata.
  • the ability to acquire or add metadata corresponding to a data object at runtime is the basis for the system to encode data objects.
  • Step 102C Acquire an object code of the data object according to the encoding warehouse and the data object and metadata thereof.
  • the data object to be encoded and its metadata are obtained according to the received encoding processing request, and the object encoding of the data object is obtained according to the encoding warehouse and the data object and its metadata, because the data object can be obtained according to the data object.
  • Metadata and encoding repositories to encode data objects thus enabling flexible and diverse encoding.
  • FIG. 5D is a flowchart of a specific implementation manner of step 102C in FIG. 5C.
  • a specific implementation manner of step 102C is as follows:
  • Step 102C1 Select or create an encoding protocol according to the encoding repository and at least a portion of the metadata, and generate a meta encoding corresponding to the metadata according to the encoding specification.
  • metadata related to the subsequent encoding process may be further selected from the metadata, and then the corresponding encoding specification may be created or generated based on the selected metadata.
  • an encoding specification is selected or created, and the encoding specification is saved.
  • The encoding protocol will be utilized to generate the corresponding encoding. A default encoding protocol can also be set for the system to perform the corresponding encoding and decoding; in that case one only needs to select an existing protocol rather than create a new one. Some or all of the coding conventions can be selected or created by the user interactively. It is worth mentioning that an encoding protocol generated during the encoding process can be automatically destroyed after the encoding process is completed (after the encoding factory finishes), or it can be saved.
  • the process of adding or creating a coding specification can be done while the object is being modeled; it can also be done while the specific application is running. It can be done automatically by certain rules or by interaction.
  • the coding protocol mainly includes the coding mode of the object, and the coding constraints of the internal structure of the object.
  • Step 102C2 Encode the data content of the data object according to the coding protocol to obtain an instance code, and acquire the object code corresponding to the data object according to the meta code and the instance code.
  • the object coding is a reference coding form or a content coding form.
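  • The difference between the two forms can be sketched as follows. This is a minimal Python illustration: the in-memory `REPO` dictionary, the JSON serialization, and the 2+4-byte layout are all assumptions for illustration:

```python
import json

REPO = {}   # stands in for the encoding warehouse

def content_encode(meta_code: int, content: dict) -> bytes:
    """Content-encoding form: the serialized content travels inside
    the object code itself."""
    body = json.dumps(content, sort_keys=True).encode()
    return meta_code.to_bytes(2, "big") + body

def reference_encode(meta_code: int, content: dict) -> bytes:
    """Reference-encoding form: the content is stored in the warehouse
    and only its instance code (location) travels in the object code."""
    instance = len(REPO)
    REPO[instance] = content
    return meta_code.to_bytes(2, "big") + instance.to_bytes(4, "big")

ref = reference_encode(7, {"glyph": "stroke-data"})
assert len(ref) == 6               # compact: meta-code + instance code only
assert REPO[int.from_bytes(ref[2:], "big")] == {"glyph": "stroke-data"}
```

  Reference encoding keeps the code sequence short and moves the bulk of the data into the warehouse; content encoding keeps everything self-contained at the cost of size.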
  • The encoding system mainly includes an encoding warehouse and a client, and the encoding processing flow can have two implementation manners, the specific details of which are as follows:
  • Step 1a The client acquires the data object to be encoded and its metadata according to the received encoding processing request.
  • Step 2a The client sends the data object to be encoded and its metadata to the code repository.
  • Step 3a The encoding repository selects or creates an encoding specification according to at least a part of the metadata, and generates a meta encoding corresponding to the metadata according to the encoding specification.
  • The object coding protocol (which may be referred to simply as the encoding protocol) refers to the specification of and constraints on how a data object is encoded and decoded. It can include the encoding form of data objects (content encoding, reference encoding, or a mixture of both), encoding constraints for object metadata (such as schemes for data serialization, word length, endianness, data alignment, etc.), and so on.
  • the object encoding protocol can also be used as part of the metadata of the data object.
  • Object encoding conventions can be added manually (through the modeler) or automatically (via the tool) when the object is modeled, or interactively (by the user) or automatically (via system policy) at runtime.
  • Encoding metadata refers to metadata associated with a data object codec.
  • the encoded metadata can be part or all of the metadata.
  • the encoding metadata of the data object is the basis for the system to encode and decode the data object.
  • Step 4a The code repository encodes the data content of the data object according to the coding protocol, obtains an instance code, and acquires an object code corresponding to the data object according to the meta code and the instance code.
  • the data object and its metadata are stored in an encoding repository.
  • the corresponding object code generated by the code repository is actually the reference code of the data object in the code repository.
  • Step 5a The client receives the object code returned by the encoding warehouse.
  • the second implementation is:
  • Step 1b The client obtains the data object to be encoded and its metadata according to the received encoding processing request.
  • Step 2b The client queries the encoding warehouse to select or create an encoding specification according to at least a part of the metadata, and generates a meta encoding corresponding to the metadata according to the encoding specification.
  • the client proposes an encoding process request to the encoding server in the encoding repository to obtain a meta-encoding corresponding to the encoding meta-object (actually a reference encoding of the encoding meta-object in the encoding repository).
  • the meta-encoding may include one or a combination and/or nesting of: type coding, spatial coding, and context coding.
  • Step 3b The client encodes the data content of the data object according to the coding protocol, obtains an instance code, and obtains an object code corresponding to the data object according to the meta code and the instance code.
  • The generation of the instance encoding is correspondingly divided into two cases: for an instance encoding of the content-encoding form, the encoding client directly serializes the content of the data object into an instance code according to the coding convention.
  • the encoding client sends an encoding request to the encoding server; the encoding server obtains the corresponding data object and the encoding specification and related information according to the request, and stores the data object in the encoding warehouse according to the encoding specification and related information; Generate the corresponding instance code and return it to the client.
  • the decoding process of the object encoding is the inverse of the encoding process.
  • the encoding server obtains the object code to be decoded according to the decoding processing request of the encoding client.
  • the data object in the encoding repository is located according to the encoding and returned to the client.
  • Decoding the obtained object encoding involves multiple steps.
  • the encoding client parses the object encoding into a meta-code and an instance code according to a preset rule.
  • a metacoded decoding request is sent to the encoding server.
  • the decoding process of the above example encoding is also divided into two types: for the content encoding form, the encoding client can directly decode the instance code into corresponding data according to the encoding protocol. Object content.
  • the encoding client issues an instance encoding and decoding request to the encoding server; the encoding server obtains the corresponding instance encoding and encoding protocol and related information according to the request, and locates the data object in the encoding warehouse, and Return to the client.
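  • The decoding direction described above can be sketched as the inverse of the 6-byte encoding: split the object code into meta-code and instance code by the preset rule, then locate the data object in the warehouse. The byte layout and the lookup key are illustrative assumptions:

```python
# Stands in for the encoding warehouse: (meta_code, instance_code) -> object
REPO = {(2, 1): b"word-image-bytes"}

def decode(object_code: bytes) -> bytes:
    """Decode as the inverse of encoding: parse the object code into a
    meta-code (first 2 bytes) and an instance code (remaining 4 bytes),
    then locate the corresponding data object in the repository."""
    meta = int.from_bytes(object_code[:2], "big")
    instance = int.from_bytes(object_code[2:], "big")
    return REPO[(meta, instance)]

code = (2).to_bytes(2, "big") + (1).to_bytes(4, "big")
assert decode(code) == b"word-image-bytes"
```

  In the server-mediated variant, the lookup step would be a request to the encoding server rather than a local dictionary access.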
  • the system first acquires the encoded metadata; and then obtains the corresponding content encoding according to the metadata.
  • the encoding metadata may include encoding type information for locating, loading, or transmitting the encoded content, and constraint information for the target encoding space to which the encoding belongs.
  • the encoding metadata is encoded to obtain a meta-encoding.
  • the encoded content of the meta-encoding in the encoding repository is mainly the encoding meta-object.
  • Meta-encoding is generally an integral part of encoding. After the decoder parses the meta-encoding from the encoding, the corresponding encoding metadata can be obtained according to a certain mechanism.
  • The encoding metadata can itself be treated as a data object, that is, a data object whose content is encoding metadata, which may be referred to as an encoding meta-object; it can also have its own metadata. Therefore, the encoding metadata, as a data object, may also have its corresponding metadata encoding, called its meta-encoding.
  • FIG. 6 is a relationship between data objects, metadata, encoding protocols, and encoding meta objects.
  • The encoding meta-object is also a data object (relative to a normal data object it sits at the M1 abstraction level); the model of its metadata (at abstraction level M2) is called the encoding metamodel.
  • the encoded metadata of the encoded meta-object is part of the encoding metamodel.
  • the coding metamodel is the cornerstone of the object coding system.
  • The coding metamodel is relatively stable at runtime and does not change dynamically, but it can be extended. That is, the encoding metadata of encoding meta-objects is built into the system, so the system can directly store, transfer, encode, and decode these encoding meta-objects.
  • An object coding system can correspond to a unique core coding metamodel (which can have an extension mechanism).
  • FIG. 7 is a schematic diagram of the core coding metamodel.
  • Does the meta-encoding, as the object encoding of the encoding meta-object, also have its own meta-encoding? This depends on the specific design of the coding metamodel and the codec method. If there is only one encoding meta-object in the encoding metamodel, the meta-encoding refers to that single encoding meta-object. If there are multiple encoding meta-objects in the metamodel but they can all be encoded into the same meta-code, then no meta-code for the meta-encoding is required either. Otherwise, meta-codes for the meta-encodings are needed to distinguish them. Sometimes there is a hierarchical relationship between the encoding meta-objects; in this case, multi-level decoding may be required to obtain the encoding meta-object of the final data object.
  • Variable-length coding expresses this meta-object hierarchy more directly and flexibly, and is easy to handle: each code word serves as the meta-code of the code word that follows it, so that multiple levels can be nested.
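  • This chaining, where each code word is the meta-code of the next, can be sketched as follows. The `metamodel` table and its keys are purely illustrative assumptions:

```python
def decode_chain(words, metamodel):
    """Walk a variable-length encoding in which each code word acts as
    the meta-code of the word that follows it; `metamodel` maps a
    (parent_meaning, code_word) pair to the next-level meaning."""
    meaning = "root"
    for w in words:
        meaning = metamodel[(meaning, w)]
    return meaning

# Hypothetical two-level hierarchy: a type code word, then an instance word.
metamodel = {
    ("root", 0x0001): "type:text",
    ("type:text", 0x0042): "object:word-42",
}
assert decode_chain([0x0001, 0x0042], metamodel) == "object:word-42"
```

  Deeper hierarchies simply extend the chain with additional code words, one decoding level per word.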
  • FIG. 8 is a conceptual model of object coding, meta-encoding, and instance coding (that is, object coding with the meta-encoding part removed), together with the data object and the encoding meta-object. As shown in FIG. 8, the following relationships hold:
  • the encoding meta object can also be used as a data object.
  • Meta-encoding itself can also be used as an object encoding
  • Object coding includes meta coding and instance coding
  • The object encoding is associated with the corresponding data object, which implies the same correspondence between the meta-encoding and the encoding meta-object (mainly implicit in relationships 1 and 2 above).
  • FIG. 9 is an exemplary diagram of the meta-encoding in the present embodiment.
  • In this example the object encoding is a 128-bit fixed-length encoding covering the owner of the object and the object type. The two can be related or unrelated, depending on the definition in the encoding metamodel; the corresponding coding logic differs accordingly.
  • FIG. 10 is an exemplary diagram of a similar layer-by-layer correlation of coded meta-objects (variable-length coding of 16-bit word length).
  • FIG. 11 is a schematic diagram of a meta model corresponding to the encoding.
  • The encoding type can have one owner (01) or no owner (00); therefore both of the above encoding forms are legal. The type encoding that serves as the meta-encoding corresponds, by itself, to a data object without an owner; the other form represents a data object with an owner.
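  • The owner flag just described can be sketched as follows. The exact bit positions (top two bits of a 16-bit code word) are an assumption for illustration; only the 01/00 distinction comes from the text:

```python
def parse_type_code(words):
    """Sketch of the owner flag: read the top two bits of the 16-bit
    type code word; 0b01 means an owner code word follows, 0b00 means
    the type has no owner. Bit layout is illustrative."""
    first = words[0]
    flag = first >> 14          # top 2 bits
    type_code = first & 0x3FFF  # remaining 14 bits
    if flag == 0b01:
        return {"type": type_code, "owner": words[1]}
    return {"type": type_code, "owner": None}

assert parse_type_code([(0b01 << 14) | 5, 99]) == {"type": 5, "owner": 99}
assert parse_type_code([5]) == {"type": 5, "owner": None}
```

  Both forms decode unambiguously because the flag bits tell the parser how many code words to consume.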
  • the meta-encoding is generated based on the metadata and the encoding protocol, and an instance encoding is generated based on the data content.
  • a coding factory is another important component of a system that can be dynamically created by an encoding repository or across components or across systems.
  • the coding factory can provide direct codec services for related objects.
  • the code repository can provide two important services: registration and access to encoded metadata; encoding and decoding of object reference encoding.
  • The encoding repository can also use external storage services to store encoded metadata, object data, and so on.
  • the final object encoding is generated from the meta-code and the instance code based on predetermined rules.
  • the meta-encoding and the instance coding may be combined into an object coding in an arbitrary manner, such as splicing or by some kind of operation, etc., as long as the two can be reversely disassembled and restored at the time of decoding.
  • the process of generating the object encoding can be placed on the client side or automatically by the encoding factory, depending on the actual design.
  • the content data may also be the application object itself, or may be positioning and index information of the application object.
  • the data access component of the application system can obtain the corresponding application data through some means or algorithm according to the content data, thereby obtaining the final application object.
  • the content of the data object can be stored in a third party storage system that interfaces with the encoding repository, in which case the encoding repository needs to store information about accessing data objects in the third party storage system.
  • the process of encoding a data object is referred to as object-based encoding.
  • Data serialization, referred to simply as serialization, is the process of encoding content into data.
  • the metadata of the data object and the content data ultimately need to be serialized, or stored in the result based on the object encoding (content encoding method), or stored in a storage other than the result (reference encoding method).
  • the content of the data object and the content of the metadata need to be serialized before being transmitted in the system.
  • the serialization of data objects can also be built entirely on object-based coding methods.
  • the key is that the encoded metadata is stored in the encoding warehouse by the method to obtain the corresponding encoded meta-object reference encoding, that is, the meta-encoding.
  • object-based reference coding is the basis of this method.
  • the encoded meta-object can be reference coded to obtain the meta-encoding.
  • Using meta-encoding we can both reference-encode a data object and serialize it, that is, content-encode it. In the process of implementing reference encoding, preferably one first obtains the content encoding of the data object (applying this method to itself), transfers the content encoding to the encoding warehouse for storage, and then obtains the reference encoding.
  • object encoding refers to encoding of an arbitrary object.
  • the objects here can be either entity objects such as data, content information, images, voices, etc. (generally they can be reference coded), or they can be value objects (for example, dates, which can be encoded by examples), or High-level objects that include internal object structures, such as array objects, table objects, tree/document objects, and more.
  • Object encoding is one of the outputs of this system for encoding arbitrary objects, and is also one of the inputs for object decoding.
  • FIG. 12 is a schematic diagram of a conceptual model of the object encoding.
  • the object encoding may include two parts, one is a meta-encoding, and the other is an example encoding.
  • Meta-encoding is the encoding of an encoded meta-object. Meta-encoding is generally an integral part of object encoding. After the decoder parses the meta-encoding from the encoding, the corresponding encoding metadata can be obtained according to a certain mechanism.
  • Content encoding is the encoding of data content under the corresponding encoding constraints.
  • FIG. 13 is a flowchart of Embodiment 2 of an encoding processing method according to the present invention. On the basis of the foregoing embodiment shown in FIG. 5C, as shown in FIG. 13, the method in this embodiment further includes:
  • Step 201C Set access rights to data in the encoding warehouse.
  • the data may be metadata, data objects, and the like.
  • the metadata includes one or a combination of the following:
  • Type of data object creation time of data object, modification time of data object, historical version information of data object, data structure of data object, interface of data object, storage constraint of data object, transmission constraint of data object, data object Encoding constraints (including constraints on the encoding space).
  • the method may further include:
  • Step 202C Send the object code to the target client.
  • FIG. 14 is a flowchart of Embodiment 3 of an encoding processing method according to the present invention.
  • a specific implementation manner of step 102C2 is:
  • Step 301C Acquire a context object.
  • Step 302C Acquire a corresponding coding space according to the context object and the coded protocol.
  • Step 303C Encode the data content in the data object in the coding space to obtain an instance code.
  • Step 304C Acquire an object code corresponding to the data object according to the meta code and the instance code.
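Steps 301C to 304C above can be sketched as follows. The names here (`Protocol`, `CodingSpace`, the hex instance codes) are illustrative assumptions, not part of the patent; the sketch only shows the shape of the flow: context object in, coding space out, instance code computed in that space, then meta code and instance code combined into the object code.

```python
class CodingSpace:
    """Toy coding space: assigns sequential instance codes per space."""

    def __init__(self, name):
        self.name = name
        self._codes = {}

    def encode(self, content):
        # Identical content gets the same instance code within one space.
        if content not in self._codes:
            self._codes[content] = len(self._codes) + 1
        return f"{self._codes[content]:04x}"

class Protocol:
    """Maps a context object to its coding space (Step 302C)."""

    def __init__(self):
        self._spaces = {}

    def coding_space(self, context_object):
        return self._spaces.setdefault(context_object,
                                       CodingSpace(context_object))

def encode_object(protocol, context_object, meta_code, content):
    # Step 301C: acquire the context object (passed in by the caller).
    space = protocol.coding_space(context_object)   # Step 302C
    instance_code = space.encode(content)           # Step 303C
    return meta_code + instance_code                # Step 304C: object code

p = Protocol()
code_a = encode_object(p, "userA", "07", "stroke-data")
code_b = encode_object(p, "userB", "07", "stroke-data")
assert code_a == "070001"
assert code_a == code_b  # identical codes, yet in different coding spaces
```

Note that `code_a` and `code_b` are byte-identical but live in different coding spaces, so they denote different objects; this matches the definition of the coding space given later in the text.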
  • the encoding warehouse (also referred to herein as an encoding repository) may be a repository that stores encoding metadata, encoding meta-objects, and object data, and it may also provide related services. Similar to a font library based on a standardized encoding system, the glyphs corresponding to character encodings in the handwriting input system of the present invention can also be stored in the encoding warehouse. FIG. 15 is a schematic diagram of a glyph corresponding to a non-standard character encoding stored in an encoding warehouse in the handwriting input system of the embodiment; as shown in FIG. 15, by accessing the glyph information in the encoding warehouse, an application using the new data processing system can render any text glyph.
  • the new data processing system uses a solution based on object open coding. You can encode graphics, voice, or other multimedia data, as well as encode different domain data. These encoded metadata are also stored in the encoding repository.
  • the application system can not only query and use various encodings in the encoding warehouse, but also register new encoding types with the encoding warehouse and submit encoded data to them.
  • FIG. 16 is a core conceptual diagram of an encoding metamodel of an exemplary context-dependent object encoding system, as shown in FIG. 16, which illustrates the relationship between some of the core concepts in the encoding metamodel. The definition of these specific concepts is then given.
  • the encoding space refers to the logical space that isolates the object encoding. Objects corresponding to different instance codes of the same object type in different coding spaces are different.
  • the coding space is directly related to one or several coding objects (only one in the coding metamodel described above); this object (or these objects) is called the direct context of the space and of the coding objects in the space, and the space is called the encoding space of this object (or these objects).
  • the encoding space of a coding object that itself lies within an encoding space is called a subspace.
  • the encoding space is called the parent space of its child space.
  • the encoding space without a parent space is called the root space.
  • the root space is generally the encoding space of the encoding repository.
  • the coding space is a means of hierarchically classifying and isolating the encoded metadata.
  • the coding space is hierarchical, that is, the coding space can also have subspaces.
  • the same code belonging to different coding spaces can correspond to different objects.
  • the same meta-code can mean something completely different in different spaces.
  • different coding spaces have different levels of security isolation for encoding.
  • Figure 17 is a schematic diagram of a base object that can be applied to a basic coding space.
  • any code is present in the code repository, with the exception of standard codes.
  • different encoding warehouses correspond to different encoding spaces.
  • the encoding space corresponding to an encoding warehouse is the root space of all encodings of this encoding warehouse.
  • each code has its own owner, so the codes of different users belong to different user coding spaces. As the user model of the encoding warehouse becomes more complex, the division of user spaces can become more elaborate; for example, there may be group spaces shared by multiple users.
  • the encoding is to be serialized into a specific data store.
  • This data store can be a file, a database field, or a string transmitted over the network. Isolating the encoding by the data content itself maximizes the security of the encoding. In effect, a content space based on data-content isolation is a codebook that establishes a content-to-code correspondence.
  • Context space: in the context in which an encoding is formed and used, the encoding spaces above (the named encoding space and the context encoding space) may be implicitly present. We call such an implicit space the context space.
  • the combination of different kinds of context objects determines the final context space. For example, different combinations of user and application correspond to different context spaces. In general, however, a code in non-standard text content corresponds uniquely to that content, and the content itself implies the corresponding application and user (except, of course, for multi-application, multi-user content). Therefore, it is unnecessary to divide the content space into application subspaces or user subspaces. Among all context spaces there is one special space, the context-independent coding space, which we call public coding. In fact, standardized coding is public coding. The encoding in the root space is not public coding but coding related to the encoding warehouse; its encoding space is the root space corresponding to that warehouse.
  • any encoding space will ultimately be embodied as a code.
  • the code that ultimately corresponds to an encoding space is a meta-code, which we may call a spatial code.
  • the encoding space is actually a special encoding meta-object - its corresponding object instance is still an encoding meta-object.
  • For context-independent spatial coding there is no coding space for this encoding.
  • the coding can correspond to different coding spaces depending on the context object. Therefore, for context-independent coding spaces, such as named encoding space, we can directly use spatial encoding, and the corresponding instance encoding is subspace encoding or other metacoding.
  • the code corresponding to the coded warehouse space is the coded warehouse code.
  • the content space corresponds to the instance code.
  • the application space corresponds to the application code.
  • User space corresponds to the user code.
  • Figure 18 is a schematic diagram of the coding structure of a 128-bit fixed-length coding scheme.
  • the arrangement and combination of the above codes are not unique.
  • the example code can be placed at any position in the object code as long as it is clearly defined in advance.
  • context space coding is implicit in the context in which the encoding is used and does not need to appear in the final object encoding.
  • the currently used encoding warehouse implies the encoding warehouse code; the currently used encoding application implies the corresponding application code; the current document content implies the instance code and the user code of the encoding owner (assuming a single-user document).
  • context space encoding must appear in the text to set different encoding contexts to isolate different spaces.
  • the text in a document includes the encoding of multiple encoding repositories.
  • the corresponding encoding warehouse code must appear in the content of the document to distinguish different encoding warehouse spaces.
  • an encoding repository that supports encoding repository encoding must provide information to access the encoding repository for the library encoding.
  • multi-user text content must use user encoding; application encoding must be used in content that can be read and written by multiple applications and that uses application space isolation.
  • Content space is an exception, because the content encoding is the encoding of the document content itself and corresponds one-to-one with that content. A given piece of content cannot contain the codes of several different contents, so the content encoding does not need to appear explicitly in the encoding.
  • the content encoding can be a hash value of the document content, or a hash value of the application encoding and time stamp. Therefore, content encoding is either calculated in real time or stored as content metadata.
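The two variants of content encoding just mentioned (a hash of the document content, or a hash of the application code plus a timestamp) can be sketched as follows. SHA-256 and the exact input formatting are assumptions for illustration; the text does not prescribe a hash function.

```python
import hashlib

def content_encoding_from_document(document: bytes) -> str:
    # Variant 1: the content encoding is a hash of the document content
    # itself, so it can be computed in real time whenever needed.
    return hashlib.sha256(document).hexdigest()

def content_encoding_from_app(app_code: str, timestamp: float) -> str:
    # Variant 2: a hash of the application encoding plus a timestamp;
    # this value is not recomputable from the content, so it would be
    # stored as content metadata instead.
    return hashlib.sha256(f"{app_code}:{timestamp}".encode()).hexdigest()

doc = b"handwritten stroke content"
# Variant 1 is deterministic for the same content:
assert content_encoding_from_document(doc) == content_encoding_from_document(doc)
```

This mirrors the trade-off stated above: a content-derived encoding is computed on demand, while an application-plus-timestamp encoding must be persisted alongside the content.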
  • an encoding does not need to include the spatial encoding itself, but it must indicate which spatial encoding is used; this can be specified with spatial bits in the encoding. These spatial bits correspond to the encoding-context specification in the encoding protocol.
  • FIG. 19 is a schematic diagram in which four binary bits serve as the four spatial bits.
  • the coded storage bit may also be called a reserved bit.
  • As an illustrative example, when the reserved bit is 0, the encoding comes from the current encoding warehouse; otherwise, additional information is required to define the encoding or specify its source, such as the client encoding mentioned later.
  • when the content bit is 0, the encoding is independent of content; when it is 1, the encoding exists for specific content.
  • when the application bit is 0, the code is application-independent; when it is 1, it is an application-specific code.
  • when the user bit is 0, the code is a public code; when it is 1, it is a code owned by the current document user, or vice versa. Any other coding scheme can be used, as long as it effectively distinguishes the different spaces.
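The four-spatial-bit layout described above (reserved, content, application, user bits) can be sketched as a small decoder. The bit positions and field names here are assumptions for illustration; the text explicitly allows any scheme that distinguishes the spaces.

```python
# Assumed bit positions (most to least significant of the 4 spatial bits):
RESERVED_BIT, CONTENT_BIT, APP_BIT, USER_BIT = 0x8, 0x4, 0x2, 0x1

def describe_space_bits(bits: int) -> dict:
    """Interpret the four spatial bits of a code (illustrative layout)."""
    return {
        # reserved bit: 0 means the code is from the current warehouse
        "external_repository": bool(bits & RESERVED_BIT),
        # content bit: 1 means the code exists for specific content
        "content_specific": bool(bits & CONTENT_BIT),
        # application bit: 1 means an application-specific code
        "application_specific": bool(bits & APP_BIT),
        # user bit: 1 means the code is owned by the current document user
        "user_owned": bool(bits & USER_BIT),
    }

# 0b0101: current warehouse, content-specific, app-independent, user-owned.
d = describe_space_bits(0b0101)
assert d == {"external_repository": False, "content_specific": True,
             "application_specific": False, "user_owned": True}
```

Four bits suffice here because each context dimension is a yes/no question; the actual space codes stay implicit in the usage context, as the text states.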
  • the type encoding is the same as the normal encoding, and there is also a coding space.
  • the space of type coding and instance coding can be different.
  • using public coding for user space can serve as a security isolation for the user space.
  • the encoding type of a code may be in user space,
  • while its instance encoding is in public space. Since an instance code must belong to some encoding type, instance codes of the same type share the same spatial bits.
  • the metadata of the encoding type in the encoding warehouse can be accessed according to the type encoding.
  • the type encoding must contain the corresponding space to ensure that the decoder can get the correct encoding type information from the encoding repository.
  • the type information in the encoding repository can contain the spatial bits corresponding to the instance encoding, so the spatial bits do not need to appear in the instance code.
  • Context space is the main means to securely isolate the code.
  • the entities that manage and set the target space of generated encodings should be the individual corresponding to the context object (such as the user) and the administrators (such as the system administrator and the application administrator).
  • space management is hierarchical, which facilitates the registration and use of encodings by applications.
  • the code word length is the minimum number of bits required to encode a character in a text encoding system.
  • the encoded word length of UTF-8 is 8 binary bits, or one byte.
  • the encoded word length of UTF-16 is two bytes. In a coding system with a given code word length, not every code is of that exact length, but every code's length must be an integer multiple of the code word length. For an encoding system with a multi-byte word length, byte order (endianness) within a code word must also be considered; this problem does not exist with a single-byte word length, where all data is simply arranged byte by byte from low to high.
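The word-length and endianness points above can be verified directly with Python's standard codecs, using a CJK character as an example (nothing here is specific to the patent's scheme):

```python
s = "编"  # a CJK character, U+7F16

utf8 = s.encode("utf-8")
# UTF-8 has a 1-byte code word; this character takes three code words,
# i.e. its length is an integer multiple of the word length.
assert len(utf8) == 3

utf16_le = s.encode("utf-16-le")   # one 2-byte code word, little-endian
utf16_be = s.encode("utf-16-be")   # the same code word, big-endian
assert len(utf16_le) == 2
# With a multi-byte word length, endianness matters: the same code word
# has its bytes in opposite order in the two byte orders.
assert utf16_le == utf16_be[::-1]
```

A single-byte word length (UTF-8) has no byte-order question at all, which is exactly why the text singles out multi-byte word lengths for the endianness concern.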
  • Fixed-length and variable-length coding systems: if all code lengths in a coding system are equal to its code word length, the system is called a fixed-length coding system; otherwise, it is called a variable-length coding system.
  • the coding word length and the associated coding method are closely related to the coding and decoding process, and are independent of the coding element model. That is to say, the object coding system corresponding to the same coding element model can select different coding word lengths and corresponding different coding methods. It is even possible to support multiple word lengths or combinations of encoding methods at the same time. Of course, it is necessary to design an effective mechanism to distinguish them.
  • the coding length and encoding method of the system are not directly related to the serialization word length and method specified in the specific object coding protocol. However, if the serialization result is part of the object encoding, the compatibility of the object encoding word length and the method needs to be considered.
  • the object encoding system can be a system that is independent of the encoding word length. That is to say, based on the same code repository, there can be different word length coding schemes.
  • a single code word often cannot hold a complete code (which, as mentioned above, includes the spatial code, the type code, and the instance code).
  • Variable word-length coding: one code can span multiple code words; for example, the meta-code portion and the instance-code portion may be split across several consecutive code words. Even so, a single-word-length encoding sometimes cannot cover all encoding instances.
  • FIG. 20 is an example diagram of a coding scheme in which, as shown in FIG. 20, the decoder can automatically determine the corresponding code word length from the first one or two bytes.
  • the scheme can represent a coding range of 0 to 2^65 - 1.
  • FIG. 21 is an exemplary diagram of the encoding scheme of UTF-8. Compared with the encoding scheme of UTF-8 (as shown in FIG. 21), it is found that the encoding results of the two encoding schemes do not conflict with each other and may appear in the same document.
  • when the first bit of the first byte is 0, the byte corresponds to the ASCII portion of UTF-8; when the first two bits of the first byte of the code are 10, the corresponding code is an object code.
  • variable length coding scheme with one byte word length and multiple byte word lengths can be designed.
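The coexistence with UTF-8 described above rests on one fact: in UTF-8, a byte of the form 10xxxxxx is only ever a continuation byte, never the start of a character, so a scheme that starts its object codes with 10 cannot collide. A minimal lead-byte classifier sketching this idea (the function and labels are hypothetical, not from the patent):

```python
def classify_lead_byte(b: int) -> str:
    """Classify the first byte of a code unit (hypothetical scheme)."""
    if b < 0x80:
        return "ascii"          # 0xxxxxxx: ASCII / single-byte UTF-8
    if b >> 6 == 0b10:
        return "object-code"    # 10xxxxxx: never starts a UTF-8 character,
                                # so this scheme does not conflict with UTF-8
    return "utf8-multibyte"     # 11xxxxxx: UTF-8 lead byte

assert classify_lead_byte(ord("A")) == "ascii"
assert classify_lead_byte(0b10010101) == "object-code"
assert classify_lead_byte("编".encode("utf-8")[0]) == "utf8-multibyte"
```

A real decoder would additionally read the following bits of the lead byte(s) to determine the code word length, as FIG. 20 suggests.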
  • the encoding type is the object type to which the relevant encoding convention is added.
  • the encoding context is an abstraction of the context object. It is actually the selection criteria for the selection of context objects at runtime.
  • the above encoding metamodel uses the encoding type plus the object role name. In the same encoding context (generally a specific application), the same type of role name must be unique.
  • the encoding context of a data object in blog content should be the author user. In this way, when any reader opens the content, no decoding error occurs because the currently logged-in user is not the author.
  • the premise of correct decoding is to correctly set the encoding context object. For the blog example, when opening each specific blog content, the corresponding author user object is set as the encoding context object.
  • the encoding context path is referred to as the encoding path, and corresponds to a series of encoding context conventions, which is a constraint on the encoding space to which the instance code of the corresponding data object belongs.
  • the definition of the encoding space indicates that encoding spaces form a hierarchy associated with coding objects and their encodings: a subspace can itself have subspaces.
  • the encoding path is the encoding space path that is positioned to determine the encoding object. For example, the image encoding path in a personalized journal might look like this:
  • the image corresponding to the image object encoding can be found in the final application space.
  • The encoding path in the encoding metamodel is an encoding path at a higher level of abstraction, corresponding to:
  • this encoding path is instantiated to the above encoded path instance by selecting the corresponding context object.
  • the so-called context object is a concrete object corresponding to the context specification.
  • the object must conform to the constraints of the context specification and must be accessible in the corresponding encoding and encoding process. For example, there is an "author" context constraint whose corresponding type is "user".
  • When the context constraint is set, the current application cannot be set as the corresponding context object; it must be set with an object of the "user" type.
  • the author object after obtaining the author information corresponding to the document, it can be set to the context object corresponding to the "author" context constraint. If the author object is inaccessible to the current user, the context object cannot be instantiated, which means that the encoding context constraint is not satisfied, and the subsequent related instance encoding cannot be decoded. This is also a concrete manifestation of context-based coding security in this method.
  • the encoding path instance is directly related to the encoding space of the corresponding data object instance code in the encoding warehouse.
  • the storage location of the corresponding data object in the encoding warehouse may also be restricted by the encoding space.
  • the specific implementation of the encoding path for the encoding warehouse can have multiple choices depending on the storage scheme.
  • a simple implementation is to use simple context name splicing to form table names for context-sensitive data objects.
  • the table name of this picture table can be:
  • the instance code of the corresponding data object can directly use the keys of the table.
  • Another implementation of the coding space is to store the data objects uniformly and distinguish encoding spaces only in the codes themselves.
  • the system maintains a table of encoding spaces as follows:
  • the code space ID field is the table primary key; the parent space ID is a foreign key of the table, and is used to represent the nested relationship of the code space.
  • the picture ID field is the primary key of the table.
  • the data for all images is placed in the table.
  • the other is the corresponding picture encoding table:
  • the code space ID field is a foreign key of the system code space table, and the picture ID field is a foreign key of the picture table.
  • the Encoding Space ID field plus the Encoding field is the primary key of the table.
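The uniform-storage implementation above (a system table of encoding spaces, one table holding all picture data, and a picture encoding table keyed by space ID plus code) can be sketched with SQLite. Table and column names are illustrative; the text gives the keys but not concrete names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- System table of encoding spaces; parent_space_id is a foreign key
-- into the same table, representing the nesting of spaces.
CREATE TABLE encoding_space (
    space_id        INTEGER PRIMARY KEY,
    parent_space_id INTEGER REFERENCES encoding_space(space_id),
    name            TEXT
);
-- All picture data is stored uniformly in one table.
CREATE TABLE picture (
    picture_id INTEGER PRIMARY KEY,
    data       BLOB
);
-- Codes are isolated per space: (space_id, code) is the primary key.
CREATE TABLE picture_encoding (
    space_id   INTEGER REFERENCES encoding_space(space_id),
    code       TEXT,
    picture_id INTEGER REFERENCES picture(picture_id),
    PRIMARY KEY (space_id, code)
);
""")
conn.execute("INSERT INTO encoding_space VALUES (1, NULL, 'root')")
conn.execute("INSERT INTO encoding_space VALUES (2, 1, 'userA')")
conn.execute("INSERT INTO picture VALUES (10, x'00')")
conn.execute("INSERT INTO picture VALUES (11, x'01')")
# The same code '0001' maps to different pictures in different spaces:
conn.execute("INSERT INTO picture_encoding VALUES (1, '0001', 10)")
conn.execute("INSERT INTO picture_encoding VALUES (2, '0001', 11)")
row = conn.execute("SELECT picture_id FROM picture_encoding "
                   "WHERE space_id = 2 AND code = '0001'").fetchone()
assert row[0] == 11
```

The composite primary key (space_id, code) is what lets identical codes coexist in different spaces, which is the isolation property the coding space is defined by.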
  • the encoded directory entry is a specific encoded meta-object encoded by the context-dependent object.
  • the encoding directory is a list of encoding directory entries.
  • Each encoding directory entry has a unique number in the encoding directory, which is the metacode.
  • the encoding directory entry is specifically the encoding type plus the encoding path.
  • the encoding path can be a relative path, that is, the current space of the encoding directory item, or an absolute path-based root space; or both can be supported at the same time, and only a mechanism for distinguishing the two needs to be established.
  • the meta-encoding (encoding corresponding to the encoding directory entry) and the instance encoding in the object encoding may not be in one encoding space.
  • the encoding directory entry can unify the spatial encoding and type encoding mentioned above. If, for a meta-encoding, the encoding type in the corresponding object data (actually an encoding directory entry) is itself the encoding-directory-entry type, then that meta-encoding corresponds to an encoding space, and the instance encoding following the meta-encoding is actually another meta-encoding. In this way, a meta-encoding can represent either a spatial encoding or the encoding of an encoding directory entry, depending on whether the corresponding encoding type is the encoding-directory-entry type. Therefore, with this design, the meta-encoding of an object encoding can be a group of one or more meta-encodings.
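The directory-entry mechanism above can be sketched as a tiny resolver: each directory entry is an (encoding type, encoding path) pair whose index is its meta-code, and an entry of the directory-entry type denotes a space, so the next code in the chain is again a meta-code. All names and the list layout here are illustrative assumptions.

```python
# Each directory entry is (encoding_type, encoding_path); its index in
# the directory list is the meta-code.
directory = [
    ("encoding-directory-entry", "/userA"),   # meta-code 0: acts as a space
    ("image", "/userA/journal"),              # meta-code 1: ordinary entry
]

def resolve(meta_codes):
    """Follow a chain of meta-codes. Entries of the directory-entry type
    denote encoding spaces, so the code after them is again a meta-code;
    the chain stops at the first concrete encoding type, after which an
    instance code would follow."""
    path = []
    for mc in meta_codes:
        etype, epath = directory[mc]
        path.append(epath)
        if etype != "encoding-directory-entry":
            break
    return path

assert resolve([0, 1]) == ["/userA", "/userA/journal"]
```

This is how a single object encoding can carry a group of meta-codes: each directory-entry-typed code narrows the space, and the final code names the actual type of the instance.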

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Character Discrimination (AREA)
  • User Interface Of Digital Computer (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed are methods for processing handwritten input characters, for splitting and merging data, and for encoding and decoding. An object-oriented open encoding and decoding solution makes it possible to encode and decode any data object with any free and open encoding method; and, using an object-based data splitting/merging method, the metadata and/or encoded data of a data object are split/extracted from the corresponding data content, so as to ensure the security of that content. These methods can be implemented individually or in combination, and can also be combined with applications in other technical fields, either alone or together.
PCT/CN2015/086672 2014-08-11 2015-08-11 Procédés de traitement de caractères saisis de manière manuscrite, de séparation et de fusion de données et de traitement de codage et de décodage WO2016023471A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201580042761.6A CN106575166B (zh) 2014-08-11 2015-08-11 手写输入字符的处理、数据拆分和合并及编解码处理方法
CN202310088220.3A CN116185209A (zh) 2014-08-11 2015-08-11 手写输入字符的处理、数据拆分和合并及编解码处理方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410392557.4 2014-08-11
CN201410392557 2014-08-11

Publications (1)

Publication Number Publication Date
WO2016023471A1 true WO2016023471A1 (fr) 2016-02-18

Family

ID=55303878

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/086672 WO2016023471A1 (fr) 2014-08-11 2015-08-11 Procédés de traitement de caractères saisis de manière manuscrite, de séparation et de fusion de données et de traitement de codage et de décodage

Country Status (2)

Country Link
CN (2) CN116185209A (fr)
WO (1) WO2016023471A1 (fr)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359283A (zh) * 2018-09-26 2019-02-19 中国平安人寿保险股份有限公司 表格数据的汇总方法、终端设备及介质
CN109804362A (zh) * 2016-07-15 2019-05-24 伊欧-塔霍有限责任公司 通过机器学习确定主键-外键关系
CN110548290A (zh) * 2019-09-11 2019-12-10 珠海金山网络游戏科技有限公司 图文混排方法、装置、电子设备以及存储介质
EP3567507A4 (fr) * 2017-12-20 2020-03-04 iSplit Co., Ltd. Système de gestion de données
CN110968592A (zh) * 2019-12-06 2020-04-07 深圳前海环融联易信息科技服务有限公司 元数据采集方法、装置、计算机设备及计算机可读存储介质
CN111046632A (zh) * 2019-11-29 2020-04-21 智器云南京信息科技有限公司 一种数据提取转换方法、系统、存储介质及电子设备
CN111279304A (zh) * 2017-09-29 2020-06-12 甲骨文国际公司 基于画布上连接的可定位元素配置通信决策树的方法和系统
CN112181950A (zh) * 2020-10-19 2021-01-05 北京米连科技有限公司 一种分布式对象数据库的构建方法
CN112333256A (zh) * 2020-10-28 2021-02-05 常州微亿智造科技有限公司 一种工业物联网下网络传输时数据转化框架系统及其方法
CN112966475A (zh) * 2021-03-02 2021-06-15 挂号网(杭州)科技有限公司 文字相似度确定方法、装置、电子设备及存储介质
CN113360113A (zh) * 2021-05-24 2021-09-07 中国电子科技集团公司第四十一研究所 一种基于oled屏实现动态调整字符显示宽度的系统及方法
TWI738717B (zh) * 2016-03-04 2021-09-11 香港商阿里巴巴集團服務有限公司 基於驗證碼的驗證處理方法及裝置
CN113569534A (zh) * 2020-04-29 2021-10-29 杭州海康威视数字技术股份有限公司 一种检测文档中乱码的方法及装置
CN113625932A (zh) * 2021-08-04 2021-11-09 北京鲸鲮信息系统技术有限公司 一种全屏手写输入的方法及装置
CN113659993A (zh) * 2021-08-17 2021-11-16 深圳市康立生物医疗有限公司 免疫批次数据处理方法、装置、终端及可读存储介质
CN113723048A (zh) * 2021-09-06 2021-11-30 北京字跳网络技术有限公司 设置富文本间距的方法、装置、存储介质及电子设备
US20220107796A1 (en) * 2018-12-25 2022-04-07 Huawei Technologies Co., Ltd. Application Package Splitting and Reassembly Method and Apparatus, and Application Package Running Method and Apparatus
CN114900315A (zh) * 2022-04-24 2022-08-12 北京优全智汇信息技术有限公司 基于ocr和电子签名技术的单据电子化管理系统
US11442712B2 (en) * 2020-06-11 2022-09-13 Indian Institute Of Technology Delhi Leveraging unspecified order of evaluation for compiler-based program optimization
US11494201B1 (en) * 2021-05-20 2022-11-08 Adp, Inc. Systems and methods of migrating client information
US11775843B2 (en) 2017-09-29 2023-10-03 Oracle International Corporation Directed trajectories through communication decision tree using iterative artificial intelligence
CN117371446A (zh) * 2023-12-07 2024-01-09 江西曼荼罗软件有限公司 一种病历文本排版方法、系统、存储介质及电子设备
US12001824B2 (en) * 2018-12-25 2024-06-04 Petal Cloud Technology Co., Ltd. Application package splitting and reassembly method and apparatus, and application package running method and apparatus

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073913B (zh) * 2018-01-05 2022-06-14 南京孜博汇信息科技有限公司 笔迹数据化的数据采集方法
CN110134452B (zh) * 2018-02-09 2022-10-25 阿里巴巴集团控股有限公司 对象处理方法及装置
CN111078907A (zh) * 2018-10-18 2020-04-28 中华图象字教育股份有限公司 汉字树处理方法及其装置
GB2578625A (en) * 2018-11-01 2020-05-20 Nokia Technologies Oy Apparatus, methods and computer programs for encoding spatial metadata
CN110032920A (zh) * 2018-11-27 2019-07-19 阿里巴巴集团控股有限公司 文字识别匹配方法、设备和装置
CN112230781B (zh) * 2019-07-15 2023-07-25 腾讯科技(深圳)有限公司 字符推荐方法、装置及存储介质
CN110543243B (zh) * 2019-09-05 2023-05-02 北京字节跳动网络技术有限公司 一种数据处理方法、装置、设备及存储介质
CN111401137A (zh) * 2020-02-24 2020-07-10 中国建设银行股份有限公司 证件栏位识别的方法和装置
CN114077466A (zh) * 2020-08-12 2022-02-22 北京智邦国际软件技术有限公司 一种Web界面表单中多行多列字段自动布局算法
CN113011413A (zh) * 2021-04-15 2021-06-22 深圳市鹰硕云科技有限公司 基于智能笔手写图像的处理方法、装置、系统及存储介质
CN113760246B (zh) * 2021-09-06 2023-08-11 网易(杭州)网络有限公司 应用程序文本语言处理方法、装置、电子设备及存储介质
CN113608646B (zh) * 2021-10-08 2022-01-07 广州文石信息科技有限公司 一种笔画擦除方法、装置、可读存储介质及电子设备
CN114221783B (zh) * 2021-11-11 2023-06-02 杭州天宽科技有限公司 一种数据选择加密解密系统
CN115022302B (zh) * 2022-08-08 2022-11-25 丹娜(天津)生物科技股份有限公司 设备故障数据远程传输方法、装置、电子设备和存储介质
TWI821128B (zh) * 2023-02-23 2023-11-01 兆豐國際商業銀行股份有限公司 資料核對系統及其方法
CN116827479B (zh) * 2023-08-29 2023-12-05 北京航空航天大学 一种低复杂度的隐蔽通信编解码方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375989A (zh) * 2010-08-06 2012-03-14 腾讯科技(深圳)有限公司 手写识别方法及系统
CN102455867A (zh) * 2011-09-29 2012-05-16 北京壹人壹本信息科技有限公司 一种手写文字信息的匹配处理方法及装置
CN102455845A (zh) * 2010-10-14 2012-05-16 北京搜狗科技发展有限公司 一种文字输入方法和装置
CN103460225A (zh) * 2011-03-31 2013-12-18 松下电器产业株式会社 手写字符输入装置
CN103513898A (zh) * 2012-06-21 2014-01-15 夏普株式会社 手写字符切分方法和电子设备

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3725877A (en) * 1972-04-27 1973-04-03 Gen Motors Corp Self contained memory keyboard
JP3017740B2 (ja) * 1988-08-23 2000-03-13 ソニー株式会社 オンライン文字認識装置およびオンライン文字認識方法
CN101311887A (zh) * 2007-05-21 2008-11-26 刘恩新 一种计算机手写输入系统及输入方法和编辑方法
CN101673408B (zh) * 2008-09-10 2012-02-22 汉王科技股份有限公司 一种在形状识别结果中嵌入文字信息的方法及装置
CN101739118A (zh) * 2008-11-06 2010-06-16 大同大学 视讯手写文字输入装置及其方法
CN102156608B (zh) * 2010-12-10 2013-07-24 上海合合信息科技发展有限公司 多字符连续书写的手写输入方法
CN102508598B (zh) * 2011-10-09 2014-03-05 北京捷通华声语音技术有限公司 一种字符笔画渐变消隐方法及装置
GB2509552A (en) * 2013-01-08 2014-07-09 Neuratron Ltd Entering handwritten musical notation on a touchscreen and providing editing capabilities

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375989A (zh) * 2010-08-06 2012-03-14 腾讯科技(深圳)有限公司 手写识别方法及系统
CN102455845A (zh) * 2010-10-14 2012-05-16 北京搜狗科技发展有限公司 一种文字输入方法和装置
CN103460225A (zh) * 2011-03-31 2013-12-18 松下电器产业株式会社 手写字符输入装置
CN102455867A (zh) * 2011-09-29 2012-05-16 北京壹人壹本信息科技有限公司 一种手写文字信息的匹配处理方法及装置
CN103513898A (zh) * 2012-06-21 2014-01-15 夏普株式会社 手写字符切分方法和电子设备

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI738717B (zh) * 2016-03-04 2021-09-11 香港商阿里巴巴集團服務有限公司 基於驗證碼的驗證處理方法及裝置
CN109804362A (zh) * 2016-07-15 2019-05-24 伊欧-塔霍有限责任公司 通过机器学习确定主键-外键关系
CN109804362B (zh) * 2016-07-15 2023-05-30 日立数据管理有限公司 通过机器学习确定主键-外键关系
US11775843B2 (en) 2017-09-29 2023-10-03 Oracle International Corporation Directed trajectories through communication decision tree using iterative artificial intelligence
US11900267B2 (en) 2017-09-29 2024-02-13 Oracle International Corporation Methods and systems for configuring communication decision trees based on connected positionable elements on canvas
CN111279304B (zh) * 2017-09-29 2023-08-15 甲骨文国际公司 基于画布上连接的可定位元素配置通信决策树的方法和系统
CN111279304A (zh) * 2017-09-29 2020-06-12 甲骨文国际公司 基于画布上连接的可定位元素配置通信决策树的方法和系统
EP3567507A4 (fr) * 2017-12-20 2020-03-04 iSplit Co., Ltd. Système de gestion de données
CN109359283A (zh) * 2018-09-26 2019-02-19 中国平安人寿保险股份有限公司 表格数据的汇总方法、终端设备及介质
CN109359283B (zh) * 2018-09-26 2023-07-25 中国平安人寿保险股份有限公司 表格数据的汇总方法、终端设备及介质
US12001824B2 (en) * 2018-12-25 2024-06-04 Petal Cloud Technology Co., Ltd. Application package splitting and reassembly method and apparatus, and application package running method and apparatus
US20220107796A1 (en) * 2018-12-25 2022-04-07 Huawei Technologies Co., Ltd. Application Package Splitting and Reassembly Method and Apparatus, and Application Package Running Method and Apparatus
CN110548290B (zh) * 2019-09-11 2023-10-03 珠海金山数字网络科技有限公司 图文混排方法、装置、电子设备以及存储介质
CN110548290A (zh) * 2019-09-11 2019-12-10 珠海金山网络游戏科技有限公司 图文混排方法、装置、电子设备以及存储介质
CN111046632B (zh) * 2019-11-29 2023-11-10 智器云南京信息科技有限公司 一种数据提取转换方法、系统、存储介质及电子设备
CN111046632A (zh) * 2019-11-29 2020-04-21 智器云南京信息科技有限公司 一种数据提取转换方法、系统、存储介质及电子设备
CN110968592B (zh) * 2019-12-06 2023-11-21 深圳前海环融联易信息科技服务有限公司 元数据采集方法、装置、计算机设备及计算机可读存储介质
CN110968592A (zh) * 2019-12-06 2020-04-07 深圳前海环融联易信息科技服务有限公司 元数据采集方法、装置、计算机设备及计算机可读存储介质
CN113569534A (zh) * 2020-04-29 2021-10-29 杭州海康威视数字技术股份有限公司 一种检测文档中乱码的方法及装置
US11442712B2 (en) * 2020-06-11 2022-09-13 Indian Institute Of Technology Delhi Leveraging unspecified order of evaluation for compiler-based program optimization
CN112181950B (zh) * 2020-10-19 2024-03-26 北京米连科技有限公司 一种分布式对象数据库的构建方法
CN112181950A (zh) * 2020-10-19 2021-01-05 北京米连科技有限公司 一种分布式对象数据库的构建方法
CN112333256A (zh) * 2020-10-28 2021-02-05 常州微亿智造科技有限公司 一种工业物联网下网络传输时数据转化框架系统及其方法
CN112333256B (zh) * 2020-10-28 2022-02-08 常州微亿智造科技有限公司 一种工业物联网下网络传输时数据转化框架系统及其方法
CN112966475A (zh) * 2021-03-02 2021-06-15 挂号网(杭州)科技有限公司 文字相似度确定方法、装置、电子设备及存储介质
US11494201B1 (en) * 2021-05-20 2022-11-08 Adp, Inc. Systems and methods of migrating client information
CN113360113A (zh) * 2021-05-24 2021-09-07 中国电子科技集团公司第四十一研究所 System and method for dynamically adjusting character display width on an OLED screen
CN113625932A (zh) * 2021-08-04 2021-11-09 北京鲸鲮信息系统技术有限公司 Full-screen handwriting input method and apparatus
CN113625932B (zh) * 2021-08-04 2024-03-22 北京字节跳动网络技术有限公司 Full-screen handwriting input method and apparatus
CN113659993A (zh) * 2021-08-17 2021-11-16 深圳市康立生物医疗有限公司 Immunoassay batch data processing method and apparatus, terminal, and readable storage medium
CN113723048A (zh) * 2021-09-06 2021-11-30 北京字跳网络技术有限公司 Method and apparatus for setting rich-text spacing, storage medium, and electronic device
CN114900315A (zh) * 2022-04-24 2022-08-12 北京优全智汇信息技术有限公司 Electronic document management system based on OCR and electronic signature technology
CN114900315B (zh) * 2022-04-24 2024-03-15 北京优全智汇信息技术有限公司 Electronic document management system based on OCR and electronic signature technology
CN117371446A (zh) * 2023-12-07 2024-01-09 江西曼荼罗软件有限公司 Medical record text typesetting method, system, storage medium, and electronic device
CN117371446B (zh) * 2023-12-07 2024-04-16 江西曼荼罗软件有限公司 Medical record text typesetting method, system, storage medium, and electronic device

Also Published As

Publication number Publication date
CN116185209A (zh) 2023-05-30
CN106575166B (zh) 2022-11-29
CN106575166A (zh) 2017-04-19

Similar Documents

Publication Publication Date Title
CN106575166B (zh) Processing of handwritten input characters, data splitting and merging, and encoding and decoding processing methods
US20210165955A1 (en) Methods and systems for modeling complex taxonomies with natural language understanding
US10089299B2 (en) Multi-media context language processing
US11556697B2 (en) Intelligent text annotation
US10423649B2 (en) Natural question generation from query data using natural language processing system
US9003295B2 (en) User interface driven access control system and method
TWI590082B (zh) Shared distributed lexicon for application programs
US10049098B2 (en) Extracting actionable information from emails
US8750630B2 (en) Hierarchical and index based watermarks represented as trees
CN111026319B (zh) Intelligent text processing method and apparatus, electronic device, and storage medium
CN110597963A (zh) Method for constructing an emoticon question-and-answer library, emoticon search method and apparatus, and storage medium
KR20090127936A (ko) Client input method
US20120158742A1 (en) Managing documents using weighted prevalence data for statements
CN111026858A (zh) Item information processing method and apparatus based on an item recommendation model
CN111314388B (zh) Method and apparatus for detecting SQL injection
US20230090050A1 (en) Search architecture for hierarchical data using metadata defined relationships
WO2023024975A1 (fr) Text processing method and apparatus, and electronic device
CN111886596A (zh) Machine translation locking using sequence-based lock/unlock classification
JP2022518645A (ja) Method and apparatus for determining video distribution timeliness
CN105144147A (zh) Detecting and reconstructing right-to-left text direction, ligatures, and diacritics in fixed-format documents
CN110990057A (zh) Method, apparatus, device, and medium for extracting mini-program sub-chain information
US20190188004A1 (en) Software application dynamic linguistic translation system and methods
CN107526742A (zh) Method and device for processing multilingual text
US9898467B1 (en) System for data normalization
GB2603586A (en) Document access control based on document component layouts

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15832430

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15832430

Country of ref document: EP

Kind code of ref document: A1