WO2017107010A1 - Information analysis system and method based on event regression test - Google Patents

Information analysis system and method based on event regression test

Info

Publication number
WO2017107010A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
information
module
natural language
text
database
Prior art date
Application number
PCT/CN2015/098086
Other languages
French (fr)
Chinese (zh)
Inventor
易峥
夏炜
陶志伟
潘杭平
Original Assignee
浙江核新同花顺网络信息股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30: Information retrieval; Database structures therefor; File system structures therefor

Abstract

One aspect of the present invention relates to an information analysis method based on an event regression test. The information analysis method can comprise: collecting information, pre-processing the information and extracting an entity in the information, processing the information and extracting a relevant attribute in the information, determining the type of information and generating a refinement event, generating a corresponding natural language sentence according to the generated refinement event, performing a regression test on the generated natural language sentence, and generating a regression testing report, etc. In addition, the present invention relates to a user being able to select any announcement or news to perform a regression test, and generate a regression testing report. Moreover, the present invention relates to a user being able to input a natural language sentence and perform a regression test on an input natural language sentence so as to generate a regression testing report. Furthermore, the present invention relates to an information analysis system based on an event regression test, comprising a collection module, a system database, a processing module, a natural language processing module and a regression testing module.

Description

Information analysis system and method based on event backtesting

Technical field

The present invention relates to a system and method for analyzing information, and in particular to automatically analyzing a natural language sentence obtained from event-related information in order to obtain its historical backtesting information.

Background

With the growing popularity of the Internet, people are increasingly accustomed to using it to obtain information or to analyze information based on data. As the Internet's coverage continues to expand, the amount of data also keeps growing. When people try to obtain certain information from the Internet, that information is often lengthy and varied in content, and takes time to read and analyze. Meanwhile, some industries or users need to backtest historical data against current information in order to make decisions. For example, in the financial industry, when studying a trading or investment strategy, backtesting can evaluate the performance and effectiveness of the strategy over a past period, helping investors analyze investment decisions. As another example, in weather forecasting, given real-time temperature, humidity, wind direction, and barometric pressure data, future weather can be predicted by analyzing historical weather under the same conditions.

Summary

One aspect of the present invention relates to an information analysis system. According to one embodiment, the information analysis system includes a computer-readable storage medium storing executable modules, and a processor capable of executing the executable modules stored on the computer-readable storage medium. The executable modules include: a collection module capable of collecting information; a processing module capable of preprocessing the collected information and extracting events from the preprocessed information; a natural language processing module capable of generating natural language sentences based on the extracted events; and a backtesting module capable of generating backtesting results from the generated natural language sentences in combination with history information.

According to another embodiment of the present invention, the information analysis system further comprises a database capable of storing the collected information, the preprocessed information, the extracted natural language sentences, the history information, and the backtesting results.

According to another embodiment of the present invention, the database includes an original information database, a text database, a text preprocessing database, an entity database, an event attribute database, a keyword database, a text classification database, a history information database, a natural language processing database, an event recognition database, a backtesting module database, a text template database, and a dictionary database.

According to another embodiment of the present invention, the processing module of the information analysis system further comprises a format conversion module, a text processing module, an attribute extraction module, and an event recognition module.

According to another embodiment of the present invention, the processing module further comprises a text classification module.

According to another embodiment of the present invention, the methods used by the processing module comprise chi-square statistics, information gain, mutual information, odds ratio, cross entropy, inter-class information difference, statistical keywords, decision trees, Rocchio, Naive Bayes, neural networks, support vector machines, linear least squares fit, nearest neighbor algorithms, genetic algorithms, sentiment classification, maximum entropy, Generalized Instance Set, synonym configuration, Boolean association rules, location rules, and machine learning.

According to another embodiment of the present invention, the natural language processing module may receive the information from the collection module.

According to another embodiment of the present invention, the backtesting module further comprises a backtest information determination, in which an evaluation of the backtested information is given according to the backtest result.

According to another embodiment of the present invention, the backtesting results can be presented to the user.

Another aspect of the invention relates to an information analysis method, the method including: collecting information; extracting event information; generating a natural language sentence according to the extracted event; and performing backtest analysis on the natural language sentence.

According to another embodiment of the invention, the collected information comprises user input information and non-user input information, the sources of the non-user input information including communication terminals and servers.

According to another embodiment of the invention, the collected information includes announcement information and news information.

According to another embodiment of the present invention, the event extraction further comprises entity recognition and attribute extraction.

According to another embodiment of the present invention, the entity recognition further comprises format conversion, text word segmentation, and number and unit normalization.

According to another embodiment of the present invention, the attribute extraction can be achieved by a system-defined model.

According to another embodiment of the invention, the natural language sentence can be generated according to the extracted event.

According to another embodiment of the present invention, the natural language sentence can be generated based on user input information.

According to another embodiment of the invention, the natural language sentence can be further extended according to the event category.

According to another embodiment of the invention, the analysis of the natural language sentence includes backtesting the natural language sentence.

According to another embodiment of the invention, backtesting the natural language sentence can generate a backtest result according to the information type.

DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed to describe the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can, without creative effort, apply the present invention to other similar scenarios based on these drawings. Unless apparent from the context or otherwise stated, the same reference numerals in the figures denote the same structures or operations.

Figure 1 is a schematic diagram of an exemplary system configuration of the information analysis system;

Figure 2 is a block schematic diagram of the information analysis system;

Figure 3 is a flowchart of the information analysis;

Figure 4 is a schematic diagram of the collection module;

Figure 5 is a structural diagram of the processing module;

Figure 6 is a schematic diagram of the format conversion module;

Figure 7 is a schematic diagram of the text preprocessing module;

Figure 8 is a schematic diagram of the text classification module;

Figure 9 is a schematic diagram of the attribute extraction module;

Figure 10 is a flowchart of the processing module;

Figure 11 is a schematic diagram of the natural language processing module;

Figure 12 is a schematic diagram of the backtesting module;

Figure 13 is a flowchart of the backtesting;

Figure 14 is a schematic diagram of the system database;

Figure 15 is a flowchart of the information analysis;

Figure 16 is a flowchart of online information analysis by the system;

Figure 17 is a schematic diagram of an interface of the information analysis system for announcements or news;

Figure 18 is a schematic diagram of an interface of the information analysis system for user input;

Figure 19 is a diagram of announcement text information analysis by the system.

Detailed description

As used in this specification and the claims, unless the context clearly indicates otherwise, the words "a", "an", and/or "the" do not refer specifically to the singular and may also include the plural. In general, the terms "comprising" and "including" only indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.

The information analysis method described in this specification refers to collecting information, processing the information, generating natural language sentences, and analyzing them to provide reference data. In some embodiments, one aspect of the present invention relates to an information analysis system. The information analysis system may include a collection module, a system database, a processing module, a natural language processing module, and a backtesting module. Another aspect of the invention relates to an information analysis method based on event backtesting. The method may include collecting information, preprocessing the information and extracting entities from it, processing the information and extracting relevant attributes from it, determining the information type and generating a refined event, generating a corresponding natural language sentence according to the generated refined event, backtesting the generated natural language sentence, and generating a backtesting report. Another aspect of the invention relates to a user being able to select any announcement or news item for real-time backtesting and to generate a backtesting report. Another aspect of the invention relates to a user being able to enter any natural language sentence and backtest it in real time to generate a backtesting report.
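
The sequence of steps above (collect, extract an event, generate a sentence, backtest, report) can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the names `Event`, `collect`, `extract_event`, `to_sentence`, `backtest`, and `analyze` are hypothetical, the extraction rule is a toy stand-in for the entity and attribute extraction described in this specification, and the history lookup is a plain dictionary rather than the system database.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A refined event: an entity, an event category, and extracted attributes."""
    entity: str
    category: str
    attributes: dict = field(default_factory=dict)


def collect(sources):
    """Collection step: flatten the text items offered by each source."""
    return [item for source in sources for item in source]


def extract_event(text):
    """Processing step: toy rule, first word is the entity (assumption)."""
    entity, _, rest = text.partition(" ")
    category = "contract" if "contract" in rest else "other"
    return Event(entity=entity, category=category, attributes={"text": rest})


def to_sentence(event):
    """Natural language step: render the event as a fixed-template sentence."""
    return f"{event.entity} announces a {event.category} event"


def backtest(sentence, history):
    """Backtest step: summarize past outcomes recorded for the same sentence."""
    outcomes = history.get(sentence, [])
    mean = sum(outcomes) / len(outcomes) if outcomes else None
    return {"sentence": sentence, "samples": len(outcomes), "mean_change": mean}


def analyze(sources, history):
    """Run the whole pipeline and return one report entry per collected item."""
    return [backtest(to_sentence(extract_event(t)), history) for t in collect(sources)]


# Hypothetical history: past price changes observed after the same kind of event.
history = {"ACME announces a contract event": [0.02, 0.04]}
reports = analyze([["ACME signs a major contract"]], history)
```

Each stage is a separate function so that it can be swapped for a more realistic implementation without touching the others, which mirrors the modular structure described above.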

Different embodiments of the present invention are applicable to various fields, including but not limited to finance and its derivative investments (including, but not limited to, stocks, bonds, gold, paper gold, silver, foreign exchange, precious metals, futures, IMF, etc.), technology (including, but not limited to, mathematics, physics, chemistry and chemical engineering, biology and biological engineering, electronic engineering, communications systems, the Internet, networking, etc.), politics (including but not limited to politicians, political events, and countries), and news (by region, including but not limited to regional news, national news, and international news; by subject, including but not limited to political news, sports news, technology news, economic news, live news, weather news, etc.). According to at least one embodiment of the invention, various information resources such as text, images, audio, and video can be rapidly refined, and, based on historical information, the refined content can be backtested and a backtesting report generated, allowing users to understand more quickly and easily information that may affect the future. Application scenarios of different embodiments of the present invention include, but are not limited to, web browser plug-ins, clients, customized systems, internal analysis systems, artificial intelligence robots, or a combination of one or more of these. The above description of applicable fields is only a specific example and should not be considered the only possible embodiment.
Obviously, those skilled in the art, after understanding the basic principles of the information analysis method and system based on event backtesting, may make various modifications and changes in form and detail to the application fields in which the above method and system are implemented, without departing from these principles; such modifications and changes remain within the scope described above. For example, in one embodiment of the present invention, the backtesting report is presented to the user in a unified text format; those skilled in the art may instead present the backtesting report to the user in a unified audio or video format. Such modifications or changes remain within the scope of protection of the present invention.

FIG. 1 is a schematic diagram of an exemplary system configuration of the information analysis system. The exemplary system configuration 100 may include, but is not limited to, one or more information analysis systems 101, one or more networks 102, and one or more information sources 103. The information analysis system 101 may be used to analyze and process collected information to generate analysis results. The information analysis system 101 may be a single server or a server group. A server group may be centralized, such as a data center, or distributed, such as a distributed system. The information analysis system 101 may be local or remote. The network 102 may provide a channel for information exchange. The network 102 may be a single network or a combination of multiple networks. The network 102 may include, but is not limited to, one or a combination of a local area network, a wide area network, a public network, a private network, a wireless local area network, a virtual network, a metropolitan area network, a public switched telephone network, and the like. The network 102 may include multiple network access points, such as wired or wireless access points, base stations, or network switching points, through which a data source can connect to the network 102 and transmit information over it. The information source 103 may provide various information. The information source 103 may include, but is not limited to, servers and communication terminals. Further, a server may be one or any combination of web servers, file servers, database servers, FTP servers, application servers, proxy servers, and the like. A communication terminal may be one or any combination of mobile phones, personal computers, wearable devices, tablet computers, smart televisions, and the like.
The information source 103 may send information through the network 102 to the information analysis system 101. The information from the information source 103 may be input by a user or provided by a database or other information source.

FIG. 2 is a block schematic diagram of the information analysis system. The information analysis system 101 may include, but is not limited to, one or more collection modules 201, one or more processing modules 202, one or more natural language processing modules 203, one or more backtesting modules 204, and one or more system databases 205. Some or all of these modules may be connected to the network 102. The modules may be centralized or distributed. One or more of the modules may be local or remote. The collection module 201 may be mainly used to collect the required information in various ways. Information may be collected directly (e.g., via the network 102 from one or more information sources 103) or indirectly (e.g., through the processing module 202, the natural language processing module 203, the backtesting module 204, or the system database 205). The processing module 202 may be mainly used to preprocess information. Preprocessing may be manual or automatic and may include, but is not limited to, one or a combination of format conversion, word segmentation, entity recognition, number and unit normalization, text classification, event attribute extraction, refined event recognition, decryption of encrypted documents, and the like. The natural language processing module 203 may be mainly used to generate natural language sentences and may also receive input natural language sentences. It may process information manually or automatically. The backtesting module 204 may be mainly used to analyze information. The analysis method may include, but is not limited to, one or a combination of system-defined methods, user-defined or user-selected methods, and machine learning. Information analysis may be done manually, automatically, or by a combination of both.
The system database 205 may generally refer to any device having a storage function. The system database 205 is mainly used to store data collected from the information source 103 and data generated during operation of the information analysis system 101. The system database 205 may be local or remote. The connection or communication between the system database and the other modules of the system may be wired or wireless.

The collection module 201 may transmit the collected information to the processing module 202, to the natural language processing module 203, or to the backtesting module 204. The collection module 201 may receive a request sent by the processing module 202 and access the system database 205 according to the request to obtain the required data. After the required data is acquired, the collection module 201 may transmit it to the processing module 202. The collection module 201 may likewise receive a request sent by the natural language processing module 203 and access the system database 205 according to the request to obtain the required data. After the required data is acquired, the collection module 201 may transmit it to the natural language processing module 203. The collection module 201 may also receive a request sent by the backtesting module 204 and access the system database 205 according to the request to obtain the required data. After the required data is acquired, the collection module 201 may transmit it to the backtesting module 204.

The collection module 201 may be primarily used to collect the required information in various ways. The collection module 201 may send a request to the information source 103 to obtain the required information. After the collection module 201 acquires the required information, the information may be processed further or stored in the system database 205. The collection module 201 can also send a request to the system database 205 to acquire information stored there. Alternatively, the system database 205 can send a request directly to the information source 103, and the acquired information may be stored in the system database 205. The information source 103 may be a server, a communication terminal, and the like. Further, the server may be one or any combination of web servers, file servers, database servers, FTP servers, application servers, proxy servers, and the like. The communication terminal may be one or any combination of mobile phones, personal computers, wearable devices, tablet computers, smart televisions, and the like. The required information may include, but is not limited to, one or more of news, research reports, announcements, reports, notices, papers, journals, and the like. The required information may concern various industries, including but not limited to one or more of sports, entertainment, economics, politics, military affairs, culture, art, science, and engineering. The form of the required information may include, but is not limited to, one or more of text, images, audio, video, and the like.
For example: a video news site plays the video "World Bank lowers this year's global economic growth forecast to 2.8%"; a news website reports "HSBC China services PMI rose to 53.5 in May"; a listed company releases on a stock exchange the announcement "Announcement of A Company Limited on the signing of a major contract in daily operations"; a live sports platform releases the football match notice "On Saturday, Chelsea will face city rivals Arsenal at home at Stamford Bridge."

The processing module 202 may communicate bidirectionally with the collection module 201. The processing module 202 may process the information transmitted by the collection module 201; the processing may include, but is not limited to, one or a combination of format conversion, text preprocessing, text classification, attribute extraction, event recognition, and the like. The processing module 202 may send information to the collection module 201, including but not limited to processed information and control information. The control information may include, but is not limited to, information controlling the collection method, the collection time, and the collection sources. The processing module 202 may communicate bidirectionally with the natural language processing module 203: it may transmit processed information to the natural language processing module 203 and receive information transmitted by the natural language processing module 203. The processing module 202 may communicate bidirectionally with the backtesting module 204: it may transmit processed information to the backtesting module 204 and receive information transmitted by the backtesting module 204. The processing module 202 may communicate bidirectionally with the system database 205: it may transmit processed information to the system database 205 for storage, and it may send requests to the system database 205 and receive the information the system database 205 returns.

The natural language processing module 203 may send a request to the collection module 201, and the collection module 201 may, according to the request, access the system database 205 or one or more information sources 103 to obtain the required information. After the required information is acquired, the collection module 201 transmits it to the natural language processing module 203. Alternatively, after receiving the request sent by the natural language processing module 203, the collection module 201 may transmit information already held in the collection module 201 to the natural language processing module 203; this information may come from the information source 103 or from the system database 205. The natural language processing module 203 may send a request to the processing module 202, and the processing module 202 may access the system database 205 according to the request to obtain the required information. After the required information is acquired, the processing module 202 transmits it to the natural language processing module 203. Alternatively, after receiving the request sent by the natural language processing module 203, the processing module 202 may transmit information stored in the processing module 202 to the natural language processing module 203. Alternatively, the natural language processing module 203 may access the system database 205 directly by sending it a request for the required information, which is then transmitted to the natural language processing module 203. Alternatively, the system database 205 may transmit information to the natural language processing module 203 without receiving a request. In one embodiment, the natural language processing module 203 may receive natural language sentences directly from the information source 103 (not shown in the figure).
In one embodiment of the present invention, the user may enter a natural language sentence using an input device, the input device including, but not limited to, one or any combination of keyboards, mice, cameras, scanners, handwriting input tablets, voice input devices, and the like.

The input information of the natural language processing module 203 may be one or more of letters, numbers, characters, words, phrases, sentences, paragraphs, texts, and the like, or any set of identifiers that can carry one or more meanings. Alternatively, the natural language processing module 203 may customize the type of its input information. In some embodiments of the present invention, the input information of the natural language processing module 203 may be characterized as a tuple. For example, it may be characterized as a four-tuple {k, c, u, d}. The parameter k may be configured to represent the information source, which may include, but is not limited to, the collection module 201, the processing module 202, the system database 205, the information source 103 (not shown), or any combination of these. The parameter c may be configured to represent time, for example a year, month, or date. By assigning a specific value to the parameter c, information for the specific time specified by c is input to the natural language processing module 203. The parameter u may be configured to represent the model used by the user. User models have different data processing functions depending on user needs. By default, the parameter u indicates that no data model is used. The parameter d may be configured to represent generated information, that is, the entities, attributes, and the like that have already been generated during the user's natural language processing and that will be used in subsequent natural language processing.
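
As a rough illustration of the four-tuple {k, c, u, d}, the following Python sketch models it as an immutable record. The class name `NlpInput`, the helper `describe`, the string used for the source, and the choice of `None` to stand for "no data model" are assumptions made for this example only; the specification does not prescribe a concrete representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NlpInput:
    """Hypothetical container for the four-tuple {k, c, u, d}."""
    k: str                    # information source, e.g. "collection_module"
    c: date                   # time the information refers to
    u: Optional[str] = None   # user model; None means no data model (the default)
    d: tuple = ()             # entities and attributes generated so far


def describe(item):
    """Return a one-line, human-readable summary of the tuple."""
    model = item.u if item.u is not None else "no model"
    return (f"source={item.k}, time={item.c.isoformat()}, "
            f"model={model}, generated={len(item.d)}")


# Example with made-up values: only k and c are given, u and d take defaults.
item = NlpInput(k="collection_module", c=date(2015, 12, 21))
summary = describe(item)
```

A frozen dataclass is used here so that an input record cannot be modified after it has been handed to a later stage; a plain tuple would serve equally well.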

The natural language processing module 203 may process the collected information to generate natural language sentences. The generated natural language sentences can be transmitted to the backtesting module 204 for backtesting. Specifically, the natural language processing module 203 may send a backtest request to the backtesting module 204; once the request is granted, the natural language processing module 203 inputs the generated natural language sentence to the backtesting module 204 for backtesting. Alternatively, the natural language processing module 203 may skip the request and input the generated natural language sentence directly to the backtesting module 204. In one embodiment of the present invention, after receiving the natural language sentence from the natural language processing module 203, the backtesting module 204 further processes it to generate a standard database access instruction, which is used to access or retrieve the corresponding historical data stored in the database.
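
The idea of turning a natural language sentence into a standard database access instruction can be sketched as follows. This is a toy under explicit assumptions: the table `history` and its columns, the rule that the first word of the sentence is the entity, and the sample rows are all invented for illustration and are not taken from the specification. The sketch uses Python's standard `sqlite3` module with a parameterized query.

```python
import sqlite3


def sentence_to_query(sentence):
    """Toy translation: first token is the entity, the rest is the event label."""
    entity, _, event = sentence.partition(" ")
    sql = ("SELECT event_date, price_change FROM history "
           "WHERE entity = ? AND event = ? ORDER BY event_date")
    return sql, (entity, event)


def recall_history(conn, sentence):
    """Run the generated query and return the matching historical rows."""
    sql, params = sentence_to_query(sentence)
    return conn.execute(sql, params).fetchall()


# In-memory database with made-up historical records (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (entity TEXT, event TEXT, event_date TEXT, price_change REAL)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        ("ACME", "signs major contract", "2014-09-15", -0.004),
        ("ACME", "signs major contract", "2013-05-02", 0.031),
        ("OTHER", "signs major contract", "2014-01-10", 0.010),
    ],
)
rows = recall_history(conn, "ACME signs major contract")
```

Binding the entity and event as query parameters, rather than formatting them into the SQL string, keeps user-entered sentences from altering the structure of the query.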

In another embodiment of the present invention, the natural language processing module 203 may receive events generated by the processing module 202, assemble the received events into natural language sentences, and add an additional condition according to the needs of the backtest. For example, for a stock event, the "stock code or stock short name" needs to be added; for an industry event, the "stocks corresponding to the industry" need to be added; for a market-wide event (such as a central bank interest rate cut), no additional statement needs to be added.
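
The three cases above (stock event, industry event, market-wide event) can be sketched as a small dispatch function. The function name `assemble_sentence`, the category labels, and the exact wording of the added conditions are hypothetical choices for this example; only the rule of which category receives which kind of condition comes from the text.

```python
def assemble_sentence(event_text, category, stock=None, industry=None):
    """Attach the extra condition a backtest needs, depending on event category."""
    if category == "stock":
        # Stock events need the stock code or short name.
        if not stock:
            raise ValueError("a stock event needs a stock code or short name")
        return f"{stock} {event_text}"
    if category == "industry":
        # Industry events are backtested against the stocks of that industry.
        if not industry:
            raise ValueError("an industry event needs an industry name")
        return f"stocks in the {industry} industry {event_text}"
    if category == "market":
        # Market-wide events need no additional statement.
        return event_text
    raise ValueError(f"unknown event category: {category}")


# Made-up inputs for illustration.
stock_sentence = assemble_sentence("signs a major contract", "stock", stock="300033")
industry_sentence = assemble_sentence("receive a new subsidy", "industry", industry="steel")
market_sentence = assemble_sentence("central bank cuts interest rates", "market")
```

Raising an error for a missing stock or industry makes an incomplete event fail early, before it reaches the backtest.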

Note that the above description of the input information of the natural language processing module 203 is only intended to facilitate understanding of the invention and should not be considered the only possible embodiment. Those skilled in the art, after understanding the basic principles of the required information, may make various modifications and changes to the content of the required information without departing from these principles; such modifications and changes remain within the scope described above. For example, the input information of the natural language processing module 203 may be characterized as a tuple of another size, such as a triple, quintuple, sextuple, or N-tuple, or any combination of the aforementioned types of information.

The backtesting module 204 may send a request for backtest conditions to the collection module 201, and the collection module 201 may access the system database 205 according to the request to acquire the required information. After the required information is acquired, the collection module 201 transmits it to the backtesting module 204. Alternatively, after receiving the request sent by the backtesting module 204, the collection module 201 may transmit information stored in the collection module 201 to the backtesting module 204. The backtesting module 204 may send a request to the processing module 202, and the processing module 202 may access the system database 205 according to the request to obtain the required information. After the required information is acquired, the processing module 202 may transmit it to the backtesting module 204. Alternatively, after receiving the request sent by the backtesting module 204, the processing module 202 may transmit information stored in the processing module 202 to the backtesting module 204. The backtesting module 204 may send a request to the natural language processing module 203, and the natural language processing module 203 may access the system database 205 according to the request to obtain the required information. After the required information is acquired, the natural language processing module 203 may transmit it to the backtesting module 204. Alternatively, after receiving the request sent by the backtesting module 204, the natural language processing module 203 may transmit information stored in the natural language processing module 203 to the backtesting module 204. Alternatively, the backtesting module 204 may access the system database 205 directly by sending it a request for the required information, which is then transmitted to the backtesting module 204.
Alternatively, the system database 205 can send information to the backtesting module 204 without receiving a request. The input information received by the backtesting module 204 may include, but is not limited to, one or a combination of letters, numbers, characters, words, sentences, paragraphs, chapters, natural language sentences, and the like. Sources of the input information may include, but are not limited to, one or a combination of the collection module 201, the processing module 202, the natural language processing module 203, the system database 205, the information source 103, and the like.

The system database 205 and other storage devices in the system refer broadly to any medium with read/write capability. The system database 205 or other storage devices in the system may be internal to the system or may be external devices. The system database 205 or other storage devices in the system may be connected by wire or wirelessly. The system database 205 or other storage devices in the system may include, but are not limited to, a hierarchical database, a network database, a relational database, or the like, or any combination thereof. The system database 205 or other storage devices in the system may digitize information and then store it by electrical, magnetic, or optical means. The system database 205 or other storage devices in the system can be used to store various kinds of information, for example programs and data. The system database 205 or other storage devices in the system may be devices that store information electrically, such as various memories, random access memory (RAM), read-only memory (ROM), USB flash drives, flash memory, and the like. The system database 205 or other storage devices in the system may be devices that store information magnetically, such as hard disks, floppy disks, magnetic tape, magnetic core memory, bubble memory, and the like. The system database 205 or other storage devices in the system may be devices that store information optically, such as CDs or DVDs. The system database 205 or other storage devices in the system may be devices that store information magneto-optically, such as magneto-optical disks. The access mode of the system database 205 or other storage devices in the system may be random access, serial access, read-only, or the like, or any combination thereof. The system database 205 or other storage devices in the system may be volatile memory or non-volatile memory. The storage devices mentioned above are only examples, and the storage devices the system may use are not limited to them. The system database 205 or other storage devices in the system may be local, remote, or located on a cloud server.

Operations performed on information by the system database 205 or other storage devices in the system may include, but are not limited to, storing, classifying, sorting, filtering, or any combination thereof. The system database 205 or other storage devices in the system may transfer or exchange information with the information source 103. The system database 205 or other storage devices in the system may receive information from the information source 103 and store it. According to received instructions, information in the system database 205 or other storage devices in the system can be extracted and transferred to the information source 103. The instructions may come directly from the information source 103 or from other modules, such as the collection module 201, the processing module 202, the natural language processing module 203, the backtesting module 204, and the like. The system database 205 or other storage devices in the system may transfer or exchange information with the collection module 201. The system database 205 or other storage devices in the system may receive the information collected by the collection module 201 and store it. According to received instructions, information in the system database 205 or other storage devices in the system can be extracted and transferred to the collection module 201. The instructions may come directly from the collection module 201 or from other modules, such as the processing module 202, the natural language processing module 203, the backtesting module 204, the information source 103, and the like.

The system database 205 or other storage devices in the system may transfer or exchange information with the processing module 202. The system database 205 or other storage devices in the system may receive information from the processing module 202 and store it. According to received instructions, information in the system database 205 or other storage devices in the system can be extracted and passed to the processing module 202. The instructions may come directly from the processing module 202 or from other modules, such as the collection module 201, the natural language processing module 203, the backtesting module 204, the information source 103, and the like.

The system database 205 or other storage devices in the system may transfer or exchange information with the natural language processing module 203. The system database 205 or other storage devices in the system may receive information from the natural language processing module 203 and store it. According to received instructions, information in the system database 205 or other storage devices in the system can be extracted and passed to the natural language processing module 203. The instructions may come directly from the natural language processing module 203 or from other modules, such as the collection module 201, the processing module 202, the backtesting module 204, the information source 103, and the like.

The information in the system database 205 may be obtained directly from an information source, or may be information that has already been analyzed and processed. The analyzed and processed information stored in the system database 205 may have been processed by the processing module 202 or by the natural language processing module 203. Information transfer between the system database 205 or other storage devices in the system and the other modules may be wired or wireless, direct or indirect, simultaneous or sequential, and periodic or aperiodic.

It will be apparent to one skilled in the art that, having understood the principles of the information analysis system and method, it is possible, without departing from those principles, to combine the modules arbitrarily or to form subsystems connected to other modules, and to make various modifications and changes in form and detail to the fields in which the above methods and systems are applied; such modifications and changes nevertheless remain within the scope described above. For example, the collection module 201, the processing module 202, the natural language processing module 203, the backtesting module 204, and the system database 205 may be embodied as different modules of one system, or a single module may implement the functions of two or more of these modules. For instance, the processing module 202 may both collect information and generate natural language sentences, thereby also performing the functions of the collection module 201 and the natural language processing module 203. Such variations remain within the scope of the claims of the present invention.

A flowchart of the information analysis is shown in FIG. 3. In step 301, the required information is collected from an information source 103 (see FIG. 1). The information source 103 may include, but is not limited to, a server or a communication terminal. The server may be a web server, file server, database server, FTP server, application server, proxy server, or the like, or any combination of these servers. The communication terminal may be a mobile phone, personal computer, wearable device, tablet computer, smart television, or the like, or any combination of these terminals. Further, in step 301, a natural language statement entered by a user through any of various communication terminals may be received. The required information may include, but is not limited to, one or more of news, announcements, commentary, research reports, blogs, reports, notices, papers, journals, and the like. The required information may relate to various fields, including but not limited to one or more of sports, entertainment, economics, politics, military affairs, culture, art, science, and engineering. The form of the required information may include, but is not limited to, one or more of text, images, audio, and video. For example, the information may be a video news item played on a video website, "the World Bank lowered this year's global economic growth forecast to 2.8%"; a news item reported by a news website, "May HSBC China services PMI rose to 53.5"; an announcement issued by a company listed on a stock exchange, "Announcement of A Company Limited on the signing of major contracts for daily operations"; or a football match notice released by a live sports platform, "on Saturday Chelsea will play city rivals Arsenal at home at Stamford Bridge"; and so on. Step 301 may be performed by the collection module 201.

In step 302, the information collected in step 301 is processed. Step 302 may be performed by the processing module 202. In some embodiments of the present invention, the information collected in step 301 may be text information. The text information may be derived directly or indirectly from text, audio, video, or any combination of these sources. Further, when the text information comes from audio, the system may convert the audio into text by speech recognition or from subtitles. When the text information comes from video, the video may be converted into text by speech recognition or from subtitle files. The text information may be in Chinese, English, German, Spanish, Arabic, French, Japanese, Korean, Russian, Portuguese, or the like, or any combination of these languages. Further, the text information may be one or more of letters, numbers, characters, words, phrases, sentences, paragraphs, articles, and the like, or a set of any number of identifiers, where the set of identifiers may carry one or more meanings. The information processing performed in step 302 may include, but is not limited to, one or more of format conversion, word segmentation, entity recognition, digit and unit normalization, text classification, attribute extraction, and refined event recognition. Format conversion converts text information in various formats into a unified text format. The formats of the text information may include, but are not limited to, one or more of pdf, doc, epub, mobi, caj, kdh, nh, and the like. The unified text format may include, but is not limited to, txt, ASCII, MIME, or the like, or any combination thereof.

Word segmentation may extract words from the text information according to word type; the word types may include, but are not limited to, one or more of nouns, verbs, adjectives, adverbs, auxiliary words, onomatopoeic words, numerals, and specific symbols. Alternatively, the text information may be processed with a particular segmentation algorithm. Segmentation algorithms may include, but are not limited to, segmentation based on string matching (i.e., mechanical segmentation), segmentation based on understanding, segmentation based on statistics, or any combination of these methods. After word segmentation is complete, the entities in the text information are identified. Entities may include, but are not limited to, one or more of product names, organization names, person names, place names, times, dates, currencies, numbers, percentages, and the like. Entity recognition methods may include, but are not limited to, hidden Markov models, maximum entropy models, support vector machines, rule-based methods, statistics-based methods, or the like, or any combination thereof. Specifically, the system summarizes the elements of past information and defines various event categories, for example diplomacy, finance, sports, politics, science and education, and the like, or one or more of them. These categories may also contain several levels of sub-categories; for example, the finance category can include stocks, funds, futures, and other subclasses. After entity recognition is complete, the system normalizes the digits and units in the text information in which entities have been identified. For example, "The total investment is thirty thousand yuan" is normalized to "The project's total investment is 30,000 yuan," and "Messi completed a hat-trick in Barcelona's home game against Real Madrid" is normalized to "Messi scored three goals in Barcelona's home game against Real Madrid," and so on.
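The digit and unit normalization described above can be illustrated with a minimal Python sketch. It is not the method of the invention: the lookup tables, the function names `words_to_int` and `normalize_numbers`, and the restriction to English number words are all assumptions made for illustration, and a practical system would also need to handle units, other languages, and number words used in ordinary prose.

```python
# Hypothetical lookup tables for English number words (illustrative only).
SMALL = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}
NUMBER_WORDS = set(SMALL) | set(TENS) | set(SCALES) | {"hundred"}


def words_to_int(words):
    """Convert a run of lowercase number words to an integer."""
    total, current = 0, 0
    for word in words:
        if word in SMALL:
            current += SMALL[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def normalize_numbers(text):
    """Replace each run of number words with digits and thousands separators."""
    out, run = [], []
    for token in text.split():
        if token.lower() in NUMBER_WORDS:
            run.append(token.lower())
            continue
        if run:
            out.append(f"{words_to_int(run):,}")
            run = []
        out.append(token)
    if run:
        out.append(f"{words_to_int(run):,}")
    return " ".join(out)


print(normalize_numbers("The total investment is thirty thousand yuan"))
```

Under these assumptions, the example sentence from the description is rewritten with the amount in the standard form "30,000 yuan".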

After digit and unit normalization is complete, the system classifies the text information to obtain its broad category (such as the finance category). In one embodiment of the present invention, the system may access the system database 205 (see FIG. 2), which stores preset category keywords. Characteristic quantities, such as the number of times the category keywords appear in the text and their preset weight values, are combined by a given calculation method, and the text is classified according to the calculated value. The category keywords may be extracted by a specific method, which may include, but is not limited to, one or more of the chi-square statistic, statistical rules based on synonyms, Boolean association rules, position rules, information gain, mutual information, odds ratio, cross entropy, and inter-class information difference. In another embodiment of the present invention, the system may employ text classification methods based on machine learning, including but not limited to decision trees, Rocchio, naive Bayes, neural networks, support vector machines, linear least squares fit, the k-nearest neighbor algorithm (kNN), genetic algorithms, the maximum entropy method, or one or more of them. That is, a classifier is learned from training text that has been labeled with class labels and is then used to classify new text. Text information can be assigned to one or more categories. A category may be pre-defined and may comprise several levels of subclasses. In the financial field, for example, text information can be divided into categories including, but not limited to, announcements, news, research reports, blogs, forums, microblogs, and investor interaction. The announcement category may include, but is not limited to, subclasses such as contracts, annual reports, and sentiment. Alternatively, the announcement category may include, but is not limited to, periodic reports, equity allocation announcements, trading announcements, fund-raising announcements, major issues, preferential policy announcements, notices of changes in senior management, and acquisition and buyback announcements. By source, news can be divided into news from reliable sources and news from unreliable sources; for example, an official information source such as the CCTV Financial Channel can be considered reliable. News categories may include, but are not limited to, one or more of finance and economics, political affairs, science and education, politics and law, society, sports, military affairs, entertainment, and so on. It should be noted that the classification described above is provided only to facilitate understanding of the invention and should not be considered the only embodiment. The classification step is not essential to the present invention: for some text information, the system can determine the category directly and skip the classification step. For example, for information whose title is "Announcement of A Company on the signing of major contracts for daily operations," the system can directly determine that it belongs to the announcement category.
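The keyword-based embodiment above, in which keyword occurrence counts are combined with preset weights and the text is classified by the calculated value, might be sketched as follows in Python. The keyword table, the weights, the weighted-sum scoring rule, and the function name `classify` are hypothetical; the description does not specify the actual calculation method, and in the described system the keywords would be read from the system database 205.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical category keywords with preset weights (illustrative values).
CATEGORY_KEYWORDS: Dict[str, Dict[str, float]] = {
    "finance": {"profit": 2.0, "stock": 1.5, "contract": 1.0, "announcement": 0.5},
    "sports": {"goal": 2.0, "match": 1.5, "home": 0.5},
}


def classify(text: str,
             keywords: Dict[str, Dict[str, float]] = CATEGORY_KEYWORDS) -> Optional[str]:
    """Return the category with the highest weighted keyword count, or None."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(weight * counts[word] for word, weight in table.items())
        for category, table in keywords.items()
    }
    best = max(scores, key=scores.get)
    # A score of zero means no category keyword occurred in the text.
    return best if scores[best] > 0 else None


print(classify("CITIC Securities net profit grew as the stock rallied"))
print(classify("Chelsea won the match with a late goal at home"))
```

A weighted sum is only one possible calculation; the same structure would accommodate other scoring rules, or a trained classifier as in the machine learning embodiment.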

After text classification is complete, the attributes of the text information can be extracted. An attribute describes the properties and relationships of an entity. For example, FIG. 19 shows the cover of, and excerpts from, the "CITIC Securities Co., Ltd. 2014 Annual Report" announcement. From this announcement, the entity "CITIC Securities" can be extracted, and the attributes that can be extracted from the table in FIG. 19 include operating income, net profit, increase or decrease over the same period, total assets, total liabilities, total shareholders' equity, and the like, or one or more of them. After attribute extraction is complete, the system may combine entities and attributes according to certain rules to generate a refined event. For example, combining the aforementioned entity "CITIC Securities" with one of its attributes, "net profit increase or decrease over the same period," may generate the refined event "CITIC Securities 2014 annual net profit growth of 116.2 percent." After the refined event has been obtained, the system can generate a natural language sentence from it (step 303). For example, from the event "CITIC Securities annual net profit growth of more than 100%," natural language statements can be generated for the individual stock, for the industry, and for the market as a whole, such as "CITIC Securities annual net profit growth of more than 100%," "brokerage industry annual net profit growth of more than 100%," and "annual net profit growth of more than 100%." Step 303 can be performed by the natural language processing module 203. The generated natural language sentence may be input to an analysis system (e.g., the backtesting module 204), which analyzes the identified event (step 304). In one embodiment of the present invention, step 304 may be performed by the backtesting module 204. The analysis may include, but is not limited to, backtesting the event. Backtesting refers to combining an event with related historical events and data according to certain rules to generate a backtesting report for the user's reference when investing.
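As an illustration of how an entity and an attribute might be combined into a refined event and then turned into natural language sentences at the stock, industry, and market levels, consider the following Python sketch. The `RefinedEvent` class, the threshold values, and the sentence templates are assumptions chosen to reproduce the CITIC Securities example; they are not the rules used by the invention, and the sketch handles only growth (positive changes).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RefinedEvent:
    """Hypothetical container binding an entity to one extracted attribute."""
    entity: str            # e.g. "CITIC Securities"
    attribute: str         # e.g. "net profit"
    period: str            # e.g. "2014"
    change_percent: float  # year-over-year change, in percent


def describe(event: RefinedEvent) -> str:
    """Render the refined event itself."""
    return (f"{event.entity} {event.period} annual {event.attribute} "
            f"growth of {event.change_percent} percent")


def condition(event: RefinedEvent,
              thresholds: Tuple[int, ...] = (100, 50, 20)) -> str:
    """Generalize the exact figure to the first threshold it exceeds."""
    for threshold in thresholds:
        if event.change_percent > threshold:
            return f"annual {event.attribute} growth of more than {threshold}%"
    return f"annual {event.attribute} growth of {event.change_percent}%"


def sentences(event: RefinedEvent, industry: str) -> List[str]:
    """Produce statements for the stock, its industry, and the whole market."""
    cond = condition(event)
    return [f"{event.entity} {cond}", f"{industry} industry {cond}", cond]


event = RefinedEvent("CITIC Securities", "net profit", "2014", 116.2)
print(describe(event))
for sentence in sentences(event, "brokerage"):
    print(sentence)
```

Each generated sentence could then be passed to the backtesting step (step 304) as a query condition.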

Note that the above description of the information analysis process is provided only to facilitate understanding of the invention and should not be considered the only possible embodiment. The collected information may also be converted directly into a natural language sentence (step 303), which is then analyzed (step 304). Alternatively, the system may analyze the collected information directly (step 304).

A schematic view of the collection module 201 is shown in the corresponding figure. The collection module 201 may include, but is not limited to, an acquisition unit 401, a processing unit 402, and a storage unit 403. The acquisition unit 401 acquires the required information from the information source 103 (see FIG. 2) or from other modules in the system (e.g., the processing module 202, the natural language processing module 203, the backtesting module 204, or the system database 205). The required information may include, but is not limited to, one or more of news, announcements, commentary, research reports, blogs, reports, notices, papers, journals, and the like. The required information may relate to various fields, including but not limited to one or more of sports, entertainment, economics, politics, military affairs, culture, art, science, and engineering. The form of the required information may include, but is not limited to, one or more of text, images, audio, and video. For example, the information may be a video news item played on a video news site, "World Bank lowered this year's global economic growth forecast to 2.8%"; a news item reported by a news website, "May HSBC China services PMI rose to 53.5"; an announcement by a company listed on a stock exchange, "Announcement of A Co., Ltd. on the signing of major contracts for daily operations"; or a football match notice released by a live sports platform, "on Saturday Chelsea will play city rivals Arsenal at home at Stamford Bridge"; and so on. Alternatively, the acquisition unit 401 may receive information input directly by the user; this information may include, but is not limited to, natural language sentences and programming languages.

The processing unit 402 may process the collected information. Processing may include, but is not limited to, storing the collected information in the storage unit 403; storing the collected information in the system database 205; retrieving information from the storage unit 403 and sending it to other modules (e.g., the processing module 202, the natural language processing module 203, the backtesting module 204, or the system database 205); and retrieving information from the system database 205 and sending it to other modules (e.g., the processing module 202, the natural language processing module 203, or the backtesting module 204). Alternatively, the processing unit 402 may send the collected information directly to other modules, such as the processing module 202, the natural language processing module 203, the backtesting module 204, or the system database 205. The storage unit 403 may store the information collected by the collection module 201. The storage unit 403 may also store the information processed by the processing unit 402.

The above description of the collection module is merely a specific example and should not be considered the only possible embodiment. Obviously, those skilled in the art, having understood the basic principles of the required information, may make various modifications and changes to the content of the required information without departing from those principles, and such modifications and changes remain within the scope described above.

A schematic diagram of the processing module 202 is shown in FIG. 5. The processing module 202 may include, but is not limited to, a format conversion module 501, a text preprocessing module 502, a text classification module 503, an attribute extraction module 504, and an event recognition module 505. Each of the above modules may be independent, or some of the modules may be integrated into one module. Within the processing module 202, the format conversion module 501 can perform format conversion on the information collected by the collection module 201. Format conversion can be done automatically or manually. Format conversion can be performed in real time or at fixed time intervals. The file formats that can be converted include, but are not limited to, one or more of pdf, doc, docx, epub, mobi, caj, kdh, nh, bmp, jpg, tiff, gif, pcx, tga, exif, fpx, svg, psd, cdr, pcd, dxf, ufo, eps, ai, raw, mpeg, avi, mov, asf, wmv, navi, 3gp, RA, RAM, mkv, flv, rmvb, WebM, and the like. For example, if the collected information is an image in jpg format that contains text, the image can be converted by OCR (Optical Character Recognition) into a text format, such as txt.

The text preprocessing module 502 may preprocess the text after format conversion; the preprocessing may include, without limitation, one or more of word segmentation, entity recognition, normalization, and the like. The text classification module 503 may classify the preprocessed text into categories including, but not limited to, announcements, news, research reports, blogs, forums, microblogs, and investor interaction. The announcement category may include, but is not limited to, subclasses such as contracts and reports. Alternatively, the announcement category may include, but is not limited to, restructuring announcements, equity incentive announcements, major contract announcements, preferential policy announcements, notices of changes in senior management, and acquisition and buyback announcements. By source, news can be divided into news from reliable sources and news from unreliable sources; for example, an official information source such as the CCTV Financial Channel can be considered reliable. News categories may include, but are not limited to, one or more of finance and economics, political affairs, science and education, politics and law, society, sports, military affairs, entertainment, and so on. The attribute extraction module 504 may automatically match and extract the attributes related to an event; the extraction rules may be configured by the system or manually. The event recognition module 505 may derive the final refined event from the results of the attribute extraction module 504, the results of the text preprocessing module 502, the system database 205, and certain rules.

A schematic view of the format conversion module 501 is shown in the corresponding figure. The format conversion module 501 can include, but is not limited to, a control unit 601, a text processing unit 602, an image processing unit 603, an audio processing unit 604, and a video processing unit 605. The control unit 601 can select the appropriate processing unit based on the information collected by the collection module 201. The text processing unit 602 may process information in text format collected by the collection module 201. The image processing unit 603 may process information in image format collected by the collection module 201. The audio processing unit 604 may process information in audio format collected by the collection module 201. The video processing unit 605 may process information in video format collected by the collection module 201. In the embodiment described in this specification, the above units are independent of one another, but in some embodiments some of the units may be combined into a single unit; for example, the audio processing unit 604 and the video processing unit 605 may be combined into a single audio and video processing unit that performs both functions.

The control unit 601 can determine the type of the information collected by the collection module 201 and select the appropriate processing unit according to that type. For example, when the control unit 601 determines that the information collected by the collection module 201 is text information, it selects the text processing unit 602 for the next step of processing.
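The selection performed by the control unit 601 can be pictured as a simple dispatch on the type of the collected information. The Python sketch below assumes, purely for illustration, that the type is inferred from the file extension and that the extension table is a small hypothetical subset of the formats listed in this description; the description itself does not state how the type is determined.

```python
from pathlib import Path
from typing import Dict, Optional

# Hypothetical extension table; unit names follow the reference numerals above.
_UNIT_EXTENSIONS = {
    "text processing unit 602": ("html", "xhtml", "xml", "pdf", "doc", "docx"),
    "image processing unit 603": ("bmp", "jpg", "tiff", "gif"),
    "audio processing unit 604": ("wav", "mp3", "wma", "aac"),
    "video processing unit 605": ("avi", "wmv", "mkv", "flv", "mov"),
}
UNIT_BY_EXTENSION: Dict[str, str] = {
    ext: unit for unit, exts in _UNIT_EXTENSIONS.items() for ext in exts
}


def select_processing_unit(filename: str) -> Optional[str]:
    """Return the unit that should handle the file, or None if the type is unknown."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return UNIT_BY_EXTENSION.get(extension)


print(select_processing_unit("annual_report_2014.pdf"))
print(select_processing_unit("evening_news.MP3"))
```

Returning `None` for unknown types leaves the caller free to reject the input or fall back to manual handling.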

The text processing unit 602 may process information in text format collected by the collection module 201, converting the text data into a unified format. Specifically, the text-format information collected by the collection module 201 may include, but is not limited to, one or more of HTML (Hypertext Markup Language), XHTML (Extensible Hypertext Markup Language), XML (Extensible Markup Language), pdf (Portable Document Format), doc and docx (Microsoft Word formats), and the like. The text processing unit 602 may convert these formats into a unified text format, which may include, but is not limited to, txt. For example, if the information collected by the collection module 201 is the "CSBHK Ltd. Annual Report 2014" announcement (see FIG. 19) in pdf format, the text processing unit 602 may convert the announcement from pdf to txt.

The image processing unit 603 may process information in image format collected by the collection module 201 and convert it into the unified text format. Specifically, the image-format information collected by the collection module 201 may be images of books, newspapers, magazines, letters, and the like. Because this type of information consists of images containing text, the image processing unit 603 may use OCR (Optical Character Recognition) technology to convert the image information into the unified text format.

The audio processing unit 604 may process information in audio format collected by the collection module 201 and convert it into the unified text format. Specifically, the audio formats collected by the collection module 201 may include, but are not limited to, one or more of CD, WAVE, AIFF, AU, MPEG, MP3, MIDI, WMA, RealAudio, VQF, OggVorbis, AAC, APE, and the like. The audio processing unit 604 may use speech recognition technology to convert the audio into text format. Speech recognition techniques may include, but are not limited to, methods based on vocal tract models and speech knowledge, pattern matching methods, artificial neural network methods, or any combination of these methods.

The video processing unit 605 may process information in video format collected by the collection module 201 and convert it into the unified text format. Specifically, the video formats collected by the collection module 201 can include, but are not limited to, one or more of Flash Video, AVI, WMV, MPEG, Matroska, RealVideo, QuickTime File Format, Ogg, MOD, and the like. The video processing unit 605 may export the subtitle text of the video, including subtitles embedded in the video and closed captions, and convert it into the unified text format. The video processing unit 605 may also extract the audio portion of the video and convert it into the unified text format by speech recognition. In a specific embodiment, if the video has no subtitles, the video processing unit 605 may extract the audio portion of the video, apply speech recognition, and convert the result into the unified text format. If the video has subtitles, the subtitles are exported and converted into the unified text format; alternatively, the audio portion may still be extracted and converted into the unified text format by speech recognition.
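The choice between subtitle export and speech recognition in the specific embodiment above can be summarized in a short Python sketch. The function `video_to_text` and its parameters are hypothetical; the subtitle exporter and the speech recognizer are passed in as callables because the description does not name any particular implementation of either.

```python
from typing import Callable, Optional


def video_to_text(video: str,
                  export_subtitles: Callable[[str], Optional[str]],
                  recognize_speech: Callable[[str], str],
                  prefer_speech: bool = False) -> str:
    """Return unified text for a video, using subtitles when available.

    export_subtitles returns None or an empty string when the video has no
    subtitles; recognize_speech transcribes the extracted audio track.
    """
    subtitles = export_subtitles(video)
    if subtitles and not prefer_speech:
        return subtitles.strip()
    # No subtitles, or the caller explicitly chose speech recognition.
    return recognize_speech(video).strip()


# Stand-in callables used only to demonstrate the control flow.
def with_subtitles(video: str) -> Optional[str]:
    return "World Bank lowers growth forecast\n"


def without_subtitles(video: str) -> Optional[str]:
    return None


def fake_recognizer(video: str) -> str:
    return " text produced by speech recognition "


print(video_to_text("news.mkv", with_subtitles, fake_recognizer))
print(video_to_text("news.mkv", without_subtitles, fake_recognizer))
```

The `prefer_speech` flag corresponds to the option of using speech recognition even when subtitles are present.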

The format conversion module 501 includes the four processing units described above, namely the text processing unit 602, the image processing unit 603, the audio processing unit 604, and the video processing unit 605. In some embodiments, it need not contain all of them and may contain only one or some of the units. In embodiments that include all four processing units, the units may execute sequentially, simultaneously, or in any other suitable order. After the format conversion module 501 converts the information collected by the collection module 201 into the unified text format, the text preprocessing module 502 performs subsequent processing on the text information.

The embodiments described above are merely specific embodiments of the present invention and should not be considered the only embodiments. It will be apparent to one skilled in the art that, having understood the content and principles of the present invention, it is possible to make various changes and modifications in form and detail without departing from the principles and structure of the present invention, and such changes and modifications remain within the scope of the claims of the present invention.

A schematic view of the text preprocessing module 502 is shown in FIG. 7. The text preprocessing module 502 may include, but is not limited to, a language recognition unit 701, a word segmentation unit 702, an entity identification unit 703, and a normalization unit 704. The language recognition unit 701 can identify the language of the text information processed by the format conversion module 501. The word segmentation unit 702 may segment the text into words. The entity identification unit 703 may identify the entities in the text. The normalization unit 704 may normalize the numeric information contained in the text, together with its corresponding units, into a standard form of numeric data. The aforementioned units may be independent, or some of them may be combined into one unit. For example, the language recognition unit 701 and the word segmentation unit 702 can be combined into one unit.

The language recognition unit 701 can recognize the language of the text produced by the format conversion module 501. The information collected by the information collection module 201 may be written in languages including, but not limited to, Chinese, English, French, Russian, Spanish, Arabic, Japanese, German, and the like. The language recognition unit 701 can recognize which one or more of these languages the collected information uses.

The word segmentation unit 702 may segment the text into words using a specific word segmentation algorithm, according to the language identified by the language recognition unit 701. The languages of the information collected by the collection module 201 include languages whose basic unit is the word, such as English, French, and Russian, in which words are naturally separated from one another. They also include languages whose basic unit is the character, such as Chinese, in which words are composed of characters but are not naturally separated. Therefore, before word frequency statistics can be computed on Chinese text, the text must first be segmented, whereas English text does not require this. The segmentation algorithm may include, but is not limited to, string-matching-based segmentation (i.e., mechanical segmentation), understanding-based segmentation, statistics-based segmentation, or any combination of these methods. One embodiment of the present invention combines a statistics-based segmentation method with a dictionary-based method. The word segmentation unit 702 may send a request to the system database 205 to access the dictionary database; on receiving the request, the system database 205 may send the requested dictionary to the word segmentation unit 702. The dictionary may be specific to a particular field, for example a dictionary for announcements or a dictionary for news. More specifically, it may be a dictionary for restructuring announcements, equity incentive announcements, major contract announcements, preferential policy announcements, executive change announcements, or acquisition and repurchase announcements. The word segmentation unit 702 may combine the results of statistical segmentation with the results of dictionary matching to obtain the final segmentation result.
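As a minimal, non-limiting sketch of the dictionary-matching part of such a segmentation method, the following Python code applies forward maximum matching. The function name, the four-entry dictionary, and the rule that groups runs of digits are assumptions made for illustration only; the statistical component and the dictionaries held in the system database 205 are not modeled.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Assumption of this sketch: keep a run of digits (e.g. a year) as one token.
        if text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
            continue
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)  # a single character is the fallback
                i += size
                break
    return tokens


# Hypothetical mini-dictionary for announcement titles.
ANNOUNCEMENT_DICT = {"中信证券", "股份", "有限公司", "年度报告"}
tokens = forward_max_match("中信证券股份有限公司2014年年度报告", ANNOUNCEMENT_DICT)
```

In a combined approach, tokens produced this way could be compared with those of a statistical segmenter, with disagreements resolved by whatever rule the embodiment adopts.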

The entity identification unit 703 may identify the entities in the segmented text using an entity recognition method, and the identified entities may be stored as an entity set in the system database 205. Entities may include, but are not limited to, one or more of product names, organization names, personal names, place names, times, dates, currencies, numbers, percentages, and the like. As a specific example, in the announcement title "CITIC Securities Co., Ltd. Annual Report 2014", the identifiable entities are "CITIC Securities", "Co., Ltd.", "2014", and "Annual Report". Entity identification methods may include, but are not limited to, hidden Markov models, maximum entropy models, support vector machines, rule-based identification methods (using Boolean association rules, synonym configuration rules, or position rules), statistics-based identification methods, or any combination of the above.

The normalization unit 704 may normalize the numbers and units in the text so that they are expressed in the same form. Specifically, for example, the normalization unit 704 may convert "a five percent probability of rising" in the text to "a 5% probability of rising", "a total project investment of thirty thousand yuan" to "a total project investment of 30,000 yuan", and so on.
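The kind of normalization described above can be sketched for English text as follows. This is an illustrative assumption rather than the method of the embodiment: the vocabulary, the regular expressions, and the function names are hypothetical, and the sketch converts every spelled-out number it finds, including an incidental "one".

```python
import re

# Number words handled by this sketch (assumption: lowercase English only).
SMALL = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
SMALL.update({w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)})
SCALES = {"hundred": 100, "thousand": 1000, "million": 1_000_000}

_WORDS = "|".join(sorted(list(SMALL) + list(SCALES), key=len, reverse=True))
NUMBER_RUN = re.compile(rf"\b(?:{_WORDS})(?:[ -](?:{_WORDS}))*\b")


def words_to_int(words):
    """Convert a list such as ["thirty", "thousand"] to 30000."""
    total, current = 0, 0
    for word in words:
        if word in SMALL:
            current += SMALL[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        else:  # "thousand" or "million" closes the current group
            total += max(current, 1) * SCALES[word]
            current = 0
    return total + current


def normalize(text):
    """Rewrite spelled-out numbers as digits and "N percent" as "N%"."""
    def replace(match):
        value = words_to_int(re.split(r"[ -]", match.group(0)))
        return f"{value:,}"
    text = NUMBER_RUN.sub(replace, text)
    return re.sub(r"(\d[\d,]*) percent\b", r"\1%", text)
```

For instance, `normalize("a total investment of thirty thousand yuan")` yields "a total investment of 30,000 yuan".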

The four processing units of the text preprocessing module 502 may be executed in sequence: the language recognition unit 701, the word segmentation unit 702, the entity identification unit 703, and the normalization unit 704. In this order, the language recognition unit 701 is executed first, and its recognition result determines whether the word segmentation unit 702 is executed. When the text is recognized as Chinese, the word segmentation unit 702 is executed. When the text is recognized as another language with fixed delimiters between words, such as English, Korean, or Russian, the word segmentation unit 702 may be skipped. The subsequent entity identification unit 703 and normalization unit 704 may be executed in that order, in the reverse order, or simultaneously. The text preprocessing module 502 may preprocess the text information output by the format conversion module 501, and the text classification module 503 may then continue to process the preprocessed text. The embodiments described above are merely specific embodiments of the present invention and should not be regarded as the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from those principles and structure, and such modifications and changes remain within the scope of the claims of the present invention.
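The conditional execution order described above may be sketched as follows. The language check is a deliberately crude assumption (a majority of CJK ideographs is treated as Chinese), and `detect_language`, `preprocess`, and the `segment_chinese` callback are hypothetical names; a real embodiment could use any language recognition method.

```python
def detect_language(text):
    """Return "zh" when most characters are CJK ideographs, otherwise "delimited"."""
    if not text:
        return "delimited"
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk > len(text) / 2 else "delimited"


def preprocess(text, segment_chinese):
    language = detect_language(text)      # language recognition runs first
    if language == "zh":
        tokens = segment_chinese(text)    # segmentation is needed only for Chinese
    else:
        tokens = text.split()             # delimiters already separate the words
    return {"language": language, "tokens": tokens}


# list() is a placeholder segmenter that splits into single characters.
english = preprocess("CITIC Securities annual report 2014", segment_chinese=list)
chinese = preprocess("中信证券年度报告", segment_chinese=list)
```

Entity identification and normalization would then run on `tokens` in either order, since neither depends on the other in this sketch.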

The text classification module 503 is shown schematically in the corresponding figure. The text classification module 503 may include, but is not limited to, one or more keyword extraction units 801 and one or more classification units 802. The keyword extraction unit 801 may extract keywords from the text processed by the text preprocessing module 502. The classification unit 802 may classify the text according to predefined rules, based on the extracted keywords. These units may be independent of one another, or some of them may be combined into a single unit.

The keyword extraction unit 801 may analyze the text processed by the text preprocessing module 502 and extract keywords from it. Keyword extraction methods may include, but are not limited to, methods based on the chi-square statistic, synonym rules, Boolean association rules, position rules, information gain, mutual information, odds ratio, cross entropy, or inter-class information difference, or any combination of the above. Specifically, for the announcement "CITIC Securities Co., Ltd. Annual Report 2014", the keyword extraction unit 801 first extracts keywords, which may include, but are not limited to, "CITIC Securities", "2014", "annual report", "net income", "increase or decrease over the same period", and so on.

The classification unit 802 may use the keywords extracted by the keyword extraction unit 801 to classify the text according to a certain classification method and attach a category label to it. Classification methods may include, but are not limited to, Rocchio, naive Bayes, neural networks, hidden Markov models, support vector machines, linear least squares fit, the k-nearest neighbor (kNN) algorithm, genetic algorithms, the maximum entropy method, and the like, or any combination of the above. The classification unit 802 may send a request to the system database 205 to access the keyword database. After receiving the request, the system database 205 sends the requested keywords to the classification unit 802. The classification unit 802 may match the keywords extracted by the keyword extraction unit 801 against the keywords sent by the system database 205 according to a certain algorithm, classify the text according to the matching result, and attach the appropriate category label to the text. Specifically, for the full text of the aforementioned announcement "CITIC Securities Co., Ltd. Annual Report 2014", the classification unit returns the major category "announcement" and the sub-category "annual report" and attaches the label. Depending on the matching result, a text may belong to more than one category, in which case it carries two or more labels simultaneously.
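A minimal sketch of classification by keyword matching, including the multi-label case, is given below. The category names, reference keyword sets, and the overlap threshold are hypothetical stand-ins for the contents of the keyword database; the matching algorithm of an actual embodiment is not specified here.

```python
# Hypothetical reference keywords per category label (major category/sub-category).
CATEGORY_KEYWORDS = {
    "announcement/annual report": {"annual report", "net income"},
    "announcement/major contract": {"bid", "contract"},
}


def classify(keywords, category_keywords, min_overlap=1):
    """Return every label whose reference keywords overlap the extracted keywords."""
    extracted = set(keywords)
    labels = []
    for label, reference in category_keywords.items():
        if len(extracted & reference) >= min_overlap:
            labels.append(label)  # a text may receive two or more labels
    return labels


single = classify(["CITIC Securities", "2014", "annual report", "net income"],
                  CATEGORY_KEYWORDS)
multiple = classify(["annual report", "contract"], CATEGORY_KEYWORDS)
```

Raising `min_overlap` makes the labelling stricter; a statistical classifier such as naive Bayes could replace the set intersection without changing the interface.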

In some embodiments, the text classification module 503 is optional. For example, if the elements of the information collected by the information collection module 201 are already determined, the text classification step may be skipped. Specifically, suppose the collected information is a news item whose content is: "The 2015 Shanghai International Film Festival closed on the evening of 21 June 2015 and the Golden Goblet Awards of the main competition were announced. The Chinese film 'The Dead End' became the biggest winner: Deng Chao, Guo Tao, and Duan Yihong jointly won Best Actor, and Cao Baoping won Best Director." For text information whose elements, such as time, people, and events, are clear, the text classification module 503 can be skipped and the text sent directly to the next module, the attribute extraction module 504, for subsequent processing. The keyword extraction unit 801 and the classification unit 802 included in the text classification module 503 may be executed sequentially, i.e., the keyword extraction unit 801 first and then the classification unit 802. The embodiments described above are merely specific embodiments of the present invention and should not be regarded as the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from those principles and structure, and such modifications and changes remain within the scope of the claims of the present invention.

The attribute extraction module 504 is shown schematically in the corresponding figure. The attribute extraction module 504 may include, but is not limited to, one or more keyword extraction units 901, one or more attribute extraction templates 902, and one or more attribute extraction units 903. The keyword extraction unit 901 can extract keywords from the text. The attribute extraction template 902 can store the rules for extracting event attributes. The attribute extraction unit 903 can carry out the extraction of event attributes. These units may be independent of one another, or some of them may be combined into a single unit.

The keyword extraction unit 901 may analyze the text processed by the text preprocessing module 502 and extract keywords from it. The extraction methods may include, but are not limited to, methods based on the chi-square statistic, synonym rules, Boolean association rules, position rules, information gain, mutual information, odds ratio, cross entropy, or inter-class information difference, or any combination of the above. In some embodiments, the keyword extraction unit 901 is optional. Because the text classification module 503 is itself optional, when the text classification module 503 is skipped and the information is processed directly by the attribute extraction module 504, the keyword extraction unit 901 may be executed to extract keywords from the preprocessed text. If the text classification module 503 has already been executed, the keyword extraction unit 901 can be skipped.

The attribute extraction template 902 can store the rules for extracting event attributes. An event is composed of entities and the attributes of those entities. The entity identification unit 703 of the text preprocessing module 502 has already identified the entities in the text and formed an entity set, and the attribute extraction template 902 stores extraction rules for the different attributes of different entities. The extraction rules are configured in advance. They can be configured manually: depending on the text categories defined in advance, an attribute extraction rule is set for each entity type in each category of text. They can also be configured by machine learning. For example, a batch of training texts can first be selected and manually annotated by text category. An attribute extractor (not shown in the drawings) is then obtained by training on the annotated texts. The attribute extractor can extract the required attributes according to the text category and is then used to extract attributes from new text. In some embodiments, the attribute extraction template 902 is optional. When the text classification module 503 is not executed, the text has no category and therefore no corresponding attribute extraction template.

The attribute extraction unit 903 may carry out the extraction of attributes from the text. When the text classification module 503 has been executed, the attribute extraction unit 903 can select the appropriate attribute extraction template according to the classification label attached to the text. When two or more labels are attached to the text, a corresponding number of attribute extraction templates can be selected simultaneously, the attributes extracted with each, and the results merged. Specifically, for example, for the information "CITIC Securities Co., Ltd. Annual Report 2014", the label given by the text classification module 503 is the annual report sub-category of the announcement category. According to this label, the attribute extraction unit 903 selects the corresponding attribute extraction template and extracts the event attributes according to the selected template. Event attributes may include, but are not limited to, the amount and year-on-year growth of operating income, the amount of net profit, the amount and growth of total assets, and so on. For example, one attribute that may be extracted from the above announcement is "net profit attributable to shareholders of the parent / growth over the same period of the previous year (%) / 116.20%". The embodiments described above are merely specific embodiments of the present invention and should not be regarded as the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from those principles and structure, and such modifications and changes remain within the scope of the claims of the present invention.
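Template selection by label can be sketched as a lookup from category label to a set of extraction rules. In the sketch below the rules are regular expressions, which is an assumption; the template contents, attribute names, and the wording of the sample sentence are hypothetical, and only the 116.20% figure comes from the example above.

```python
import re

# Hypothetical templates: category label -> {attribute name: extraction rule}.
ATTRIBUTE_TEMPLATES = {
    "annual report": {
        "net profit growth (%)": re.compile(
            r"net profit[^.]*?(?:rose|grew|increased) (?:by )?(\d+(?:\.\d+)?)%",
            re.IGNORECASE),
        "operating income growth (%)": re.compile(
            r"operating income[^.]*?(?:rose|grew|increased) (?:by )?(\d+(?:\.\d+)?)%",
            re.IGNORECASE),
    },
}


def extract_attributes(text, labels, templates=ATTRIBUTE_TEMPLATES):
    """Apply the template of every label and merge the extracted attributes."""
    attributes = {}
    for label in labels:
        for name, pattern in templates.get(label, {}).items():
            match = pattern.search(text)
            if match:
                attributes[name] = float(match.group(1))
    return attributes


sample = ("Net profit attributable to shareholders of the parent "
          "grew by 116.20% over the previous year")
attributes = extract_attributes(sample, ["annual report"])
```

A label without a template simply contributes nothing, which mirrors the case where no classification was performed.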

Returning to FIG. 5, the event recognition module 505 may carry out the recognition of events. The entity identification unit 703 of the text preprocessing module 502 has already identified the entities in the text and formed an entity set, and the attribute extraction module 504 has extracted the required attribute set from the text. Based on the entity recognition results and the attribute extraction results, the event recognition module 505 can identify the event according to a certain event recognition template and generate a refined event. Specifically, for example, for "CITIC Securities Co., Ltd. Annual Report 2014", the entities identified by the text preprocessing module 502 include "CITIC Securities", "Co., Ltd.", "2014", "Annual Report", "net profit", "increase or decrease over the same period", and so on, and one event attribute extracted by the attribute extraction module 504 is "net profit attributable to shareholders of the parent / growth over the same period of the previous year (%) / 116.20%". From the entity recognition results and the attribute extraction results, the event recognition module 505 can obtain the final refined event: "CITIC Securities 2014 annual report announcement: growth in net profit attributable to shareholders equals 116.2%."

In certain embodiments, when the information collection module 201 collects information that a user has input for back-testing, the information may contain complex logic, and the event recognition module 505 may be unable to identify the refined event from the entity and attribute information alone. The event then needs to be identified using the event-related databases in the system database 205 (the event attribute database 1405 or the event recognition database 1411) together with certain rules and methods. Specifically, for example, when the user enters "the bid amount of the contract accounts for more than 50% of the company's operating revenue", the bid announcement itself most likely does not include operating revenue data. In this case, the final refined event category can be obtained by calculation from data in the database (such as historical data) and certain rules and methods. The embodiments described above are merely specific embodiments of the present invention and should not be regarded as the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from those principles and structure, and such modifications and changes remain within the scope of protection claimed by the present invention.

FIG. 10 is a flowchart of the processing module. The system performs format conversion on the information sent by the collection module 201, converting it into a unified text format (step 1001). Format conversion may include, but is not limited to, conversion of one or more of text, image, audio, and video formats. Step 1001 may be implemented by the format conversion module 501. The system preprocesses the text information (step 1002). Preprocessing may include, but is not limited to, one or more of language recognition, word segmentation, entity recognition, normalization, and the like. Step 1002 may be implemented by the text preprocessing module 502. The system classifies the text (step 1003). The classification step may include, but is not limited to, keyword extraction and classification. Step 1003 may be implemented by the text classification module 503. The system extracts attributes from the text (step 1004). Step 1004 may be implemented by the attribute extraction module 504. The system identifies the event (step 1005). Step 1005 may be implemented by the event recognition module 505. Alternatively, the system may skip step 1001 and proceed directly to step 1002, or skip step 1003 and proceed directly to step 1004.
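The flow of steps 1001 to 1005, with optional skipping of steps 1001 and 1003, can be sketched as a list of numbered handlers. The handlers below are trivial placeholders and every name is hypothetical; the sketch only illustrates the control flow, not the processing performed in each step.

```python
def run_processing(item, steps, skip=frozenset()):
    """Run the numbered steps in order, omitting any step listed in `skip`."""
    executed = []
    for number, handler in steps:
        if number in skip:
            continue
        item = handler(item)
        executed.append(number)
    return item, executed


# Placeholder handlers standing in for modules 501 to 505.
STEPS = [
    (1001, lambda item: {**item, "text": item["raw"]}),                 # format conversion
    (1002, lambda item: {**item, "tokens": item["text"].split()}),      # preprocessing
    (1003, lambda item: {**item, "labels": ["news"]}),                  # classification
    (1004, lambda item: {**item, "attributes": {}}),                    # attribute extraction
    (1005, lambda item: {**item, "event": " ".join(item["tokens"])}),   # event recognition
]

# Input that is already plain text with clear elements: skip 1001 and 1003.
result, executed = run_processing({"text": "film festival closes"}, STEPS,
                                  skip={1001, 1003})
```

Running with an empty `skip` set on an item that carries a `raw` field executes all five steps in order.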

The natural language processing module 203 is shown schematically in the corresponding figure. The natural language processing module 203 may include, but is not limited to, a collection unit 1101 and a natural language generation unit 1102. The collection unit 1101 can collect the information it needs by accessing other modules in the system (e.g., the collection module 201, the processing module 202, the back-testing module 204, and the system database 205). The natural language generation unit 1102 can convert the information collected by the collection unit 1101 into natural language sentences. In some embodiments of the present invention, the collection unit 1101 may receive the refined event output by the processing module 202. The collection unit 1101 may also receive user input from the information collection module 201. The natural language generation unit 1102 may receive the refined event and the user input and process the refined event accordingly. In one embodiment of the invention, for example, in the stock field, for an announcement, the user can choose to generate a natural language sentence for an individual stock, for a specific industry, for the whole market, or for one or more of these. As another example, in the news field, for news about an initial public offering (IPO) of shares, the user can choose to generate a natural language sentence for the broader market. It should be noted that the above natural language sentences can also be generated automatically without user intervention. For example, for stocks, the system can automatically generate a natural language sentence for an individual stock, for a specific industry, for the whole market, or for one or more of these. In the news field, for IPO news, the system can automatically generate a natural language sentence for the broader market.
Further, in the news field, the natural language generation unit 1102 may generate natural language sentences about commodity prices, weather conditions, or demographics. The values involved in these natural language sentences may be left at their defaults (that is, when no value is given, the sentence states only that the change occurred). The natural language sentences generated by the natural language generation unit 1102 can be input to the back-testing module 204, which performs a back test on them. In other embodiments of the present invention, the user inputs the natural language sentence to be back-tested into the collection module 201, and the natural language processing module 203 receives the sentence from the collection module 201. Alternatively, the user can enter the natural language sentence to be back-tested directly into a query box of the natural language processing module 203 (not shown in the drawings). The natural language processing module 203 may preprocess the natural language sentence input by the user to obtain a standard node sequence (the nodes including at least index nodes and condition nodes) and, from the relationships between the index nodes and the other nodes, build a node tree. The node tree can represent combinations of index conditions. A data query instruction may then be generated from the node tree. This data query instruction can be input to the back-testing module 204 to perform back-test analysis. The natural language processing module 203 may call historical data stored in the system database 205, and the user can use Boolean operators (AND, OR, NOT, etc.) to combine several natural language sentences.
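One way to picture a node tree that combines index conditions with Boolean operators is sketched below. The class names, field names, and the sample records are hypothetical and the record values are made up for illustration; the sketch evaluates the tree directly against in-memory records rather than generating a query instruction for a database.

```python
import operator
from dataclasses import dataclass


@dataclass
class Condition:
    index: str         # index node, e.g. "net_profit_growth"
    comparator: str    # one of the keys of COMPARATORS
    value: float       # condition node value


@dataclass
class BoolNode:
    op: str            # "AND", "OR" or "NOT"
    children: list


COMPARATORS = {">": operator.gt, ">=": operator.ge,
               "<": operator.lt, "<=": operator.le, "==": operator.eq}


def evaluate(node, record):
    """Recursively evaluate a node tree against one record of historical data."""
    if isinstance(node, Condition):
        return COMPARATORS[node.comparator](record[node.index], node.value)
    results = [evaluate(child, record) for child in node.children]
    if node.op == "AND":
        return all(results)
    if node.op == "OR":
        return any(results)
    if node.op == "NOT":
        return not results[0]
    raise ValueError(f"unknown operator: {node.op}")


# "net profit growth above 100%" AND NOT "bid amount at most 50% of revenue"
tree = BoolNode("AND", [
    Condition("net_profit_growth", ">", 100.0),
    BoolNode("NOT", [Condition("bid_to_revenue", "<=", 0.5)]),
])

# Made-up records used only to exercise the tree.
history = [
    {"name": "A", "net_profit_growth": 116.2, "bid_to_revenue": 0.6},
    {"name": "B", "net_profit_growth": 20.0, "bid_to_revenue": 0.7},
    {"name": "C", "net_profit_growth": 150.0, "bid_to_revenue": 0.3},
]
matches = [r["name"] for r in history if evaluate(tree, r)]
```

The same tree could instead be walked to emit a query string for whatever store holds the historical data.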

Note that the above description of the natural language processing module 203 is merely a specific example and should not be regarded as the only possible embodiment. Obviously, those skilled in the art, after understanding the basic principles involved, may make various modifications and changes to the content without departing from those principles, and such modifications and changes remain within the scope described above. For example, the natural language processing module 203 may also receive the back-test results output by the back-testing module 204.

FIG. 12 is a schematic diagram of the back-testing module 204. The back-testing module 204 may comprise a standard question unit 1201, an other question unit 1202, an optimization unit 1203, and an extension unit 1204. The standard question unit 1201, the other question unit 1202, the optimization unit 1203, and the extension unit 1204 may be separate units, or some of them may be combined into a single working unit.

The standard question unit 1201 can receive the system's standard natural language sentences for events. In some embodiments of the present invention, the standard question unit 1201 may receive natural language sentences generated by the natural language processing module 203, or natural language sentences from other modules. The other modules include, but are not limited to, one or more of the collection module 201, the processing module 202, the natural language processing module 203, the system database 205, and the like. The other question unit 1202 may receive non-system natural language sentences. Non-system natural language sentences may include, but are not limited to, user input, expert definitions, system extraction, and the like, or any combination thereof. The standard question unit 1201 and the other question unit 1202 may be combined into a single question unit, which may receive both the system's natural language sentences for events and natural language sentences input by the user.

The optimization unit 1203 can optimize strategy combinations according to the received information and the back-test calculation method. The optimization can be automatic or manual. For example, in the financial field, the basic back-test data obtained by the calculation include, but are not limited to, one or more of the back-test holding period, average return per trade, maximum return per trade, minimum return per trade, expected annualized rate of return, number of trades, profit/loss ratio, success rate, maximum drawdown rate, weekly beat rate, Sharpe ratio, maximum number of consecutive days without stock selection results, average number of stocks held per day, and the like. The optimization unit 1203 may further give an optimal strategy based on the back-test results, accompanied by a report and a rating. The extension unit 1204 can be configured to provide a subscription function and can also be configured to provide information sharing. With the subscription function, the extension unit 1204 selects information containing specific keywords according to the user's subscription and pushes the content analyzed by the system to the user by various means. With sharing, a user can share information of interest with friends and others in a variety of ways.
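Several of the basic back-test figures listed above can be computed from a list of per-trade returns, as in the following sketch. The definitions used here are common conventions and are assumptions with respect to this document: the drawdown is measured on a compounded equity curve, and the Sharpe-style ratio is computed per trade without annualization. The function name and the sample returns are hypothetical.

```python
from statistics import mean, stdev


def backtest_metrics(trade_returns, risk_free_per_trade=0.0):
    """Summarize per-trade returns (0.10 means +10%)."""
    if not trade_returns:
        raise ValueError("at least one trade is required")
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r < 0]

    # Compound the returns into an equity curve and track the worst drop from a peak.
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, (peak - equity) / peak)

    excess = [r - risk_free_per_trade for r in trade_returns]
    spread = stdev(excess) if len(excess) > 1 else 0.0
    return {
        "trades": len(trade_returns),
        "average_return": mean(trade_returns),
        "max_return": max(trade_returns),
        "min_return": min(trade_returns),
        "win_rate": len(wins) / len(trade_returns),
        "profit_loss_ratio": mean(wins) / abs(mean(losses)) if wins and losses else None,
        "max_drawdown": max_drawdown,
        "sharpe_per_trade": mean(excess) / spread if spread > 0 else None,
    }


# Made-up returns used only to exercise the function.
metrics = backtest_metrics([0.10, -0.05, 0.02, -0.10, 0.08])
```

An optimization step could call this function once per candidate strategy and rank the candidates on whichever figure the user or the system selects.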

FIG. 13 is a flowchart of the back test. In step 1301, information is received. The sources of the received information may include, but are not limited to, the collection module 201, the processing module 202, the natural language processing module 203, and the system database 205. The received information may be a natural language sentence or a machine statement. In step 1301, the back-testing module 204 may receive a natural language sentence. The natural language sentence may be input directly by the user or generated by other modules. Step 1301 may be performed by the standard question unit 1201 and/or the other question unit 1202.

In step 1302, back-test analysis is performed on the received natural language sentence using historical data. The historical data may be stored in the system database 205 or in the back-testing module 204. The back-test analysis of the natural language sentence against the historical data can be carried out using a certain optimization method. The optimization method may include, but is not limited to, one or more of system-defined methods, user-selected or user-defined methods, and machine learning.

In step 1303, the results of the optimization analysis are matched to a corresponding text template. Text templates can be defined by the system or by the user. The content produced by matching the results with the template may include, but is not limited to, one or more of a back-test report, a rating, an optimal strategy, a trend forecast, and the like.
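Matching results to a text template can be as simple as filling named placeholders, as in this sketch. The template wording, the rating thresholds, and the metric values are all hypothetical and chosen only for illustration.

```python
# Hypothetical system-defined text template with named placeholders.
REPORT_TEMPLATE = (
    'Back-test of "{sentence}": {trades} trades, win rate {win_rate:.0%}, '
    "maximum drawdown {max_drawdown:.1%}. Rating: {rating}."
)


def rate(win_rate):
    """Map a win rate to a letter rating (thresholds are assumptions)."""
    if win_rate >= 0.6:
        return "A"
    if win_rate >= 0.5:
        return "B"
    return "C"


def render_report(sentence, metrics, template=REPORT_TEMPLATE):
    """Fill the template with the back-test figures and a derived rating."""
    return template.format(sentence=sentence,
                           rating=rate(metrics["win_rate"]),
                           **metrics)


report = render_report(
    "net profit growth above 100%",
    {"trades": 5, "win_rate": 0.6, "max_drawdown": 0.1279},
)
```

A user-defined template can be passed through the `template` argument as long as it uses the same placeholder names.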

Note that the above description of the processes of the information analysis system is provided only to facilitate understanding of the invention and should not be regarded as the only possible embodiment of the present invention. Obviously, those skilled in the art, after understanding the basic principles involved, may make various modifications and changes to the content without departing from those principles, and such modifications and changes remain within the scope described above. For example, in step 1303, the back-tested information may be matched not only with text templates but also with voice, video, picture, and other templates. As another example, the back-test process may be completed by the back-testing module 204, or by the natural language processing module 203, the processing module 202, the collection module 201, and the like.

Returning to FIG. 12, the back-testing module 204 may further include an extension unit 1204. The extension unit 1204 can be configured to provide a subscription function and can also be configured to provide information sharing. The extension unit may include, but is not limited to, one or more application programming interfaces (APIs) of various types, such as object-oriented APIs, library and framework APIs, protocol APIs, interface APIs, and Web APIs. With the subscription function, the extension unit 1204 selects information containing specific keywords according to the user's subscription and pushes the content analyzed by the system to the user by various means. With sharing, a user can share information of interest with friends in a variety of ways. The subscription function of the extension unit 1204 may include, but is not limited to, pushing information to the user, recommending users with similar interests to follow, recommending comments on information, and providing ratings such as whether the information was helpful. The channels through which the extension unit 1204 pushes information include, but are not limited to, mobile client software, e-mail, SMS, RSS portals, online single-user aggregators, search engines, browsers, instant messaging software, social networks, and so on. The push cycle of the extension unit 1204 may be set by the system or defined by the user. Pushes may be regular or irregular, and may be made in real time or with a delay. The form of the content pushed by the extension unit 1204 may include, but is not limited to, one or more of text, voice, images, animation, and video. The content pushed by the extension unit 1204 may include, but is not limited to, one or more of updates to information the user has browsed, information the user follows, information the system recommends based on the user's records, trending information, and other information of similar interest.

With the sharing function of the extension unit 1204, a user can publish information, share it to a designated place, and choose who can see it. The shared content can be a single piece of information, several pieces of information, a selected section of a piece of information, or the overall content of an information page. It can also be the information itself, comments on the information, the degree of attention the information has received, ratings of whether the information was helpful, and so on. The channels for information sharing may include, but are not limited to, one or more of SMS, MMS, e-mail, QQ, MSN, WeChat, Weibo, Douban, Twitter, Facebook, Instagram, Renren, and other instant messaging software and tools. The recipients of shared information may include, but are not limited to, one or more of a single friend, multiple friends, a circle of friends, public circles, forums, other users, and the like. The format of the shared content may include, but is not limited to, one or more of text, images, voice, animation, video, and web links. The above manner of implementing information sharing is merely a specific example and should not be regarded as the only possible embodiment. Likewise, the above description of the extension unit 1204 is merely a specific example and should not be regarded as the only possible embodiment. It will be apparent to those skilled in the art that, after understanding the basic principles of the extension unit 1204, various modifications and changes in form and detail may be made to the specific embodiments and steps by which the extension unit implements its functions, without departing from those principles, and such modifications and changes remain within the scope described above.

FIG. 14 is a schematic diagram of the system database 205. The system database 205 may include, but is not limited to, one or more of a raw information database, a text database, a text preprocessing database, an entity database, an event attribute database, a keyword database, a text classification database, a historical information database, a natural language processing database, an event recognition database, a back-testing module database, a text template database, and a dictionary database. The system database 205 may store both the data to be processed and template data. For example, the historical information database 1409 may store collected historical information by category. Similarly, whenever information is updated during processing, it is also updated in real time in the corresponding database; for example, the keyword database 1406 is updated to keep its synonym data current. It will be apparent to those skilled in the art that, after understanding the principles of the database in the information analysis system and method, various modifications and changes in form and detail may be made to the application of the above method and system without departing from those principles, and such modifications and changes remain within the scope described above. For example, the individual databases of the system database 205 may be located in the collection module 201, the processing module 202, the natural language processing module 203, and the back-testing module 204, respectively, to serve the functions of those modules. A single database in the system database 205 may also implement the functions of two or more databases. For example, the text preprocessing database 1403 may simultaneously store preprocessing data, entity data, event attributes, keywords, and so on, thereby implementing the functions of the entity database 1404, the event attribute database 1405, and the keyword database 1406.

FIG. 15 is a flowchart of the information analysis. In step 1501, the information analysis system collects the required information. Step 1501 may be performed by the collection module 201. The required information may include, without limitation, one or more of news, announcements, commentary, research reports, blogs, reports, notices, papers, journals, and the like. The required information may concern various fields, including without limitation one or more of sports, entertainment, economics, politics, the military, culture, art, science, and engineering. The form of the required information may include, without limitation, one or more of text, images, audio, video, and others. In some embodiments of the present invention, the information collected by the system in step 1501 may be text information. The text information may be in one or more formats including, without limitation, pdf, doc, epub, mobi, and caj.

In step 1502, the system may preprocess the text information. Step 1502 may be performed by the processing module 202. Text preprocessing may include, without limitation, one or a combination of format conversion, word segmentation, entity recognition, and normalization of numbers and units. For example, suppose the information collected by the system in step 1501 is the announcement "CITIC Securities Co., Ltd. 2014 Annual Report", a PDF file that can be downloaded from the Shanghai Stock Exchange website. Through format conversion, the system converts the announcement text into txt format to facilitate subsequent word segmentation and text processing. At the same time, the tables inside the PDF are parsed, and the formatting information and contextual information of the remaining portions are retained. After format conversion is complete, the system segments the announcement into words using a given method. Word segmentation may be based on a statistical model and the dictionary database 1407. Alternatively, segmentation may be achieved by applying certain rules, which may include, without limitation, one or more of synonym configuration, Boolean association rules, and location rules. After word segmentation, the system performs entity recognition on the announcement. Recognized entities include, without limitation, products, organization names, person names, places, times, dates, currencies, numbers, percentages, and the like. Specifically, the system summarizes the elements of past information and defines various event categories, for example one or more of diplomacy, finance, sports, politics, science, education, and so on. These categories may also contain several levels of sub-categories; for example, the finance category includes bonds, stocks, funds, and so on. After recognition is complete, the system normalizes the numbers and units in the announcement.
For example, "net profit growth of three percent" is normalized to "net profit growth of 3%."
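The normalization of numbers and units described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the lookup table `WORD_TO_NUMBER`, the function name `normalize_percent`, and the restriction to English number words from zero to ten are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table: the spelled-out numbers this sketch understands.
WORD_TO_NUMBER = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

_WORDS = "|".join(WORD_TO_NUMBER)
# Matches "<number word or digits> percent", e.g. "three percent" or "3 percent".
_PERCENT = re.compile(rf"\b({_WORDS}|\d+(?:\.\d+)?)\s+percent\b", re.IGNORECASE)


def normalize_percent(text: str) -> str:
    """Rewrite '<n> percent' as '<n>%', converting number words to digits."""
    def repl(match: re.Match) -> str:
        token = match.group(1).lower()
        # Number words are looked up; digit strings are kept unchanged.
        value = WORD_TO_NUMBER.get(token, token)
        return f"{value}%"
    return _PERCENT.sub(repl, text)


print(normalize_percent("net profit growth of three percent"))
```

A production system would also need to handle larger number words, other units, and other languages; the sketch covers only the single pattern used in the example.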

After the preprocessing described above is complete, the system processes the announcement text (step 1503). Text processing may be performed by the processing module 202. The system performs keyword matching on the preprocessed announcement. Keyword matching may combine synonym configuration, Boolean association rules, and location rules. Based on the results of keyword matching or other processing, the system may determine the category of the announcement (step 1504). For example, for the "CITIC Securities Co., Ltd. 2014 Annual Report" shown in FIG. 19, the extracted keywords may be "CITIC Securities", "2014", "Annual Report", "net income", "increase or decrease over the same period," and so on, and the announcement may then be classified into the financial report category. Step 1504 may be performed by the processing module 202.
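A minimal Python sketch of keyword matching followed by category determination is given below. It assumes a simple reading of "synonym configuration" (mapping variants to a canonical keyword) and of "Boolean association rules" (all-of and any-of keyword sets); location rules are omitted. The tables `SYNONYMS` and `CATEGORY_RULES` and both function names are hypothetical and do not come from the patent.

```python
# Hypothetical synonym table: variant phrase -> canonical keyword.
SYNONYMS = {
    "net income": "net profit",
    "yearly report": "annual report",
}

# Hypothetical Boolean rules: a category matches when every keyword in
# `all_of` is present and at least one keyword in `any_of` is present.
CATEGORY_RULES = {
    "financial report": {
        "all_of": {"annual report"},
        "any_of": {"net profit", "revenue"},
    },
    "major contract": {
        "all_of": {"contract"},
        "any_of": {"signing", "signed"},
    },
}


def extract_keywords(text):
    """Return the known keywords found in the text after synonym mapping."""
    lowered = text.lower()
    for variant, canonical in SYNONYMS.items():
        lowered = lowered.replace(variant, canonical)
    vocabulary = set()
    for rule in CATEGORY_RULES.values():
        vocabulary |= rule["all_of"] | rule["any_of"]
    return {kw for kw in vocabulary if kw in lowered}


def classify(text):
    """Return the first category whose Boolean rule is satisfied, else None."""
    keywords = extract_keywords(text)
    for category, rule in CATEGORY_RULES.items():
        if rule["all_of"] <= keywords and rule["any_of"] & keywords:
            return category
    return None


print(classify("CITIC Securities Co., Ltd. 2014 Annual Report: net income"))
```

Substring matching is used here only to keep the sketch short; a real system would match against the segmented words produced during preprocessing.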

After category determination, the system may generate a refined event based on rules (step 1505). In some embodiments of the present invention, after category determination is complete, attributes may be extracted from the text information. An attribute describes the nature of an entity or a relationship. For example, for the "2014 Annual Report of CITIC Securities Co., Ltd." shown in FIG. 19, the extracted entity may be "CITIC Securities", and the attributes extracted from the table in FIG. 19 may be one or more of revenue, net profit increase or decrease over the same period, total assets, total liabilities, total shareholders' equity, and so on. After attribute extraction is complete, the system may bind the entity and the attributes according to certain rules to generate a refined event. Step 1505 may be performed by the processing module 202. The rules for generating a refined event may include, without limitation, one or more of synonym configuration, Boolean association rules, and location rules. For the entity "CITIC Securities" extracted from the above announcement, combining it with one of its attributes, "net profit increase or decrease over the same period," may generate the refined event "CITIC Securities 2014 annual net profit growth of 116.2 percent."
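The binding of an entity to its attributes can be sketched as follows. This is only an illustration under assumptions: the `RefinedEvent` class, its field names, and the `build_events` helper are hypothetical, and the only data value used is the 116.2 percent net profit growth quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefinedEvent:
    """One entity bound to one attribute (hypothetical structure)."""
    entity: str
    period: str
    attribute: str
    change_pct: float

    def describe(self) -> str:
        direction = "growth" if self.change_pct >= 0 else "decline"
        return (f"{self.entity} {self.period} {self.attribute} "
                f"{direction} of {abs(self.change_pct):g}%")


def build_events(entity, period, attributes):
    """Create one refined event per extracted attribute."""
    return [RefinedEvent(entity, period, name, pct)
            for name, pct in attributes.items()]


events = build_events("CITIC Securities", "2014 annual", {"net profit": 116.2})
print(events[0].describe())
```

Keeping the event as structured data rather than a string makes it straightforward to generate sentences at different levels of generality later, for example by dropping the entity name.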

After generating a refined event, the system may generate natural language sentences for the refined event (step 1506). Step 1506 may be performed by the natural language processing module 203. For the above announcement, the system may generate three natural language sentences, "CITIC Securities annual report net profit rose more than 100%", "brokerage industry annual report net profit growth of more than 100%", and "annual report net profit growth of more than 100%", which correspond to three levels: the individual stock, the industry sector, and the entire stock market. For the three or more generated natural language sentences, the system may perform an individual stock event backtest, an industry event backtest, and a whole-market backtest, respectively (step 1507). Step 1507 may be performed by the backtesting module 204. After backtesting is complete, the system may match the backtesting results to a text template to generate a backtesting report (step 1508). Step 1508 may be performed by the backtesting module 204. For the above announcement, a sample backtesting report generated by the system may read: "Over the past year, for all A-share financial report announcements, the average next-day closing return was 0.48% and the probability of a rise was 47.77%. Of these, the securities industry published 165 similar announcements, with an average next-day closing return of 0.72% and a probability of a rise of 46.67%; the probability that the stock falls the next day is relatively high and the probability of profit is low. Optimal strategy: the probability of a rise is highest when selling at the close after holding for 11 days, with an average return of 9.40%." Because the probabilities of a rise and a fall are distributed around 50%, the rating is determined to be "insignificant."
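The statistics quoted in such a report (average next-day return, probability of a rise, and a rating) can be computed as in the minimal Python sketch below. The function names, the report wording, and in particular the rating rule (a rise probability within 5 percentage points of 50% counts as "insignificant") are assumptions for illustration; the patent does not state the threshold it uses.

```python
from statistics import mean


def summarize_next_day(returns_pct):
    """Return (average next-day return in %, share of events with a rise in %)."""
    if not returns_pct:
        raise ValueError("no historical events matched")
    avg = mean(returns_pct)
    up_prob = 100.0 * sum(r > 0 for r in returns_pct) / len(returns_pct)
    return avg, up_prob


def rate(up_prob_pct, band=5.0):
    """Assumed rule: within `band` points of 50% counts as 'insignificant'."""
    if up_prob_pct >= 50.0 + band:
        return "good"
    if up_prob_pct <= 50.0 - band:
        return "bad"
    return "insignificant"


def render_report(scope, returns_pct):
    """Fill a simple text template with the backtest statistics."""
    avg, up = summarize_next_day(returns_pct)
    return (f"Over the sample, {scope}: average next-day closing return "
            f"{avg:.2f}%, probability of a rise {up:.2f}% ({rate(up)}).")


# Made-up next-day returns (in percent) for four historical events.
print(render_report("similar announcements", [1.0, -0.5, 2.0, -1.5]))
```

Under the assumed band, the rise probabilities quoted in the text (46.67% and 47.77%) would be rated "insignificant", which agrees with the rating given in the sample report.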

Note that the above description of the information analysis flowchart is provided only to facilitate understanding of the invention and should not be considered the only embodiment. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from the principles and structure of the invention, and such modifications and changes remain within the scope of the claims of the present invention. For example, in step 1504 the system may directly collect information entered by the user and convert it into a natural language sentence (step 1506). Similarly, after information collection is complete, the system may proceed directly to text processing (step 1503). Similarly, after text processing is complete, the system may generate the refined event directly from the processed text, in which case step 1504 is not required.

FIG. 16 is a flowchart of interaction with the information analysis system. In step 1601, the information analysis system receives a natural language sentence. The natural language sentence may be entered directly by the user, or may be obtained by processing text such as announcements and news. In particular, the user may enter a natural language sentence through the interface provided by the information analysis system (see FIG. 18). The user may enter any natural language sentence. For example, in the financial field, the user may enter "000826, signing major contracts"; based on the input natural language sentence, the system can retrieve the major contracts signed by the company corresponding to code 000826. Numbers and dates in the natural language sentence may be in any format (see FIG. 18).

In step 1602, the information analysis system processes the natural language sentence received in step 1601. Processing may include, without limitation, one or a combination of word segmentation, entity recognition, normalization of numbers and units, text categorization, event attribute extraction, refined event recognition, and others. In step 1603, the information analysis system backtests the natural language sentence processed in step 1602, and then generates a backtesting report in step 1604. The specific content of the backtesting report is described in detail with reference to FIG. 18.

FIG. 17 is a schematic diagram of an interactive interface of the information analysis system for news or announcements. Referring to FIG. 2, the interface may be generated by the backtesting module 204 and displays the backtesting results. The interface may be displayed on various electronic devices. The electronic device may include, without limitation, one or more of cell phones, personal computers, tablet computers, PDAs, smart watches, smart appliances, smart vehicles, and the like. In the address bar, the user may enter the uniform resource locator (URL) of any news item or announcement to read the announcement or news item and its analysis. Alternatively, the user may enter the IP (Internet Protocol) address of any announcement or news item.

In the query box 1701, the user may enter the full title of an announcement or news item. Alternatively, the user may enter a keyword of the news item or announcement and then select a specific item from the list of results. After an announcement or news item is selected, its title, body text, and backtesting report may be displayed on the interactive interface. Area 1702 may display all or part of the body text of the announcement or news item; the user may view the body text using a mouse, keyboard, touch screen, voice control, or touch pad. Area 1703 may display the backtesting report for the selected announcement or news item. The backtesting report may include, without limitation, a display of historical data, for example:

"Over the past year, for all A-share announcements of signing a major contract, the average next-day closing return was 0.38% and the probability of a rise was 47.62%.

Of these, XX Pharmaceuticals published 16 similar announcements, with an average next-day closing return of 0.29% and a probability of a rise of 68.75%; the probability that the stock rises the next day is high, and the probability of profit is high."

In addition to historical data, a rating of the announcement or news item and a recommended strategy may also be displayed in area 1703. The recommended strategy is the strategy with the best historical performance in backtesting over the most recent period; for example, the recommended strategy may be: the probability of a rise is highest when selling at the close after holding for one day, with an average return of 0.29%. The rating may be good, bad, or insignificant. Area 1704 may display the latest news or announcements for the user's convenience. They may be displayed as a list showing only the title and time of each item. Area 1705 may display information about the selected announcement or news item, such as its type, its release time, the name of the security involved, the securities code, and the announcement (news) number.

The above description merely shows particular embodiments of the present invention and should not be considered the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from the principles and structure of the invention, and such modifications and changes remain within the scope of the claims of the present invention.

FIG. 18 is a schematic diagram of an interface of the information analysis system for user input. Referring to FIG. 2, the interface may be generated by the backtesting module 204 and displays the backtesting results. The interface may be displayed on various electronic devices, which may include, without limitation, one or more of cell phones, personal computers, tablet computers, PDAs, smart watches, smart appliances, smart vehicles, and the like. In the address bar, the user may enter the uniform resource locator (URL) of any text to read the text and its analysis results. Alternatively, the user may enter the IP (Internet Protocol) address of any text.

In area 1801, the user may enter any natural language sentence. For example, in the financial field, the user may input "10 December the 20th average bond; amplitude less than 3%; over two descending order" or "dividend yield more than 3% for two consecutive years; EPS of greater than 2 million; market capitalization of less than 5 billion year on year revenue growth from small to large." Numbers and dates in the input sentence may be expressed in any format. Dates may be expressed as, for example, last week, 5 weeks, last month, last quarter, the year before last, the last N days, the last N weeks, and so on. Numbers may be written as, for example, one third, 1/3, 5 million, 5%, 5 percent, and so on. A number may also be a range, such as 5 to 10 yuan, 5-10%, etc. A sorting rule may also be added to the input sentence. Sorting rules may be, for example, one or more of: ratio from large to small, ratio from small to large, price from small to large, price change from large to small, turnover from large to small, turnover from small to large, capital flow from large to small, outstanding from small to large, DDE from large to small, market capitalization from small to large, basic earnings per share from large to small, sales margin from large to small, gross profit margin from large to small, net assets (year-on-year growth rate) from large to small, return on net assets (ROE) from large to small, times interest earned from large to small, operating income (year-on-year growth rate) from large to small, and so on.
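Accepting numbers in several formats can be sketched with a small Python parser. This is an illustrative assumption about how such input might be normalized, not the patent's method: the function name `parse_quantity`, the returned `(low, high, unit)` tuple, and the supported forms (integers, decimals, simple fractions, and ranges written with "-" or "to") are all hypothetical. Unit words such as "million" are returned as-is rather than applied as multipliers.

```python
import re
from fractions import Fraction

# A number is either a simple fraction ("1/3") or an integer/decimal ("5", "2.5").
_NUM = r"(?:\d+/\d+|\d+(?:\.\d+)?)"
# Optional unit: a percent sign or a single word such as "yuan".
_UNIT = r"(%|[A-Za-z]+)?"
_RANGE = re.compile(rf"^\s*({_NUM})\s*(?:-|to)\s*({_NUM})\s*{_UNIT}\s*$")
_SINGLE = re.compile(rf"^\s*({_NUM})\s*{_UNIT}\s*$")


def _to_float(token):
    # Fraction parses both "1/3" and "2.5".
    return float(Fraction(token))


def parse_quantity(text):
    """Parse '5-10%', '5 to 10 yuan', '1/3' or '5%' into (low, high, unit)."""
    m = _RANGE.match(text)
    if m:
        low, high, unit = m.groups()
        return _to_float(low), _to_float(high), unit
    m = _SINGLE.match(text)
    if m:
        value, unit = m.groups()
        number = _to_float(value)
        return number, number, unit
    return None


print(parse_quantity("5-10%"))
print(parse_quantity("5 to 10 yuan"))
```

A single value is returned as a degenerate range so that downstream filtering code can treat both cases the same way.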

In area 1802, the user may set the analysis strategy. For example, the user may set the time range, stock position, buy time, holding period, take-profit condition, stop-loss condition, transaction fee rate, and so on. Specifically, the user may set the buy time to the opening of the next day, and may set the take-profit condition to "greater than 25%, take profit on a 5% retracement." After setting the contents of area 1801 and area 1802, the user may click the search button, and a backtesting report for the input natural language sentence is generated. Area 1803 may display the rating and the recommended strategy generated from the backtesting report. The rating may include the estimated maximum expected annualized return, the maximum success rate, and so on. Area 1804 may display the backtesting report for the input natural language sentence. The backtesting report may include, without limitation, backtest data analysis, a cumulative return chart, return distribution, transaction history query, and so on. Backtest data may include, without limitation, the holding period, average return per trade, minimum return per trade, expected annualized return, number of trades, profit/loss ratio, success rate, maximum retracement rate, weekly beat rate, Sharpe ratio, maximum number of consecutive days with no stock picked, average number of stocks picked per day, etc.
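Several of the backtest figures listed above can be computed from a sequence of per-trade returns. The Python sketch below uses conventional definitions of success rate, profit/loss ratio, and maximum retracement (drawdown); it is an assumption that the patent intends these definitions, and the function names and sample returns are hypothetical.

```python
def success_rate(trade_returns):
    """Fraction of trades with a positive return."""
    wins = sum(r > 0 for r in trade_returns)
    return wins / len(trade_returns)


def profit_loss_ratio(trade_returns):
    """Average gain of winning trades divided by average loss of losing trades."""
    gains = [r for r in trade_returns if r > 0]
    losses = [-r for r in trade_returns if r < 0]
    if not gains or not losses:
        return None  # undefined without at least one win and one loss
    return (sum(gains) / len(gains)) / (sum(losses) / len(losses))


def max_retracement(trade_returns):
    """Largest peak-to-trough fall of the compounded equity curve."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Made-up per-trade returns expressed as fractions (0.10 means +10%).
sample = [0.10, -0.05, 0.20, -0.10]
print(success_rate(sample), profit_loss_ratio(sample), max_retracement(sample))
```

The returns are compounded trade by trade, which assumes the full position is reinvested in each trade; other position-sizing settings from area 1802 would change the equity curve.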

The above description merely shows particular embodiments of the present invention and should not be considered the only embodiments. It will be apparent to those skilled in the art that, after understanding the content and principles of the present invention, various modifications and changes in form and detail may be made without departing from the principles and structure of the invention, and such modifications and changes remain within the scope of the claims of the present invention.

The above description of the field of application is only a specific example and should not be considered the only possible embodiment. Clearly, for those skilled in the art, after understanding the basic principles of the information analysis method and system, various modifications and changes in form and detail may be made to the fields in which the method and system are applied without departing from those principles, and such modifications and changes remain within the scope described above. Any system whose data can be organized into a structure may use the system described in the present invention to implement information analysis. For example, the invention may be used as a browser plug-in: when a user browsing a site needs an analysis of the news or announcements on the current web page, the plug-in backtests the news or announcement against historical information and gives a prediction. In the same manner, the system may be embedded in a company's financial statement system for intelligent data analysis. In addition, data may be collected by various sensors; for example, a temperature sensor, a humidity sensor, or a wind sensor may read environmental data, and the system may analyze historical environmental trends and predict future environmental changes. In medicine, the effect of the same drug on people of different ages may be backtested, and for the symptoms of a disease such as a cold, the number of days to recovery may be estimated from analysis of historical data, and so on.

Claims (18)

  1. An information analysis system, comprising:
    a computer-readable storage medium storing executable modules, the modules comprising:
    a collection module capable of collecting information;
    a processing module capable of preprocessing the collected information and extracting an event from the preprocessed information;
    a natural language processing module capable of generating a natural language sentence based on the extracted event;
    a backtesting module capable of obtaining history information according to the generated natural language sentence and generating a backtesting result in combination with the history information; and
    a processor capable of executing the executable modules stored on the computer-readable storage medium.
  2. The information analysis system according to claim 1, further comprising a database capable of storing the collected information, the preprocessed information, the extracted event, the natural language sentence, the history information, and the backtesting result.
  3. The system according to claim 2, wherein the database comprises an original information database, a text database, a text preprocessing database, an entity database, an event attribute database, a keyword database, a text categorization database, a history information database, a natural language processing database, an event recognition database, a backtesting module database, a text template database, and a dictionary database.
  4. The system according to claim 1, wherein the processing module further comprises a format conversion module, a text processing module, an attribute extraction module, and an event recognition module.
  5. The system according to claim 4, wherein the processing module further comprises a text classification module.
  6. The system according to claim 1, wherein the methods employed by the processing module comprise chi-square statistics, information gain, mutual information, odds ratio, cross entropy, difference of information between classes, keyword statistics tree, Rocchio, naive Bayes, neural networks, support vector machines, linear least squares fit, the nearest neighbor algorithm kNN, genetic algorithms, sentiment classification, maximum entropy, Generalized Instance Set, synonym configuration, Boolean association rules, location rules, and machine learning.
  7. The system according to claim 1, wherein the natural language processing module is capable of receiving information from the collection module.
  8. The system according to claim 1, wherein the backtesting module further performs a backtesting information determination, the determination giving an evaluation of the backtest according to the backtesting result.
  9. The system according to claim 1, wherein the backtesting result can be presented to a user.
  10. An information analysis method, comprising:
    collecting information;
    extracting an event from the information;
    generating a natural language sentence according to the event;
    obtaining history information according to the natural language sentence; and
    performing a backtesting analysis on the natural language sentence in combination with the history information.
  11. The method according to claim 10, wherein collecting information comprises collecting user-input information and non-user-input information, the sources of the non-user-input information including communication terminals and servers.
  12. The method according to claim 11, wherein collecting information comprises collecting announcement information and news information.
  13. The method according to claim 10, wherein extracting the event further comprises entity recognition and attribute extraction.
  14. The method according to claim 13, wherein the entity recognition further comprises format conversion, text word segmentation, and normalization of numbers and units.
  15. The method according to claim 13, wherein the attribute extraction can be achieved by a system-defined model.
  16. The method according to claim 10, wherein the natural language sentence can be generated according to information input by a user.
  17. The method according to claim 10, wherein the natural language sentence can be further expanded according to the event category.
  18. The method according to claim 10, wherein a backtesting result for the natural language sentence can be generated according to the type of backtesting information.
PCT/CN2015/098086 2015-12-21 2015-12-21 Information analysis system and method based on event regression test WO2017107010A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/098086 WO2017107010A1 (en) 2015-12-21 2015-12-21 Information analysis system and method based on event regression test

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/098086 WO2017107010A1 (en) 2015-12-21 2015-12-21 Information analysis system and method based on event regression test

Publications (1)

Publication Number Publication Date
WO2017107010A1 (en) 2017-06-29

Family

ID=59088809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/098086 WO2017107010A1 (en) 2015-12-21 2015-12-21 Information analysis system and method based on event regression test

Country Status (1)

Country Link
WO (1) WO2017107010A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120240020A1 (en) * 2002-09-16 2012-09-20 Mckeown Kathleen R System and method for document collection, grouping and summarization
CN103473263A (en) * 2013-07-18 2013-12-25 大连理工大学 News event development process-oriented visual display method
CN103488663A (en) * 2012-06-11 2014-01-01 国际商业机器公司 System and method for automatically detecting and interactively displaying information about entities, activities, and events from multiple-modality natural language sources
US20150339269A1 (en) * 2014-05-23 2015-11-26 Alon Konchitsky System and method for generating flowchart from a text document using natural language processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15911010

Country of ref document: EP

Kind code of ref document: A1