CN105843889B - Credibility-based data acquisition method and system for big data and common data - Google Patents

Publication number
CN105843889B
Authority
CN
China
Prior art keywords
data
credibility
target data
group
reliability
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610164635.4A
Other languages
Chinese (zh)
Other versions
CN105843889A (en)
Inventor
朱定局
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Application filed by South China Normal University
Priority to CN201610164635.4A
Publication of CN105843889A
Application granted
Publication of CN105843889B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Abstract

The invention relates to a data acquisition method and system. The method comprises the following steps: obtaining acquisition conditions, and obtaining a plurality of target data according to the acquisition conditions; classifying the target data according to a preset feature matching degree to obtain a data group; obtaining the individual credibility corresponding to each target data, and obtaining the group credibility of the data group according to the individual credibilities; judging whether the group credibility is greater than or equal to a preset value; and if so, collecting the corresponding target data in the data group and storing it in a target database or a big data storage library. The data group consisting of the target data is thus screened according to the group credibility and the preset value, and the corresponding target data is collected only when the group credibility is greater than or equal to the preset value, so that unreliable data is not collected and the accuracy of data acquisition is improved.

Description

Credibility-based data acquisition method and system for big data and common data
Technical Field
The invention relates to the technical field of data processing, in particular to a credibility-based data acquisition method and system for big data and common data.
Background
When data is collected, for example when big data is collected, the traditional method usually collects data only according to the data type specified by the system or database, and then stores the collected data directly in the system or database for later use. For example, when a language database that stores language information needs the correct interpretation or pronunciation of a word, language information containing an interpretation or pronunciation of the word is collected and put directly into the language database, without checking whether that interpretation or pronunciation is correct. When the credibility of the data source is unclear, or when the same item corresponds to data collected from several different sources, the traditional data acquisition method cannot check the correctness of the data. Erroneous or contradictory data is then easily stored, and the acquisition accuracy is low.
Disclosure of Invention
In view of the above, it is necessary to provide a data acquisition method and system for improving the acquisition accuracy.
A method of data acquisition comprising the steps of:
acquiring acquisition conditions, and acquiring a plurality of target data according to the acquisition conditions;
classifying the target data to obtain a data group;
respectively obtaining individual credibility corresponding to each target data, and obtaining group credibility of the data group according to the individual credibility;
judging whether the group credibility is greater than or equal to a preset value;
and if so, collecting the corresponding target data in the data group and storing it in a target database or a big data storage library.
A data acquisition system comprising:
a data obtaining module, configured to obtain acquisition conditions and obtain a plurality of target data according to the acquisition conditions;
a data classification module, configured to classify the target data to obtain a data group;
a group credibility calculation module, configured to obtain the individual credibility corresponding to each target data and obtain the group credibility of the data group according to the individual credibilities;
a credibility analysis module, configured to judge whether the group credibility is greater than or equal to a preset value;
and a data collection module, configured to collect the corresponding target data in the data group and store it in a target database or a big data storage library when the group credibility is greater than or equal to the preset value.
According to the data acquisition method and the data acquisition system, the acquisition conditions are acquired, and after a plurality of target data are acquired according to the acquisition conditions, the target data are classified to obtain a data group; then respectively obtaining individual credibility corresponding to each target data, and obtaining group credibility of the data group according to the individual credibility; and judging whether the group credibility is greater than or equal to a preset value, if so, acquiring corresponding target data in the data group and storing the target data in a target database or a big data storage library. Therefore, the data group consisting of the target data is screened according to the group credibility and the preset value, and the corresponding target data is acquired when the group credibility is greater than or equal to the preset value, so that unreliable data are prevented from being acquired, and the accuracy of data acquisition is improved.
Drawings
FIG. 1 is a flow chart of a data collection method according to the present invention in one embodiment;
FIG. 2 is a flowchart, in one embodiment, of the step of looking up the credibility of the collected object according to the identity information and taking that credibility as the individual credibility of the corresponding target data;
FIG. 3 is a flow chart of a data acquisition method of the present invention in another embodiment;
FIG. 4 is a flowchart, in one embodiment, of the step of correcting the credibility of the collected object according to the feedback information to obtain a new initial credibility of the collected object;
FIG. 5 is a block diagram of a data acquisition system in accordance with the present invention;
FIG. 6 is a block diagram of a data acquisition system according to another embodiment of the present invention;
FIG. 7 is a block diagram of a reliability correction module in an embodiment.
Detailed Description
Big data refers to a data set that cannot be captured, managed and processed in an affordable time frame using conventional software tools, and is characterized by a huge amount and difficulty in collection, processing and analysis.
Common data, as used herein, refers to data that is not big data.
Credibility is the degree to which a person or thing can be trusted; for a data group, it is an empirical measure of how likely the group is to be true.
Referring to fig. 1, a data acquisition method in an embodiment of the present invention is credibility-based, is oriented toward both big data and common data, and includes the following steps.
S110: acquiring acquisition conditions, and acquiring a plurality of target data according to the acquisition conditions.
The acquisition condition refers to information that specifies the characteristics of the data to be acquired, including objects and attributes. A plurality of target data that all satisfy the acquisition conditions can be obtained from the acquisition conditions. The target data may be big data or common data.
In one embodiment, the acquisition condition specifies the text information, Mandarin pronunciation information, dialect pronunciation information, and the like of a specified word; that is, the object is the specified word, and the attributes include text information, Mandarin pronunciation information, dialect pronunciation information, and the like. Correspondingly, the target data obtained according to the acquisition condition comprises text information and/or voice information. For example, when the acquisition condition is the voice information of the word, the target data consists of the recordings of that word made by user A, user B, and user C.
S130: classifying the target data to obtain a data group.
In one embodiment, step S130 includes step 11 and step 12.
Step 11: extracting preset features of the target data.
The preset features can be selected according to the acquisition conditions of the target data. For example, in this embodiment, the preset feature is the text information and/or voice information of the word specified in the acquisition condition.
Step 12: taking the target data whose preset features have a matching degree greater than or equal to the preset matching degree as a data group.
The preset matching degree can be selected according to actual conditions. If the preset feature matching degree is greater than or equal to the preset matching degree, it indicates that the preset features of the corresponding target data are relatively similar and can be classified into one category. Through the classification according to the matching degree of the preset features, the similar target data can be conveniently and uniformly processed, and the efficiency of multi-data acquisition is improved.
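Steps 11 and 12 can be sketched as follows. This is only a minimal illustration under stated assumptions: the patent does not say how the matching degree is computed or how groups are formed, so the `matching_degree` function (a character-level similarity from Python's `difflib`), the greedy "compare with the first member of each group" strategy, and the sample strings are all hypothetical.

```python
from difflib import SequenceMatcher


def matching_degree(a: str, b: str) -> float:
    """Hypothetical matching degree in [0, 1] between two preset features."""
    return SequenceMatcher(None, a, b).ratio()


def classify(features: list[str], preset_matching_degree: float) -> list[list[int]]:
    """Greedy grouping (an assumption, not the patent's stated algorithm).

    Each item joins the first existing group whose first member matches it
    with a degree >= preset_matching_degree; otherwise it starts a new group.
    Returns groups as lists of indices into `features`.
    """
    groups: list[list[int]] = []
    for index, feature in enumerate(features):
        for group in groups:
            representative = features[group[0]]
            if matching_degree(representative, feature) >= preset_matching_degree:
                group.append(index)
                break
        else:
            # No existing group matched, so this item starts a new data group.
            groups.append([index])
    return groups


# Hypothetical text features extracted from four target data items.
sample_features = ["ni hao", "ni hao", "ni haoo", "zai jian"]
sample_groups = classify(sample_features, preset_matching_degree=0.8)
```

Raising the preset matching degree makes groups smaller and more homogeneous, while lowering it merges more loosely related target data into one group.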
S150: obtaining the individual credibility corresponding to each target data, and obtaining the group credibility of the data group according to the individual credibilities.
It will be appreciated that, because the preset features of the target data within a data group are relatively similar, their true credibilities are also similar, and so the group credibility of a data group can represent the true credibility of each target data in that group.
In one embodiment, the step of obtaining the individual credibility corresponding to each target data in step S150 includes step 21 and step 22.
Step 21: obtaining, for each target data, the identity information of the collected object that provided the target data.
The identity information of the collected object is information that identifies the collected object. Each collected object corresponds to unique identity information. In this embodiment, the collected object is a person, that is, the target data is provided by the collected person. For example, if the target data is the voice information of a word recorded by user A, then user A is the collected person for that target data. Specifically, in this embodiment, the identity information of the collected object is the identification number of the collected person. It is understood that in other embodiments the collected object may also be a website, in which case the identity information of the collected object is the website address.
Step 22: looking up the credibility of the collected object according to the identity information, and taking the credibility of the collected object as the individual credibility of the corresponding target data.
For example, in the embodiment where the target data includes text information and/or voice information of a word, step 22 is specifically to obtain the credibility of the collected object from the language database. The language database comprises a plurality of text information and/or voice information, identity information of a collected object of each text information and/or voice information, credibility of each identity information, and association relations between the text information and/or voice information and the identity information and the credibility.
It is to be understood that, in other embodiments, the individual credibility may also be pre-stored in correspondence with the target data, that is, each target data corresponds to one individual credibility, and as long as the target data is obtained, the individual credibility may be obtained according to the association correspondence.
In one embodiment, the step of obtaining the group credibility of the data group according to the individual credibilities in step S150 includes: calculating the average of the individual credibilities of all target data in the data group to obtain the group credibility of the data group.
For example, if the individual credibilities of the target data in a certain data group are 0.5, 0.4, 0.6, and 1, the group credibility of the data group is (0.5+0.4+0.6+1)/4 = 0.625. It is understood that in other embodiments, other calculation methods may be used to obtain the group credibility.
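The averaging embodiment can be written as a short sketch. The function name `group_credibility` is an illustrative choice, and rejecting an empty group is an assumption (the text does not discuss empty groups).

```python
def group_credibility(individual_credibilities: list[float]) -> float:
    """Arithmetic mean of the individual credibilities in one data group."""
    if not individual_credibilities:
        # Assumption: an empty data group has no defined group credibility.
        raise ValueError("a data group must contain at least one target data item")
    return sum(individual_credibilities) / len(individual_credibilities)


# The worked example from the text: (0.5 + 0.4 + 0.6 + 1) / 4
example = group_credibility([0.5, 0.4, 0.6, 1])
```

Other aggregates, such as a median or a weighted mean, could be substituted here, since the text allows other calculation methods.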
S170: judging whether the group credibility is greater than or equal to a preset value. If not, the currently obtained data group does not meet the requirement and may be an erroneous data group, so it is not collected; if so, step S190 is executed.
The preset value can be set according to the required data acquisition accuracy. In this embodiment, the preset value is 0.6. It can be understood that in other embodiments, if higher acquisition accuracy is required, the preset value can be raised appropriately, for example to 0.8; if lower acquisition accuracy is acceptable, the preset value can be lowered appropriately, for example to 0.5.
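As a small illustration of how the preset value changes the outcome of S170, the sketch below applies the three thresholds mentioned in the text (0.5, 0.6, 0.8) to a group credibility of 0.625. The function and variable names are hypothetical.

```python
def meets_preset_value(group_credibility: float, preset_value: float = 0.6) -> bool:
    """S170: the data group is collected only if its credibility >= preset value."""
    return group_credibility >= preset_value


# Decision for a group credibility of 0.625 under each threshold from the text.
decisions = {
    preset: meets_preset_value(0.625, preset)
    for preset in (0.5, 0.6, 0.8)
}
```

With the default 0.6 the group is collected, while the stricter 0.8 setting rejects the same group.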
S190: collecting the corresponding target data in the data group and storing it in a target database or a big data storage library.
Wherein, the target database refers to a traditional database for storing common data, such as a relational database; a big data store refers to a store for storing big data. When the collected data is common data, the data is stored in a target database, and when the collected data is big data, the data is stored in a big data storage library.
For example, in the embodiment where the target data is text information and/or voice information of a word, the corresponding target data in the collected data group is stored in the language database.
The corresponding target data in the data group with the group credibility larger than or equal to the preset value is collected and stored in the target database or the big data storage library, and the target data is screened according to the group credibility, so that the data collection accuracy can be improved.
In one embodiment, step S190 includes: collecting all target data contained in the data group and storing it in a target database or a big data storage library.
Collecting all target data in a data group whose group credibility is greater than or equal to the preset value verifies the correctness of the data while collecting multiple data items at once, which improves the data collection efficiency.
In another embodiment, step S190 includes: finding the target data with the highest individual credibility in the data group and storing it in a target database or a big data storage library.
Collecting only the target data with the highest individual credibility in a data group whose group credibility is greater than or equal to the preset value selects the best target data, which improves the data collection accuracy as far as possible.
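The two embodiments of S190 differ only in which members of a qualifying group are kept. A minimal sketch follows, assuming each group member is represented as a `(target_data, individual_credibility)` pair; that representation, the `store_all` flag, and the sample names are illustrative and not taken from the patent.

```python
def select_targets(group: list[tuple[str, float]], store_all: bool) -> list[str]:
    """Pick the target data to store from a group that already passed S170.

    store_all=True  -> first embodiment: keep every target data item.
    store_all=False -> second embodiment: keep only the item with the
                       highest individual credibility.
    """
    if not group:
        return []
    if store_all:
        return [data for data, _ in group]
    best_data, _ = max(group, key=lambda item: item[1])
    return [best_data]


# Hypothetical group: recordings with their individual credibilities.
sample_group = [("rec_a", 0.5), ("rec_b", 0.4), ("rec_c", 0.6), ("rec_d", 1.0)]
all_items = select_targets(sample_group, store_all=True)
best_item = select_targets(sample_group, store_all=False)
```

When several items tie for the highest credibility, `max` returns the first one encountered; the text does not specify a tie-breaking rule.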
In one embodiment, referring to fig. 2, step 22 includes steps S151 to S157.
S151: judging, according to the identity information, whether an initial credibility of the collected object exists in the target database or the big data storage library. If so, step S153 is executed; if not, step S155 is executed.
S153: taking the initial credibility as the credibility of the collected object.
S155: taking the preset credibility as the credibility of the collected object.
S157: taking the credibility of the collected object as the individual credibility of the corresponding target data.
The preset credibility can be set according to actual conditions. In this embodiment, the preset credibility is 0.5.
Checking whether an initial credibility exists for the collected object, and falling back to the default preset credibility when it does not, ensures that every collected object has a credibility and avoids the situation in which target data has no corresponding individual credibility.
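Steps S151 to S157 amount to a lookup with a default value. In the sketch below, the stored initial credibilities are modeled as an in-memory dictionary keyed by identity information; that storage model and the identifiers `"user_a"` and `"user_x"` are assumptions for illustration only.

```python
PRESET_CREDIBILITY = 0.5  # default from this embodiment


def individual_credibility(
    identity: str,
    stored_initial: dict[str, float],
    preset_credibility: float = PRESET_CREDIBILITY,
) -> float:
    """Return the individual credibility for target data from `identity`."""
    if identity in stored_initial:       # S151: does an initial credibility exist?
        return stored_initial[identity]  # S153: use the stored initial credibility
    return preset_credibility            # S155: fall back to the preset credibility


# Hypothetical store: only user_a has a recorded initial credibility.
store = {"user_a": 0.9}
known = individual_credibility("user_a", store)
unknown = individual_credibility("user_x", store)
```

The value returned is then used as the individual credibility of every target data item provided by that collected object (S157).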
In one embodiment, referring to fig. 3, after step S190, the method further includes: step S210 and step S230.
S210: obtaining feedback information on the target data.
The feedback information is the user's feedback on whether the target data is correct. For example, the feedback information may include information meaning "correct" or the like, and information meaning "wrong" or the like.
S230: correcting the credibility of the collected object according to the feedback information to obtain a new initial credibility of the collected object, and storing the new initial credibility in association with the collected object in a target database or a big data storage library.
The initial reliability of the collected object is corrected according to the feedback of the user, so that the accuracy of the initial reliability can be improved in time, more accurate reference can be provided for subsequent data collection, and the accuracy of the data collection is improved.
In one embodiment, the feedback information includes positive feedback and negative feedback. For example, "correct" indicates positive feedback and "wrong" indicates negative feedback. Referring to fig. 4, the step in S230 of correcting the credibility of the collected object according to the feedback information to obtain a new initial credibility of the collected object includes steps S231 to S235.
S231: judging whether the type of the feedback information is positive feedback. If so, the target data is correct and step S233 is executed; if not, the feedback information is negative feedback, the target data is incorrect, and step S235 is executed.
S233: increasing the credibility of the collected object by a preset difference value to obtain the new initial credibility of the collected object.
S235: decreasing the credibility of the collected object by a preset difference value to obtain the new initial credibility of the collected object.
The preset difference value can be set according to actual conditions. For example, in this embodiment, the preset difference value is 0.1. Thus, each time positive feedback is received, the initial credibility of the corresponding collected object is increased by 0.1; each time negative feedback is received, it is decreased by 0.1.
In this embodiment, the initial credibility is greater than or equal to 0 and less than or equal to 1. Step S233 is specifically:
Y = min(1, X + 0.1);
Step S235 is specifically:
Y = max(0, X - 0.1);
where X is the initial credibility of the collected object before correction, and Y is the initial credibility of the collected object after correction.
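The two correction formulas can be combined into one clamped update. The sketch below follows Y = min(1, X + 0.1) and Y = max(0, X - 0.1) directly; the function name and the boolean `positive` flag are illustrative choices.

```python
def correct_credibility(x: float, positive: bool, preset_difference: float = 0.1) -> float:
    """Return the corrected initial credibility Y for a prior value X.

    Positive feedback (S233): Y = min(1, X + preset_difference)
    Negative feedback (S235): Y = max(0, X - preset_difference)
    """
    if positive:
        return min(1.0, x + preset_difference)
    return max(0.0, x - preset_difference)


raised = correct_credibility(0.5, positive=True)        # approximately 0.6
capped = correct_credibility(0.95, positive=True)       # clamped at the upper bound
floored = correct_credibility(0.05, positive=False)     # clamped at the lower bound
```

The `min` and `max` calls keep the result inside the interval from 0 to 1 no matter how many consecutive positive or negative feedback items arrive.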
According to the data acquisition method, the acquisition conditions are acquired, a plurality of target data are acquired according to the acquisition conditions, and then the target data are classified according to the preset feature matching degree to obtain a data group; then respectively obtaining individual credibility corresponding to each target data, and obtaining group credibility of the data group according to the individual credibility; and judging whether the group credibility is greater than or equal to a preset value, if so, acquiring corresponding target data in the data group and storing the target data in a target database or a big data storage library. Therefore, the data group consisting of the target data is screened according to the group credibility and the preset value, and the corresponding target data is acquired when the group credibility is greater than or equal to the preset value, so that unreliable data are prevented from being acquired, and the accuracy of data acquisition is improved.
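The overall flow summarized above can be sketched end to end. The sketch assumes the target data has already been classified into groups, that each member is a `(target_data, identity)` pair, and that stored initial credibilities are held in a dictionary; these representations, the function name, and the sample identifiers are hypothetical. It uses the averaging embodiment, the store-all embodiment, and the example values 0.6 and 0.5 from the text.

```python
def collect_credible_data(
    groups: list[list[tuple[str, str]]],
    stored_credibility: dict[str, float],
    preset_value: float = 0.6,
    preset_credibility: float = 0.5,
) -> list[str]:
    """Return target data from every group whose mean credibility passes S170."""
    collected: list[str] = []
    for group in groups:
        if not group:
            continue  # assumption: empty groups are simply skipped
        # S150: individual credibility per item, with the preset default
        credibilities = [
            stored_credibility.get(identity, preset_credibility)
            for _, identity in group
        ]
        group_credibility = sum(credibilities) / len(credibilities)
        # S170 / S190: keep the whole group only if it meets the preset value
        if group_credibility >= preset_value:
            collected.extend(data for data, _ in group)
    return collected


stored = {"A": 0.9, "B": 0.7}
classified = [
    [("rec1", "A"), ("rec2", "B")],   # mean 0.8, collected
    [("rec3", "C"), ("rec4", "D")],   # unknown providers, mean 0.5, rejected
]
result = collect_credible_data(classified, stored)
```

In this example the second group is rejected because both providers are unknown and receive the default credibility of 0.5, which is below the preset value.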
Referring to fig. 5, a data acquisition system in an embodiment of the present invention is credibility-based, is oriented toward both big data and common data, and includes a data obtaining module 110, a data classification module 130, a group credibility calculation module 150, a credibility analysis module 170, and a data collection module 190.
The data obtaining module 110 is configured to obtain a collecting condition, and obtain a plurality of target data according to the collecting condition.
The acquisition condition refers to information that specifies the characteristics of the data to be acquired, including objects and attributes. A plurality of target data that all satisfy the acquisition conditions can be obtained from the acquisition conditions. The target data may be big data or common data.
In one embodiment, the acquisition condition specifies the text information, Mandarin pronunciation information, dialect pronunciation information, and the like of a specified word; that is, the object is the specified word, and the attributes include text information, Mandarin pronunciation information, and dialect pronunciation information. Correspondingly, the target data obtained according to the acquisition condition comprises text information and/or voice information. For example, when the acquisition condition is the voice information of the word, the target data consists of the recordings of that word made by user A, user B, and user C.
The data classification module 130 is configured to classify the target data to obtain a data group.
In one embodiment, the data classification module 130 is specifically configured to extract preset features of the target data, and to take the target data whose preset features have a matching degree greater than or equal to the preset matching degree as a data group.
The preset features can be selected according to the acquisition conditions of the target data. For example, in this embodiment, the preset feature is text information and/or speech information of a specified word in the collection condition.
The preset matching degree can be selected according to actual conditions. If the preset feature matching degree is greater than or equal to the preset matching degree, the corresponding target data is represented as similar data, and can be classified into one type. Through classifying according to the preset feature matching degree, the similar target data are conveniently and uniformly processed, and the efficiency of multi-data acquisition is improved.
The group credibility calculation module 150 is configured to obtain individual credibility of each target data, and obtain group credibility of the data group according to the individual credibility.
It will be appreciated that, because the preset features of the target data within a data group are relatively similar, their true credibilities are also similar, and so the group credibility of a data group can represent the true credibility of each target data in that group.
In one embodiment, the group credibility calculation module 150 includes an identity information obtaining unit (not shown), an individual credibility obtaining unit (not shown), and a calculation unit (not shown).
The identity information obtaining unit is configured to obtain, according to the target data, the identity information of the collected object that provided the target data. The identity information of the collected object is information that identifies the collected object. Each collected object corresponds to unique identity information. In this embodiment, the collected object is a person, that is, the target data is provided by the collected person. For example, if the target data is the voice information of a word recorded by user A, then user A is the collected person for that target data. Specifically, in this embodiment, the identity information of the collected object is the identification number of the collected person. It is understood that in other embodiments the collected object may also be a website, in which case the identity information of the collected object is the website address.
The individual credibility obtaining unit is configured to look up the credibility of the collected object according to the identity information and to take the credibility of the collected object as the individual credibility of the corresponding target data. For example, in an embodiment where the target data includes text information and/or voice information of a word, the individual credibility obtaining unit obtains the credibility of the collected object from the language database. The language database comprises a plurality of text information and/or voice information, the identity information of the collected object of each text information and/or voice information, the credibility of each identity information, and the association relations between the text information and/or voice information, the identity information, and the credibility.
It is to be understood that, in other embodiments, the individual credibility may also be pre-stored in correspondence with the target data, that is, each target data corresponds to one individual credibility, and as long as the target data is obtained, the individual credibility may be obtained according to the association correspondence.
The computing unit is used for obtaining the group credibility of the data group according to the individual credibility.
In one embodiment, the computing unit is specifically configured to calculate the average of the individual credibilities of all target data in the data group to obtain the group credibility of the data group.
For example, if the individual credibilities of the target data in a certain data group are 0.5, 0.4, 0.6, and 1, the group credibility of the data group is (0.5+0.4+0.6+1)/4 = 0.625. It is understood that in other embodiments, other calculation methods may be used to obtain the group credibility.
The credibility analysis module 170 is configured to judge whether the group credibility is greater than or equal to a preset value. If not, the correctness of the currently obtained data group does not meet the requirement; the data group may be an erroneous data group and is not collected. If so, the correctness of the data group meets the requirement.
The preset value can be set according to the required data acquisition accuracy. In this embodiment, the preset value is 0.6. It can be understood that in other embodiments, if higher acquisition accuracy is required, the preset value can be raised appropriately, for example to 0.8; if lower acquisition accuracy is acceptable, the preset value can be lowered appropriately, for example to 0.5.
The data collection module 190 is configured to collect the corresponding target data in the data group and store it in a target database or a big data storage library when the group credibility is greater than or equal to the preset value.
Wherein, the target database refers to a traditional database for storing common data, such as a relational database; a big data store refers to a store for storing big data. When the collected data is common data, the data is stored in a target database, and when the collected data is big data, the data is stored in a big data storage library.
The corresponding target data in the data group with the group credibility larger than or equal to the preset value is collected and stored in the target database or the big data storage library, and the target data is screened according to the group credibility, so that the data collection accuracy can be improved. For example, in the embodiment where the target data is text information and/or voice information of a word, the corresponding target data in the collected data group is stored in the language database.
In one embodiment, the data collection module 190 is specifically configured to collect all target data contained in the data group and store it in a target database or a big data storage library. Collecting all target data in a data group whose group credibility is greater than or equal to the preset value verifies the correctness of the data while collecting multiple data items at once, which improves the data collection efficiency.
In another embodiment, the data collection module 190 is specifically configured to find the target data with the highest individual credibility in the data group and store it in the target database or the big data storage library. Collecting only the target data with the highest individual credibility in a data group whose group credibility is greater than or equal to the preset value selects the best target data, which improves the data collection accuracy as far as possible.
In one embodiment, the individual credibility obtaining unit in the group credibility calculation module 150 is specifically configured to judge, according to the identity information, whether an initial credibility of the corresponding collected object exists in the target database or the big data storage library. When the initial credibility exists, it is taken as the credibility of the corresponding collected object; otherwise, the preset credibility is taken as the credibility of the collected object. The unit then takes the credibility of the collected object as the individual credibility of the corresponding target data.
The preset credibility can be specifically set according to actual conditions. In this embodiment, the predetermined reliability is 0.5.
By judging whether the initial credibility of the acquired object exists or not, if not, the default preset credibility is used as the credibility of the acquired object, so that each acquired person can be ensured to correspond to one credibility, and the condition that the target data does not have corresponding individual credibility is avoided.
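The lookup with a default can be illustrated by a short sketch. It assumes that a plain dictionary keyed by identity information stands in for the target database or big data repository; the function name `individual_credibility` is hypothetical.

```python
from typing import Dict

PRESET_CREDIBILITY = 0.5  # default value used in this embodiment


def individual_credibility(identity: str, stored: Dict[str, float],
                           preset: float = PRESET_CREDIBILITY) -> float:
    """Return the stored initial credibility of the acquired object,
    or the preset credibility when none has been stored yet."""
    # `stored` is an assumption standing in for the target database.
    return stored.get(identity, preset)
```

An acquired object with a stored value of 0 still counts as having an initial credibility, so it is not replaced by the default.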
In one embodiment, referring to fig. 6, the data acquisition system further includes a feedback information obtaining module 210 and a reliability correcting module 230.
The feedback information acquiring module 210 is configured to acquire feedback information of the target data.
The feedback information is the user's feedback on whether the target data is correct. For example, the feedback information may include information meaning "correct" or similar, and information meaning "wrong" or similar.
The reliability correction module 230 is configured to correct the reliability of the acquired object according to the feedback information to obtain a new initial reliability of the acquired object, and store the new initial reliability and the acquired object in association with each other in the target database or the big data repository.
The initial reliability of the collected object is corrected according to the feedback of the user, so that the accuracy of the initial reliability can be improved in time, more accurate reference can be provided for subsequent data collection, and the accuracy of the data collection is improved.
In one embodiment, the feedback information includes positive feedback and negative feedback. For example, "correct" indicates positive feedback and "error" indicates negative feedback. Referring to fig. 7, the reliability correction module 230 includes a feedback information judgment unit 231, a reliability improvement unit 233, a reliability reduction unit 235, and a data storage unit 237.
The feedback information judgment unit 231 is configured to judge whether the type of the feedback information is positive feedback. If so, the target data is correct. If not, the type of the feedback information is negative feedback and the target data is wrong.
The reliability improving unit 233 is configured to, when the type of the feedback information is positive feedback, improve the reliability of the acquired object according to a preset difference value to obtain a new initial reliability of the acquired object.
The reliability reducing unit 235 is configured to reduce the reliability of the acquired object according to a preset difference value to obtain a new initial reliability of the acquired object when the type of the feedback information is negative feedback.
The data storage unit 237 is configured to store the new initial credibility, in association with the acquired object, in the target database or the big data storage library. Storing the corrected initial credibility in association with the acquired object facilitates subsequent use.
The preset difference value can be set according to the actual situation. For example, in the present embodiment, the preset difference value is 0.1. Thus, each time positive feedback is obtained, the initial credibility of the corresponding acquired object is increased by 0.1; each time negative feedback is obtained, it is decreased by 0.1.
In this embodiment, the initial credibility is greater than or equal to 0 and less than or equal to 1. The reliability improving unit 233 obtains the new initial credibility according to:
Y = min(1, X + 0.1);
The reliability reducing unit 235 obtains the new initial credibility according to:
Y = max(0, X - 0.1);
where X is the initial credibility of the acquired object before correction, and Y is the initial credibility of the acquired object after correction.
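The clamped correction can be written as a short sketch. The function names and the dictionary standing in for the target database are assumptions made for illustration. Falling back to the preset credibility of 0.5 when no value is stored follows the lookup rule described earlier, but combining it with the update in one function is also an assumption.

```python
from typing import Dict

PRESET_DIFFERENCE = 0.1  # preset difference value used in this embodiment


def correct_credibility(x: float, positive: bool,
                        step: float = PRESET_DIFFERENCE) -> float:
    """Raise or lower a credibility by `step`, clamped to [0, 1]."""
    if positive:
        return min(1.0, x + step)   # Y = min(1, X + 0.1)
    return max(0.0, x - step)       # Y = max(0, X - 0.1)


def apply_feedback(store: Dict[str, float], identity: str, positive: bool,
                   preset: float = 0.5,
                   step: float = PRESET_DIFFERENCE) -> float:
    """Correct the acquired object's credibility and store the new value."""
    x = store.get(identity, preset)  # `store` is a hypothetical stand-in database
    store[identity] = correct_credibility(x, positive, step)
    return store[identity]
```

The clamping keeps the credibility inside the stated range however many consecutive positive or negative feedback items are received.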
In the data acquisition system, the data acquisition module 110 acquires acquisition conditions, and after a plurality of target data are acquired according to the acquisition conditions, the data classification module 130 classifies the target data according to a preset feature matching degree to obtain a data group; the group credibility calculation module 150 respectively acquires individual credibility corresponding to each target data, and acquires group credibility of the data group according to the individual credibility; the credibility analysis module 170 determines whether the group credibility is greater than or equal to a preset value, and if so, the data acquisition module 190 acquires corresponding target data in the data group and stores the target data in a target database or a big data storage library. Therefore, the data group consisting of the target data is screened according to the group credibility and the preset value, and the corresponding target data is acquired when the group credibility is greater than or equal to the preset value, so that unreliable data are prevented from being acquired, and the accuracy of data acquisition is improved.
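The overall flow summarized above can be sketched end to end. Several parts of this sketch are assumptions rather than the patent's method: the groups are taken as already classified, the group credibility is computed as the arithmetic mean of the individual credibilities (this passage does not state the aggregation rule), and all names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (identity of the acquired object, target data it provided)
Item = Tuple[str, str]


def group_credibility(group: List[Item], stored: Dict[str, float],
                      preset: float = 0.5) -> float:
    """Aggregate individual credibilities; the mean is an assumed rule."""
    return mean(stored.get(identity, preset) for identity, _ in group)


def acquire(groups: List[List[Item]], stored: Dict[str, float],
            preset_value: float, preset: float = 0.5) -> List[str]:
    """Keep the target data of every group whose credibility reaches preset_value."""
    accepted: List[str] = []
    for group in groups:
        # Empty groups are skipped because they have no credibility to average.
        if group and group_credibility(group, stored, preset) >= preset_value:
            accepted.extend(data for _, data in group)
    return accepted
```

Swapping in a different aggregation rule only requires changing `group_credibility`; the threshold check in `acquire` stays the same.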
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The embodiments above describe only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A data acquisition method is characterized by comprising the following steps:
acquiring acquisition conditions, and acquiring target data provided by a plurality of acquired objects which accord with the acquisition conditions according to the acquisition conditions; the acquisition condition refers to information for specifying characteristics of data to be acquired, and comprises an object and attributes; the object comprises a specified word, and the attribute comprises text information and/or voice information;
classifying the target data to obtain a data population, wherein the data population comprises: extracting preset features of the target data, and taking the target data with the matching degree of the preset features larger than or equal to the preset matching degree as a data group; the preset features are selected according to the acquisition conditions of the target data;
respectively obtaining individual credibility corresponding to each target data, and obtaining group credibility of the data group according to the individual credibility;
judging whether the group credibility is greater than or equal to a preset value;
if so, acquiring corresponding target data in the data group and storing the target data in a target database or a big data storage library;
the step of respectively acquiring individual credibility corresponding to each target data comprises:
respectively acquiring the identity information of an acquired object providing the target data according to the target data;
and searching the credibility of the acquired object according to the identity information, and taking the credibility of the acquired object as the individual credibility corresponding to the target data.
2. The data acquisition method according to claim 1, wherein the step of searching the credibility of the acquired object according to the identity information, and taking the credibility of the acquired object as the individual credibility corresponding to the target data comprises:
judging whether the initial credibility of the acquired object exists in the target database or the big data storage library or not according to the identity information;
if so, taking the initial reliability as the reliability of the acquired object;
if not, taking the preset credibility as the credibility of the acquired object;
and taking the credibility of the acquired object as the individual credibility corresponding to the target data.
3. The data acquisition method according to claim 2, wherein after the step of acquiring the corresponding target data in the data population and storing the corresponding target data in a target database or a big data repository, the method further comprises:
acquiring feedback information of the target data;
and correcting the reliability of the acquired object according to the feedback information to obtain a new initial reliability of the acquired object, and storing the new initial reliability and the acquired object into the target database or the big data storage library in a correlation manner.
4. The data acquisition method according to claim 3, wherein the step of modifying the reliability of the acquired object according to the feedback information to obtain a new initial reliability of the acquired object comprises:
judging whether the type of the feedback information is positive feedback;
if so, improving the reliability of the acquired object according to a preset difference value to obtain a new initial reliability of the acquired object;
if not, reducing the credibility of the acquired object according to a preset difference value to obtain a new initial credibility of the acquired object.
5. A data acquisition system, comprising:
the data acquisition module is used for acquiring acquisition conditions and acquiring target data provided by a plurality of acquired objects which accord with the acquisition conditions according to the acquisition conditions; the acquisition condition refers to information for specifying characteristics of data to be acquired, and comprises an object and attributes; the object comprises a specified word, and the attribute comprises text information and/or voice information;
the data classification module is used for classifying the target data to obtain a data group, and comprises: extracting preset features of the target data, and taking the target data with the matching degree of the preset features larger than or equal to the preset matching degree as a data group; the preset features are selected according to the acquisition conditions of the target data;
the group credibility calculation module is used for respectively obtaining the individual credibility of each corresponding target data and obtaining the group credibility of the data group according to the individual credibility;
the credibility analysis module is used for judging whether the group credibility is greater than or equal to a preset value;
the data acquisition module is used for acquiring corresponding target data in the data group and storing the corresponding target data in a target database or a big data storage library when the group credibility is greater than or equal to the preset value;
wherein the group credibility calculation module comprises:
the identity information acquisition unit is used for acquiring the identity information of the acquired object providing the target data according to the target data;
an individual reliability obtaining unit, configured to find reliability of the acquired object according to the identity information, and use the reliability of the acquired object as an individual reliability corresponding to the target data;
and the calculating unit is used for acquiring the group credibility of the data group according to the individual credibility.
6. The data acquisition system of claim 5, further comprising:
the feedback information acquisition module is used for acquiring feedback information of the target data;
and the reliability correction module is used for correcting the reliability of the acquired object according to the feedback information to obtain a new initial reliability of the acquired object, and storing the new initial reliability and the acquired object into the target database or the big data storage library in a correlation manner.
CN201610164635.4A 2016-03-21 2016-03-21 Credibility-based data acquisition method and system for big data and common data Active CN105843889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610164635.4A CN105843889B (en) 2016-03-21 2016-03-21 Credibility-based data acquisition method and system for big data and common data

Publications (2)

Publication Number Publication Date
CN105843889A CN105843889A (en) 2016-08-10
CN105843889B true CN105843889B (en) 2020-08-25

Family

ID=56587790

Country Status (1)

Country Link
CN (1) CN105843889B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant