CN117609594A - Data analysis method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117609594A
Authority
CN
China
Prior art keywords
determining
keywords
website
text information
target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311294235.1A
Other languages
Chinese (zh)
Inventor
隗伟
齐成斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Reso Consulting Co ltd
Original Assignee
Beijing Reso Consulting Co ltd
Application filed by Beijing Reso Consulting Co ltd
Priority to CN202311294235.1A
Publication of CN117609594A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a data analysis method and device, an electronic device, and a storage medium in the field of data processing. The method and device reduce the amount of useless data in data retrieval results.

Description

Data analysis method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data analysis method, apparatus, electronic device, and storage medium.
Background
In fields such as industry research and market big-data retrieval, massive amounts of data usually have to be retrieved from the internet and then integrated and filtered to obtain the required data. At present, relevant keywords are entered into a search engine and the data containing those keywords are retained. However, the data obtained in this way still include a large amount of useless data, which interferes with the subsequent use and analysis of the required data and degrades the final retrieval result.
Disclosure of Invention
In order to reduce useless data in a data retrieval result as much as possible, the application provides a data analysis method, a data analysis device, electronic equipment and a storage medium.
In a first aspect, the present application provides a data analysis method, which adopts the following technical scheme:
a method of data analysis, comprising:
searching according to set keywords to obtain a plurality of candidate websites;
extracting text information from each candidate website, wherein each piece of text information contains the keyword;
determining a quality score of each piece of text information and determining target text information based on the quality scores, wherein the quality score represents the degree of correlation between the text information and the keywords;
and generating a data report based on the target text information and the candidate website corresponding to each piece of target text information.
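The four steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the in-memory `CORPUS` dictionary is a hypothetical stand-in for a real search engine and crawler, and `placeholder_score` (a plain keyword-occurrence count) is a hypothetical stand-in for the quality score that steps S1031 to S1034 actually define.

```python
from dataclasses import dataclass

# Hypothetical stand-in for search results: URL -> page text.
CORPUS = {
    "https://example.com/a": "Sales in a certain industry kept growing. Sales rose 5%.",
    "https://example.com/b": "Weather report for the weekend.",
    "https://example.com/c": "A certain industry saw flat sales this quarter.",
}


@dataclass
class TextInfo:
    url: str
    text: str
    score: float = 0.0


def mentions(text, keywords):
    """True if the text records at least one keyword (case-insensitive)."""
    lower = text.lower()
    return any(k.lower() in lower for k in keywords)


def search(keywords):
    """S101 (stub): candidate websites are pages that mention a keyword."""
    return [url for url, page in CORPUS.items() if mentions(page, keywords)]


def extract(urls, keywords):
    """S102 (stub): take each page's text and keep it only if it records a keyword."""
    return [TextInfo(url, CORPUS[url]) for url in urls if mentions(CORPUS[url], keywords)]


def placeholder_score(text, keywords):
    """Hypothetical S103 score: total keyword occurrences, NOT the patent's formula."""
    lower = text.lower()
    return float(sum(lower.count(k.lower()) for k in keywords))


def analyze(keywords, threshold):
    infos = extract(search(keywords), keywords)
    for info in infos:
        info.score = placeholder_score(info.text, keywords)
    targets = [info for info in infos if info.score >= threshold]  # S103 screening
    return [(t.url, t.text) for t in targets]  # (website, text) pairs feed S104


targets = analyze(["certain industry", "sales"], threshold=3)
```

Here page `a` scores 3 and is kept, while page `c` scores 2 and is dropped; swapping in the real quality score only requires replacing `placeholder_score`.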
With this technical solution, the user sets keywords according to the data to be obtained, and searching with the set keywords yields candidate websites that record content related to the keywords. Because data content and information are recorded on these websites as text, the text information of each candidate website is extracted so that the data can be analyzed easily. A quality score representing the degree of correlation between each piece of text information and the keywords is then determined; through the quality score, the required target text information that is highly correlated with the keywords can be screened out more intuitively and accurately, which reduces the amount of useless data. Finally, a data report is generated from the target text information and the corresponding candidate websites, so that the user can conveniently view the data related to the keywords and visit the original websites where the data were recorded.
In another possible implementation manner, the determining the quality score of each text message and determining the target text message based on the quality score includes:
determining a first number of hit keywords in each piece of text information, a first number of times each keyword occurs in the text information, and the number of interval characters between every two adjacent keywords in the text information;
determining an average number of occurrences of the keywords based on the first numbers of times, and determining a variance of the numbers of interval characters;
calculating the quality score of each piece of text information based on the first number of hit keywords, the average number of times, the variance, and their respective first coefficients;
and determining the text information with the quality score reaching a preset score threshold as target text information.
In another possible implementation manner, each target text information includes a plurality of paragraphs, and the generating a data report based on the target text information and the candidate website corresponding to each target text information includes:
filtering out the paragraphs that do not contain the keywords from each piece of target text information to obtain candidate paragraphs;
determining the second target keywords hit in each candidate paragraph, a second number of times each second target keyword occurs in the candidate paragraph, and the sum of the second numbers of times;
determining a second quality score of each candidate paragraph based on the number of second target keywords hit in it, the sum of the second numbers of times, and their respective second coefficients;
selecting a preset number of candidate paragraphs in descending order of the second quality score;
and generating a data report based on the preset number of the to-be-selected paragraphs and the corresponding to-be-selected websites.
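The paragraph-level screening can be sketched as below. This is a minimal sketch, assuming the second quality score is a weighted sum of the two quantities named above; the coefficient values `c_hits=1.0` and `c_total=0.5` are hypothetical, and keyword matching uses plain case-insensitive substring counting, which would also count a keyword embedded inside a longer word.

```python
def second_quality_scores(paragraphs, keywords, c_hits=1.0, c_total=0.5):
    """Return (score, paragraph) pairs for paragraphs that record at least one keyword.

    c_hits and c_total are hypothetical second coefficients.
    """
    scored = []
    for para in paragraphs:
        lower = para.lower()
        counts = {k: lower.count(k.lower()) for k in keywords}
        hits = [k for k, n in counts.items() if n > 0]  # second target keywords
        if not hits:
            continue  # filter out paragraphs that record no keyword
        total = sum(counts[k] for k in hits)  # sum of the second numbers of times
        scored.append((c_hits * len(hits) + c_total * total, para))
    return scored


def top_paragraphs(paragraphs, keywords, preset_number):
    """Keep the preset number of paragraphs in descending order of second quality score."""
    scored = second_quality_scores(paragraphs, keywords)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [para for _, para in scored[:preset_number]]


paragraphs = [
    "Sales grew. Sales in a certain industry doubled.",
    "Unrelated text.",
    "Sales were flat.",
]
best = top_paragraphs(paragraphs, ["certain industry", "sales"], preset_number=1)
```

The first paragraph hits two keywords with three occurrences in total (score 3.5), the third hits one keyword once (score 1.5), and the second is filtered out, so only the first paragraph is kept.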
In another possible implementation, the method further includes:
determining the number of pieces of target text information in the data report that correspond to each candidate website;
acquiring the posting time of each piece of target text information corresponding to each candidate website;
determining, based on the posting times, the frequency with which each candidate website updates content related to the keywords;
determining a score of each candidate website based on the number of pieces of target text information corresponding to it, its update frequency, and their respective third coefficients;
determining a candidate website whose score reaches a preset score threshold as a target website;
and searching the target website for the keywords according to its update frequency, and updating the data report if new text information is found.
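The website scoring and re-search schedule can be sketched as follows. This is a minimal sketch under several assumptions not fixed by the text: update frequency is estimated as posts per day, (n - 1) divided by the time span between the first and last posts; the website score is a weighted sum with hypothetical third coefficients `c_count=1.0` and `c_freq=10.0`; and "searching according to the update frequency" is read as re-searching once per mean posting interval.

```python
from datetime import datetime, timedelta


def update_frequency(post_times):
    """Posts per day: (n - 1) / span, i.e. the inverse of the mean gap between posts."""
    times = sorted(post_times)
    if len(times) < 2:
        return 0.0
    span_days = (times[-1] - times[0]) / timedelta(days=1)
    return (len(times) - 1) / span_days if span_days > 0 else 0.0


def website_score(n_targets, frequency, c_count=1.0, c_freq=10.0):
    """Weighted sum; c_count and c_freq are hypothetical third coefficients."""
    return c_count * n_targets + c_freq * frequency


def target_websites(site_posts, threshold):
    """site_posts maps a candidate URL to the posting times of its target text information.

    Returns {target URL: suggested re-search interval}.
    """
    targets = {}
    for url, times in site_posts.items():
        freq = update_frequency(times)
        if website_score(len(times), freq) >= threshold:
            # Re-search roughly once per mean posting interval; None if it cannot be estimated.
            targets[url] = timedelta(days=1 / freq) if freq > 0 else None
    return targets


site_posts = {
    "https://example.com/a": [datetime(2023, 1, 1), datetime(2023, 1, 3), datetime(2023, 1, 5)],
    "https://example.com/b": [datetime(2023, 1, 2)],
}
targets = target_websites(site_posts, threshold=5)
```

Site `a` posts every two days (frequency 0.5 per day, score 3 + 5 = 8) and becomes a target website with a two-day re-search interval; site `b` has a single post (score 1) and is filtered out.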
In another possible implementation manner, the updating the data report further includes:
outputting prompt information and/or uploading the updated data report to the cloud server.
In another possible implementation, the method further includes:
determining remaining candidate paragraphs among the candidate paragraphs of each piece of target text information, wherein the remaining candidate paragraphs are the candidate paragraphs other than the preset number of candidate paragraphs;
determining the sentences that contain the keywords in the remaining candidate paragraphs, and generating a list based on these sentences;
determining the correspondence between the list and the preset number of candidate paragraphs, and storing the list in the data report;
and controlling display of the list based on the correspondence.
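The sentence list and its correspondence can be sketched as below. This is a minimal sketch, assuming (1) a naive sentence splitter that breaks after `.`, `!`, `?` and their full-width forms, and (2) that the list built from one target text's remaining paragraphs corresponds to every selected paragraph of that same text. Both the splitting rule and the mapping layout are assumptions.

```python
import re


def keyword_sentences(paragraphs, keywords):
    """Split paragraphs into sentences (naive split after . ! ? and their full-width
    forms) and keep the sentences that record a keyword."""
    sentences = []
    for para in paragraphs:
        for sentence in re.split(r"(?<=[.!?。！？])\s*", para):
            if sentence and any(k.lower() in sentence.lower() for k in keywords):
                sentences.append(sentence)
    return sentences


def build_correspondence(candidates, selected, keywords):
    """Map each selected paragraph of one target text to the sentence list
    built from that text's remaining candidate paragraphs."""
    remaining = [p for p in candidates if p not in selected]
    sentence_list = keyword_sentences(remaining, keywords)
    return {para: sentence_list for para in selected}


candidates = ["Sales rose. Costs fell.", "Weather was mild. Sales dipped in May.", "Sales peaked."]
mapping = build_correspondence(candidates, ["Sales rose. Costs fell."], ["sales"])
```

With one selected paragraph, the two remaining paragraphs contribute the sentences "Sales dipped in May." and "Sales peaked.", and that list is attached to the selected paragraph.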
In another possible implementation manner, the controlling display of the list based on the correspondence includes:
uploading the data report to a cloud server, and acquiring, through a terminal device, the user's real-time access behavior on the data report, wherein the real-time access behavior includes the candidate paragraph the user is currently viewing and the picture currently displayed on the screen of the terminal device;
determining, based on the correspondence, the list that corresponds to the candidate paragraph currently being viewed;
and determining a blank position on the displayed screen and controlling the terminal device to display the list at the blank position.
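The text does not specify how a blank position is found on the displayed screen. A minimal sketch, assuming a hypothetical one-dimensional layout model in which the content already drawn on the terminal's screen is described by vertical pixel intervals: the function returns the top of the first vertical gap that is tall enough to hold the list.

```python
def find_blank_position(screen_height, occupied, needed_height):
    """Return the top y-coordinate of the first blank vertical band that is at
    least needed_height pixels tall, or None if no such band exists.

    occupied: (top, bottom) pixel intervals already covered by content
    (a hypothetical, one-dimensional layout model).
    """
    cursor = 0  # bottom edge of the content scanned so far
    for top, bottom in sorted(occupied):
        if top - cursor >= needed_height:
            return cursor
        cursor = max(cursor, bottom)
    if screen_height - cursor >= needed_height:
        return cursor
    return None


layout = [(0, 300), (350, 700)]
```

On a 1000-pixel screen with this layout, a 40-pixel list fits in the gap at y = 300, a 200-pixel list only fits below the content at y = 700, and a 400-pixel list does not fit anywhere. A real implementation would also consider horizontal space.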
In a second aspect, the present application provides a data analysis device, which adopts the following technical scheme:
a data analysis device, comprising:
the retrieval module is used for retrieving according to the set keywords to obtain a plurality of websites to be selected;
the extraction module is used for extracting text information in each website to be selected, and each text information is recorded with the keyword;
the quality score determining module is configured to determine the quality score of each piece of text information and determine target text information based on the quality scores, wherein the quality score represents the degree of correlation between the text information and the keywords;
and the report generation module is used for generating a data report based on the target text information and the candidate websites corresponding to each target text information.
With this technical solution, the user sets keywords according to the data to be obtained, and the retrieval module searches with the set keywords to obtain candidate websites. Because data content and information are recorded on the candidate websites as text, the extraction module extracts the text information of each candidate website so that the data can be analyzed. The quality score determining module then determines a quality score representing the degree of correlation between each piece of text information and the keywords; through the quality score, the required target text information that is highly correlated with the keywords can be screened out more intuitively and accurately, which reduces the amount of useless data. The report generation module generates a data report from the target text information and the corresponding candidate websites, so that the user can conveniently view the data related to the keywords and visit the original websites where the data were recorded. Compared with existing data collection and analysis means, screening and filtering according to the quality score of each piece of text information retains the most relevant data in the search results and reduces useless data as far as possible.
In another possible implementation manner, the quality score determining module is specifically configured to, when determining a quality score of each text message and determining the target text message based on the quality score:
determining a first number of hit keywords in each piece of text information, a first number of times each keyword occurs in the text information, and the number of interval characters between every two adjacent keywords in the text information;
determining an average number of occurrences of the keywords based on the first numbers of times, and determining a variance of the numbers of interval characters;
calculating the quality score of each piece of text information based on the first number of hit keywords, the average number of times, the variance, and their respective first coefficients;
and determining the text information with the quality score reaching a preset score threshold as target text information.
In another possible implementation manner, each piece of target text information includes a plurality of paragraphs, and when generating a data report based on the target text information and the candidate website corresponding to each piece of target text information, the report generation module is specifically configured for:
filtering out the paragraphs that do not contain the keywords from each piece of target text information to obtain candidate paragraphs;
determining the second target keywords hit in each candidate paragraph, a second number of times each second target keyword occurs in the candidate paragraph, and the sum of the second numbers of times;
determining a second quality score of each candidate paragraph based on the number of second target keywords hit in it, the sum of the second numbers of times, and their respective second coefficients;
selecting a preset number of candidate paragraphs in descending order of the second quality score;
and generating a data report based on the preset number of the to-be-selected paragraphs and the corresponding to-be-selected websites.
In another possible implementation, the apparatus further includes:
the quantity determining module is configured to determine the number of pieces of target text information in the data report that correspond to each candidate website;
the time acquisition module is configured to acquire the posting time of each piece of target text information corresponding to each candidate website;
an update frequency determining module, configured to determine, based on the posting times, the frequency with which each candidate website updates content related to the keywords;
the score determining module is configured to determine the score of each candidate website based on the number of pieces of target text information corresponding to it, its update frequency, and their respective third coefficients;
the target website determining module is configured to determine a candidate website whose score reaches a preset score threshold as a target website;
and the updating module is configured to search the target website for the keywords according to its update frequency, and to update the data report if new text information is found.
In another possible implementation, the apparatus further includes:
an output module, configured to output prompt information, and/or an uploading module, configured to upload the updated data report to the cloud server.
In another possible implementation, the apparatus further includes:
the remaining candidate paragraph determining module is configured to determine remaining candidate paragraphs among the candidate paragraphs of each piece of target text information, wherein the remaining candidate paragraphs are the candidate paragraphs other than the preset number of candidate paragraphs;
a sentence determining module, configured to determine the sentences that contain the keywords in the remaining candidate paragraphs and generate a list based on these sentences;
the correspondence determining module is configured to determine the correspondence between the list and the preset number of candidate paragraphs and store the list in the data report;
and the display control module is configured to control display of the list based on the correspondence.
In another possible implementation manner, when controlling display of the list based on the correspondence, the display control module is specifically configured for:
uploading the data report to a cloud server, and acquiring, through a terminal device, the user's real-time access behavior on the data report, wherein the real-time access behavior includes the candidate paragraph the user is currently viewing and the picture currently displayed on the screen of the terminal device;
determining, based on the correspondence, the list that corresponds to the candidate paragraph currently being viewed;
and determining a blank position on the displayed screen and controlling the terminal device to display the list at the blank position.
In a third aspect, the present application provides an electronic device, which adopts the following technical scheme:
an electronic device, the electronic device comprising:
at least one processor;
a memory;
at least one application, wherein the at least one application is stored in the memory and configured to be executed by the at least one processor, and the at least one processor is configured to perform the data analysis method according to any one of the possible implementations of the first aspect.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the data analysis method according to any one of the implementations of the first aspect.
In summary, the present application includes at least one of the following beneficial technical effects:
1. Keywords are set according to the data to be obtained, and searching with the set keywords yields candidate websites that record content related to the keywords. Because data content and information are recorded on these websites as text, the text information of each candidate website is extracted so that the data can be analyzed. A quality score representing the degree of correlation between each piece of text information and the keywords is then determined, so that the required target text information that is highly correlated with the keywords can be screened out more intuitively and accurately, reducing the amount of useless data. A data report is generated from the target text information and the corresponding candidate websites, so that the user can conveniently view the data related to the keywords and visit the original websites where the data were recorded. Compared with existing data collection and analysis means, screening and filtering by quality score retains, to the greatest extent, the data in the search results that are most relevant to the keywords, and reduces useless data;
2. By calculating the score of each candidate website, the target websites most relevant to the keywords are determined. Each target website is then searched for the keywords at its own update frequency to determine whether new data about the keywords have been published; if so, the data report is updated quickly and conveniently, which improves the timeliness of the data. Because new data are searched for only on target websites that are highly relevant to the keywords, while low-scoring websites (those with low relevance to the keywords) are filtered out, search efficiency is also improved.
Drawings
Fig. 1 is a flow chart of a data analysis method in an embodiment of the application.
Fig. 2 is a schematic flow chart of calculating quality scores in an embodiment of the present application.
Fig. 3 is a flow chart of generating a data report in an embodiment of the present application.
Fig. 4 is a flow chart of determining a target website and updating a data report in an embodiment of the present application.
Fig. 5 is a schematic flow chart of determining a list of sentences containing keywords in an embodiment of the present application.
Fig. 6 is a schematic flow chart of controlling display of the list in an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a data analysis device in an embodiment of the present application.
Fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
After reading this specification, those skilled in the art may make modifications to the embodiments that involve no creative contribution, but such modifications are protected by patent law only within the scope of the claims of the present application.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist together, or that B exists alone. In addition, unless otherwise specified, the character "/" herein generally indicates an "or" relationship between the associated objects.
Embodiments of the present application are described in further detail below with reference to the drawings attached hereto.
The embodiment of the application provides a data analysis method executed by an electronic device. The electronic device may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, or a desktop computer. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein. As shown in Fig. 1, the method includes step S101, step S102, step S103, and step S104, where:
S101, searching according to the set keywords to obtain a plurality of candidate websites.
For the embodiment of the application, the user can enter keywords on a visual operation interface through an input device such as a keyboard, and the electronic device searches the internet through a search engine. The keywords express the specific aspect of the content the user wants to learn about, so searching with them returns related content. The search engine returns a plurality of websites related to the keywords, namely the candidate websites. For example, to learn about changes in the sales of an industry, a user may enter related keywords into a search engine. It should be understood that there may be one or more keywords.
S102, extracting text information in each website to be selected.
Each piece of extracted text information contains the keyword.
For the embodiment of the application, each retrieved candidate website is a link, and the specific content about the keywords can be viewed by clicking it. The electronic device can automatically open each candidate website to obtain its content. Because information on websites is generally written as text, the text information containing the keywords can be obtained through OCR (Optical Character Recognition) or by copying the content of the website.
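For the "copying the content of the website" route, a minimal sketch using Python's standard-library `html.parser` is shown below; OCR is not covered. It assumes the page HTML has already been downloaded, and it treats everything outside `<script>` and `<style>` as visible text, which is a simplification.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0  # depth inside script/style elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text_info(html, keywords):
    """Return the page text if it contains a keyword, otherwise None."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return text if any(k.lower() in text.lower() for k in keywords) else None


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Report</h1><p>Sales rose 5%</p><script>var sales = 1;</script></body></html>")
```

Calling `extract_text_info(page, ["sales"])` keeps the heading and paragraph text and ignores the stylesheet and script. A page that does not mention any keyword yields `None`, which matches the requirement that every piece of text information contain the keyword.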
S103, determining the quality score of each piece of text information, and determining the target text information based on the quality scores.
The quality score represents the degree of correlation between the text information and the keywords.
For the embodiment of the application, the extracted text information of each candidate website usually contains other content besides the content related to the keywords. It is therefore necessary to calculate the quality score of each piece of text information, that is, the degree of correlation between the text information as a whole and the keywords, which reflects how well it matches what the user wants to know. The quality score represents the quality of each piece of text information more intuitively and makes it easy to compare the pieces and determine suitable target text information, namely text information with a higher quality score.
S104, generating a data report based on the target text information and the candidate websites corresponding to each target text information.
For the embodiment of the application, after determining the target text information, the electronic device can integrate and summarize it into a text document, which serves as the data report. The data report also records the candidate website corresponding to each piece of target text information, so that a user reading the report can open the corresponding website and view the originally recorded content. In summary, searching with the keywords yields text information about the keywords; the quality score of each piece of text information is calculated, target text information is screened out according to the quality scores, and a data report is generated from the screened target text information. Low-quality text information is thereby filtered out, which improves the efficiency and quality of data retrieval.
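A minimal sketch of the report assembly follows, assuming the report is a plain-text document. The layout (a header line, then each candidate website followed by its target text information) is a hypothetical choice. The text only requires that each piece of target text information be recorded together with its source website.

```python
from collections import defaultdict


def build_report(keywords, targets):
    """Render a plain-text data report grouping target text information by source website.

    targets: list of (url, text) pairs; the layout is a hypothetical choice.
    """
    by_site = defaultdict(list)  # insertion-ordered, so websites appear in first-seen order
    for url, text in targets:
        by_site[url].append(text)
    lines = ["Data report for keywords: " + ", ".join(keywords)]
    for url, texts in by_site.items():
        lines.append("Source: " + url)
        lines.extend("  - " + text for text in texts)
    return "\n".join(lines)


report = build_report(["sales"], [("https://example.com/a", "Sales rose."),
                                  ("https://example.com/b", "Sales fell."),
                                  ("https://example.com/a", "Sales peaked.")])
```

Grouping by website keeps the link back to the original source next to the text it came from, so a reader can open the site to check the original record.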
In one possible implementation manner of the embodiment of the present application, determining the quality score of each piece of text information and determining the target text information based on the quality scores in step S103 specifically includes step S1031, step S1032, step S1033, and step S1034, as shown in Fig. 2, where:
S1031, determining a first number of hit keywords in each piece of text information, a first number of times each keyword occurs in the text information, and the number of interval characters between every two adjacent keywords in the text information.
For the embodiment of the application, after extracting the text information of each candidate website, the electronic device can search the text information for each keyword in turn to determine which keywords it hits. For example, suppose the keywords are "a certain industry", "sales", and "growth". If the electronic device finds that a piece of text information hits the two keywords "a certain industry" and "sales", the first number is 2; the more keywords are hit, the better the text matches the keywords, that is, the content the user needs. The electronic device can then traverse the hit keywords in the text information to determine how many times each one occurs, namely the first number of times. Suppose "a certain industry" occurs 10 times and "sales" occurs 5 times; the more often a keyword occurs, the more relevant the text is to it. Because the match with the keywords must be judged over the text information as a whole, the electronic device also determines the number of interval characters between every two adjacent keywords in the text information; these numbers reflect how the keywords are spread across the text and hence how well the text as a whole correlates with the keywords.
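The three quantities of step S1031 can be collected in one pass over the text, as sketched below. This is a minimal sketch under stated assumptions: matching is case-insensitive and non-overlapping, longer keywords are tried first so that a keyword that is a prefix of another does not steal its match, and "interval characters" is taken to mean the characters between the end of one keyword occurrence and the start of the next.

```python
import re


def keyword_stats(text, keywords):
    """Return (first_number, counts, gaps) for one piece of text information.

    first_number: how many distinct keywords are hit
    counts: first number of times for each hit keyword
    gaps: interval characters between each pair of adjacent keyword occurrences
    """
    ordered = sorted(keywords, key=len, reverse=True)  # prefer longer keywords
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    counts = {}
    for m in matches:
        keyword = next(k for k in ordered if k.lower() == m.group().lower())
        counts[keyword] = counts.get(keyword, 0) + 1
    gaps = [nxt.start() - cur.end() for cur, nxt in zip(matches, matches[1:])]
    return len(counts), counts, gaps


text = "Sales in a certain industry rose; sales growth followed."
first_number, counts, gaps = keyword_stats(text, ["certain industry", "sales", "growth"])
```

In this sentence all three keywords are hit ("sales" twice), and the adjacent occurrences are separated by 6, 7 and 1 characters. These gap counts are the input to the variance in step S1032.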
S1032, determining the average number of times the keyword appears based on the first number of times, and determining the variance of the number of interval characters based on the number of interval characters.
For the embodiment of the present application, after determining the first number of times for each keyword in the text information, the electronic device calculates the average number of times. Taking step S1031 as an example, "a certain industry" occurs 10 times and "sales" occurs 5 times, so the electronic device determines that the sum of the first numbers of times is 15 and divides it by the first number, 2, to obtain an average of 7.5. In other embodiments, the average may instead be calculated by dividing the sum of the first numbers of times by the number of keywords set by the user. After determining the numbers of interval characters between adjacent keywords, the electronic device can calculate their mean with the mean formula and then their variance with the variance formula. The variance represents how the keywords are distributed over the whole text information: the smaller the variance, the more evenly the keywords are distributed and the more closely the text information as a whole correlates with the keywords. The larger the variance, the more scattered the keywords; the text information as a whole then correlates less with the keywords, and only part of its content is highly correlated with them. Assume the variance calculated by the electronic device is 0.85.
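Step S1032 reduces to an average and a variance, sketched below with Python's `statistics` module. The text does not say whether the population or the sample variance is meant; the population variance (`statistics.pvariance`) is used here as an assumption. The `divisor` parameter covers both embodiments: pass the first number, or the number of keywords set by the user.

```python
from statistics import pvariance


def average_occurrences(counts, divisor):
    """Sum of the first numbers of times divided by the first number
    (or, in the alternative embodiment, by the number of keywords set by the user)."""
    return sum(counts) / divisor if divisor else 0.0


def spacing_variance(gaps):
    """Population variance of the interval-character counts (population vs. sample is an assumption)."""
    return pvariance(gaps) if gaps else 0.0


avg = average_occurrences([10, 5], divisor=2)  # example of step S1032: 15 / 2
var = spacing_variance([6, 7, 1])
```

With the counts from the example (10 and 5), the average is 7.5, or 5.0 if divided by three user-set keywords. For gaps of 6, 7 and 1 characters, the population variance is 62/9, about 6.89. The 0.85 in the text is an assumed value, not derived from these gaps.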
S1033, calculating the quality score of each piece of text information based on the first number of hit keywords, the average number of times and the variance in that text information, and the corresponding first coefficients.
For the embodiment of the present application, the first number, the average number of times and the variance of the number of interval characters can each characterize the quality of the text information, but they influence that quality to different degrees, so a different first coefficient is set for each. Assume the first coefficient for the first number is 0.8, the first coefficient for the average number of times is 0.8, and the first coefficient for the variance is -2. Continuing the examples of steps S1031 and S1032, the electronic device calculates the quality score of the text information as 2×0.8+7.5×0.8+0.85×(-2)=5.9. It should be understood that the first coefficients may be adjusted to actual conditions or requirements; in other embodiments, the first number may be replaced by the ratio of the first number to the number of set keywords.
S1034, determining the text information whose quality score reaches a preset quality score threshold as target text information.
For the embodiment of the present application, assume the preset quality score threshold is 6. The threshold serves as the cut-off for a high quality score, i.e., the cut-off for screening target text information: text information whose quality score reaches the threshold is determined to be target text information. In the example of step S1033, the quality score of 5.9 falls just short of the threshold of 6, so that text information would not be determined to be target text information, whereas text information scoring 6 or higher would be. Calculating the quality score shows intuitively how strongly each piece of text information correlates with the keywords, quantifies that correlation, and allows large volumes of retrieved data to be screened more accurately.
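Steps S1033 and S1034 reduce to a weighted sum followed by a threshold test. The sketch below uses the example first coefficients (0.8, 0.8, -2) and the example threshold 6 as defaults, and treats "reaches" as greater than or equal to; both the function names and the identifiers used for text information are hypothetical.

```python
from typing import Dict, List, Sequence


def quality_score(first_number: int, average: float, variance: float,
                  coefficients: Sequence[float] = (0.8, 0.8, -2.0)) -> float:
    """Step S1033: weighted sum; the defaults are the example first coefficients."""
    w_number, w_average, w_variance = coefficients
    return first_number * w_number + average * w_average + variance * w_variance


def select_target_texts(scores: Dict[str, float], threshold: float = 6.0) -> List[str]:
    """Step S1034: keep text information whose quality score reaches the threshold."""
    return [text_id for text_id, score in scores.items() if score >= threshold]


score = quality_score(2, 7.5, 0.85)  # example values from steps S1031 and S1032
print(round(score, 2))
print(select_target_texts({"text-1": score, "text-2": 6.9}))
```

Because the variance coefficient is negative, texts whose keywords are bunched in one region are penalized relative to texts where the keywords are spread evenly.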
In one possible implementation manner of the embodiment of the present application, each piece of target text information includes a plurality of paragraphs, and step S104 of generating a data report based on the target text information and the candidate website corresponding to each piece of target text information specifically includes steps S1041, S1042, S1043, S1044 and S1045, as shown in fig. 3, where,
S1041, filtering out the paragraphs in each piece of target text information in which no keyword is recorded, to obtain candidate paragraphs.
For the embodiment of the present application, text information generally consists of a plurality of paragraphs. Paragraphs in which no keyword is recorded, i.e., irrelevant paragraphs, are filtered out to reduce the data volume, and the paragraphs that remain are the candidate paragraphs of each piece of text information.
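A minimal sketch of step S1041, assuming paragraphs are separated by line breaks and that keyword matching is case-sensitive substring matching; `candidate_paragraphs` is a hypothetical name.

```python
from typing import List


def candidate_paragraphs(text: str, keywords: List[str]) -> List[str]:
    """Step S1041 sketch: drop paragraphs that record none of the keywords.

    Assumes paragraphs are separated by line breaks; blank lines are ignored.
    """
    paragraphs = [p.strip() for p in text.splitlines() if p.strip()]
    # `kw and ...` skips empty keywords, which would otherwise match everything.
    return [p for p in paragraphs if any(kw and kw in p for kw in keywords)]


doc = "Sales rose in Q3.\n\nThe weather was mild.\nGrowth of sales continued."
print(candidate_paragraphs(doc, ["Sales", "sales", "Growth"]))
# ['Sales rose in Q3.', 'Growth of sales continued.']
```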
S1042, determining the second target keywords hit in each candidate paragraph, the second number of times each second target keyword appears in each candidate paragraph, and the sum of the second numbers of times.
For the embodiment of the present application, after determining the candidate paragraphs, the electronic device can determine the keywords hit in each candidate paragraph, i.e., the second target keywords, and then traverse each candidate paragraph for the hit second target keywords to determine how many times each one appears, i.e., its second number of times. Assume a candidate paragraph contains two second target keywords, "certain industry" and "sales", which appear 3 times and 2 times respectively; the electronic device then calculates the sum of the second numbers of times as 5.
S1043, determining a second quality score of each candidate paragraph based on the number of second target keywords hit in that candidate paragraph, the sum of the second numbers of times, and the respective second coefficients.
For the embodiment of the present application, the more second target keywords a candidate paragraph hits, the more strongly it correlates with the keywords; likewise, the larger the sum of the second numbers of times, the stronger the correlation. Because these two aspects influence the quality of the candidate paragraph to different degrees, different second coefficients are set for them: assume the second coefficient for the number of second target keywords is 3 and the second coefficient for the sum of the second numbers of times is 0.8. Continuing the example of step S1042, the electronic device calculates the second quality score of the candidate paragraph as 2×3+5×0.8=10. Likewise, the second coefficients can be adjusted to actual conditions and requirements; the values here are only an example.
S1044, determining a preset number of candidate paragraphs in descending order of second quality score.
For the embodiment of the present application, the electronic device may calculate the second quality score of each candidate paragraph in the text information as described in step S1043 and then sort the candidate paragraphs in descending order of second quality score. Assuming the preset number is 2, the electronic device determines the two candidate paragraphs with the highest second quality scores, i.e., the two most strongly correlated with the keywords and with the highest-quality content. These top-ranked candidate paragraphs represent the text information as a whole more accurately.
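Steps S1042 to S1044 can be sketched together: count the hit keywords and their occurrences per candidate paragraph, score with the example second coefficients (3 and 0.8), and keep the top preset number. Substring counting and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def second_stats(paragraph: str, keywords: Sequence[str]) -> Tuple[int, int]:
    """Step S1042: (number of second target keywords hit, sum of second numbers of times)."""
    second_times = {kw: paragraph.count(kw) for kw in keywords if kw and kw in paragraph}
    return len(second_times), sum(second_times.values())


def second_quality_score(paragraph: str, keywords: Sequence[str],
                         coefficients: Tuple[float, float] = (3.0, 0.8)) -> float:
    """Step S1043: weighted sum with the example second coefficients 3 and 0.8."""
    hits, total = second_stats(paragraph, keywords)
    return hits * coefficients[0] + total * coefficients[1]


def top_paragraphs(paragraphs: List[str], keywords: Sequence[str],
                   preset_number: int = 2) -> List[str]:
    """Step S1044: keep the preset number of paragraphs, highest second quality score first."""
    ranked = sorted(paragraphs, key=lambda p: second_quality_score(p, keywords), reverse=True)
    return ranked[:preset_number]


p = "certain industry sales certain industry sales certain industry"
kws = ["certain industry", "sales", "growth"]
print(second_stats(p, kws))          # (2, 5)
print(second_quality_score(p, kws))  # 10.0
```

`sorted` is stable, so paragraphs with equal scores keep their original document order, which is a reasonable tie-break for a report.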
S1045, generating a data report based on the preset number of the to-be-selected paragraphs and the corresponding to-be-selected websites.
For the embodiment of the present application, after determining the preset number of candidate paragraphs with the highest quality scores, the electronic device can establish a correspondence between those candidate paragraphs and their candidate websites so that the user can conveniently access the websites recording the original content. Specifically, a hyperlink between each website and its candidate paragraph can be established so that the user can reach the original website by clicking the candidate paragraph directly; a data report in the form of a text document can then be generated for the user to view.
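As a sketch of step S1045, the selected candidate paragraphs can be rendered as a small HTML document in which each paragraph is a hyperlink to its candidate website. HTML is an assumed output format (the text only says "text document"), and `build_report` is a hypothetical name; the standard-library `html.escape` keeps paragraph text and URLs from breaking the markup.

```python
import html
from typing import List, Tuple


def build_report(entries: List[Tuple[str, str]], title: str = "Data report") -> str:
    """Render (candidate paragraph, candidate website URL) pairs as a minimal HTML page.

    Each paragraph is wrapped in a hyperlink to the website it came from, so
    clicking it opens the original page.
    """
    lines = [f"<html><head><title>{html.escape(title)}</title></head><body>"]
    for paragraph, url in entries:
        lines.append(
            f'<p><a href="{html.escape(url, quote=True)}">{html.escape(paragraph)}</a></p>'
        )
    lines.append("</body></html>")
    return "\n".join(lines)


report = build_report([("Sales grew <5%>", "https://example.com/a?x=1&y=2")])
print(report)
```

The returned string can then be written to disk, for example with `pathlib.Path("report.html").write_text(report, encoding="utf-8")`.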
In one possible implementation manner of the embodiment of the present application, the method further includes step S1, step S2, step S3, step S4, step S5, and step S6, as shown in fig. 4, where step S1 may be performed after step S104, where,
S1, determining the number of pieces of target text information corresponding to each candidate website in the data report.
For the embodiment of the present application, different pieces of text information may come from the same candidate website; for example, when the candidate website is a news portal, several pieces of text information may be published on it. The electronic device therefore determines the candidate website to which each piece of target text information belongs and thereby the number of pieces of target text information corresponding to each candidate website. Assume there are three candidate websites, A, B and C, with 3, 2 and 1 pieces of target text information respectively.
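Step S1 is a simple tally. The sketch below assumes each piece of target text information is represented by a hypothetical (text_id, website) pair and uses `collections.Counter`.

```python
from collections import Counter
from typing import Iterable, Tuple


def texts_per_website(target_texts: Iterable[Tuple[str, str]]) -> Counter:
    """Step S1 sketch: count target text information per candidate website.

    target_texts holds hypothetical (text_id, website) pairs.
    """
    return Counter(website for _, website in target_texts)


pairs = [("t1", "A"), ("t2", "A"), ("t3", "A"), ("t4", "B"), ("t5", "B"), ("t6", "C")]
print(texts_per_website(pairs))  # Counter({'A': 3, 'B': 2, 'C': 1})
```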
S2, acquiring the release time of each target text message corresponding to each website to be selected.
For the embodiment of the present application, when authors publish content on a website, each piece of target text information carries a release time, so the electronic device can obtain the release time of each piece of target text information by extracting the content of the web page on which it is located or by capturing back-end data.
And S3, determining the update frequency of the related content of each website to be selected about the keywords based on the release time.
For the embodiment of the present application, the release times make it possible to determine how frequently each candidate website publishes content related to the keywords. For example, the time interval between every two adjacent pieces of target text information is calculated from their release times, and the average of these intervals then represents the update frequency. Continuing the example of step S2, the electronic device calculates that candidate website A updates every 5 days, candidate website B every 8 days, and candidate website C every 10 days. It should be noted that if a candidate website has only one piece of target text information, the time interval from its release time to the current time is taken as the update frequency.
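A sketch of step S3 using the standard `datetime` module: the update frequency is expressed as the average interval in days between adjacent release times, falling back to the interval up to the current time when only one release time exists. The dates and the function name are hypothetical.

```python
from datetime import datetime
from typing import List


def update_interval_days(release_times: List[datetime], now: datetime) -> float:
    """Step S3 sketch: average interval, in days, between adjacent release times.

    With a single release time, the interval up to `now` is used instead.
    """
    if not release_times:
        raise ValueError("at least one release time is required")
    times = sorted(release_times)
    if len(times) == 1:
        return (now - times[0]).total_seconds() / 86400
    gaps = [(later - earlier).total_seconds() / 86400 for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


site_a = [datetime(2023, 9, 1), datetime(2023, 9, 6), datetime(2023, 9, 11)]
print(update_interval_days(site_a, now=datetime(2023, 9, 20)))  # 5.0
```

Note that a smaller value means more frequent updates, which is why the corresponding third coefficient in step S4 is negative.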
And S4, determining the score of each website to be selected based on the quantity of the target text information corresponding to each website to be selected, the updating frequency and the corresponding third coefficient.
For the embodiment of the present application, the more target text information a candidate website has, the more data about the keywords it records and the more relevant it is to the keywords. Likewise, the higher its update frequency (i.e., the shorter its average update interval), the more actively it updates data about the keywords and the more relevant it is. Different third coefficients are therefore set for the two: assume the third coefficient for the number of target text information is 10 and the third coefficient for the update frequency, expressed as an interval in days, is -0.2. Continuing the examples of steps S1 and S3, the electronic device calculates the score of candidate website A as 3×10+5×(-0.2)=29, that of candidate website B as 2×10+8×(-0.2)=18.4, and that of candidate website C as 1×10+10×(-0.2)=8.
Likewise, the third coefficient can be adaptively adjusted according to the actual situation.
S5, determining the website to be selected, the score of which reaches a preset score threshold value, as a target website.
For the embodiment of the present application, assume the preset score threshold is 15. The threshold serves as the cut-off for a high score, i.e., the cut-off for screening target websites: a candidate website whose score reaches the preset score threshold is determined to be a target website. In the example of step S4, the scores of candidate website A (29) and candidate website B (18.4) reach the threshold of 15, so the electronic device determines candidate websites A and B to be target websites, while candidate website C (8) is excluded. Calculating the score of each candidate website intuitively reflects how well it matches the keywords and the quality of its content, which makes it convenient to acquire new data about the keywords subsequently.
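Steps S4 and S5 mirror the text-level scoring: a weighted sum with the example third coefficients (10 and -0.2) followed by the example threshold of 15. The `(num_texts, interval_days)` input layout and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def website_score(num_texts: int, interval_days: float,
                  coefficients: Tuple[float, float] = (10.0, -0.2)) -> float:
    """Step S4: weighted sum with the example third coefficients 10 and -0.2."""
    return num_texts * coefficients[0] + interval_days * coefficients[1]


def target_websites(stats: Dict[str, Tuple[int, float]], threshold: float = 15.0) -> List[str]:
    """Step S5: keep candidate websites whose score reaches the preset score threshold."""
    return [site for site, (n, days) in stats.items() if website_score(n, days) >= threshold]


stats = {"A": (3, 5.0), "B": (2, 8.0), "C": (1, 10.0)}
print(target_websites(stats))  # ['A', 'B']
```

With the example figures the scores are 29, 18.4 and 8, so only A and B pass the threshold.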
And S6, searching based on the update frequency and the keywords corresponding to the target website, and updating the data report if new text information is searched.
For the embodiment of the present application, after determining the target websites, the electronic device determines, for each target website, the next time to retrieve according to the keywords from that website's update frequency and the current time; the current time may be obtained from a clock chip local to the electronic device or over the Internet. When the electronic device determines that the next retrieval time has been reached, it retrieves the target website according to the keywords to determine whether there is new data related to the keywords and, if so, updates the data report. This makes updating the data report quicker and more convenient and keeps the data timely. Moreover, because new data are retrieved only from target websites highly correlated with the keywords, websites with low scores, i.e., those weakly correlated with the keywords, are filtered out, which improves retrieval efficiency.
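The scheduling half of step S6 can be sketched as below, assuming the next retrieval time is the time at which the schedule was set plus the website's update interval in days. The actual keyword retrieval and report update are not shown; `due_websites` and the schedule layout are hypothetical, and in practice `now` would come from `datetime.now()` or a network time source.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def next_retrieval_time(reference: datetime, interval_days: float) -> datetime:
    """Next time to retrieve a target website: reference time plus its update interval."""
    return reference + timedelta(days=interval_days)


def due_websites(schedule: Dict[str, Tuple[datetime, float]], now: datetime) -> List[str]:
    """Return target websites whose next retrieval time has been reached.

    schedule maps website -> (time the schedule was set, update interval in days).
    """
    return [site for site, (ref, days) in schedule.items()
            if now >= next_retrieval_time(ref, days)]


schedule = {"A": (datetime(2023, 9, 1), 5.0), "B": (datetime(2023, 9, 1), 8.0)}
print(due_websites(schedule, now=datetime(2023, 9, 6)))  # ['A']
```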
In one possible implementation manner of the embodiment of the present application, step S6 further includes outputting prompt information and/or uploading the updated data report to a cloud server.
For the embodiment of the present application, so that the user learns promptly that new data about the keywords have been retrieved, the electronic device may send a text message such as "Related data has been updated, please check it in time" to the user's terminal device, or may upload the updated data report directly to the cloud server for the user to view later; it may also do both, which is not limited herein. It is to be understood that the electronic device is communicatively connected to the cloud server.
In one possible implementation manner of the embodiment of the present application, the method further includes a step Sa, a step Sb, a step Sc, and a step Sd, as shown in fig. 5, where the step Sa may be performed after the step S103, where,
Sa, determining the remaining candidate paragraphs from the candidate paragraphs of each piece of target text information.
The remaining candidate paragraphs are the candidate paragraphs other than the preset number of candidate paragraphs.
For the embodiment of the present application, the remaining candidate paragraphs still contain sentences in which keywords are recorded, i.e., sentences weakly related to the keywords, so the electronic device first determines the remaining candidate paragraphs to facilitate the subsequent extraction of data that may have reference value.
Sb, determining the sentences in the remaining candidate paragraphs in which keywords are recorded, and generating a list based on those sentences.
For the embodiment of the present application, after locating the keywords in each remaining candidate paragraph, the electronic device may identify the nearest punctuation mark before each keyword and the nearest punctuation mark after it, thereby extracting the sentence containing the keyword, and then collect the extracted sentences into a table to generate the list.
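A minimal sketch of the punctuation-based sentence extraction in step Sb. The delimiter set (common Chinese and English sentence-ending marks plus newline) is an assumption, and decimal points such as "5.5%" would be mistaken for sentence ends; `keyword_sentences` is a hypothetical name.

```python
from typing import List, Sequence

# Assumed sentence delimiters (Chinese and English); adjust for the corpus.
DELIMITERS = "。！？；.!?;\n"


def keyword_sentences(paragraph: str, keywords: Sequence[str]) -> List[str]:
    """Step Sb sketch: cut out each sentence that contains a keyword.

    A sentence runs from just after the nearest delimiter before the keyword to the
    nearest delimiter after it (inclusive). Duplicate sentences are dropped.
    """
    sentences: List[str] = []
    for kw in keywords:
        if not kw:
            continue
        pos = paragraph.find(kw)
        while pos != -1:
            # rfind returns -1 when there is no earlier delimiter, so start becomes 0.
            start = max(paragraph.rfind(d, 0, pos) for d in DELIMITERS) + 1
            ends = [i for i in (paragraph.find(d, pos + len(kw)) for d in DELIMITERS) if i != -1]
            end = min(ends) + 1 if ends else len(paragraph)
            sentence = paragraph[start:end].strip()
            if sentence not in sentences:
                sentences.append(sentence)
            pos = paragraph.find(kw, pos + len(kw))
    return sentences


print(keyword_sentences("Prices fell. Sales grew 5%. Weather was fine.", ["Sales"]))
# ['Sales grew 5%.']
```

The returned list can be stored directly as the "list" of weakly related sentences for each piece of target text information.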
Sc, determining the correspondence between the list and the preset number of candidate paragraphs, and storing the list in the data report.
For the embodiment of the present application, the electronic device establishes a correspondence between the list and the preset number of candidate paragraphs of each piece of target text information, so that the data strongly related and the data weakly related to the keywords within the same target text information are associated, and the user can conveniently view all data related to the keywords in that target text information. The electronic device also stores the list in the data report so that users can view its content later.
Sd, controlling display of the list based on the correspondence.
For the embodiment of the present application, after determining the correspondence, the electronic device can control display of the list accordingly, so that the data in the target text information that are weakly related to the keywords are also presented; the data are thus displayed more comprehensively and omissions are less likely.
In one possible implementation manner of the embodiment of the present application, step Sd of controlling display of the list based on the correspondence includes steps Sd1, Sd2 and Sd3, as shown in fig. 6, where
Sd1, uploading the data report to a cloud server, and acquiring, via a terminal device, the user's real-time access behavior on the data report.
The real-time access behavior includes the candidate paragraph the user is viewing in real time and the picture displayed on the terminal device.
For the embodiment of the present application, the user's terminal device may be a mobile phone, a personal computer or the like and is communicatively connected to the cloud server. After the electronic device uploads the data report to the cloud server, the user's terminal device can access the cloud server to view the data report. The user's access behavior while viewing the data report is uploaded to the cloud server synchronously in real time, so the electronic device obtains the user's real-time access behavior as it happens.
Sd2, determining the list based on the candidate paragraph viewed in real time and the correspondence.
For the embodiment of the present application, the candidate paragraph the user is viewing in real time can be identified from the position of the cursor on the user's terminal device. Once it is determined which candidate paragraph the user is viewing, the list belonging to the same target text information can be determined from the correspondence.
The electronic device can also determine blank positions in the picture on the terminal device from the displayed picture in the real-time access behavior. For example, after the displayed picture is denoised, it is converted to grayscale and binarized, from which the blank positions can be determined. Alternatively, the displayed picture may be input into a trained network model to identify the text display area, and positions outside the text display area are then determined to be blank positions.
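The grayscale-and-binarize route can be sketched in pure Python on an image given as rows of (R, G, B) tuples. This is a simplified stand-in: it uses the ITU-R BT.601 luma weights for grayscale, an assumed fixed threshold of 200 for binarization, a per-row "at most `max_ink` dark pixels" tolerance in place of real denoising, and it only finds full-width horizontal bands of blank rows. The trained-network alternative is not shown, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_gray(image: Sequence[Sequence[Pixel]]) -> List[List[float]]:
    """Grayscale conversion with the ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def binarize(gray: Sequence[Sequence[float]], threshold: float = 200.0) -> List[List[int]]:
    """1 = dark (text) pixel, 0 = light background; assumes dark text on a light page."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def largest_blank_band(binary: Sequence[Sequence[int]], max_ink: int = 0) -> Tuple[int, int]:
    """Return (first_row, end_row_exclusive) of the tallest run of blank rows.

    A row counts as blank if it has at most max_ink dark pixels, a crude stand-in
    for denoising. Returns (0, 0) if no row is blank.
    """
    best = (0, 0)
    start = None
    for i, row in enumerate(list(binary) + [[max_ink + 1]]):  # sentinel closes a trailing run
        if sum(row) <= max_ink:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


W, B = (255, 255, 255), (0, 0, 0)
screen = [[B, W], [W, W], [W, W], [W, B], [W, W]]
print(largest_blank_band(binarize(to_gray(screen))))  # (1, 3)
```

A real screenshot would be loaded with an imaging library and would likely need two-dimensional region search rather than whole rows.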
Sd3, determining a blank position from the display picture and controlling the terminal device to display the list at the blank position.
For the embodiment of the present application, after the blank position is determined, the electronic device controls the terminal device to display the list there, so that the user can conveniently view all data related to the keywords in the target text information without being hindered from viewing the candidate paragraphs strongly related to the keywords.
The above embodiments describe a data analysis method from the viewpoint of a method flow, and the following embodiments describe a data analysis apparatus from the viewpoint of a virtual module or a virtual unit, and the following embodiments are described in detail.
An embodiment of the present application provides a data analysis device 20, as shown in fig. 7, the data analysis device 20 may specifically include:
the retrieval module 201 is configured to retrieve according to the set keywords to obtain a plurality of websites to be selected;
the extracting module 202 is configured to extract text information in each website to be selected, where each text information records a keyword;
a quality score determining module 203, configured to determine a quality score of each text message, and determine a target text message based on the quality score, where the quality score characterizes a degree of correlation between the candidate paragraph and the keyword;
The report generating module 204 is configured to generate a data report based on the target text information and the candidate websites corresponding to each target text information.
The embodiment of the present application discloses a data analysis device 20. The user sets keywords according to the data he or she wants to learn about, and the retrieval module 201 retrieves according to the set keywords to obtain candidate websites that record content related to the keywords. Because data content or information is recorded on the candidate websites in text form, the extracting module 202 extracts the text information of each candidate website so that the data on the candidate websites can be analyzed. After each piece of text information is determined, the quality score determining module 203 determines a quality score representing how strongly each piece of text information correlates with the keywords; the quality score makes it possible to screen out, more intuitively and accurately, the required target text information that is highly correlated with the keywords, thereby reducing the volume of irrelevant data. The report generating module 204 generates a data report from the target text information and the corresponding candidate websites, making it convenient for the user to view the data related to the keywords and to access the original websites on which the data are recorded. Compared with existing data collection and analysis means, calculating a quality score for each piece of text information and analyzing, screening and filtering the data according to that score retains in the retrieval results, to the greatest extent, the data most relevant to the keywords.
In one possible implementation manner of the embodiment of the present application, when determining the quality score of each text message and determining the target text message based on the quality score, the quality score determining module 203 is specifically configured to:
determining a first number of hit keywords in each text message, a first number of occurrences of each keyword in each text message, and a number of spacing characters between two adjacent keywords in each text message;
determining an average number of occurrences of the keyword based on the first number, and determining a variance of the number of interval characters based on the number of interval characters;
calculating the quality score of each text message based on the first number, average times and variance of hit keywords in each text message and the corresponding first coefficient;
and determining the text information with the quality score reaching a preset score threshold as target text information.
In one possible implementation manner of this embodiment of the present application, each piece of target text information includes a plurality of paragraphs, and when generating a data report based on the target text information and the website to be selected corresponding to each piece of target text information, the report generating module 204 is specifically configured to:
filtering paragraphs which are not recorded with keywords in each target text message to obtain to-be-selected paragraphs;
Determining second target keywords hit in each section to be selected, second times of occurrence of each second target keyword in each section to be selected and sum of the second times;
determining a second quality score of each section to be selected based on the number of second target keywords hit in each section to be selected, the sum of the second times and the respective corresponding second coefficient;
determining a preset number of to-be-selected paragraphs in descending order of second quality score;
and generating a data report based on the preset number of the to-be-selected paragraphs and the corresponding to-be-selected websites.
In one possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
the quantity determining module is used for determining the quantity of target text information corresponding to each website to be selected in the data report;
the time acquisition module is used for acquiring the release time of each target text message corresponding to each website to be selected;
an update frequency determining module, configured to determine an update frequency of related content of each website to be selected about the keyword based on the posting time;
the score determining module is used for determining the score of each website to be selected based on the quantity of target text information corresponding to that website to be selected, the update frequency and the corresponding third coefficients;
The target website determining module is used for determining the websites to be selected, the scores of which reach a preset score threshold value, as target websites;
and the updating module is used for searching based on the updating frequency and the keywords corresponding to the target website, and updating the data report if new text information is searched.
In one possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
the output module is used for outputting prompt information and/or the uploading module is used for uploading the updated data report to the cloud server.
In one possible implementation manner of the embodiment of the present application, the apparatus 20 further includes:
the remaining candidate paragraph determining module is used for determining remaining candidate paragraphs from the candidate paragraphs of each target text message, wherein the remaining candidate paragraphs are the candidate paragraphs except for the preset number of candidate paragraphs;
the sentence determining module is used for determining sentences with keywords recorded in the remaining to-be-selected paragraphs and generating a list based on the sentences;
the corresponding relation determining module is used for determining the corresponding relation between the list and the preset number of the to-be-selected paragraphs and storing the list in the data report;
and the control display module is used for controlling the display list based on the corresponding relation.
In one possible implementation manner of the embodiment of the present application, when the control display module controls the display list based on the correspondence relationship, the control display module is specifically configured to:
uploading the data report to a cloud server, and acquiring real-time access behaviors of a user to the data report through terminal equipment, wherein the real-time access behaviors comprise to-be-selected paragraphs which are checked by the user in real time and display pictures on the terminal equipment;
determining a list based on the corresponding relation of the real-time checked to-be-selected paragraphs;
and determining a blank position from the display screen and controlling the terminal device to display a list at the blank position.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific operation of the data analysis device 20 described above may refer to the corresponding procedure in the foregoing method embodiment, and will not be described in detail herein.
In an embodiment of the present application, as shown in fig. 8, the electronic device 30 includes a processor 301 and a memory 303, where the processor 301 is coupled to the memory 303, for example via a bus 302. Optionally, the electronic device 30 may further include a transceiver 304. It should be noted that in practical applications the number of transceivers 304 is not limited to one, and the structure of the electronic device 30 does not constitute a limitation on the embodiments of the present application.
The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 301 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 302 may include a path for transferring information between the components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or one type of bus.
The memory 303 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 303 is used for storing application program codes for executing the present application and is controlled to be executed by the processor 301. The processor 301 is configured to execute the application code stored in the memory 303 to implement what is shown in the foregoing method embodiments.
Electronic devices include, but are not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device may also be a server or the like. The electronic device shown in fig. 8 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.
The present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, in the embodiments of the present application, the user sets keywords according to the data he or she wants to learn about. Because data content or information is recorded on the candidate websites in text form, the text information of each candidate website is extracted so that the data on the candidate websites can be analyzed. After each piece of text information is determined, a quality score representing how strongly it correlates with the keywords is determined; the quality score makes it possible to screen out, more intuitively and accurately, the required target text information that is highly correlated with the keywords, reducing the volume of irrelevant data. A data report is then generated from the target text information and the corresponding candidate websites, making it convenient for the user to view the data related to the keywords and to access the original websites on which the data are recorded. Compared with existing data collection and analysis means, calculating a quality score for each piece of text information and analyzing, screening and filtering the data according to that score retains in the retrieval results, to the greatest extent, the data most relevant to the keywords.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art may make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (10)

1. A method of data analysis, comprising:
searching according to set keywords to obtain a plurality of candidate websites;
extracting text information from each candidate website, wherein each piece of text information contains the keyword;
determining a quality score of each piece of text information, and determining target text information based on the quality scores, wherein the quality score represents the degree of correlation between the candidate paragraphs and the keywords;
and generating a data report based on the target text information and the candidate website corresponding to each piece of target text information.
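The four steps of claim 1 can be pictured as a small pipeline. The sketch below is a hypothetical illustration under stated assumptions, not the applicant's implementation: `search`, `fetch_text`, and `score` are placeholder callables injected by the caller (a real system would plug in a search-engine client, an HTML text extractor, and a scoring function such as the one in claim 2), and the "data report" is simply a list of dictionaries.

```python
from typing import Callable, Dict, Iterable, List

def analyze(
    keywords: List[str],
    search: Callable[[List[str]], Iterable[str]],      # hypothetical: keywords -> candidate URLs
    fetch_text: Callable[[str], str],                  # hypothetical: URL -> extracted text
    score: Callable[[str, List[str]], float],          # hypothetical: quality score of a text
    threshold: float,
) -> List[Dict[str, object]]:
    """Hypothetical pipeline: search -> extract -> score -> report."""
    report: List[Dict[str, object]] = []
    for url in search(keywords):                       # candidate websites
        text = fetch_text(url)                         # extracted text information
        if not any(kw in text for kw in keywords):
            continue                                   # no keyword recorded: discard
        quality = score(text, keywords)
        if quality >= threshold:                       # target text information
            report.append({"website": url, "text": text, "quality_score": quality})
    # Present the most relevant entries first.
    report.sort(key=lambda row: row["quality_score"], reverse=True)
    return report
```

Injecting the search, extraction, and scoring steps as callables keeps the sketch testable offline: a dictionary of fake pages can stand in for the web, and the scoring rule can be swapped without touching the pipeline.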
2. The method according to claim 1, wherein determining the quality score of each piece of text information and determining the target text information based on the quality scores comprises:
determining the number of keywords hit in each piece of text information, a first count of occurrences of each keyword in each piece of text information, and the number of interval characters between every two adjacent keywords in each piece of text information;
determining an average number of occurrences of the keywords based on the first counts, and determining a variance of the numbers of interval characters;
calculating the quality score of each piece of text information based on the number of keywords hit in it, the average number of occurrences, the variance, and their respective first coefficients;
and determining the text information whose quality score reaches a preset score threshold as the target text information.
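Claim 2 names the inputs to the quality score (number of hit keywords, average occurrences, variance of the interval characters, first coefficients) but not how they are combined. The sketch below assumes a linear weighted sum; the coefficient values, including the negative weight that penalizes unevenly spaced keywords, are hypothetical. It also assumes the interval is the number of characters between the end of one keyword occurrence and the start of the next, and that the average is taken over the keywords actually hit.

```python
import re
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

def keyword_spans(text: str, keywords: Sequence[str]) -> List[Tuple[int, int]]:
    """(start, end) offsets of every non-overlapping occurrence of each keyword, by position."""
    spans = []
    for kw in keywords:
        spans.extend((m.start(), m.end()) for m in re.finditer(re.escape(kw), text))
    return sorted(spans)

def quality_score(text: str, keywords: Sequence[str],
                  coeffs: Tuple[float, float, float] = (1.0, 0.5, -0.01)) -> float:
    # Hypothetical "first coefficients" for: hit-keyword count, mean occurrences, gap variance.
    counts = {kw: text.count(kw) for kw in keywords}      # first counts
    hits = [kw for kw, c in counts.items() if c > 0]
    if not hits:
        return 0.0
    avg_occurrences = mean(counts[kw] for kw in hits)
    spans = keyword_spans(text, hits)
    # Interval characters between each pair of adjacent keyword occurrences.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(spans, spans[1:])]
    gap_variance = pvariance(gaps) if len(gaps) > 1 else 0.0
    a, b, c = coeffs
    return a * len(hits) + b * avg_occurrences + c * gap_variance

def select_targets(texts: Sequence[str], keywords: Sequence[str], threshold: float) -> List[str]:
    """Text information whose quality score reaches the preset threshold."""
    return [t for t in texts if quality_score(t, keywords) >= threshold]
```

For `"data analysis of data"` with keywords `["data", "analysis"]`, both keywords are hit, the average occurrence count is 1.5, and the gaps are 1 and 4 characters (variance 2.25), giving 2 + 0.75 - 0.0225 = 2.7275 under these assumed coefficients.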
3. The method according to claim 1, wherein each piece of target text information includes a plurality of paragraphs, and generating the data report based on the target text information and the candidate website corresponding to each piece of target text information comprises:
filtering out, from each piece of target text information, the paragraphs in which the keywords are not recorded, to obtain candidate paragraphs;
determining the second target keywords hit in each candidate paragraph, a second count of occurrences of each second target keyword in each candidate paragraph, and the sum of the second counts;
determining a second quality score of each candidate paragraph based on the number of second target keywords hit in the candidate paragraph, the sum of the second counts, and their respective second coefficients;
selecting a preset number of candidate paragraphs in descending order of the second quality score;
and generating the data report based on the preset number of candidate paragraphs and the corresponding candidate websites.
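The paragraph ranking in claim 3 can be sketched in the same way. As before, this is only an illustration under assumptions: paragraphs are taken to be separated by newlines, the second quality score is assumed to be a linear combination with hypothetical second coefficients, and `report_entries` is a hypothetical helper that pairs each candidate website with its selected paragraphs.

```python
from typing import Dict, List, Sequence, Tuple

def paragraph_score(paragraph: str, keywords: Sequence[str],
                    coeffs: Tuple[float, float] = (1.0, 0.2)) -> float:
    # Hypothetical "second coefficients" for: hit-keyword count, total occurrences.
    counts = [paragraph.count(kw) for kw in keywords]     # second counts
    hit = sum(1 for c in counts if c > 0)
    return coeffs[0] * hit + coeffs[1] * sum(counts)

def top_paragraphs(text: str, keywords: Sequence[str], n: int) -> List[str]:
    # Assumes paragraphs are separated by newlines.
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    candidates = [p for p in paragraphs if any(kw in p for kw in keywords)]
    # sorted() is stable even with reverse=True, so ties keep document order.
    return sorted(candidates, key=lambda p: paragraph_score(p, keywords), reverse=True)[:n]

def report_entries(pages: Dict[str, str], keywords: Sequence[str],
                   n: int = 3) -> List[Dict[str, object]]:
    """pages maps each candidate website to its target text information."""
    return [{"website": url, "paragraphs": top_paragraphs(text, keywords, n)}
            for url, text in pages.items()]
```

Keeping the filter, scoring, and top-n selection as separate functions mirrors the separate steps of the claim and makes each one easy to adjust independently.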
4. The data analysis method according to claim 1, further comprising:
determining the number of pieces of target text information corresponding to each candidate website in the data report;
acquiring the release time of each piece of target text information corresponding to each candidate website;
determining, based on the release times, the update frequency of each candidate website's content related to the keywords;
determining a score of each candidate website based on the number of pieces of target text information corresponding to it, its update frequency, and their respective third coefficients;
determining a candidate website whose score reaches a preset score threshold as a target website;
and searching based on the keywords at the update frequency corresponding to the target website, and updating the data report if new text information is found.
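Claim 4 does not define "update frequency" or how the website score is combined. The sketch below assumes the frequency is releases per day across the observed window (floored at one day), that the score is a linear combination with hypothetical third coefficients, and that a target website is re-searched roughly every 1 / frequency days, falling back to a hypothetical default interval when no frequency can be derived.

```python
from datetime import datetime
from typing import Dict, List, Tuple

def update_frequency(release_times: List[datetime]) -> float:
    """Releases per day over the observed window (an assumed definition)."""
    if len(release_times) < 2:
        return 0.0
    times = sorted(release_times)
    span_days = (times[-1] - times[0]).total_seconds() / 86400
    # Floor the window at one day so same-day bursts do not divide by zero.
    return (len(times) - 1) / max(span_days, 1.0)

def website_score(release_times: List[datetime],
                  coeffs: Tuple[float, float] = (1.0, 10.0)) -> float:
    # One release time per target text, so len() is the target-text count.
    # Hypothetical "third coefficients" for: target-text count, update frequency.
    return coeffs[0] * len(release_times) + coeffs[1] * update_frequency(release_times)

def target_websites(site_releases: Dict[str, List[datetime]], threshold: float,
                    default_interval_days: float = 7.0) -> Dict[str, float]:
    """Map each target website to a re-search interval in days (1 / update frequency)."""
    targets = {}
    for site, times in site_releases.items():
        if website_score(times) >= threshold:
            freq = update_frequency(times)
            targets[site] = 1.0 / freq if freq > 0 else default_interval_days
    return targets
```

A website with three keyword-related releases on 1, 3, and 5 January has a frequency of 0.5 per day under this definition, so it would be re-searched about every two days.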
5. The method according to claim 4, wherein updating the data report further comprises:
outputting prompt information and/or uploading the updated data report to a cloud server.
6. The data analysis method according to claim 3, further comprising:
determining remaining candidate paragraphs among the candidate paragraphs of each piece of target text information, the remaining candidate paragraphs being the candidate paragraphs other than the preset number of candidate paragraphs;
determining the sentences in the remaining candidate paragraphs in which the keywords are recorded, and generating a list based on the sentences;
determining a correspondence between the list and the preset number of candidate paragraphs, and storing the list in the data report;
and controlling display of the list based on the correspondence.
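One possible reading of claim 6 is sketched below: keyword-bearing sentences are collected from the paragraphs that were not selected, and each selected paragraph of the same text is mapped to that sentence list. The sentence splitter (breaking after Western or Chinese end punctuation), the helper names, and the one-list-per-text correspondence are assumptions, not details given in the claim.

```python
import re
from typing import Dict, List, Sequence

# Split after Western or Chinese sentence-ending punctuation (Python 3.7+ allows empty matches).
SENTENCE_BREAK = re.compile(r"(?<=[.!?。！？])\s*")

def keyword_sentences(paragraphs: Sequence[str], keywords: Sequence[str]) -> List[str]:
    sentences = []
    for paragraph in paragraphs:
        for sentence in SENTENCE_BREAK.split(paragraph):
            sentence = sentence.strip()
            if sentence and any(kw in sentence for kw in keywords):
                sentences.append(sentence)
    return sentences

def build_side_lists(candidates: Sequence[str], selected: Sequence[str],
                     keywords: Sequence[str]) -> Dict[str, List[str]]:
    """Map each selected paragraph to keyword sentences from the remaining candidates."""
    remaining = [p for p in candidates if p not in selected]
    sentence_list = keyword_sentences(remaining, keywords)
    return {p: sentence_list for p in selected}
```

Storing the mapping alongside the report lets a viewer look up the supplementary sentences for whichever selected paragraph is on screen without re-scanning the original text.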
7. The data analysis method according to claim 6, wherein controlling the display of the list based on the correspondence comprises:
uploading the data report to a cloud server, and acquiring, through a terminal device, the user's real-time access behavior on the data report, wherein the real-time access behavior includes the candidate paragraph that the user is viewing in real time and the screen content displayed on the terminal device;
determining the list based on the correspondence of the candidate paragraph being viewed in real time;
and determining a blank position in the displayed screen content, and controlling the terminal device to display the list at the blank position.
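Claim 7 leaves open how a "blank position" is found. A simple approach, assumed here purely for illustration, is to treat the displayed content as a set of occupied rectangles reported by the terminal and scan a coarse grid for the first box that overlaps none of them. `find_blank_position`, `place_list`, the rectangle format, and the grid step are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in screen pixels

def overlaps(a: Rect, b: Rect) -> bool:
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def find_blank_position(screen_w: int, screen_h: int, occupied: Sequence[Rect],
                        box_w: int, box_h: int, step: int = 20) -> Optional[Rect]:
    """Scan top-to-bottom, left-to-right for a box that overlaps no occupied region."""
    for y in range(0, screen_h - box_h + 1, step):
        for x in range(0, screen_w - box_w + 1, step):
            candidate = (x, y, box_w, box_h)
            if not any(overlaps(candidate, r) for r in occupied):
                return candidate
    return None

def place_list(viewed_paragraph: str, side_lists: Dict[str, List[str]],
               screen: Tuple[int, int], occupied: Sequence[Rect],
               box: Tuple[int, int]) -> Optional[Tuple[Rect, List[str]]]:
    """Look up the list for the paragraph being viewed and pick where to draw it."""
    sentences = side_lists.get(viewed_paragraph)
    if not sentences:
        return None
    slot = find_blank_position(screen[0], screen[1], occupied, box[0], box[1])
    return (slot, sentences) if slot is not None else None
```

A grid scan is crude but predictable; a production client would more likely query its own layout engine for free space.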
8. A data analysis device, comprising:
a retrieval module, configured to search according to set keywords to obtain a plurality of candidate websites;
an extraction module, configured to extract text information from each candidate website, wherein each piece of text information contains the keyword;
a quality score determining module, configured to determine a quality score of each piece of text information and determine target text information based on the quality scores, wherein the quality score represents the degree of correlation between the candidate paragraphs and the keywords;
and a report generation module, configured to generate a data report based on the target text information and the candidate website corresponding to each piece of target text information.
9. An electronic device, comprising:
at least one processor;
a memory;
at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, the at least one application program being configured to perform the data analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed on a computer, causes the computer to perform the data analysis method according to any one of claims 1 to 7.
CN202311294235.1A (priority date 2023-10-08, filing date 2023-10-08): Data analysis method and device, electronic equipment and storage medium. Status: Pending. Publication: CN117609594A (en).

Priority Applications (1)

Application Number: CN202311294235.1A; Priority Date: 2023-10-08; Filing Date: 2023-10-08; Title: Data analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN117609594A; Publication Date: 2024-02-27

Family

ID=89944863

Country Status (1)

Country: CN (1); Link: CN117609594A (en)

Similar Documents

Publication Title
US10248662B2 (en) Generating descriptive text for images in documents using seed descriptors
US9779356B2 (en) Method of machine learning classes of search queries
CN110362372B (en) Page translation method, device, medium and electronic equipment
CN106547871B (en) Neural network-based search result recall method and device
US20090319449A1 (en) Providing context for web articles
CN110334356B (en) Article quality determining method, article screening method and corresponding device
US9582835B2 (en) Apparatus, system, and method for searching for power user in social media
US11423096B2 (en) Method and apparatus for outputting information
CN111192176B (en) Online data acquisition method and device supporting informatization assessment of education
CN110737774A (en) Book knowledge graph construction method, book recommendation method, device, equipment and medium
CN113688310A (en) Content recommendation method, device, equipment and storage medium
CN112740202A (en) Performing image search using content tags
CN105512300B (en) information filtering method and system
EP3706014A1 (en) Methods, apparatuses, devices, and storage media for content retrieval
CN113806660A (en) Data evaluation method, training method, device, electronic device and storage medium
CN110191124B (en) Web front-end development data-based website identification method and device and storage equipment
JP2016076115A (en) Information processing device, information processing method and program
KR101263403B1 (en) Apparatus and method for keyword searching according to priority of inputted word and computer readable medium having stored thereon computer executable instruction for performing the method
CN117609594A (en) Data analysis method and device, electronic equipment and storage medium
CN112926297B (en) Method, apparatus, device and storage medium for processing information
CN114550157A (en) Bullet screen gathering identification method and device
US10810236B1 (en) Indexing data in information retrieval systems
CN110147488B (en) Page content processing method, processing device, computing equipment and storage medium
JP2017072964A (en) Information analyzing apparatus and information analyzing method
CN111125548A (en) Public opinion supervision method and device, electronic equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination