
2 replies
baz
Joined: 11/12/2009
Posts: 47

Hi,
I would like a workflow that downloads the text from the web pages in Google's top N results for a search keyword. So if WinAutomation were the first result, I'd get only the text from the WA landing page (not the HTML), and the script would then do the same for the second result, and so on for a predefined number of results.
I don't believe it's possible to grab just the text, but WA has surprised me so many times that I thought I'd ask.
All the best
 
Barry

codex
Joined: 11/12/2009
Posts: 161
Re: Download Web Page text from Google search results based ...

The timeless question: "How do I strip HTML tags from a text file?"

I have attached a sample job that visits the first 5 pages of results in Google and strips them of their HTML tags. This is a pretty basic script: it does not remove JavaScript code, for example, and it leaves many gaps in the text.

If you find it useful and want me to include parsing and replacing for JavaScript code and the like, post here and I'll see what I can do. I hope to make it extract the text from the websites in a reasonably decent-looking format.
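
For anyone who wants to experiment with the tag stripping outside WinAutomation, here is a minimal Python sketch of the idea. It is not the attached job: the names `TextExtractor` and `html_to_text` are made up for illustration, and it assumes the page HTML has already been downloaded into a string. It uses only the standard library's `html.parser`, skips the contents of `<script>` and `<style>` blocks, and collapses the gaps left behind by the removed tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, ignoring script/style contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._chunks = []      # text fragments found between tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (assumed already downloaded)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because the fragments are joined with spaces, a word split by inline tags (for example bold on a single letter) will come out with a space in the middle; a more careful version would treat inline and block tags differently.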

AttachmentSize
Get Text from Websites Based on Google Search.waj 20 KB
__________________

Error is not blindness, error is cowardice


peter-
Joined: 11/20/2009
Posts: 54
Re: Download Web Page text from Google search results based ...

First, I would like to say hi to you all.

This is an interesting project, and one that I was going to look into later, once I have finished my current project.

Can I suggest adding the option to retrieve 10, 20, 30, 50 or 100 results per page?
It could also collect each result's title, snippet and URL and save them to a CSV file.
It could then prove useful to get the Alexa ranking for each result and add it to the CSV file,
and after that to take the data to Yahoo Site Explorer to find the number of backlinks for each URL and add this to the CSV file as well.
This is what I had in mind to do.
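
As a rough illustration of the CSV part only, here is a small Python sketch. It assumes the title, snippet and URL of each result have already been collected into dictionaries; the `results_to_csv` function, the `FIELDS` list and the sample row are hypothetical, and nothing here fetches rankings or backlink counts.

```python
import csv
import io

# Column order for the output file; extra columns (e.g. a ranking or a
# backlink count) could be appended here once that data is available.
FIELDS = ["title", "snippet", "url"]


def results_to_csv(results, out):
    """Write an iterable of result dicts to the open text file `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in results:
        # Missing keys are written as empty cells (DictWriter's default).
        writer.writerow(row)


# Made-up sample data, only to show the call.
sample = [
    {"title": "Example Page",
     "snippet": "A short, sample snippet",
     "url": "https://example.com/"},
]

buffer = io.StringIO()
results_to_csv(sample, buffer)
```

To write to disk instead of a string buffer, open the file with `open("results.csv", "w", newline="", encoding="utf-8")` and pass it as `out`. The `csv` module quotes any field that contains a comma, so snippets do not break the columns.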

Pete

 

__________________

ourarticles
