Hi,
I would like a workflow that downloads the text from the web pages in Google's top X results for a search keyword. So if WinAutomation were the first result, I'd get only the text from the WA landing page (not the HTML), and the script would then do the same for the second result, and so on, for a predefined number of results.
I don't believe it's possible to grab just the text, but WA has surprised me so many times that I thought I'd ask.
All the best
Barry


The timeless question: "How do I strip HTML tags from a text file?"
I have attached a sample job that visits the first 5 pages in Google's search results and strips them of their HTML tags. This is a pretty basic script: it does not remove JavaScript code, for example, and it leaves many gaps in the text.
If you find it useful and want me to add parsing and replacing of JavaScript code and the like, post here and I'll see what I can do. I hope I can get it to extract the text from the websites in a fairly decent-looking format...
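For anyone who wants to see the idea outside WinAutomation, here is a minimal Python sketch of the same tag-stripping step. It is not the attached job. It also handles the two weak spots mentioned above: it skips the contents of `<script>` and `<style>` elements, and it collapses the leftover whitespace gaps. It uses only the standard-library `html.parser` module. The names `TextExtractor` and `html_to_text`, and the small set of "block" tags that trigger line breaks, are hypothetical choices made for illustration. Fetching the Google results themselves is left out.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>/<style> contents (hypothetical helper)."""

    SKIP_TAGS = {"script", "style"}
    # Assumed, non-exhaustive set of tags that should start a new line of text.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Only keep text that is not inside a <script> or <style> element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of spaces/tabs on each line and drop empty lines (the "gaps").
        lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


if __name__ == "__main__":
    page = ("<html><head><title>Demo</title><script>var x = 1;</script></head>"
            "<body><p>Hello   <b>world</b>!</p>\n\n\n<div>Fish &amp; chips</div></body></html>")
    print(html_to_text(page))
```

Running it prints `Demo`, `Hello world!` and `Fish & chips` on three lines. A parser is used here instead of a regex such as `<[^>]+>` because it knows where script and style content begins and ends. A plain regex would remove the tags but leave the JavaScript text behind.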
Error is not blindness, error is cowardice