Moving a file from a website into a csv format or excel spreadsheet?
abraham
- Joined: Dec 1, 2011
- Posts: 7
Hello,
I would like to move the information from this website:
http://www.civilsheriff.com/RealEstateSales.asp
to an Excel file or CSV file. I cannot figure out how to begin the process.
If we are able to solve this issue, I would like to automate the checking of this site for updated information.
Thanks in advance,
Abraham
t-hex
- Joined: Nov 13, 2011
- Posts: 152
0. Select the date you want to pass in the 'MYSaleDate' variable, formatted as MM/d/YYYY (e.g. 12/1/2011).
1. Download the webpage (queried through ASP), either locally or through the WA > Download File action, and save it to a variable or a local file:
http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011
(you'll get the actual HTML body)
2. Parse the webpage for the required information using fine-tuned regular expressions,
OR strip the HTML tags, then process the data,
OR, if you're not familiar with the above, use the 'Table2Clipboard' add-on with Firefox:
use/automate the 'Copy whole table' function from the FF > Table2Clipboard context menu, grab the clipboard data using WA, and split that data on the tab ('\t') character.
3. Then build a table/list, keeping each parsed match in its matching position.
4. Write the lists to Excel,
or comma-delimit one or more lists, save them into a final list, then write that to a CSV file...
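For anyone who would rather see steps 0 to 4 outside of WA, here is a rough Python sketch of the same idea. It is not the attached job. The function names are made up for illustration, the table layout (plain tr/td rows) and the latin-1 decoding are assumptions about the page, and it uses an HTML parser instead of regular expressions.

```python
import csv
from datetime import date
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

BODY_URL = "http://www.civilsheriff.com/RealEstate/RealEstateBody.asp"


def build_url(sale_date: date) -> str:
    """Step 0: format the date without zero padding, like 12/1/2011."""
    stamp = f"{sale_date.month}/{sale_date.day}/{sale_date.year}"
    return f"{BODY_URL}?{urlencode({'MYSaleDate': stamp}, safe='/')}"


def download(url: str) -> str:
    """Step 1: fetch the HTML body (latin-1 is an assumed encoding)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("latin-1")


class TableParser(HTMLParser):
    """Step 2/3: collect cell text, grouped into one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html: str):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, path):
    """Step 4: write the rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
```

Usage would be something like write_csv(parse_rows(download(build_url(date(2011, 12, 1)))), "sales.csv"), assuming the site still answers at that address.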
Samantha
- Joined: Apr 23, 2010
- Posts: 2328
abraham
- Joined: Dec 1, 2011
- Posts: 7
Thanks for all your help.
T-Hex,
1. Thanks for the Firefox tip. I don't understand the parse and download steps you mentioned, but the tabletolist function did exactly what I wanted.
2. Is there any way to control this function with WA?
3. How did you find this link: http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011
Samantha,
I imported your sample job into WA and it worked fine. I don't understand the parse section of the code. I'm afraid that learning the code will take longer than using the Firefox workaround. I really appreciate you taking the time out of your day to put this job together.
1. Is there any way to set up a trigger to search for new information or set dates? I know that a new page with select dates will be released on a regular basis, but I do not know when. If you revisit the website, you will notice that the last available date is 1/26/2011. I know that the next date with information on it is 02/02/2012. Is there a way to program WA to:
a. Look for this date.
b. Perform the table-to-list program I will create.
Thanks again for all your help. I really appreciate it.
Abraham
t-hex
- Joined: Nov 13, 2011
- Posts: 152
[quote=abraham]I don't understand the parse and download step you mentioned...[/quote] [quote=abraham]Is there anyway to control this function with WA?[/quote]
Hi abraham, I would advise you not to, because the method Samantha used is much better.
If you look at the actions in Samantha's job... the required pages are downloaded, then parsed using regular expressions to match the required text, looped through the dates, and so on. That's the way I was talking about in my previous post. It's how it should be done...
Anyway, here's how, using the Table2Clip add-on:
0. Loop dates
1. Run Application action in WA: http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=%DATE%
2. Wait for a fixed time (for the page to load)
3. Focus the browser by its dialog title
4. Send right-click
5. Focus on the "TableToClipboard" context item > send 'Left Click' or use the 'Up/Down' keys
6. Send left click to 'Copy Whole Table'
7. Get Clipboard Text WA action
8. 'Split Text' WA action to split the clipboard text into a list, delimited first by the newline character, then by the tab character (file attached, see how it's being done)
Again, the above method is not recommended...
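To show what the final split step amounts to, here is a small Python sketch, not the attached WA job. It assumes the clipboard text has one table row per line with tab-separated cells, which is what 'Copy Whole Table' is described as producing above; the function names are hypothetical.

```python
import csv
import io


def split_table_text(text: str):
    """Split clipboard text into rows (by newline), then cells (by tab)."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append([cell.strip() for cell in line.split("\t")])
    return rows


def rows_to_csv_text(rows) -> str:
    """Turn the list of rows into comma-delimited text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The csv module takes care of quoting cells that themselves contain commas, which a plain join on ',' would not.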
[quote=abraham]How did you find this link:http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011[/quote]
By following the internal page references in the source of the main page, i.e.: http://www.civilsheriff.com/RealEstateSales.asp
Here's how (follow these snapshots, or the attached images): [FF > View > Page Source (Ctrl+U)]
http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/1.png
http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/2.png
http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/3.png
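The same search through the page source can be roughly automated. The sketch below is only an illustration and assumes the main page pulls in its sub-pages through frame or iframe elements with a src attribute; if the site links them some other way, the tag names would need changing. The class and function names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect the absolute URLs of every <frame>/<iframe> src on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own address.
                self.sources.append(urljoin(self.base_url, src))


def find_embedded_pages(html: str, base_url: str):
    finder = FrameFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.sources
```

Feeding it the source of RealEstateSales.asp with that page's URL as base_url would list the embedded addresses, if the frame assumption holds.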
--------------------------------------------------
As for 'how to know if a new date has been added?'
You can maintain a small database containing the current dates. Imagine a *.txt file containing the dates currently present in the webpage: http://www.civilsheriff.com/RealEstate/RealEstateFilter.asp
DATE1
DATE2
DATE3
.....n
Check every day for changes in the date list. Loop through the dates on the page and compare each one against the stack above:
If DATE1NEW is not one of DATE1, DATE2, DATE3
> add the new date to the top/end of the stack.
>> ELSE
>>> If DATE2NEW is not one of DATE1, DATE2, DATE3
>>>> add the new date to the top/end of the stack.
and so on...
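In Python the date-stack check could look roughly like this. It is a sketch under assumptions: the dates have already been scraped from the page into a list of strings, the file holds one date per line, and all names are hypothetical. Unlike the nested if/else above, it checks every date on the page in one pass.

```python
from pathlib import Path


def load_known_dates(path):
    """Read the stored date stack, one date per line (empty if no file yet)."""
    p = Path(path)
    if not p.exists():
        return []
    lines = p.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def find_new_dates(known, current):
    """Return the dates in `current` that are not already in `known`."""
    seen = set(known)
    new = []
    for d in current:
        if d not in seen:
            seen.add(d)
            new.append(d)
    return new


def update_known_dates(path, current):
    """Append any new dates to the end of the stack and return them."""
    known = load_known_dates(path)
    new = find_new_dates(known, current)
    if new:
        Path(path).write_text("\n".join(known + new) + "\n", encoding="utf-8")
    return new
```

Run once a day, update_known_dates returns an empty list when nothing changed, and otherwise the dates that should be processed next.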
Another possibility: if the webpage can be monitored daily, download the webpage to a variable > convert the variable to a list > compare the current list with the one from the previous day > continue with the output...
Execute/run "CivilSherriff_and_Table2Clip.waj" only after copying the whole table using Table2Clipboard.
Samantha
- Joined: Apr 23, 2010
- Posts: 2328