Moving a file from a website into a csv format or excel spreadsheet?

abraham

  • Joined: Dec 1, 2011
  • Posts: 7

Thu 12/1/2011 - 4:35

 Hello,

 

I would like to move the information from this website:

 

http://www.civilsheriff.com/RealEstateSales.asp

 

to an Excel file or CSV file.  I cannot figure out how to begin the process.

 

If we are able to solve this issue, I would like to automate the checking of this site for updated information.

 

Thanks in advance,

 

Abraham

#1

t-hex

  • Joined: Nov 13, 2011
  • Posts: 152

Fri 12/2/2011 - 9:02

0.

Select the date you want to pass through to the 'MYSaleDate' variable,

in M/d/YYYY format (e.g. 12/1/2011)

 

1.

Download the webpage (queried through asp) locally or through WA>Download File action>Save it to a variable or local file:

 

http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011

 

(you'll get the actual html body)

 

2.

Parse the webpage for required information using fine-tuned regular expressions

 

OR, Strip HTML Tags, then process data

 

OR, If you're not familiar with the above, use 'Table2Clipboard' addon with Firefox.

Use/automate 'Copy whole table' function from 'FF>Table2Clipboard' context menu, grab clipboard data using WA, segregate that data using delimiter: 'tab' or '\t' character.

 

3.

Then build a table/list, maintaining matches/positions from parsing/format matches.

 

4.

Write lists to excel

or comma delimit one or more lists, save it into a final list, then write to csv file...
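
For readers without WA at hand, steps 0 to 4 above can be sketched in plain Python. This is only a rough illustration, not the job attached later in the thread: the function names are made up for the example, and it assumes the results come back as ordinary HTML table rows (`tr`/`td`), which the thread does not confirm. Nested tables are not handled.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# URL taken from the post above; everything else here is a hypothetical helper.
BASE_URL = "http://www.civilsheriff.com/RealEstate/RealEstateBody.asp"


def build_url(month, day, year):
    """Step 0: put the sale date into the MYSaleDate query parameter."""
    query = urlencode({"MYSaleDate": f"{month}/{day}/{year}"}, safe="/")
    return f"{BASE_URL}?{query}"


def download(url):
    """Step 1: fetch the raw HTML body (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TableParser(HTMLParser):
    """Steps 2-3: collect the text of every td/th cell, one list per tr."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Step 4: comma-delimit the rows; csv.writer handles quoting."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Typical use would be `rows_to_csv(parse_rows(download(build_url(12, 1, 2011))))`, with the result written to a `.csv` file that Excel can open.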

#2

Samantha

  • Joined: Apr 23, 2010
  • Posts: 2328

Fri 12/2/2011 - 14:05

Hi Abraham, Welcome to our Forum! I had a quick look at the website. It's not really consistent, but you could grab the information you're looking for, plus more! I'm attaching a sample job. It could use a lot of improvement, but it's a base job you could work with. Let me know if you need any additional help with this one. :) Samantha

CivilSherriff.com_.waj

#3

abraham

  • Joined: Dec 1, 2011
  • Posts: 7

Fri 12/2/2011 - 20:37

 Thanks for all your help.  

 

T-Hex, 

 

1. Thanks for the Firefox tip.  I don't understand the parse and download steps you mentioned, but the tabletolist function did exactly what I wanted to do.

 

2. Is there any way to control this function with WA?

 

3. How did you find this link: http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011

 

 

Samantha,

 

I imported your sample job into WA and it worked fine.  I don't understand the parse section of the code.  I'm afraid that learning the code will take longer than using the Firefox workaround.  I really appreciate you taking the time out of your day to put this job together.

 

1. Is there any way to set up a trigger to search for new information or set dates?  I know that a new page with select dates will be released on a regular basis.  I do not know when.  If you revisit the website, you will notice that the last available date is 1/26/2011.  I know that the next date with information on it is 02/02/2012.  Is there a way to program WA to:

 

a. Look for this date.

b. Perform the Table to list program I will create.

 

Thanks again for all your help.  I really appreciate it.

 

Abraham

#4

t-hex

  • Joined: Nov 13, 2011
  • Posts: 152

Sat 12/3/2011 - 13:11

[quote=abraham]I don't understand the parse and download step you mentioned...[/quote]

[quote=abraham]Is there any way to control this function with WA?[/quote]

Hi abraham,

I would advise you not to, because the method Samantha used is far better.

If you look at the actions in Samantha's job, the required pages are downloaded, then parsed using regular expressions to match the required text, looping through the dates and so on. That's the approach I was describing in my previous post, and it's how it should be done.

Anyway, here's how, using the Table2Clip addon:

0. Loop dates
1. Run application action in WA: http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=%DATE%
2. Wait for a time constant (for the page to load)
3. Focus Browser dialog title
4. Send right-click
5. Focus on "TableToClipboard" context item > Send 'Left Click' or use 'Up/Down Keys'
6. Send left click to 'Copy Whole Table'
7. Get Clipboard Text WA action
8. 'Split Text' WA action to split the clipboard text into a list, delimited first by the newline character and then by the tab character (file attached, see how it's being done)

Again, the above method is not recommended...

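
The splitting in step 8 (newline first, then tab) looks roughly like this outside WA. It is a minimal Python sketch with made-up function names, and it assumes the copied table arrives as plain text with one row per line and a tab between cells, as described in this thread.

```python
import csv


def split_clipboard_table(text):
    """Split clipboard text into rows (by newline), then cells (by tab)."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines such as a trailing newline
            rows.append([cell.strip() for cell in line.split("\t")])
    return rows


def save_rows(rows, path):
    """Write the rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `save_rows(split_clipboard_table(clipboard_text), "sales.csv")`, where `clipboard_text` is whatever the 'Get Clipboard Text' step returned.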
[quote=abraham]How did you find this link: http://www.civilsheriff.com/RealEstate/RealEstateBody.asp?MYSaleDate=12/1/2011[/quote]

By following the internal page redirections in the webpage source, i.e.:

http://www.civilsheriff.com/RealEstateSales.asp

Here's how (follow the images/snapshots below, or the attached images): [FF > View > Page Source (Ctrl+U)]

http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/1.png
http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/2.png
http://i1209.photobucket.com/albums/cc395/t-hex/WA/CivilSherriff/3.png

--------------------------------------------------

As for 'how to know if a new date has been added?'

You can maintain a small database containing the current dates. Imagine a *.txt file containing the dates currently present in the webpage:

http://www.civilsheriff.com/RealEstate/RealEstateFilter.asp

DATE1
DATE2
DATE3
.....n

Check every day for changes in the date list. Loop through the dates found on the webpage and compare each one against the above date stack:

If DATE1NEW is not one of DATE1, DATE2, DATE3
>add the new date to the top/end of the stack.
If DATE2NEW is not one of DATE1, DATE2, DATE3
>add the new date to the top/end of the stack.

and so on...

Another possibility: if the webpage can be monitored daily, download the webpage to a variable > convert the variable to a list > compare the current list to the one from the previous day > continue with the output...

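
The date-stack comparison can be sketched in Python as follows. This is an illustration only: it assumes the filter page shows its sale dates as M/D/YYYY text somewhere in the HTML, which has not been checked here, and the function names are invented for the example.

```python
import re

# Matches dates such as 12/1/2011 or 02/02/2012.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def extract_dates(html):
    """Return the distinct dates in the page, in order, without leading zeros."""
    dates = []
    for month, day, year in DATE_RE.findall(html):
        normalized = f"{int(month)}/{int(day)}/{year}"
        if normalized not in dates:
            dates.append(normalized)
    return dates


def new_dates(page_dates, known_dates):
    """Return the page dates that are not yet in the stored date stack."""
    known = set(known_dates)
    return [date for date in page_dates if date not in known]
```

The stored stack should hold dates in the same normalized form (no leading zeros), otherwise 02/02/2012 and 2/2/2012 would be treated as different dates.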
  Execute/run "CivilSherriff_and_Table2Clip.waj" only after copying the whole table using Table2Clipboard.

1_0.png 2_0.png 3.png CivilSherriff_and_Table2Clip.waj

#5

Samantha

  • Joined: Apr 23, 2010
  • Posts: 2328

Mon 12/5/2011 - 20:13

What you could do is keep a small log of the dates you've checked. If a date is found in the log file, the job does nothing and just moves on to the next page. If a date has not been processed (and therefore does not exist in the log file), it then does the processing. As for grabbing the values, I would suggest that you stick with it. It's easier, and regular expressions might seem creepy, but trust me, they're not. Keep us updated on your progress! :) Samantha
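
A log-file check along these lines might look like the following in Python. It is a sketch rather than anything from the attached job: the log is assumed to be a plain text file with one date per line, and `process` stands for whatever per-date work you end up with.

```python
from pathlib import Path


def load_log(log_path):
    """Read the already-processed dates; a missing log means none yet."""
    path = Path(log_path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def process_unseen(dates, log_path, process):
    """Run process(date) for each date not in the log, then record it."""
    done = load_log(log_path)
    handled = []
    with open(log_path, "a", encoding="utf-8") as log:
        for date in dates:
            if date in done:
                continue  # already processed, move on to the next date
            process(date)
            log.write(date + "\n")
            done.add(date)
            handled.append(date)
    return handled
```

Because each date is written to the log only after `process` returns, a date whose processing fails is picked up again on the next run.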

#6

Copyright 2013 - Softomotive Ltd