Advice please regarding learning scraping data from websites

Moderator: 2020vision

Postby danjuma » Fri Mar 19, 2010 1:35 pm

Hi guys,

I am thinking of learning a technique for the above. After a quick look at the wiki, I found out there are various techniques, which has confused me more :? Basically you have data scraping, screen scraping, web scraping, data mining, report mining etc.

What I want to be able to do is just extract certain data from websites, like fixed-odds and spread-betting sites, onto an Excel sheet. So what are the pros and cons of the above techniques in terms of ease of learning/implementing, which is most appropriate, most flexible etc.? I don't really have the time/need/intelligence :lol: to learn them all.


Many thanks
danjuma
 
Posts: 347
Joined: Mon Apr 21, 2008 4:17 pm

Postby doris_day » Fri Mar 19, 2010 1:43 pm

From my own limited experience it depends what data you want to scrape and how the site is coded.
The simplest way is to use a web query in Excel. Very often that's all you need to do. I'd always give that a try first and you might be surprised how easy it is.
I've used data extraction software that's worked fine, but I suppose the best way is to code your own tailor-made app using VB, C# or whatever language you decide to learn.
I'm now at the point where I've decided to learn VB properly and work from there.
Best of luck.
'He was looking for the card so high and wild he'd never need to deal another' - Leonard Cohen
doris_day
 
Posts: 968
Joined: Fri Nov 02, 2007 12:34 am

Postby doris_day » Fri Mar 19, 2010 1:46 pm

Another way is not to learn anything at all and go to somewhere like RAC (Rentacoder) and get one of the great coders they have to do it all for you. They're cheaper than you think :)
'He was looking for the card so high and wild he'd never need to deal another' - Leonard Cohen
doris_day
 
Posts: 968
Joined: Fri Nov 02, 2007 12:34 am

Postby osknows » Fri Mar 19, 2010 8:36 pm

I would say you have 2 separate things there:

Data scraping, screen scraping & web scraping are about actually getting the data from the source.

Data mining & report mining are more about using the data to uncover trends/information.

Basically if anything can be seen on the screen then 99.9% of the time you can get the data. How you get it varies and is constantly changing as sites update, introduce new technology and implement security measures. It is good practice to abide by any rules the site sets and not to overburden any site with too many requests.
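To make the "abide by the rules and don't overburden the site" point a bit more concrete, here is a rough Python sketch, not a definitive recipe. It checks a site's robots.txt rules with the standard library and spaces requests out. The robots.txt text, the "my-scraper" user agent name and the example.com URLs are all made up for illustration; a real scraper would read the file from the site itself, and the site's terms of use may add rules that robots.txt does not cover.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content (assumption for illustration only).
# A real one would be fetched from https://<site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Make sure at least `min_interval_seconds` pass between two requests."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


if __name__ == "__main__":
    rules = build_rules(SAMPLE_ROBOTS_TXT)
    # Use the site's requested delay if it gives one, else a cautious default.
    delay = rules.crawl_delay("my-scraper") or 2.0
    throttle = Throttle(delay)
    for url in ("https://example.com/odds/football",
                "https://example.com/account/settings"):
        if rules.can_fetch("my-scraper", url):
            throttle.wait()
            print("would fetch", url)  # the real download would go here
        else:
            print("skipping (disallowed)", url)
```

The same idea carries over to VBA or C#: check what the site allows first, then put a pause between requests.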

doris_day has suggested a very good place to start: Excel and web queries. My advice is that if you intend to use the extracted information for non-live analysis, then Excel and VBA is by far the easiest method, as Excel files are easily ported to other apps for analysis and storage. VB & C# would be better suited to live situations and building complete apps from scratch.

Website technology has moved beyond the days when all the data was in the HTML code; these days there are often technologies like SOAP, JavaScript and AJAX which update only parts of the web page. Sometimes you may need to build a bot to replicate a user's access. It all depends on the site and the technology being used.
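As a rough illustration of the AJAX case: when a page updates part of itself, the data often arrives as a separate JSON response rather than in the HTML, and that response can be parsed directly. The sketch below assumes a made-up payload; the field names "market", "runners", "name" and "price" are hypothetical, and a real site will have its own structure that you would need to inspect first.

```python
import json

# Hypothetical JSON payload, standing in for what an AJAX call might return.
SAMPLE_RESPONSE = """
{"market": "Match Odds",
 "runners": [
   {"name": "Home", "price": 2.1},
   {"name": "Draw", "price": 3.4},
   {"name": "Away", "price": 3.75}
 ]}
"""


def extract_prices(payload_text):
    """Return a list of (selection name, price) tuples from the JSON text."""
    data = json.loads(payload_text)
    return [(runner["name"], float(runner["price"]))
            for runner in data.get("runners", [])]


if __name__ == "__main__":
    for name, price in extract_prices(SAMPLE_RESPONSE):
        print(name, price)
```

Your browser's developer tools will usually show which requests a page makes in the background, which is how you find out whether a payload like this exists at all.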

The best thing is to start a small project with just one site. See if you can extract the data you need using:
1. Excel webquery
2. Source HTML
3. Parsing Javascript/SOAP/AJAX

If you can do these you should be able to handle 75%+ of sites.
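For step 2 (source HTML), something along these lines is one possible approach in Python, using only the standard library. It pulls the rows out of an HTML table and writes them as CSV text, which Excel can open. The sample HTML is invented for illustration; real pages are usually messier (nested tables, several tables per page), so treat this as a starting sketch rather than a finished tool.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up page source (assumption); a real page would be downloaded first.
SAMPLE_HTML = """
<html><body>
<table id="odds">
  <tr><th>Selection</th><th>Odds</th></tr>
  <tr><td>Home</td><td>2.10</td></tr>
  <tr><td>Draw</td><td>3.40</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

Saving the returned text to a .csv file gives you something Excel will open directly.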

This is also a good book http://www.heatonresearch.com/book/http ... sharp.html
osknows
 
Posts: 946
Joined: Wed Jul 29, 2009 12:01 am

Postby xraymitch » Fri Mar 19, 2010 9:12 pm

In addition to the excellent advice from Doris and osknows, have a look at iMacros from iOpus. They have a freeware version and also a business version which comes with a 30-day unlimited free trial.

Whatever you do with a web browser, iMacros can automate it.
Form Filling, Web Scripting, Data Extraction, Web Testing, Excel Web Queries.

http://www.iopus.com/download/

http://www.iopus.com/imacros/compare/

Ray 8)
xraymitch
 
Posts: 410
Joined: Wed Jun 25, 2008 7:06 am
Location: UK

Postby Timstertoo » Fri Mar 19, 2010 10:24 pm

Maybe these two links can help you on your way:

http://www.packtpub.com/article/web-scr ... ith-python
http://www.packtpub.com/article/web-scr ... hon-part-2

Python is an open source language that is easy for beginners to learn, and there is plenty of info freely available on the web.

Good luck!
Timstertoo
 
Posts: 26
Joined: Tue Mar 16, 2010 5:54 pm
Location: Amsterdam

Postby danjuma » Sat Mar 20, 2010 1:20 pm

Thanks guys for all your advice/suggestions. Much appreciated. :)
danjuma
 
Posts: 347
Joined: Mon Apr 21, 2008 4:17 pm

Postby danjuma » Sat Mar 20, 2010 1:42 pm

xraymitch wrote:In addition to the excellent advice from Doris and osknows, have a look at iMacros from iOpus. They have a freeware version and also a business version which comes with a 30-day unlimited free trial.

Whatever you do with a web browser, iMacros can automate it.
Form Filling, Web Scripting, Data Extraction, Web Testing, Excel Web Queries.

http://www.iopus.com/download/

http://www.iopus.com/imacros/compare/

Ray 8)


Thanks xraymitch. I already know about iMacros, but the bit I need, which is for 'extracting' data, is not actually in the free version. Also, I have stopped using Firefox these days, as for some time now it's been awfully slow for some reason, even though I'm using the latest version, 3.6. I now use Opera 10.50, brilliant so far.
danjuma
 
Posts: 347
Joined: Mon Apr 21, 2008 4:17 pm

