Scraping query


Postby crocogotter » Wed May 06, 2015 7:31 am

I've seen references to scraping in a few places, but I don't really understand the mechanics of it. Obviously, if I could use it I would save a fair bit of time, so I'll briefly explain how I operate and hopefully someone can tell me whether scraping could apply.

My system involves greyhounds. I use an Excel spreadsheet to predict winners and a second one, which I load with selections, to trigger my bets. For the first workbook I manually copy the page from the Racing Post website with the race information on it (the usual page with all the dogs' times and so on), and the spreadsheet then predicts the winner. I then copy the next race, get another winner, and so on until I have a list of selections to back, which I put into the second workbook to trigger my bets. This takes about half an hour each morning.

Is it possible for some code to, for instance, make Excel scrape the relevant Racing Post page for a race into a spreadsheet when that race's market is loaded? The spreadsheet could then predict the winner and back it (subject to it meeting various other criteria), without me having to manually copy and paste each race card page into a spreadsheet each morning.

As I say, I've no idea whether this is viable, so is something like this possible?

Re: Scraping query

Postby Captain Sensible » Wed May 06, 2015 5:00 pm

I'd imagine quite a few of us have similar sheets that pull data from the web relevant to each market. There used to be a sheet on here somewhere that linked to the Racing Post to get the live shows off the RP site. Dunno if it's still around, as the RP changed things so you needed to log in for the data, plus I think they mess around with JavaScript to display the info. But as long as the RP webpages are in a regular format containing the time and course in the URL, it shouldn't be too hard to recreate the URL from the data in Gruss.

Re: Scraping query

Postby crocogotter » Wed May 06, 2015 7:27 pm

This is a typical page address:

"http://www.racingpost.com/greyhounds/card.sd#resultDay=2015-05-06&raceId=1368338"

I don't need to log in to view this page.

So, in theory, when BA loads a new market at, say, a minute before the off, Excel could scrape this page into a spreadsheet, then load another page when the next market loads, and so on?
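For anyone wanting to try the URL idea, here is a minimal Python sketch (not VBA, and not anything from the Racing Post itself) that rebuilds a card address like the one above from a date and a race ID, then splits it back apart. It assumes the `card.sd#resultDay=...&raceId=...` format stays as shown; the helper names are made up for illustration. One thing worth knowing: everything after the `#` is a URL fragment, which the browser keeps to itself and never sends to the server. The page's JavaScript reads it, so a plain download of this address gets the same response whatever the raceId is.

```python
from datetime import date
from urllib.parse import parse_qs, urlencode, urlsplit

# Base page taken from the example address in this thread; the fragment
# format (resultDay, raceId) is an assumption that it stays as shown there.
CARD_BASE = "http://www.racingpost.com/greyhounds/card.sd"


def build_card_url(race_day: date, race_id: int) -> str:
    """Rebuild a card address from a date and race ID (hypothetical helper)."""
    fragment = urlencode({"resultDay": race_day.isoformat(), "raceId": race_id})
    return f"{CARD_BASE}#{fragment}"


def split_card_url(url: str) -> dict:
    """Pull the parameters back out of the fragment.

    Only parts.path (plus host) is ever sent to the server; the fragment
    stays in the browser for the page's JavaScript to read.
    """
    parts = urlsplit(url)
    params = parse_qs(parts.fragment)
    return {
        "server_path": parts.path,
        "resultDay": params["resultDay"][0],
        "raceId": int(params["raceId"][0]),
    }


url = build_card_url(date(2015, 5, 6), 1368338)
print(url)
print(split_card_url(url))
```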

Re: Scraping query

Postby Captain Sensible » Wed May 06, 2015 7:44 pm

Excel and VBA can both scrape webpages and then parse the page into however you want the data presented. I've only had a quick look at the page, but it looks like it's pulling in the data to complete the page using JavaScript. Whether that's there to stop people scraping the site or just to display the data more nicely, who knows, but it does make scraping it harder for you. That's probably one of the reasons most people scraped the Sporting Life site instead: it's plain HTML, and the way they format the URLs makes life very easy.

I'd look to see whether other sites like Sporting Life have the info you need before trying to scrape the RP, as it does look more complicated than something as simple as the SP pages like http://www.sportinglife.com/greyhounds/ ... 15/romford
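To make the "scrape then parse" step concrete, here is a rough Python equivalent using only the standard library: `fetch_page` downloads raw HTML and `parse_table_rows` pulls the text out of every table row. Both names are hypothetical, and the sketch assumes the data you want sits in an ordinary HTML `<table>`. It runs no JavaScript, so on a page that fills itself in with JavaScript (as described above) the table may not be in the downloaded HTML at all.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self._row = None  # row currently being read
        self._cell = None # text fragments of the current cell

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace so each cell is one tidy string.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def parse_table_rows(html: str) -> list:
    """Return every table row as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_page(url: str) -> str:
    """Download a page's raw HTML (no JavaScript is run)."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be something like `rows = parse_table_rows(fetch_page(url))`, with `url` pointing at a plain-HTML page.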

Re: Scraping query

Postby crocogotter » Wed May 06, 2015 8:07 pm

Unfortunately that Sporting Life page doesn't contain all the information that I need to make my selection.

Re: Scraping query

Postby crocogotter » Wed May 06, 2015 8:46 pm

The Betfair website actually provides all the info I need, but only when you select the race and time from the drop-down boxes on this page:

http://form.greyhounds.betfair.com/racingform

The URL doesn't seem to change when you select a different race using the drop-down boxes, so I guess this won't work either.

Re: Scraping query

Postby Captain Sensible » Wed May 06, 2015 9:34 pm

Yep, they all use things like JavaScript to generate the pages. Just had a quick look at the Racing Post page: the actual data is hidden within the main page, and the different parts like form, odds and latest shows are just displayed using JavaScript. The page it calls is

http://www.racingpost.com/greyhounds/ca ... 2015-05-06

I guess your best bet is writing something to pull the main greyhound card page at the start of the day so you can match up all the IDs with the race times etc., then use them to trigger a grab of the standard card page and just parse out the data you want displayed.
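The "grab the day's card first and build an index" approach might look roughly like this in Python. It assumes, hypothetically, that the day's card page has one link per race whose `href` contains `raceId=` and whose link text is the race time; check the real page source before relying on that. Keying on the time alone will clash when two tracks have a race at the same minute, so a working version would also need to pick up the course.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: the race ID appears in each race link as raceId=<digits>.
RACE_ID_RE = re.compile(r"raceId=(\d+)")


class RaceLinkParser(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def race_ids_by_time(day_card_html: str) -> dict:
    """Map link text (assumed to be the race time, e.g. '19:47') to raceId."""
    parser = RaceLinkParser()
    parser.feed(day_card_html)
    parser.close()
    index = {}
    for href, text in parser.links:
        match = RACE_ID_RE.search(href)
        if match and text:
            index[text] = int(match.group(1))
    return index
```

When a market loads, its start time can then be looked up in this index to get the race ID for the card page.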

Re: Scraping query

Postby crocogotter » Thu May 07, 2015 4:56 am

Captain Sensible wrote: Yep, they all use things like JavaScript to generate the pages. Just had a quick look at the Racing Post page: the actual data is hidden within the main page, and the different parts like form, odds and latest shows are just displayed using JavaScript. The page it calls is

http://www.racingpost.com/greyhounds/ca ... 2015-05-06

I guess your best bet is writing something to pull the main greyhound card page at the start of the day so you can match up all the IDs with the race times etc., then use them to trigger a grab of the standard card page and just parse out the data you want displayed.


I think I'm out of my depth here. I can't see the race IDs when I pull up the main card; is that information hidden in the page? I've looked at today's race IDs and I can't work out how they're generated, so I guess I would need to get them from that page.

If you think this would be viable, I'll maybe consider getting some help to take it further. I seem to remember there were some developers in a section somewhere on the forum; is that still the case?

Re: Scraping query

Postby alrodopial » Thu May 07, 2015 7:47 am

You have two options:
1. Hire a coder; I suggest Os for this job.
2. Do it yourself after searching the web for "scraping a webpage", asking for help here or elsewhere, etc.

Getting rid of this boring daily task is important, so I suggest the first option, as the second will take you a lot of time.
Bear in mind that you will need to learn a little VBA, as in the future you will want to change small parts, copy data to the next cell, etc.

Re: Scraping query

Postby Captain Sensible » Thu May 07, 2015 11:48 am

crocogotter wrote:
I think I'm out of my depth here. I can't see the race IDs when I pull up the main card; is that information hidden in the page? I've looked at today's race IDs and I can't work out how they're generated, so I guess I would need to get them from that page.

If you think this would be viable, I'll maybe consider getting some help to take it further. I seem to remember there were some developers in a section somewhere on the forum; is that still the case?

Yep, the IDs are hidden within the URLs for the race times. I'm not that clued up on VBA, but I could easily code it up in PHP, so anyone with a decent knowledge of VBA shouldn't have any problem.

I'd echo Alrodopial's post too: it makes life a lot easier having things fully automatic. A few of mine just run along happily day in, day out, reloading data once the date changes and scraping data off the web too. If I were looking for a developer I'd certainly approach Osknows first. He's helped me and plenty of others on the forum with coding problems on numerous occasions, certainly knows his VBA, and is a real asset to the forum.

If you do look for a developer, just make sure you're clear about what you need. Even something as simple as screenshots showing which page you want scraped and how it should then be laid out in Excel is a real time saver. Most developers will already have some basic scraping scripts and may even have done something similar already, plus what you're looking for doesn't seem hard to do.
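On the "how it's laid out in Excel" point: one low-tech hand-off is to have the scraper write a CSV file that Excel opens directly. A minimal Python sketch, where `rows_to_csv` is a hypothetical helper and the columns are whatever you agree with the developer:

```python
import csv
import io


def rows_to_csv(rows, path=None):
    """Write scraped rows as CSV text (hypothetical helper); optionally save it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes cells containing commas automatically
    writer.writerows(rows)
    text = buffer.getvalue()
    if path is not None:
        # newline="" stops Python adding blank lines between rows on Windows;
        # "utf-8-sig" writes a byte-order mark so Excel recognises UTF-8.
        with open(path, "w", newline="", encoding="utf-8-sig") as handle:
            handle.write(text)
    return text


print(rows_to_csv([["Trap", "Name"], ["1", "Dog, A"]]))
```

Writing with the `utf-8-sig` encoding adds a byte-order mark, which helps Excel pick the right character set when the file is opened by double-clicking.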

Re: Scraping query

Postby crocogotter » Thu May 07, 2015 12:02 pm

Thanks for the replies, guys. In actual fact I dropped Osknows a PM late last night (once it became clear that this was going to be beyond me), as someone recommended him about a week ago in the Developers section. Hopefully he'll get back to me and we'll get something sorted between us.

Re: Scraping query

Postby mak » Fri May 08, 2015 3:19 pm

I am sure Os will answer your PM and meet your needs. You should bear in mind, though, that web pages very often change their data and their elements, which will force you to update your scraping code. Just thought I'd mention it. :)
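One cheap defence against layout changes is to check that the scraped table still has the columns your sheet relies on and to fail loudly if not, rather than quietly feeding wrong numbers into the selections. A Python sketch; the column names in `EXPECTED_HEADER` are made up and would be replaced by the real ones:

```python
# Hypothetical column names; replace with the headers your sheet actually uses.
EXPECTED_HEADER = ("Trap", "Name", "Best Time")


class LayoutChanged(Exception):
    """Raised when a scraped table no longer has the columns we rely on."""


def check_layout(rows, expected=EXPECTED_HEADER):
    """Validate the header row and return a column-name -> index map."""
    if not rows:
        raise LayoutChanged("no table rows found; the page may have changed or need JavaScript")
    header = rows[0]
    missing = [col for col in expected if col not in header]
    if missing:
        raise LayoutChanged(f"missing columns {missing}; header was {header}")
    # Reading cells by name rather than fixed position survives reordered columns.
    return {col: header.index(col) for col in expected}
```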

Re: Scraping query

Postby crocogotter » Sun May 10, 2015 5:15 pm

Just a quick update: Osknows did indeed reply to my PM. He gave me a very reasonable price for what I wanted doing, produced a draft version within a day, and has since modified it a couple of times to present the data in the most helpful way for me, drastically minimising the work I've had to do to adapt my spreadsheets to the scraped data. As Captain Sensible rightly says, he's a real asset to the forum.

