R - Automated download from multiple URLs, handling Feedback Dialog / cookies
I am trying to bulk download the CORINE Land Cover zip files from this site, but I think the interconnected feedback survey window and Google Analytics are causing trouble here. Is there some sort of curl handler for these issues?
I viewed the cookie in SQLite Manager and found these entries: basedomain = "europa.eu", name = "clc06_c133.zip", value = "sectors%3denvironment", host = "www.eea.europa.eu", ... The cookie path is "c:\users\kay\appdata\roaming\mozilla\firefox\profiles\ckut8fjm.default\cookies.sqlite".
    setwd("d:/gis database/corinelc/")

    mylist <- list(
      clc06_1 <- "111 - Continuous urban fabric",
      clc06_2 <- "112 - Discontinuous urban fabric",
      clc06_3 <- "121 - Industrial or commercial units",
      clc06_4 <- "122 - Road and rail networks and associated land",
      clc06_5 <- "123 - Port areas",
      clc06_6 <- "124 - Airports",
      clc06_7 <- "131 - Mineral extraction sites",
      clc06_8 <- "132 - Dump sites",
      clc06_9 <- "133 - Construction sites",
      clc06_10 <- "141 - Green urban areas",
      clc06_11 <- "142 - Sport and leisure facilities",
      clc06_12 <- "211 - Non-irrigated arable land",
      clc06_13 <- "212 - Permanently irrigated land",
      clc06_14 <- "213 - Rice fields",
      clc06_15 <- "221 - Vineyards",
      clc06_16 <- "222 - Fruit trees and berry plantations",
      clc06_17 <- "223 - Olive groves",
      clc06_18 <- "231 - Pastures",
      clc06_19 <- "241 - Annual crops associated with permanent crops",
      clc06_20 <- "242 - Complex cultivation patterns",
      clc06_21 <- "243 - Land principally occupied by agriculture, with significant areas of natural vegetation",
      clc06_22 <- "244 - Agro-forestry areas",
      clc06_23 <- "311 - Broad-leaved forest",
      clc06_24 <- "312 - Coniferous forest",
      clc06_25 <- "313 - Mixed forest",
      clc06_26 <- "321 - Natural grasslands",
      clc06_27 <- "322 - Moors and heathland",
      clc06_28 <- "323 - Sclerophyllous vegetation",
      clc06_29 <- "324 - Transitional woodland-shrub",
      clc06_30 <- "331 - Beaches, dunes, sands",
      clc06_31 <- "332 - Bare rocks",
      clc06_32 <- "333 - Sparsely vegetated areas",
      clc06_33 <- "334 - Burnt areas",
      clc06_34 <- "335 - Glaciers and perpetual snow",
      clc06_35 <- "411 - Inland marshes",
      clc06_36 <- "412 - Peat bogs",
      clc06_37 <- "421 - Salt marshes",
      clc06_38 <- "422 - Salines",
      clc06_39 <- "423 - Intertidal flats",
      clc06_40 <- "511 - Water courses",
      clc06_41 <- "512 - Water bodies",
      clc06_42 <- "521 - Coastal lagoons",
      clc06_43 <- "522 - Estuaries",
      clc06_44 <- "523 - Sea and ocean")

    # extract the CLC codes, i.e. the 3-digit number in each string:
    foo1 <- function(x) unlist(strsplit(x, " - "))[[1]]
    # and the names
    foo2 <- function(x) unlist(strsplit(x, " - "))[[2]]

    codes <- sapply(mylist, foo1, simplify = TRUE)
    names <- sapply(mylist, foo2, simplify = TRUE)

    # make URLs
    names_url <- gsub(",", "", gsub("\\s", "-", names))
    dl_url <- paste0("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/", codes, "-", names_url, "/clc06_c", codes, ".zip/at_download/file")

    # zip file names
    get_zip_name <- function(x) unlist(strsplit(x, "/"))[grep(".zip", unlist(strsplit(x, "/")))]

    # function to plug into sapply
    dl_url_to_zip <- function(x) download.file(x, dest = get_zip_name(x))

    # gives HTTP status 404!
    sapply(dl_url, dl_url_to_zip)
You can use the httr package:
    require(httr)
    require(XML)

    # fetch the dataset page and collect the download links it contains
    response <- GET("http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2")
    doc <- htmlParse(content(response, as = "text"))
    files <- xpathSApply(doc, '//*/a[contains(@href,"http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/")]/@href')
    files <- files[-c(1, 47:50)]

    files[10]
    # href
    # "http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/141-green-urban-areas/clc06_c141.zip/at_download/file"

    dl_url[10]
    # [1] "http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/141-Green-urban-areas/clc06_c141.zip/at_download/file"
Notice that you had capitals: Green-urban-areas instead of green-urban-areas.
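A quick way to confirm that two URLs differ only in letter case is to compare them before and after lowercasing. The sketch below hardcodes the two URLs for class 141; the variable names `scraped` and `generated` are illustrative and not part of the original code.

```r
# URL scraped from the page versus the URL built from the class name
scraped   <- "http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/141-green-urban-areas/clc06_c141.zip/at_download/file"
generated <- "http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/141-Green-urban-areas/clc06_c141.zip/at_download/file"

# An exact comparison fails ...
exact_match <- identical(scraped, generated)

# ... but the two URLs agree once case is ignored
case_insensitive_match <- identical(tolower(scraped), tolower(generated))

# Locate the first differing character to see where the mismatch starts
chars_a <- strsplit(scraped, "")[[1]]
chars_b <- strsplit(generated, "")[[1]]
first_diff <- which(chars_a != chars_b)[1]
mismatch <- substr(generated, first_diff, first_diff + 16)
```

Here `mismatch` holds the path segment of the generated URL starting at the offending capital letter.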
    response <- GET(files[10])
    writeBin(content(response), 'test.zip')
I started with the httr package because I assumed cookies might be needed. In fact,
    download.file(files[10], 'test.zip', mode = "wb")
would work equally well. The error in your original code was the capitalisation.
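Putting the pieces together, one possible fix for the original script is to lowercase the name when building each URL and to download in binary mode. This is a sketch only: the helper names `make_clc_url`, `zip_name` and `download_all` are made up for illustration, and the `downloader` argument exists so the loop can be tried without network access. Only the slug for class 141 is confirmed by the output above; other class names, especially those containing commas, may map to different slugs on the site.

```r
base_url <- "http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/"

# Build a download URL from an entry such as "141 - Green urban areas".
make_clc_url <- function(entry) {
  parts <- strsplit(entry, " - ", fixed = TRUE)[[1]]
  code  <- parts[1]
  # Lowercase, drop commas, then turn runs of whitespace into hyphens
  slug  <- gsub("\\s+", "-", gsub(",", "", tolower(parts[2])))
  paste0(base_url, code, "-", slug, "/clc06_c", code, ".zip/at_download/file")
}

# Extract the zip file name (the path component ending in ".zip") from a URL.
zip_name <- function(url) regmatches(url, regexpr("[^/]+\\.zip", url))

# Try every URL; errors are reported but do not stop the loop.
download_all <- function(urls, dest_dir = ".", downloader = download.file) {
  ok <- vapply(urls, function(u) {
    dest <- file.path(dest_dir, zip_name(u))
    tryCatch({
      # mode = "wb" writes binary, which keeps zip files intact on Windows
      downloader(u, destfile = dest, mode = "wb")
      TRUE
    }, error = function(e) {
      message("Failed: ", u, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
  names(ok) <- zip_name(urls)
  ok
}

# Two sample entries; the full list from the question can be passed the same way
urls <- vapply(c("141 - Green urban areas", "523 - Sea and ocean"),
               make_clc_url, character(1), USE.NAMES = FALSE)
```

Calling `download_all(urls, dest_dir = "d:/gis database/corinelc")` would then attempt each download and return a named logical vector showing which files were fetched without an error.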