This repository has been archived by the owner on Feb 7, 2025. It is now read-only.

cf_query failed to grab data behind the proxy and a workaround #7

Closed
HaizhenWu opened this issue Jan 8, 2017 · 2 comments

@HaizhenWu

Hi,

When running the example, we ran into the following issue:

> reefton.data = cf_query(public.user, all.dts, reefton.st,
+                         paste(as.Date(Sys.time()) - 182, "9"))
connecting to CliFlo...
reading data...
Error in textConnection(doc) : invalid 'text' argument
In addition: Warning message:
In if (is_HTML) { :
  the condition has length > 1 and only the first element will be used
Error in cf_logout(user, msg = FALSE) : HTTP error
> reefton.data
Error: object 'reefton.data' not found

The cause is that our environment requires a proxy, and in this situation postForm(), which cf_query() calls, returns a raw vector rather than a character string (also reported in http://stackoverflow.com/questions/14421149/postform-request-https-using-proxy-returns-rubbish-content).

We found a workaround: adding the following line

if(is.raw(doc)) {doc <- rawToChar(doc)}

after line 215 of https://github.com/ropensci/clifro/blob/master/R/cfQuery.R, to forcibly convert raw type data to character string.
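
To illustrate what that one-line conversion does, here is a minimal sketch that needs no network access. The helper name `as_text` and the sample body are assumptions made up for this example; they are not part of clifro or RCurl.

```r
# Hypothetical helper (not part of clifro): return the response body as a
# character string, converting it first if it arrived as a raw vector.
as_text <- function(doc) {
  if (is.raw(doc)) {
    doc <- rawToChar(doc)
  }
  doc
}

# Simulate a body that came back as raw bytes, as it does behind our proxy.
raw_body <- charToRaw("<!DOCTYPE HTML PUBLIC")
text_body <- as_text(raw_body)

# A body that is already a character string is passed through unchanged.
same_body <- as_text("already text")

print(text_body)
```

The `is.raw()` guard means the conversion only happens when it is needed, so the same code path should keep working in environments without a proxy.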

Also, in our environment, lines 216-217 of https://github.com/ropensci/clifro/blob/master/R/cfUser.R

header = getURLContent("https://cliflo.niwa.co.nz/pls/niwp/wa.logout",
                       curl = curl, header = TRUE, cainfo = cert)

gives

> header
$header
                      status                statusMessage 
                       "200" "Connection established\r\n" 

$body
   [1] 3c 21 44 4f 43 54 59 50 45 20 48 54 4d 4c 20 50 55 42 4c 49 43 20 22 2d 2f 2f 57 33 43 2f 2f 44 54 44 20 48 54 4d 4c 20 34 2e 30 31
  [45] 20 54 72 61 6e 73 69 74 69 6f 6e 61 6c 2f 2f 45 4e 22 0a 20 20 20 20 20 20 22 68 74 74 70 3a 2f 2f 77 77 77 2e 77 33 2e 6f 72 67 2f
...

which stopped cf_query() at line 179 of https://github.com/ropensci/clifro/blob/master/R/cfQuery.R

on.exit(cf_logout(user, msg = FALSE))

since statusMessage is not "OK".
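
As a rough illustration, the header values from the output above can be checked by hand. This sketch assumes the check is a `grepl("OK", ...)` test on statusMessage, as in the version of cf_logout suggested in this issue; the variable names are made up for the example.

```r
# Header values copied from the output shown above.
header <- c(status = "200", statusMessage = "Connection established\r\n")

# A test that looks for "OK" in the status message fails behind this proxy...
ok_by_message <- grepl("OK", header["statusMessage"])

# ...even though the status code itself reports success.
ok_by_status <- identical(unname(header["status"]), "200")

print(c(ok_by_message = ok_by_message, ok_by_status = ok_by_status))
```

So the request gets a 200 status, but the proxy's "Connection established" message does not contain "OK".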

I feel the logic of cf_logout could be improved, for example:

cf_logout = function(object, msg = TRUE){
  cookies = file.path(tempdir(), object@username)
  curl = getCurlHandle(cookiejar = cookies,
                       cookiefile = cookies,
                       .opts = cf_parallel[["curl_opts"]])
  cert = system.file("CurlSSL/cacert.pem", package = "RCurl")

  header = getURLContent("https://cliflo.niwa.co.nz/pls/niwp/wa.logout",
                         curl = curl, header = TRUE, cainfo = cert)

  if (!grepl("OK", header$header["statusMessage"])) {
    file.remove(cookies)
    if (msg) {message("HTTP error")}
  } else {
    getURL("https://cliflo.niwa.co.nz/pls/niwp/wa.logout",
           curl = curl, cainfo = cert)
    file.remove(cookies)
    if (msg) {message("Logout successful")}
  }
}

since, even when the statusMessage is not "OK", we may sometimes still want to remove the cookies and let the function continue rather than stop.
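
The "clean up and continue" idea can be tried without any network access. The sketch below is only an assumption-laden stand-in: `cleanup_session` is a hypothetical function, not clifro's cf_logout, and the cookie file is an empty temporary file.

```r
# Hypothetical stand-in for the cleanup part of a logout (no network access).
# The cookie file is always removed, and a message is emitted instead of
# stopping with an error when the status message is not "OK".
cleanup_session <- function(cookies, status_message, msg = TRUE) {
  ok <- grepl("OK", status_message)
  if (file.exists(cookies)) {
    file.remove(cookies)
  }
  if (msg) {
    message(if (ok) "Logout successful" else "HTTP error")
  }
  invisible(ok)
}

# Create an empty file to play the role of the cookie jar.
cookies <- tempfile()
file.create(cookies)

# Use the status message our proxy returns; the call finishes without an error.
result <- cleanup_session(cookies, "Connection established\r\n", msg = FALSE)
```

Because the function reports the problem with `message()` rather than `stop()`, a caller that runs it from `on.exit()` can still return its data.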

After both cf_query and cf_logout have been modified, we are able to obtain the data:

> reefton.data = cf_query(public.user, all.dts, reefton.st,
+                         paste(as.Date(Sys.time()) - 182, "9"))
connecting to CliFlo...
reading data...
UserName is = public
Number of charged rows output = 0
Number of free rows output = 999
Total number of rows output = 999
Note: The end date was revised to meet the maximum number of rows allowed per query [1000]
or due to running out of rows in your subscription. Also, one or more datatypes may have been disabled due to the above.
Copyright NIWA 2017 Subject to NIWA's Terms and Conditions
See: http://cliflo.niwa.co.nz/pls/niwp/doc/terms.html
Comments to: cliflo@niwa.co.nz


> reefton.data
List containing clifro data frames:
              data      type              start                end rows
df 1) Surface Wind  9am only (2016-07-10  9:00) (2017-01-09  9:00)  184
df 2)         Rain     Daily (2016-07-10  9:00) (2017-01-08  9:00)  183
df 3)      Max_min    Hourly (2016-07-10  9:00) (2016-08-05 16:00)  632

Could you please look at this?

Regards,
Eric

@blasee blasee self-assigned this Jan 8, 2017
@blasee
Contributor

blasee commented Jan 8, 2017

Hi Eric,

Thanks for the detailed information on your issue and your suggestions. I've updated the development version (3.1-999) incorporating your code changes. Could you please let me know how you get on with this latest version?

Thanks,
Blake

@HaizhenWu
Author

Hi Blake,

I have just installed clifro_3.1-999, and it works well in our environment.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252   
[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C                        
[5] LC_TIME=English_New Zealand.1252    

attached base packages:
[1] graphics  grDevices datasets  grid      tcltk     utils     stats     methods   base     

other attached packages:
[1] clifro_3.1-999  mbie_0.9.7      scales_0.4.1    ggplot2_2.2.0   tcltk2_1.2-11  
[6] devtools_1.12.0 dplyr_0.5.0     Defaults_1.1-1  withr_1.0.2    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        magrittr_1.5       munsell_0.4.3      colorspace_1.3-1  
 [5] R6_2.2.0           stringr_1.1.0      plyr_1.8.4         tools_3.3.2       
 [9] gtable_0.2.0       DBI_0.5-1          selectr_0.3-1      lazyeval_0.2.0    
[13] assertthat_0.1     digest_0.6.10      tibble_1.2         reshape2_1.4.2    
[17] RColorBrewer_1.1-2 bitops_1.0-6       RCurl_1.95-4.8     memoise_1.0.0     
[21] stringi_1.1.2      XML_3.98-1.5       lubridate_1.6.0   
> ?clifro
> library(RCurl)
Loading required package: bitops
> library(XML)
> library(selectr)
> 
> creds <- AskCreds(Title = "User Log In Name and Password", startuid = "",
+                   returnValOnCancel = "ID_CANCEL")
> options(RCurlOptions = list(proxy = 'http://proxybcw.wd.govt.nz:8080',
+                             proxyusername = creds$uid, proxypassword = creds$pwd))
> 
> 
> 
> library(clifro)
> ## Not run: 
> # Create a public user ----------------------------------------------------
> 
> public.user = cf_user() # Defaults to "public"
> public.user
public user - only data from Reefton Ews (3925) available
> # Select datatypes --------------------------------------------------------
> 
> # 9am Surface wind (m/s)
> wind.dt = cf_datatype(2, 1, 4, 1)
> 
> # Daily Rain
> rain.dt = cf_datatype(3, 1, 1)
> 
> # Daily temperature extremes
> temp.dt = cf_datatype(4, 2, 2)
> 
> # Combine them together
> all.dts = wind.dt + rain.dt + temp.dt
> all.dts
                     dt.name              dt.type  dt.options dt.combo
dt1                     Wind         Surface wind   [9amWind]      m/s
dt2            Precipitation Rain (fixed periods)    [Daily ]         
dt3 Temperature and Humidity         Max_min_temp [HlyMaxMin]         
> # Select the Reefton Ews station ------------------------------------------
> 
> reefton.st = cf_station()
> reefton.st
          name network agent      start        end open distance       lat      lon
1) Reefton Ews  F21182  3925 1960-08-01 2017-01-10 TRUE        0 -42.11578 171.8601
> 
> # Submit the query --------------------------------------------------------
> 
> # Retrieve all data from ~ six months ago at 9am
> reefton.data = cf_query(public.user, all.dts, reefton.st,
+                         paste(as.Date(Sys.time()) - 182, "9"))
connecting to CliFlo...
reading data...
UserName is = public
Number of charged rows output = 0
Number of free rows output = 999
Total number of rows output = 999
Note: The end date was revised to meet the maximum number of rows allowed per query [1000]
or due to running out of rows in your subscription. Also, one or more datatypes may have been disabled due to the above.
Copyright NIWA 2017 Subject to NIWA's Terms and Conditions
See: http://cliflo.niwa.co.nz/pls/niwp/doc/terms.html
Comments to: cliflo@niwa.co.nz


> reefton.data
List containing clifro data frames:
              data      type              start                end rows
df 1) Surface Wind  9am only (2016-07-11  9:00) (2017-01-09  9:00)  183
df 2)         Rain     Daily (2016-07-11  9:00) (2017-01-09  9:00)  183
df 3)      Max_min    Hourly (2016-07-11  9:00) (2016-08-06 17:00)  633
> # Plot the data -----------------------------------------------------------
> 
> # Plot the 9am surface wind data (first dataframe in the list) ---
> reefton.data[1]
 9am only surface wind (m/s) 
      Station          Date.local Dir.DegT Speed.ms Dir StdDev Spd StdDev Period.Hrs Freq
1 Reefton Ews 2016-07-11 09:00:00      110      0.1          0        0.2          1    H
2 Reefton Ews 2016-07-12 09:00:00      310      0.4         23        0.4          1    H
3 Reefton Ews 2016-07-13 09:00:00      330      1.8         61        0.9          1    H
4 Reefton Ews 2016-07-14 09:00:00       92      1.2         87        0.5          1    H
[~~ omitted 179 rows ~~]
> # all identical - although passed to different methods
> plot(reefton.data)    #plot,cfDataList,missing-method
