Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is beam profile overlap help behaving correctly? #415

Closed
bart1 opened this issue Jan 7, 2021 · 5 comments
Closed

Is beam profile overlap help behaving correctly? #415

bart1 opened this issue Jan 7, 2021 · 5 comments
Assignees
Labels
Milestone

Comments

@bart1
Copy link
Collaborator

bart1 commented Jan 7, 2021

While working with @BerendWijers we noticed a error in the intergrate_to_ppi function. It fails when there is only one non NA values in the eta quantity of the vp:

require(bioRad)
#> Loading required package: bioRad
#> Welcome to bioRad version 0.5.2.9410
#> Docker daemon running, Docker functionality enabled (vol2bird version 0.5.0)
pvolfile <- system.file("extdata", "volume.h5", package = "bioRad")
example_pvol <- read_pvolfile(pvolfile)
data(example_vp)
my_ppi <- integrate_to_ppi(example_pvol, example_vp, nx = 50, ny = 50)
#> Warning in proj4string(output): CRS object has comment, which is lost in output
example_vp$data$eta
#>  [1]           NA 9.360947e+02 1.214729e+03 1.009605e+03 3.802453e+02
#>  [6] 2.260477e+02 2.525710e+02 2.177226e+02 2.175499e+02 1.913836e+02
#> [11] 1.319144e+02 7.948948e+01 3.380873e+01 1.201020e+01 8.496607e+00
#> [16] 7.096342e+00 5.475014e-01 5.672123e-02 4.670447e-02 4.765693e-02
#> [21] 4.313115e-02 2.215075e-02 3.500013e-02 1.164730e-02 2.986746e-02
example_vp$data$eta[3:25]<-NA
my_ppi <- integrate_to_ppi(example_pvol, example_vp, nx = 50, ny = 50)
#> Error in approxfun(vp$data$height + vp$attributes$where$interval/2, vp$data[[quantity]]): need at least two non-NA values to interpolate

This error comes from the following bit of beam_profile_overlap_help:

bioRad/R/beam.R

Lines 210 to 217 in 001f6a3

if(all(is.na(vp$data[[quantity]]))){
beamprof$vpr <- NA
} else{
beamprof$vpr <- approxfun(
vp$data$height + vp$attributes$where$interval / 2,
vp$data[[quantity]]
)(height)
}

It is here accounted for that all quantity values can be NA in which case NA is returned however approxfun also fails when there is only one none NA value. A fix could be to also return NA when there is only one none NA values. Thinking about this a bit more I do however wonder if the current behavior of this function is correct as approxfun omits all NA values while it might be the case that these should actually be assumed to be zero. I think this is most easily illustrated by the following example:

v <- c(-.5, .5, 1.5, 2.5)
approxfun(0:4, c(0, 1, NA, NA, 2))(v)
#> [1]       NA 0.500000 1.166667 1.500000
approxfun(0:4, c(0, 1, NA, NA, 2), na.rm = F)(v)
#> [1]  NA 0.5  NA  NA
approxfun(0:4, c(0, 1, 0, 0, 2))(v)
#> [1]  NA 0.5 0.5 0.0

I think if we really think NA means zero the last interpolation (where all NAs are replaced by 0) might be the most valid.

@adokter what do you think? I cant completely see through where and how this function is used so I did not directly want to change it but thought I would bring it up

Created on 2021-01-07 by the reprex package (v0.3.0)

@adokter
Copy link
Owner

adokter commented Jan 7, 2021

Hi @bart1 and @BerendWijers , thanks for catching this bug.

Strictly I would say NA values are not the same as 0, because NA maps to nodata in the ODIM convention, the value to denote areas void of data (never radiated), therefore I prefer maintaining the NA value for as long as possible.

In practice for this specific case it doesn't make much numerical difference, because omitting NA values in these final statements within a sum is identical to setting those elements to zero:

bioRad/R/beam.R

Lines 220 to 222 in 001f6a3

beamprof$vpr <- beamprof$vpr / sum(step * beamprof$vpr, na.rm = T)
# calculate the Bhattacharyya coefficient of the density profile and radiation coverage pattern
sum(step * sqrt(beamprof$radiation * beamprof$vpr), na.rm = T)

So it boils down to whether we like how approxfun() handles the interpolation, which defaults to NA whenever entering a segment that approaches an NA value

typically your v vector is on a much finer grid than the data itself, so the difference between the current interpolation (green) and your suggestion of replacing with zeros (blue) looks like this (red the data points with value at 2 equal to NA):

v <- seq(-.5,5,length.out=100)

ipol1=approxfun(0:4, c(25, 50, NA, 50, 25), na.rm = F)(v)
ipol2=approxfun(0:4, c(25, 50, 0, 50, 25))(v)

plot(c(0,1,3,4),c(25, 50, 50, 25), col='red',pch=16,ylim=c(0,50))
points(v,ipol2,pch=2,col='blue')
points(v,ipol1,col='green')

Rplot

Mathematically approxfun does the right thing, as it can't know how to linearly interpolate to the NA value. How to interpolate around the NA really depends on whether the NA value represents an (possibly irradiated) area of airspace that did or did not contain birds, so it's all about assumptions. I think my vote remains with the green interpolation, as in that case at least the mean and sd of the raw data points and the interpolated datapoints remains the same. Any further thoughts?

@bart1
Copy link
Collaborator Author

bart1 commented Jan 7, 2021

I definitely agree NA and 0 are different. I'm just not completely sure when eta becomes NA and when 0. I can definitely agree to the green interpolation and it is not so different to from the blue one as you say. I would object to the current (here red) interpolation. As na.rm now is not set is defaults to TRUE meaning all intermediate values become like the red line. Is there is a larger height region NA this could lead to strange results.

v <- seq(-.5,5,length.out=100)
ipol1=approxfun(0:4, c(25, 50, NA, 50, 25), na.rm = F)(v)
ipol2=approxfun(0:4, c(25, 50, 0, 50, 25))(v)
ipol3=approxfun(0:4, c(25, 50, NA, 50, 25))(v)

plot(c(0,1,3,4),c(25, 50, 50, 25), col='red',pch=16,ylim=c(0,50))
points(v,ipol2,pch=2,col='blue')
points(v,ipol1,col='green')
points(v,ipol3,pch=3, col='red')

Created on 2021-01-07 by the reprex package (v0.3.0)

To summarize I make two changes first approxfun will use na.rm=F from now on and second the first check for all NA values will be changed to having at least 2 non NA values. If you agree I will implement these and we can close the issue.

@adokter
Copy link
Owner

adokter commented Jan 7, 2021

Hi @bart1, you're right the na.rm=F is missing, so that should definitely be corrected as you suggested.

I'm just not completely sure when eta becomes NA and when 0

eta becomes zero if the airspace is empty (linear reflectivity zero, log reflectivity -Inf dBZ) or when it's set to zero by sd_vvp_threshold() criterium. It's NA when no pixels of any sweep that have their center within the layer, i.e. when there is no data.

Looking at this example of alternating values and NAs:

> approxfun(0:3, c(25, NA, 50, NA), na.rm = F)(v)
  [1] NA NA NA NA NA NA NA NA NA 25 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 50 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [75] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> approxfun(0:3, c(25, 0, 50, 0), na.rm = F)(v)
  [1]        NA        NA        NA        NA        NA        NA        NA        NA        NA 25.000000 23.611111 22.222222 20.833333 19.444444 18.055556 16.666667 15.277778 13.888889 12.500000 11.111111  9.722222  8.333333
 [23]  6.944444  5.555556  4.166667  2.777778  1.388889  0.000000  2.777778  5.555556  8.333333 11.111111 13.888889 16.666667 19.444444 22.222222 25.000000 27.777778 30.555556 33.333333 36.111111 38.888889 41.666667 44.444444
 [45] 47.222222 50.000000 47.222222 44.444444 41.666667 38.888889 36.111111 33.333333 30.555556 27.777778 25.000000 22.222222 19.444444 16.666667 13.888889 11.111111  8.333333  5.555556  2.777778  0.000000        NA        NA
 [67]        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
 [89]        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA

Here replacing NA with zeros might be a better solution after all, to avoid the interpolation become overwhelmingly NA, even when half of the interpolated vector has valid values. Let me know what you think, and please go ahead with making the change. We should also properly document this in the help pages.

@adokter adokter added the bug label Jan 7, 2021
@adokter adokter added this to the 0.6.0 milestone Jan 7, 2021
adokter added a commit that referenced this issue Jan 8, 2021
@adokter
Copy link
Owner

adokter commented Jan 8, 2021

@bart1 I was preparing a pull request to bring the master branch up to date with latest developments, so fixed this one quickly as well, let me know if it doesn't behave as expected

@bart1
Copy link
Collaborator Author

bart1 commented Jan 11, 2021

Thanks I agree with the solution, I tested and can confirm it works like expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants