Hi!
Thank you for your great paper.
While trying to implement the GMAC algorithm, I found a possible error in the standard deviation computation in MixtureGaussianHead.
The line in question is:
sigs = torch.sqrt(self.min_var * F.softplus(self.linear2(x)) + self.min_var)
The question is: why is the softplus output multiplied by the minimum possible variance?
If the minimum variance is chosen too low, or equal to zero, this effectively disables the output of the head: we cannot expect a linear layer to routinely produce outputs with a magnitude beyond roughly 10 or 100, yet with the current defaults it would have to output values of 10000 or more just to reach a variance of 1. And with min_var equal to zero the whole term vanishes, so the layer cannot produce any variance at all.
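For a rough sense of scale (assuming min_var is on the order of 1e-4, which is what the "10000" figure above implies; the actual default may differ):

```python
import torch
import torch.nn.functional as F

# Hypothetical value: min_var on the order of 1e-4 (inferred from the
# "magnitude of 10000" estimate above, not taken from the repo).
min_var = 1e-4

# Pre-activation the linear layer would need for a variance of about 1:
# min_var * softplus(z) + min_var = 1  =>  softplus(z) ~= 1 / min_var
z = torch.tensor(1.0 / min_var)          # softplus(z) ~= z for large z
var = min_var * F.softplus(z) + min_var
print(var)  # ~1.0, but only because z is ~10000, far outside a typical linear output

# And with min_var = 0 the expression collapses to 0 regardless of z:
print(0.0 * F.softplus(z) + 0.0)  # 0.0
```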
I think there are two possible corrections:
sigs = torch.sqrt(F.softplus(self.linear2(x)) + self.min_var)
or
sigs = torch.sqrt(self.max_var * torch.sigmoid(self.linear2(x)) + self.min_var)
The first variant does not limit the maximum sigma.
The second clips both the minimum and the maximum sigma.
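For concreteness, here is a minimal sketch of how the second (bounded) variant could look inside the forward pass; only linear2, min_var and max_var come from the snippets above, everything else is made up for illustration:

```python
import torch
import torch.nn as nn


class BoundedVarianceHead(nn.Module):
    """Illustrative sketch only, not the actual MixtureGaussianHead.

    Only linear2, min_var and max_var are taken from the snippets above;
    the layer sizes and constructor arguments are invented for the example.
    """

    def __init__(self, in_features, out_features, min_var=1e-4, max_var=1.0):
        super().__init__()
        self.linear2 = nn.Linear(in_features, out_features)
        self.min_var = min_var
        self.max_var = max_var

    def forward(self, x):
        # Variance is squashed into [min_var, min_var + max_var],
        # so sigma stays bounded away from both zero and infinity.
        var = self.max_var * torch.sigmoid(self.linear2(x)) + self.min_var
        return torch.sqrt(var)


# Quick check: sigmas land in the expected range regardless of input scale.
head = BoundedVarianceHead(8, 3)
print(head(torch.randn(4, 8) * 100.0))
```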
As a side note, softplus scales poorly for very large amplitudes, so I would recommend parameterizing the sigmas with an exponential instead.
Something like:
sigs = torch.sqrt(torch.exp(self.linear2(x)) + self.min_var)
Alternatively, we can shift the starting amplitudes towards zero, either by changing the biases of linear2 or by adding a shift in the expression:
sigs = torch.sqrt(torch.exp(self.linear2(x) - 2) + self.min_var)
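Just to illustrate what the shift does at initialization (the concrete shift value of 2 is only an example, not something prescribed):

```python
import torch

# At initialization the linear layer outputs roughly zero, so the -2 shift
# just moves the starting variance from exp(0) = 1 down to exp(-2) ~= 0.135.
z = torch.zeros(1)
print(torch.exp(z))      # tensor([1.])
print(torch.exp(z - 2))  # tensor([0.1353])
```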