-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Bug fix: Removes length limit (of TLD) when validating a hostname #5992
Bug fix: Removes length limit (of TLD) when validating a hostname #5992
Conversation
The regular expression matching a tld in a hostname was limited to 2 to 10 chars. This limit was reached with new gTLDs like .photography. Added this use case to tests and removed length limit of 10 chars to allow new TLDs like .photography.
@weierophinney Could you please backport this change to ZF 1? |
Set a maximum limit of 63 chars according to this: http://archive.icann.org/en/topics/new-gtlds/idn-3-char-requirement-15feb10-en.pdf (maybe I'm reading that wrong, though). |
@@ -522,7 +522,7 @@ public function isValid($value) | |||
do { | |||
// First check TLD | |||
$matches = array(); | |||
if (preg_match('/([^.]{2,10})$/iu', end($domainParts), $matches) | |||
if (preg_match('/([^.]{2,63})$/iu', end($domainParts), $matches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably good enough for now. This is not the most correct as this would only allow for a 63-byte U-label (composed multibyte characters) when the specification is that the A-label (punycode) must be no more than 63 characters. The current heavyweight is the Tamil-language domain for .SG that comes in at 22 characters on the A-label: XN--CLCHC0EA0B2G2A9GCD and has a 33-byte U-label: சிங்கப்பூர்.
I'm contemplating doing a wholesale changeover of the arrays and checking from the composed UTF-8 names to the punycode A labels. This would make it easier to make sure the RTL domains aren't being munged on accident, and allow for a more correct byte checking.
@weierophinney The intl extension is bundled with >=5.3.0. Can I rely on the availability of the intl extension?Otherwise I'll have to go find code to do the U->A label conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moderndeveloperllc think it can be made optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ocramius I could check for the availability of idn_to_ascii(). If it wasn't there, it would mean including code similar what Net_IDNA2 has to decode the UTF-8 U-label to the punycode A-label. It would be functional opposite of decodePunycode(). Considering I'm not that level of programmer, it would probably be wholesale inclusion of the Net_IDNA2 logic.
Is it standard ZF practice to account for missing default extensions? There's been a discussion about the necessity of the pure-php Zend\Json encode/decode logic. I would think this is a similar discussion. Perhaps for 2.4 or 3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be aware of the IDNA hell. The Web Hypertext Application Technology Working Group recommends following: http://url.spec.whatwg.org/#idna
Or copy it from the Perl implementation: http://search.cpan.org/~cfaerber/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moderndeveloperllc We cannot rely on availability of ext/intl - in particular, consider new alternative runtimes such as HHVM, which still don't have that extension available. So, it would need to be made optional.
Additionally, consider ZF1, which still has a minimum supported PHP version of 5.2; that means ext/intl availability cannot be assumed at all for that platform. If you want me to backport it, we need to make that an optional dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weierophinney at least with hhvm, the INTL dilemma is solved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weierophinney I had not thought about this being backported to ZF1. I would make the extension optional. So what is the policy on what is basically wholesale inclusion of PEAR library logic? There may be more than one way to do the logic, but I'm quite sure I don't know how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to make sure you follow the license details for Net_IDNA. If the license is not compatible (e.g., LGPL, GPL), we cannot include it. Otherwise, we need to ensure that the class and file docblocks indicate the copyright for the original library from which the code was copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's LGPL. I'm fine with the change @garte suggested while I get everything refactored, added in, etc. There are no current cases his logic would incorrectly let through. It will take some time to get all I want done, and I'll put in my own PR on develop
when it's ready for review.
@weierophinney You aren't waiting on me are you? I'll put in a new PR when I get time to do more work on this. |
What's the status on this? Is new code to be done because of LGPL issues? |
@Ocramius This particular pull request from @garte is rather limited in scope and just removes a specific, outdated limit on TLD length. It's fine as a stopgap measure. The class itself has other issues:
I can handle the last three points, but creating PHP code to implement RFC 3492 is, I'm sorry to admit, beyond my current skill/availability - and what I can do is predicated upon that first point. |
@garte @moderndeveloperllc merged/rebased manually. Thanks! |
…th-limitations' into develop Close zendframework/zendframework#5992 Forward port zendframework/zendframework#5992
The regular expression matching a tld in a hostname was limited to 2 to 10 chars.
This limit was reached with new gTLDs like .photography. Added this use case to
tests and removed length limit of 10 chars to allow new TLDs like .photography.