Skip to content
This repository has been archived by the owner on Jan 8, 2020. It is now read-only.

Bug fix: Removes length limit (of TLD) when validating a hostname #5992

Closed

Conversation

garte
Copy link
Contributor

@garte garte commented Mar 18, 2014

The regular expression matching a tld in a hostname was limited to 2 to 10 chars.
This limit was reached with new gTLDs like .photography. Added this use case to
tests and removed length limit of 10 chars to allow new TLDs like .photography.

Ocramius and others added 2 commits March 18, 2014 00:54
The regular expression matching a tld in a hostname was limited to 2 to 10 chars.
This limit was reached with new gTLDs like .photography. Added this use case to
tests and removed length limit of 10 chars to allow new TLDs like .photography.
@garte
Copy link
Contributor Author

garte commented Mar 18, 2014

@weierophinney Could you please backport this change to ZF 1?

@garte
Copy link
Contributor Author

garte commented Mar 18, 2014

Set a maximum limit of 63 chars according to this: http://archive.icann.org/en/topics/new-gtlds/idn-3-char-requirement-15feb10-en.pdf (maybe I'm reading that wrong, though).

@@ -522,7 +522,7 @@ public function isValid($value)
do {
// First check TLD
$matches = array();
if (preg_match('/([^.]{2,10})$/iu', end($domainParts), $matches)
if (preg_match('/([^.]{2,63})$/iu', end($domainParts), $matches)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably good enough for now. This is not the most correct as this would only allow for a 63-byte U-label (composed multibyte characters) when the specification is that the A-label (punycode) must be no more than 63 characters. The current heavyweight is the Tamil-language domain for .SG that comes in at 22 characters on the A-label: XN--CLCHC0EA0B2G2A9GCD and has a 33-byte U-label: சிங்கப்பூர்.

I'm contemplating doing a wholesale changeover of the arrays and checking from the composed UTF-8 names to the punycode A labels. This would make it easier to make sure the RTL domains aren't being munged on accident, and allow for a more correct byte checking.

@weierophinney The intl extension is bundled with >=5.3.0. Can I rely on the availability of the intl extension?Otherwise I'll have to go find code to do the U->A label conversion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moderndeveloperllc think it can be made optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ocramius I could check for the availability of idn_to_ascii(). If it wasn't there, it would mean including code similar what Net_IDNA2 has to decode the UTF-8 U-label to the punycode A-label. It would be functional opposite of decodePunycode(). Considering I'm not that level of programmer, it would probably be wholesale inclusion of the Net_IDNA2 logic.

Is it standard ZF practice to account for missing default extensions? There's been a discussion about the necessity of the pure-php Zend\Json encode/decode logic. I would think this is a similar discussion. Perhaps for 2.4 or 3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware of the IDNA hell. The Web Hypertext Application Technology Working Group recommends following: http://url.spec.whatwg.org/#idna

Or copy it from the Perl implementation: http://search.cpan.org/~cfaerber/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moderndeveloperllc We cannot rely on availability of ext/intl - in particular, consider new alternative runtimes such as HHVM, which still don't have that extension available. So, it would need to be made optional.

Additionally, consider ZF1, which still has a minimum supported PHP version of 5.2; that means ext/intl availability cannot be assumed at all for that platform. If you want me to backport it, we need to make that an optional dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weierophinney at least with hhvm, the INTL dilemma is solved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weierophinney I had not thought about this being backported to ZF1. I would make the extension optional. So what is the policy on what is basically wholesale inclusion of PEAR library logic? There may be more than one way to do the logic, but I'm quite sure I don't know how.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to make sure you follow the license details for Net_IDNA. If the license is not compatible (e.g., LGPL, GPL), we cannot include it. Otherwise, we need to ensure that the class and file docblocks indicate the copyright for the original library from which the code was copied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's LGPL. I'm fine with the change @garte suggested while I get everything refactored, added in, etc. There are no current cases his logic would incorrectly let through. It will take some time to get all I want done, and I'll put in my own PR on develop when it's ready for review.

@moderndeveloperllc
Copy link
Contributor

@weierophinney You aren't waiting on me are you? I'll put in a new PR when I get time to do more work on this.

@Ocramius
Copy link
Member

What's the status on this? Is new code to be done because of LGPL issues?

@moderndeveloperllc
Copy link
Contributor

@Ocramius This particular pull request from @garte is rather limited in scope and just removes a specific, outdated limit on TLD length. It's fine as a stopgap measure.

The class itself has other issues:

  • The is no PHP-native code to do encoding from U-label to A-label for installs without intl extension. Net_IDNA2 has LGPL code that does this.
  • It uses U-labels instead of A-labels in the checking algorithms.
  • The list of TLDs is not only outdated, but will be more difficult to keep up-to-date thanks to the profusion of gTLDs.
  • The list of IDN-capable TLDs is very outdated.

I can handle the last three points, but creating PHP code to implement RFC 3492 is, I'm sorry to admit, beyond my current skill/availability - and what I can do is predicated upon that first point.

@Ocramius
Copy link
Member

@garte @moderndeveloperllc merged/rebased manually. Thanks!

@Ocramius Ocramius added the bug label Nov 13, 2014
gianarb pushed a commit to zendframework/zend-validator that referenced this pull request May 15, 2015
gianarb pushed a commit to zendframework/zend-validator that referenced this pull request May 15, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants