querySelector and attribute's value #37

Closed
dmitriy-krista opened this issue Aug 7, 2019 · 5 comments
dmitriy-krista commented Aug 7, 2019

When specifying a value for an attribute in the querySelector() method, as in tag[attr='value'], the method doesn't work as expected and returns null.

Example to reproduce the issue:

<?php

require "vendor/autoload.php";

$html = file_get_contents("https://github.com");
$document = new \Gt\Dom\HTMLDocument($html);

//will print "Enterprise"
echo $document->querySelector("nav > ul > li > a[data-ga-click]")->innerText . "\n";

//will throw PHP Notice: Trying to get property 'innerText' of non-object
echo $document->querySelector("nav > ul > li > a[data-ga-click='(Logged out) Header, go to Enterprise']")->innerText . "\n";

The expected behaviour is that the last line should also print "Enterprise".

If I run the same querySelector calls in my browser, they work correctly and both print "Enterprise":

document.querySelector("nav > ul > li > a[data-ga-click]").innerText

document.querySelector("nav > ul > li > a[data-ga-click='(Logged out) Header, go to Enterprise']").innerText

g105b self-assigned this Aug 8, 2019
g105b (Member) commented Aug 8, 2019

Thanks for reporting this. I'll look into it shortly.

g105b (Member) commented Aug 9, 2019

Hi @dmitriy-krista, it looks like the problem is the comma character: it is being treated as a separator between selectors without taking the surrounding quotes into account. I've tested a simple attribute selector, which works fine, so the way the attribute selector is parsed in PHP.Gt/CssXPath looks to be to blame.

I'll transfer this issue there and find a fix soon.
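A quick way to check that diagnosis is to run an equivalent XPath query directly, bypassing the CSS-to-XPath conversion. The sketch below uses PHP's built-in DOMDocument and DOMXPath; the markup is an assumed, cut-down stand-in for the github.com page, and the XPath expression is a hand-written translation of the selector from the report, not output produced by CssXPath. The comma sits inside an XPath string literal, so the query finds the link.

```php
<?php
// Assumed markup standing in for the relevant part of the github.com page.
$markup = '<nav><ul><li>'
    . '<a data-ga-click="(Logged out) Header, go to Enterprise">Enterprise</a>'
    . '</li></ul></nav>';

$document = new DOMDocument();
$document->loadXML($markup);
$xpath = new DOMXPath($document);

// Hand-written XPath for the CSS selector:
//   nav > ul > li > a[data-ga-click='(Logged out) Header, go to Enterprise']
// The comma is part of a quoted string literal, so it is just text here.
$query = "//nav/ul/li/a[@data-ga-click='(Logged out) Header, go to Enterprise']";
$nodes = $xpath->query($query);

echo $nodes->item(0)->textContent, "\n"; // Enterprise
```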

g105b transferred this issue from PhpGt/Dom Aug 9, 2019
g105b added a commit that referenced this issue Aug 9, 2019
g105b (Member) commented Aug 9, 2019

I have isolated the bug to the naive use of explode on commas to separate queries:

$cssParts = explode(",", $this->cssSelector);
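With explode(), the selector from the report is cut into two fragments at the comma inside the quoted value, and neither fragment is a valid selector. A minimal sketch of a quote-aware alternative, using a hypothetical helper named splitSelectorList() (not part of CssXPath, and not necessarily what the eventual fix does), scans the string once and only splits on commas that are outside quotes, brackets and parentheses:

```php
<?php
// Hypothetical helper for illustration: split a CSS selector list on its
// top-level commas, ignoring commas inside quotes, [ ] or ( ).
function splitSelectorList(string $selector): array {
    $parts = [];
    $current = "";
    $quote = null; // the quote character we are currently inside, if any
    $depth = 0;    // nesting depth of [ ] and ( )
    $length = strlen($selector);

    for ($i = 0; $i < $length; $i++) {
        $char = $selector[$i];

        if ($quote !== null) {
            // Inside a quoted value: copy everything until the matching quote.
            $current .= $char;
            if ($char === "\\" && $i + 1 < $length) {
                $current .= $selector[++$i]; // keep the escaped character as-is
            } elseif ($char === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === "[" || $char === "(") {
            $depth++;
        } elseif ($char === "]" || $char === ")") {
            $depth = max(0, $depth - 1);
        } elseif ($char === "," && $depth === 0) {
            // A top-level comma ends the current selector.
            $parts[] = trim($current);
            $current = "";
            continue;
        }
        $current .= $char;
    }

    $parts[] = trim($current);
    return $parts;
}

$reported = "nav > ul > li > a[data-ga-click='(Logged out) Header, go to Enterprise']";

print_r(explode(",", $reported));            // two broken fragments
print_r(splitSelectorList($reported));       // one intact selector
print_r(splitSelectorList("h1, h2 > span")); // ["h1", "h2 > span"]
```

Tracking bracket and parenthesis depth as well as quotes also keeps functional pseudo-classes such as :is(h1, h2) in one piece, which a quotes-only rule would still split.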

g105b (Member) commented Aug 9, 2019

Here is a test case I'm working on before implementing: https://regex101.com/r/xB7rQ7/224

Notice the last example doesn't work. If anyone could help me understand backreferences in this example, we can squish the bug and close this issue.
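On the backreference question: a common PCRE idiom is to capture the opening quote in a group and use \1 so that only the same quote character can close the string, then discard that whole match with (*SKIP)(*FAIL) so that only commas outside quotes are left as split points. The sketch below applies that idiom with preg_split(); it is not the regex from the regex101 link above, the function and constant names are hypothetical, and unlike a character scanner it does not protect commas inside parentheses such as :is(a, b).

```php
<?php
// Split a CSS selector list on commas that are NOT inside '...' or "..." quotes.
// (["'])            capture the opening quote character as group 1
// (?:\\.|(?!\1).)*+ consume escaped characters, or any character other than
//                   the group 1 quote (possessive, so no backtracking)
// \1                the closing quote must match the opening one (backreference)
// (*SKIP)(*FAIL)    discard the quoted string and resume matching after it
// |\s*,\s*          otherwise, a comma (plus surrounding spaces) is a separator
const SELECTOR_SPLIT_PATTERN = '/(["\'])(?:\\\\.|(?!\1).)*+\1(*SKIP)(*FAIL)|\s*,\s*/';

function splitOnUnquotedCommas(string $selector): array {
    return preg_split(SELECTOR_SPLIT_PATTERN, trim($selector));
}

print_r(splitOnUnquotedCommas(
    "nav > ul > li > a[data-ga-click='(Logged out) Header, go to Enterprise']"
));
print_r(splitOnUnquotedCommas('a[title="x, y"], b[title=\'p, q\']'));
```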

g105b closed this as completed in 9918437 Aug 9, 2019
g105b (Member) commented Aug 9, 2019

Works a treat now. I'll include this in the next release of CssXPath, which will then be adopted by the next version of Dom.

Have fun!
