Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filtering out deleted ports and loop over all ports to identify deleted ports #122

Closed
wants to merge 3 commits into from

Conversation

arjunsalyan
Copy link
Member

I ran the tests on EC2 instance and the process was really fast, finished within 10-15 seconds.

So, I believe we can run it with every update and get rid of the old confusing method to detect deleted ports altogether.

@arjunsalyan arjunsalyan requested a review from mojca September 26, 2019 17:52
@codecov
Copy link

codecov bot commented Sep 26, 2019

Codecov Report

Merging #122 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #122      +/-   ##
=========================================
- Coverage    73.9%   73.9%   -0.01%     
=========================================
  Files          45      45              
  Lines        1330    1341      +11     
=========================================
+ Hits          983     991       +8     
- Misses        347     350       +3
Impacted Files Coverage Δ
app/ports/models/port.py 88.66% <100%> (+0.18%) ⬆️
app/ports/management/commands/update-portinfo.py 96.29% <100%> (ø) ⬆️
app/ports/views.py 58.16% <100%> (+0.41%) ⬆️
app/ports/filters.py 100% <100%> (ø) ⬆️
app/ports/tests/test_update_portinfo.py 100% <100%> (ø) ⬆️
app/ports/templatetags/utilities.py 36.36% <0%> (-6.5%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f09d9b...dba5688. Read the comment docs.

@arjunsalyan
Copy link
Member Author

This has been deployed on the test site:
http://ec2-52-34-234-111.us-west-2.compute.amazonaws.com/

@arjunsalyan arjunsalyan changed the title Loop over all ports to identify deleted ports Allow filtering out deleted ports and loop over all ports to identify deleted ports Sep 29, 2019
@mojca
Copy link
Member

mojca commented Sep 29, 2019

This PR contains three somewhat unrelated commits. I'll try to shortly comment on each one of them.

  • Looping through all ports to identify old deleted ports

    I need to take a closer look, but in the future I would be in favour of perhaps switching to updating port info "after each commit" (but maybe not more frequently than every N minutes in case that a large number of updates flows in). Generally this is only needed as a one-time fix for something that was broken for a while, and ideally we'll never need it again. If we now execute this code every few commits, even if it only runs for 15 seconds, such code snippets might keep adding up to very long execution times. It does make sense to execute some cleanups on regular basis, but doing it each time still sounds a bit of an overkill in the long run.
    We will also need to address licences one day (Format of the License field #44) which would also require modifying all ports, despite most of them not being actually updated (I would vote for modifying the string in the json file).
    So it would probably make sense to come up with some more generic way of batch-updating ports when such a need arises. Any thoughts about that?

  • Show only active ports by default in port listings

    • Yes, definitely, this needs to go in. Ideally we would also add some unit tests though, to make sure that all those queries are correct. They are by now a bit easy too easy to break if someone else starts contributing code.
    • We still have one bug left: if you check the page of apache-ant, it lists jump as having a build dependency on apache-ant. But jump was deleted, so we shouldn't be listing that dependency. I see three options: (1) delete deleted ports, (2) delete all dependencies whenever a port gets deleted, (3) make complex queries to ensure that only active dependencies are listed. I think I would go for (2), and probably also not display the dependency fields for any deleted ports. I guess that still leaves the question about whether we should keep maintainers. Probably we should remove maintainers as well?
  • Add checkbox to toggle displaying of deleted ports

    • Actually, it's the other commit that addresses Don't display deleted ports unless the user asks you to #110 :) The problem was that deleted ports are shown in the first place. (Don't consider this to be a serious feedback.)
    • You mentioned that you deployed the solution, but you should at least mention some deleted ports on which this could be tested. I searched the logs and now found python37-devel and jump as two recent examples.
    • I'm not ever sure that we really really need this checkbox, but if we do have it, I would probably call it "Include delete ports" rather than "Show deleted ports". The latter might suggest that only deleted ports will be shown.
    • I wanted to say that when the ports are shown as search results, we would probably need some kind of visual marker for deleted, but I see that you already added that, thanks a lot. Do we want something similar for obsolete ports as well?
    • Generally I would say that similar functionality could be used for other things, but I'm pretty sure we don't want do include deleted ports when searching through various categories, maintainers etc. If/when we add some advanced search functionality, I would probably even delete the huge "Include deleted ports" checkbox from the main page and only leave the functionality on the "very special search" page.

@umeshksingla
Copy link
Member

You mentioned that you deployed the solution, but you should at least mention some deleted ports on which this could be tested. I searched the logs and now found python37-devel and jump as two recent examples.

I felt that too when I tried to check and but eventually felt asleep looking for deleted ports in macports-ports repository but I tried thinking about it in a different way. If he would have given an example of a deleted port, we might have just checked that and merged and would not be able to catch any potential bugs (any slightly complicated example like jump you just mentioned) which we may find with a little exploration.

I'm not ever sure that we really really need this checkbox, but if we do have it, I would probably call it "Include delete ports" rather than "Show deleted ports". The latter might suggest that only deleted ports will be shown.

Yes, "Include" is certainly better than "show". I wanted to go for something like:
Status: [ ] Active [ ] Deleted [ ] Obsolete [ ] xyz

But it seemed too much for now. And I am sure people who are using it will find it very inconvenient to go through the huge number of results the search returns.

I would probably even delete the huge "Include deleted ports" checkbox from the main page and only leave the functionality on the "very special search" page.

Completely agree. Or search bar can support queries with tags like include:deleted python-devel3, much like Gmail or Google Search.

@mojca
Copy link
Member

mojca commented Sep 29, 2019

I wanted to go for something like:
Status: [ ] Active [ ] Deleted [ ] Obsolete [ ] xyz

But it seemed too much for now.

No way. Already the checkbox for deleted ports might be too much.

Also note that our list of deleted ports is very incomplete (almost semi-random). If I search for deleted ports, I might eventually expect to find them all.

Or search bar can support queries with tags like include:deleted python-devel3, much like Gmail or Google Search.

Maybe, but that's still quite far down the road of all other features that we might want to have.

@arjunsalyan
Copy link
Member Author

arjunsalyan commented Sep 29, 2019

Generally this is only needed as a one-time fix for something that was broken for a while, and ideally we'll never need it again. If we now execute this code every few commits, even if it only runs for 15 seconds, such code snippets might keep adding up to very long execution times.

So you are suggesting that we go for the old method?

  • We still have one bug left: if you check the page of apache-ant, it lists jump as having a build dependency on apache-ant. But jump was deleted, so we shouldn't be listing that dependency.

I did not want to change any information we picked from the Portfile. And I feel that we should not, but maybe I am wrong.

  • You mentioned that you deployed the solution, but you should at least mention some deleted ports on which this could be tested.

Oh, sorry, my bad.

Do we want something similar for obsolete ports as well?

Maybe yellow for obsolete ports?

If/when we add some advanced search functionality, I would probably even delete the huge "Include deleted ports" checkbox from the main page and only leave the functionality on the "very special search" page.

Port discovery is one of the most important part of the app, why not convert the homepage to a full fledged search page? The box that contains the search box could become more advanced? I am just saying.

@mojca
Copy link
Member

mojca commented Sep 29, 2019

Generally this is only needed as a one-time fix for something that was broken for a while, and ideally we'll never need it again. If we now execute this code every few commits, even if it only runs for 15 seconds, such code snippets might keep adding up to very long execution times.

So you are suggesting that we go for the old method?

What was the old method?

I'm trying to brainstorm here, pointing out some concerns (reasons why doing this too often might not be the best idea).

What would you or @umeshksingla consider the best solution here?

  • We still have one bug left: if you check the page of apache-ant, it lists jump as having a build dependency on apache-ant. But jump was deleted, so we shouldn't be listing that dependency.

I did not want to change any information we picked from the Portfile. And I feel that we should not, but maybe I am wrong.

It might be ok to show apache-ant as a dependency of jump (on jump's page), but not to show jump as a dependent of apache-ant. When developers make changes to a port like apache-ant they might want to check if all dependent ports still work, or check if the port can safely be renamed or deleted without affecting others. Including deleted ports on that list is just contributing to noise.

Keeping deleted ports around is useful to some extent (if users want to create a new port it could help to know that such a port existed in the past; external links to that port would not break etc.). But the exact information about that port is of very limited use. If the port was first obsoleted, most of the information was gone at that point anyway. There is definitely no maintainer (even if one was present before the deletion, the port is de-facto no longer maintainer), dependencies are bogus, website might be broken, ...

You could certainly keep the information about dependencies, but I'm afraid this will needlessly complicate proper handling at the other end, where you really don't want to show deleted dependents / dependencies.

Do we want something similar for obsolete ports as well?

Maybe yellow for obsolete ports?

Sounds fine. (Or basically any other way, I don't really care.)

If/when we add some advanced search functionality, I would probably even delete the huge "Include deleted ports" checkbox from the main page and only leave the functionality on the "very special search" page.

Port discovery is one of the most important part of the app, why not convert the homepage to a full fledged search page? The box that contains the search box could become more advanced? I am just saying.

What if we start by creating a dedicated /search page and experimenting on that one?

What we could do on the first/main page is then add a More / Advanced button which shows additional search options (it could be either javascript or redirect to a different search page; probably the first). Then "Include deleted ports" could be one of the advanced search options.

@arjunsalyan
Copy link
Member Author

What was the old method?

We were taking all the ports under the updated portdirs and checking if they were available in the latest JSON.

I'm trying to brainstorm here, pointing out some concerns (reasons why doing this too often might not be the best idea).

Initially, even I did not like the idea of running this every time. But seeing the speed and simplicity of the code, I got inclined towards this. However, as you mentioned, running this every time might not be a good idea.

What would you or @umeshksingla consider the best solution here?

In any case, we will have to run it at least once. Maybe, we can merge this and later switch back to the old method?

It might be ok to show apache-ant as a dependency of jump (on jump's page), but not to show jump as a dependent of apache-ant.

Okay.

What we could do on the first/main page is then add a More / Advanced button which shows additional search options (it could be either javascript or redirect to a different search page; probably the first). Then "Include deleted ports" could be one of the advanced search options.

Seems good!

@arjunsalyan
Copy link
Member Author

One thing I am realising is, I need to break it into several different Pull Requests, otherwise one thing is just getting delayed due to other.

@mojca
Copy link
Member

mojca commented Oct 10, 2019

Yes, breaking into smaller chunks probably makes sense.

Also one random thought. I provided some feedback about the imperfection of importing the new builders, but on the other hand ... it will do less harm to have suboptimal queries (but complete build history in the database) than not show any builds at all. After all it's just about displaying data, so we could easily go with your initial approach, just import data, and work on improving the sql queries later.

@arjunsalyan
Copy link
Member Author

Also one random thought. I provided some feedback about the imperfection of importing the new builders, but on the other hand ... it will do less harm to have suboptimal queries (but complete build history in the database) than not show any builds at all. After all it's just about displaying data, so we could easily go with your initial approach, just import data, and work on improving the sql queries later.

Yes, that was what I also though initially, but now I have both things in place: #119 . However, if we quickly just want to add those new builders, I can create a new PR.

@arjunsalyan
Copy link
Member Author

The newer update and search mechanism handles everything out-of-the-box. This PR is obsolete.

@arjunsalyan arjunsalyan closed this Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants