-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outbox: Add upgrade script to populate outbox items #1175
Conversation
The risk here is, that all added items will be queued for federation again and because these are quite a lot, it could cause quite some traffic. I think we have to find a way to add historical data without publishing them. |
I was hoping the |
Oh, haven't seen that. True, but this might be only for the moment they were added... the state of the post will still be "pending for federation", so it will be handled at one time... |
da26595 adds a localized version of |
04a4fe4
to
7de214c
Compare
Does it make sense to have some of the functions in the collection, instead of the migration? Should we have dedicated classes for bigger migrations, like for example |
Sure, what are you thinking?
At some point I'd like to suggest renaming Migration to Upgrade to align with Core and free up the namespace for any future content migration work. Not sure we need separate classes for upgrade routines (yet). As a matter of fact, I created |
fea0686
to
45b3b2d
Compare
Ran the upgrade, it worked nicely. Was just toying around with decoding the JSON content of one of the outbox items, like
Uh-oh. So I try another couple of posts and they work fine. So it's something in this post. Here's the JSON and the original post content: JSON {"@context":["https://www.w3.org/ns/activitystreams",{"Hashtag":"as:Hashtag","sensitive":"as:sensitive"}],"id":"http://wp.test/?p=140","type":"Note","attachment":[],"attributedTo":"http://wp.test/?author=1","audience":"http://wp.test/?author=0","content":"u003Ch2u003EComparing Metau0026#8217;s CodeLlama to OpenAIu0026#8217;s Copilotu003C/h2u003Eu003Cpu003EAfter my u003Ca href=u0022https://mattwiebe.blog/2023/08/30/how-to-run-codellama-in-vscode-on-macos/u0022u003Elast postu003C/au003E about setting up CodeLlama, some colleagues have asked the million dollar question: how does CodeLlama compare to Copilot? My early answer is: u0026#8220;I donu0026#8217;t know yetu0026#8221; but hereu0026#8217;s a useful comparison.u003C/pu003Eu003Cpu003EI have little shell script that lets me interface with u003Ca href=u0022https://phacility.com/phabricator/arcanist/u0022u003EArcanistu003C/au003E (u003Ccodeu003Earcu003C/codeu003E), part of the Phabricator toolset we use internally. One of the (many) weird things about it vs something like Github is that your PRs (Diffs in Phabricator parlance) are independent from git branches. This is a problem when I need to update a Diff, but I donu0026#8217;t know what Iu0026#8217;ve been working with from the current branch. u003Ccodeu003Egit pushu003C/codeu003E is just not possible, I need to run u003Ccodeu003Earc diff --update DXXXXXu003C/codeu003E instead. But what is the ID?u003C/pu003Eu003Cpu003EThereu0026#8217;s already a command for this: u003Ccodeu003Earc whichu003C/codeu003E. But its output is messy. Iu0026#8217;m working on a Diff that adds u003Ca href=u0022https://github.com/Automattic/wordpress-activitypub/pull/395u0022u003Ethis plugin PRu003C/au003E to WordPress.com, fixing (inevitable) bugs along the way, and this is what u003Ccodeu003Earc whichu003C/codeu003E gives me:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EREPOSITORYTo identify the repository associated with this working copy, arc followed this process: Configuration value u0022repository.callsignu0022 is set to u0022WPGITu0022. Found a unique matching repository.This working copy is associated with the WordPress.com repository.COMMIT RANGEIf you run 'arc diff', changes between the commit: 212461c993d9d7ad [redacted commit message]...and the current working copy state will be sent to Differential, becauseit is the merge-base of 'origin/trunk' and HEAD, as specified by'git:merge-base(origin/trunk)' in your project 'base' configuration.You can see the exact changes that will be sent by running this command: $ git diff 212461c993d9d7ad..HEADThese commits will be included in the diff: 60eb751ea35ecfe1 [redacted commit message] 55dc4536770138b0 ActivityPub: add Follow Me blockMATCHING REVISIONSThese Differential revisions match the changes in this working copy: D120281 ActivityPub: add Follow Me block Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.Since exactly one revision in Differential matches this working copy, it willbe updated if you run 'arc diff'.u003C/preu003Eu003Cpu003E(That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf u0022https://redacted.a8c.com/%s %s\nu0022, $1, $1 }'u003C/preu003Eu003Cpu003EWhich for the above gives me:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Ehttps://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003ECool. But I decided that Iu0026#8217;d stripped too much context, I wanted the Diff title after the ID as well, so that my output would be like this:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EActivityPub: add Follow Me blockhttps://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003EThat was hard because u003Ccodeu003Eawku003C/codeu003E separates on spaces and would only give me the first word of the title in my first attempt:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf u0022%s\nhttps://redacted.a8c.com/%s %s\nu0022, $2, $1, $1 }'ActivityPub: https://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003ESo I decided to ask both Copilot and CodeLlama (u003Ccodeu003E7b-instruct-q5_K_Mu003C/codeu003E) for help. I kept my wording a bit vague to see how they would do. u003C/pu003Eu003Cpu003ESpoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.u003C/pu003Eu003Ch3 class=u0022wp-block-headingu0022u003EPromptu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EI have the following shell script, which parses an ID and the title out of arc which output.arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf u0022%s\nhttps://code.a8c.com/%s %s\nu0022, $2, $1, $1 }'arc which line that concerns me looks like:` D120281 ActivityPub: add Follow Me block`The trouble is that awk thinks $2 is only the first word of the remaining line, just u0022ActivityPubu0022. I want the whole line in the output. How can I achieve that?u003C/preu003Eu003Ch3 class=u0022wp-block-headingu0022u003ECopilot responseu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003ETo include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:```arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf u0022%s\nhttps://code.a8c.com/%s u0022, $2, $1; for (i=2; iu0026lt;=NF; i++) printf u0022%s u0022, $i; printf u0022\nu0022 }'```'In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (iu0026lt;=NF). We're then using printf to print each field separated by a space.This should output the entire title in the output, along with the ID and URL.u003C/preu003Eu003Ch3 class=u0022wp-block-headingu0022u003ECodeLlama responseu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003ETo get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:```arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf u0022https://code.a8c.com/%s %s\nu0022, $1, $0 }'```In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.u003C/preu003Eu003Cpu003ENeither approach was exactly what I wanted, but I learned something from both and was able to use this to produce my desired outcome:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; iu0026lt;=NF; i++) printf u0022%s u0022, $i; printf u0022\n %s https://code.a8c.com/%s\nu0022, $1, $1; }'u003C/preu003Eu003Cpu003EIn any case, googling for code help is probably pretty much dead. And this is the first generation of these tools, which will only get better.u003C/pu003E","contentMap":{"en":"u003Ch2u003EComparing Metau0026#8217;s CodeLlama to OpenAIu0026#8217;s Copilotu003C/h2u003Eu003Cpu003EAfter my u003Ca href=u0022https://mattwiebe.blog/2023/08/30/how-to-run-codellama-in-vscode-on-macos/u0022u003Elast postu003C/au003E about setting up CodeLlama, some colleagues have asked the million dollar question: how does CodeLlama compare to Copilot? My early answer is: u0026#8220;I donu0026#8217;t know yetu0026#8221; but hereu0026#8217;s a useful comparison.u003C/pu003Eu003Cpu003EI have little shell script that lets me interface with u003Ca href=u0022https://phacility.com/phabricator/arcanist/u0022u003EArcanistu003C/au003E (u003Ccodeu003Earcu003C/codeu003E), part of the Phabricator toolset we use internally. One of the (many) weird things about it vs something like Github is that your PRs (Diffs in Phabricator parlance) are independent from git branches. This is a problem when I need to update a Diff, but I donu0026#8217;t know what Iu0026#8217;ve been working with from the current branch. u003Ccodeu003Egit pushu003C/codeu003E is just not possible, I need to run u003Ccodeu003Earc diff --update DXXXXXu003C/codeu003E instead. But what is the ID?u003C/pu003Eu003Cpu003EThereu0026#8217;s already a command for this: u003Ccodeu003Earc whichu003C/codeu003E. But its output is messy. Iu0026#8217;m working on a Diff that adds u003Ca href=u0022https://github.com/Automattic/wordpress-activitypub/pull/395u0022u003Ethis plugin PRu003C/au003E to WordPress.com, fixing (inevitable) bugs along the way, and this is what u003Ccodeu003Earc whichu003C/codeu003E gives me:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EREPOSITORYTo identify the repository associated with this working copy, arc followed this process: Configuration value u0022repository.callsignu0022 is set to u0022WPGITu0022. Found a unique matching repository.This working copy is associated with the WordPress.com repository.COMMIT RANGEIf you run 'arc diff', changes between the commit: 212461c993d9d7ad [redacted commit message]...and the current working copy state will be sent to Differential, becauseit is the merge-base of 'origin/trunk' and HEAD, as specified by'git:merge-base(origin/trunk)' in your project 'base' configuration.You can see the exact changes that will be sent by running this command: $ git diff 212461c993d9d7ad..HEADThese commits will be included in the diff: 60eb751ea35ecfe1 [redacted commit message] 55dc4536770138b0 ActivityPub: add Follow Me blockMATCHING REVISIONSThese Differential revisions match the changes in this working copy: D120281 ActivityPub: add Follow Me block Reason: Commit message for '55dc4536770138b0' has explicit 'Differential Revision'.Since exactly one revision in Differential matches this working copy, it willbe updated if you run 'arc diff'.u003C/preu003Eu003Cpu003E(That last line is a lie.) I already have a shell command that cuts out the noise and just provides me with the URL and Diff ID that I want:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+\ ' -o | awk '{ printf u0022https://redacted.a8c.com/%s %s\nu0022, $1, $1 }'u003C/preu003Eu003Cpu003EWhich for the above gives me:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Ehttps://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003ECool. But I decided that Iu0026#8217;d stripped too much context, I wanted the Diff title after the ID as well, so that my output would be like this:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EActivityPub: add Follow Me blockhttps://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003EThat was hard because u003Ccodeu003Eawku003C/codeu003E separates on spaces and would only give me the first word of the title in my first attempt:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf u0022%s\nhttps://redacted.a8c.com/%s %s\nu0022, $2, $1, $1 }'ActivityPub: https://redacted.a8c.com/D120281 D120281u003C/preu003Eu003Cpu003ESo I decided to ask both Copilot and CodeLlama (u003Ccodeu003E7b-instruct-q5_K_Mu003C/codeu003E) for help. I kept my wording a bit vague to see how they would do. u003C/pu003Eu003Cpu003ESpoiler: they both contributed something useful, and it was doubly interesting that CodeLlama told me I was doing it wrong.u003C/pu003Eu003Ch3 class=u0022wp-block-headingu0022u003EPromptu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003EI have the following shell script, which parses an ID and the title out of arc which output.arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf u0022%s\nhttps://code.a8c.com/%s %s\nu0022, $2, $1, $1 }'arc which line that concerns me looks like:` D120281 ActivityPub: add Follow Me block`The trouble is that awk thinks $2 is only the first word of the remaining line, just u0022ActivityPubu0022. I want the whole line in the output. How can I achieve that?u003C/preu003Eu003Ch3 class=u0022wp-block-headingu0022u003ECopilot responseu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003ETo include the entire title in the output, you can modify the awk command to print all fields starting from the second field. Here's the updated command:```arc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ printf u0022%s\nhttps://code.a8c.com/%s u0022, $2, $1; for (i=2; iu0026lt;=NF; i++) printf u0022%s u0022, $i; printf u0022\nu0022 }'```'In this updated command, we're using a for loop to print all fields starting from the second field. The NF variable represents the total number of fields in the input, so we're looping from the second field (i=2) to the last field (iu0026lt;=NF). We're then using printf to print each field separated by a space.This should output the entire title in the output, along with the ID and URL.u003C/preu003Eu003Ch3 class=u0022wp-block-headingu0022u003ECodeLlama responseu003C/h3u003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003ETo get the whole line in the output, you need to use a different approach with awk. Instead of using $2 to print out the second word of the remaining line, you can use $0 to print out the entire line. Here's an updated version of your shell script that uses awk to print out the whole line:```arc which | grep -E 'D[0-9]+\ (.*)' -o | awk '{ printf u0022https://code.a8c.com/%s %s\nu0022, $1, $0 }'```In this version of the script, we use awk to print out the entire line ($0) as part of the URL that we want to output. We also use $1 to print out the identifier followed by one or more numbers. I hope this helps! Let me know if you have any questions.u003C/preu003Eu003Cpu003ENeither approach was exactly what I wanted, but I learned something from both and was able to use this to produce my desired outcome:u003C/pu003Eu003Cpre class=u0022wp-block-syntaxhighlighter-codeu0022u003Earc which | grep -E 'D[0-9]+ (.*)' -o | awk '{ for (i=2; iu0026lt;=NF; i++) printf u0022%s u0022, $i; printf u0022\n %s https://code.a8c.com/%s\nu0022, $1, $1; }'u003C/preu003Eu003Cpu003EIn any case, googling for code help is probably pretty much dead. And this is the first generation of these tools, which will only get better.u003C/pu003E"},"icon":{"type":"Image","url":"http://wp.test/wp-content/uploads/2024/09/image-3-150x150.jpg","mediaType":"image/jpeg"},"published":"2023-08-31T16:41:15Z","tag":[],"url":"http://wp.test/2023/08/comparing-metas-codellama-to-openais-copilot/","to":["https://www.w3.org/ns/activitystreams#Public"],"cc":["http://wp.test/wp-json/activitypub/1.0/actors/1/followers"],"mediaType":"text/html","replies":{"id":"http://wp.test/wp-json/activitypub/1.0/posts/140/replies","type":"Collection","first":{"id":"http://wp.test/wp-json/activitypub/1.0/posts/140/replies?page=0","type":"CollectionPage","partOf":"http://wp.test/wp-json/activitypub/1.0/posts/140/replies","items":[]}},"sensitive":false} Post Content (had to use a gist because the post itself was messing up formatting here with https://gist.github.com/mattwiebe/e644bb43d7bea38aa781acba6b89ae8a The error happens inside the |
2f7c2ff
to
32e3213
Compare
@mattwiebe Could I ask you to try again, now that the slashing PR has been merged and was rebased into this one? |
Yup I tried it and it works nicely. But! It appears to not be doing the offset properly, it's just processing the first 50 posts over and over again. It was doing this both before and after my changes above, but it also seems to be failing some relevant tests. Eg always this entry in the
|
includes/class-migration.php
Outdated
@@ -173,7 +173,7 @@ public static function maybe_migrate() { | |||
if ( \version_compare( $version_from_db, '4.7.3', '<' ) ) { | |||
add_action( 'init', 'flush_rewrite_rules', 20 ); | |||
} | |||
if ( \version_compare( $version_from_db, 'unreleased', '<' ) ) { | |||
if ( \version_compare( $version_from_db, '4.7.4', '<' ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why adding a version number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aaaah, you needed it for testing!?! ;)
includes/class-migration.php
Outdated
* @param string $callback The callback to run. | ||
* @param mixed ...$args The arguments to pass to the callback. | ||
*/ | ||
public static function async_upgrade( $callback, ...$args ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static function async_upgrade( $callback, ...$args ) { | |
public static function async_upgrade( $callback ) { |
Should we get rid of this now that we're calling func_get_args()
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I wonder if func_get_args is even needed…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not be with the new structure. But this works :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love that the async_upgrade
gives us a proof-of-concept for later doing other rolling operations later, like batching in send_activity_to_followers()
- at some point there will be too many inboxes, and we'd also like to spread out the updates to prevent DDOS spikes.
Everything is working as I'd expect, now.
Proposed changes:
activitypub_upgrade
event for async upgrades with locking.add_to_outbox
function that skips any federation considerations.Other information:
Testing instructions:
/includes/class-migration.php
to a version of your choice (e.g.4.7.3.1
).activitypub.php
to the same version./wp-admin/edit.php?post_type=ap_outbox
on your test site to see the results.