-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Migration utf8 -> utf8mb4 and test emojis for comment and answer #3007
Conversation
Generated by 🚫 Danger |
I restarted this, but if you see this error you can restart it too by closing and reopening the PR! |
@jywarren Tests passed in travis. But when i tried on local machine using mysql it failed. |
Hmm, I think we may need to redo the mysql encoding as in #2665 Do you want to try the suggestion @icarito made on changing encoding to |
Sure! So should we change the encoding of the whole db or just the tables or columns we need? |
Let's start by doing just the columns, and if they work, we can probably
move the whole db? But i think that's a Q for @icarito.
…On Mon, Jul 9, 2018 at 3:02 PM Naman Gupta ***@***.***> wrote:
Sure! So should we change the encoding of the whole db or just the tables
or columns we need?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3007 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABfJ_PnrcpkUfRUpPYEZnY25eLJA0bhks5uE6iFgaJpZM4VGgFv>
.
|
@jywarren I have changed encoding for comment field in comments table. |
Pushing to unstable |
Ah, did this end up working? |
As the person is inactive for more than a month, I am closing the PR. In case you want to push changes please feel free to open a new PR. |
Actually this one I'd like to reopen and take on myself - we really do need to solve this issue for many tables. Thanks! |
I think this last reference that I shared looks the simplest / most modern.
That should be about it!
Also note that if there are indexed fields of type VARCHAR(255) then we may need the following initializer:
Credit for these instructions to David Chua. |
# changing table that will store unicode execute: | ||
execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin" | ||
# changin string/text column with unicode content execute: | ||
execute "ALTER TABLE comments MODIFY comment VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really feel comfortable just truncating the field?
Let's not truncate the field just yet (line 6), it shouldn't be necesssary.
Also let's use collation utf8mb4_0900_ai_ci
it seems to be preferred (we run MariaDB 10.2 which supports this).
Thanks so much for pushing this along, Sebastian!!! I will plan to go
through it carefully.
…On Tue, May 21, 2019 at 1:03 AM Sebastian Silva ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In db/migrate/20180721114724_change_comments_to_utf8mb4.rb
<#3007 (comment)>:
> @@ -0,0 +1,8 @@
+class ChangeCommentsToUtf8mb4 < ActiveRecord::Migration[5.2]
+ def change
+ # changing table that will store unicode execute:
+ execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
+ # changin string/text column with unicode content execute:
+ execute "ALTER TABLE comments MODIFY comment VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
Do we really feel comfortable just truncating the field?
Let's not truncate the field just yet (line 6), it shouldn't be necesssary.
Also let's use collation utf8mb4_0900_ai_ci it seems to be preferred (we
run MariaDB 10.2 which supports this).
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#3007?email_source=notifications&email_token=AAAF6J5BGH35JJIYCORD2P3PWN67VA5CNFSM4FI2AFX2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOBZFXJWA#pullrequestreview-239826136>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAF6J6BBPDFEVFDHLUWLITPWN67VANCNFSM4FI2AFXQ>
.
|
Hi, just checking if you've gotten stuck on this at all, or if I could help in any way? Thanks! |
I wonder if this could be a hall-of-fame?
…On Fri, Jan 17, 2020 at 1:13 AM Sidharth Bansal ***@***.***> wrote:
Hi, just checking if you've gotten stuck on this at all, or if I could
help in any way? Thanks!
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#3007?email_source=notifications&email_token=AAAF6J4Q5VXEJPZ5EO5CIY3Q6FD7XA5CNFSM4FI2AFX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJGSFGI#issuecomment-575480473>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAF6J25622IGECQZ7UQHP3Q6FD7XANCNFSM4FI2AFXQ>
.
|
Request accepted: Here is the task https://codein.withgoogle.com/dashboard/tasks/6653314876309504/ |
Hey, I believe it would be better if the GCI task could also be broken down into multiple hard/hof tasks, according to the steps mentioned above, since it would be difficult to complete all the steps by a single person. Thanks. |
Jeff and I agreed to prioritize this so I'll be reviewing the reference https://mathiasbynens.be/notes/mysql-utf8mb4 again and try to break it down into smaller pieces so it is clear what needs to be done. |
Models which could possibly contain emoji, as far as I can determine:
|
Hi @icarito - what if we, temporarily, screened for emoji using regex, using an ActiveRecord validation, before we save records. Here is some regex we might be able to use: https://www.regextester.com/106421 https://github.com/mathiasbynens/emoji-regex Actually i think we already have some ability to scan for emoji: plots2/app/helpers/application_helper.rb Lines 9 to 46 in 591f59a
Wait, we can use the REGEX provided by that lib, already installed, to replace them! https://github.com/janlelis/unicode-emoji Yes, something like this, in the require "unicode/emoji"
require "gemoji"
string = "String which contains all kinds of emoji:😴▶️🛌🏽🇵🇹🏴2️⃣🤾🏽♀️"
string.match(Unicode::Emoji::REGEX) do |match|
string.gsub!(match.to_s, ":" + Emoji.find_by_unicode(match.to_s) + ":")
end Using |
Making a new issue for this: #7469 🙌 |
I guess this would be an interim solution but the thing is that emojis are not the only kind of invalid characters for this encoding. For instance acording to https://mathiasbynens.be/notes/mysql-utf8mb4 - he found the issue by trying to insert the U+1D306 TETRAGRAM FOR CENTRE (𝌆) symbol. So I think it'd be a lot of effort for a few cases, and rather the encoding migration should be done. |
Hi @jywarren @namangupta01 I finally got around to working on this and made it into a new PR here #7479 - also, after I did, I found a snippet that may be a nicer / more complete migration? Please check and review if we should test this in unstable. Migration is not reversible so I guess we should try to get our tests to work (get unstable to fail before pushing this?). Let me know what you think! I also found that not all emojis should trigger this problem, only those that require 4 bytes such as U+1F4A9 PILE OF POO (💩) or U+1D306 TETRAGRAM FOR CENTRE (𝌆) Regards, |
Okay an update on this. I tried the latest patch on |
Restoring a fresh dump from backup for unstable. |
I am feeling and channeling the focus on this issue. 👓 ☄️ 🦸 Here's my current assessment and plan. Since Naman's well started PR, I expanded the code to convert every one of our database tables and fields to the new encoding, but the times we tried to run this in the staging database, it crashes and corrupts the database, probably because of the size of our data overflowing our RAM. The plan I have now is as follows:
|
This article from Moodle more or less describes the procedure we want to follow: |
Thanks @namangupta01 @jywarren and everyone who contributed to this. I'm closing it in favour of #8003 which is the solution that worked! Good try though, it would've worked with a smaller database. |
Make sure these boxes are checked before your pull request (PR) is ready to be reviewed and merged. Thanks!
#2928
rake test
fixes #0000
-style reference to original issue #@publiclab/reviewers
for help, in a comment belowIf tests do fail, click on the red
X
to learn why by reading the logs.Please be sure you've reviewed our contribution guidelines at https://publiclab.org/contributing-to-public-lab-software
We have a loose schedule of reviewing and pulling in changes every Tuesday and Friday, and publishing changes on Fridays.
Thanks!