These datasets undergo the following preprocessing steps -
@@ -124,7 +174,7 @@ These datasets undergo the following preprocessing steps -
5. Then, duplicate values are removed from this new dataset.
6. Finally, only the instances that match the regex pattern ```^[A-Za-z0-9_-]{0,11}$``` are kept, while the rest are removed. This keeps the number of instances to a minimum by removing unnecessary words or phrases.
-Preprocessing yields a dataset of 2885 instances, that helps ensure the generated IDs are safe for using in URLs and for sharing on social media platforms.
+Preprocessing yields a dataset of 3279 instances, that helps ensure the generated IDs are safe for using in URLs and for sharing on social media platforms.
The preprocessing was done on this [Colab Jupyter notebook](https://colab.research.google.com/drive/1LRA3_Qa_0qCL9bkfo06ztjWkr-aP4rz1).
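
The notebook is not reproduced here, so the following is only a minimal sketch of steps 5 and 6 (deduplication, then the regex filter), not the notebook's actual code. The regex is the one quoted in step 6; the function name `dedupe_and_filter` and the sample words are hypothetical and chosen purely for illustration.

```python
import re

# Pattern quoted in step 6: at most 11 characters drawn from
# A-Z, a-z, 0-9, "_" and "-".
ID_SAFE = re.compile(r"^[A-Za-z0-9_-]{0,11}$")


def dedupe_and_filter(words):
    """Hypothetical helper: drop duplicates, then keep only pattern matches."""
    # dict.fromkeys preserves first-seen order while removing duplicates (step 5).
    unique = list(dict.fromkeys(words))
    # Keep only the entries that match the pattern (step 6).
    return [w for w in unique if ID_SAFE.match(w)]


# Illustrative input only; not taken from the real datasets.
sample = ["apple", "apple", "hello world", "snake_case",
          "averyveryverylongword", "café"]
print(dedupe_and_filter(sample))  # ['apple', 'snake_case']
```

In this sketch, "hello world" is dropped because of the space, "averyveryverylongword" because it exceeds 11 characters, and "café" because "é" is outside the allowed character class. Note that the `{0,11}` quantifier also admits the empty string, so an empty entry would pass the filter as written.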