.. _use-crate-node:

- ===============================================
- Troubleshooting with the ``crate-node`` command
- ===============================================
+ ==========================
+ The ``crate-node`` command
+ ==========================

- This document shows you how to troubleshoot CrateDB nodes with the
- `crate-node`_ command. Using this command, you can:
+ Use the `crate-node`_ command to troubleshoot CrateDB cluster nodes.
+ Using this command, you can:

- * Repurpose nodes and clean up their old data
+ * Repurpose nodes and clean up their old data.
* Force the election of a master node (and the creation of a new cluster) in
-   the event that you lose too many nodes to be able to form a quorum
- * Detach nodes from an old cluster so they can be moved to a new cluster
+   the event that you lose too many nodes to be able to form a quorum.
+ * Detach nodes from an old cluster so they can be moved to a new cluster.

.. rubric:: Table of contents

@@ -28,38 +28,35 @@ This document shows you how to troubleshoot CrateDB nodes with the
Repurpose a node
================

+ .. rubric:: About
+
In a situation where you have irrecoverably lost the majority of the
master-eligible nodes in a cluster, you may need to form a new cluster.
-
When forming a new cluster, you may have to change the `role`_ of one or more
nodes. Changing the role of a node is referred to as *repurposing* a node.

Each node checks the contents of its :ref:`data path <crate-reference:conf-env>`
- at startup. If CrateDB
- discovers unexpected data, it will refuse to start. Specifically:
+ at startup. If CrateDB discovers unexpected data, it will refuse to start.
+ The specific rules are:

- Nodes configured with `node.data`_ set to ``false`` will refuse to start if
-   they find any shard data at startup
+   they find any shard data at startup.

- Nodes configured with both `node.master`_ set to ``false`` and `node.data`_
  set to ``false`` will refuse to start if they have any index metadata at
-   startup
+   startup.
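
The two startup rules above can be read as a small decision function. The
following is only an illustrative sketch, not CrateDB's implementation: the
function name ``startup_refusal_reason`` and its boolean parameters are
hypothetical and simply mirror the ``node.master`` and ``node.data`` settings
and the kinds of on-disk data named in the rules.

```python
from typing import Optional


def startup_refusal_reason(
    node_master: bool,
    node_data: bool,
    has_shard_data: bool,
    has_index_metadata: bool,
) -> Optional[str]:
    """Return why a node would refuse to start, or None if it may start.

    Hypothetical helper: it only restates the two documented rules.
    """
    # Rule 1: a node with node.data=false must not hold any shard data.
    if not node_data and has_shard_data:
        return "shard data found on a node with node.data=false"
    # Rule 2: a node with node.master=false and node.data=false must not
    # hold any index metadata.
    if not node_master and not node_data and has_index_metadata:
        return (
            "index metadata found on a node with "
            "node.master=false and node.data=false"
        )
    return None
```

For example, a master-only node (``node_master=True``, ``node_data=False``)
that still holds shard data gets a refusal reason under the first rule, which
is the situation the ``repurpose`` command is meant to clean up.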

The `crate-node`_ :ref:`repurpose command <crate-reference:cli-crate-node-commands>`
- can help you clean up the necessary
- node data so that CrateDB can be restarted with a new role.
+ can help you clean up the necessary node data, so that CrateDB can be restarted
+ with a new role.

-
- Procedure
- ---------
+ .. rubric:: Procedure

To repurpose a node, first of all, you must stop the node.
-
Then, update the settings `node.data`_ and `node.master`_ in the ``crate.yml``
:ref:`configuration file <crate-reference:config>` as needed.
-
The ``node.data`` and ``node.master`` settings can be configured in four
- different ways, each corresponding to a different type of node:
+ different ways, each corresponding to a different type of node.

+-------------------+------------------------+-----------------------------+
| Role              | Configuration          | After repurposing           |
@@ -95,7 +92,7 @@ deleted (i.e., "cleaned up") after repurposing the node to that configuration.
Before running the ``repurpose`` command, make sure that any data you want
to keep is available on other nodes in the cluster.

- Then, run the ``repurpose`` command:
+ Then, invoke the ``repurpose`` command.

.. code-block:: console

@@ -112,33 +109,36 @@ Then, run the ``repurpose`` command:
    Node successfully repurposed to master and no data.

As mentioned in the command output, you can pass in ``-v`` to get a more
- verbose output, like so:
+ verbose output.

.. code-block:: console

    sh$ ./bin/crate-node repurpose -v

- Finally, start the node again.
-
- The node has been successfully repurposed.
+ Finally, start the node again. After that, the node has been successfully
+ repurposed.


.. _crate-node-unsafe-bootstrap:

Perform an unsafe cluster bootstrap
===================================

+ .. rubric:: About
+
When communication is lost between one or more nodes in a cluster (e.g., during
- a *cluster partition*), the situation is assumed to be temporary and safeguards
+ a `network partition`_), the situation is assumed to be temporary and safeguards
exist to prevent the election of a master node unless a `quorum`_ can be
established.
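
As a rough illustration of the quorum safeguard described above, the check
below treats a quorum as a strict majority of the master-eligible nodes. This
is a simplified sketch: the function name ``has_quorum`` and its parameters
are hypothetical, and the real election logic in CrateDB tracks more state
than a simple node count.

```python
def has_quorum(reachable_master_eligible: int, total_master_eligible: int) -> bool:
    """Return True if the reachable nodes form a strict majority.

    Simplifying assumption: quorum means more than half of all
    master-eligible nodes can communicate with each other.
    """
    return reachable_master_eligible > total_master_eligible // 2
```

With three master-eligible nodes, two reachable nodes still form a quorum,
while a single remaining node does not. The latter is the kind of situation
in which an unsafe cluster bootstrap becomes relevant.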

However, if the situation is permanent (i.e., you have irrecoverably lost a
- majority of the nodes in your cluster), you will need to force the election of
+ majority of the nodes in your cluster), also known as a `split-brain`_ situation,
+ you will need to force the election of
a master. Forcing a master election without quorum is referred to as an *unsafe
cluster bootstrap*.

- The `crate-node`_ ``unsafe-bootstrap`` command can help you choose a new master
+ The :ref:`unsafe-bootstrap command <crate-reference:cli-crate-node-commands>`
+ can support you to choose a new master
node and subsequently perform an unsafe cluster bootstrap.

.. WARNING::
@@ -160,8 +160,7 @@ node and subsequently perform an unsafe cluster bootstrap.
have access to the file system.


- Procedure
- ---------
+ .. rubric:: Procedure

Before you continue, you must stop all master-eligible nodes in the cluster.

@@ -175,12 +174,11 @@ Before you continue, you must stop all master-eligible nodes in the cluster.
Once all master-eligible nodes in the cluster have been stopped, you can
manually select a new master.

- To help you select a new master, the ``unsafe-bootstrap`` command returns
- information about the node cluster state as a pair of values in the form
- *(term, version)*.
-
+ To support you selecting a new master node, the ``unsafe-bootstrap`` command
+ returns information about the node cluster state as a pair of values in the
+ form *(term, version)*.
You can gather this information (safely) by issuing the ``unsafe-bootstrap``
- command and answering "no" (``n``) at the confirmation prompt, like so:
+ command and answering "no" (``n``) at the confirmation prompt.

.. code-block:: console

@@ -211,8 +209,8 @@ value, select any one of them.
that you elect a master node with the freshest state data. This, in turn,
minimizes the potential for data loss and inconsistency.
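
Once the *(term, version)* pair of every stopped master-eligible node has been
noted, picking the candidate with the freshest state can be sketched as
follows. This assumes that the freshest state is the one with the highest
term, with the highest version breaking ties, which this excerpt does not
spell out in full. The function name ``pick_bootstrap_candidate`` and the node
names are hypothetical.

```python
from typing import Dict, Tuple


def pick_bootstrap_candidate(states: Dict[str, Tuple[int, int]]) -> str:
    """Return the node name with the greatest (term, version) pair.

    Assumption: tuples compare by term first, then by version, so the
    result is the node with the freshest cluster state. If several nodes
    share the same pair, the first one listed is returned.
    """
    if not states:
        raise ValueError("no node states were provided")
    return max(states, key=lambda name: states[name])
```

For instance, given ``{"node-a": (4, 12), "node-b": (5, 3), "node-c": (5, 7)}``,
the function returns ``"node-c"``, because it has the highest term and, among
the nodes with that term, the highest version.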

- Once you have selected a node to elect to master, run the ``unsafe-bootstrap``
- command on that node and answer yes (``y``) at the confirmation prompt:
+ Once you have selected a node to elect to master, invoke the ``unsafe-bootstrap``
+ command on that node and answer yes (``y``) at the confirmation prompt.

.. code-block:: console

@@ -226,46 +224,45 @@ command on that node and answer yes (``y``) at the confirmation prompt:

    Confirm [y/N] y

- If the operation was successful, the command will output:
+ If the operation was successful, the program will acknowledge it.
+ **Note:** This success message indicates that the operation was completed.
+ You may still experience data loss and inconsistencies.

.. code-block:: console

    Master node was successfully bootstrapped

- .. NOTE::
-
-     This success message indicates that the operation was completed. You may
-     still experience data loss and inconsistencies.
-
- Start the bootstrapped node and verify that it has started a new cluster with
+ Now, start the bootstrapped node and verify that it has started a new cluster with
one node and elected itself as the master.

Before you can add the rest of the nodes to the new cluster, you must detach
them from the old cluster (see the :ref:`next section
<crate-node-detach-cluster>`).

- When that's done, start the nodes and verify that they join the new cluster.
+ After that's done, start the nodes and verify that they join the new cluster.

.. NOTE::

    Once the new cluster is up-and-running and all recoveries are complete, you
-     are responsible for assessing the cluster for data loss and
-     inconsistencies.
+     are advised to assess the database for data loss and inconsistencies.


.. _crate-node-detach-cluster:

Detach a node from its cluster
==============================

+ .. rubric:: About
+
To protect nodes from inadvertently rejoining the wrong cluster (e.g., in the
event of a network partition), each node binds to the first cluster it joins.

However, if a cluster has permanently failed (see the :ref:`previous section
<crate-node-unsafe-bootstrap>`) you must detach nodes before you can move them
to a new cluster.

- The `crate-node`_ ``detach-cluster`` command can help you move a node to a new
+ The :ref:`detach-cluster command <crate-reference:cli-crate-node-commands>`
+ supports you moving a node to a new
cluster by resetting the cluster it is bound to (i.e., *detaching* it from its
existing cluster).

@@ -278,8 +275,7 @@ existing cluster).
cluster bootstrap <crate-node-unsafe-bootstrap>`.


- Procedure
- ---------
+ .. rubric:: Procedure

To detach a node, run:

@@ -293,7 +289,7 @@ To detach a node, run:

    Confirm [y/N] y

- You should see this:
+ A corresponding message confirms success.

.. code-block:: console

@@ -304,14 +300,16 @@ When the node is started again, it will be able to join a new cluster.
.. NOTE::

    You may also have to update the :ref:`discovery configuration
-     <crate-reference:conf_discovery>` so that
+     <crate-reference:conf_discovery>`, so that
    nodes are able to find the new cluster.


.. _crate-node: https://cratedb.com/docs/crate/reference/en/latest/cli-tools.html#cli-crate-node
.. _data path: https://cratedb.com/docs/crate/reference/en/latest/config/environment.html#application-variables
+ .. _network partition: https://en.wikipedia.org/wiki/Network_partition
.. _node.data: https://cratedb.com/docs/crate/reference/en/latest/config/node.html#node-types
.. _node.master: https://cratedb.com/docs/crate/reference/en/latest/config/node.html#node-types
.. _quorum: https://cratedb.com/docs/crate/reference/en/latest/concepts/clustering.html#master-node-election
.. _role: https://cratedb.com/docs/crate/reference/en/latest/config/node.html#node-types
+ .. _split-brain: https://en.wikipedia.org/wiki/Split-brain_(computing)
.. _UUID: https://en.wikipedia.org/wiki/Universally_unique_identifier