openforcefield · mattwthompson · Mar 5, 2025 · Feb 28, 2025 · Mar 5, 2025
diff --git a/examples/protein_ligand/protein_ligand.ipynb b/examples/protein_ligand/protein_ligand.ipynb
@@ -345,23 +345,50 @@
    "cell_type": "markdown",
    "id": "30",
    "metadata": {},
+   "source": [
+    "Before we move on to the solvent, we should check the partial charge of our protein-ligand complex. Since we want to run simulations with net neutral charge, we'll add counter-ions to balance the total charge of the protein-ligand complex. To be safe, we can also double-check that the total charge in the `docked_intrcg` object matches the sum of the formal charges of the `Molecule` representations which more closely describe our intended chemistry."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "31",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "total_charge = round(sum(docked_intrcg[\"Electrostatics\"].charges.values()), 3)\n",
+    "\n",
+    "assert total_charge == protein.total_charge + ligand.total_charge, (\n",
+    "    f\"Total charge of the system is {total_charge}, not {protein.total_charge + ligand.total_charge}\"\n",
+    ")\n",
+    "total_charge"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32",
+   "metadata": {},
    "source": [
     "### Solvent component\n",
     "\n",
-    "We'll reuse the Sage force field from earlier here, as it includes parameters for TIP3P water, but first we need coordinates for our solvated system. This is a portion of the OpenFF ecosystem that will be streamlined in the future, but we can use a PACKMOL wrapper to get the job done. We're adding a fixed amount of water for this quick example so the density will be wrong, but imagine it's right."
+    "We'll reuse the Sage force field from earlier here, as it includes parameters for TIP3P water, but first we need coordinates for our solvated system. This is a portion of the OpenFF ecosystem that will be streamlined in the future, but we can use a PACKMOL wrapper to get the job done. We're adding a fixed amount of water for this quick example so the density will be wrong, but imagine it's right.\n",
+    "\n",
+    "We'll also add three chloride ions to balance the net charge of the protein-ligand complex. For a production run, you'll probably want a more realistic salt concentration - the goal here is just to get something running with a net neutral charge."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "31",
+   "id": "33",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Construct a water molecule\n",
     "water = Molecule.from_smiles(\"O\")\n",
     "water.generate_conformers(n_conformers=1)\n",
     "\n",
+    "ion = Molecule.from_smiles(\"[Cl-]\")\n",
+    "ion.generate_conformers(n_conformers=1)\n",
+    "\n",
     "# Come up with a box size based on the size of the protein plus a 2 nm buffer\n",
     "xyz = protein.conformers[0]\n",
     "centroid = xyz.sum(axis=0) / xyz.shape[0]\n",
@@ -372,8 +399,8 @@
     "n_water = 1000\n",
     "\n",
     "packed_topology = pack_box(\n",
-    "    molecules=[water],\n",
-    "    number_of_copies=[n_water],\n",
+    "    molecules=[water, ion],\n",
+    "    number_of_copies=[n_water, int(total_charge.m)],\n",
     "    solute=docked_intrcg.topology,\n",
     "    box_vectors=box_vectors,\n",
     ")"
@@ -382,7 +409,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "32",
+   "id": "34",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -393,7 +420,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "33",
+   "id": "35",
    "metadata": {},
    "source": [
     "And now we can create the interchange! The `Topology` we got from `pack_box` includes the positions we'll later apply to the solvated complex. For now, we need an `Interchange` that represents the water component. We can pass it Sage, wihch contains TIP3P parameters, and a topology of `n_water` water molecules without worrying about the positions, since we'll just set those later."
@@ -402,27 +429,27 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "34",
+   "id": "36",
    "metadata": {},
    "outputs": [],
    "source": [
     "water_intrcg = Interchange.from_smirnoff(\n",
     "    force_field=ForceField(\"openff_unconstrained-2.0.0.offxml\"),\n",
-    "    topology=[water] * n_water,\n",
+    "    topology=[water] * n_water + [ion] * int(total_charge.m),\n",
     ")"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "35",
+   "id": "37",
    "metadata": {},
    "source": [
     "## Putting the system together"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "36",
+   "id": "38",
    "metadata": {},
    "source": [
     "Now that we've got all the pieces, we can combine the docked protein-ligand system with the solvent, and add in the positions and box vectors we just worked out\n",
@@ -436,7 +463,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "37",
+   "id": "39",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -448,7 +475,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "38",
+   "id": "40",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -476,23 +503,23 @@
   },
   {
    "cell_type": "markdown",
-   "id": "39",
+   "id": "41",
    "metadata": {},
    "source": [
     "## Exporting to simulation engines"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "40",
+   "id": "42",
    "metadata": {},
    "source": [
     "Finally, we can export the final Interchange object to models understood by various simulation engines. Some of these exports are not yet optimized for large files."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "41",
+   "id": "43",
    "metadata": {},
    "source": [
     "### OpenMM"
@@ -501,7 +528,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "42",
+   "id": "44",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -512,7 +539,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "43",
+   "id": "45",
    "metadata": {},
    "source": [
     "### Amber"
@@ -521,7 +548,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "44",
+   "id": "46",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -531,7 +558,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "45",
+   "id": "47",
    "metadata": {},
    "source": [
     "### LAMMPS"
@@ -540,7 +567,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "46",
+   "id": "48",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -551,7 +578,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "47",
+   "id": "49",
    "metadata": {},
    "source": [
     "### GROMACS"
@@ -560,7 +587,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "48",
+   "id": "50",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -569,7 +596,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "49",
+   "id": "51",
    "metadata": {},
    "source": [
     "## Energy tests\n",
@@ -582,7 +609,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "50",
+   "id": "52",
    "metadata": {
     "tags": []
    },
@@ -593,7 +620,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "51",
+   "id": "53",
    "metadata": {},
    "source": [
     "For convenience there is a function `get_summary_data` that runs through all available engines and summarizes the results in a Pandas DataFrame. (This cell might take a minute to execute). We're setting the argument `_engines` to a non-defualt value so that the LAMMPS driver is skipped even if it's available; normally this argument is unnecessary if you don't have LAMMPS installed on your system."
@@ -602,7 +629,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "52",
+   "id": "54",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -612,7 +639,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "53",
+   "id": "55",
    "metadata": {},
    "source": [
     "We can see from these large energy differences that something is wrong - this stems from the experimental `Interchange` combination operation producing incorrect results."
@@ -621,7 +648,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "54",
+   "id": "56",
    "metadata": {
     "tags": []
    },
@@ -632,7 +659,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "55",
+   "id": "57",
    "metadata": {},
    "source": [
     "In the future this should work more smoothly with identical energies reported by each engine. In lieu of that, we can evaluate the energy of each _component_ that we previously added together. This requires setting box vectors for each component and also setting the water positions, which we skipped earlier since we were able to use PACKMOL results directly."
@@ -641,7 +668,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "56",
+   "id": "58",
    "metadata": {
     "tags": []
    },
@@ -650,13 +677,13 @@
     "for subset in [ligand_intrcg, protein_intrcg, water_intrcg]:\n",
     "    subset.box = system_intrcg.box\n",
     "\n",
-    "water_intrcg.positions = system_intrcg.positions[-3000:,]"
+    "water_intrcg.positions = system_intrcg.positions[-3003:,]"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "57",
+   "id": "59",
    "metadata": {
     "tags": []
    },
@@ -668,7 +695,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "58",
+   "id": "60",
    "metadata": {
     "tags": []
    },
@@ -680,7 +707,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "59",
+   "id": "61",
    "metadata": {
     "tags": []
    },
@@ -691,11 +718,37 @@
   },
   {
    "cell_type": "markdown",
-   "id": "60",
+   "id": "62",
    "metadata": {},
    "source": [
     "We can see from these results that each engine reports nearly identical energies for each individual component."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "63",
+   "metadata": {},
+   "source": [
+    "## Verifying charge neutrality\n",
+    "\n",
+    "Earlier, when adding waters to the system, we added three chloride ions to balance the net +3 charge of the protein-ligand complex. Did we actually end up with a charge-neutral system?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import openmm\n",
+    "\n",
+    "nbforce = openmm_system.getForces()[0]\n",
+    "charge = 0 * openmm.unit.elementary_charge\n",
+    "for i in range(nbforce.getNumParticles()):\n",
+    "    charge += nbforce.getParticleParameters(i)[0]\n",
+    "charge"
+   ]
   }
  ],
  "metadata": {
@@ -715,7 +768,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.11"
+   "version": "3.12.9"
   }
  },
  "nbformat": 4,