The rest of this article gives simple methodologies for deploying and managing the fractal application by editing the Jsonnet configuration and then applying those changes. Reproducing this case study has a few prerequisites: a Google Cloud Platform project and working installations of Jsonnet, Packer, Terraform, and the gcloud tool.
Once those are satisfied, follow these steps:
To deploy the application, run make -j. This should start three Packer builds running in parallel. In a separate terminal, use tail -f *.log to watch their progress. When the images are built, Terraform will show you the proposed changes (a long list of resources to be created). Enter y to confirm the changes. In time, the application will be deployed. Now you only need the appserv IP address to connect to it. You can get it with "gcloud compute addresses list" or by navigating to "networks" in the Google Cloud Platform console. Opening that IP address in a web browser should take you to the fractal application itself.
The application can be brought down again by running terraform destroy; Terraform remembers the resources it created via the terraform.tfstate file. Destroying the infrastructure does not remove the Packer images, which can be deleted from the console or with gcloud.
Managing a production web service usually means making continual changes to it instead of bringing the whole thing down and up again, as we will shortly discuss. However it is still useful to bring up a fresh application for testing / development purposes. A copy or variant of the production service can be brought up concurrently with the production service (e.g., in a different project). This can be useful for QA, automatic integration testing, or load testing. It is also useful for training new ops staff or rehearsing a complex production change in a safe environment.
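As a rough illustration of how such a variant might be expressed, the sketch below extends the production configuration and overrides a couple of fields. The file name dev.jsonnet and the field names project and prefix are assumptions for illustration only; the real service.jsonnet may parameterize these differently.

// dev.jsonnet -- hypothetical variant configuration for a test/QA copy.
// The field names below are illustrative, not taken from the real configuration.
(import "service.jsonnet") + {
  project: "fractal-dev",  // deploy into a separate GCP project
  prefix: "dev-",          // keep resource names distinct from production
}

Generating the Packer and Terraform inputs from a file like this instead of service.jsonnet would then bring up an independent copy that can be destroyed without touching production.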
Managing the Cassandra cluster requires a combination of configuration alteration (to control the fundamental compute resources) and use of the Cassandra command line tool "nodetool" on the instances themselves. For example "nodetool status fractal" on any Cassandra instance will give information about the whole cluster.
To add a new node (expanding the cluster), edit service.jsonnet to add another instance to the Terraform configuration and run make -j. Confirm the changes (the only change should be the new instance, e.g., db4). It will start up and soon become part of the cluster.
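As a sketch of the kind of edit involved (assuming the Cassandra instances are generated by an object comprehension like the appserv one shown later; the template name CassandraInstance here is a placeholder), adding db4 amounts to extending the index list:

google_compute_instance: {
  ["db" + k]: resource.CassandraInstance(k) {  // placeholder template name
    name: "db" + k,
    ...  // other fields as for the existing nodes
  }
  for k in [1, 2, 3, 4]  // was [1, 2, 3]; the new index creates db4
} + ...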
To remove a node, first decommission it using nodetool -h HOSTNAME decommission. When that is complete, destroy the actual instance by updating service.jsonnet to remove the resource and running make -j again. Confirm the removal of the instance. It is OK to remove the first node, but its replacement should use GcpTopUpMixin instead of GcpStarterMixin. You can recycle all of the nodes if you do it one at a time, which is actually necessary for emergency kernel upgrades.
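For illustration, a recycled first node might be declared as below. The mixin names come from the text above, but the instance template and where the mixins are defined are assumptions:

// Replacement for the original first node. GcpStarterMixin only bootstraps a
// brand new cluster; a node joining an existing cluster (including a recycled
// db1) should use GcpTopUpMixin instead.
db1: resource.CassandraInstance(1) + cassandra.GcpTopUpMixin {  // placeholder names
  name: "db1",
  ...  // other fields unchanged
},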
If a node is permanently and unexpectedly lost (e.g., a disk error), or you removed it without first decommissioning it, the cluster will remain in a state where it expects the dead node to return at some point (as if it were temporarily powered down or on the wrong side of a network split). This situation can be rectified with nodetool removenode UUID, run from any other node in the cluster. In this case it is probably also necessary to run nodetool repair on the other nodes to ensure data is properly distributed.
To introduce new functionality to the application server it is useful to divert a small proportion of user traffic to the new code to ensure it is working properly. After this initial "canary" test has passed, the remaining traffic can then be confidently transferred to the new code. The same can be said for the tile generation service (e.g. to update the C++ code).
The model used by this example is that the application server logic and static content are embedded in the application server image. Canarying consists of building a new image and then rolling it out gradually, one instance at a time. Each step of this methodology consists of a small modification to service.jsonnet and then running make -j.
First, edit service.jsonnet to update the date embedded in the name field of the application server image to the current date, and also make any desired changes to the configuration of the image or any of the referenced Python / HTML / CSS files.
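A minimal sketch of this first edit, assuming the image is defined in service.jsonnet with a name field matching the image names referenced by the instances below (the field name appserv_image and the surrounding structure are illustrative):

appserv_image: {
  // Bump the embedded date/time so a fresh image is built alongside the old one.
  name: "appserv-v20150102-1200",  // previously "appserv-v20141222-0300"
  ...  // provisioning of the Python / HTML / CSS files otherwise unchanged
},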
Next, add a single new application server instance that uses the new image by editing service.jsonnet. The easiest way to do this is to copy-paste the existing definition and modify the image name in the copy to reflect the new image. This allows easy rollback by deleting the copied code, and you can later transition the rest of the nodes by deleting the original copy; the duplication is therefore only temporary. At this point the configuration may look like this:
google_compute_instance: {
  ["appserv" + k]: resource.FractalInstance(k) {
    name: "appserv" + k,
    image: "appserv-v20141222-0300",
    ...
  }
  for k in [1, 2, 3]
} + {
  ["appserv" + k]: resource.FractalInstance(k) {
    name: "appserv" + k,
    image: "appserv-v20150102-1200",
    ...
  }
  for k in [4]
} + ...

Also modify the appserv target pool to add the new instance 4, thus ensuring it receives traffic:
appserv: {
  name: "appserv",
  health_checks: ["${google_compute_http_health_check.fractal.name}"],
  instances: [ "%s/appserv%d" % [zone(k), k] for k in [1, 2, 3, 4] ],
},
Once the canary instance has proven itself, the remaining instances can be transitioned in the same way, one at a time: add another instance using the new image, add it to the target pool, and retire one of the old instances (removing it from the target pool as well), confirming each step with make -j. Eventually the configuration looks like this:

google_compute_instance: {
  ["appserv" + k]: resource.FractalInstance(k) {
    name: "appserv" + k,
    image: "appserv-v20141222-0300",
    ...
  }
  for k in []
} + {
  ["appserv" + k]: resource.FractalInstance(k) {
    name: "appserv" + k,
    image: "appserv-v20150102-1200",
    ...
  }
  for k in [4, 5, 6]
} + ...
We have shown how Jsonnet can be used to centralize, unify, and manage configuration for a realistic cloud application. We have demonstrated how programming language abstraction techniques make the configuration very concise, with re-usable elements separated into template libraries. Complexity is controlled in spite of the variety of different configurations, formats, and tasks involved.
Finally, we demonstrated how, with a little procedural glue to drive other processes (the humble UNIX make), we were able to build an operations methodology where many aspects of the service can be controlled centrally by editing a single Jsonnet file and issuing make -j update commands.