Build caching with Docker Compose and Docker

This is just a quick note on building images with docker-compose and how caching may affect iterative development.

For demonstration purposes, imagine a simple project, as follows…

Project Directory
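
A minimal layout (the project folder name is illustrative):

myproject/
├── Dockerfile
├── docker-compose.yml
└── run.py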

Dockerfile contents
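
A minimal sketch consistent with the rest of the post; all that matters is that a package install creates a cacheable layer:

FROM python:3
# Installing a package creates a layer that Docker can cache
RUN pip install redis
COPY run.py /run.py
CMD ["python", "/run.py"]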

docker-compose.yml contents
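
Again a minimal sketch; the service name is illustrative:

version: "3"
services:
  app:
    build: .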

If you start the service, using…
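
$ docker-compose up -d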

…Docker Compose will helpfully build an image from the Dockerfile and run it.  So far so good.

Now, if I modify the code in run.py, tainting the file…
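
$ echo "print('hello again')" >> run.py   # any edit to run.py will do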

…and then down / up the service, will the running container include the latest changes to run.py, or an older version?  Hmm…!

The answer is… the older build of the image will be reused!  The running container will not reflect my changes to run.py.

So why is this?  Because an image already exists that satisfies the service definition in docker-compose.yml, Docker Compose will run the existing image unless explicitly instructed to build it again.  It’s easy to specify a build step, either as a standalone docker-compose command or as a parameter to the up command:
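
$ docker-compose build
$ docker-compose up --build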

This is fairly intelligent, as it will reuse cached layers as much as possible and only rebuild the parts of the image that must be rebuilt.

If you don’t want any layers to be reused (for example, if one of your Dockerfile commands installs a package and the package has changed externally), you can add the --no-cache option:
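
$ docker-compose build --no-cache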

However, this will force rebuilding all the layers, so in my example above it installs redis again rather than reusing the cached layer.

Hope this helps.

scoop

If you aren’t using scoop, and you are working in a Windows-based workstation environment, wow, are you missing out.

Key benefits for me: it’s fast, transparent, has a good library of software, typically installs stuff to user space, and has a robust shim/alias system that makes it trivially easy to install and maintain things like HashiCorp tools, git, grep, and other essentials.

The documentation is pretty good too: https://github.com/lukesampson/scoop/wiki

Keeping software up to date is a matter of one command:
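
scoop update *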

Additional buckets (the known ones, or other community buckets) let you expand the software available:
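
scoop bucket add extras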

I can’t recommend it enough!  Hope this helps.

Thanks to my colleague Jeff Kraemer for introducing this tool to me a while back!

Automate Vester with Jenkins and Docker

Summary

A code-driven approach to automating VMware vSphere configuration management, by running Vester with Jenkins and Docker.  Based entirely on open source projects.

In this article, I’ll summarize configuration of each component and provide example code to get you started.

The main idea:

  1. You, the administrator, describe your vSphere desired state in a couple of Vester config files (text files in JSON format)
  2. You check those config files into source control
  3. Jenkins periodically runs Vester against your vSphere environments, using your config files, and collects the results
  4. You can view the test results in a handy report format, as shown here:
Vester Test Results – Shown in Jenkins Blue Ocean UI

Project links

Code

  • Example repository (Vester configs + Jenkinsfile): https://github.com/jeffgreenca/auto-vester-example

Components

  • Vester (built on Pester) and VMware PowerCLI
  • Jenkins with the Blue Ocean UI and the NUnit plugin
  • Docker and Docker Compose

Also, if you want to run Vester in a container, for now you’ll need my branch of Vester (PR pending): https://github.com/jeffgreenca/Vester/tree/fix/docker

If you are curious about my jeffgreenca/mspowercli Docker image, you can review the Dockerfile and assets here: https://github.com/jeffgreenca/mspowercli

Design overview

Here’s an architecture diagram of the solution:

Diagram walk-thru

  1. The administrator commits Vester config files to internal source control.
  2. The Jenkins pipeline is triggered by source control commits and by a time-based trigger, so it runs periodically.
  3. When the pipeline runs, Jenkins starts a Docker container with all the components necessary to run Vester, including the custom config files.
  4. Invoke-Vester is executed against each vCenter, in parallel for speed.  Read-only vCenter credentials stored in Jenkins are used to establish the connections.
  5. Jenkins saves the Vester test results using the NUnit plugin.
  6. The administrator reviews the test results via the Jenkins Blue Ocean UI.

About dockerizing Vester

If you’re just trying to replicate my results, you can skip reading this section, as I’ve already done this part of the work for you.

Initially, the simplest approach was to try to load Vester into the VMware PowerCLI core docker image.  This didn’t work, for a couple of reasons:

  • VMware’s PowerCLI core docker image didn’t really load the Pester module, the PowerShell testing framework that Vester is built on.
  • The Vester code base is pretty free with assuming “\” to be the path delimiter.  Enter Linux: what are these “/”s doing in my paths?!
  • One module, VMware.VumAutomation, is not (yet?) ported to PowerCLI core.

I solved each problem as follows:

Generate Vester configs

Use New-VesterConfig per the normal Vester usage to generate configuration files for each vCenter environment.  You’ll maintain these files in a specific path in the git repository.
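
For example (a sketch; check Get-Help New-VesterConfig for the parameters in your version):

# Connect to the target vCenter first, using PowerCLI
Connect-VIServer -Server vcenter-prod.example.com

# Walk through the prompts and write the config file into the
# repository path for this environment (folder name is illustrative)
New-VesterConfig -OutputFolder .\configs\prod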

Set up the git repository

Hopefully, you’ll already have some kind of git-based source control system.  A cloud-based solution could certainly work, but keep in mind you’ll be storing your Vester config files, so be sure it is sufficiently secured.  For self-hosted, I am currently using https://gogs.io and enjoying it.

Create a new repository, “auto-vester” or whatever you’d like to call it.  To save time, you can start by forking my example repository here: https://github.com/jeffgreenca/auto-vester-example

The repository will have this structure:
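
For example (the per-environment file names are illustrative):

auto-vester/
├── Jenkinsfile
└── configs/
    ├── dev/
    │   └── Config.json
    ├── staging/
    │   └── Config.json
    └── prod/
        └── Config.json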

The way I set up my environment, each vCenter has a Vester JSON config file under configs/<environment>/, where <environment> is dev, prod, staging, etc.

The naming is important, as the JSON config files are referenced in the Jenkinsfile pipeline definition.

Set up Jenkins Blue Ocean

Deploy Jenkins

If you don’t already have a Jenkins instance, it’s easy to stand up a simple system using the provided Docker image.

For this project, I used Docker Compose to stand up a Jenkins Blue Ocean instance.  For example, this docker-compose.yml file should work on any Linux server with the Docker CE engine and Docker Compose installed.  For production you’d want to protect your system behind HTTPS, but for a proof of concept this should be fine:
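
Something close to this (a minimal sketch; the image tag and volume name are assumptions):

version: "3"
services:
  jenkins:
    image: jenkinsci/blueocean
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - jenkins-data:/var/jenkins_home
volumes:
  jenkins-data: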

To start up the system:

$ docker-compose up -d

Attach to the container’s log to get the setup key required by the Jenkins web-based setup wizard, then browse to http://<host>:8080/ to complete setup.  To load the Blue Ocean interface, use http://<host>:8080/blue.
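
For example (the service name matches the compose sketch above):

$ docker-compose logs -f jenkins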

Install a plugin

Install the NUnit plugin as it is used to read the Vester test results.

Create credentials

Vester needs credentials to connect to vSphere.  Lucky for us, Jenkins can store these credentials encrypted and make them available for a pipeline.  Follow the instructions here: https://jenkins.io/doc/book/using/using-credentials/

You’ll need to create a username/password credential type, with the ID vcenter-vester-prod (or whatever you edit the Jenkins pipeline definition to reference).

Configure the Jenkins pipeline

I created a Jenkins Declarative Pipeline, which you can modify to your needs.  This is stored in the Jenkinsfile in the repository and tells Jenkins what to do when the pipeline runs.  It also includes a time-based trigger to run the pipeline periodically.

A couple of key notes (a condensed sketch of the pipeline follows this list):

  • Edit the environment { vServer = '' } assignment to your vCenter(s)!
  • All steps are run inside a Docker container using my image jeffgreenca/mspowercli, downloaded automatically from Docker Hub
  • In the “Download Vester” stage, I check out my dockerized version of the Vester project, which supports running on PowerCLI core.
  • The parallel {} block speeds up execution by running multiple Invoke-Vester tests at once
  • Environment variables are used to define the vCenter to connect to for each stage, which enables using the appropriate config file and saving the test results to a unique XML file
  • For readability, each vCenter is just a copy and paste of the “stage” block.
  • All of the test results are collected in a single step at the end of the pipeline.  Because Invoke-Vester exits cleanly even when tests fail, this doesn’t need to be in a post { always { } } block.
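
Here’s a condensed sketch of such a Jenkinsfile.  It is not the exact file from the example repository: the cron schedule, vCenter name, wrapper script, and nunit step syntax are illustrative assumptions.

pipeline {
    // Every step runs inside the PowerCLI container from Docker Hub
    agent { docker { image 'jeffgreenca/mspowercli' } }
    // Time-based trigger (schedule is illustrative)
    triggers { cron('H */6 * * *') }
    environment {
        // Read-only vCenter credential stored in Jenkins
        VC_CREDS = credentials('vcenter-vester-prod')
    }
    stages {
        stage('Download Vester') {
            steps {
                // Dockerized Vester branch that runs on PowerCLI core
                sh 'git clone --branch fix/docker https://github.com/jeffgreenca/Vester.git'
            }
        }
        stage('Run Vester') {
            parallel {
                stage('vcenter-prod') {
                    // Selects the config file and the unique results file name
                    environment { vServer = 'vcenter-prod.example.com' }
                    steps {
                        // Hypothetical wrapper around Connect-VIServer + Invoke-Vester
                        sh 'pwsh -File ./run-vester.ps1'
                    }
                }
                // Each additional vCenter is a copy-and-paste of the stage above
            }
        }
        stage('Collect results') {
            steps {
                // NUnit plugin step; exact syntax may vary by plugin version
                nunit testResultsPattern: 'vester-results-*.xml'
            }
        }
    }
}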

Add the pipeline to Jenkins Blue Ocean

In Jenkins Blue Ocean (http://<host>:8080/blue), click New Pipeline and enter the information to connect to your git repository.  You’ll need to configure at least read access to your repository for Jenkins.  See the Jenkins documentation for more information: https://jenkins.io/doc/tutorials/build-a-multibranch-pipeline-project/

You already have a Jenkinsfile in your repository, so the pipeline will be created automatically once you point to the repository.

From here on out, you can manually trigger the master branch pipeline to execute, or wait for it to happen automatically based on the cron trigger.

Review test results

At this point, you should have a more or less working setup.  Here are some screenshots of what you should see in Jenkins when you run your pipeline.

In-progress pipeline executing

Here you can see a pipeline running, with one Invoke-Vester stage still executing and before test results have been collected.

Jenkins Pipeline Execution In Progress

Examine test results

Once the pipeline completes, the Test tab will be populated with the test results.  One of the really helpful things about this system is the reporting on “new tests failing,” so you can determine whether something recently changed in your environment since the last execution of the pipeline, even when some tests were already failing.

After the pipeline runs, viewing the test results

View details about a specific failed test

Each failed test can be clicked to reveal the stack trace, which shows exactly what is misconfigured for that object.  I’ve blurred out the names of my syslog servers and the hostname of this ESXi host, but you can see that the syslog server configured in my Vester desired-state config file does not match the actual configuration on this particular host:

Viewing details about a particular failed test. “Desired” and “Actual” show the specific misconfigured values.

What’s next?

It’s up to you!  One option would be to add a pipeline stage that runs Vester with the -Remediate flag, to automatically remediate your environment.  This might be a good place to use the Jenkins input step, which pauses the pipeline and waits for user approval before continuing with the actual environment-changing remediation.
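
A sketch of what that stage might look like (the config path and PowerShell invocation are illustrative):

stage('Remediate') {
    steps {
        // Pause the pipeline until a human approves the change
        input message: 'Apply Vester remediation?'
        // -Remediate changes the environment to match the desired state
        sh 'pwsh -Command "Invoke-Vester -Config ./configs/prod/Config.json -Remediate"'
    }
}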

Thanks for reading, and I hope this helps you out!

Forwarding Centrify Event Logs to VMware Log Insight using REST API

If you use both Centrify and VMware Log Insight, and need to pull event logs from Centrify into Log Insight (in near real-time), this tool is for you!

This tool queries the Centrify API for your tenant, pulling the most recent entries in the Event table (logins, application launches, configuration changes, etc.).  Then it structures the event data into a syntax that Log Insight can import, preserving event-time accuracy and offering a series of structured fields to Log Insight to enable easy creation of dashboards, analytics queries, and so on.  All the good stuff that Log Insight offers.

I wrote this to be modular, and it should be relatively easy to adapt for other data sources and destinations.  So even if you are using only one of these two solutions, but trying to solve a similar problem, my code might be a starting point for you too.

Install this as a cron job by scheduling run.sh at a 5-minute interval on some server.  You place some configuration in a config.py file, and that’s it.  Very simple.
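
For example, a crontab entry like this (the install path is illustrative):

*/5 * * * * /opt/centrify-loginsight/run.sh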

Recurrent Neural Networks, Python, and the TV sitcom Frasier

Frasier, without the (recorded) laughs…

At a recent dinner party, some friends and I wondered: What would the TV show Frasier be without the audience laugh track? And further, could machine learning be used to automatically mute out the undesired audience audio?

Well actually, Frasier… with audio feature extraction visualized in the lower corner

The Code

So, I built a tool to automatically mute the sections of audience laughter.

The repository includes a Jupyter notebook with detailed explanation, example video clip with processed audio, a trained model, and the CLI tool to apply this to your own audio files.

I wrote this in Python and used several open source libraries, notably Keras for describing a multi-layer recurrent neural network, librosa for audio feature extraction, and (of course) numpy to manipulate the data.  It’s amazing what you can do in under 300 lines of code, thanks to the contributions of the open source community.

More detail

What makes this an interesting problem, in my opinion, is that the “laugh track” audio consists of highly variable, real studio audience recordings, so properly classifying the audio required a more complex machine learning approach: a model that considers time-series data, rather than a simple point-in-time classification.
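
As a rough sketch of that approach (this is not the code from the repository; the feature choices, window length, and layer sizes are illustrative):

import numpy as np
import librosa
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Extract MFCC features: one feature vector per audio frame
y, sr = librosa.load("episode.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T  # shape: (frames, 20)

# Slice frames into fixed-length windows so the network sees short time series
window = 50
X = np.array([mfcc[i:i + window] for i in range(0, len(mfcc) - window, window)])

# A small stacked recurrent network for binary classification:
# laughter vs. everything else
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(window, 20)),
    LSTM(32),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X, labels, epochs=10)  # labels: one 0/1 per window from annotated clips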

I used a number of fantastic tools and resources along the way – starting with Jupyter to rapidly prototype different approaches to the problem, and visualize the data as I went along. After I had built a solution I was happy with, I worked to convert it from a notebook, which is good for reference, exploration, and demonstration, to a standalone Python tool with a command line interface, which can be run on-demand to train or apply the model.

What’s next?

In theory, this tool could be trained on any two classes of audio, and then used to mute out one of those two classes. I don’t believe there is anything particularly special about identifying “laughs” vs other kinds of distinct audio!  All my code is released under MIT license, so you are welcome to do whatever you’d like with it.

Enjoy.
