VMware Validated Designs, Software Defined Data Center Version 4.1 (vRA 7.3, NSX 6.3.3)

I use these.  Generally speaking, they are really good.  It helps to have a team with mastery of the products; this is not quite an SDDC for dummies.

Glad to see the update includes the latest versions of the vRealize suite, NSX, and other core products.

VCAP7-CMA Design SME Certification Development Experience

A few months ago, I was invited to participate in VMware’s VCAP7-CMA Design exam development process.  What a great experience!

I’ll keep my comments brief to avoid any material that may be covered by NDA, but I would like to touch on one topic that seems to be a hot subject.  For the latest advanced design exams, VMware has announced it is removing the visual drag-and-drop whiteboard questions.  Some folks (I’m looking at you, Reddit thread) seem to be under the impression that VMware doesn’t want to bother with a complex testing tool.  While I am not privy to the business drivers for this decision, I can certainly relay that the messaging around the change during the workshop I participated in was 100% focused on test-taker feedback and grading challenges.

In my technical circles here in Sacramento, this is a fairly common topic of discussion, and I imagine it is elsewhere too.  The common complaint is the gap between real-world applied skill and certifications, and the desire to close that gap so certifications carry weight and “mean something.”

The consensus seems to be that for exams testing the skill of operating, troubleshooting, and configuring a system, the best test is an extended lab-based exam.  In this day of automation and cloud computing, I see no reason labs need to be expensive to administer or grade.  Provided the scenarios are complex enough, test dumps become a much smaller issue.  You also cut down on nonsense questions and close the gap between what I’ve heard called “book exams” and “real world exams,” the former rewarding memorization of every word of the documentation, the latter requiring a skilled administrator capable of working with the system in question.  At the end of the test, either you have configured the system to operate correctly, or you have not.  Simple!

Applying this idea to design exams is perhaps trickier.  If I could craft an ideal scenario, it would require every candidate to present a live whiteboard session to a small panel of experts.  I suppose in some ways, this describes the latter part of the VCDX process.  But it comes with high cost, a high degree of uniqueness per session, difficulty scaling, and so on.  Perhaps that is where the drag-and-drop canvas design idea came from: automate the whiteboard session.  Unfortunately, there are clearly problems in executing this approach.  One of them is that a live whiteboard session lets the panel ask interactive questions and get clarification, which a canvas cannot.  During a technical interview, this is a great way to see how the candidate thinks as well as how well they grasp the technology.

This is a long way of saying the canvas design questions are a great aspiration but have proved challenging in practice.  So, how do you write a design exam without canvas questions?  There are lots of options.  For example, you could present the test taker with a drawing of a design, perhaps with a flaw, and prompt for clarification regarding the design.  Or, you could use the old tried-and-true scenario-based questions.  There are other options too.  In my estimation, these approaches do not demand as much from the test taker as a blank whiteboard and marker, but they seem highly repeatable and fair, and they do not require the test taker to explain their logic before anyone can tell whether their drawing is actually valid.  They should be sufficient for the degree of knowledge tested in an advanced-level exam.  Again, we still have the expert-level certification to denote true mastery of the subject and products.

All in all, I was very impressed by the test development process.  There is an enormous amount of metrics, analysis, and process behind the scenes that, frankly, I had never had occasion to think about as a test taker, when the exam is a magical black box.  I have a high degree of confidence in the folks VMware has running the certification exam development program, and I expect to continue to see improvements in the program based on real-world data collected in the field.



vRealize Log Insight and vRA Embedded vIDM – Password Complexity

When configuring vRealize Log Insight 4.5 to use vRealize Automation 7.3 embedded VMware Identity Manager (per this blog post), I ran into an issue with password complexity.

I specified a tenant name and provided valid tenant administrator credentials to register with vIDM.  However, clicking Test Connection produced an error in the web interface, indicating either a bad username/password or an unknown response.  I resolved this by changing the local tenant administrator account password to use what I’d characterize as less “special” special characters, then running Test Connection again.  Success!

Investigating the log files on the vRealize Log Insight system, I found a useful entry in one of the log files:

  • /var/log/loginsight/ui_runtime.log
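If you would rather search that log than scroll through it, here is a minimal Python sketch. The log path comes from this post; the keyword list and the `find_log_entries` helper are hypothetical, since the post does not quote the entry it found.

```python
from pathlib import Path

# Log path from the post. The keywords are a hypothetical starting point,
# not the text of the actual entry.
LOG_PATH = Path("/var/log/loginsight/ui_runtime.log")
KEYWORDS = ("error", "exception", "vidm", "authentication")


def find_log_entries(path=LOG_PATH, keywords=KEYWORDS):
    """Return (line_number, line) pairs that contain any keyword, case-insensitively."""
    needles = [k.lower() for k in keywords]
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for number, line in find_log_entries():
        print(f"{number}: {line}")
```

Run it on the Log Insight appliance (or against a copied log) right after a failed Test Connection, so the newest matches are the relevant ones.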

For posterity, the steps to change a vRA 7.3 tenant local user account password are:

  • Log in to vRA as the default administrator
  • Navigate to Tenants -> Your Tenant Name -> Local Users
  • Click the local user account to manage
  • Click Edit
  • Change the password

So, this is the same issue we’ve seen in a few different places and products.  My recommendation is to always use passwords with a high degree of entropy, but in some cases you need to be careful with special characters that some products in the line may misinterpret.  Fun times.  Hope this helps.
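As a minimal sketch of that recommendation, the Python below generates a random password with the standard `secrets` module from letters, digits, and a deliberately small set of special characters. It also estimates entropy as length × log2(alphabet size). The `SAFE_SPECIALS` set is an assumption. This post does not identify which characters tripped up Log Insight, so adjust the set for the products you use.

```python
import math
import secrets
import string

# Hypothetical "conservative" special characters. This is an assumption,
# not a VMware-documented safe list.
SAFE_SPECIALS = "-_.+"
ALPHABET = string.ascii_letters + string.digits + SAFE_SPECIALS
REQUIRED_CLASSES = (string.ascii_lowercase, string.ascii_uppercase,
                    string.digits, SAFE_SPECIALS)


def generate_password(length=20):
    """Random password from ALPHABET using a CSPRNG.

    Candidates are regenerated until every character class appears at least
    once, so typical complexity policies are satisfied.
    """
    if length < len(REQUIRED_CLASSES):
        raise ValueError("length too short to include every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if all(any(ch in cls for ch in candidate) for cls in REQUIRED_CLASSES):
            return candidate


def entropy_bits(length, alphabet=ALPHABET):
    """Upper-bound entropy, in bits, of a uniformly random password."""
    return length * math.log2(len(alphabet))
```

With this 66-character alphabet, `entropy_bits(20)` is roughly 121 bits. Dropping a few risky symbols costs little entropy, and one or two extra characters of length make up the difference.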

The Hummingbird and Machine Learning

A flighty, feathered Anna’s hummingbird graced our back patio with her minuscule nest. I grabbed a Pi, a USB cam, and some duct tape, and started time-lapse recording.

Left: Real Photo. Right: Web Cam Image
I got to thinking about what I could do with all these pictures and thought, “Hey, maybe I could use some basic machine learning techniques to classify images!”  I decided to implement a simple neural network to label each time-lapse frame as “bird on nest” or “empty nest”, and to visualize the results.

A couple of ground rules: I wanted to build the neural network “myself” rather than find an off-the-shelf image analysis solution, as the goal was primarily personal learning and secondarily fun.  I decided simple was OK.  The camera is in a fixed position, and the major visual variations are not complex: bird state, background movement due to wind, lighting changes due to time of day, and the web cam’s IR filter that engages in low light.


Using the trained neural network to classify images as “on nest” or “off nest” worked pretty well.  Here is a short example from the test set results:

I classified several days of data, and then used that labelled data to generate a heat map showing time on nest.

Around the time the eggs hatched, you can see a significant shift in the percentage of daytime hours spent on the nest.  She’s much more active, I assume hunting for food.  The grey bars indicate no images were available for that time period.  The color scale indicates the percentage of time spent on the nest: darker = more time on nest.



I used a few core tools and resources for this project:

  • TensorFlow – Google’s open source AI engine
  • Keras – high-level front end to TensorFlow (or Theano), used to build the neural network
  • OpenCV – for dealing with image data
  • Pandas – for data analysis
  • Seaborn – for data visualization
  • Python 3.5 with numpy for all the matrix manipulation type stuff.
  • Stanford’s Machine Learning MOOC at Coursera, Stackexchange sites, and some helpful examples on Kaggle.


My process had three components: 1) train a neural network to classify on nest / off nest, 2) use the learned model to classify all 170,000 or so samples, and 3) do some data visualization on the results.

Data Wrangling?

To get my image data into a reasonable format, I took the following steps using OpenCV (cv2) and numpy:

  • Loaded images in grayscale from my NAS
  • Extracted a fixed Region of Interest, the area with the nest
  • Normalized pixel intensities between 0.0 and 1.0.
  • Reshaped the image data into a vector

The result was a 34,000-element vector for each source image (a 170×200 pixel region of interest).
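Here is a Python sketch of the steps above. The ROI offsets are hypothetical placeholders, because only the 170×200 size appears in this post. The sketch also assumes 170 rows by 200 columns and treats normalization as dividing 8-bit values by 255.

```python
import numpy as np

# Hypothetical ROI placement. Only the 170x200 size is from the post.
ROI_TOP, ROI_LEFT = 100, 150
ROI_HEIGHT, ROI_WIDTH = 170, 200


def preprocess(gray_image):
    """Crop the nest ROI, scale pixels to 0.0..1.0, and unroll into a 1-D vector."""
    roi = gray_image[ROI_TOP:ROI_TOP + ROI_HEIGHT, ROI_LEFT:ROI_LEFT + ROI_WIDTH]
    if roi.shape != (ROI_HEIGHT, ROI_WIDTH):
        raise ValueError(f"image too small for ROI, got crop {roi.shape}")
    normalized = roi.astype(np.float32) / 255.0  # 8-bit grayscale -> 0.0..1.0
    return normalized.reshape(-1)  # 170 * 200 = 34,000 elements


def load_and_preprocess(path):
    """Read an image from disk as grayscale with OpenCV, then preprocess it."""
    import cv2  # lazy import, so preprocess() works without OpenCV installed

    image = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if image is None:  # cv2.imread returns None rather than raising
        raise IOError(f"could not read {path}")
    return preprocess(image)
```

Because the camera is fixed, a constant crop is enough. A moving camera would need the ROI to be located per frame.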

I manually classified 1,987 images into two folders, 0-nobird and 1-bird.  This was actually not too time-consuming, I swear.
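Here is a minimal sketch of turning those two folders into labeled samples. It assumes both folders sit under one parent directory and hold .jpg/.jpeg/.png files, which are my assumptions. The label comes from the folder name and is one-hot encoded to match a 2-element softmax output.

```python
from pathlib import Path

# Folder names are from the post. The file extensions are an assumption.
LABEL_DIRS = {"0-nobird": 0, "1-bird": 1}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_labeled_images(root):
    """Return a sorted list of (image_path, label) pairs from the label folders."""
    root = Path(root)
    samples = []
    for folder, label in LABEL_DIRS.items():
        for path in (root / folder).iterdir():
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    # A deterministic order keeps any later seeded shuffle reproducible.
    return sorted(samples)


def one_hot(label, num_classes=2):
    """0 -> [1, 0] (no bird), 1 -> [0, 1] (bird)."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec
```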

Defining the Neural Network

Using Keras (♥♥♥), I defined a densely connected neural network with 84 input nodes and 84 hidden layers.  The output is a 2-element vector using softmax.  I used Parametric Rectified Linear Unit (PReLU) activations, as they gave me the best results against my cross validation set compared to the other activations I tried.  I also tuned the learning rate and regularization value against my cross validation set until I found values that gave me good results.
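Here is a minimal Keras sketch of one reading of that architecture. I interpret “84 input nodes and 84 hidden layers” as a first dense layer of 84 units followed by one hidden dense layer of 84 units. Both use PReLU activations and L2 regularization, and a 2-unit softmax output follows. The learning rate, the L2 penalty, and the Adam optimizer are assumptions, since the tuned values aren’t given here.

```python
from tensorflow import keras

INPUT_SIZE = 34_000  # unrolled 170 x 200 ROI
# Hypothetical hyperparameters. The post tuned these but does not list them.
LEARNING_RATE = 1e-3
L2_PENALTY = 1e-4


def build_model(units=84, input_size=INPUT_SIZE,
                learning_rate=LEARNING_RATE, l2_penalty=L2_PENALTY):
    """Dense network with PReLU activations and a 2-way softmax output."""
    reg = keras.regularizers.l2(l2_penalty)
    model = keras.Sequential([
        keras.Input(shape=(input_size,)),
        keras.layers.Dense(units, kernel_regularizer=reg),
        keras.layers.PReLU(),  # learnable slope for negative inputs
        keras.layers.Dense(units, kernel_regularizer=reg),
        keras.layers.PReLU(),
        keras.layers.Dense(2, activation="softmax"),  # [no bird, bird]
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Training for 50 epochs would then look like `model.fit(X_train, y_train, epochs=50, validation_data=(X_cv, y_cv))`, with each `y` one-hot encoded.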

Training the Neural Network

I split my manually classified data set into three parts (60% for training, 20% for cross validation, and 20% for testing), using a random shuffle with a fixed seed so each execution would produce consistent results.  I trained my network on the training set for 50 epochs, evaluated the model against the cross validation set, and tweaked the model while experimenting with various configurations.  Finally, I confirmed that my selected model performed well against the test set.
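Here is one way to do that split, sketched with numpy’s seeded random generator. The seed value 42 and the function name are assumptions. With 1,987 labeled images, the integer cutoffs give 1,192 training, 397 cross validation, and 398 test samples.

```python
import numpy as np


def split_60_20_20(X, y, seed=42):
    """Shuffle with a fixed seed, then split 60% train / 20% CV / 20% test.

    The same seed always yields the same permutation, so every run sees
    identical splits. X and y must be numpy arrays with the same length.
    """
    n = len(X)
    order = np.random.default_rng(seed).permutation(n)
    X, y = X[order], y[order]  # shuffle features and labels together
    train_end = int(n * 0.6)
    cv_end = int(n * 0.8)
    return ((X[:train_end], y[:train_end]),
            (X[train_end:cv_end], y[train_end:cv_end]),
            (X[cv_end:], y[cv_end:]))
```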

Once I had something I liked, I saved the trained model:
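Here is a minimal sketch of saving and later reloading a Keras model as a single file. The file name is hypothetical. The `.keras` format assumes a reasonably recent TensorFlow/Keras rather than the versions used at the time.

```python
from tensorflow import keras

MODEL_PATH = "hummingbird_nest_model.keras"  # hypothetical file name


def save_model(model, path=MODEL_PATH):
    """Persist architecture, weights, and optimizer state to one file."""
    model.save(path)


def load_model(path=MODEL_PATH):
    """Reload the trained model, e.g. in a separate classification script."""
    return keras.models.load_model(path)
```

Keeping training and classification in separate scripts means the slow training step only runs once.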

Using Visualization to Check the Results

I set up a routine to convert my processed 34,000-element vectors back into grayscale images and display them on a grid.  Then, I set up a routine to randomly select a handful of positive and negative examples to show on the screen.  Here’s one of the more interesting sets, with a couple of false negatives where the web cam glitched and offset the frame, but a human can still tell there is a hummingbird sitting on the nest.

This simple neural network effectively says, “Hey, the inputs that matter in this frame don’t match my learned weights for [ 0 1 ] (bird on nest), and they fit [ 1 0 ] (no bird on nest) slightly better.”  Locating the bird regardless of where it sits in the frame would require more advanced image recognition techniques, perhaps feature-based ones.
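Here is a minimal sketch of that visualization step with numpy and matplotlib. It assumes the 170×200 ROI is 170 rows high and 200 columns wide, and that the labels are 0/1 classifications. `vectors_to_grid` and `show_random_examples` are hypothetical names.

```python
import numpy as np

ROI_SHAPE = (170, 200)  # (height, width): assumed orientation of the ROI


def vectors_to_grid(vectors, columns=4, shape=ROI_SHAPE):
    """Reshape unrolled vectors back into images and tile them into one array.

    Unused slots in the last row stay black (0.0).
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    h, w = shape
    rows = -(-len(vectors) // columns)  # ceiling division
    grid = np.zeros((rows * h, columns * w), dtype=np.float32)
    for i, vec in enumerate(vectors):
        r, c = divmod(i, columns)
        grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = vec.reshape(h, w)
    return grid


def show_random_examples(vectors, labels, count=8, seed=None):
    """Show a random sample of frames labeled 1 (bird) and 0 (empty nest)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for display

    rng = np.random.default_rng(seed)
    vectors, labels = np.asarray(vectors), np.asarray(labels)
    for label, title in [(1, "bird on nest"), (0, "empty nest")]:
        idx = np.flatnonzero(labels == label)
        if len(idx) == 0:
            continue
        picked = rng.choice(idx, size=min(count, len(idx)), replace=False)
        plt.figure()
        plt.title(title)
        plt.imshow(vectors_to_grid(vectors[picked]), cmap="gray", vmin=0.0, vmax=1.0)
        plt.axis("off")
    plt.show()
```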


My second script was responsible for loading the already trained model and using it to classify the 130k or so images I had collected.  This was simply a matter of parsing the files from my NAS shared drive, loading each image using my previously described image processing method, and running the model against it to return classifications.  To make this faster, I built batches of 500 images, unrolling each image into a 34,000-element vector and combining those into 500×34000 matrices (one row per image), which were handed to the Keras model for predictions.

Note – the “Faster” in my function name came from learning that using np.vstack to append to matrices is apparently very, very slow compared to Python’s list append() method.
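Here is a sketch of that batching approach. Vectors are collected in a plain Python list and stacked once per batch of 500. The predicted class is the argmax of the softmax output. `classify_paths` and `load_vector` are hypothetical names, and `model` is anything with a Keras-style `predict()` method. This is an illustration, not the original script.

```python
import numpy as np

BATCH_SIZE = 500  # batch size from the post


def _predict_batch(model, batch_paths, batch_vectors):
    X = np.stack(batch_vectors)  # shape (batch, 34000): one row per image
    probabilities = model.predict(X)
    labels = np.argmax(probabilities, axis=1)  # 0 = no bird, 1 = bird
    return zip(batch_paths, labels.tolist())


def classify_paths(paths, model, load_vector, batch_size=BATCH_SIZE):
    """Yield (path, predicted_label) for every image, predicting in batches.

    `load_vector` turns one path into a 34,000-element vector. Appending to a
    list and stacking once per batch avoids the repeated copying that calling
    np.vstack on every image causes.
    """
    batch_paths, batch_vectors = [], []
    for path in paths:
        batch_paths.append(path)
        batch_vectors.append(load_vector(path))
        if len(batch_vectors) == batch_size:
            yield from _predict_batch(model, batch_paths, batch_vectors)
            batch_paths, batch_vectors = [], []
    if batch_vectors:  # final partial batch
        yield from _predict_batch(model, batch_paths, batch_vectors)
```

Because it is a generator, results can be written out as they arrive instead of holding all 130k predictions in memory.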

I saved the results in a CSV file that included the full image path (timestamp embedded in the filename) and the classification result.
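Here is a sketch of that CSV step using Python’s standard `csv` module. The `path` and `label` column names are hypothetical.

```python
import csv


def write_results(rows, csv_path):
    """Write (image_path, label) pairs to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as f:  # newline="" as the csv docs advise
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        writer.writerows((str(path), int(label)) for path, label in rows)
```

Keeping the full path in the CSV preserves the embedded timestamp for later analysis.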

Exploring the Results

Then, I learned the basics of pandas and seaborn so that I could visualize the data.  I won’t include my code, because it is probably not a very good example.

Eventually I got to some mess like this:

And that seemed to work well enough for this little fun project.
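For anyone curious, here is a sketch of one way to get from a results CSV to a day-by-hour heat map with pandas and seaborn. It is not my original code. It assumes a hypothetical `YYYYMMDD-HHMMSS` timestamp in each filename and `path`/`label` columns, while the post only says the timestamp is embedded in the filename.

```python
import pandas as pd

# Hypothetical filename timestamp format, for illustration only.
TIMESTAMP_PATTERN = r"(\d{8}-\d{6})"


def on_nest_by_day_and_hour(results):
    """Pivot per-image labels into % of frames on the nest, per date and hour.

    `results` has "path" and "label" columns (1 = bird). Hours with no
    images come out as NaN.
    """
    stamps = results["path"].str.extract(TIMESTAMP_PATTERN, expand=False)
    when = pd.to_datetime(stamps, format="%Y%m%d-%H%M%S")
    frame = pd.DataFrame({
        "date": when.dt.date,
        "hour": when.dt.hour,
        "on_nest_pct": results["label"] * 100.0,
    })
    return frame.pivot_table(index="date", columns="hour",
                             values="on_nest_pct", aggfunc="mean")


def plot_heatmap(table):
    """Draw the table with seaborn; darker cells mean more time on the nest."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # NaN cells (no images) are masked, so they show the axes background.
    ax = sns.heatmap(table, cmap="Greys", vmin=0, vmax=100,
                     cbar_kws={"label": "% of frames on nest"})
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Date")
    plt.tight_layout()
    plt.show()
```

Usage would be along the lines of `plot_heatmap(on_nest_by_day_and_hour(pd.read_csv("results.csv")))`, where the file name is a placeholder.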



vSphere 6.5 “Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster”

EDIT – Today (7/5), VMware GSS confirmed this is a bug that will be addressed in the next patch release.


Setting up a simple vSphere 6.5 (vCenter build 5178943) environment using percentage-based HA, we have been seeing the following Configuration Issue flag on the cluster (note, it is NOT an alarm!):

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster


Alarms have Acknowledge/Reset options; this is a Configuration Issue message instead.

Possible Solutions

Doing some research I’ve collected the following things people said resolved this:

  • Turn Proactive HA on or off, then disable HA on the cluster and re-enable it
  • Verify no firewall is interfering with HA
  • Use various methods to reset FDM on the hosts: disconnect/reconnect, move into another cluster, and so on
  • Set the Admission Control (AC) percentage to a different, low value (10% CPU/Memory)

In our case, for 3 clusters across 3 vCenters, none of those options worked.  Instead:

  • Disable “Override calculated failover capacity.”

That worked as a workaround.  Unfortunately, setting ANY value (1%, 10%, 33%, 100%) for percentage-based AC causes the configuration issue.  When I disable Override, the calculated value is 33%, as expected.  So the problem comes from manually specifying a value, not from the particular value specified.
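The expected 33% follows from simple arithmetic. Tolerating one host failure in a 3-node cluster of equally sized hosts means reserving one host’s worth of capacity, which is 1/3 of the cluster. The sketch below is a simplification that assumes identical hosts. vSphere derives its calculated value from the hosts’ actual resources, and rounding down is my own choice for display.

```python
def calculated_failover_pct(num_hosts, host_failures_tolerated=1):
    """Approximate reserved capacity (%) for identical hosts.

    Reserving the capacity of F hosts out of N reserves F/N of the cluster.
    Integer division rounds the percentage down.
    """
    if not 0 < host_failures_tolerated < num_hosts:
        raise ValueError("need 0 < failures tolerated < number of hosts")
    return 100 * host_failures_tolerated // num_hosts
```

For the 3-node clusters here, `calculated_failover_pct(3)` returns 33, which matches the value shown once Override is disabled.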

Working Configuration, for a 3-node cluster:

This throws a configuration issue:

Hope that helps.