January 22, 2020


Community Blog Round Up 20 January 2020

We’re super chuffed to see another THREE posts from our illustrious community – Adam Young talks about an api_port failure and self service speed bumps, while Lars explores literate programming.

Shift on Stack: api_port failure by Adam Young

I finally got a right-sized flavor for an OpenShift deployment: 25 GB Disk, 4 VCPU, 16 GB Ram. With that, I tore down the old cluster and tried to redeploy. Right now, the deploy is failing at the stage of the controller nodes querying the API port. What is going on?

Read more at https://adam.younglogic.com/2020/01/shift-on-stack-api_port-failure/

Self Service Speedbumps by Adam Young

The OpenShift installer is fairly specific in what it requires, and will not install into a virtual machine that does not have sufficient resources. These limits are 16 GB RAM, 4 Virtual CPUs, and 25 GB Disk Space. This is fairly frustrating if your cloud provider does not give you a flavor that matches this. The last item specifically is an artificial limitation as you can always create an additional disk and mount it, but the installer does not know to do that.

Read more at https://adam.younglogic.com/2020/01/self-service-speedbumps/

Snarl: A tool for literate blogging by Lars Kellogg-Stedman

Literate programming is a programming paradigm introduced by Donald Knuth in which a program is combined with its documentation to form a single document. Tools are then used to extract the documentation for viewing or typesetting or to extract the program code so it can be compiled and/or run. While I have never been very enthusiastic about literate programming as a development methodology, I was recently inspired to explore these ideas as they relate to the sort of technical writing I do for this blog.

Read more at https://blog.oddbit.com/post/2020-01-15-snarl-a-tool-for-literate-blog/

by Rain Leander at January 22, 2020 09:09 PM

January 20, 2020

Emilien Macchi

Developer workflow with TripleO

In this post we’ll see how one can use TripleO for developing and testing changes to OpenStack Python-based projects (e.g. Keystone).


Even if Devstack remains a popular tool, it is not the only one you can use for your development workflow.

TripleO wasn’t built only for real-world deployments; it also serves developers working on OpenStack-related projects, like Keystone for example.

Let’s say, my Keystone directory where I’m writing code is in /home/emilien/git/openstack/keystone.

Now I want to deploy TripleO with that change and my code in Keystone. For that I will need a server (it can be a VM) with at least 8 GB of RAM, 4 vCPUs, 80 GB of disk, 2 NICs, and CentOS 7 or Fedora 28 installed.

Prepare the repositories and install python-tripleoclient:

If you’re deploying on recent Fedora or RHEL8, you’ll need to install python3-tripleoclient.

Now, let’s prepare your environment and deploy TripleO:

Note: change the YAML for your own needs if needed. If you need more help on how to configure Standalone, please check out the official manual.

Now let’s say your code needs a change and you need to retest it. Once you modified your code, just run:

Now, if you need to test a review that is already pushed in Gerrit and you want to run a fresh deployment with it, you can do it with:

I hope these tips helped you to understand how you can develop and test any OpenStack Python-based project without pain, and pretty quickly. On my environment, the whole deployment takes less than 20 minutes.

Please give any feedback in comment or via email!

by Emilien at January 20, 2020 02:32 PM

January 19, 2020

Adam Young

Shift on Stack: api_port failure

I finally got a right-sized flavor for an OpenShift deployment: 25 GB Disk, 4 VCPU, 16 GB Ram. With that, I tore down the old cluster and tried to redeploy. Right now, the deploy is failing at the stage of the controller nodes querying the API port. What is going on?

Here is the reported error on the console:

The IP address is attached to the following port:

$ openstack port list | grep "0.0.5"
| da4e74b5-7ab0-4961-a09f-8d3492c441d4 | demo-2tlt4-api-port       | fa:16:3e:b6:ed:f8 | ip_address='', subnet_id='50a5dc8e-bc79-421b-aa53-31ddcb5cf694'      | DOWN   |

That final “DOWN” is the port state. It is also showing as detached. It is on the internal network:

Looking at the installer code, the one place I can find a reference to the api_port is in the template data/data/openstack/topology/private-network.tf, used to build the value openstack_networking_port_v2. This value is used quite heavily in the rest of the installer’s Go code.

Looking in the terraform data built by the installer, I can find references to both the api_port and openstack_networking_port_v2. Specifically, there are several objects of type openstack_networking_port_v2 with the following names:

$ cat moc/terraform.tfstate  | jq -jr '.resources[] | select( .type == "openstack_networking_port_v2") | .name, ", ", .module, "\n" '
api_port, module.topology
bootstrap_port, module.bootstrap
ingress_port, module.topology
masters, module.topology
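The same filter can be expressed without jq. A small Python sketch of the equivalent query (the tfstate snippet below is invented for illustration and contains only the fields the query touches):

```python
import json

# Minimal stand-in for terraform.tfstate: only the fields the jq query
# reads; the entries mirror the names reported above.
tfstate = json.loads("""
{
  "resources": [
    {"type": "openstack_networking_port_v2", "name": "api_port", "module": "module.topology"},
    {"type": "openstack_networking_port_v2", "name": "bootstrap_port", "module": "module.bootstrap"},
    {"type": "openstack_networking_subnet_v2", "name": "nodes", "module": "module.topology"}
  ]
}
""")

# Equivalent of:
#   jq '.resources[] | select(.type == "openstack_networking_port_v2") | .name, ", ", .module'
ports = [(r["name"], r["module"])
         for r in tfstate["resources"]
         if r["type"] == "openstack_networking_port_v2"]
for name, module in ports:
    print(f"{name}, {module}")
```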

On a baremetal install, we need an explicit A record for api-int.<cluster_name>.<base_domain>. That requirement does not exist for OpenStack, however, and I did not have one the last time I installed.

api-int is the internal access to the API server. Since the controllers are hanging trying to talk to it, I assume that we are still at the stage where we are building the control plane, and that it should be pointing at the bootstrap server. However, since the port above is detached, traffic cannot get there. There are a few hypotheses in my head right now:

  1. The port should be attached to the bootstrap device
  2. The port should be attached to a load balancer
  3. The port should be attached to something that is acting like a load balancer.

I’m leaning toward 3 right now.

The install-config.yaml has the line:
octaviaSupport: "1"

But I don’t think any Octavia resources are being used.

$ openstack loadbalancer pool list

$ openstack loadbalancer list

$ openstack loadbalancer flavor list
Not Found (HTTP 404) (Request-ID: req-fcf2709a-c792-42f7-b711-826e8bfa1b11)

by Adam Young at January 19, 2020 12:55 AM

January 17, 2020


Best of 2019 Blogs, Part 4: Announcing Docker Enterprise 3.0 General Availability

Today, we’re excited to announce the general availability of Docker Enterprise 3.0 – the only desktop-to-cloud enterprise container platform enabling organizations to build and share any application and securely run them anywhere - from hybrid cloud to the edge.

by Docker Enterprise Team at January 17, 2020 04:00 PM

January 16, 2020

Fleio Blog

Fleio 2020.01: Operations user interface, improvements to the ticketing system, bug fixes and more

Fleio version 2020.01 is now available! The latest version was published today, 2020-01-16. Operations user interface: with the 2020.01 release we have added a new user interface feature, Operations. We decided to add this feature in order to improve the way that some tasks were being done and to make all the preparations to move […]

by Marian Chelmus at January 16, 2020 01:02 PM

January 15, 2020

Adam Young

Self Service Speedbumps

The OpenShift installer is fairly specific in what it requires, and will not install into a virtual machine that does not have sufficient resources. These limits are:

  • 16 GB RAM
  • 4 Virtual CPUs
  • 25 GB Disk Space

This is fairly frustrating if your cloud provider does not give you a flavor that matches this. The last item specifically is an artificial limitation as you can always create an additional disk and mount it, but the installer does not know to do that.

In my case, there is a flavor that almost matches; it has 10 GB of Disk space instead of the required 25. But I cannot use it.
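The mismatch can be stated mechanically. A small illustrative sketch (the minimums are the installer limits cited above; the helper function is hypothetical, not part of any OpenStack tooling):

```python
# OpenShift installer minimums cited in the post.
MIN_RAM_GB, MIN_VCPUS, MIN_DISK_GB = 16, 4, 25

def flavor_fits(ram_gb, vcpus, disk_gb):
    """Return the list of requirements a flavor fails to meet."""
    failures = []
    if ram_gb < MIN_RAM_GB:
        failures.append("ram")
    if vcpus < MIN_VCPUS:
        failures.append("vcpus")
    if disk_gb < MIN_DISK_GB:
        failures.append("disk")
    return failures

# The "almost matches" flavor: right RAM and CPU, only 10 GB of disk.
print(flavor_fits(ram_gb=16, vcpus=4, disk_gb=10))  # -> ['disk']
```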

Instead, I have to use a larger flavor that has double the VCPUs, and thus eats up more of my VCPU quota, to the point that I cannot afford more than 4 virtual machines of this size. Thus I cannot create more than one compute node; OpenShift needs 3 nodes for the control plane.

I do not have permissions to create a flavor on this cloud. Thus, my only option is to open a ticket. Which has to be reviewed and acted upon by an administrator. Not a huge deal.

This is how self service breaks down: a non-security decision (linking disk size to the other characteristics of a flavor) plus access control rules that prevent end users from customizing. So the end user waits for a human to respond.

In my case, that means that I have to provide an alternative place to host my demonstration, just in case things don’t happen in time. Which costs my organization money.

This is not a ding on my cloud provider. They have the same OpenStack API as anyone else deploying OpenStack.

This is not a ding on Keystone; create flavor is not a project scoped operation, so I can’t even blame my favorite bug.

This is not a ding on the Nova API. It is reasonable to reserve the ability to create Flavors to system administrators and, if instances have storage attached, to provide it in reasonably sized chunks.

My problem just falls at the junction of several different zones of responsibility. It is the overlap that causes the pain in this case. This is not unusual.

Would it be possible to have a more granular API, like “create customer flavor” that built a flavor out of pre-canned parts and sizes? Probably. That would solve my problem. I don’t know if this is a general problem, though.

This does seem like something that could be addressed by a GitOps type approach. In order to perform an operation like this, I should be able to issue a command that gets checked in to git, confirmed, and posted for code review. An administrator could then confirm or provide an alternative approach. Today this happens in the ticketing system. It is human-resource-intensive. If no one says “yes,” the default is no, and the thing just sits there.

What would be a better long term solution? I don’t know. I’m going to let this idea set for a while.

What do you think?

by Adam Young at January 15, 2020 05:18 PM

January 10, 2020

Dan Prince

Keystone Operator Deploy/Upgrade on OpenShift

Keystone deploy and upgrade with an OpenShift/Kubernetes Operator

by Dan Prince at January 10, 2020 09:05 PM

January 06, 2020


Community Blog Round Up 06 January 2020

Welcome to the new DECADE! It was super awesome to run the blog script and see not one, not two, but THREE new articles by the amazing Adam Young who tinkered with Keystone, TripleO, and containers over the break. And while Lars only wrote one article, it’s the ultimate guide to the Open Virtual Network within OpenStack. Sit back, relax, and inhale four great articles from the RDO Community.

Running the TripleO Keystone Container in OpenShift by Adam Young

Now that I can run the TripleO version of Keystone via podman, I want to try running it in OpenShift.

Read more at https://adam.younglogic.com/2019/12/running-the-tripleo-keystone-container-in-openshift/

Official TripleO Keystone Images by Adam Young

My recent forays into running containerized Keystone images have been based on a Centos base image with RPMs installed on top of it. But TripleO does not run this way; it runs via containers. Some notes as I look into them.

Read more at https://adam.younglogic.com/2019/12/official-tripleo-keystone-images/

OVN and DHCP: A minimal example by Lars Kellogg-Stedman

Introduction A long time ago, I wrote an article all about OpenStack Neutron (which at that time was called Quantum). That served as an excellent reference for a number of years, but if you’ve deployed a recent version of OpenStack you may have noticed that the network architecture looks completely different. The network namespaces previously used to implement routers and dhcp servers are gone (along with iptables rules and other features), and have been replaced by OVN (“Open Virtual Network”).

Read more at https://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/

keystone-db-init in OpenShift by Adam Young

Before I can run Keystone in a container, I need to initialize the database. This is as true for running in Kubernetes as it was using podman. Here’s how I got keystone-db-init to work.

Read more at https://adam.younglogic.com/2019/12/keystone-db-init-in-openshift/

by Rain Leander at January 06, 2020 12:52 PM

December 30, 2019

Slawek Kaplonski

Analyzing number of failed builds per patch in OpenStack projects

Short background For the past few years I have been an OpenStack Neutron contributor, core reviewer and now even the project’s PTL. One of my responsibilities in the community is taking care of our CI system for Neutron. As part of this job I have to constantly check how various CI jobs are working and if the reasons for their failure are related to the patch or the CI itself.

December 30, 2019 10:56 PM

December 21, 2019

Adam Young

Running the TripleO Keystone Container in OpenShift

Now that I can run the TripleO version of Keystone via podman, I want to try running it in OpenShift.

Here is my first hack at a deployment yaml. Note that it looks really similar to the keystone-db-init I got to run the other day.

If I run it with:

oc create -f keystone-pod.yaml

I get a CrashLoopBackoff error, with the following from the logs:

$ oc logs pod/keystone-api
+ sudo -E kolla_set_configs
sudo: unable to send audit message: Operation not permitted
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 412, in main
    config = load_config()
  File "/usr/local/bin/kolla_set_configs", line 294, in load_config
    config = load_from_file()
  File "/usr/local/bin/kolla_set_configs", line 282, in load_from_file
    with open(config_file) as f:
IOError: [Errno 2] No such file or directory: '/var/lib/kolla/config_files/config.json'

I modified the config.json to remove steps that were messing me up. I think I can now remove even that last config file, but I left it for now.

{
    "command": "/usr/sbin/httpd",
    "config_files": [
        {
            "source": "/var/lib/kolla/config_files/src/*",
            "dest": "/",
            "merge": true,
            "preserve_properties": true
        }
    ],
    "permissions": [
        {
            "path": "/var/log/kolla/keystone",
            "owner": "keystone:keystone",
            "recurse": true
        }
    ]
}

I need to add the additional files to a config map and mount those inside the container. For example, I can create a config map with the config.json file, a secret for the Fernet key, and a config map for the apache files.

oc create configmap keystone-files --from-file=config.json=./config.json
kubectl create secret generic keystone-fernet-key --from-file=../kolla/src/etc/keystone/fernet-keys/0
oc create configmap keystone-httpd-files --from-file=wsgi-keystone.conf=../kolla/src/etc/httpd/conf.d/wsgi-keystone.conf

Here is my final pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: keystone-api
  labels:
    app: myapp
spec:
  containers:
  - image: docker.io/tripleomaster/centos-binary-keystone:current-tripleo
    imagePullPolicy: Always
    name: keystone
    env:
    - name: KOLLA_CONFIG_FILE
      value: "/var/lib/kolla/config_files/src/config.json"
    - name: KOLLA_CONFIG_STRATEGY
      value: "COPY_ONCE"
    volumeMounts:
    - name: keystone-conf
      mountPath: "/etc/keystone/"
    - name: httpd-config
      mountPath: "/etc/httpd/conf.d"
    - name: config-json
      mountPath: "/var/lib/kolla/config_files/src"
    - name: keystone-fernet-key
      mountPath: "/etc/keystone/fernet-keys/0"
  volumes:
  - name: keystone-conf
    secret:
      secretName: keystone-conf
      items:
      - key: keystone.conf
        path: keystone.conf
        mode: 511
  - name: keystone-fernet-key
    secret:
      secretName: keystone-fernet-key
      items:
      - key: "0"
        path: "0"
        mode: 511
  - name: config-json
    configMap:
      name: keystone-files
  - name: httpd-config
    configMap:
      name: keystone-httpd-files

And show that it works for basic stuff:

$ oc rsh keystone-api
sh-4.2# curl
{"versions": {"values": [{"status": "stable", "updated": "2019-07-19T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.13", "links": [{"href": "", "rel": "self"}]}]}}curl (HTTP:// response: 300, time: 3.314, size: 266
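That version document can also be checked programmatically. A small Python sketch using an abridged copy of the JSON above:

```python
import json

# Abridged version document, as returned by Keystone's root endpoint above.
doc = json.loads('{"versions": {"values": [{"status": "stable", "id": "v3.13"}]}}')

# Collect the ids of all versions marked stable.
stable = [v["id"] for v in doc["versions"]["values"] if v["status"] == "stable"]
print(stable)  # -> ['v3.13']
```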

Next steps: expose a route, make sure we can get a token.

by Adam Young at December 21, 2019 12:31 AM

December 19, 2019

Adam Young

Official TripleO Keystone Images

My recent forays into running containerized Keystone images have been based on a Centos base image with RPMs installed on top of it. But TripleO does not run this way; it runs via containers. Some notes as I look into them.

The official containers for TripleO are currently hosted on docker.com. The Keystone page is here:

Don’t expect the docker pull command posted on that page to work. I tried a comparable one with podman and got:

$ podman pull tripleomaster/centos-binary-keystone
Trying to pull docker.io/tripleomaster/centos-binary-keystone...
  manifest unknown: manifest unknown
Trying to pull registry.fedoraproject.org/tripleomaster/centos-binary-keystone...

And a few more lines of error output. Thanks to Emilien M, I was able to get the right command:

$ podman pull tripleomaster/centos-binary-keystone:current-tripleo
Trying to pull docker.io/tripleomaster/centos-binary-keystone:current-tripleo...
Getting image source signatures
Copying config 9e85172eba done
Writing manifest to image destination
Storing signatures

Since I did this as a normal account, and not as root, the image does not get stored under /var, but instead goes somewhere under $HOME/.local. If I type

$ podman images
REPOSITORY                                       TAG               IMAGE ID       CREATED        SIZE
docker.io/tripleomaster/centos-binary-keystone   current-tripleo   9e85172eba10   2 days ago     904 MB

I can see the short form of the hash starting with 9e85. I copy that to match the subdirectory under /home/ayoung/.local/share/containers/storage/overlay-images:

ls /home/ayoung/.local/share/containers/storage/overlay-images/9e85172eba10a2648ae7235076ada77b095ed3da05484916381410135cc8884c/

If I cat the manifest file in that directory, I can see all of the layers that make up the image itself.

Trying a naive podman run docker.io/tripleomaster/centos-binary-keystone:current-tripleo, I get an error that shows just how kolla-centric this image is:

$ podman run docker.io/tripleomaster/centos-binary-keystone:current-tripleo
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 412, in main
    config = load_config()
  File "/usr/local/bin/kolla_set_configs", line 294, in load_config
    config = load_from_file()
  File "/usr/local/bin/kolla_set_configs", line 282, in load_from_file
    with open(config_file) as f:
IOError: [Errno 2] No such file or directory: '/var/lib/kolla/config_files/config.json'
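The failure follows from the lookup order in kolla_set_configs: inline JSON in the KOLLA_CONFIG environment variable wins; otherwise it reads the file named by KOLLA_CONFIG_FILE, defaulting to /var/lib/kolla/config_files/config.json, which does not exist here. A simplified sketch of that logic (not the actual kolla code):

```python
import json
import os

DEFAULT_CONFIG_FILE = "/var/lib/kolla/config_files/config.json"

def load_config(environ=os.environ):
    """Simplified sketch of kolla_set_configs' config lookup order."""
    if "KOLLA_CONFIG" in environ:          # inline JSON takes precedence
        return json.loads(environ["KOLLA_CONFIG"])
    path = environ.get("KOLLA_CONFIG_FILE", DEFAULT_CONFIG_FILE)
    with open(path) as f:                  # otherwise read the file
        return json.load(f)

print(load_config({"KOLLA_CONFIG": '{"command": "/usr/sbin/httpd"}'}))
# -> {'command': '/usr/sbin/httpd'}
```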

So I read the docs. Trying to fake it with:

$ podman run -e KOLLA_CONFIG='{}'   docker.io/tripleomaster/centos-binary-keystone:current-tripleo
+ sudo -E kolla_set_configs
INFO:__main__:Validating config file
ERROR:__main__:InvalidConfig: Config is missing required "command" key

When running with TripleO, the config files are generated from Heat Templates. The values for the config.json come from here.
This gets me slightly closer:

podman run  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -e KOLLA_CONFIG='{"command": "/usr/sbin/httpd"}'   docker.io/tripleomaster/centos-binary-keystone:current-tripleo

But I still get an error of “no listening sockets available, shutting down” even if I try this as Root. Below is the whole thing I tried to run.

$ podman run   -v $PWD/fernet-keys:/var/lib/kolla/config_files/src/etc/keystone/fernet-keys   -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -e KOLLA_CONFIG='{ "command": "/usr/sbin/httpd", "config_files": [ { "source": "/var/lib/kolla/config_files/src/etc/keystone/fernet-keys", "dest": "/etc/keystone/fernet-keys", "owner":"keystone", "merge": false, "perm": "0600" } ], "permissions": [ { "path": "/var/log/kolla/keystone", "owner": "keystone:keystone", "recurse": true } ] }'  docker.io/tripleomaster/centos-binary-keystone:current-tripleo

Let’s go back to simple things. What is inside the container? We can peek using:

podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls

Basically, we can perform any command that will not last longer than the failed kolla initialization. No interactive Bash prompt, but short single-line bash commands work. We can see that the database connection is not configured:

 podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/keystone/keystone.conf | grep "connection ="
#connection = 

What about those config files that the initialization wants to copy:

podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls /var/lib/kolla/config_files/src/etc/httpd/conf.d
ls: cannot access /var/lib/kolla/config_files/src/etc/httpd/conf.d: No such file or directory

So all that comes from external to the container, and is mounted at run time.
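The "merge" flag in the config.json entries drives roughly this behavior: merge=false replaces the destination tree outright, while merge=true layers files on top of what is already there. A simplified illustrative sketch (not the actual kolla_set_configs implementation):

```python
import glob
import os
import shutil

def copy_config(source, dest, merge=False):
    """Copy one config_files entry: `source` (optionally a glob) to `dest`.

    merge=False replaces the destination tree; merge=True layers files
    on top of whatever is already there.
    """
    matches = glob.glob(source)
    if not merge and os.path.isdir(dest):
        shutil.rmtree(dest)          # replace the destination outright
    for src in matches:
        if os.path.isdir(src):
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            os.makedirs(dest, exist_ok=True)
            shutil.copy(src, dest)
```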

$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/passwd  | grep keystone

There is a keystone user, which owns the config and the log files.

$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls -la /var/log/keystone
total 8
drwxr-x---. 2 keystone keystone 4096 Dec 17 08:28 .
drwxr-xr-x. 6 root     root     4096 Dec 17 08:28 ..
-rw-rw----. 1 root     keystone    0 Dec 17 08:28 keystone.log
$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls -la /etc/keystone
total 128
drwxr-x---. 2 root     keystone   4096 Dec 17 08:28 .
drwxr-xr-x. 2 root     root       4096 Dec 19 16:30 ..
-rw-r-----. 1 root     keystone   2303 Nov 12 02:15 default_catalog.templates
-rw-r-----. 1 root     keystone 104220 Dec 14 01:09 keystone.conf
-rw-r-----. 1 root     keystone   1046 Nov 12 02:15 logging.conf
-rw-r-----. 1 root     keystone      3 Dec 14 01:09 policy.json
-rw-r-----. 1 keystone keystone    665 Nov 12 02:15 sso_callback_template.html
$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/keystone/policy.json

Yes, policy.json is empty.

Let’s go back to the config file. I would rather not have to pass in all the config info as an environment variable each time. If I run as root, I can use the podman bind-mount option to relabel it:

 podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   docker.io/tripleomaster/centos-binary-keystone:current-tripleo  

This eventually fails with the error message “no listening sockets available, shutting down”, which seems to be due to the lack of the httpd.conf entries for keystone:

# podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   docker.io/tripleomaster/centos-binary-keystone:current-tripleo  ls /etc/httpd/conf.d

The clue seems to be in the Heat Templates. There are a bunch of files that are expected to be in /var/lib/kolla/config_files/src inside the container. Here’s my version of the WSGI config file:

Listen 5000
Listen 35357

ServerSignature Off
ServerTokens Prod
TraceEnable off

ErrorLog "/var/log/kolla/keystone/apache-error.log"
<IfModule log_config_module>
    CustomLog "/var/log/kolla/keystone/apache-access.log" common
</IfModule>

LogLevel info

<Directory "/usr/bin">
    <FilesMatch "^keystone-wsgi-(public|admin)$">
        AllowOverride None
        Options None
        Require all granted
    </FilesMatch>
</Directory>

<VirtualHost *:5000>
    WSGIDaemonProcess keystone-public processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup keystone-public
    WSGIScriptAlias / /usr/bin/keystone-wsgi-public
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog "/var/log/kolla/keystone/keystone-apache-public-error.log"
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" logformat
    CustomLog "/var/log/kolla/keystone/keystone-apache-public-access.log" logformat
</VirtualHost>

<VirtualHost *:35357>
    WSGIDaemonProcess keystone-admin processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup keystone-admin
    WSGIScriptAlias / /usr/bin/keystone-wsgi-admin
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog "/var/log/kolla/keystone/keystone-apache-admin-error.log"
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" logformat
    CustomLog "/var/log/kolla/keystone/keystone-apache-admin-access.log" logformat
</VirtualHost>

So with a directory structure like this:

[root@ayoungP40 kolla]# find src/ -print

And a Kolla config.json file like this:

{
    "command": "/usr/sbin/httpd",
    "config_files": [
        {
            "source": "/var/lib/kolla/config_files/src/etc/keystone/fernet-keys",
            "dest": "/etc/keystone/fernet-keys",
            "merge": false,
            "preserve_properties": true
        },
        {
            "source": "/var/lib/kolla/config_files/src/etc/httpd/conf.d",
            "dest": "/etc/httpd/conf.d",
            "merge": false,
            "preserve_properties": true
        },
        {
            "source": "/var/lib/kolla/config_files/src/*",
            "dest": "/",
            "merge": true,
            "preserve_properties": true
        }
    ],
    "permissions": [
        {
            "path": "/var/log/kolla/keystone",
            "owner": "keystone:keystone",
            "recurse": true
        }
    ]
}

I can run Keystone like this:

podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   -v $PWD/src:/var/lib/kolla/config_files/src:z  docker.io/tripleomaster/centos-binary-keystone:current-tripleo

by Adam Young at December 19, 2019 09:00 PM

OpenStack Superuser

The 10th China Open Source Hackathon Recap: Projects, Talks, and More

This week, the 10th China Open Source Hackathon was held in Beijing. Since its first event in 2015, the China Open Source Hackathon has been held ten times, and this week’s event featured OpenStack, Kata Containers and StarlingX. Although it snowed heavily in Beijing this week, it did not cool down the developers’ enthusiasm for this Hackathon. Without further ado, Superuser collected the activities that you might have missed.

Kata Containers:

At this Hackathon, five developers from Ant Financial and Alibaba demonstrated two important features of the coming 2.0 dev cycle. Among them, Tao Peng and Eryu Guan demoed a mirroring system named Nydus that they have designed. Nydus uses the new developments of OCI artifacts and virtio-fs, combined with the OCI mirroring community’s future evolution direction, and considers isolation, pull speed, memory efficiency, and more to provide reference for the mirroring design of Kata 2.0.

The Kata Containers developers also coded and modified Kata at the Hackathon to support parsing the Nydus rootfs mount format, so it can achieve extremely fast startup of Kata Containers. In addition, aiming to reduce the resource consumption of Kata, Hui Zhu, Bo Yang, and Fupan Li replaced GRPC with rust-ttrpc on the basis of reimplementing kata-agent in Rust, and made corresponding modifications to the kata runtime. They also demoed how to start Kata Containers with a combination of cloud hypervisor + Rust kata-agent + TTRPC, and compared it with the current 1.x version of Kata (a combination of QEMU + Go kata-agent + GRPC). The original kata-agent, implemented in Go with GRPC, consumed about 11 MB of anonymous pages at runtime, while the kata-agent implemented in Rust with TTRPC consumed only about 500 KB, which greatly reduces the memory consumed by Kata Containers itself.


As the project featured since the first China Open Source Hackathon, OpenStack has participated in this event ten times, and the enthusiasm of the OpenStack Cinder, Cyborg and Nova project developers still energizes everyone. Since this Hackathon happened at the beginning of the OpenStack Ussuri release cycle, OpenStack developers not only reviewed bugs and submitted patches, but also discussed points from the specs that will help people propose their features in Ussuri.

Another spotlight on OpenStack at this Hackathon was OpenStack Tricircle. On the first day of the Hackathon, a presentation delivered by Professor Fangming Liu from the Huazhong University of Science and Technology helped attendees learn more about OpenStack Tricircle. OpenStack Tricircle provides networking automation across Neutron servers in multi-region OpenStack clouds. The clouds are supported by geo-distributed datacenters and deployed in multiple regions. The OpenStack Tricircle team has also collaborated with Huawei as well as other members of the OpenStack community.


In this two-day Hackathon, the StarlingX community cleared the path for Ceph containerization in StarlingX, a big step toward Cloud Native. To continuously enhance StarlingX network manageability, the community is also looking at the feasibility of integrating SDN solutions.

Similar to previous China Open Source Hackathons, the StarlingX team organized a mini meetup and open technical discussion on StarlingX 4.0 features, covering Ceph containerization and the small-node Blueprint spec update. The StarlingX team had a tech discussion with the Juniper Networks team, who shared Tungsten Fabric SDN feature sets, architecture, and a few BGP VPN solutions (such as VSNX, CSNX). The StarlingX developers helped the JITStack team as new community contributors, and they also had a discussion with the China Unicom Wo Cloud team about their Industry Edge solution, enabling community roadmap alignment.

This week, the StarlingX community not only officially released the StarlingX 3.0 release on Monday, but also won this year’s China Excellent Open Source Project Award at the 9th China Cloud Computing Standards and Application Conference. Congratulations to the StarlingX community and the community members who contributed code! 

The post The 10th China Open Source Hackathon Recap: Projects, Talks, and More appeared first on Superuser.

by Sunny Cai at December 19, 2019 07:33 PM

December 18, 2019

Adam Young

keystone-db-init in OpenShift

Before I can run Keystone in a container, I need to initialize the database. This is as true for running in Kubernetes as it was using podman. Here’s how I got keystone-db-init to work.

The general steps were:

  • use oc new-app to generate the build-config and build
  • delete the deployment config generated by new-app
  • upload a secret containing keystone.conf
  • deploy a pod that uses the image built above and the secret version of keystone.conf to run keystone-manage db_init
To delete the deployment config generated by new-app:

oc delete deploymentconfig.apps.openshift.io/keystone-db-init

To upload the secret:

kubectl create secret generic keystone-conf --from-file=../keystone-db-init/keystone.conf

Here is the yaml definition for the pod:

apiVersion: v1
kind: Pod
metadata:
  name: keystone-db-init-pod
  labels:
    app: myapp
spec:
  containers:
  - image: image-registry.openshift-image-registry.svc:5000/keystone/keystone-db-init
    imagePullPolicy: Always
    name: keystone-db-init
    command: ['sh', '-c', 'cat /etc/keystone/keystone.conf']
    volumeMounts:
    - name: keystone-conf
      mountPath: "/etc/keystone/"
  volumes:
  - name: keystone-conf
    secret:
      secretName: keystone-conf
      items:
      - key: keystone.conf
        path: keystone.conf
        mode: 511

While this is running as the keystone unix account, I am not certain how that happened. I did use the patch command I talked about earlier on the deployment config, but you can see I am not using that in this pod. That is something I need to straighten out.

To test that the database was initialized:

$ oc get pods -l app=mariadb-keystone
NAME                       READY   STATUS    RESTARTS   AGE
mariadb-keystone-1-rxgvs   1/1     Running   0          9d
$ oc rsh mariadb-keystone-1-rxgvs
sh-4.2$ mysql -h mariadb-keystone -u keystone -pkeystone keystone
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 908
Server version: 10.2.22-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [keystone]> show tables;
+------------------------------------+
| Tables_in_keystone                 |
+------------------------------------+
| access_rule                        |
| access_token                       |
...
46 rows in set (0.00 sec)

I’ve fooled myself in the past into thinking that things have worked when they have not. To make sure I was not doing that now, I dropped the keystone database and recreated it from inside the mysql monitor program. I then re-ran the pod, and was able to see all of the tables.

by Adam Young at December 18, 2019 08:48 PM

December 17, 2019

OpenStack Superuser

Spreading OpenStack in Roots of Education: Open Infra Institute Day Kollam (Kerala)

Kollam, India—Bangalore-based OpenStack enthusiasts and members of the OpenTech Foundation gathered for OpenStack Technology Institute Day on the Amritha University campus in Kollam, Kerala, India. It was well received by many students and raised interesting queries, which Dell’s Principal Architect Prakash Ramchandran and VMware’s Dr. Ganesh Hiregoudar answered. Along with both of them, the Institute Day was driven by Professor Vipin with FOSS updates, and Calsoft’s Digambar Patil was around to mentor students on upstream contributions, including setting up the environment, submitting patches, etc.

Here are some glimpses of the event.

About OpenStack Technology Institute Day: 

OpenStack Technology Institute Day is a program to share knowledge about the different ways of contributing to OpenStack, such as providing new features, writing documentation, and participating in working groups. The educational program is built on the principle of open collaboration and teaches students how to find information and navigate the intricacies of the project’s technical tools and social interactions in order to get their contributions accepted. The training focuses on hands-on practice: students can use a prepared development environment to learn how to test, prepare and upload new code snippets or documentation for review. The attendees are also given the opportunity to join a mentoring program to get further help and guidance on their journey to becoming active and successful members of the OpenStack community.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains (edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

The post Spreading OpenStack in Roots of Education: Open Infra Institute Day Kollam (Kerala) appeared first on Superuser.

by Sagar Nangare at December 17, 2019 08:00 AM

December 16, 2019


Community Blog Round Up 16 December 2019

We’re super chuffed that there’s already another article to read in our weekly blog round up – as we said before, if you write it, we’ll help others see it! But if you don’t write it, well, there’s nothing to set sail. Let’s hear about your latest adventures on the Ussuri river and if you’re NOT in our database, you CAN be by creating a pull request to https://github.com/redhat-openstack/website/blob/master/planet.ini.

Reading keystone.conf in a container by Adam Young

Step 3 of the 12 Factor app is to store config in the environment. For Keystone, the set of configuration options is controlled by the keystone.conf file. In an earlier attempt at containerizing the scripts used to configure Keystone, I had passed an environment variable into the script that would then be written to the configuration file. I realize now that I want the whole keystone.conf external to the application. This allows me to set any of the configuration options without changing the code in the container. More importantly, it allows me to make the configuration information immutable inside the container, so that the applications cannot be hacked to change their own configuration options.

Read more at https://adam.younglogic.com/2019/12/reading-keystone-conf-in-a-container/

by Rain Leander at December 16, 2019 11:45 AM

December 12, 2019

OpenStack Superuser

Ethernet VPN Deployment Automation with OpenStack and ODL Controller

The cool thing about OpenStack is its tight integration with SDN solutions like OpenDaylight to keep network traffic apart, scale on demand and enable centralized control of geographically distributed data centers. In this article, we will talk about a proposed SDN-based architecture in which OpenStack and OpenDaylight are used to automate the deployment of VPN instances (Ethernet VPN in this case), centrally manage them along with regular updates to network policies, and gain enhancements in terms of scalability and response time on VPNs.

Problem with Interconnection of data centers with L2VPN

A Virtual Private Network is generally used for interconnecting geographically distributed data centers. Many generations of VPN technologies have been introduced to address the connectivity needs between different sites. Layer-2 VPN (L2VPN) is the one widely used by organizations due to its flexibility and transparency. Virtual Private LAN Service (VPLS) is used by L2VPN to connect different data centers. The main advantage of VPLS is that it can extend a VLAN to data centers. But VPLS has its own barriers in terms of redundancy, scalability, flexibility, and limited forwarding policies. Internet Service Providers (ISPs), however, use Multiprotocol Label Switching (MPLS) for data center interconnection because of its flexibility and ease of deployment. That triggers the necessity for a VPN technology designed for MPLS. This is where Ethernet VPN (EVPN) comes in, which addresses the concerns and challenges associated with using VPN with MPLS. EVPN simply enables an L2 VPN connection over MPLS.

The core problem with EVPN was the manual configuration and management of EVPN instances, which can cause huge time consumption, error-prone configuration and high OPEX.

An SDN Based Solution

To address the problem, an SDN-based architecture was proposed by researchers and engineers from Karlstad University and Ericsson. It utilizes the OpenDaylight SDN controller and OpenStack for automated remote deployment and automation of EVPN-related tasks.

The solution offered in this paper mainly addresses two existing limitations: one is flexible network management automation, and the other is the control plane complexity of MPLS-based VPN and the provision of flexibility for adding new network changes.


Before we dive into the architecture, let’s talk about how EVPN is the key technology that lets this solution run dynamically on MPLS. EVPN uses MP-BGP in its control plane as a signaling method to advertise addresses, which removes the need for traditional flood-and-learn in the data plane. In EVPN, the control and data plane are abstracted and separated. That allows MPLS and Provider Backbone Bridging to be used together with the EVPN control plane.

SDN Based EVPN Deployment Automation Architecture

The above architecture depicts model-driven network management and automation of EVPN instances. In this model, the YANG data modeling language is used to define services and configurations, represent state data and process notifications. Configuration data defined in a YANG file is transmitted to network devices. The NETCONF protocol is used for transmission of the configuration, along with installation, deletion, and manipulation of the configuration of network devices. Transmitted messages are encoded in XML. NETCONF helps the admin pass through and validate the configuration, and after successful execution the admin commits the changes to the network devices. The SDN controller leverages NETCONF to automate the configuration of EVPN instances (EVIs) on provider edge routers.
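As a rough illustration of the XML encoding step described above — the namespace is the standard NETCONF 1.0 base namespace, but the <evpn-instance> payload is a hypothetical stand-in, since the paper's actual YANG model isn't reproduced here — a NETCONF edit-config request can be assembled like this:

```python
import xml.etree.ElementTree as ET

# Build a NETCONF <edit-config> RPC targeting the candidate datastore.
# The evpn-instance/name elements are invented for illustration only.
NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
rpc = ET.Element(f"{{{NC}}}rpc", {"message-id": "101"})
edit = ET.SubElement(rpc, f"{{{NC}}}edit-config")
target = ET.SubElement(edit, f"{{{NC}}}target")
ET.SubElement(target, f"{{{NC}}}candidate")      # edit the candidate config
config = ET.SubElement(edit, f"{{{NC}}}config")
evi = ET.SubElement(config, "evpn-instance")     # hypothetical YANG payload
ET.SubElement(evi, "name").text = "evi-100"
print(ET.tostring(rpc, encoding="unicode"))
```

In a real deployment the controller would send this XML to the router over a NETCONF session and then issue a commit, rather than printing it.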

Let’s understand the role of the key components in the architecture:

OpenStack: It is used as the central cloud platform to orchestrate the management of EVPNs using the SDN controller. The OpenStack Neutron API is used to communicate with the ODL SDN controller to manage EVPN instances attached to the network.

OpenDaylight SDN Controller: It is the core element of this architecture. It extends the Multiprotocol Border Gateway Protocol (MP-BGP) inside the OpenDaylight controller with an MP-BGP control plane (EVPN instances on the provider edge/data center) and a VPNService inside the OpenDaylight controller that automates EVPN configuration using YANG and NETCONF. This bypasses the slow and error-prone tasks of manual EVPN configuration.

Open vSwitch (OVS): This switch sits inside OpenStack compute nodes. It is used to isolate the traffic among different VMs and connect them to the physical network.

Provider Edge (PE) routers: The PE acts as a middleware for the data centers and supports EVPN and MP-BGP extensions as well as NETCONF and YANG.

The architecture solution above was evaluated; you can refer to the paper for the test results.


The post Ethernet VPN Deployment Automation with OpenStack and ODL Controller appeared first on Superuser.

by Sagar Nangare at December 12, 2019 02:00 PM

Adam Young

Reading keystone.conf in a container

Step 3 of the 12 Factor app is to store config in the environment. For Keystone, the set of configuration options is controlled by the keystone.conf file. In an earlier attempt at containerizing the scripts used to configure Keystone, I had passed an environment variable into the script that would then be written to the configuration file. I realize now that I want the whole keystone.conf external to the application. This allows me to set any of the configuration options without changing the code in the container. More importantly, it allows me to make the configuration information immutable inside the container, so that the applications cannot be hacked to change their own configuration options.
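The idea can be sketched with a minimal parser round-trip: the container only ever reads the mounted file, so changing an option means swapping the file, not rebuilding the image. The [database] section and connection option follow keystone.conf conventions, but the connection URL here is a made-up example:

```python
import configparser

# keystone.conf is INI-style; mounting it read-only into the container
# means the application parses it at startup but can never rewrite it.
# The connection URL below is illustrative, not a real deployment value.
conf_text = """
[database]
connection = mysql+pymysql://keystone:keystone@keystone-mariadb/keystone
"""

cfg = configparser.ConfigParser()
cfg.read_string(conf_text)
print(cfg["database"]["connection"])
```

Swapping in a different keystone.conf changes what the parse returns with zero change to the container image — which is exactly the immutability property described above.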

I was running the pod and mounting the local copy I had of the keystone.conf file using this command line:

podman run --mount type=bind,source=/home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf,destination=/etc/keystone/keystone.conf:Z --add-host keystone-mariadb:   --network maria-bridge  -it localhost/keystone-db-init 

It was returning with no output. To diagnose, I added on /bin/bash to the end of the command so I could poke around inside the running container before it exited.

podman run --mount /home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf:/etc/keystone/keystone.conf    --add-host keystone-mariadb:   --network maria-bridge  -it localhost/keystone-db-init /bin/bash

Once inside, I was able to look at the keystone log file. A stack trace made me realize that I was not able to actually read the file /etc/keystone/keystone.conf. Using ls, it would show up like this:

-?????????? ? ?        ?             ?            ? keystone.conf:

It took a lot of trial and error to rectify it, including:

  • adding a parallel entry to my host’s /etc/passwd and /etc/group files for the keystone user and group
  • ensuring that the file was owned by keystone outside the container
  • switching to the -v option to create the bind mount, as that allowed me to use the :Z option as well
  • adding the -u keystone option to the command line

The end command looked like this:

podman run -v /home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf:/etc/keystone/keystone.conf:Z  -u keystone         --add-host keystone-mariadb:   --network maria-bridge  -it localhost/keystone-db-init 

Once I had it correct, I could use the /bin/bash executable to again poke around inside the container. From the inside, I could run:

$ keystone-manage db_version
$ mysql -h keystone-mariadb -ukeystone -pkeystone keystone  -e "show databases;"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| keystone           |
+--------------------+

Next up is to try this with OpenShift.

by Adam Young at December 12, 2019 12:09 AM

December 10, 2019

OpenStack Superuser

Unleashing the OpenStack “Train”: Contribution from Intel and Inspur

The OpenStack community released the latest version, “Train”, on October 16th. As Platinum and Gold members of OpenStack Foundation, Intel and Inspur OpenStack teams are actively contributing to the community projects, such as Nova, Neutron, Cinder, Cyborg, and others. During the Train development cycle, both companies collaborated, contributed to and completed multiple achievements. This includes 4 blueprints and design specifications in Train, commits, reviews and more, and reflects the high level of contribution to the development of OpenStack code base.

In early September 2019, Intel and Inspur worked together and used the InCloud OpenStack 5.6 (ICOS 5.6) to validate a single cluster deployment with 200 and 500 nodes. This created a solid foundational reference architecture for OpenStack in a large-scale single cluster environment. Intel and Inspur closely monitor the latest development updates in the community and upgraded ICOS5.6 to support new features of Train. For example, while validating the solution, a networking bottleneck issue (Neutron IPAM DLM and IP address allocation) was found in a large-scale high concurrency provisioning scenario (e.g. >800 VM creation). After applying a distributed lock solution with etcd, the network creation process was optimized and significantly improved system performance. The team also worked on Nova project to provide “delete on termination” feature for VM volumes. This greatly improves operation efficiency for cloud administrators. Another important new feature “Nova VPMEM” is also included in OpenStack “Train” release. This feature can guarantee persistent data storage functionality across power cycles, at a lower cost and larger capacity compared to DRAM. This can significantly improve workload performance for applications such as Redis, Rocksdb, SAP HANA, Aerospike, etc.
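The distributed-lock fix mentioned above boils down to an atomic put-if-absent on a shared key, which etcd provides natively; only one worker wins the critical section around IP allocation. The sketch below imitates just that primitive in-process — the lock key name is invented for illustration, and this is not Neutron's actual code:

```python
# Minimal sketch of etcd-style locking: acquisition is an atomic
# "create key if absent"; whoever creates the key holds the lock.
class TinyLockStore:
    def __init__(self):
        self.keys = {}

    def acquire(self, lock_key, owner):
        # Atomic in this single-threaded sketch; etcd supplies the
        # real atomicity (a transaction on key non-existence).
        if lock_key in self.keys:
            return False
        self.keys[lock_key] = owner
        return True

    def release(self, lock_key, owner):
        if self.keys.get(lock_key) == owner:
            del self.keys[lock_key]

store = TinyLockStore()
print(store.acquire("/locks/subnet-10.0.0.0-24", "worker-a"))  # True
print(store.acquire("/locks/subnet-10.0.0.0-24", "worker-b"))  # False: held
store.release("/locks/subnet-10.0.0.0-24", "worker-a")
print(store.acquire("/locks/subnet-10.0.0.0-24", "worker-b"))  # True
```

Under high-concurrency VM creation, serializing IP allocation through such a lock avoids two workers handing out the same address, at the cost of one round trip to the lock store.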

Intel and Inspur shared many of the engineering best practices at the recent Shanghai Open Infrastructure Summit, including resources for 500 node large-scale cluster deployment in relevant sessions such as “full stack security chain of trust and best practices in cloud”, “improving private cloud performance for big data analytics workloads”, and more.

Chief Architect of Intel Data Center Group, Enterprise & Government for China Division, Dr Yih Leong Sun said: Intel is actively contributing to the OpenStack upstream community and will continue to improve OpenStack architecture with Intel’s latest technology. We strive to build a software defined infrastructure, optimized at both the software and hardware layer, and to deliver an Open Cloud solution that meets the workload performance requirements of the industry.

Vice President of Inspur Group, Zhang Dong indicated: Inspur is increasingly investing more on upstream community and contributing our knowledge and experience with industry deployment and usage. We continue to strengthen our technical leadership and contribution in the community, to help users solve real-world challenges, and to promote the OpenStack adoption.


Photo // CC BY NC

The post Unleashing the OpenStack “Train”: Contribution from Intel and Inspur appeared first on Superuser.

by Brin Zhang and Lily Wu at December 10, 2019 08:00 AM

December 09, 2019


Community Blog Round Up 09 December 2019

As we sail down the Ussuri river, Ben and Colleen report on their experiences at Shanghai Open Infrastructure Summit while Adam dives into Buildah.

Let’s Buildah Keystoneconfig by Adam Young

Buildah is a valuable tool in the container ecosystem. As an effort to get more familiar with it, and to finally get my hand-rolled version of Keystone to deploy on Kubernetes, I decided to work through building a couple of Keystone based containers with Buildah.

Read more at https://adam.younglogic.com/2019/12/buildah-keystoneconfig/

Oslo in Shanghai by Ben Nemec

Despite my trepidation about the trip (some of it well-founded!), I made it to Shanghai and back for the Open Infrastructure Summit and Project Teams Gathering. I even managed to get some work done while I was there. 🙂

Read more at http://blog.nemebean.com/content/oslo-shanghai

Shanghai Open Infrastructure Forum and PTG by Colleen Murphy

The Open Infrastructure Summit, Forum, and Project Teams Gathering was held last week in the beautiful city of Shanghai. The event was held in the spirit of cross-cultural collaboration and attendees arrived with the intention of bridging the gap with a usually faraway but significant part of the OpenStack community.

Read more at http://www.gazlene.net/shanghai-forum-ptg.html

by Rain Leander at December 09, 2019 12:24 PM

December 06, 2019

OpenStack Superuser

A Guide to Kubernetes Etcd: All You Need to Know to Set up Etcd Clusters

We all know Kubernetes is a distributed platform that orchestrates different worker nodes and can be controlled by central master nodes. There can be ‘n’ number of worker nodes that can be distributed to handle pods. To keep track of all changes and updates of these nodes and pass on the desired action, Kubernetes uses etcd.

What is etcd in Kubernetes?

Etcd is a distributed, reliable key-value store that is simple, fast and secure. It acts as a backend for service discovery and as a database, running on different servers in Kubernetes clusters at the same time to monitor changes in clusters and to store state/configuration data that should be accessed by a Kubernetes master or clusters. Additionally, etcd allows the Kubernetes master to support a discovery service so that deployed applications can declare their availability for inclusion in a service.

The API server component in Kubernetes master nodes communicates with the etcd components spread across different clusters. Etcd is also used to set up the desired state for the system.

As the key-value store for Kubernetes, etcd stores all configurations for Kubernetes clusters. This differs from a traditional database, which stores data in tabular form. Etcd creates a database page for each record, so updating one record does not hamper the others. For example, a few records may require additional columns that are not required by other records in the same database, which creates redundancy in a tabular database. Etcd adds and manages all records in a reliable way for Kubernetes.
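The contrast with a tabular database can be sketched in a few lines — each record lives under its own key and is free to carry its own fields, with no schema forcing empty columns on other records. The /registry/... key names imitate how Kubernetes lays out objects in etcd, but treat them as illustrative:

```python
# Sketch of a flat key-value store: every record is independent, so
# one record can carry a "volume" field without any other record
# needing a matching column, unlike a relational table.
store = {}
store["/registry/pods/default/web-1"] = {"image": "nginx", "node": "worker-1"}
store["/registry/pods/default/db-1"] = {
    "image": "mariadb",
    "node": "worker-2",
    "volume": "pvc-db",   # extra field, no redundancy imposed elsewhere
}

# Prefix scans replace SQL queries: list every pod in a namespace.
pods = [k for k in store if k.startswith("/registry/pods/default/")]
print(sorted(pods))
```

Real etcd exposes the same shape of API over gRPC: get/put on keys plus range reads over key prefixes.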

Distributed and Consistent

Etcd stores critical data for Kubernetes. Being distributed, it maintains a copy of the data store on distributed machines/servers across the cluster. Each copy is identical and holds the same data as every other etcd data store. If one copy gets destroyed, the others still hold the same information.

Deployment Methods for etcd in Kubernetes Clusters

Etcd’s implementation is architected in such a way as to enable high availability in Kubernetes. Etcd can be deployed as pods in master nodes.

Figure – etcd in the same cluster

Image source: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

It can also be deployed externally to enable resiliency and security

Figure – etcd deployed externally

Image source: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

How etcd Works

Etcd acts as the brain of the Kubernetes cluster. Monitoring the sequence of changes is done using the ‘Watch’ function of etcd. With this function, Kubernetes can subscribe to changes within clusters and execute any state request coming from the API server. Etcd coordinates with different components within distributed clusters: etcd reacts to changes in the state of components, and other components may in turn react to those changes.
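The Watch mechanism described above is essentially publish/subscribe keyed on key prefixes: a component registers interest in a prefix and gets notified on every change. A minimal in-process imitation of the pattern (real etcd does this over gRPC with revision numbers; the key and value names here are invented):

```python
# Sketch of etcd's Watch idea: subscribers register a callback for a
# key prefix and are notified on every put under that prefix.
class WatchableStore:
    def __init__(self):
        self.data = {}
        self.watchers = []          # list of (prefix, callback) pairs

    def watch(self, prefix, callback):
        self.watchers.append((prefix, callback))

    def put(self, key, value):
        self.data[key] = value
        for prefix, cb in self.watchers:
            if key.startswith(prefix):
                cb(key, value)      # notify interested subscribers

events = []
s = WatchableStore()
s.watch("/registry/nodes/", lambda k, v: events.append((k, v)))
s.put("/registry/nodes/worker-1", "Ready")   # triggers the watcher
s.put("/registry/pods/default/web-1", "Running")  # different prefix, ignored
print(events)
```

This is how the API server can react to cluster state changes without polling: etcd pushes the change to every watcher of the affected prefix.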

There might be a situation where, while maintaining the same copy of all state among the group of etcd components in a cluster, the same data needs to be stored in two etcd instances. However, etcd is not supposed to update the same record in different instances.

In such cases, etcd does not process the writes on each cluster node. Instead, only one of the instances gets the responsibility to process the writes internally. That node is called the leader. The other nodes in the cluster elect a leader using the Raft algorithm. Once the leader is elected, the other nodes become its followers.

Now, when a write request comes to the leader node, the leader processes the write and broadcasts a copy of the data to the other nodes. If one of the follower nodes is inactive or offline at that moment, the write request still gets its complete flag based on the majority of available nodes. Normally, the write gets the complete flag once the leader gets consent from the other members in the cluster.

This is how the nodes elect a leader among themselves and ensure a write is propagated across all instances. This distributed consensus is implemented in etcd using the Raft protocol.
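The majority rule above reduces to a single arithmetic check: a write commits once acknowledgements (leader included) reach a quorum of the cluster. This sketch shows only that check, not Raft's log replication or leader election:

```python
# Quorum arithmetic behind Raft-style commits: a write is committed
# once a strict majority of cluster members have acknowledged it.
def write_committed(cluster_size: int, acks: int) -> bool:
    majority = cluster_size // 2 + 1
    return acks >= majority

# 3-node cluster: leader + one follower form a majority, so a write
# completes even with one node offline.
print(write_committed(3, 2))  # True
print(write_committed(3, 1))  # False: the leader alone is not a majority
print(write_committed(5, 3))  # True: 5-node clusters tolerate 2 failures
```

This is also why etcd clusters are sized at odd numbers: a 4-node cluster needs 3 acks, tolerating no more failures than a 3-node cluster.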

How Clusters Work in etcd 

Kubernetes is the main consumer of the etcd project, which was initiated by CoreOS. Etcd has become the norm for functionality and overall tracking of Kubernetes cluster pods. Kubernetes allows various cluster architectures that may involve etcd as a crucial component or might involve multiple master nodes along with etcd as an isolated component.

The role of etcd changes with the system configuration in any particular architecture. Such dynamic placement of etcd to manage clusters can be implemented to improve scaling. The result is workloads that are easily supported and managed.

Here are the steps for initiating etcd in Kubernetes.

Wget the etcd files:

wget -q --show-progress --https-only --timestamping \ "https://github.com/etcd-io/etcd/releases/download/v3.4.0/etcd-v3.4.0-linux-amd64.tar.gz"

Tar and install the etcd server and the etcdctl tools:


tar -xvf etcd-v3.4.0-linux-amd64.tar.gz
sudo mv etcd-v3.4.0-linux-amd64/etcd* /usr/local/bin/
sudo mkdir -p /etc/etcd /var/lib/etcd
sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/


Get the internal IP address of the current compute instance. It will be used to handle client requests and data transmission with etcd cluster peers:

INTERNAL_IP=$(curl -s -H "Metadata-Flavor: Google" \  http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip)

Set a unique name for etcd to match the hostname of the current compute instance:

ETCD_NAME=$(hostname -s)

Create the etcd.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \\
  --name ${ETCD_NAME} \\
  --cert-file=/etc/etcd/kubernetes.pem \\
  --key-file=/etc/etcd/kubernetes-key.pem \\
  --peer-cert-file=/etc/etcd/kubernetes.pem \\
  --peer-key-file=/etc/etcd/kubernetes-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-client-urls https://${INTERNAL_IP}:2379, \\
  --advertise-client-urls https://${INTERNAL_IP}:2379 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster controller-0=,controller-1=,controller-2= \\
  --initial-cluster-state new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

Initiate etcd Server


sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl start etcd


Repeat the above commands on controller-0, controller-1, and controller-2.

List the etcd cluster members:

sudo ETCDCTL_API=3 etcdctl member list \
  --endpoints= \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem

3a57933972cb5131, started, controller-2,,
f98dc20bce6225a0, started, controller-0,,
ffed16798470cab5, started, controller-1,,


Etcd is an independent project at its core. But it has been used extensively by the Kubernetes community to provide various benefits for managing the state of clusters, enabling further automation for dynamic workloads. The key benefit of using Kubernetes with etcd is that etcd is itself a distributed database that aligns with distributed Kubernetes clusters. So, using etcd with Kubernetes is vital for the health of the clusters.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains (edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

The post A Guide to Kubernetes Etcd: All You Need to Know to Set up Etcd Clusters appeared first on Superuser.

by Sagar Nangare at December 06, 2019 01:00 PM

December 03, 2019

Adam Young

Let’s Buildah Keystoneconfig

Buildah is a valuable tool in the container ecosystem. As an effort to get more familiar with it, and to finally get my hand-rolled version of Keystone to deploy on Kubernetes, I decided to work through building a couple of Keystone based containers with Buildah.

First, I went with the simple approach of modifying my old Dockerfiles to a later release of OpenStack, and kick off the install using buildah. I went with Stein.

Why not Train? Because eventually I want to test zero-downtime upgrades. More on that later.

The buildah command was just:

 buildah bud -t keystone 

However, to make that work, I had to adjust the Dockerfile. Here is the diff:

diff --git a/keystoneconfig/Dockerfile b/keystoneconfig/Dockerfile
index 149e62f..cd5aa5c 100644
--- a/keystoneconfig/Dockerfile
+++ b/keystoneconfig/Dockerfile
@@ -1,11 +1,11 @@
-FROM index.docker.io/centos:7
+FROM docker.io/centos:7
-RUN yum install -y centos-release-openstack-rocky &&\
+RUN yum install -y centos-release-openstack-stein &&\
     yum update -y &&\
     yum -y install openstack-keystone mariadb openstack-utils  &&\
     yum -y clean all
 COPY ./keystone-configure.sql /
 COPY ./configure_keystone.sh /
-CMD /configure_keystone.sh
\ No newline at end of file
+CMD /configure_keystone.sh

The biggest difference is that I had to specify the name of the base image without the “index.” prefix. Buildah is strictah (heh) in what it accepts.

I also updated the package to stein. When I was done, I had the following:

$ buildah images
REPOSITORY                 TAG      IMAGE ID       CREATED          SIZE
localhost/keystone         latest   e52d224fa8fe   13 minutes ago   509 MB
docker.io/library/centos   7        5e35e350aded   3 weeks ago      211 MB

What if I wanted to do these same things via manual steps? Following the advice from the community, I can translate from Dockerfile-ese to buildah. First, I can fetch the original image using the buildah from command:

container=$(buildah from docker.io/centos:7)
$ echo $container 

Now add things to the container. We don’t build a new layer with each command, so the && approach is not required. So for the yum installs:

buildah run $container yum install -y centos-release-openstack-stein
buildah run $container yum update -y
buildah run $container  yum -y install openstack-keystone mariadb openstack-utils
buildah run $container  yum -y clean all

To Get the files into the container, use the copy commands:

buildah copy $container  ./keystone-configure.sql / 
buildah copy $container ./configure_keystone.sh / 

The final steps: tell the container what command to run and commit it to an image.

buildah config --cmd /configure_keystone.sh $container
buildah commit $container keystone

What do we end up with?

$ buildah images
REPOSITORY                 TAG      IMAGE ID       CREATED              SIZE
localhost/keystone         latest   09981bc1e95a   About a minute ago   509 MB
docker.io/library/centos   7        5e35e350aded   3 weeks ago          211 MB

Since I have an old, hard-coded IP address for the MySQL server, it is going to fail. But let’s see:

buildah run centos-working-container /configure_keystone.sh
2019-12-03T16:34:16.000691965Z: cannot configure rootless cgroup using the cgroupfs manager

And there it hangs. We’ll work on that in a bit.

I committed the container before setting the author field. That should be a line like:
buildah config --author "ayoung@redhat.com"
to map line-to-line with the Dockerfile.

by Adam Young at December 03, 2019 04:43 PM

December 01, 2019

Thomas Goirand

Upgrading an OpenStack Rocky cluster from Stretch to Buster

Upgrading an OpenStack cluster from one version of OpenStack to another has become easier, thanks to the versioning of objects in the rabbitmq message bus (if you want to know more, see what oslo.versionedobjects is). But upgrading from Stretch to Buster isn’t easy at all, even with the same version of OpenStack (it is easier to be running OpenStack Rocky backports on Stretch and upgrade to Rocky on Buster, rather than upgrading OpenStack at the same time as the system).

The reason it is difficult is that rabbitmq and corosync in Stretch can’t talk to the versions shipped in Buster. Also, in a normal OpenStack cluster deployment, services on all machines are constantly doing queries to the OpenStack API, and exchanging messages through the RabbitMQ message bus. One of the dangers, for example, would be if a Neutron DHCP agent could not exchange messages with the neutron-rpc-server. Your VM instances in the OpenStack cluster could then lose connectivity.

If a constantly online HA upgrade with no downtime isn’t possible, it is however possible to minimize down time to just a few seconds, if following a correct procedure. It took me more than 10 tries to be able to do everything in a smooth way, understanding and working around all the issues. 10 tries, means installing 10 times an OpenStack cluster in Stretch (which, even if fully automated, takes about 2 hours) and trying to upgrade it to Buster. All of this is very time consuming, and I haven’t seen any web site documenting this process.

This blog post intends to document such a process, to save the readers the pain of hours of experimentation.

Note that this blog post assumes your cluster has been deployed using OCI (see: https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer); however, it should also apply to any generic OpenStack installation, or even to any cluster running RabbitMQ and Corosync.

The root cause of the problem in more detail: incompatible RabbitMQ and Corosync in Stretch and Buster

RabbitMQ in Stretch is version 3.6.6, and Buster has version 3.7.8. In theory, the documentation of RabbitMQ says it is possible to smoothly upgrade a cluster with these versions. However, in practice, the problem is the Erlang version rather than Rabbit itself: RabbitMQ in Buster will refuse to talk to a cluster running Stretch (the daemon will even refuse to start).

The same way, Corosync 3.0 in Buster will refuse to accept messages from Corosync 2.4 in Stretch.

Overview of the solution for RabbitMQ & Corosync

To minimize downtime, my method is to shut down RabbitMQ on node 1, and let all daemons (re-)connect to nodes 2 and 3. Then we upgrade node 1 fully, and then restart Rabbit in there. Then we shut down Rabbit on nodes 2 and 3, so that all daemons of the cluster reconnect to node 1. If done well, the only issue is if a message is still in the cluster of nodes 2 and 3 when the daemons fail over to node 1. In reality, this isn’t really a problem, unless there’s a lot of activity on the API of OpenStack. If this were the case (for example, if running a public cloud), then the advice would simply be to firewall the OpenStack API for the short upgrade period (which shouldn’t last more than a few minutes).

Then we upgrade node 2 and 3 and make them join the newly created RabbitMQ cluster in node 1.

For Corosync, node 1 will not be able to start the VIP resource before node 2 is upgraded and both nodes can talk to each other. So we just upgrade node 2, and turn off the VIP resource on node 3 as soon as it is up on nodes 1 and 2 (which happens during the upgrade of node 2).

The above should be enough reading for most readers. If you’re not that much into OpenStack, it’s ok to stop reading this post. For those who are more involved users of OpenStack on Debian deployed with OCI, let’s go into more detail…

Before you start: upgrading OCI

In previous versions of OCI, the haproxy configuration was missing an “option httpchk” for the MariaDB backend; therefore, if the MySQL server on one node went down, haproxy wouldn’t detect it, and the whole cluster could fail to (re-)connect to MySQL. As we’re going to bring some MySQL servers down, make sure the puppet-master is running the latest version of puppet-module-oci, and that the changes have been applied on all OpenStack controller nodes.
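As a minimal sketch of what a fixed MariaDB backend could look like (the backend name, addresses and the health-check port 9200 are assumptions for illustration, not OCI’s exact template; “check port 9200” presumes a clustercheck-style HTTP service running on each node):

```
backend galera
    mode tcp
    option httpchk
    server z-controller-1 192.0.2.11:3306 check port 9200
    server z-controller-2 192.0.2.12:3306 check port 9200
    server z-controller-3 192.0.2.13:3306 check port 9200
```

With “option httpchk”, haproxy polls the HTTP health-check service instead of merely testing whether the TCP port accepts connections, so a broken-but-listening MySQL server is taken out of rotation.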

Upgrading compute nodes

Before we upgrade the controllers, it’s best to start with the compute nodes, which are the easiest to do. The simplest approach is to live-migrate all VMs away from the machine before proceeding. First, we disable the node, so no new VM can be spawned on it:

openstack compute service set --disable z-compute-1.example.com nova-compute

Then we list all VMs on that compute node:

openstack server list --all-projects --host z-compute-1.example.com

Finally we migrate all VMs away:

openstack server migrate --live hostname-compute-3.infomaniak.ch --block-migration 8dac2f33-d4fd-4c11-b814-5f6959fe9aac

Now we can do the upgrade. First disable puppet, then tweak the sources.list, upgrade and reboot:

puppet agent --disable "Upgrading to buster"
apt-get remove python3-rgw python3-rbd python3-rados python3-cephfs librgw2 librbd1 librados2 libcephfs2
rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list.d/buster-rocky.list
apt-get update
apt-get dist-upgrade

Then we simply re-apply puppet:

puppet agent --enable ; puppet agent -t
apt-get purge linux-image-4.19.0-0.bpo.5-amd64 linux-image-4.9.0-9-amd64

Then we can re-enable the compute service:

openstack compute service set --enable z-compute-1.example.com nova-compute

Repeat the operation for all compute nodes; then we’re ready for the upgrade of the controller nodes.

Removing Ceph dependencies from nodes

Most likely, if running OpenStack Rocky on Stretch, you’d be running with upstream packages for Ceph Luminous. When upgrading to Buster, there’s no upstream repository anymore, and packages will use Ceph Luminous directly from Buster. Unfortunately, the packages in Buster are at a lower version than the upstream packages, so before upgrading, we must remove all Ceph packages from upstream. This is what was done just above for the compute nodes as well. Upstream Ceph packages are easily identifiable, because upstream uses “bpo90” instead of what we do in Debian (ie: bpo9), so the operation can be:

apt-get remove $(dpkg -l | grep bpo90 | awk '{print $2}' | tr '\n' ' ')
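To illustrate what that pipeline selects, here is a self-contained sketch using a fake `dpkg -l` excerpt (the package names and version strings below are made up for the example):

```shell
# Simulated `dpkg -l` output: upstream Ceph packages carry "bpo90" in their
# version, Debian's own backports carry "bpo9".
dpkg_list='ii  librados2      12.2.11-1~bpo90+1  amd64  RADOS object store library
ii  librbd1        12.2.11-1~bpo90+1  amd64  RADOS block device library
ii  python3-nova   2:18.1.0-6~bpo9+1  all    OpenStack Compute - libraries'

# Same filter as the real command: keep only "bpo90" lines, take the
# package name (2nd column), and join names on one line.
to_remove=$(echo "$dpkg_list" | grep bpo90 | awk '{print $2}' | tr '\n' ' ')
echo "$to_remove"   # prints: librados2 librbd1
```

Note that `grep bpo90` does not match the plain “bpo9” packages, so Debian’s own backports (like python3-nova here) are left alone.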

This will remove python3-nova, which is fine as it is also running on the other 2 controllers. After switching the /etc/apt/sources.list to buster, Nova can be installed again.

In a normal setup by OCI, here’s the sequence of commands that needs to be done:

rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list.d/buster-rocky.list
apt-get update
apt-get dist-upgrade
apt-get install nova-api nova-conductor nova-consoleauth nova-consoleproxy nova-placement-api nova-scheduler

You may notice that we’re replacing the Stretch Rocky backports repository with one for Buster. Indeed, even though all of Rocky is in Buster, a few packages are still pending review by the Debian stable release team before they can be uploaded to Buster, and we need the fixes for a smooth upgrade. See release team bugs #942201, #942102, #944594, #941901 and #939036 for more details.

Also, since we only did an “apt-get remove”, the Nova configuration in nova.conf stayed in place and Nova is already configured, so when we reinstall the services we removed along with the Ceph dependencies, they will be ready to go.

Upgrading the MariaDB galera cluster

In an HA OpenStack cluster, typically, a Galera MariaDB cluster is used. That isn’t a problem when upgrading from Stretch to Buster, because the on-the-wire format stays the same. However, the xtrabackup library in Stretch is shipped by the MariaDB packages themselves, while in Buster one must install the mariadb-backup package. As a consequence, it is best to simply turn off MariaDB on a node, do the Buster upgrade, install the mariadb-backup package, and restart MariaDB. To prevent the MariaDB package from attempting to restart the mysqld daemon, it is best to mask the systemd unit:

systemctl stop mysql.service
systemctl disable mysql.service
systemctl mask mysql.service

Upgrading rabbitmq-server

Before doing anything, make sure your whole cluster is running with python3-oslo.messaging version >= 8.1.4. Indeed, version 8.1.3 suffers from a bug where daemons would constantly attempt to reconnect to the same server, instead of trying each of the servers described in the transport_url directive. Note that I’ve uploaded 8.1.4-1+deb10u1 to Buster, and that it is part of the Buster 10.2 point release. Note also that upgrading oslo.messaging will not restart daemons automatically: this must be done manually.
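This matters because, in a setup like this, each daemon’s transport_url lists all the controllers, roughly like the following fragment (hostnames and credentials here are placeholders, not OCI’s actual values):

```
[DEFAULT]
transport_url = rabbit://openstack:secret@z-controller-1:5672,openstack:secret@z-controller-2:5672,openstack:secret@z-controller-3:5672/
```

With oslo.messaging >= 8.1.4, a daemon that loses its connection will try each listed server in turn, which is what makes the fail-over to the surviving nodes work during the upgrade.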

The strategy for RabbitMQ is to completely upgrade one node, start Rabbit on it without any clustering, then shut down the service on the other 2 nodes of the cluster. If this is performed fast enough, no message will be lost in the message bus. However, there are a few traps. Running “rabbitmqctl forget_cluster_node” only removes a node from the cluster for those that will still be running; it doesn’t remove the other nodes from the one we want to upgrade. The way I’ve found to solve this is to simply remove the mnesia database of the first node, so that when it starts, RabbitMQ doesn’t attempt to cluster with the other 2, which are running a different version of Erlang. If it did, it would just fail and refuse to start.

However, there’s another issue to take care of. When upgrading the 1st node to Buster, we removed Nova because of the Ceph issue. Before we restart the RabbitMQ service on node 1, we need to reinstall Nova, so that it will connect to either node 2 or 3. If we don’t do that, then Nova on node 1 may connect to the RabbitMQ service on node 1, which at this point is a different RabbitMQ cluster than the one on nodes 2 and 3.

rabbitmqctl stop_app
systemctl stop rabbitmq-server.service
systemctl disable rabbitmq-server.service
systemctl mask rabbitmq-server.service
[ ... do the Buster upgrade fully ...]
[ ... reinstall Nova services we removed when removing Ceph ...]
rm -rf /var/lib/rabbitmq/mnesia
systemctl unmask rabbitmq-server.service
systemctl enable rabbitmq-server.service
systemctl start rabbitmq-server.service

At this point, since the node 1 RabbitMQ service was down, all daemons are connected to the RabbitMQ service on node 2 or 3. Removing the mnesia database removed all the credentials previously added to RabbitMQ; if nothing is done, OpenStack daemons will not be able to connect to the RabbitMQ service on node 1. If, like I do, one is using a config management system to populate the access rights, it’s rather easy: simply re-apply the puppet manifests, which will re-add the credentials. However, that isn’t enough: the RabbitMQ message queues are created when the OpenStack daemons start. As I experienced, daemons will reconnect to the message bus, but will not recreate the queues unless they are restarted. Therefore, the sequence is as follows:

Do “rabbitmqctl start_app” on the first node. Add all credentials to it. If your cluster was set up with OCI and puppet, simply look at the output of “puppet agent -t --debug” to capture the list of commands needed to perform the credential setup.

Do a “rabbitmqctl stop_app” on both remaining nodes 2 and 3. At this point, all daemons will reconnect to the only remaining server. However, they won’t be able to exchange messages, as the queues aren’t declared. This is when we must restart all daemons on one of the controllers. The whole operation normally doesn’t take more than a few seconds, which is how long the message bus won’t be available. To make sure everything works, check the logs in /var/log/nova/nova-compute.log on one of your compute nodes to make sure Nova is able to report its configuration to the placement service.

Once all of this is done, there’s nothing to worry about anymore with RabbitMQ, as all daemons of the cluster are connected to the service on node 1. However, one must make sure that, when upgrading nodes 2 and 3, they don’t reconnect to the message service on nodes 2 and 3. So it is best to simply stop, disable and mask the service with systemd before continuing. Then, when restarting the Rabbit service on nodes 2 and 3, OCI’s shell script “oci-auto-join-rabbitmq-cluster” will make them join the new Rabbit cluster, and everything should be fine regarding the message bus.

Upgrading corosync

In an OpenStack cluster setup by OCI, 3 controllers are typically setup, serving the OpenStack API through a VIP (a Virtual IP). What we call a virtual IP is simply an IP address which is able to move from one node to another automatically depending on the cluster state. For example, with 3 nodes, if one goes down, one of the other 2 nodes will take over hosting the IP address which serves the OpenStack API. This is typically done with corosync/pacemaker, which is what OCI sets up.
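As a hedged illustration (the resource name matches the one used later in this post, but the IP, netmask and monitor interval are made-up parameters, not OCI’s exact template), such a VIP is typically declared in pacemaker’s crm shell roughly as:

```
primitive openstack-api-vip ocf:heartbeat:IPaddr2 \
    params ip=203.0.113.10 cidr_netmask=24 \
    op monitor interval=10s
```

The ocf:heartbeat:IPaddr2 agent is the standard way of managing a floating IPv4 address with pacemaker: the cluster brings the address up on exactly one node and moves it when that node fails.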

Upgrading corosync is easier than the RabbitMQ case. The first node will refuse to start the corosync resource if it can’t talk to at least a 2nd node. Therefore, upgrading the first node is transparent until we touch the 2nd node: the openstack-api resource won’t be started on the first node, so we can finish the upgrade on it safely (ie: take care of RabbitMQ as per the above). The first thing to do is probably to move the resource to the 3rd node:

crm_resource --move --resource openstack-api-vip --node z-controller-3.example.com

Once the first node is completely upgraded, we upgrade the 2nd node. When it is up again, we can check the corosync status to make sure it is running on both nodes 1 and 2:

crm status

If we see the service is up on node 1 and 2, we must quickly shutdown the corosync resource on node 3:

crm resource stop openstack-api-vip

If that’s not done, then node 3 may also claim the VIP, and therefore 2 nodes may be declaring it. If running the VIP over an L2 protocol, switches will normally direct traffic to only one of the machines declaring the VIP, so even if we don’t take care of it immediately, the upgrade should be smooth anyway. If, like I do in production, you’re running with BGP (OCI allows one to use BGP for the VIP, or simply an IP on a normal L2 network), then the situation is even better, as the peering router will continue to route to one of the controllers in the cluster. So no stress: this must be done, but there is no need to hurry as much as for the RabbitMQ service.

Finalizing the upgrade

Once node 1 and 2 are up, most of the work is done, and the 3rd node can be upgraded without any stress.

Recap of the procedure for controllers

  • Move all SNAT virtual routers running on node 1 to node 2 or 3 (note: this isn’t needed if the cluster has network nodes).
  • Disable puppet on node 1.
  • Remove all Ceph libraries from upstream on node 1, which also turns off some Nova services that depend on them at runtime.
  • shutdown rabbitmq on node 1, including masking the service with systemd.
  • upgrade node 1 to Buster, fully. Then reboot it. This probably will trigger MySQL re-connections to node 2 or 3.
  • install mariadb-backup, start the mysql service, and make sure MariaDB is in sync with the other 2 nodes (check the log files).
  • reinstall missing Nova services on node 1.
  • remove the mnesia db on node 1.
  • start rabbitmq on node 1 (which now, isn’t part of the RabbitMQ cluster on node 2 and 3).
  • Disable puppet on node 2.
  • populate RabbitMQ access rights on node 1. This can be done by simply applying puppet, but that may be dangerous if puppet restarts the OpenStack daemons (which would then connect to the RabbitMQ on node 1), so it is best to just re-apply the grant access commands only.
  • shutdown rabbitmq on node 2 and 3 using “rabbitmqctl stop_app”.
  • quickly restart all daemons on one controller (for example the daemons on node 1) to declare message queues. Now all daemons must be reconnected and working with the RabbitMQ cluster on node 1 alone.
  • Re-enable puppet, and re-apply puppet on node 1.
  • Move all Neutron virtual routers from node 2 to node 1.
  • Make sure the RabbitMQ services are completely stopped on node 2 and 3 (mask the service with systemd).
  • upgrade node 2 to Buster (shutting down RabbitMQ completely, masking the service to avoid it restarting during the upgrade, removing the mnesia db for RabbitMQ, and finally making it rejoin the newly created node 1 single-node cluster using oci-auto-join-rabbitmq-cluster: normally, puppet does that for us).
  • Reboot node 2.
  • When corosync on node 2 is up again, check the corosync status to make sure nodes 1 and 2 are clustering (maybe the resource on node 1 needs to be started), and shut down the corosync “openstack-api-vip” resource on node 3 to avoid the VIP being declared on both nodes.
  • Re-enable puppet and run puppet agent -t on node 2.
  • Make sure node 2’s rabbitmq-server has joined the new cluster declared on node 1 (do: rabbitmqctl cluster_status) so we have HA for Rabbit again.
  • Move all Neutron virtual routers of node 3 to node 1 or 2.
  • Upgrade node 3 fully, reboot it, and make sure Rabbit is connected to node 1 and 2, as well as corosync working too, then re-apply puppet again.

Note that we do need to re-apply puppet each time, because of some differences between Stretch and Buster. For example, Neutron in Rocky isn’t able to use iptables-nft, and puppet needs to run an update-alternatives command to select iptables-legacy instead (I’m writing this because it isn’t obvious: sometimes, Neutron fails to parse the output of iptables-nft…).

Last words as a conclusion

While OpenStack itself has made a lot of progress on upgrades, it is very disappointing that the components on which OpenStack relies (like corosync, which is typically used as the provider of high availability) aren’t designed with backward compatibility in mind. It is also disappointing that the Erlang versions in Stretch and Buster are incompatible this way.

However, with the correct procedure, it’s still possible to keep services up and running, with a very small down time, even to the point that a public cloud user wouldn’t even notice it.

As the procedure isn’t easy, I strongly suggest anyone attempting such an upgrade to train before proceeding. With OCI, it is easy to run a PoC using the openstack-cluster-installer-poc package, which is the perfect environment to train on: it’s easy to reproduce, reinstall a cluster and restart the upgrade procedure.

by Goirand Thomas at December 01, 2019 04:45 PM

November 28, 2019


Comparison of Software Defined Networking (SDN) Controllers. Part 8: Tungsten Fabric

Aptira Comparison of Software Defined Networking (SDN) Controllers. Tungsten Fabric

The previous Software Defined Networking (SDN) controllers covered in this series might help users and organisations choose the right SDN controller for their platform, one that matches their network infrastructure and requirements. These controllers could be a suitable choice for Communication Service Providers (CSPs), data centers and research, or for integration with other platforms. However, in the current IT market, organisations are migrating their old infrastructure to the Cloud and cloudifying every part of their infrastructure. As such, we will now look at one of the SDN controllers which has been designed to work in a cloud-grade network – Tungsten Fabric (TF).

TF can be a suitable choice for cloud builders and cloud-native platform engineers. It was first associated with Juniper but is now under the Linux Foundation umbrella.


Tungsten Fabric’s architecture is composed of two major software components: the TF vRouter and the TF Controller.

Aptira Tungsten Fabric Architecture
TF vRouter is used for packet forwarding and applying network and security policies to the devices in the network.

  • vRouters need to run on each host or compute node in the network. The vRouter replaces the Linux bridge and traditional routing stack (iptables), or Open vSwitch networking, on the compute hosts.
  • The TF Controller communicates with the vRouters via Extensible Messaging and Presence Protocol (XMPP) to apply the desired networking and security policies.

The TF Controller consists of the following software services:

  • Control and Configuration services for communicating with vRouters and maintaining the network topology and network policies.
  • Analytics services for telemetry and troubleshooting.
  • Web UI services for interacting with users.
  • And finally, services that provide integration with private and public clouds, CNI plugins, virtual machines and bare metal.

Tungsten Fabric version 5.0 and later uses a microservices architecture based on Docker containers, as shown in the figure below, to deploy the services mentioned above. This makes the controller resilient against failure and highly available, which improves the user experience.

Aptira Tungsten Fabric Architecture

Modularity and Extensibility

TF’s microservice-based architecture allows particular services to be developed and scaled based on performance requirements and increasing load. Also, microservices are modular by nature, which makes maintenance and extensibility of the platform easy whilst isolating service failures from each other.


Cluster Scalability

  • TF approaches cluster scalability in a modular fashion. This means each TF role can be scaled horizontally by adding more nodes for that role. The number of pods per node is also scalable. ZooKeeper is used to choose the active node, so the number of pods deployed on the Controller and Analytics nodes must be an odd number, owing to the nature of the ZooKeeper quorum algorithm.
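The odd-number requirement follows from ZooKeeper’s majority quorum: an ensemble of n members needs a strict majority (integer n/2 + 1) to elect a leader, so an even member does not add failure tolerance. A tiny sketch of the arithmetic:

```shell
# For each ensemble size, compute the quorum (majority) and how many
# member failures the ensemble can survive while keeping a quorum.
for n in 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
# prints:
# members=3 quorum=2 tolerated_failures=1
# members=4 quorum=3 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

Going from 3 to 4 members buys nothing (still only 1 tolerated failure), which is why odd counts are used.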

Architectural Scalability

  • TF supports BGP, and each TF controller can be connected to other controllers via the BGP protocol. This means TF can be used to connect different SDN islands.


  • Southbound: TF uses the XMPP protocol to communicate with vRouters (the data plane) to deliver the overlay SDN solution. BGP can also be used to communicate with legacy devices.
  • Northbound: TF supports Web GUI and RESTful APIs. Plug-ins integrate with other platforms such as orchestrators, clouds and OSS/BSS.


Analytics nodes extract usable telemetry information from the infrastructure. The data can then be normalised to a common format, and the output is sent via the Kafka service into a Cassandra database. This data can be used in a multitude of ways operationally, from problem solving to capacity planning. Redis uses the data for generating graphs and running queries; the Redis pod is deployed between the analytics pod and the Web UI pod.

Resilience and Fault Tolerance

The modular architecture of Tungsten Fabric makes it resilient against failure, with typically several controllers/pods running on several servers for high availability. Also, the failure of a service is isolated, so it does not affect the whole system. The API and Web GUI services are accessed through a load balancer. The load balancer can allow pods to be in different subnets.

Programming Language

TF supports C++, Python, Go, Node.js.


TF was first associated with Juniper but is now supported under the Linux Foundation Networking umbrella and boasts a large developer and user community.


Given this evaluation, TF is a suitable choice for cloud builders and cloud-native platform engineers. This is because it works flexibly with private and public Clouds, CNI plugins, virtual machines and bare metal. Depending on the orchestrator integrated, it exposes Heat APIs, Kubernetes APIs, etc. to instantiate network and security policies. The scalability of TF makes it highly available and resilient against failure, which improves the user experience. Finally, its modularity allows users to easily customise, read, test and maintain each module separately.


The post Comparison of Software Defined Networking (SDN) Controllers. Part 8: Tungsten Fabric appeared first on Aptira.

by Farzaneh Pakzad at November 28, 2019 12:48 PM

November 26, 2019

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on the Open Infrastructure Summit Shanghai

Attendees from over 45 countries attended the Open Infrastructure Summit earlier this month that was hosted in Shanghai, followed by the Project Teams Gathering (PTG). Use cases, tutorials, and demos covering 40+ open source projects including Airship, Ceph, Hadoop, Kata Containers, Kubernetes, OpenStack, StarlingX, and Zuul were featured at the Summit.

With the support of the active Open Infrastructure community in China, the market share of OpenStack in the APAC region is expected to increase by 36% in the next four years (451 Research report: OpenStack Market Monitor, 451 Research, September 2019). Currently, China is the second largest market adopting OpenStack software, and it ranks second in the code contribution of the latest version of the OpenStack Train release. Just like what Jonathan Bryce said in the keynotes, “The Summits bring our community members together to meet face to face, advancing the software we build and use daily.”
Check out the highlights of the Open Infrastructure Summit Shanghai:

  • In the Monday morning keynotes, Guohua Xi, the President of the China Communications Standards Association (CCSA), kicked off the event by sharing a call to action for the Chinese community to encourage cross community collaboration to drive innovation. Open Infrastructure users including Baidu, China Mobile, China Telecom, China Unicom, Intel, and Tencent also gave a keynote and shared the key role of the open source projects, such as Kata Containers and OpenStack, in their 5G and container business strategies. Keynote videos are now available here
  • In breakout sessions, Alibaba, Baidu and Tencent presented their Open Infrastructure use cases, highlighting the integration of multiple technologies including Ceph, Kata Containers, Kubernetes, OpenStack, and more. China Railway, China Mobile, Walmart Labs, Line and China UnionPay are among additional Open Infrastructure users who shared their innovations and open source best practices at the Shanghai Summit. Breakout session videos are being added here
  • For its latest release Train, OpenStack received 25,500 code changes by 1,125 developers from 150 different companies. This pace of development makes OpenStack one of the top three most active open source projects in the world alongside Chromium and Linux. 
  • Selected by members of the OSF community, Baidu ABC Cloud Group and Edge Security Team won the Superuser Award for the unique nature of its Kata Containers and OpenStack use case as well as its integration and application of open infrastructure.
  • Combining OpenStack and Kubernetes to address users’ infrastructure needs at scale, Airship joined Kata Containers and Zuul as confirmed Open Infrastructure Projects supported by the OpenStack Foundation. SKT, Intel, Inspur and more companies presented their Airship use cases for developing infrastructure solutions.
  • Congratulations to Troila for being elected as a new Gold Member of the OpenStack Foundation! Learn more about it here

Summit keynote videos are already available, and breakout videos will be available on the Open Infrastructure videos page in the upcoming weeks. Thank you to our Shanghai Summit sponsors for supporting the event!

OpenStack Foundation (OSF)

  • The next OSF event will be a collaboration-centric event, happening in Vancouver, Canada June 8-11, 2020. Mark your calendars!
  • Troila was elected as a new Gold Member for the OpenStack Foundation at the Shanghai Board of Directors meeting.

Airship: Elevate your infrastructure

  • Last month, Airship was confirmed by OSF as a top level project — congratulations to the community!
  • The Airship community has made significant progress in Airship 2.0. 17% of planned work was completed, and another 18% is in progress and/or in review. The community is looking for more developers to contribute code. Interested in getting involved? Check out this page.

Kata Containers: The speed of containers, the security of VMs

OpenStack: Open source software for creating private and public clouds

  • Several OpenStack project teams, SIGs and working groups met during the Project Teams Gathering in Shanghai to prepare the Ussuri development cycle. Reports are starting to be posted to the openstack-discuss mailing-list.
  • Sławek Kapłoński, the Neutron PTL, recently reported that neutron-fwaas, neutron-vpnaas, neutron-bagpipe and neutron-bgpvpn are lacking interested maintainers. The Neutron team will drop those modules from future official OpenStack releases if nothing changes by the ussuri-2 milestone, February 14. If you are using those features and would like to step up to help, now is your chance!
  • We are looking for a name for the ‘V’ release of OpenStack, to follow the Ussuri release. Learn more about it in this post by Sean McGinnis
  • The next OpenStack Ops meetup will happen in London, UK on January 7-8. Stay tuned for registration information!

StarlingX: A fully featured cloud for the distributed edge

  • The StarlingX community met during the Project Teams Gathering in Shanghai to discuss topics like 4.0 release planning, documentation and how to improve the contribution process. You can check notes on their etherpad for the event.
  • The upcoming StarlingX 3.0 release will contain the Train version of OpenStack. The community is working on some last bits including testing and bug fixes before the release in December. You can find more information in StoryBoard about the release.

Zuul: Stop merging broken code

  • The Open Infrastructure Summit in Shanghai included a variety of talks, presentations, and discussions about Zuul; a quick project update from lead Zuul maintainer James Blair during keynotes set the tone for the days which followed.

Find the OSF at these upcoming Open Infrastructure community events

Questions / feedback / contribute

This newsletter is written and edited by the OSF staff to highlight open infrastructure communities. We want to hear from you! If you have feedback, news or stories that you want to share, reach us through community@openstack.org . To receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by Allison Price at November 26, 2019 09:11 PM

November 25, 2019

Ghanshyam Mann

Recap of Open Infrastructure Summit & PTG, Shanghai 2019

Open Infrastructure Summit, Shanghai 2019

The Open Infrastructure Summit, followed by the OpenStack PTG, was held in Shanghai, China from 4th Nov 2019 until 8th Nov 2019. The first 3 days were for the Summit, a marketing-oriented event including Forum sessions, and the last 3 days were for the Project Team Gathering (PTG), with one day of overlap.

I arrived in Shanghai on 1st Nov to participate in pre-summit events like Upstream Training and Board of Directors meeting.

    Upstream Institute Training Shanghai:

Like at other Summits, Upstream training was held in Shanghai over 1.5 days: the second half of 2nd Nov and a full day on 3rd Nov. Thanks to Lenovo and Jay for sponsoring the training this time too.


The first day had 9 mentors and ~20 students, and covered the introduction, registration and the governance part, including VM image setup etc. Students were from different countries, for example South Korea, India and of course China. Two developers from South Korea were interested in Swift contribution; they later joined the Swift PTG and interacted with the team. One developer from India is doing cloud testing of their baremetal nodes via QA tooling; I had further discussion with him at the QA PTG. I am happy to get this kind of interaction in every training, and it is useful for getting students on board with upstream activities.

The second day had fewer mentors and more students. A few other mentors and I could not participate in the training due to the Joint Leadership meeting.

    Ussuri cycle community-wide goals discussion:

Three goals were discussed in detail and how to proceed with each of them. Etherpad.

    Drop Python 2.7 Support:

Ussuri is the time to drop python 2 support from OpenStack. The plan and schedule were already discussed during the TC office hour and on the ML. It was agreed to make this a community-wide goal. We discussed keeping the CI/CD support for Swift, which is the only project keeping py2 support. Swift needs devstack to keep installing in a py2 env with the rest of the services on py3 (the same as the old jobs when Swift was on py2 by default in devstack). There is no oslo dependency in Swift, and all the other dependencies will be capped at their py2 versions. The requirements check job currently checks whether openstack/requirements lists two entries for a requirement. smcginnis’ patch to change the requirement check is already merged. Everything else will go as discussed on the ML. Work on this has already started, and patches for all the services are up for review now.

    Project Specific New Contributor & PTL Docs

As per feedback in the Forum sessions, this is a good goal which will make documentation more consistent. All projects should edit their contributor.rst to follow a more complete template and adjust/add PTL documentation. This has been accepted as a pre-approved Ussuri goal. Kim Hindhart is working on getting EU funding for people to work on OpenStack, and they like consistent documentation.

    Switch remaining legacy jobs to Zuul v3 and drop legacy support

Many projects are still not ready for this goal. The Grenade job is not yet on Zuul v3, and that needs to be finished first. A few projects are waiting for the big projects to finish their Zuul v3 migration first. This needs more work and can be a “pre-approved” goal for V, split so as to focus on the Grenade work in U. We will continue to review the proposed goal, pre-work etc.

Other than the above 3 goals, there were a few more ideas for goal candidates, good to go into the goal backlogs etherpad:
– cdent: stop using paste, pastedeploy and WSME.
Note from Chris: This does not need to be a community goal as such, but requires a common solution from the TC. WSME is still used, has contributions, and at least a core or two.

– cmurphy: Consistent and secure default policies. As per the forum discussion, this is going to a pop-up team first.

– Support matrix documentation to be consistent across projects. This is going with a pop-up team first (fungi can propose the pop-up team in governance). Richard Pioso (rpioso) will help fungi on this. Once a consistent framework is identified, the pop-up team can expire with the approval of a related cycle goal for implementing it across the remaining projects.

    OpenStack QA PTG & Forum Sessions Summary:

I wrote a separate blog post to summarize the QA discussions that happened in the Forum and PTG sessions.

    Nova API Policies defaults:


Nova planned to implement the policy defaults refresh by adopting the system scope and new default roles available in Keystone. This was planned for the Train cycle, when the spec merged, but the implementation could not be started; the Nova spec has now merged for the Ussuri cycle. The main challenge is completing this work in a single cycle so that users are not impacted by the upgrade more than once. We discussed various options, such as a flag to suppress the deprecation warning of the new policy enforcement. The plan is to get all the reviews up, put a procedural hold on the first patch, and later merge them all together. Keeping the code up to date after the first set merges, and more active review, will be required for this. The Keystone team will help review the changes. I am very positive this can be completed in the Ussuri cycle.

    Technical Committee:

Friday was a full day of Technical Committee discussions. It started with some fun when JP collected the number of TC members interested per topic, with the least interesting topic to be discussed first :). He did a good job of organizing the discussion with time-based checks. I am summarizing a few of the topics below:

    Select U release goals:

This session was to select the Ussuri goals. The Ussuri cycle has already started, so we have to finalize the goals as soon as possible. We agreed to proceed with the two goals below for the Ussuri cycle.

1. Drop Python 2.7 Support – Already Accepted.
2. Project Specific New Contributor & PTL Docs – Under Review

The goal “Switch remaining legacy jobs to Zuul v3 and drop legacy support” will be pre-selected for the V cycle; that does not mean stopping review of the proposed goal or any ongoing work. All ongoing efforts on this will continue.

    Structuring the goal selection schedule

Since the Summit and PTG events were merged, we have not done well on community-wide goal selection; it happened late in the Train cycle and is late again for Ussuri. Rico and I discussed having a schedule for goal selection in the TC. The main aim is to finalize the cycle's community-wide goals before the cycle starts. This etherpad has the drafted schedule, which all TC members agreed on. The first cycle enforcing this schedule will be challenging, but let's see how it goes. I will propose the schedule into the TC documents soon.

    Python 2 drop current progress and next steps: 

During the PTG, many projects discussed their plans to drop python2 support. Cinder decided to simply remove py2 compatibility and then adjust backports as needed for py27. Glance has greater challenges around WSGI. The other projects are almost good to go on this. DevStack also defaults to py3 now.
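In practice, the per-project patches for this goal mostly drop the py2 classifiers and raise the package metadata floor; a hedged sketch of the typical setup.cfg change (project name and exact version floor are illustrative):

```ini
# Hypothetical setup.cfg excerpt after dropping py2 support.
[metadata]
name = example-project
python-requires = >=3.6
classifier =
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7
```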

    Release naming convention:

This was again a not-so-fun discussion, but fortunately we concluded it. The agreement is that any community member can propose a name, and the TC will select the final one.

A few more topics were discussed at the PTG, which can be found in this etherpad or on the ML.

by Ghanshyam Mann at November 25, 2019 05:04 AM

November 24, 2019

Ghanshyam Mann

Open Infrastructure Summit, Shanghai 2019: QA Summit & PTG Summary

OpenStack Quality Assurance Summit & PTG Summary

Open Infrastructure Summit, Shanghai 2019

Open Infrastructure Summit followed by OpenStack PTG was held in Shanghai, China: 4th Nov 2019 till 8th Nov 2019.

The first 3 days were for the Summit, where we had the Forum sessions about user feedback on QA tooling on Monday, and the last 3 days were for the Project Team Gathering (PTG), with one day of overlap.

QA Forum sessions

    OpenStack QA – Project Update:  Wednesday, November 6, 10:15am-10:30am

We gave updates on what we finished in Train and a draft plan for the Ussuri cycle. Due to fewer contributors in QA, Train cycle activity decreased compared to Stein. We tried to maintain the daily QA activity and finished a few important things.

Slides: QA Project Update

    Users / Operators adoption of QA tools / plugins: Monday 1:20pm – 2:00pm

Etherpad. This was another useful session for QA, gathering feedback as well as information about downstream tooling.

A few tools we talked about:

  • Fault injection tests

One big concern shared by a few people was the long time it takes for Tempest patches to get merged. One idea to address this is to bring critical reviews to office hours.


  QA PTG: 6th – 8th Nov:

It was a small gathering this time, with one day of PTG on Wednesday. Even with a small number of developers, we had good discussions on many topics. I am summarizing the discussions:


  Train Retrospective  

The retrospective brought up a few key issues where we need improvement. We collected the action items below, including bug triage; untriaged QA bugs are increasing day by day.

  • Action:
    • Need to discuss blacklisted plugins and how to notify and remove them if dead – gmann
    • Start the process of community-goal work in QA – masayuki
    • Sprint for bug triage with a number of volunteers –
      • (chandankumar) Include one bug in each sprint for TripleO CI Tempest members
      • Triage the new bugs and then pick bugs based on priority
      • For the TripleO CI team we will track here: https://tree.taiga.io/project/tripleo-ci-board/ – chandankumar

  How to deal with an aging testing stack. 

With testtools being not very active, we need to think about alternatives or the most suitable options to solve this issue. We discussed a few options, which need further discussion on the ML.

  • Can we fork the dependencies of testtools into Tempest or stestr?
  • As we are removing py2.7 support in Tempest, we can completely ignore/remove the unittest2 things, but that is not the case for testtools?
  • Remove the support for unittest2 from testtools? py2.7 is going away from everywhere, and testtools can create a tag or something for py2.7 usage?
  • Since Python 2 is going EOL on 1st Jan 2020, let's create a tag and replace unittest2 with unittest for python3-only releases


  • Document the officially supported test runner for Tempest. – Soniya Vyas/Chandan Kumar
  • Start an ML thread to discuss the above options – gmann

  Remove/migrate .testr.conf to .stestr.conf

60 openstack/* repositories have both .stestr.conf AND .testr.conf. We don't need to have both files, at the least. Let's take a look at some of them and make a plan to remove what we can.

If both exist, remove .testr.conf and then verify that .stestr.conf has the correct test path. If only .testr.conf exists, migrate it to .stestr.conf.
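For reference, the replacement file is tiny; a hedged sketch of a minimal .stestr.conf (the test path here is illustrative and must point at the project's real test package):

```ini
[DEFAULT]
test_path=./exampleproject/tests
top_dir=./
```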

We need to figure out the purpose of the pbr .testr.conf code before removing it. Is this just old code, or is it necessary?

  Moving subunit2html script from os-testr

The os-testr runner piece of the os-testr project is deprecated, but the subunit2html tool still lives there and is widely used across the OpenStack ecosystem; can we move it somewhere else? I do not see any benefit in moving those scripts elsewhere. We asked Chandan to open an issue on stestr to discuss moving it into the stestr repo. mtreinish replied on this: os-testr is meant to be the place in OpenStack to host the ostestr runner wrapper/script, subunit2html, generate_subunit, etc. Just because ostestr is deprecated and being removed doesn't mean it's not the proper home for those other tools.

  Separate integrated service tests can be used in TripleO CI

TripleO CI maintains a separate file to run dependent tests per service. Tempest has per-service tox environments and integrated jobs, and the same can be used in TripleO CI.

For example:

  • tox environment for networking.

  RBAC testing strategy

This was a cross-project discussion of the strategy for positive/negative testing of system scope and the new defaults in Keystone. Keystone has implemented the new defaults and system scope in its policies and added unit tests to cover the new policies. Nova is implementing the same in the Ussuri cycle. As discussed at the Denver PTG, Tempest will implement new credentials for all 9 personas available in Keystone and slowly migrate its tests to use the new policies. That will be done via a flag switching Tempest to the system scope and new defaults; the flag will default to false so that the old policies keep being used for stable branch testing.

We can use Patrole tests or implement new tests in a Tempest plugin and verify the response. Both have the issue of performing the complete operation, which is not always required for policy verification. Running full functional tests is expensive and duplicates existing tests. One solution (we talked about it at the Denver PTG too) is a flag, along the lines of os-profiler's, to do just the policy check and return the API response with a specific return code.


  • Tempest to provide all 9 personas available from Keystone. Slowly migrate existing Tempest tests to run with the new policies.
  • We agreed on two ways to test the policies:
    1. Tempest-like tests in Tempest plugins, performing the complete operation and verifying things in the response, not just the policy return code. It depends on the project whether it wants to implement such tests.
    2. Unit/functional tests on the project's side.
  • Document both ways so that each project can adopt the most suitable one.

  How to remove the Tempest plugin sanity BLACKLIST

We have a Tempest plugin blacklist, which should be removed in the future if possible. Some entries shouldn't be Tempest plugins at all because they are just Neutron stadium things that have already moved to neutron-tempest-plugin but still exist in their original repos. Some are simply less active. Remove the plugins below from the BLACKLIST:

  • openstack/networking-generic-switch needs to be checked (setup.py/cfg?)


  • Add the start date in the blacklist doc so that we know how long a plugin has been blacklisted.
  • After 60 days: send an email notification to openstack-discuss, the PTL, the maintainer, and the TC to either fix it or remove it from governance.

  Python 2.7 drop plan

We discussed the next steps to drop the py2 from Tempest and other QA tools.


  • Will be done before milestone 2.
  • Create a new tag for python 2.7, saying it is the last py2-compatible one, and document that this Tempest tag needs Train upper-constraints.
  • Test the Tempest tag with Train u-c; if it fails, we will discuss.
  • TripleO and OSA are going to use CentOS 8 for Train and master.

  Adding New glance tests to Tempest

We discussed testing the new Glance v2 APIs and features. Below are the Glance features and the agreed points on how to test them.

  • Hide old images: A test can be added in Tempest. Hide the image and try to boot a server from it in a scenario test.
  • Delete barbican secrets from glance images: This test belongs in barbican-tempest-plugin, which can be run as part of the barbican gate using an existing job. Running the barbican job on the glance gate is not required; we can add a new (multi-store) job on the glance gate which can run this plus other new feature tests.
  • Multiple stores: The DevStack patch is already up; add a new Zuul job to set up multiple stores and run it on the glance gate with api and scenario tests. gmann to set up the zuulv3 job for that.
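For context, Glance's multiple-stores support is configuration-driven; a hedged sketch of a glance-api.conf excerpt enabling two filesystem stores (the backend names and data directories are illustrative, not taken from the DevStack patch mentioned above):

```ini
[DEFAULT]
enabled_backends = fast:file, slow:file

[glance_store]
default_backend = fast

[fast]
filesystem_store_datadir = /opt/stack/data/glance/fast

[slow]
filesystem_store_datadir = /opt/stack/data/glance/slow
```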

  Tempest volunteers for reviewing patches

We’ve noticed that the number of patches merged in October was lower than in September, and much lower than during the summer. This was also brought up in the feedback sessions. There is no perfect solution for this; nowadays QA has fewer active core developers. We encourage people to bring up critical or stuck patches in office hours.

  Improving Tempest cleanup

Tempest cleanup is not very stable and not perfectly designed. We have a spec up to redesign it but could not reach consensus on it. I am OK with moving to a resource prefix with a UUID. We should also extend the cleanup tool to plugins.

  Ussuri Priority & Planning

This was the last session of the PTG, which could not happen on Wednesday due to the strict closing-time policy of the conference venue, which I really liked (time-based working is much needed for IT people :)). We met on Thursday morning in the coffee area and discussed priorities for the Ussuri cycle. The QA Ussuri Priority etherpad has the priority items with assignees.

See you in Vancouver!

by Ghanshyam Mann at November 24, 2019 11:51 PM

Open Infrastructure Summit: QA Project Updates, Shanghai 2019

Open Infrastructure Summit, Shanghai 2019

        OpenStack QA – Project Update:  Wednesday, November 6, 10:15am-10:30am

This time there was no video recording of the project update. Here are the complete slides: QA Project Update.

Train cycle Stats:

by Ghanshyam Mann at November 24, 2019 12:31 AM

November 19, 2019


Create and manage an OpenStack-based KaaS child cluster

Deploying and managing Kubernetes clusters doesn't have to be complicated. Here's how to do it with Mirantis Kubernetes as a Service (KaaS).

by Nick Chase at November 19, 2019 05:02 PM

November 18, 2019

StackHPC Team Blog

High Performance Ethernet for HPC – Are we there yet?

Recently there has been a resurgence of interest in the use of Ethernet for HPC workloads, most notably with Cray's recent Slingshot announcements. In this article I examine some of the history of Ethernet in HPC and look at some of its advantages within modern HPC clouds.

Of course, Ethernet has been the mainstay of many organisations running High Throughput Computing large-scale cluster environments (e.g. geophysics, particle physics), although it does not (generally) hold the mind-share of organisations where conventional HPC workloads predominate, notwithstanding the fact that in many of these environments the operational workload for a particular application rarely goes above a small to moderate number of nodes. Here Infiniband has held sway for many years. A recent look at the TOP500 gives some indication of the spread of Ethernet vs. Infiniband vs. custom or proprietary interconnects in both system and performance share, or, as I often refer to them, the price-performance and performance segments of the HPC market.

Ethernet share of the TOP500

My interest in Ethernet was piqued some 15-20 years ago because it is a standard, and very early on there were mechanisms to obviate kernel overheads, which allowed some level of scalability even back in the days of 1Gbps. Even then, this meant one could exploit Landed-on-Motherboard network technology instead of more expensive PCI add-in cards. Since then, as we moved to 10Gbps and beyond (and I coincidentally joined Gnodal, later acquired by Cray), RDMA enablement (through RoCE and iWARP) allowed standard MPI environments to be supported, and with the 25, 50 and 100Gbps implementations, bandwidth and latency promised to be on par with Infiniband. As it is a standard, we would expect a healthy ecosystem of players in both the smart NIC and switch markets to flourish. For most switches such support is now standard (see next section). In terms of rNICs, Broadcom, Chelsio, Marvell and Mellanox currently offer products supporting either, or both, of the RDMA Ethernet protocols.

Pause for Thought (Pun Intended)

I think the answer to the question "are we there yet" is (isn't it always) going to be "it depends". That "depends" will largely be influenced by the segmentation of the market into the Performance, Price-Performance and Price regimes. The question is whether Ethernet can address the Price and Price-Performance regions, as opposed to the Performance region, where some of the deficiencies of Ethernet RDMA may well be exposed, e.g. multi-switch congestion at large scale; for moderately sized clusters whose nodes span only a single switch, it may well be a better fit.

So, for example, consider a cluster of 128 nodes (minus nodes for management, access and storage): if 25GbE could be assessed to be sufficient versus 100Gbps EDR, then I could build the system from a single 32-port 100GbE switch (using break-out cables), as opposed to multiple 36-port EDR switches; following the standard practice of over-subscription, I would end up with similar cross-sectional bandwidth to the single Ethernet switch anyway. Of course, within the bounds of a single switch the bandwidth would be higher for IB. I guess down the line, with 400GbE devices coming to a data centre soon, this balance will change.
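The break-out arithmetic above can be sketched directly (the counts are taken from the example; the script itself is only illustrative):

```shell
# One 32-port 100GbE switch with 4x25GbE break-out cables yields
# enough 25GbE endpoints for the 128-node example cluster.
PORTS=32
BREAKOUT=4
echo "$((PORTS * BREAKOUT)) x 25GbE endpoints from a single switch"
```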

Recently I had the chance to revisit this when running benchmarks on a bare-metal OpenStack system used for prototyping for the SKA (I'll come to OpenStack a bit later; just to note here that this system runs OpenStack to prototype an operating environment for the Science Data Processing Platform of the SKA).

I wanted to stress-test the networks, the compute nodes and, to some extent, the storage. StackHPC operates the system as a performance prototype platform on behalf of astronomers across the SKA community, so ensuring that performance is maintained across the system is critical. The system, eponymously named ALaSKA, looks like this.


ALaSKA is used to software-define various platforms of interest to various aspects of the SKA Science Data Processor. The two platforms of most interest currently are a container orchestration environment (previously Docker Swarm, now Kubernetes) and a Slurm-as-a-Service HPC platform.

Here we focus on the latter, which gives us a good opportunity to compare the performance of 100G IB vs 25G RoCE vs 25Gbps TCP vs 10G TCP (the last network is not shown in the diagram above but is used for provisioning). First, let us look more closely at the Slurm PaaS. From the base compute, storage and network infrastructure, we use OpenStack Kayobe to deploy the OpenStack control plane (based on Kolla-Ansible) and then marshal the creation of bare-metal compute nodes via the OpenStack Ironic service. The flow looks something like this, with the Ansible control host used to configure OpenStack (via a Bifrost service running on the seed node) as well as the network switches. GitHub provides the source repositories.


Further Ansible playbooks, together with OpenStack Heat, permit the deployment of the Slurm platform, based on the latest OpenHPC image, and various high-performance storage subsystems, in this case using BeeGFS Ansible playbooks. The graphic above depicts the resulting environment, with the addition of the OpenStack Monasca monitoring and logging service (depicted by the lizard logo). As we will see later, this provides valuable insight into system metrics (for both system administrators and end users).

So let us assume that we first want to address the price-performance and price-driven markets. At scale we need to be concerned about East-West traffic congestion between switches; this can be somewhat mitigated by the fact that with modern 100GbE switches we can break out to 25/50GbE, which increases the arity of a single switch and (likely) reduces congestion. This means we need to be able to justify the reduced bandwidth of the NIC. If the total system spans only a single switch then congestion may not be an issue, although further work may be required to understand end-point congestion.

To test the system's performance I used (my preference) HPCC and OpenFOAM as two benchmark environments. All tests used gcc, MKL and openmpi3, and no attempt was made to further optimise the applications. After all, all I want to do is run comparative tests of the same binary, changing run-time variables to target the underlying fabric. For Open MPI, this can be achieved as shown below. The system uses an OpenHPC image. At the BIOS level, hyperthreading is enabled, so I was careful to ensure that process placement pinned only half the number of available slots (I'm using Slurm) and mapped by CPU. This is important to know when we come to examine the performance dashboards below. Here are the specific mca parameters for targeting the fabrics.

DEV="roce ibx eth 10Geth"
for j in $DEV; do
  if [ "$j" == ibx ]; then
    MCA_PARAMS="--bind-to core --mca btl openib,self,vader --mca btl_openib_if_include mlx5_0:1"
  elif [ "$j" == roce ]; then
    MCA_PARAMS="--bind-to core --mca btl openib,self,vader --mca btl_openib_if_include mlx5_1:1"
  elif [ "$j" == eth ]; then
    MCA_PARAMS="--bind-to core --mca btl tcp,self,vader --mca btl_tcp_if_include p3p2"
  elif [ "$j" == 10Geth ]; then
    MCA_PARAMS="--bind-to core --mca btl tcp,self,vader --mca btl_tcp_if_include em1"
  elif [ "$j" == ipoib ]; then
    MCA_PARAMS="--bind-to core --mca btl tcp,self,vader --mca btl_tcp_if_include ib0"
  fi
  # launch the benchmark with the selected fabric, e.g.:
  # mpirun $MCA_PARAMS ./hpcc
done

In the results below, I compare performance across each network using HPCC at a size of 8 nodes (up to 256 cores, albeit 512 virtual cores are available, as described above). I think this covers the vast majority of cases in research computing.


HPCC Benchmark

The results for the major operations of the HPCC suite are shown below, together with a personal narrative on the performance. A more thorough description of the benchmarks can be found here.

8 nodes 256 cores

Benchmark 10GbE (TCP) 25GbE (TCP) 100Gb IB 25GbE RoCE
HPL_Tflops 3.584 4.186 5.476 5.233
PTRANS_GBs 5.656 16.458 44.179 17.803
MPIRandomAccess_GUPs 0.005 0.004 0.348 0.230
StarFFT_Gflops 1.638 1.635 1.636 1.640
SingleFFT_Gflops 2.279 2.232 2.343 2.322
MPIFFT_Gflops 27.961 62.236 117.341 59.523
RandomlyOrderedRingLatency_usec 87.761 100.142 3.054 2.508
RandomlyOrderedRingBandwidth_GBytes 0.027 0.077 0.308 0.092
  • HPL – We can see here that it is evenly balanced between low latency and bandwidth, with RoCE and IB on a par even with the reduced bandwidth of RoCE. In one sense this performance underlies the graphics shown above: in terms of HPL, Ethernet occupies ~50% of the share of total clusters, which is not matched in the performance share.
  • PTRANS – Performance pretty much in line with bandwidth.
  • GUPS – Latency dominated; IB wins by some margin.
  • STARFFT – Embarrassingly parallel (HTC use-case); no network effect.
  • SINGLEFFT – No effect; no comms.
  • MPIFFT – Heavily bandwidth dominated; see the effect of 100 vs 25 Gbps (no latency effect).
  • Random Ring Latency – See the effect of RDMA vs. TCP. Not sure why RoCE is better than IB, but it may be due to the random ordering?
  • Random Ring Bandwidth – In line with the 100Gbps (IB) vs 25Gbps (RDMA) vs TCP networks.


I took the standard motorbike benchmark and ran it on 128 (4-node) and 256 (8-node) cores on the same networks as above. I did not change the mesh sizing between runs, and thus at higher processor counts communication will be imbalanced. The results are shown below, showing very little difference between the RDMA networks despite the bandwidth difference.

Nodes(Processors) 100Gbps IB 25Gbps ROCE 25Gbps TCP 10Gbps TCP
8/(256) 87.64 93.35 560.37 591.23
4/(128) 99.83 101.49 347.19 379.32

Elapsed time in seconds. NB the increase in time for TCP when running on more processors!

Future Work

So far I have only looked at MPI communication. The next big thing to look at is storage, where the advantages of Ethernet need to be assessed not only in terms of performance, but also in terms of the natural advantage the Ethernet standard has in connectivity for the many network-attached devices available.

Why OpenStack

As mentioned above, one of the prototypical aspects of the ALaSKA system is to model operational aspects of the Science Data Processor element of the SKA. A good description of the SDP and its operational scenarios is given in the architectural description of the system. A description of the architecture and that prototyping can be found here.

Using Ethernet, and in particular High Performance Ethernet ("HPC Ethernet" in the parlance of Cray), holds a particular benefit in the case of on-premise cloud, where infrastructure may need to be isolated between multiple tenants. For IB and OPA this can be achieved using ad-hoc methods specific to each network; for Ethernet, however, multi-tenancy is native.

For many HPC scenarios, multi-tenancy is not important, nor even a requirement. For others, it is key and mandatory, e.g. secure clouds for clinical research. One aspect of multi-tenancy is shown in the analysis of the results, where we use OpenStack Monasca (a multi-tenant monitoring and logging service) and Grafana dashboards. More information on the architecture of Monasca can be found in a previous blog article.

Appendix – OpenStack Monasca Monitoring O/P


The plot below shows CPU usage and network bandwidth for the HPCC runs, using a Grafana dashboard and OpenStack Monasca monitoring-as-a-service. The 4 epochs correspond to IB, RoCE, 25Gbps (TCP) and 10Gbps (TCP). Total CPU usage sits at 50% because these are HT-enabled nodes mapped by core with 1 thread per core; thus we only consume 50% of the available resources. Network bandwidth is shown for 3 of the epochs: “Inbound ROCE Network Traffic”, “Inbound Infiniband Network Traffic” and “Inbound Bulk Data Network Traffic” (Bulk Data Network is an erstwhile name for the ingest network of the SDP).

HPCC performance data in Monasca

For CPU usage, a reduction in performance is observed in the TCP cases. This is further evidenced by a second plot showing system CPU, with heavy system overhead visible across the 4 separate epochs.

HPCC CPU performance data in Monasca

by John Taylor at November 18, 2019 02:00 AM

November 17, 2019

Christopher Smart

Use swap on NVMe to run more dev KVM guests, for when you run out of RAM

I often spin up a bunch of VMs for different reasons when doing dev work and unfortunately, as awesome as my little mini-ITX Ryzen 9 dev box is, it only has 32GB RAM. Kernel Samepage Merging (KSM) definitely helps, however when I have half a dozen or so VMs running and chewing up RAM, the kernel's Out Of Memory (OOM) killer will start executing them, like this.

[171242.719512] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d435\x2dtest\x2dvm\x2dcentos\x2d7\x2d00.scope,task=qemu-system-x86,pid=2785515,uid=107
[171242.719536] Out of memory: Killed process 2785515 (qemu-system-x86) total-vm:22450012kB, anon-rss:5177368kB, file-rss:0kB, shmem-rss:0kB
[171242.887700] oom_reaper: reaped process 2785515 (qemu-system-x86), now anon-rss:0kB, file-rss:68kB, shmem-rss:0kB

If I had more slots available (which I don't) I could add more RAM, but that's actually pretty expensive, plus I really like the little form factor. So, given it's just dev work, a relatively cheap alternative is to buy an NVMe drive and add a swap file to it (or dedicate the whole drive). This is what I've done on my little dev box (actually, I bought it with an NVMe drive, so adding the swap file came for free).

Of course the number of VMs you can run depends on the amount of RAM each VM actually needs for what you’re running on it. But whether I’m running 100 small VMs or 10 large ones, it doesn’t matter.

To demonstrate this, I spun up a bunch of CentOS 7 VMs at the same time and upgraded all their packages. Without swap I could comfortably run half a dozen VMs, but beyond that they would start getting killed. With a 100GB swap file I was able to get about 40 going!

Even with pages swapping in and out, I haven’t really noticed any performance decrease and there is negligible CPU time wasted waiting on disk I/O when using the machines normally.

The main advantage for me is that I can keep lots of VMs around (or spin up dozens) in order to test things, without having to juggle active VMs or hoping they won’t actually use their memory and have the kernel start killing my VMs. It’s not as seamless as extra RAM would be, but that’s expensive and I don’t have the slots for it anyway, so this seems like a good compromise.
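Setting this up only takes a few commands; a minimal sketch, assuming the NVMe filesystem is mounted at /mnt/nvme (the path and the 100G size are illustrative, and enabling the swap requires root):

```shell
# Create and enable a 100GB swap file on the NVMe filesystem.
SWAPFILE=/mnt/nvme/swapfile
fallocate -l 100G "$SWAPFILE"   # preallocate the file
chmod 600 "$SWAPFILE"           # swap files must not be world-readable
mkswap "$SWAPFILE"              # write the swap signature
sudo swapon "$SWAPFILE"         # enable it (root required)
# To make it permanent, add a line like this to /etc/fstab:
#   /mnt/nvme/swapfile none swap sw 0 0
```

Afterwards, `swapon --show` (or `free -h`) should list the new swap space.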

by Chris at November 17, 2019 07:26 AM

November 15, 2019

Slawek Kaplonski

My Summary of OpenStack PTG in Shanghai

This is my summary of the OpenStack PTG which took place in Shanghai in November 2019. It is a brief summary of all the discussions we had in the Neutron room during the 3-day event. Onboarding: slides from the onboarding session can be found here. In my opinion the onboarding was good; there were around 20 (or even more) people in the room during this session. Together with Miguel Lavalle we gave a talk about

November 15, 2019 12:56 PM

November 14, 2019

Nate Johnston

Shanghai PTG Summary - Remote

I attended the Neutron meetings for the OpenInfra PTG in Shanghai last week. I was not in Shanghai, so I participated entirely remotely over BlueJeans. Remote participation: typically I would work most of a day - 5-6 hours with a nap in the middle - and then be on the PTG for 3-5 hours in the evening. The timeshift was such that the scheduled block of meetings started at 8:00pm my time and ended at 3:30am.

November 14, 2019 07:55 PM

Ben Nemec

Oslo in Shanghai

Despite my trepidation about the trip (some of it well-founded!), I made it to Shanghai and back for the Open Infrastructure Summit and Project Teams Gathering. I even managed to get some work done while I was there. :-)

First, I recommend reading the opening of Colleen Murphy's blog post about the event (and the rest of it too, if you have any interest in what Keystone is up to). It does an excellent job of describing the week at a high level. To summarize in my own words, the energy of this event was a little off. Many regular contributors were not present because of the travel situation and there was less engagement from local contributors than I would have hoped for. However, that doesn't mean nothing good came out of it!

In fact, it was a surprisingly active week for Oslo, especially given that only myself and two other cores were there and we had limited discussion within the team. It turns out Oslo was a popular topic of conversation in various Forum sessions, particularly oslo.messaging. This led to some good conversation at the PTG and a proposal for a new Oslo library. Not only were both Oslo summit sessions well attended, but good questions were asked in both so people weren't just there waiting for the next talk. ;-) In fact, I went 10 minutes over time on the project update (oops!), in part because I hadn't really planned time for questions since I've never gotten any in the past. Not complaining though.

Read on for more detail about all of this.

oslo.messaging drivers

It should come as no surprise to anyone that one of the major pain points for OpenStack operators is RabbitMQ administration. Rabbit is a frequent bottleneck that limits the scale of deployed clouds. While it should be noted that this is not always Rabbit's fault, scaling of the message queue is a problem almost everyone runs into at some point when deploying large clouds. If you don't believe me, ask how many people attended the "How we used RabbitMQ in wrong way at a scale" presentation during the summit (which I will talk more about in a bit). The room was packed. This is definitely a topic of interest to the OpenStack community.

A few different solutions to this problem have been suggested. First, I'll talk about a couple of new drivers that have been proposed.


NATS

This was actually submitted to oslo.messaging even before the summit started. It's a new driver that uses the NATS messaging system. NATS makes some very impressive performance claims on its site, notably that its throughput is around an order of magnitude higher than RabbitMQ's. Anybody interested in being able to scale their cloud 10x just by switching their messaging driver? I thought so. :-)

Now, this is still in the early discussion phase and there are some outstanding questions. For one, the primary Python driver is not compatible with Eventlet (sigh...), which makes it unusable for oslo.messaging. A driver does exist that would work, but it doesn't seem to be actively maintained, and as a result we would likely be taking on not just a new oslo.messaging driver but also a new NATS library if we proceed with this. Given the issues we've had in the past with drivers becoming unmaintained and bit-rotting, this is a non-trivial concern. We're hoping to work with the driver's proposers to make sure there will be sufficient staffing to maintain it in the long run. If you are interested in helping out with this work, please contact us ASAP. Currently it is being driven by a single contributor, which is likely not sustainable.

We will also need to ensure that NATS can handle all of the messaging patterns that OpenStack uses. One of the issues with previous high performance drivers such as ZeroMQ or Kafka was that while they were great at some things, they were missing important functionality for oslo.messaging. As a result, that functionality either had to be bolted on (which reduces the performance benefits and increases the maintenance burden) or the driver had to be defined as notification-only, in which case operators end up having to deploy multiple messaging systems to provide both RPC and notifications. Even if the benefits are worth it, it's a hard sell to convince operators to deploy yet another messaging service when they're already struggling with the one they have. Fortunately, according to the spec the NATS driver is intended to be used for both so hopefully this won't be an issue.


gRPC

In one of the sessions, I believe "Bring your crazy idea", a suggestion was made to add a gRPC driver to oslo.messaging as well. Unfortunately, I think this is problematic because gRPC is also not compatible with Eventlet, and I'm not sure there's any way to make it work. It's also not clear to me that we need multiple alternatives to RabbitMQ. As I mentioned above, we've had problems in the past with alternative drivers not being maintained, and the more drivers we add the more maintenance burden we take on. Given that the oslo.messaging team is likely shrinking over the next cycle, I don't know that we have the bandwidth to take on yet another driver.

Obviously if someone can do a PoC of a gRPC driver and show that it has significant benefits over the other available drivers then we could revisit this, but until that happens I consider this a non-starter.

Out-of-tree Drivers

One interesting suggestion that someone made was to implement some of these proposed drivers outside of oslo.messaging. I believe this should be possible with no changes to oslo.messaging because it already makes use of generic entry points for defining drivers. This could be a good option for incubating new drivers or even as a longer term solution for drivers that don't have enough maintainers to be included in oslo.messaging itself. We'll need to keep this option in mind as we discuss the new driver proposals.
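As a rough sketch of how this could work: oslo.messaging already loads its transport drivers from the `oslo.messaging.drivers` setuptools entry-point namespace, so an external package can register one. The package and module names below are invented for illustration:

```ini
# setup.cfg of a hypothetical out-of-tree driver package
[metadata]
name = oslo-messaging-nats

[entry_points]
oslo.messaging.drivers =
    nats = oslo_messaging_nats.driver:NATSDriver
```

With such a package installed, a transport URL beginning with nats:// would select the driver without any changes to oslo.messaging itself.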

Reduce the amount of RPC in OpenStack

This also came out of the crazy idea session, but I don't recall that there was much in the way of specifics (I was distracted chatting with tech support in a failed attempt to get my cell phone working during this session). In general, reducing the load on the messaging layer would be a good thing though. If anyone has suggestions on ways to do this please propose them on the openstack-discuss mailing list.


LINE's oslo.messaging Changes

Now we get to some very concrete solutions to messaging scaling that have already been implemented. LINE gave the RabbitMQ talk I mentioned earlier and had some novel approaches to the scaling problems they encountered. I suggest watching the recording of their session when it becomes available, because there was a lot of interesting material in it. For this post, I'm going to focus on some of the changes they made to oslo.messaging in their deployment that we're hoping to get integrated upstream.

Separate Notification Targets

One important architecture decision that LINE made was to use a separate RabbitMQ cluster for each service. This obviously reduces the load on an individual cluster significantly, but it isn't necessarily the design that oslo.messaging assumes. As a result, we have only one configuration section for notifications, but in a split architecture such as LINE is using you may want service-specific notifications to go to the service-specific Rabbit cluster. The spec linked in the title for this section was proposed to provide that functionality. Please leave feedback on it if this is of interest to you.
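For reference, oslo.messaging already lets operators point notifications at a different broker than RPC via a separate transport URL; the spec extends this idea to per-service notification targets. A sketch (the hostnames and credentials are illustrative):

```ini
# nova.conf - RPC and notifications on separate RabbitMQ clusters
[DEFAULT]
transport_url = rabbit://nova:secret@rabbit-nova:5672/

[oslo_messaging_notifications]
transport_url = rabbit://nova:secret@rabbit-notifications:5672/
```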

oslo.messaging instrumentation and oslo.metrics

One of the ways LINE determined where their messaging bottlenecks were was some instrumentation that they added to oslo.messaging to provide message-level metrics. This allowed them to get very granular data about what messages were causing the most congestion on the messaging bus. In order to collect these metrics, they created a new library that they called oslo.metrics. In essence, the oslo.messaging instrumentation calls oslo.metrics when it wants to output a metric, oslo.metrics then takes that data, converts it to a format Prometheus can understand, and serves it on an HTTP endpoint that the oslo.metrics library creates. This allowed them to connect the oslo.messaging instrumentation to their existing telemetry infrastructure.
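The mechanics might look something like the following toy sketch: instrumentation increments a counter, and an HTTP endpoint serves the counters in Prometheus text exposition format. The metric names and labels here are invented for illustration, not the actual oslo.metrics schema.

```python
# Minimal sketch of the oslo.metrics idea: collect counters from
# messaging instrumentation and serve them in Prometheus text
# exposition format over HTTP.
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

_calls = Counter()  # (method, target) -> invocation count

def record_rpc_call(method, target):
    """Called from messaging instrumentation when a message is sent."""
    _calls[(method, target)] += 1

def render_prometheus():
    """Render the collected counters in Prometheus text format."""
    lines = ["# TYPE oslo_messaging_rpc_calls_total counter"]
    for (method, target), count in sorted(_calls.items()):
        lines.append(
            'oslo_messaging_rpc_calls_total'
            '{method="%s",target="%s"} %d' % (method, target, count))
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_prometheus().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    record_rpc_call("select_destinations", "scheduler")
    HTTPServer(("", 9102), MetricsHandler).serve_forever()
```

Prometheus would then scrape the endpoint like any other exporter, which is what lets the instrumentation plug into existing telemetry infrastructure.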

Interestingly, this concept came up in other discussions throughout the week as well, so we're hoping that we can get oslo.metrics upstreamed (currently it is something they implemented downstream that is specific to their deployment) and used in more places. Another interesting related possibility was to add a new middleware to oslo.middleware that could do a similar thing for the API services and potentially provide useful performance metrics from them.

We had an extended discussion with the LINE team about this at the Oslo PTG table, and the next steps will be for them to fill out a spec for the new library and hopefully make their code changes available for review. Once that is done, we had commitments from a number of TC members to review and help shepherd this work along. All in all, this seems to be an area of great interest to the community and it will be exciting to see where it goes!

Policy Improvements

I'm going to once again refer you to Colleen's post, specifically the "Next Steps for Policy in OpenStack" section since this is being driven more by Keystone than Oslo. However, one interesting thing that was discussed with the Nova team that may affect Oslo was how to manage these changes if they end up taking more than one cycle. Because the oslo.policy deprecation mechanism is used to migrate services to the new-style policy rules, operators will start seeing quite a few deprecation messages in their logs once this work starts. If it takes more than one cycle then that means they may be seeing deprecations for multiple cycles, which is not ideal.

Currently Nova's plan is to queue up all of their policy changes in one big patch series of doom and once they are all done merge the whole thing at once. It remains to be seen how manageable such a patch series that touches code across the project will be though. If it proves untenable, we may need to implement some sort of switch in oslo.policy that would allow deprecations to be temporarily disabled while this work is ongoing, and then when all of the policy changes have been made the switch could be flipped so all of the deprecations take effect at once. As of now I have no plans to implement such a feature, but it's something to keep in mind as the other service projects get serious about doing their policy migrations.
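Such a switch is hypothetical, and no such oslo.policy option exists today, but the intended behaviour is easy to sketch with the stdlib warnings module (the rule names are invented examples):

```python
# Hypothetical sketch of a policy-deprecation switch; oslo.policy has
# no such option today.
import warnings

def deprecate_rule(old_name, new_name, suppress=False):
    """Warn that a policy rule was replaced, unless suppression is on."""
    if suppress:
        return  # migration in progress: keep operator logs quiet
    warnings.warn(
        'Policy "%s" was deprecated in favor of "%s"' % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2)
```

A service would pass suppress=True while the multi-cycle migration is in flight, then flip it off so all of the deprecations take effect at once.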


Unified Limits

The news is somewhat mixed on this front. Unfortunately, the people (including me) who have been most involved in this work from the Keystone and Oslo sides are unlikely to be able to drive it to completion due to changing priorities. However, there is still interest from the Nova side, and I heard rumors at the PTG that there may be enough operator interest in the common quota work that they would be able to have someone help out too. It would be great if this is still able to be completed, as it would be a shame to waste all of the design work and implementation of unified limits that has already been done. The majority of the initial API is available for review and just needs some massaging to be ready to merge. Once that happens, projects can start consuming it and provide feedback on whether it meets their needs.

Demo of Oslo Tools That Make Life Easier for Operators

A bit of shameless self-promotion, but this is a presentation I did in Shanghai. The recording isn't available yet, but I'll link it once it is. In essence, this was my attempt to evangelize some Oslo tools that have been added somewhat recently but people may not have been aware of. It covers what the tools are good for and how to actually use them.


As I tweeted on the last day of the PTG, this was a hard event for me to leave. Changes in my job responsibilities mean this was likely my last summit and my last opportunity to meet with the OpenStack family face-to-face. Overall it was a great week, albeit with some rough edges, which is a double-edged sword. If the week had gone terribly maybe I wouldn't have been so sad to leave, but on the other hand it was nice to go out on a high note.

If you made it this far, thanks! Please don't hesitate to contact me with any comments or questions.

by bnemec at November 14, 2019 06:35 PM

SUSE Conversations

The Brains Behind the Books – Part VII: Alexandra Settle

The content of this article has been contributed by Alexandra Settle, Technical Writer at the SUSE Documentation Team. It is part of a series of articles focusing on SUSE Documentation and the great minds that create the manuals, guides, quick starts, and many more helpful documents.       A Dream of  Ice Cream Shops and Lego […]

The post The Brains Behind the Books – Part VII: Alexandra Settle appeared first on SUSE Communities.

by chabowski at November 14, 2019 12:23 PM

November 13, 2019

Colleen Murphy

Shanghai Open Infrastructure Forum and PTG

The Open Infrastructure Summit, Forum, and Project Teams Gathering was held last week in the beautiful city of Shanghai. The event was held in the spirit of cross-cultural collaboration and attendees arrived with the intention of bridging the gap with a usually faraway but significant part of the OpenStack community …

by Colleen Murphy at November 13, 2019 01:00 AM

Sean McGinnis

November 2019 OpenStack Board Notes

The Open Infrastructure Summit was held in mainland China for the first time the week of November 4th, 2019, in Shanghai. As usual, we took advantage of the opportunity of having so many members in one place by having a Board of Directors meeting on Sunday, November 3.

Attendance was a little lighter due to visa challenges, travel budgets, and other issues. But we still had a quorum with a lot of folks in the room, and I’m sure it was a nice change for our Chinese board members and others from the APAC region.

The original meeting agenda is published on the wiki as usual.

OSF Updates

Following the usual pattern, Jonathan Bryce kicked things off with an update of Foundation and project activity.

One interesting thing that really stood out to me, which Jonathan also shared the next day in the opening keynotes, was an analyst report putting OpenStack's market at $7.7 billion in 2020. I am waiting for those slides to be published, but I think this really showed that despite the decrease in investment by companies in the development of OpenStack, its adoption is stable and growing.

This was especially highlighted in China, with companies like China UnionPay, China Mobile, and other large companies from other industries increasing their use of OpenStack, and public clouds like Huawei's and other local service providers basing their services on top of OpenStack.

I can definitely state from experience after that week that access to the typical big three US public cloud providers is a challenge through the Great Firewall. Being able to base your services on a borderless open source option like OpenStack is attractive given the current political pressures. A community-based solution, rather than a foreign tech company's offering, probably makes a lot of sense and is helping drive this adoption.

Of course, telecom adoption is still growing as well. I'm not as involved in that space, but it really seems like OpenStack is becoming the de facto standard for programmable infrastructure on which to base dynamic NFV solutions, both directly with VMs and bare metal, and as a locally controlled platform to serve as the underlying infrastructure for Kubernetes.

Updates and Community Reports

StarlingX Progress Report

The StarlingX project has made a lot of progress over the last several months. They are getting closer and closer to the latest OpenStack code. They have been actively working on getting their custom changes merged upstream so they do not need to continue maintaining a fork. So far, they have been able to get a lot of changes in to various projects. They hope to eventually be able to just deploy standard OpenStack services configured to meet their needs, focusing instead on the services on top of OpenStack that make StarlingX attractive and a great solution for edge infrastructure.

Indian Community Update

Prakash Ramchandran gave an update on the various meetups and events being organized across India. This is a large market for OpenStack. Recently approved government initiatives could make this an ideal time to help nurture the Indian OpenStack community.

I’m glad to see all of the activity that Prakash has been helping support there. This is another region where I expect to see a lot of growth in OpenStack adoption.

Interop Working Group

Egle gave an update on the Interop WG activity, and the second 2019 set of changes was approved. Nothing too exciting there, with just minor updates to the interop requirements.

The larger discussion was about the need for and the health of the Interop WG. Chris Hoge was a very active contributor to this, but he recently left the OSF, and the OpenStack community, to pursue a different opportunity. Egle Sigler is really the only one left on the team, and she has shared that she would not be able to do much more with the group than keep the lights on.

This team is responsible for the guidelines that must be followed to certify that a service or distribution of OpenStack meets the minimum functionality requirements to be consistent with other OpenStack deployments. This certification is needed to be able to use the OpenStack logo and be called "OpenStack Powered".

I think there was pretty unanimous agreement that this kind of thing is still very important. Users need to be able to have a consistent user experience when moving between OpenStack-based clouds. Inconsistency would lead to unexpected behaviors or responses and a poor user experience.

For now it is a call for help and to raise awareness. It did make me think about how we've been able to decentralize some efforts within the community, like moving documentation into each team's repos rather than having a centralized docs team and docs repo. I wonder if we can put some of this work on the teams themselves to mark certain API calls as "core", with some testing in place to ensure none of those APIs change or start producing different results. Something to think about at least.
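As a toy illustration of that idea (the field names are invented, not an actual interop guideline), each team could freeze the "core" fields of an API response and fail a test if a change removes any of them:

```python
# Toy interop guard: freeze the "core" fields of an API response and
# report any that a change has removed or renamed.
CORE_SERVER_FIELDS = frozenset({"id", "name", "status", "flavor", "image"})

def missing_core_fields(response, required=CORE_SERVER_FIELDS):
    """Return the core fields absent from a response (empty = compliant)."""
    return required - response.keys()
```

A CI job asserting `missing_core_fields(response) == set()` against a live deployment would catch core-API drift in the team's own gate, without a centralized interop team.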

First Contact SIG Update

The First Contact SIG works on things to make getting involved in the community easier. They’ve done a lot of work in the past on training and contributor documentation. They’ve recently added a Contributing Organization Guide that is targeted at the organization management level to help them understand how they can make an impact and help their employees to be involved and productive.

That’s an issue we’ve had to varying degrees in the past. Companies have had good intentions of getting involved, but they are not always sure where to start. Or they task a few employees to contribute without a good plan on how or where to do so. I think it will be good to have a place to direct these companies, to help them understand how to work with OpenStack and an open source community.

Troila Gold Member Application

Troila is an IT services company in China that provides a cloud product based on OpenStack to their customers. They have been using OpenStack for some time and saw the value in becoming an OSF Gold level sponsor.

As part of the Member Committee, Rob Esker and I met with them the week prior to go over their application and answer any questions and give feedback. That preview was pretty good, and Rob and I only had minor suggestions for them to help highlight what they have been doing with OpenStack and what their future plans were.

They had taken these suggestions and made updates to their presentation, and I think they did a very nice job explaining their goals. There was some discussion and additional questions from the board, but after a quick executive session, we voted and approved Troila as the latest Gold member of the OpenStack Foundation.

Combined Leadership Meeting

The second half of the day was a joint session with the Board and the Technical Committees or Technical Steering Committees of the OpenStack, StarlingX, Airship, Kata, and Zuul projects. Each team gave a community update for their respective areas.

My biggest takeaway from this was that although we are under-resourced in some areas, we really do have a large and very active community of people who really care about the things they are working on. Seeing growing adoption for things like Kata Containers and Zuul is really exciting.

Next Meeting

The next meeting will be a conference call on December 10th. No word yet on the agenda for that, but I wouldn't expect too much so soon after Shanghai. I expect there will probably be some buzz about the annual elections coming up.

Once available, the agenda will be published to the usual spot.

I was only able to finish out my term because the rest of the board voted to allow an exception to the two-seats-per-company limit after I rejoined Dell halfway through the year. That exception won’t apply for the next election, so if the three of us from Dell all hope to continue, one of us isn’t going to be able to.

I’ve waffled on this a little, but at least right now, I do think I am going to run for election again. Prakash has been doing some great work with his participation in the India OpenStack community, so I will not feel too bad if I lose out to him. I do think I’ve been more integrated in the overall development community, so since an Individual Director is supposed to be a representative for the community, I do hope I can continue. That will be up to the broader community, so I am not going to worry about it. The community will be able to elect those they support, so no matter what it will be good.

by Sean McGinnis at November 13, 2019 12:00 AM

November 12, 2019

Sean McGinnis

Why is the Cinder mascot a horse?!

I have to admit, I have to laugh to myself every time I see the Cinder mascot in a keynote presentation.

Cinder horse mascot

History (or, why the hell is that the Cinder mascot!)

The reason at least a few of us find it so funny is that it’s a bit of an inside joke.

Way back in the early days of Cinder, someone from SolidFire came up with a great looking cinder block logo for the project. It was along the style of the OpenStack logo at the time and was nice and recognizable.

Cinder logo

Then around 2016, they decided it was time to refresh the OpenStack logo and make it look more modern and flat. Our old logo no longer matched the overall project, but we still loved it.

I did make an attempt to update it. I made a stylized version of the Cinder block logo using the new OpenStack logo as a basis for it. I really wish I could find it now, but I may have lost the image when I switched jobs. You may still see it on someone’s laptop - I had a very small batch of stickers made while I was still Cinder PTL.

It was soon after the OpenStack logo change that the Foundation decided to introduce mascots for each project. They asked each team to think of an animal that they could identify with. It was supposed to be a fun exercise for the teams to be able to pick their own kind of logo, with graphic designers coming up with very high quality images.

The Cinder team didn't really have an obvious animal, at least not as obvious as a cinder block had been. Our horse came out of a brainstorming session during one of our midcycle meetups in Fort Collins, CO.

Trying to think of something that would actually represent the team, we were talking over what Cinder actually was. We were mostly all from different storage vendors. We refer to the different storage devices that are used with Cinder as backends.

Backends are also what some call butts. Butts… asses. Donkeys are also called asses. Donkey!

One or two people on the team had cultural objections to having a donkey as a mascot. They didn't think it was a good representation of our project. So we compromised and went with a horse.

So we asked for a horse to be our mascot. The initial design they came up with was a Ferrari-looking stallion, way too sporty and fierce for our team. Even though the OpenStack Foundation had actually published it and even created some stickers, we explained our, erm… thought process… behind coming up with the horse in the first place. The design team was great and went back to the drawing board. The result is the back-end view of the horse that we have today. They even worked a little ‘C’ into the swish of the horse’s tail.

So that’s the story behind the Cinder logo. It’s just because we’re all a bunch of backends.

by Sean McGinnis at November 12, 2019 12:00 AM

November 11, 2019


Community Blog Round Up 11 November 2019

As we dive into the Ussuri development cycle, I’m sad to report that there’s not a lot of writing happening upstream.

If you’re one of those people waiting for a call to action, THIS IS IT! We want to hear about your story, your problem, your accomplishment, your analogy, your fight, your win, your loss – all of it.

And, in the meantime, Adam Young says it’s not that cloud is difficult, it’s networking! Fierce words, Adam. And a super fierce article to boot.

Deleting Trunks in OpenStack before Deleting Ports by Adam Young

Cloud is easy. It is networking that is hard.

Read more at https://adam.younglogic.com/2019/11/deleting-trunks-before-ports/

by Rain Leander at November 11, 2019 01:46 PM

November 09, 2019



Aptira OSN Day

The Open Networking technology landscape has evolved quickly over the last two years. How can Telco’s keep up?

Our team of Network experts have used Software Defined Networking techniques for many different use cases, including Traffic Engineering, Segment Routing, Integration and Automated Traffic Engineering, and many more, addressing many of the key challenges associated with networks, including security, volume and flexibility concerns, to provide customers with an uninterrupted user experience.

At OSN Day, we will be helping attendees to learn about the risks associated with 5G networks. Edge Compute is needed for 5G and 5G-enabled use cases, but currently 5G-enabled use cases are ill-defined and incremental revenue is uncertain. Therefore, it’s not clear what is actually required, and the Edge business case is risky. We’ll be on site explaining how to mitigate against these risks, ensuring successful network functionality through the implementation of a risk-optimised approach to 5G. You can download the full whitepaper here.

We will also have our amazingly talented Network Consultant Farzaneh Pakzad presenting in The Programmable Network breakout track. Farzaneh will be comparing, rating and evaluating each of the most popular Open Source SDN controllers in use today. This comparison will be useful for organisations selecting the right SDN controller for their platform, one that matches their network design and requirements.

Farzaneh has a PhD in Software Defined Networks from the University of Queensland. Her research interests include Software Defined Networks, Cloud Computing and Network Security. During her career, Farzaneh has provided advisory service for transport SDN solutions and implemented Software Defined Networking Wide Area Network functionalities for some of Australia’s largest Telco’s.

We’ve got some great swag to give away and will also be running a demonstration of Tungsten Fabric as a Kubernetes CNI, so if you’re at OSN Day make sure you check out Farzaneh’s session in Breakout room 2 and also visit the team of Aptira Solutionauts in the expo room. They can help you create, design and deploy the network of tomorrow.

Ready to move your network into the software defined future?
Automate your network with ONAP.


The post OSN-Day appeared first on Aptira.

by Jessica Field at November 09, 2019 12:53 PM

November 07, 2019

Adam Young

Deleting Trunks in OpenStack before Deleting Ports

Cloud is easy. It is networking that is hard.

Red Hat supports installing OpenShift on OpenStack. As a Cloud SA, I need to be able to demonstrate this, and make it work for customers. As I was playing around with it, I found I could not tear down clusters due to a dependency issue with ports.

When building and tearing down network structures with Ansible, I had learned the hard way that there were dependencies: routers came down before subnets, and so on. But the latest round had me scratching my head. I could not get ports to delete, and the error message was no help.

I was able to figure out that the ports linked to security groups. In fact, I could unset almost all of the dependencies using the port set command line. For example:

openstack port set openshift-q5nqj-master-port-1  --no-security-group --no-allowed-address --no-tag --no-fixed-ip

However, I still could not delete the ports. I did notice that there was a trunk_details section at the bottom of the port show output:

trunk_details         | {'trunk_id': 'dd1609af-4a90-4a9e-9ea4-5f89c63fb9ce', 'sub_ports': []} 

But there is no way to “unset” that. It turns out I had it backwards: you need to delete the trunk first. A message from Kristi Nikolla:

the port is set as the parent for a “trunk” so you need to delete the trunk firs

Kristi In IRC
curl -H "x-auth-token: $TOKEN" https://kaizen.massopen.cloud:13696/v2.0/trunks/

It turns out that you can do this with the CLI…at least I could.

$ openstack network trunk show 01a19e41-49c6-467c-a726-404ffedccfbb
admin_state_up UP
created_at 2019-11-04T02:58:08Z
id 01a19e41-49c6-467c-a726-404ffedccfbb
name openshift-zq7wj-master-trunk-1
port_id 6f4d1ecc-934b-4d29-9fdd-077ffd48b7d8
project_id b9f1401936314975974153d78b78b933
revision_number 3
status DOWN
tags [‘openshiftClusterID=openshift-zq7wj’]
tenant_id b9f1401936314975974153d78b78b933
updated_at 2019-11-04T03:09:49Z

Here is the script I used to delete them. Notice that the status was DOWN for all of the ports I wanted gone.

for PORT in $(openstack port list | awk '/DOWN/ {print $2}'); do
    TRUNK_ID=$(openstack port show $PORT -f json | jq -r '.trunk_details | .trunk_id')
    echo port $PORT has trunk $TRUNK_ID
    openstack network trunk delete $TRUNK_ID
done

Kristi had used the curl command because he did not have the network trunk option in his CLI. Turns out he needed to install python-neutronclient first.

by Adam Young at November 07, 2019 07:27 PM

November 06, 2019

StackHPC Team Blog

Worlds Collide: Virtual Machines & Bare Metal in OpenStack

Ironic's mascot, Pixie Boots

To virtualise or not to virtualise?

If performance is what you need, then there's no debate - bare metal still beats virtual machines; particularly for I/O intensive applications. However, unless you can guarantee to keep it fully utilised, iron comes at a price. In this article we describe how Nova can be used to provide access to both hypervisors and bare metal compute nodes in a unified manner.


When support for bare metal compute via Ironic was first introduced to Nova, it could not easily coexist with traditional hypervisor-based workloads. Reported workarounds typically involved the use of host aggregates and flavor properties.

Scheduling of bare metal is covered in detail in our bespoke bare metal blog article (see Recap: Scheduling in Nova).

Since the Placement service was introduced, scheduling has significantly changed for bare metal. The standard vCPU, memory and disk resources were replaced with a single unit of a custom resource class for each Ironic node. There are two key side-effects of this:

  • a bare metal node is either entirely allocated or not at all
  • the resource classes used by virtual machines and bare metal are disjoint, so we could not end up with a VM flavor being scheduled to a bare metal node
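A toy model makes that disjointness concrete (the inventories and requests below are illustrative): a request expressed in VCPU/MEMORY_MB can never land on a provider that only exposes a custom resource class, and vice versa.

```python
# Toy model of Placement candidate selection with disjoint resource
# classes: hypervisors expose standard classes, while each bare metal
# node exposes a single unit of a custom class.
providers = {
    "hypervisor1": {"VCPU": 16, "MEMORY_MB": 65536},
    "goldnode1": {"CUSTOM_GOLD": 1},
}

def candidates(request):
    """Return providers that can satisfy every requested amount."""
    return sorted(
        name for name, inventory in providers.items()
        if all(inventory.get(rc, 0) >= amount
               for rc, amount in request.items()))

vm_request = {"VCPU": 1, "MEMORY_MB": 1024}  # like the vm-tiny flavor
bm_request = {"CUSTOM_GOLD": 1}              # like the bare-metal-gold flavor
```

candidates(vm_request) only ever returns hypervisors, and candidates(bm_request) only bare metal nodes, which is the all-or-nothing behaviour described above.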

A flavor for a 'tiny' VM might look like this:

openstack flavor show vm-tiny -f json -c name -c vcpus -c ram -c disk -c properties
{
  "name": "vm-tiny",
  "vcpus": 1,
  "ram": 1024,
  "disk": 1,
  "properties": ""
}

A bare metal flavor for 'gold' nodes could look like this:

openstack flavor show bare-metal-gold -f json -c name -c vcpus -c ram -c disk -c properties
{
  "name": "bare-metal-gold",
  "vcpus": 64,
  "ram": 131072,
  "disk": 371,
  "properties": "resources:CUSTOM_GOLD='1', ..."
}

Note that the vCPU/RAM/disk resources are informational only, and are zeroed out via properties for scheduling purposes. We will discuss this further later on.

With flavors in place, users choosing between VMs and bare metal is handled by picking the correct flavor.

What about networking?

In our mixed environment, we might want our VMs and bare metal instances to be able to communicate with each other, or we might want them to be isolated from each other. Both models are possible, and work in the same way as a typical cloud - Neutron networks are isolated from each other until connected via a Neutron router.

Bare metal compute nodes typically use VLAN or flat networking, although with the right combination of network hardware and Neutron plugins other models may be possible. With VLAN networking, assuming that hypervisors are connected to the same physical network as bare metal compute nodes, then attaching a VM to the same VLAN as a bare metal compute instance will provide L2 connectivity between them. Alternatively, it should be possible to use a Neutron router to join up bare metal instances on a VLAN with VMs on another network e.g. VXLAN.

What does this look like in practice? We need a combination of Neutron plugins/drivers that support both VM and bare metal networking. To connect bare metal servers to tenant networks, it is necessary for Neutron to configure physical network devices. We typically use the networking-generic-switch ML2 mechanism driver for this, although the networking-ansible driver is emerging as a promising vendor-neutral alternative. These drivers support bare metal ports, that is Neutron ports with a VNIC_TYPE of baremetal. Vendor-specific drivers are also available, and may support both VMs and bare metal.

Where's the catch?

One issue that more mature clouds may encounter is around the transition from scheduling based on standard resource classes (vCPU, RAM, disk), to scheduling based on custom resource classes. If old bare metal instances exist that were created in the Rocky release or earlier, they may have standard resource class inventory in Placement, in addition to their custom resource class. For example, here is the inventory reported to Placement for such a node:

$ openstack resource provider inventory list <node UUID>
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit |  total |
| VCPU           |              1.0 |       64 |        0 |         1 |        1 |     64 |
| MEMORY_MB      |              1.0 |   131072 |        0 |         1 |        1 | 131072 |
| DISK_GB        |              1.0 |      371 |        0 |         1 |        1 |    371 |
| CUSTOM_GOLD    |              1.0 |        1 |        0 |         1 |        1 |      1 |

If this node is allocated to an instance whose flavor requested (or did not explicitly zero out) standard resource classes, we will have a usage like this:

$ openstack resource provider usage show <node UUID>
| resource_class |  usage |
| VCPU           |     64 |
| MEMORY_MB      | 131072 |
| DISK_GB        |    371 |
| CUSTOM_GOLD    |      1 |

If this instance is deleted, the standard resource class inventory will become available, and may be selected by the scheduler for a VM. This is not likely to end well. What we must do is ensure that these resources are not reported to Placement. This is done by default in the Stein release of Nova, and Rocky may be configured to do the same by setting the following in nova.conf:

report_ironic_standard_resource_class_inventory = False

However, if we do that, then Nova will attempt to remove inventory from Placement resource providers that is already consumed by our instance, and will receive an HTTP 409 Conflict. This will quickly fill our logs with unhelpful noise.

Flavor migration

Thankfully, there is a solution. We can modify the embedded flavor in our existing instances to remove the standard resource class inventory, which will result in the removal of the allocation of these resources from Placement. This will allow Nova to remove the inventory from the resource provider. There is a Nova patch started by Matt Riedemann which will remove our standard resource class inventory. The patch needs pushing over the line, but works well enough to be cherry-picked to Rocky.

The migration can be done offline or online. We chose to do it offline, to avoid the need to deploy this patch. For each node to be migrated:

nova-manage db ironic_flavor_migration --resource_class <node resource class> --host <host> --node <node UUID>

Alternatively, if all nodes have the same resource class:

nova-manage db ironic_flavor_migration --resource_class <node resource class> --all

You can check that the instances' embedded flavors have been updated correctly via the database:

sql> use nova
sql> select flavor from instance_extra;
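
As a rough illustration (not from the original post), a check along these lines can verify an embedded flavor programmatically. The exact JSON layout of the `flavor` column varies by Nova version, so the key path used here is an assumption to adjust:

```python
# Sketch: verify that an instance's embedded flavor zeroes out the
# standard resource classes and requests the custom class. The nested
# key path into the serialized flavor is an assumption; inspect a row
# from your own instance_extra table and adjust accordingly.

import json


def flavor_migrated(flavor_json, custom_class):
    """True if extra_specs zero out standard classes and request the
    custom resource class."""
    flavor = json.loads(flavor_json)
    specs = flavor.get("cur", {}).get(
        "nova_object.data", {}).get("extra_specs", {})
    standard_zeroed = all(
        specs.get("resources:%s" % rc) == "0"
        for rc in ("VCPU", "MEMORY_MB", "DISK_GB"))
    return standard_zeroed and specs.get(
        "resources:%s" % custom_class) == "1"
```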

Now (Rocky only), standard resource class inventory reporting can be disabled. After the nova compute service has been running for a while, Placement will be updated:

$ openstack resource provider inventory list <node UUID>
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
| CUSTOM_GOLD    |              1.0 |        1 |        0 |         1 |        1 |     1 |

$ openstack resource provider usage show <node UUID>
| resource_class |  usage |
| CUSTOM_GOLD    |      1 |


We hope this shows that OpenStack is now in a place where VMs and bare metal can coexist peacefully, and that even for those pesky pets, there is a path forward to this brave new world. Thanks to the Nova team for working hard to make Ironic a first class citizen.

by Mark Goddard at November 06, 2019 02:00 AM

November 04, 2019

Dan Smith

Start and Monitor Image Pre-cache Operations in Nova

When you boot an instance in Nova, you provide a reference to an image. In many cases, once Nova has selected a host, the virt driver on that node downloads the image from Glance and uses it as the basis for the root disk of your instance. If your nodes are using a virt driver that supports image caching, then that image only needs to be downloaded once per node, which means the first instance to use that image causes it to be downloaded (and thus has to wait). Subsequent instances based on that image will boot much faster as the image is already resident.

If you manage an application that involves booting a lot of instances from the same image, you know that the time-to-boot for those instances could be vastly reduced if the image is already resident on the compute nodes you will land on. If you are trying to avoid the latency of rolling out a new image, this becomes a critical calculation. For years, people have asked for or proposed solutions in Nova for allowing some sort of image pre-caching to solve this, but those discussions have always become stalled in detail hell. Some people have resorted to hacks like booting host-targeted tiny instances ahead of time, direct injection of image files to Nova’s cache directory, or local code modifications. Starting in the Ussuri release, such hacks will no longer be necessary.

Image pre-caching in Ussuri

Nova’s now-merged image caching feature includes a very lightweight and no-promises way to request that an image be cached on a group of hosts (defined by a host aggregate). In order to avoid some of the roadblocks to success that have plagued previous attempts, the new API does not attempt to provide a rich status result, nor a way to poll for or check on the status of a caching operation. There is also no scheduling, persistence, or reporting of which images are cached where. Asking Nova to cache one or more images on a group of hosts is similar to asking those hosts to boot an instance there, but without the overhead that goes along with it. That means that images cached as part of such a request will be subject to the same expiry timer as any other. If you want them to remain resident on the nodes permanently, you must re-request the images before the expiry timer would have purged them. Each time an image is pre-cached on a host, the timestamp for purge is updated if the image is already resident.

Obviously for a large cloud, status and monitoring of the cache process in some way is required, especially if you are waiting for it to complete before starting a rollout. The subject of this post is to demonstrate how this can be done with notifications.

Example setup

Before we can talk about how to kick off and monitor a caching operation, we need to set up the basic elements of a deployment. That means we need some compute nodes, and for those nodes to be in an aggregate that represents the group that will be the target of our pre-caching operation. In this example, I have a 100-node cloud with numbered nodes that look like this:

$ nova service-list --binary nova-compute
| Binary | Host |
| nova-compute | guaranine1 |
| nova-compute | guaranine2 |
| nova-compute | guaranine3 |
| nova-compute | guaranine4 |
| nova-compute | guaranine5 |
| nova-compute | guaranine6 |
| nova-compute | guaranine7 |
.... and so on ...
| nova-compute | guaranine100 |

In order to be able to request that an image be pre-cached on these nodes, I need to put some of them into an aggregate. I will do that programmatically since there are so many of them like this:

$ nova aggregate-create my-application
| Id | Name | Availability Zone | Hosts | Metadata | UUID |
| 2 | my-application | - | | | cf6aa111-cade-4477-a185-a5c869bc3954 |
$ for i in $(seq 1 95); do nova aggregate-add-host my-application guaranine$i; done
... lots of noise ...

Now that I have done that, I am able to request that an image be pre-cached on all the nodes within that aggregate by using the nova aggregate-cache-images command:

$ nova aggregate-cache-images my-application c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2

If all goes to plan, sometime in the future all of the hosts in that aggregate will have fetched the image into their local cache and will be able to use that for subsequent instance creation. Depending on your configuration, that happens largely sequentially to avoid storming Glance, and with so many hosts and a decently-sized image, it could take a while. If I am waiting to deploy my application until all the compute hosts have the image, I need some way of monitoring the process.

Monitoring progress

Many of the OpenStack services send notifications via the messaging bus (i.e. RabbitMQ) and Nova is no exception. That means that whenever things happen, Nova sends information about those things to a queue on that bus (if so configured) which you can use to receive asynchronous information about the system.

The image pre-cache operation sends start and end versioned notifications, as well as progress notifications for each host in the aggregate, which allows you to follow along. Ensure that you have set [notifications]/notification_format=versioned in your config file in order to receive these. A sample intermediate notification looks like this:

{'index': 68,
 'total': 95,
 'images_failed': [],
 'uuid': 'ccf82bd4-a15e-43c5-83ad-b23970338139',
 'images_cached': ['c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2'],
 'host': 'guaranine68',
 'id': 1,
 'name': 'my-application'}

This tells us that host guaranine68 just completed its cache operation for one image in the my-application aggregate. It was host 68 of 95 total. Since the image ID we used is in the images_cached list, that means it was either successfully downloaded on that node, or was already present. If the image failed to download for some reason, it would be in the images_failed list.
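
The bookkeeping behind such a watcher can be sketched as a small class that folds each per-host payload into a running summary. This is an illustrative sketch, not the author's example code: the payload fields follow the sample above, and the transport wiring to RabbitMQ (e.g. via oslo.messaging) is omitted:

```python
# Sketch: accumulate image pre-cache progress notifications (payloads
# shaped like the sample above) into a running summary. Receiving the
# payloads from the message bus is left out of this sketch.

class CacheProgress:
    def __init__(self, image_id):
        self.image_id = image_id
        self.completed = 0   # hosts that have reported in
        self.errors = 0      # hosts where our image failed to cache
        self.total = None    # total hosts, taken from the payload

    def update(self, payload):
        """Record one per-host progress payload."""
        self.total = payload["total"]
        self.completed += 1
        if self.image_id in payload["images_failed"]:
            self.errors += 1

    def summary(self):
        pct = 100 * self.completed // self.total if self.total else 0
        return "%d%% complete (%d errors)" % (pct, self.errors)
```

Feeding the sample payload above into `update()` would advance the counter for host guaranine68; a real watcher would print `summary()` after each notification and exit non-zero if `errors` is non-zero at the end.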

In order to demonstrate what this might look like, I wrote some example code. This is not intended to be production-ready, but will provide a template for you to write something of your own to connect to the bus and monitor a cache operation. You would run this before kicking off the process; it waits for a cache operation to begin, prints information about progress, and then exits with a non-zero status code if any errors were detected. For the above example invocation, the output looks like this:

$ python image_cache_watcher.py
Image cache started on 95 hosts
Aggregate 'foo' host 95: 100% complete (8 errors)
Completed 94 hosts, 8 errors in 2m31s
Errors from hosts:
Image c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2 failed 8 times

In this case, I intentionally configured eight hosts so that the image download would fail for demonstration purposes.


The image caching functionality in Nova may gain more features in the future, but for now, it is a best-effort sort of thing. With just a little bit of scripting, Ussuri operators should be able to kick off and monitor image pre-cache operations and substantially improve time-to-boot performance for their users.

by Dan at November 04, 2019 07:30 PM


How to build an edge cloud part 1: Building a simple facial recognition system

Learn about the basics of building an edge cloud -- and build a facial recognition system while you're at it.

by Nick Chase at November 04, 2019 07:06 PM

OpenStack Superuser

Baidu wins Superuser Award at Open Infrastructure Summit Shanghai

 The Baidu ABC Cloud Group and Security Edge teams are the 11th organization to win the Superuser Award. The news was announced today at the Open Infrastructure Summit in Shanghai. Baidu ABC Cloud Group and Edge Security Team integrated Kata Containers into the platform for all of Baidu's internal and external cloud services, including edge applications. Their cloud products, including both VMs and bare metal servers, cover 11 regions in China with over 5,000 physical machines. To date, 17 important online businesses have been migrated to the Kata Containers platform.

Elected by members of the OSF community, the team that wins the Superuser Award is lauded for the unique nature of its use case as well as its integration and application of open infrastructure. Four out of five nominees for the Superuser Award presented today were from the APAC region: Baidu ABC Cloud Group and Edge Security Team, InCloud OpenStack Team of Inspur, Information Management Department of Wuxi Metro, and Rakuten Mobile Network Organization. Previous award winners from the APAC region include China Mobile, NTT Group, and the Tencent TStack Team.

Baidu Keynote at Open Infrastructure Summit

On the keynote stage in Shanghai, Baidu Cloud Senior Architect Zhang Yu explained that Kata Containers provides a virtual machine-like security mechanism at the container level, which gives their customers a great deal of confidence. When moving their business to a container environment, they have less concern. Kata Containers is compatible with the OCI standard and users can directly manage the new environment with popular management suites such as Kubernetes. Kata Containers is now an official project under the OpenStack Foundation, which gives the company confidence to invest in the project.

“Baidu is an amazing example of how open infrastructure starts with OpenStack,” said Mark Collier, COO of the OpenStack Foundation. “They’re running OpenStack at massive scale, combined with other open infrastructure technologies like Kata Containers and Kubernetes, and they’re doing it in production for business-critical workloads.”

*** Download the Baidu Kata Containers White Paper ***

The company has published a white paper titled, “The Application of Kata Containers in Baidu AI Cloud” available here.

The post Baidu wins Superuser Award at Open Infrastructure Summit Shanghai appeared first on Superuser.

by Allison Price at November 04, 2019 04:39 AM

November 02, 2019

StackHPC Team Blog

StackHPC joins the OpenStack Marketplace

In many areas, our participation in the OpenStack community is no secret.

One area we haven't focussed on is our commercial representation within the OpenStack Foundation. As described here, StackHPC works with clients to solve challenging problems with cloud infrastructure. Our business has been won through word of mouth.

Now our services can also be found in the OpenStack Marketplace.

John Taylor, StackHPC's co-founder and CEO, adds:

We are pleased to announce our OpenStack Foundation membership and inclusion in the OpenStack Marketplace. Our success in driving the HPC and Research Computing use-case in cloud has been in no small part coupled to working closely with the OpenStack Foundation and the open community it fosters. The era of hybrid cloud and the emergence of converged AI/HPC infrastructure and coupled workflows is now upon us, driving the need for architectures that seamlessly transition across these resources while not compromising on performance. We look forward to continuing our partnership with OpenStack through the Scientific SIG and to active participation within OpenStack projects.

Get in touch

If you would like to get in touch we would love to hear from you. Reach out to us via Twitter or directly via our contact page.

by Stig Telfer at November 02, 2019 09:00 AM

October 31, 2019


RDO Train Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Train for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Train is the 20th release from the OpenStack project, which is the work of more than 1115 contributors from around the world.

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/7/cloud/x86_64/openstack-train/. While we normally also make the release available via http://mirror.centos.org/altarch/7/cloud/ppc64le/ and http://mirror.centos.org/altarch/7/cloud/aarch64/, there have been issues with the mirror network, which are currently being addressed via https://bugs.centos.org/view.php?id=16590.

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

PLEASE NOTE: At this time, RDO Train provides packages for CentOS7 only. We plan to move RDO to CentOS8 as soon as possible during the Ussuri development cycle, so Train will be the last release working on CentOS7.

Interesting things in the Train release include:

  • OpenStack Ansible, which provides Ansible playbooks and roles for deployment, added Murano support and fully migrated from rsyslog to systemd-journald. This project makes it possible to deploy OpenStack from source in a way that is scalable while also being simple to operate, upgrade, and grow.
  • Ironic, the Bare Metal service, aims to produce an OpenStack service and associated libraries capable of managing and provisioning physical machines in a security-aware and fault-tolerant manner. Beyond providing basic support for building software RAID and a myriad of other highlights, this project now offers a new tool for building ramdisk images, ironic-python-agent-builder.

Other improvements include:

  • Tobiko is now available within RDO! This project is an OpenStack testing framework focusing on areas mostly complementary to Tempest. While Tempest's main focus has been testing the OpenStack REST APIs, Tobiko's main focus is testing OpenStack system operations while "simulating" how a final user would use the cloud. Tobiko's test cases populate the cloud with workloads such as instances, allow the CI workflow to perform an operation such as an update or upgrade, and then run test cases to validate that the cloud workloads are still functional.
  • Other highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/train/highlights.html.

During the Train cycle, we saw the following new RDO contributors:

  • Joel Capitao
  • Zoltan Caplovic
  • Sorin Sbarnea
  • Sławek Kapłoński
  • Damien Ciabrini
  • Beagles
  • Soniya Vyas
  • Kevin Carter (cloudnull)
  • fpantano
  • Michał Dulko
  • Stephen Finucane
  • Sofer Athlan-Guyot
  • Gauvain Pocentek
  • John Fulton
  • Pete Zaitcev

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all 65 contributors who participated in producing this release. This list includes commits to rdo-packages and rdo-infra repositories:

  • Adam Kimball
  • Alan Bishop
  • Alex Schultz
  • Alfredo Moralejo
  • Arx Cruz
  • Beagles
  • Bernard Cafarelli
  • Bogdan Dobrelya
  • Brian Rosmaita
  • Carlos Goncalves
  • Cédric Jeanneret
  • Chandan Kumar
  • Damien Ciabrini
  • Daniel Alvarez
  • David Moreau Simard
  • Dmitry Tantsur
  • Emilien Macchi
  • Eric Harney
  • fpantano
  • Gael Chamoulaud
  • Gauvain Pocentek
  • Jakub Libosvar
  • James Slagle
  • Javier Peña
  • Joel Capitao
  • John Fulton
  • Jon Schlueter
  • Kashyap Chamarthy
  • Kevin Carter (cloudnull)
  • Lee Yarwood
  • Lon Hohberger
  • Luigi Toscano
  • Luka Peschke
  • marios
  • Martin Kopec
  • Martin Mágr
  • Matthias Runge
  • Michael Turek
  • Michał Dulko
  • Michele Baldessari
  • Natal Ngétal
  • Nicolas Hicher
  • Nir Magnezi
  • Otherwiseguy
  • Gabriele Cerami
  • Pete Zaitcev
  • Quique Llorente
  • Radomiropieralski
  • Rafael Folco
  • Rlandy
  • Sagi Shnaidman
  • shrjoshi
  • Sławek Kapłoński
  • Sofer Athlan-Guyot
  • Soniya Vyas
  • Sorin Sbarnea
  • Stephen Finucane
  • Steve Baker
  • Steve Linabery
  • Tobias Urdin
  • Tony Breeds
  • Tristan de Cacqueray
  • Victoria Martinez de la Cruz
  • Wes Hayutin
  • Yatin Karel
  • Zoltan Caplovic

The Next Release Cycle
At the end of one release, focus shifts immediately to the next, Ussuri, which has an estimated GA the week of 11-15 May 2020. The full schedule is available at https://releases.openstack.org/ussuri/schedule.html.

Twice during each release cycle, RDO hosts official Test Days shortly after the first and third milestones; therefore, the upcoming test days are 19-20 December 2019 for Milestone One and 16-17 April 2020 for Milestone Three.

Get Started
There are three ways to get started with RDO.

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

For a production deployment of RDO, use the TripleO Quickstart and you’ll be running a production cloud in short order.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help
The RDO Project participates in a Q&A service at https://ask.openstack.org. We also have our users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on Freenode IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.

Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Rain Leander at October 31, 2019 04:18 PM

October 29, 2019


Join us at the Open Infrastructure Summit in Shanghai!

As active users and contributors to OpenStack and its projects, VEXXHOST is excited to be attending the Open Infrastructure Summit in Shanghai this year!

The post Join us at the Open Infrastructure Summit in Shanghai! appeared first on VEXXHOST.

by Samridhi Sharma at October 29, 2019 04:48 PM

Galera Cluster by Codership

Galera Cluster for MySQL 5.6.46 and MySQL 5.7.28 is GA

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.46 (release notes, download) and MySQL-wsrep 5.7.28 (release notes, download). There is no Galera replication library release this time, so please continue using the 3.28 version, implementing wsrep API version 25.

This release incorporates all changes to MySQL 5.6.46 and 5.7.28 respectively and can be considered an updated rebased version. It is worth noting that we will have some platforms reach end of life (EOL) status, notably OpenSUSE 13.2 and Ubuntu Trusty 14.04.

You can get the latest release of Galera Cluster from https://www.galeracluster.com. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

by Colin Charles at October 29, 2019 12:42 PM

October 28, 2019


53 Things to look for in OpenStack Train

Now that OpenStack Train has been released, here are some features to look for.

by Nick Chase at October 28, 2019 04:27 PM

October 24, 2019

OpenStack Superuser

Using GitHub and Gerrit with Zuul: A leboncoin case study

Described as an online flea market, leboncoin is a portal that allows individuals to buy and sell new and used goods online in their local communities.  Leboncoin is one of the top ten searched websites in France, following Google, Facebook, and YouTube to name a few.

We got talking with Guillaume Chenuet to get some answers to why Leboncoin chose Zuul, an open source CI tool, and how they use it with GitHub, Gerrit, and OpenStack.  

How did your organization get started with Zuul?

We started using Zuul for open source CI two years ago with Zuulv2 and Jenkins. At the beginning, we only used Gerrit and Jenkins, but as new developers joined leboncoin every day, this solution was no longer enough. After some research and a proof-of-concept, we gave Zuul a try, running between Gerrit and Jenkins. In less than a month (and without much official documentation) we set up a completely new stack. We ran it for a year before moving to Zuulv3. Zuulv3 is more complex to set up but brings us more features, using up-to-date tools like Ansible and OpenStack.

Describe how you’re using it:

We’re using Zuulv3 with Gerrit. Our workflow is close to the OpenStack one. For each review, Zuul is triggered on three “check” pipelines: quality, integration, and build. Once the results pass, we use the gate system to merge the code into the repositories and build artifacts.

We are using two small OpenStack clusters (3 CTRL / 3 STRG / 5 COMPUTE) on each datacenter. Zuul is currently setup on all Gerrit projects and some GitHub projects too. Below, is our Zuulv3 infrastructure in production and in the case of datacenter loss.


Zuulv3 infrastructure in production.


Zuulv3 infrastructure in the case of DC loss.

What is your current scale?

In terms of compute resources, we currently have 480 cores, 1.3 TB of RAM, and 80 TB available in our Ceph clusters. In terms of jobs, we run around 60,000 jobs per month, or roughly 2,500 jobs per day. The average job time is less than 5 minutes.


What benefits has your organization seen from using Zuul?

As leboncoin is growing very fast (and microservices too 🙂 ), Zuul allows us to ensure everything can be tested and at scale. Zuul is also able to work with Gerrit and GitHub which permits us to open our CI to more teams and workflows.

What have the challenges been (and how have you solved them)?

Our big challenge was to migrate from Zuulv2 to Zuulv3. Even though everything uses Ansible, it was very tiresome to migrate all our CI jobs (around 500 Jenkins jobs). With the help of the Zuul folks on IRC, we reused some Ansible roles and playbooks from OpenStack, but the migration took about a year.

What are your future plans with Zuul?

Our next steps are to use Kubernetes backend for small jobs like linters and improve Zuul with GitHub.

How can organizations who are interested in Zuul learn more and get involved?

Coming from OpenStack, I think meeting the community at Summits or on IRC is a good start. But Zuul needs better visibility. It is a powerful tool but the information online is limited.

Are there specific features that drew you to Zuul?

Scalability! And also ensuring that every commit merged into the repository is clean and can’t be broken.

What would you request from the Zuul upstream community?

Better integration with Gerrit 3, new Nodepool features and providers, full high availability, and more visibility on the Internet.


Cover image courtesy of Guillaume Chenuet.

The post Using GitHub and Gerrit with Zuul: A leboncoin case study appeared first on Superuser.

by Ashleigh Gregory at October 24, 2019 02:00 PM

October 23, 2019

Corey Bryant

OpenStack Train for Ubuntu 18.04 LTS

The Ubuntu OpenStack team at Canonical is pleased to announce the general availability of OpenStack Train on Ubuntu 18.04 LTS via the Ubuntu Cloud Archive. Details of the Train release can be found at:  https://www.openstack.org/software/train

To get access to the Ubuntu Train packages:

Ubuntu 18.04 LTS

You can enable the Ubuntu Cloud Archive pocket for OpenStack Train on Ubuntu 18.04 installations by running the following commands:

    sudo add-apt-repository cloud-archive:train
    sudo apt update

The Ubuntu Cloud Archive for Train includes updates for:

aodh, barbican, ceilometer, ceph (14.2.2), cinder, designate, designate-dashboard, dpdk (18.11.2), glance, gnocchi, heat, heat-dashboard, horizon, ironic, keystone, libvirt (5.4.0), magnum, manila, manila-ui, mistral, murano, murano-dashboard, networking-arista, networking-bagpipe, networking-bgpvpn, networking-hyperv, networking-l2gw, networking-mlnx, networking-odl, networking-ovn, networking-sfc, neutron, neutron-dynamic-routing, neutron-fwaas, neutron-lbaas, neutron-lbaas-dashboard, neutron-vpnaas, nova, octavia, openstack-trove, openvswitch (2.12.0), panko, placement, qemu (4.0), sahara, sahara-dashboard, senlin, swift, trove-dashboard, vmware-nsx, watcher, and zaqar.

For a full list of packages and versions, please refer to:


Python support

The Train release of Ubuntu OpenStack is Python 3 only; all Python 2 packages have been dropped in Train.

Branch package builds

If you would like to try out the latest updates to branches, we deliver continuously integrated packages on each upstream commit via the following PPA’s:

    sudo add-apt-repository ppa:openstack-ubuntu-testing/mitaka
    sudo add-apt-repository ppa:openstack-ubuntu-testing/ocata
    sudo add-apt-repository ppa:openstack-ubuntu-testing/queens
    sudo add-apt-repository ppa:openstack-ubuntu-testing/rocky
    sudo add-apt-repository ppa:openstack-ubuntu-testing/train

Reporting bugs

If you have any issues please report bugs using the ‘ubuntu-bug’ tool to ensure that bugs get logged in the right place in Launchpad:

    sudo ubuntu-bug nova-conductor

Thanks to everyone who has contributed to OpenStack Train, both upstream and downstream. Special thanks to the Puppet OpenStack modules team and the OpenStack Charms team for their continued early testing of the Ubuntu Cloud Archive, as well as the Ubuntu and Debian OpenStack teams for all of their contributions.

Enjoy and see you in Ussuri!

(on behalf of the Ubuntu OpenStack team)

by coreycb at October 23, 2019 12:56 PM

October 22, 2019


Cycle Trailing Projects and RDO’s Latest Release Train

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Train for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Train is the 20th release from the OpenStack project, which is the work of more than 1115 contributors from around the world.

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/7/cloud/x86_64/openstack-train/.


This is not the official announcement you’re looking for.

We’re doing something a little different this cycle – we’re waiting for some of the “cycle-trailing” projects that we’re particularly keen about, like TripleO and Kolla, to finish their push BEFORE we make the official announcement.

Photo by Denis Chick on Unsplash

Deployment and lifecycle-management tools generally want to follow the release cycle, but because they rely on the other projects being completed, they may not always publish their final release at the same time as those projects. To that effect, they may choose the cycle-trailing release model.

Cycle-trailing projects are given an extra three months after the final release date to request publication of their release. They may otherwise use intermediary releases or development milestones.

While we’re super hopeful that these cycle trailing projects will be uploaded to the CentOS mirror before OpenInfrastructure Summit Shanghai, we’re going to do the official announcement just before the Summit with or without the packages.

We’ve got a lot of people to thank!

Do you like that we’re waiting a bit for our cycle trailing projects or would you prefer the official announcement as soon as the main projects are available? Let us know in the comments and we may adjust the process for future releases!

In the meantime, keep an eye here or on the mailing lists for the official announcement COMING SOON!

by Rain Leander at October 22, 2019 02:34 PM

Sean McGinnis

October 2019 OpenStack Board Notes

Another OpenStack Foundation Board of Directors meeting was held October 22, 2019. This meeting was added primarily to discuss Airship’s request for confirmation as an official project.

The meeting agenda is published on the wiki.

OSF Updates

Jonathan Bryce gave a quick update on the OpenStack Train release that went out last week. The number of contributors, variety of companies, and overall commit numbers were pretty impressive. There were over 25,500 merged commits in Train, with 1,125 unique developers from 165 different organizations. With commits over the last cycle, OpenStack is still one of the top three active open source projects out there, after the Linux kernel and Chromium.

Jonathan also reiterated that the event structure will be different starting in 2020. The first major event planned is in Vancouver, June 8. This will be more of a collaborative event, so expect the format to be different than past Summits. I’m thinking more Project Teams Gathering than Summit.

Airship Confirmation

Matt McEuen, Alex Hughes, Kaspar Skels, and Jay Ahn went through the Airship Confirmation presentation and answered questions about the project and their roadmap. Overall, really pretty impressive what the Airship community has been able to accomplish so far.

The Airship mission statement is:

Openly collaborate across a diverse, global community to provide and integrate a collection of loosely coupled but interoperable, open source tools that declaratively automate cloud lifecycle management.

They started work inside of openstack-helm and kept to the OpenStack community Four Opens right from the start.

Project Diversity

The project was started by AT&T, so there is still a lot of work being done (code reviews, commits, etc.) from the one company, but the trend over the last couple of years has been really good, trending towards more and more contributor diversity.

They also have good policies in place to make sure the Technical Committee and Working Committee have no more than two members from the same company. Great to see this policy in place to really encourage more diversity in the spots where overall project decisions are made. Kudos to the AT&T folks for not only getting things started, but driving a lot of change while still actively encouraging others so it is not a one company show. It can be hard for some companies to realize that giving up absolute control is a good thing, especially when it comes to an open source community.

Community Feedback

Part of the confirmation process is to make sure the existing OSF projects are comfortable with the new project. There was feedback from the Zuul project and from the OpenStack TC. Rico Lin went through the TC feedback in the meeting. Only minor questions or concerns were raised there, and Matt was able to respond to most of them in the meeting. He did state he would respond to the mailing list so there was a record there of the responses.


Really the only point of concern was raised at the end. One difference between Airship and other OpenStack projects is that it is written in Go. Go has a great system built in to be able to easily use modules written by others. But that led to the question of licensing.

The OSF bylaws state:

The Board of Directors may approve a license for an Open Infrastructure Project other than Apache License 2.0, but such license must be a license approved by the Open Source Initiative at the date of adoption of such license.

The Airship code itself is Apache 2.0. But nothing is done today to vet the dependencies that are pulled in to actually compile the project. The concern is that copyleft licenses usually have provisions such that if copyleft code is pulled in and linked to non-copyleft code, the combined work falls under the copyleft requirements. So the only concern was that it just isn’t known what the effective license of the project is today, based on what is being pulled in.

It can be a very tricky area that definitely requires involvement of lawyers that understand copyright law and open source licensing. Luckily it wasn’t a show stopper. We moved to add the project and have them work with OSF legal to better understand the licensing impacts and resolve any concerns by using different dependencies if any are found to be licensed with something that would impose copyleft into Airship. The board unanimously voted in favor of Airship becoming a fully official Open Infrastructure Project.
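The kind of audit described above can be made concrete with a small sketch: given a mapping of dependencies to their licenses, flag anything outside an allow-list of permissive licenses for legal review. The module names and license data below are illustrative, not Airship's actual dependency tree; a real audit would use a tool that inspects the Go module graph directly.

```python
# Licenses generally considered permissive; anything else gets flagged
# for legal review. This allow-list is illustrative, not OSF policy.
PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

def flag_for_review(dependencies):
    """Return the subset of module -> license pairs whose license is
    not on the permissive allow-list (e.g. copyleft licenses)."""
    return {mod: lic for mod, lic in dependencies.items()
            if lic not in PERMISSIVE}

# Hypothetical dependency report for a Go project.
deps = {
    "github.com/example/rest":   "Apache-2.0",
    "github.com/example/yaml":   "MIT",
    "github.com/example/crypto": "GPL-3.0",  # copyleft: needs review
}
print(flag_for_review(deps))  # {'github.com/example/crypto': 'GPL-3.0'}
```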

Next Meeting

The next OSF board meeting will take place November 3rd, in Shanghai, the day before the Open Infrastructure Summit.

by Sean McGinnis at October 22, 2019 12:00 AM

October 21, 2019

OpenStack Superuser

Collaboration across Boundaries: Open Culture Drives Evolving Tech

This past summer marked a pinnacle in OpenStack’s history — the project’s ninth birthday — a project that epitomizes collaboration without boundaries. Communities comprised of diverse individuals and companies united around the globe to celebrate this exciting milestone, from Silicon Valley to the Pacific Northwest to the Far East. Participants from communities that spanned OpenStack, Kubernetes, Kata Containers, Akraino, Project ACRN and Clear Linux — and represented nearly 60 organizations — shared stories about their collective journey, and looked towards the future.

An Amazing Journey

The Shanghai event brought together several organizations, including 99Cloud, China Mobile, Intel, ZStack, East China Normal University, Shanghai Jiaotong University and Tongji University, as well as the OpenStack Foundation.

Individual OpenStack Foundation board director, Shane Wang, talked about OpenStack’s history. What began as an endeavor to bring greater choice in cloud solutions to users, combining Nova for compute from NASA with Swift for object storage from Rackspace, has since grown into a strong foundation for open infrastructure. The project is supported by one of the largest, global open source communities of 105,000 members in 187 countries from 675 organizations, backed by over 100 member companies.

“After nine years of development, OpenStack and current Open Infrastructure have attracted a large number of enterprises, users, developers and enthusiasts to join the community, and together we’ve initiated new projects and technologies to address emerging scenarios and use cases,” said Shane. “Through OpenStack and Open Infrastructure, businesses can realize healthy profits, users can satisfy their needs, innovations can be incubated through a thriving community, and individuals can grow their skills and talents. These are the reasons that the community stays strong and popular.”

Truly representative of cross-project collaboration, this Open Infrastructure umbrella now encompasses components that can be used to address existing and emerging use cases across data center and edge. Today’s applications span enterprise, artificial intelligence, machine learning, 5G networking and more. Adoption ranges from retail, financial services, academia and telecom to manufacturing, public cloud and transportation.



Junwei Liu, OpenStack board member from China Mobile, winner of the 2016 SuperUser award, joined the birthday celebration. He reflected on OpenStack’s capability to address existing and emerging business needs: “Since 2015, China Mobile, a leading company in the cloud industry, has built a public cloud, private cloud and networking cloud for internal and external customers based on OpenStack. OpenStack has been proven mature enough to meet the needs of core business and has become the de facto standard of IaaS resource management. The orchestration systems integrating Kubernetes into OpenStack as the core will be the most controllable and the most suitable cloud computing platform which meets enterprises’ own business needs.”

Ruoyu Ying, Cloud Software Engineer at Intel, reflected on the various release names, and summits, over the years. There have been several exciting milestones along the way: the inaugural release and summit, both bearing the name, Austin, to commemorate OpenStack’s birthplace in Austin, Texas; the fourth release, Diablo, which established a bi-annual release frequency and expanded the summit outside of Texas to Santa Clara, California; the ninth release, Icehouse, which heralded a move of the summit outside North America to Hong Kong and invited more developers from Asia to contribute; the eleventh release, Kilo, which expanded the summit into Europe, specifically Paris, France; the 17th release, Queens, that saw the summit move into the southern hemisphere in Sydney, Australia; and ultimately, the 20th release, Train, with the vital change in summit name to OpenInfra to accurately reflect the evolution in the project and community.

In November, the summit will be held on mainland China for the first time, and the team there is looking forward to welcoming the global community with open arms!

Collaboration across Boundaries

Meetups across Silicon Valley and the Pacific Northwest, which were sponsored by Intel, Portworx, Rancher Labs and Red Hat, personified collaboration across projects and communities. Individuals from the OpenStack, Kubernetes, Kata Containers, Akraino, Clear Linux and Project ACRN communities — representing over 50 organizations — came together to celebrate this special milestone with commemorative birthday cupcakes and a strong lineup of presentations focused on emerging technologies and use cases.

Containers and container orchestration technologies were highlights, as Jonathan Gershater, Senior Principal Product Marketing Manager at Red Hat, talked about how to deploy, orchestrate and manage enterprise Kubernetes on OpenStack, while Gunjan Patel, Cloud Architect at Palo Alto Networks, talked about the full lifecycle of a Kubernetes pod. Rajashree Mandaogane, Software Engineer at Rancher Labs, and Oksana Chuiko, Software Engineer at Portworx, delivered lightning talks focused on Kubernetes. Eric Ernst, Kata Containers’ Technical Lead and Architecture Committee Member, and Senior Software Engineer at Intel, talked about running container solutions with the extra isolation provided by Kata Containers, while Manohar Castelino and Ganesh Mahalingam, Software Engineers at Intel, gave demos of many of Kata Containers’ newest features.

Edge computing and IoT were also hot topics. Zhaorong Hou, Virtualization Software Manager at Intel, talked about how Project ACRN addresses the need for lightweight hypervisors in booming IoT development, while Srinivasa Addepalli, Senior Principal Software Architect at Intel, dove into one of the blueprints set forth by the Akraino project—the Integrated Cloud Native Stack—and how it addresses edge deployments for both network functions and application containers.

Beatriz Palmeiro, Community and Developer Advocate at Intel, engaged attendees in a discussion about how to collaborate and contribute to the Clear Linux project, while Kateryna Ivashchenko, Marketing Specialist at Portworx, provided us all with an important reminder about how not to burn out in tech.

Open Culture Drives Evolving Tech

There is incredible strength in the OpenStack community. As noted at the Shanghai event, OpenStack powers open infrastructure across data centers and edge, enabling private and hybrid cloud models to flourish. This strength is due, in part, to the amazing diversity within the OpenStack community.

Throughout its history, OpenStack has been committed to creating an open culture that invites diverse contributions. This truth is evident in many forms: diversity research, representation of women on the keynote stage and as speakers across the summits, speed mentoring workshops, diversity luncheons and more. The breadth of allies and advocates for underrepresented minorities abounds in our community, from Joseph Sandoval, who keynoted at the Berlin summit to talk about the importance of projects like OpenStack in enabling diversity, to Tim Berners-Lee, who participated in the speed mentoring workshop in Berlin, to Lisa-Marie Namphy, who organized and hosted the event in the Silicon Valley and made sure that over 50% of her presenters were women, among many others.

“OpenStack is a strategic platform that I believe will enable diversity.” — Joseph Sandoval, OpenStack User Committee Member and SRE Manager, Infrastructure Platform, Adobe

As OpenStack evolves as the foundation for the open infrastructure, and new projects and technologies emerge to tackle the challenges of IoT, edge and other exciting use cases, diversity — in gender, race, perspective, experience, expertise, skill set, and more — becomes increasingly important to the health of our communities. From developers and coders to community and program managers, ambassadors, event and meetup organizers, and more, it truly takes a village to sustain a community and ensure the health of a project!

Early OpenStack contributor, community architect, and OpenStack Ambassador Lisa-Marie Namphy reflected on OpenStack’s evolution and what she’s most excited about looking forward. As organizer of the original San Francisco Bay Area User Group, which has now expanded beyond just OpenStack to reflect the broader ecosystem of Cloud Native Containers, she has established one of the largest OpenStack & Cloud Native user groups in the world. “Our user group has always committed to showcasing the latest trends in cloud native computing, whether that was OpenStack, microservices, serverless, open networking, or our most exciting recent trend: containers! In response to our passionate and vocal community members, we’ve added more programming around Kubernetes, Istio, Kata Containers and other projects representing the diversity of the open infrastructure ecosystem. It’s as exciting as ever to be a part of this growing open cloud community!” Lisa now works as Director of Marketing at Portworx, contributing to the OpenStack, Kubernetes, and Istio communities.

Looking Forward

As we blow out the birthday candles, we’d like to thank the organizers, sponsors, contributors and participants of these meetup — with a special thank you to Kari Fredheim, Liz Aberg, Liz Warner, Sujata Tibrewala, Lisa-Marie Namphy, Maggie Liang, Shane Wang and Ruoyu Ying.

As we look forward, the OpenStack Foundation has just revealed the name of the project’s next release — Ussuri, a river in China — commemorative of the summit’s next location in Shanghai. “The river teems with different kinds of fish: grayling, sturgeon, humpback salmon (gorbusha), chum salmon (keta), and others.”1 A fitting name to embody diverse projects, communities and technologies working in unison to further innovation!


1 Source: https://en.wikipedia.org/wiki/Ussuri_River



The post Collaboration across Boundaries: Open Culture Drives Evolving Tech appeared first on Superuser.

by Nicole Huesman at October 21, 2019 02:00 PM


Community Blog Round Up 21 October 2019

Just in time for Halloween, Andrew Beekhof has a ghost story about the texture of hounds.

But first!

Where have all the blog round ups gone?!?

Well, there’s the rub, right?

We don’t usually post when there’s one or fewer posts from our community to round up, but this has been the only post for WEEKS now, so here it is.

Thanks, Andrew!

But that brings us to another point.

We want to hear from YOU!

RDO has a database of bloggers who write about OpenStack / RDO / TripleO / Packstack things and while we’re encouraging those people to write, we’re also wondering if we’re missing some people. Do you know of a writer who is not included in our database? Let us know in the comments below.

Photo by Jessica Furtney on Unsplash

Savaged by Softdog, a Cautionary Tale by Andrew Beekhof

Hardware is imperfect, and software contains bugs. When node level failures occur, the work required from the cluster does not decrease – affected workloads need to be restarted, putting additional stress on surviving peers and making it important to recover the lost capacity.

Read more at http://blog.clusterlabs.org/blog/2019/savaged-by-softdog

by Rain Leander at October 21, 2019 09:17 AM

October 18, 2019

OpenStack Superuser

OpenStack Ops Meetup Features Ceph, OpenStack Architectures and Operator Pain Points

Bloomberg recently hosted an OpenStack Ops Meetup in one of its New York engineering offices on September 3 and 4. The event was well attended with between 40 and 50 attendees, primarily from North America, with a few people even traveling from Japan!

The OpenStack Ops Meetups team was represented by Chris Morgan (Bloomberg), Erik McCormick (Cirrus Seven) and Shintaro Mizuno (NTT). In addition to this core group, other volunteer moderators who led sessions included Matthew Leonard (Bloomberg), Martin Gehrke (Two Sigma), David Medberry (Red Hat), Elaine Wong-Perry (Verizon), Assaf Muller (Red Hat), David Desrosiers (Canonical), and Conrad Bennett (Verizon), with many others contributing. The official meetups team is rather small, so volunteer moderators make such events come alive; we couldn’t make them happen without all of you. Thanks to everyone who helped.

An interesting topic that Bloomberg brought up at this meetup was the concept of expanding the Ceph content. Ceph is a very popular storage choice in production-quality OpenStack deployments, which is shown by the OpenStack user survey and by the fact that Ceph sessions at previous meetups have always been very popular. Bloomberg’s Matthew Leonard suggested to those attending the first Ceph session that we build upon this with more Ceph sessions, and perhaps even launch a separate Ceph operators meetup series in the future. Some of this discussion was captured here. Matthew also led a group discussion around a deeper technical dive into challenging use cases for Ceph, such as gigantic (multi-petabyte) object stores using Ceph’s RadosGW interface. It’s a relief that we are not the only ones hitting certain technical issues at this scale.

Response from the Ceph users at the meetup was positive and we will seek to expand Ceph content at the next event.

Other evergreen topics for OpenStack operators include deployment/upgrades, upgrades/long-term support, monitoring, testing and billing. These all saw some spirited debate and exchanging of experience. The meetups team also shared some things that the ops community can point to as positive changes we have achieved, such as the policy changes allowing longer retention of older OpenStack documentation and maintenance branches.

To make the event a bit more fun, the meetups team always includes lightning talks at the end of each day. Day 1 saw an “arch show and tell” where those who were willing grabbed a microphone and talked about the specific architecture of their cloud. The variety of OpenStack architectures, use cases, and market segments is astonishing.

On day 2, many of the most noteworthy sessions were again moderated by volunteers. Assaf Muller from Red Hat led an OpenStack networking state-of-the-union discussion, with a certain amount of RDO focus, though not exclusively. Later on, Martin Gehrke from Two Sigma ran a double session: one on choosing appropriate workloads for your cloud, and one on reducing OpenStack toil.

As a slight change of pace, David Desrosiers demonstrated a lightning fast developer build of OpenStack using Canonical’s nifty “microstack” snap install of an all-in-one OpenStack instance, although our guest wifi picked this exact moment to pitch a fit – sorry David!

The final technical session of the event was another lightning talk, this time asking the guests to recount their best “ops war stories”. The organizers strongly encouraged everyone to participate, and later on revealed why – we arranged for a lighthearted scoring system and eventually awarded a winner (chosen by the attendees). There were even some nominal prizes! David Medberry moderated this session and it was a fun way to finish off the event.

The overall winner was Julia Kreger from Red Hat, who shared with us a story about “it must be a volume knob?” – it seems letting visitors near the power controls in the data center isn’t a great idea? Well, let’s just say it’s probably best if you try and hear Julia tell it in person!

The above gives just a brief flavor of the event and sorry for those sessions and moderators I didn’t mention. The next OpenStack Ops Meetup is expected to be somewhere in Europe in the first quarter of 2020.

Cover Photo courtesy of David Medberry

The post OpenStack Ops Meetup Features Ceph, OpenStack Architectures and Operator Pain Points appeared first on Superuser.

by Chris Morgan at October 18, 2019 02:00 PM

October 17, 2019


Tips for taking the new OpenStack COA (Certified OpenStack Administrator) exam – October 2019

Mirantis will be providing resources to the OpenStack Foundation, including becoming the new administrators of the upgraded Certified OpenStack Administrator (COA) exam

by Nick Chase at October 17, 2019 01:36 AM

Sean McGinnis

September 2019 OpenStack Board Notes

There was another OpenStack Foundation Board of Directors conference call on September 10, 2019. There were a couple of significant updates during this call: at least one that was significant for the community, and one that was significant to me (more details below).

In case this is your first time reading my BoD updates, just a reminder that upcoming and past OSF board meeting information is published on the wiki and the meetings are open to everyone. Occasionally there is a need to have a private, board member only portion of the call to go over any legal affairs that can’t be discussed publicly, but that should be a rare occasion.

September 10, 2019 OpenStack Foundation Board Meeting

The original agenda can be found here. Usually there are official and unofficial notes sent out, but at least at this time, it doesn’t appear Jonathan has been able to get to that. Watch for that to show up on the wiki page referenced in the previous section.

Director Changes

There were a couple of changes in the assigned Platinum Director seats. Platinum-level sponsors hold the only seats on the board that are guaranteed to a sponsor, and each may assign its own Director. So there is no change in sponsorships at this point, just a couple of internal personnel changes that led to these board changes.

With all the churn and resulting separation of Futurewei in the US from the rest of Huawei, their chair seat was moved over to Fred Li. I worked with Fred quite a bit during my time with the company. He’s a great guy and has put in a lot of work, mostly behind the scenes, to support OpenStack. Really happy to be able to work with him again. Anni has also done a lot over the years, so sad to see her go. I’m sure she will be quite busy on new things though.

On the Red Hat side, Mark McLoughlin has transitioned out, handing things over to Daniel Becker. It sounds like with the internal structure at Red Hat, Daniel is now the better representative for the OpenStack Foundation. I personally didn’t get a lot of opportunity to work with Mark, but I know he has been around for a long time and has done a lot of great things, so I’m a little sad to see him go. But also looking forward to working with Daniel.

Director Diversity Waiver

This was the significant topic to me, because, well… it was about me.

In June I switched employers, going back to Dell EMC. So far, I’ve been very happy, and it feels like I’ve gone back home, given the 14+ years I spent between Compellent and Dell prior to joining Huawei. Not that my time with Huawei wasn’t great. I think I learned a lot and had some opportunities to do things that I hadn’t done before, so no regrets.

But the catch with my going back to Dell was that they already have a Gold sponsor seat with Arkady Kanevsky and a community spot with Prakash Ramchandran.

The OpenStack Foundation Bylaws have a section (4.17) on Director Diversity. This clause limits the number of directors that can be affiliated with the same corporate entity to two. So even though Prakash and I are Individual Members (which means we are there as representatives of the community, not as representatives of our company), my move to Dell now violated that clause.

I think this was added to the bylaws back in the days where there were a few large corporate sponsors that had large teams of people dedicated to working on OpenStack. It was a safeguard to ensure no one company could overrun the Foundation based solely on their sheer number of people involved. That’s not quite as big of an issue today, but I do still think it makes sense. It is a very good thing to make sure any group like this has a diversity of people and viewpoints.

The bylaws actually explicitly state what should happen in my situation too - Article 4.17(d) states:

If a director who is an individual becomes Affiliated during his or her term and such Affiliation violates the Director Diversity Requirement, such individual shall resign as a director.

As such, I really should have stepped down on moving to Dell.

But luckily for me, there is also a provision called out in 4.17(e):

A violation of the Director Diversity Requirement may be waived by a vote of two thirds of the Board of Directors (not including the directors who are Affiliated)

This meant that 2/3 of the Board members present, not including any of us from Dell, would have to vote in favor of allowing me to continue out my term. If less than that were in favor, then I would need to step down. And presumably there would just be an open board seat for the rest of the term until elections are held again.
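The mechanics described above can be sketched in a few lines: at most two directors per affiliation, with a waiver passing on a two-thirds vote of the non-affiliated directors present. This is a simplified reading of bylaws section 4.17 for illustration, not an authoritative interpretation.

```python
import math

def violates_diversity(affiliations, limit=2):
    """True if any single company is affiliated with more than `limit`
    directors (the 4.17 Director Diversity Requirement)."""
    counts = {}
    for company in affiliations:
        counts[company] = counts.get(company, 0) + 1
    return any(n > limit for n in counts.values())

def waiver_passes(votes_for, voting_directors):
    """4.17(e): two thirds of the directors voting, excluding the
    affiliated directors themselves, must approve the waiver."""
    return votes_for >= math.ceil(2 * voting_directors / 3)

# Three same-company seats would trip the requirement...
print(violates_diversity(["Dell", "Dell", "Dell", "Red Hat", "Huawei"]))  # True
# ...but a waiver passes if, say, 14 of 20 non-affiliated directors agree.
print(waiver_passes(14, 20))  # True
```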

There was brief discussion, but I was very happy that everyone present did vote in favor of allowing me to continue out my term. I kind of feel like I should have stepped out during this portion of the call to make sure no one felt pressure by not wanting to say no in my presence, but hopefully that wasn’t the case for anyone. It was really nice to get these votes, and some really good back-channel support from non-board attendees listening in on the call.

What can I say - compliments and positive reinforcement go far with me. :)

So I’m happy to say I will at least be able to finish out my term for the rest of 2019. I will have to see about 2020. I don’t believe Arkady nor Prakash are planning on going anywhere, so we may need to have some internal discussions about the next election. Or, probably better, leave it up to the community to decide who they would like representing them for the Individual Director seats. Prakash has been doing a lot of great work for the India community, so if it came down to it and I lost to him, I would be just fine with that.

OW2 Associate Membership

Thierry then presented a proposal to join OW2 as an Associate Member. OW2 is “an independent, global, open-source software community”. So what does that mean? Basically, like the Open Source Initiative and others, they are a group of like-minded individuals, companies, and foundations that work together to support and further open source.

We (OpenStack) have actually worked with them for some time, but we had never officially joined as an Associate Member. There is no fee to join at this level, and it is really just formalizing that we are supportive of OW2’s efforts and willing to work with them and the members to help support their goals.

They have been in support of OpenStack and open infrastructure for years, so it was great to approve this effort. We are now listed as one of their Associate Members.

Interop WG Guidelines 2019.06

Egle moved to have the board approve the 2019.06 guidelines. We had held an email vote for this approval, but since we did not get responses from every Director, we performed an in-meeting vote to record the result. All present were in favor.

The interop guidelines are a way to make sure all OpenStack deployments conform to a base set of requirements. This gives an end user of an OpenStack cloud at least some level of assurance that they can move from one cloud to another without getting a wildly different user experience. The work of the Interop Working Group has been very important to ensuring this stability and helping the ecosystem around OpenStack grow.
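At its core, an interop check compares a cloud's advertised capabilities against a guideline's required list. The sketch below shows that idea; the capability names are made up, while the real guidelines are JSON documents consumed by tooling such as RefStack.

```python
def missing_capabilities(guideline_required, cloud_capabilities):
    """Return required capabilities the cloud does not provide,
    in guideline order. An empty result means the cloud conforms."""
    provided = set(cloud_capabilities)
    return [cap for cap in guideline_required if cap not in provided]

# Hypothetical slice of a guideline and a cloud under test.
required = ["compute-servers-create", "compute-servers-list",
            "identity-v3-tokens", "object-storage-object-get"]
cloud = {"compute-servers-create", "compute-servers-list",
         "identity-v3-tokens"}

print(missing_capabilities(required, cloud))
# ['object-storage-object-get']
```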


Prakash gave a quick update on the meetups and mini-Summits being organized in India. Sounds like a lot of really good activity happening in various regions. It’s great to see this community being supported and growing.

Alan also made a call for volunteers for the Finance and Membership committees. I had tried to get involved earlier in the year, but I think due to timing there really just wasn’t much going on at the time. With the next election coming up, and some changes in sponsors, now is actually a good time for the Membership Committee to have some more attention. I’ve joined Rob Esker to help review any new Platinum and Gold memberships. Sounds like we will have at least one new one of those coming up soon.

Summit Events

It wasn’t really an agenda topic for this Board Meeting, but I do think it’s worth pointing out here that the proposed changes to the structure of our yearly events have gone through, and 2020 will start to diverge from the typical pattern we have had so far of holding two major Summits per year.

Erin Disney sent out a post about these changes to the mailing list. We will have a smaller event focused on collaboration in the spring, then a larger Summit (or Summit-like) event later in the year.

With the maturity of OpenStack and where we are today, I really think this makes a lot more sense. There simply isn’t enough big new functionality and news coming out of the community today to justify two large marketing focused events like the Summit per year. What we really need now is to foster the environment to make sure the developers, operators, and others that are working on implementing new functionality and fixing bugs have the time and venue they need to work together and get things done. Having these smaller events and supporting more things like the regional Open Infrastructure Days will hopefully help keep that collaboration going and allow us to focus on the things that we need to do.

And the next event will be in beautiful Vancouver again, so that’s a plus!

by Sean McGinnis at October 17, 2019 12:00 AM


Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.


Last updated:
January 27, 2020 03:38 AM
All times are UTC.
